Battlefield servers demand design for heat, spares, and application portability
November 24, 2017
Freezing conditions at 40,000 feet; scorching heat in the Middle East. These are the world's battlefields for rugged servers.
Today’s battlefield environments range from land to sea to air, but even in these diverse domains, three key design elements are necessary to keep battlefield servers operational where lives are at stake.
First, high-performance servers require innovative heat management to achieve maximum system performance without CPU throttling, even in the hottest desert conditions or in cramped, sealed enclosures. Second, reliability is critical, making line-replaceable units (LRUs) a key element that must be considered early in the design stage. Finally, application code reuse requires a modular approach, so that the same application software is portable across many different server types and the customer simply chooses the format that fits the installation, saving money.
Heat is a server killer
On land, battlefield servers are commonly deployed in fixed-building platforms such as CONUS [Continental United States] installations; in tents or trailers used for quasi-fixed, behind-front-lines operations centers; or in mobile vehicles such as Humvees, MRAPs [mine-resistant ambush-protected vehicles], or Strykers. But heat is a killer for servers, especially in today’s desert battlefields. Commercial-temperature components operate at 0 °C to 70 °C, and their performance suffers (or they fail outright) when they get too hot. When an Intel processor approaches its 100 °C maximum temperature, the CPU throttles, slowing down the clock to lower the workload and the device temperature. When this happens, the server slows down, its performance suffers, and under battlefield conditions, the slowdown could result in loss of life.
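The throttling behavior described above can be sketched as a simple model. The linear ramp, the 5 °C margin, and the half-speed floor below are illustrative assumptions for this sketch, not Intel's actual thermal-management algorithm; only the 100 °C limit comes from the article.

```python
# Illustrative model of thermal throttling: as die temperature approaches
# Tj_max (100 degC for many Intel parts), the CPU steps its clock down to
# shed power until the temperature recovers.

TJ_MAX_C = 100.0         # junction limit cited in the article
THROTTLE_MARGIN_C = 5.0  # hypothetical margin at which throttling begins

def effective_clock_ghz(base_ghz: float, die_temp_c: float) -> float:
    """Return a modeled effective clock for a given die temperature.

    Below the throttle threshold the CPU runs at full clock; above it,
    the clock scales down linearly toward half speed at Tj_max.
    The linear ramp is an assumption for illustration only.
    """
    threshold = TJ_MAX_C - THROTTLE_MARGIN_C
    if die_temp_c < threshold:
        return base_ghz
    # Fraction of the way from the threshold to Tj_max, clamped to [0, 1]
    overshoot = min((die_temp_c - threshold) / THROTTLE_MARGIN_C, 1.0)
    return base_ghz * (1.0 - 0.5 * overshoot)

if __name__ == "__main__":
    for t in (70, 96, 98, 100, 105):
        print(f"{t:>3} degC -> {effective_clock_ghz(2.4, t):.2f} GHz")
```

The point of the model is the shape of the curve: full performance until the thermal margin is consumed, then a steep, temperature-driven slowdown, which is exactly the behavior a battlefield system cannot afford.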
In buildings, tents, and trailers, servers are typically air-cooled (via convection) rackmount equipment, 19 inches wide and stacked with other gear such as RAID [redundant array of independent disks] drives, power supplies, Ethernet switches, and sometimes rackmount radios. Convection servers use fans and pure commercial-temperature components. This is often the same equipment used in enterprise installations, which works in an air-conditioned server room (Figure 1) but not in burned-out battlefield command post buildings or mobile operations tents. In these locations, large, portable air conditioners are required to keep servers operating without overheating.
Figure 1: A modern air-conditioned server room is dramatically different than a battlefield tent. (Image via Wiki Commons, courtesy of CSIRO.)
When rackmount servers operate without air conditioners, throttling can only be avoided with efficient air flow across the system and by effectively moving heat from components such as the processors onto the heat sinks. An effective approach is to use two hot-swappable fan tray assemblies that each contain six independently controlled fans. At 10,000 rpm per fan, hundreds of CFM [cubic feet per minute] are available to the entire 19-inch chassis to keep the system cool. To get the air to the heat sinks requires a very large assembly – as much as the full surface of a 6U VPX motherboard – plus vertical fins (Figure 2).
Figure 2: Close-up of dual-processor server heat sink. This OpenVPX-based motherboard in the GMS S2U “King Cobra” server is engineered with wider fins for cooler inlet air, and narrow fins for more cooling when the air is warmer.
An additional approach uses one set of fans to push air across the heat sink assembly, while a second set pulls air out of another part of the system, intermixing cooler inlet air to counterbalance the warmer air moving across the heat sink. Individual fan control can be used to monitor multiple in-system temperature sensors so air flow can be tuned for maximum cooling.
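The sensor-driven fan tuning described above can be sketched as a small control policy. The sensor names, the push/pull zone mapping, the 40-80 °C control band, and the 20 percent duty floor are all illustrative assumptions, not a published GMS control scheme.

```python
# Hypothetical fan-speed policy for a chassis with multiple temperature
# sensors and independently controlled fan zones, as described above.

SENSOR_ZONES = {
    "cpu0": "push",   # fans pushing air across the heat sink
    "cpu1": "push",
    "inlet": "pull",  # fans pulling air out, intermixing cooler inlet air
    "psu": "pull",
}

def duty_for_temp(temp_c: float, t_min: float = 40.0, t_max: float = 80.0) -> float:
    """Map a temperature to a fan duty cycle in [0.2, 1.0] (20% floor)."""
    if temp_c <= t_min:
        return 0.2
    if temp_c >= t_max:
        return 1.0
    return 0.2 + 0.8 * (temp_c - t_min) / (t_max - t_min)

def zone_duties(readings: dict[str, float]) -> dict[str, float]:
    """Each fan zone runs at the duty demanded by its hottest sensor."""
    duties: dict[str, float] = {}
    for sensor, temp in readings.items():
        zone = SENSOR_ZONES[sensor]
        duties[zone] = max(duties.get(zone, 0.0), duty_for_temp(temp))
    return duties
```

For example, with CPU 0 at 75 °C and the power supply at 50 °C, the push fans ramp to 90 percent duty while the pull fans stay at 40 percent, so the airflow mix follows the actual hot spots rather than running every fan flat out.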
In vehicles, servers might be rackmounted and installed in suitcase-like transit cases, but increasingly they are conduction-cooled small-form-factor (SFF) sealed chassis that are more robust and purpose-built to handle Xeon-class workloads. These systems may require new cooling technologies, such as the use of a viscous metallic bath in which the processor’s contact slug sits, creating a very low-resistance thermal path from the hot processor package to the final air-cooled heat sink. The result is less than a 10-degree heat rise from the hot die to the heat sink (Figure 3). This efficient thermal path means that more than 90 percent of the heat from the processor makes it to the heat sink and into the air stream, which makes it useful for an air-cooled server on a battlefield without air conditioning. For conduction-cooled battlefield servers, this technology moves heat directly to the box’s mounting cold plate.
Figure 3: A heat sink can maintain less than 10-degree heat rise between the hot CPU and the cooling plate.
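The 10-degree figure can be sanity-checked with the standard thermal-resistance relation ΔT = P × θ, where P is dissipated power in watts and θ is the thermal resistance of the path in °C/W. The 90 W power and 0.1 °C/W resistance below are illustrative assumptions, not published specifications for the product discussed here.

```python
# Back-of-the-envelope check of the "less than 10-degree rise" claim.
# Temperature rise across a conductive path: delta_T = P * theta.

def temp_rise_c(power_w: float, theta_c_per_w: float) -> float:
    """Temperature rise (degC) across a thermal path: delta_T = P * theta."""
    return power_w * theta_c_per_w

if __name__ == "__main__":
    # Assumed: a ~90 W Xeon-class die and a 0.1 degC/W die-to-sink path
    rise = temp_rise_c(90.0, 0.1)
    print(f"Die-to-heat-sink rise: {rise:.1f} degC")
```

Under those assumptions the rise works out to 9 °C, consistent with the sub-10-degree claim; achieving a path resistance that low is precisely what the viscous metallic bath is for.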
Spares can be a logistical nightmare
Battlefield servers have unique requirements in other areas besides environmental. One is reliability: For rackmount servers, the ability to quickly replace a module due to failure or for an upgrade drives the need for modularity and hot-swap line-replaceable units (LRUs). Every module of the system – from power supply and fan assemblies to VPX-based motherboard and drive assemblies – must be replaceable in seconds. This is the downfall of typical commercial off-the-shelf (COTS) 1U or 2U servers: If there’s a failure, the entire server must be replaced.
One of the biggest users of commercial rackmount servers is the U.S. Navy, because the air-conditioned shipboard environment is typically tolerant of commercial equipment. Ships stockpile large volumes of new, brand-name servers as spares for the many units already in service throughout the ship.
In contrast, battlefield-rugged servers bring higher mean time between failures (MTBF) and can operate longer in environments that experience extreme heat, moisture, shock, and vibration, while remaining competitive with commercial server costs. A modular design for these purpose-built servers enables anything in the system to be swapped out on the battlefield, underway on board the ship, or in the air on a reconnaissance mission. This is particularly important in a submarine, for example, where carrying a few replacement modules is far more practical than hauling around a large quantity of spare servers.
Application code reuse across platforms
Many large defense contractors have multiplatform systems, such as a command module with moving maps, sensor fusion, and database retrieval that overlays data on the unfolding mission scenario. This command system may reside in an air transport rack (ATR) or vetronics chassis mounted in an armored vehicle or widebody aircraft, could be in an air-cooled rack on a ship, or may need to be shoehorned into an SFF system on a multimission ground vehicle.
The same application software must be portable across many different server types, so the customer merely chooses the format of the server based upon the installation. That choice requires rugged servers to be code-compatible within the same processor family using a computer-on-module (COM) engine that houses the processor subsystem, such as an Intel Xeon E5, Xeon D, or future processor types. The engine is the same, whether used in a VPX server blade, an SFF conduction-cooled chassis, an air-cooled 19-inch rackmount, or even sandwiched into a smart-panel PC display.
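The portability idea above can be sketched in miniature: the application targets a single COM-engine interface, and only the chassis packaging varies. All class and field names here are hypothetical, for illustration only, and stand in for whatever platform abstraction a real program would use.

```python
# Sketch: one application, one COM engine, many chassis formats.

from dataclasses import dataclass

@dataclass(frozen=True)
class ComEngine:
    """The compute module shared across all chassis variants."""
    cpu_family: str   # e.g., "Xeon E5" or "Xeon D"
    cores: int

@dataclass(frozen=True)
class Chassis:
    """Chassis packaging varies; the COM engine and application do not."""
    form_factor: str  # e.g., "VPX blade", "SFF conduction-cooled"
    engine: ComEngine

def run_mission_app(platform: Chassis) -> str:
    """The same application code runs unmodified on any chassis variant."""
    return (f"mission app on {platform.engine.cpu_family} "
            f"({platform.engine.cores} cores) in {platform.form_factor}")

engine = ComEngine(cpu_family="Xeon D", cores=8)
for ff in ("VPX blade", "SFF conduction-cooled", "19-inch rackmount"):
    print(run_mission_app(Chassis(form_factor=ff, engine=engine)))
```

Because `run_mission_app` depends only on the engine, qualifying the software once on the COM engine covers every chassis it ships in, which is the cost saving the article describes.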
Purpose-built battlefield servers
Battlefield servers are being designed into a wide range of demanding defense applications. These include forward-deployed operations centers and mobile tactical command posts; vehicle-mounted network infrastructure for semipermanent battlefield operations; shipboard systems; widebody command, control, communications, computers, intelligence, surveillance, and reconnaissance (C4ISR) and electronic warfare platforms; and airborne command infrastructure that links to onboard and SATCOM [satellite communication] networks. For each of these diverse domains, a modular, purpose-built design approach ensures operational success for systems where lives are at stake. Key design considerations include innovative approaches for heat dissipation, modular spares to ensure system reliability, and application portability for multiplatform systems.
Chris A. Ciufo is chief technology officer and VP of product marketing at General Micro Systems, Inc. Ciufo is a veteran of the semiconductor, COTS, and defense industries, where he has held engineering, marketing, and executive-level positions. He has published more than 100 technology-related articles. He holds a bachelor’s degree in EE/materials science and participates in defense industry organizations and consortia.
General Micro Systems www.gms4sbc.com