Conduction-cooling advancements complement ultra-compact servers in battle versus excessive heat
August 01, 2019
Servers in the defense arena are packing ever-more electronics, posing serious thermal-management challenges. Advancements in conduction cooling, however, can reduce heat and prevent throttling in these small-form-factor designs.
Battles are fought in the real world with modern electronic warfare systems, but a battle is also “fought” inside those systems against heat and its effects on electronics. This is especially true of rugged battlefield servers, which must achieve a higher mean time between failures (MTBF) than commercial data center equipment. Battlefield servers routinely experience extreme heat in deployed environments like the Middle East, Africa, and even Arizona.
As these high-performance battlefield servers become true mobile data centers with Intel’s latest scalable Xeon processors, it’s become even more challenging to remove the heat. In vehicles, on ships, on wide-body aircraft, and within quasi-fixed, behind-front-lines operation centers, heat can be a true server killer, throttling performance and putting lives at risk.
Removing or at least reducing heat from battlefield servers can be done in multiple ways, from blown air to liquid or conduction cooling. While versions of these approaches – or a combination of the three – have been around for decades, new enhancements offer more innovative conduction-cooled hybrid-style alternatives for servers on the scorching battlefield.
Air cooling by convection
Rackmount server system designers operating in places like command tents often try to use the same cooling techniques as those used in enterprise installations. Unfortunately, air-cooling (via convection) with fans and pure commercial-temperature components isn’t nearly as effective on the battlefield.
Internal design inefficiencies reduce air flow, requiring more air per square inch fed into the server. Individual component heat sinks each indiscriminately heat the air while simultaneously slowing the air’s velocity due to eddies, obstructions, and dust buildup. Upstream components can also inadvertently overheat downstream components. The answer is more air, and air-conditioned (cold) air needs to be fed into the server.
Battlefield “chillers” that cool the server inlet air work well but require some heavyweight logistics. The chiller units can be truck-mounted to cool both the operators and the servers – say, in a command post tent – or rackmounted to cool just the servers. Either way, mammoth generators are also needed to power the chillers. The generators need fuel and are noisy, complicating a force’s deployment logistics and stealth. All this to cool a rackmount server.
The fans used in pure convection-cooled servers are also loud, prone to failure, and transport airborne particles from outside the server chassis to the inside (Figure 1). A 1U server is only 1.75 inches tall yet must dissipate as much as 1.5 kW out the exhaust ports, equivalent to the heat of a hair dryer. Accordingly, the server’s small fans scream at high RPM to move the air, forcing operators near the server to wear ear protection.
Figure 1 | A typical rackmount server is densely packed with impediments to airflow and places for particles like dust and smoke to collect.
And if servers are used on the battlefield – in trucks, tents, or in an aircraft parked in a humid environment – the fans also deposit dust, dirt, talc, smoke, and even mold spores and corrosive airborne salt into the cramped chassis. As debris builds up, cooling performance goes down and heat increases. Bad news on the battlefield.
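To give a sense of scale for the 1.5 kW figure above, the airflow needed to carry that heat out of a chassis can be estimated from the energy balance P = density × volumetric flow × specific heat × air temperature rise. The sketch below is a rough estimate only: the air properties are approximate sea-level values, and the 15 °C inlet-to-exhaust temperature rise is an assumption chosen for illustration, not a figure from any server specification.

```python
# Rough airflow estimate for an air-cooled server (illustrative only).
# Assumed values (not from the article): approximate sea-level air
# properties and a 15 degC allowable inlet-to-exhaust air temperature rise.

AIR_DENSITY_KG_M3 = 1.2       # approximate density of air near 20 degC at sea level
AIR_CP_J_PER_KG_K = 1005.0    # approximate specific heat of air
M3_PER_S_TO_CFM = 2118.88     # 1 m^3/s expressed in cubic feet per minute


def required_airflow_m3_s(power_w: float, air_rise_k: float) -> float:
    """Volumetric airflow that removes power_w with an air temperature rise of air_rise_k."""
    if air_rise_k <= 0:
        raise ValueError("air temperature rise must be positive")
    return power_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * air_rise_k)


def required_airflow_cfm(power_w: float, air_rise_k: float) -> float:
    """Same estimate, converted to cubic feet per minute."""
    return required_airflow_m3_s(power_w, air_rise_k) * M3_PER_S_TO_CFM


if __name__ == "__main__":
    flow_cfm = required_airflow_cfm(power_w=1500.0, air_rise_k=15.0)
    print(f"Estimated airflow for 1.5 kW: {flow_cfm:.0f} CFM")
```

Under these assumptions the estimate comes to roughly 176 CFM. Halving the allowable air temperature rise doubles the required airflow, which is one way to see why hot inlet air and clogged air paths are so costly in a 1U chassis.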
A case for liquid cooling?
Liquid cooling, in contrast, uses toxic (glycol-based), corrosive (salts), or inert liquids to move large amounts of heat through small pipes, card clamps, and hose connectors. Flammable liquids with high vapor pressure (such as alcohol) also cool well and are low-cost, but a leak can be catastrophic.
While liquid cooling is used increasingly in radar, electronic warfare, and signals intelligence to cool exceptionally high-heat sensors and specialty electronics like transmitters and emitters, this approach has traditionally been very expensive, can be unreliable, and adds complex plumbing that doesn’t fit with small-form-factor or low-profile systems like servers. Moreover, liquid cooling isn’t passive; it requires a pump to move the fluid plus a liquid-to-air exchanger with a fan, adding yet another potential failure point.
The latest deployed rugged server installations could benefit from a hybrid approach composed of some form of air, liquid, and/or conduction cooling. The hybrid approach can be more efficient and reliable in removing heat as designers try to fit as much power as possible into increasingly smaller rackmount systems that are not very deep and only 1U or 2U high.
Advances in conduction cooling solve the challenges
The most effective hybrid approach uses new-to-servers conduction-cooling techniques that conduct heat away from hot spots in the server to a central “radiator” core plenum that’s essentially a whole-system cooling plate. This method replaces the processors’ individual heat sinks, moving heat from all components at once in three dimensions (up, across laterally, and out) and then exhausting hot air through the back of the system. The conduction-cooled approach is not unlike that of an air-cooled server, but air moves only through the large central plenum into which all the system’s heat is conducted. It differs from the use of individual heat sinks in that every component is attached to a whole-system heat sink, through or across which air or liquid moves. Typically, air is the preferred fluid for the reasons mentioned above.
A hybrid conduction-cooled system connects all hot spots with conduction plates, allowing heat to move through the cold plates from hot areas to cooler areas. Heat is then removed from the metal surfaces and dispersed in the ambient air at the system level. The primary difference is that the air or liquid is cooling not individual components and hot spots, but an entire cold-plate assembly for the whole system.
Every component, board, or subsystem is conductively cooled using the cold-plate mechanism, with heat conducted away from each component’s or subsystem’s heat sinks to the whole system’s combined heat sink assembly.
The benefits of the hybrid conduction-cooled server method are many:
- Carries out thermal management in multiple dimensions from top to bottom in the server design.
- Directly moves heat from high-wattage components like processors, GPUs, FPGAs, and specialty devices.
- Moves heat to a single, large central core plenum for air-to-conduction transfer.
- Optimizes the blown air and is less likely to clog than individual heat sink fins.
- Allows for a sealed system where only the central plenum is open to the environment, minimizing dust and moisture ingress and improving EMI mitigation.
Moving heat from components to cold plates
The hybrid conduction-cooled server works because heat sinks – essentially one continuous heat sink – are applied to all components, regardless of their location in the airflow. Heat is conducted through the system’s internal heat plate to the central air-to-conduction plenum.
The approach uses a corrugated alloy slug with an extremely low thermal resistance that acts as a heat spreader at the processor die; and once the heat is spread over a larger area, a special compound in a sealed chamber transfers the heat from the spreader to the internal cold plate. This system from General Micro Systems, called RuggedCool, has been used in high-heat, high-wattage small-form-factor systems for nearly 20 years (Figure 2). It works equally well in rackmount servers.
Figure 2 | A graphic view of how conduction cooling used in small-form-factor systems can work equally well in ultra-compact server designs. The cold plate on the bottom is now mounted internally to the server. Air is blown through the cold plate’s central air-to-conduction plenum (Image: General Micro Systems).
Unlike other cooling systems, which can’t conduct heat as efficiently and might have as much as 25 °C of temperature rise from the component to the cold plate, this way of cooling can lower the temperature delta to less than 10 °C from the CPU core to the cold plate. The less of a thermal “resistor” there is, the cooler the CPU will run without the thermal throttling that slows the CPU down to avoid damage from excess heat.
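The “thermal resistor” comparison can be made concrete with the steady-state relation R = ΔT / P, where ΔT is the temperature rise from the CPU core to the cold plate and P is the heat being conducted. The sketch below uses the 25 °C and 10 °C deltas quoted above; the 150 W processor power and the 70 °C cold-plate temperature are hypothetical values chosen only for illustration, and the function names are likewise illustrative.

```python
# Steady-state thermal-resistance sketch (illustrative only).
# The 25 degC and 10 degC core-to-cold-plate deltas come from the text.
# The 150 W processor power and 70 degC cold-plate temperature are
# hypothetical values chosen for this example.


def thermal_resistance_k_per_w(delta_t_k: float, power_w: float) -> float:
    """Thermal resistance R = delta_T / P of the path from core to cold plate."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return delta_t_k / power_w


def core_temperature_c(cold_plate_c: float, power_w: float,
                       resistance_k_per_w: float) -> float:
    """Steady-state core temperature = cold-plate temperature + P * R."""
    return cold_plate_c + power_w * resistance_k_per_w


if __name__ == "__main__":
    power_w = 150.0       # hypothetical processor power
    cold_plate_c = 70.0   # hypothetical cold-plate temperature

    for label, delta_k in (("25 degC path", 25.0), ("10 degC path", 10.0)):
        r = thermal_resistance_k_per_w(delta_k, power_w)
        core = core_temperature_c(cold_plate_c, power_w, r)
        print(f"{label}: R = {r:.3f} K/W, core = {core:.1f} degC")
```

With these assumed numbers, the 25 °C path works out to about 0.17 K/W and the 10 °C path to about 0.07 K/W, so the same cold plate leaves the core roughly 15 °C cooler on the lower-resistance path.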
Processors without throttling = reliable servers
As stated above, using conduction cooling in rackmount servers enables components to operate at their maximum potential without throttling at higher temperatures. For example, the processor – whether a CPU, GPU, or FPGA – and an intelligent peripheral such as an Ethernet switch or a Thunderbolt 3 controller are typically the hottest parts of the system.
Designers must therefore ensure that servers stay within their maximum thermal design power (TDP) without any internal throttling of the processor or peripherals. Intel processors – designed not to exceed about 105 °C – start to throttle when they get close to their maximum temperature, a process that essentially puts the server into a “limp” mode. Here, the new conduction-cooling techniques can play a vital role: they enable servers using Intel-based CPUs with a TjMax of 105 °C to run in military environments at full operational load, keeping processors and peripherals at maximum performance without throttling.
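The throttling argument can be turned into a simple headroom check: given a TjMax of 105 °C and a core-to-cold-plate temperature rise, how hot can the cold plate get before the core nears its limit? The sketch below is a simplification that treats throttling as a hard threshold. The 105 °C limit and the 25 °C and 10 °C deltas are taken from the article; the 5 °C safety margin and the 85 °C cold-plate temperature are assumed values, and the function names are illustrative.

```python
# Thermal-headroom sketch (illustrative only).
# TjMax of 105 degC and the 25/10 degC core-to-cold-plate deltas are taken
# from the article; the 5 degC safety margin is an assumed value.

TJ_MAX_C = 105.0


def max_cold_plate_c(core_to_plate_delta_c: float,
                     margin_c: float = 5.0,
                     tj_max_c: float = TJ_MAX_C) -> float:
    """Highest cold-plate temperature that keeps the core at or below tj_max_c - margin_c."""
    return tj_max_c - margin_c - core_to_plate_delta_c


def will_throttle(cold_plate_c: float,
                  core_to_plate_delta_c: float,
                  margin_c: float = 5.0,
                  tj_max_c: float = TJ_MAX_C) -> bool:
    """True if the cold plate is hotter than the allowed maximum (hard-threshold model)."""
    return cold_plate_c > max_cold_plate_c(core_to_plate_delta_c, margin_c, tj_max_c)


if __name__ == "__main__":
    cold_plate_c = 85.0   # hypothetical cold-plate temperature in a hot environment
    for delta_c in (25.0, 10.0):
        limit = max_cold_plate_c(delta_c)
        print(f"delta {delta_c:.0f} degC: cold plate may reach {limit:.0f} degC; "
              f"throttles at {cold_plate_c:.0f} degC: {will_throttle(cold_plate_c, delta_c)}")
```

In this model the 10 °C path tolerates a cold plate 15 °C hotter than the 25 °C path, so a hypothetical 85 °C cold plate would trigger throttling on the higher-resistance path but not on the lower one.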
Powerful CPUs in compact chassis
More innovative thermal management is becoming imperative for battlefield servers operating outside air-conditioned server rooms. Servers with fully sealed and conduction-cooled chassis can withstand the harsh rigors of the battlefield, have higher reliability and MTBF, and achieve superior EMI performance.
Applied to 19-inch rackmount servers, this hybrid approach to thermal management – focused on conduction cooling that moves heat directly from the processors to the server’s cold plate – can enable the servers to run cooler, be more reliable, and avoid throttling. Now even the scorching heat of the battlefield shouldn’t slow the performance of these powerful servers when computing resources matter most.
Chris A. Ciufo is chief technology officer and VP of product marketing at General Micro Systems, Inc. Ciufo is a veteran of the semiconductor, COTS, and defense industries, where he has held engineering, marketing, and executive-level positions. He has published more than 100 technology-related articles. He holds a bachelor’s degree in EE/materials science and participates in defense industry organizations and consortia.
General Micro Systems