Cool world: A tour of thermal-management approaches for rugged computer systems

July 29, 2019

Jason Shields

Curtiss-Wright

What happens when a CPU gets too hot? Circuitry within the device runs slower, which can lead to poor system performance. The design of rugged mission-critical computer systems must consider thermal management as a system-level issue.

Two levels of protection are usually built into the chip to guard against overheating. The first is a critical shutdown which, when triggered, shuts down the whole device to prevent physical damage. The second is throttling, in which the processor’s clock is slowed down. Throttling, which is supported by Intel processors, typically occurs at a lower temperature threshold than shutdown.

For example, Intel Core processors automatically throttle their performance based on the processor workload and their thermal environment. In principle, this is a sensible way to cool a system that heats up as it draws more power. In a mission-critical environment, however, a throttled processor is not desirable. For defense applications such as electronic warfare (EW) and intelligence, surveillance, and reconnaissance (ISR), where consistent, deterministic performance is required, processor throttling can adversely affect mission success.

Processor throttling (sometimes also called dynamic frequency scaling) is used in computer architectures to adjust a processor’s clock frequency, and with it the number of instructions executed per unit of time. Throttling back the clock frequency causes a processor to run more slowly, do less work, use less power, and as a consequence generate less heat. As the device’s operating clock gracefully slows down, its temperature falls, which prevents timing errors.
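The two protection levels can be pictured as a simple threshold check. The following Python sketch is illustrative only: the thresholds, clock frequencies, step size, and the function name next_clock_mhz are hypothetical and do not represent any vendor's actual thermal-control logic.

```python
# Illustrative two-level thermal protection. All numbers are assumed values,
# not taken from any processor datasheet.
THROTTLE_C = 95.0    # hypothetical throttling threshold, degrees C
SHUTDOWN_C = 105.0   # hypothetical critical-shutdown threshold, degrees C
NOMINAL_MHZ = 2400   # hypothetical full-speed clock
MIN_MHZ = 800        # hypothetical lowest throttled clock
STEP_MHZ = 200       # hypothetical clock step per control interval


def next_clock_mhz(temp_c: float, clock_mhz: int) -> int:
    """Return the clock for the next control interval (0 means shut down)."""
    if temp_c >= SHUTDOWN_C:
        return 0  # critical shutdown: protect the device from damage
    if temp_c >= THROTTLE_C:
        return max(MIN_MHZ, clock_mhz - STEP_MHZ)  # slow down to shed heat
    return min(NOMINAL_MHZ, clock_mhz + STEP_MHZ)  # cool enough: recover speed


if __name__ == "__main__":
    clock = NOMINAL_MHZ
    for temp in (80.0, 96.0, 98.0, 94.0, 106.0):
        clock = next_clock_mhz(temp, clock)
        print(f"{temp:5.1f} C -> {clock} MHz")
```

The clock in this sketch moves up and down with temperature, which is exactly the kind of variation that makes throttling a problem for workloads that need deterministic performance.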

Keeping your cool

Thermal management for traditional servers, desktops, or laptops is fairly straightforward. Typically, system designers can come up with a combination of fans, heat sinks, heat pipe coolers, and other components that keep systems within a relatively cool operating range. Rugged military computers are different, however: The harsh conditions encountered by military platforms in the air, on the ground, or at sea preclude the use of many traditional cooling methods or require them to be substantially modified or restricted.

For example, typical cooling fans work by exchanging the air inside the computer with the cooler, ambient air on the outside. But what if that ambient air is full of dust, humidity, salt fog, or smoke? All of these conditions are potentially harmful if introduced into the system. Consider, too, missions that must operate in low-pressure zones (higher altitudes), where the air can be too thin to carry away enough heat.
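To put a rough number on the altitude effect, the sketch below estimates air density with the International Standard Atmosphere model for the troposphere. The function name air_density and the 25 CFM fan flow are assumptions made for illustration; a real design would work from the platform's specified environment.

```python
# Air density versus altitude, International Standard Atmosphere (troposphere,
# valid up to about 11 km). Constants are the standard ISA values.
SEA_LEVEL_DENSITY = 1.225   # kg/m^3
SEA_LEVEL_TEMP = 288.15     # K
LAPSE_RATE = 0.0065         # K/m
GRAVITY = 9.80665           # m/s^2
GAS_CONSTANT_AIR = 287.053  # J/(kg*K), specific gas constant of dry air


def air_density(altitude_m: float) -> float:
    """Return ISA air density in kg/m^3 for 0 <= altitude_m <= 11000."""
    if not 0.0 <= altitude_m <= 11000.0:
        raise ValueError("model is valid only in the troposphere (0 to 11 km)")
    temp_ratio = 1.0 - LAPSE_RATE * altitude_m / SEA_LEVEL_TEMP
    exponent = GRAVITY / (GAS_CONSTANT_AIR * LAPSE_RATE) - 1.0
    return SEA_LEVEL_DENSITY * temp_ratio ** exponent


if __name__ == "__main__":
    fan_flow_m3_s = 25 * 0.000471947  # hypothetical 25 CFM fan, in m^3/s
    for altitude in (0, 3000, 6000, 9000):
        rho = air_density(altitude)
        print(f"{altitude:5d} m: {rho:.3f} kg/m^3, "
              f"mass flow {rho * fan_flow_m3_s * 1000:.2f} g/s")
```

A fan that moves a fixed volume of air delivers less mass of air as density falls, so the same fan removes less heat at altitude for the same air temperature rise.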

Each design challenge must consider the entire system, with components and solutions selected to best meet the requirements of the finished product.

For rugged applications, several thermal management techniques are often required to protect a system’s internal components.

Conduction cooling

Conduction cooling is defined as the transfer of heat through solids. A common example is a conduction-cooled chassis mounted onto a cold plate (Figure 1). Heat generated inside the chassis by the electronics flows into the aluminum sidewalls of the chassis and down into the cold plate. Because heat flows from a hotter body to a cooler one, it is transferred from the chips to the lower-temperature cold plate.

Figure 1 | A conduction-cooled chassis transfers heat from the chips to the lower-temperature cold plate.

At the board level, conduction cooling is done by transferring heat from the components through a conduction frame to the card edge and to the “cold wall” of the chassis. To maximize the heat transfer from the components to the cold wall, it is important to minimize the thermal resistance of this path, which can be done by using materials with low thermal resistance and wedge locks with higher clamping force.
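The conduction path can be treated as thermal resistances in series, where the temperature rise across each element equals the power flowing through it multiplied by its resistance. The Python sketch below is a minimal one-dimensional estimate that assumes all of the heat follows a single path; the resistance values, power level, cold-wall temperature, and function name are hypothetical.

```python
# Series thermal-resistance estimate for a conduction-cooled card.
# All resistance values below are hypothetical, chosen only for illustration.

def component_temp_c(cold_wall_c: float, power_w: float,
                     resistances_c_per_w: dict[str, float]) -> float:
    """Component temperature = cold-wall temperature + power * total resistance."""
    return cold_wall_c + power_w * sum(resistances_c_per_w.values())


if __name__ == "__main__":
    path = {
        "component to frame (interface material)": 0.10,
        "conduction frame to card edge": 0.25,
        "wedge lock to cold wall": 0.15,
    }
    print(component_temp_c(cold_wall_c=71.0, power_w=50.0,
                           resistances_c_per_w=path))
    # Lowering any one resistance (for example, with a wedge lock that
    # clamps harder) lowers the component temperature at the same power.
    path["wedge lock to cold wall"] = 0.08
    print(component_temp_c(cold_wall_c=71.0, power_w=50.0,
                           resistances_c_per_w=path))
```

Because the resistances add, the largest single term in the path is usually the most rewarding one to reduce.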

For many years, conduction cooling has been the mainstay of thermal management for rugged systems. Although it still plays a major role, there are limits to how much heat conduction cooling by itself can dissipate. Most traditional conduction-cooling methods are unable to dissipate the heat generated by today’s hotter cards: Where once it was commonplace to have 50-watt cards, 120-watt to 200-watt cards are becoming more common.

Convection (air) cooling

Convection cooling uses airflow to transfer heat from a card into the ambient air. For this approach to work effectively, the air must stay significantly cooler than the card, because air is a poor coolant with low heat capacity. There are two basic types of convection cooling: one that relies on natural airflow and one that requires forced airflow via fans. At the board level, care needs to be taken to ensure that devices farther downstream are cooled adequately. Air temperature rises as it passes over the card, with the result that downstream devices are cooled with hotter air.
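The downstream effect follows from a simple energy balance: the air temperature rise equals the heat added divided by the product of mass flow and specific heat. The sketch below applies that balance device by device, assuming all heat enters a well-mixed airstream; the device powers, flow rate, inlet temperature, and function name are hypothetical.

```python
# Air temperature seen by each device along the airflow direction.
# Energy balance: temperature rise = heat added / (mass flow * specific heat).
AIR_CP = 1005.0  # J/(kg*K), specific heat of air near room temperature


def local_air_temps_c(inlet_c: float, mass_flow_kg_s: float,
                      device_powers_w: list[float]) -> list[float]:
    """Return the air temperature arriving at each device, upstream first."""
    temps = []
    air_c = inlet_c
    for power in device_powers_w:
        temps.append(air_c)                         # air reaching this device
        air_c += power / (mass_flow_kg_s * AIR_CP)  # air heated by this device
    return temps


if __name__ == "__main__":
    # Hypothetical card: three devices in line, 5 g/s of air at a 40 C inlet.
    temps = local_air_temps_c(40.0, 0.005, [20.0, 30.0, 15.0])
    for i, t in enumerate(temps, start=1):
        print(f"device {i}: air arrives at {t:.1f} C")
```

Placing the most temperature-sensitive devices upstream, or increasing mass flow, are the two levers this model exposes.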

Air-flow-through cooling

As cards require increasing amounts of power, traditional conductive or convective cooling methods become less viable. That’s where air-flow-through (AFT) cooling comes in. AFT technology uses a heat-exchanger frame, which prevents the cooling air from coming in contact with the electronics. On both the inlet and the exhaust sides of the card, a gasket mounted inside the chassis seals the card’s internal air passage to the chassis side walls (Figure 2). These seals prevent air from being blown into the chassis and protect the internal electronics from the harsh external environment.

Figure 2 | AFT technology uses a heat-exchanger frame.

For systems requiring high power densities, AFT cooling is one of the most reliable active cooling solutions. By providing a thermal path of low resistance, an AFT-cooled chassis can deliver a cooling capacity of as much as 200 watts per slot, environmental sealing to accommodate the harshest environments, and cooling without the exotic materials or fluids associated with liquid or evaporative cooling.

One of the benefits of AFT is that the cooling air is brought in very close proximity to the high-power components on both the base card and mezzanine cards, providing a direct path to the cooling ambient air. Since the air does not come in direct contact with the components, “dirty” air can be used.

Moreover, instead of cards having to share cooling air or share the thermal interface into which they conduct heat, each AFT card has its own inlet and its own exhaust. There is no other cooling path assumed, aside from the cooling air (although in reality there is a parallel conduction path). This setup enables every card to be viewed in isolation from a thermal standpoint. The critical aspect at the system level is to ensure balanced airflow through all of the cards, so that each card has the required amount of cooling air to keep components at their appropriate temperature.
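Because each AFT card can be viewed in isolation, the airflow it needs can be estimated from its own power and an allowed air temperature rise. The sketch below shows one way to work out how total chassis airflow would have to be divided; the slot names, power levels, 15 °C rise limit, and function names are assumptions for illustration.

```python
# Per-card airflow needed in an air-flow-through chassis, from an energy
# balance on the cooling air. Slot names and power levels are hypothetical.
AIR_CP = 1005.0  # J/(kg*K), specific heat of air near room temperature


def required_mass_flow_kg_s(card_power_w: float, max_air_rise_c: float) -> float:
    """Mass flow that keeps the air temperature rise at or below the limit."""
    if max_air_rise_c <= 0:
        raise ValueError("allowed air temperature rise must be positive")
    return card_power_w / (AIR_CP * max_air_rise_c)


def airflow_shares(card_powers_w: dict[str, float],
                   max_air_rise_c: float) -> dict[str, float]:
    """Fraction of total chassis airflow each card needs."""
    flows = {slot: required_mass_flow_kg_s(p, max_air_rise_c)
             for slot, p in card_powers_w.items()}
    total = sum(flows.values())
    return {slot: flow / total for slot, flow in flows.items()}


if __name__ == "__main__":
    cards = {"slot 1": 200.0, "slot 2": 120.0, "slot 3": 80.0}
    for slot, share in airflow_shares(cards, max_air_rise_c=15.0).items():
        print(f"{slot}: {share:.0%} of total airflow")
```

With the same allowed temperature rise on every card, each card's share of the airflow is simply proportional to its power.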

Given the benefits of AFT in simplicity of design, weight efficiency, and low thermal resistance, AFT cooling technology is ideal for high-power applications such as sensor processing.

Liquid-flow-through cooling

For cards dissipating more than 200 watts, a different cooling approach is necessary, as air isn’t the most efficient medium for transferring heat. That’s where liquid-flow-through (LFT) cooling comes in. While similar to AFT in some respects, LFT uses a liquid cooling frame (Figure 3) that employs inlet and outlet quick disconnect (QD) liquid connectors, with a liquid pump used at the chassis/system level. While this may add some weight, pumps require less power to operate than fans.

Figure 3 | Liquid-flow-through cooling uses inlet and outlet quick disconnect liquid connectors with a pump at the chassis/system level.

LFT offers a higher cooling capacity than air-based approaches. For example, a common fluid used in LFT systems is polyalphaolefin (PAO) oil, which conducts heat five times more efficiently than air. This means that LFT-based systems can theoretically cool cards dissipating up to 1,000 watts. As processing devices continue to increase in power consumption, LFT may be required to maximize the usefulness of these devices.
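Conduction is only one measure of a coolant. Another is how much heat a given volume of fluid can carry away, which the sketch below compares for air and PAO. The fluid properties are approximate room-temperature values assumed for illustration, not figures from this article, and the 10 °C coolant rise and function name are likewise hypothetical.

```python
# Volumetric coolant flow needed to carry a heat load at a given coolant
# temperature rise. Fluid properties are approximate room-temperature values
# assumed for illustration; real values vary with temperature and product.
FLUIDS = {
    # name: (density in kg/m^3, specific heat in J/(kg*K))
    "air (sea level)": (1.2, 1005.0),
    "PAO oil": (800.0, 2200.0),
}


def volumetric_flow_l_per_s(heat_w: float, rise_c: float,
                            density: float, specific_heat: float) -> float:
    """Flow in liters per second, from heat = density * flow * cp * rise."""
    flow_m3_s = heat_w / (density * specific_heat * rise_c)
    return flow_m3_s * 1000.0


if __name__ == "__main__":
    for name, (rho, cp) in FLUIDS.items():
        flow = volumetric_flow_l_per_s(1000.0, 10.0, rho, cp)
        print(f"{name}: {flow:.3f} L/s for 1,000 W at a 10 C rise")
```

Under these assumed properties, the liquid needs a far smaller volume of flow than air for the same heat load, which is one reason liquid cooling scales to higher card powers.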

Fluid-flow-through cooling

Curtiss-Wright has patented a system known as fluid-flow-through (FFT) cooling (Figure 4). In this case, the fluid can be either liquid or air. The main difference from AFT and LFT is that FFT uses fixed channels (air-cooled or liquid-cooled) that are built into the chassis. In addition, it uses conduction-cooled modules – conduction frames attached to printed wiring boards (PWBs). FFT enables high net airflow volume, since no sealing is required, and supports the use of standard conduction cards.

Figure 4 | In the fluid-flow-through (FFT) system, the fluid can be either liquid or air.

A variety of thermal-management solutions exist (Table 1), each with varying suitability for different power levels. When designing a rugged computer system, it’s important to consider thermal management as a system-level issue, because focusing on thermal management at the card level fails to consider how all the modules installed together in a semi-closed system might affect each other.

Table 1 | Thermal-management solutions and their suitability for different power levels.

Jason Shields is Acting Manager of the Advanced Systems Group at Curtiss-Wright Defense Solutions. He has been with C-W for nearly 12 years, during which he has led engineering and product teams.

Curtiss-Wright Defense Solutions
www.curtisswrightds.com