Multicore processor-based 3U architectures reduce SWaP for UAS ISR platforms
April 16, 2015
There is a never-ending pursuit of reduced size, weight, and power (SWaP) in intelligence, surveillance, and reconnaissance (ISR) processing. Traditionally, the ISR application space has been dominated by complex, specialty-built systems that tended to be very large, very high-powered, and difficult to cool.
Many of these systems were ground-based; over time, a larger number have moved into the air and into a variety of different platforms such as unmanned aircraft systems (UASs). Today's systems need to be deployed on smaller and smaller platforms, making it even more challenging to meet the SWaP targets of traditional ISR platforms.
Another emerging trend is that technical requirements for ISR systems continue to grow and get increasingly complex even as the platforms get smaller. For example, UAS-based imaging systems require higher resolutions and a higher level of processing than in the past, demanding higher throughputs to ingest and analyze data, then store and relay information. The same trend is true for other ISR applications. Combine these factors with ever-shrinking defense budgets and issues such as sequestration, and the result is increased pressure to find ways of cost-effectively reusing and developing IP to leverage existing technologies and architectures.
Today’s higher levels of integration and improved performance are helping to address these requirements. Yesterday’s ISR systems typically contained many different 6U processing boards, each with one or more single-core processors. In the past, the main processing workhorses in these systems were Power Architecture processors supported with AltiVec math libraries, along with custom DSP- and FPGA-based systems. Recently the industry has seen a move toward using a common computing platform to provide many of these functions.
These earlier 6U-based multiprocessing ISR systems also tended to use a number of fabrics to tie them together, such as Ethernet, Serial RapidIO (SRIO), VMEbus, PCI, and others. On the software side, a variety of different operating environments have been used, including VxWorks and Linux, with many different application programming interfaces (APIs) for application development. In addition, custom libraries were often tailored to the specific hardware. For these earlier systems, thermal management was a challenge: The more boards and hardware required to get the job done, the more power was required, the more heat generated, and the heavier the solution.
Looking at the latest generation of processors from leading vendors such as Intel, Freescale, and AMD, the industry sees multiple processing cores integrated into a single piece of silicon. The current “sweet spot” tends to be two- and four-core processors, driven by the desktop and laptop computing world, but processors with many more cores are emerging, such as Intel’s eight-core hyperthreading Xeon processors or Freescale’s 12-core dual-thread processors, which present 24 “virtual” cores on a single piece of silicon. The core-to-core interconnect fabric is typically a high-speed bus with multiple levels of processor cache to ensure that the cores are not starved when they share common memory interfaces. With multiple memory channels, the processors can access massive amounts of data and keep the processing cores fed.
Today’s processing cores also include specialized accelerators for math-intensive workloads. For example, Power Architecture processors feature AltiVec processing engines, while Intel’s Core i7 processors feature vector-processing engines using AVX or AVX2 instructions. Better still, current-generation devices include onboard graphics accelerators that can double as general-purpose GPU (GPGPU) processors, offering in excess of 350 GFLOPS of floating-point performance from 20 or more GPGPU execution units, all on a single piece of silicon. Common to all of these modern processors is the use of PCIe for connectivity to the outside world.
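As a back-of-the-envelope illustration of where such GFLOPS figures come from, peak vector throughput is simply the product of core count, clock rate, SIMD width, and floating-point operations issued per cycle. The core count and clock speed below are assumptions chosen for illustration, not the specification of any particular part:

```python
def peak_gflops(cores, clock_ghz, lanes, ops_per_cycle):
    """Peak single-precision GFLOPS: cores x clock x SIMD lanes x FLOPs per cycle per lane."""
    return cores * clock_ghz * lanes * ops_per_cycle

# AVX2 with FMA: 8 single-precision lanes, 2 FLOPs per fused multiply-add,
# and 2 FMA units per core. A hypothetical 4-core part at 2.4 GHz:
est = peak_gflops(cores=4, clock_ghz=2.4, lanes=8, ops_per_cycle=2 * 2)
print(f"~{est:.0f} GFLOPS peak")  # ~307 GFLOPS
```

Sustained throughput is, of course, lower; this arithmetic only bounds what the vector units can theoretically deliver.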
A 3U processing board based on today’s leading processing architectures can deliver performance in the range of 173,000 Dhrystone MIPS; a common previous-generation 6U processor might deliver approximately 3,000 MIPS, making this a huge leap in performance. Today’s Core i7-based 3U boards, for example, also feature built-in AVX2 vector-processing engines that provide roughly 300 GFLOPS, while the on-chip GPU adds another 350 GFLOPS. The result is that the processing power of just one 3U single-board computer (SBC) is now equivalent to many previous-generation 6U boards. What’s more, for board-to-board data communication, the PCIe connectivity built into today’s silicon benefits SWaP reduction by providing 8 or 16 GB/s of connectivity over a fabric already present on the device, eliminating the need for extra fabric interface chips that would otherwise have to fit onto the 3U board.
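Those PCIe figures follow from straightforward link arithmetic: per-direction bandwidth is transfer rate times lane count times encoding efficiency, divided by eight bits per byte. A rough sketch, assuming PCIe Gen3 signaling at 8 GT/s with 128b/130b encoding:

```python
def pcie_gbytes_per_s(gt_per_s, lanes, encoding_eff):
    """Raw per-direction PCIe link bandwidth in GB/s."""
    return gt_per_s * lanes * encoding_eff / 8  # 8 bits per byte

# Gen3 (8 GT/s, 128b/130b encoding) at two common link widths:
gen3_x8 = pcie_gbytes_per_s(8.0, 8, 128 / 130)    # ~7.9 GB/s
gen3_x16 = pcie_gbytes_per_s(8.0, 16, 128 / 130)  # ~15.8 GB/s
```

Protocol overhead (TLP headers, flow control) reduces usable throughput somewhat below these raw numbers.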
Example of a small ISR system
An example of a small ISR system based on today’s multicore processor-based 3U boards consists of three 3U VPX3-1258 SBCs powered with the latest fourth-generation Intel Core i7 processors. Using XMC mezzanine modules, two of these boards are used to acquire and digitize sensor inputs, while the third SBC is used to further process the data for analysis, display, and storage. This three-board combination delivers close to 2 TFLOPS of floating-point performance, and because the boards are based on the standard VPX form factor the entire signal-acquisition and processing core occupies an area of only 75 square inches. Thermal management is also much simpler compared to an earlier-generation 6U ISR solution. The power of this example system is less than 200 W, quite manageable in a 3U form factor.
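The headline numbers in this example can be checked with simple arithmetic, using the per-board figures cited earlier in the article (roughly 300 GFLOPS from the vector engines plus 350 GFLOPS from the on-chip GPU; the split is approximate):

```python
boards = 3
gflops_per_board = 300 + 350  # AVX2 vector engines + on-chip GPU, approximate
watts = 200                   # stated system power budget

system_tflops = boards * gflops_per_board / 1000
efficiency = boards * gflops_per_board / watts
print(f"{system_tflops:.2f} TFLOPS, ~{efficiency:.1f} GFLOPS/W")
```

Three boards at roughly 650 GFLOPS each lands at about 1.95 TFLOPS, consistent with the "close to 2 TFLOPS" claim, and under 200 W that is nearly 10 GFLOPS per watt for the whole acquisition-and-processing core.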
One reason the thermal design of 3U deployable systems is easier than that of 6U-based systems is the proximity of the circuitry to each of the two side cooling walls of the 3U chassis. By using multiple SBCs, each with its own multicore processor, heat can be spread and managed across multiple modules in the chassis. By using high-performance fabrics already built into the processors (such as PCIe) and optimized software middleware (such as shared-memory drivers, OFED, and VSIPL libraries for optimized vector processing), it is now possible to achieve the performance, ease of development, and affordability needed to rapidly develop and deploy today’s demanding ISR applications.
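The idea of spreading one sensor workload across several identical SBCs can be sketched in a few lines. The round-robin scheme below is purely illustrative; real middleware would move data over shared memory or PCIe DMA rather than Python lists:

```python
def partition_stream(blocks, n_boards):
    """Round-robin a stream of sample blocks across n identical processing boards."""
    queues = [[] for _ in range(n_boards)]
    for i, block in enumerate(blocks):
        queues[i % n_boards].append(block)
    return queues

# Twelve acquisition blocks spread across the three SBCs of the example system.
# Each board receives an equal share of the stream (and of the thermal load):
work = partition_stream(list(range(12)), 3)
# work == [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
```

The same even distribution of work is what lets each board's heat be rejected through its own slot's cooling wall.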
Speeding development to deployment of 3U VPX ISR systems
After selecting the 3U COTS boards to use in an ISR system, one of the biggest challenges that system integrators face in getting their solution from development to deployment is ensuring that the boards will work as intended in a specific design configuration. As a result, integrators must typically focus significant time and effort on developing and executing test software and processes to properly integrate any commercial off-the-shelf (COTS) board into their ISR system.
Figure 1: The Curtiss-Wright MPMC-9351 integrated system, designed for harsh military environments, accommodates high-power 3U cards within a five-slot forced-air enclosure.
A better solution is provided by preconfigured, prepackaged, and pretested 3U VPX ISR solutions. (Figure 1.) Curtiss-Wright Defense Solutions has developed integrated, pretested reference designs that are backed by test support tools and data items and that can be used in existing development programs for a variety of computer-intensive applications, including ISR. Its 3U VPX subsystems have already been deployed on UAVs, including Northrop Grumman’s Global Hawk and Triton. These reference designs are engineered to meet specific key performance parameters (KPPs) and benchmarks and are supported with a suite of software tools for performance testing of the reference designs against program requirements. The key features of this embedded software infrastructure also include a system-level built-in test (BIT) solution, a configurable stress test suite, hardware-based background BIT, and a common test set infrastructure.
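To make the built-in-test concept concrete, here is a minimal sketch of what a system-level BIT pass might look like. The check names and structure are hypothetical illustrations, not Curtiss-Wright's actual BIT API:

```python
def run_bit(checks):
    """Run a list of (name, test_fn) built-in-test checks; return the names that failed.

    A check fails if its function returns falsy or raises an exception.
    """
    failures = []
    for name, test_fn in checks:
        try:
            if not test_fn():
                failures.append(name)
        except Exception:
            failures.append(name)
    return failures

# Hypothetical checks: each callable returns True on pass.
checks = [
    ("ddr_pattern_test", lambda: True),
    ("pcie_link_up", lambda: True),
    ("temp_in_range", lambda: 45 < 85),  # e.g., die temperature below limit
]
assert run_bit(checks) == []  # empty list means all checks passed
```

A real BIT suite would layer power-on, periodic/background, and initiated test modes on top of a loop like this, and report results to system health monitoring.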
Jacob Sealander is Chief Architect, Embedded Systems, at Curtiss-Wright Controls Defense Solutions. He has worked at Curtiss-Wright since 1996 in various design, engineering, and management positions including Engineering Manager of Embedded Systems, Mechanical Engineering Manager, and Manager of Product Line Engineering. Sealander can be reached at [email protected].
Aaron Frank, who joined Curtiss-Wright in January 2010, is the Senior Product Manager for the Intel Single Board Computer product line. His focus includes product development and marketing strategies, technology roadmaps, and serving as a subject matter expert within the sales team. Prior to this role, Frank was Product Manager for networking products. He holds a Bachelor of Science degree in Electrical Engineering from the University of Waterloo. Readers can reach him at [email protected].