Taking the complexity out of PCI Express configuration to optimize HPEC system design

April 18, 2017

Aaron Frank

Curtiss-Wright

Performance is all about eliminating bottlenecks to minimize latency and maximize throughput. Today’s high-performance embedded computing (HPEC) systems integrate powerful processing subsystems, each of which might be a fully functional processing node that needs to share data with other processing nodes. Maximizing overall system performance requires the fastest, most efficient processor-to-processor data paths. With VPX, embedded systems moved away from the VMEbus shared parallel bus model.

Compared to today’s serial fabric-based systems, VME performance suffers from low overall throughput. One cause is slow data bus transfer speeds. Another problem is the bus arbitration penalty that occurs when only one node can communicate at a time. For modern serial architecture-based systems like VPX, Ethernet can also be used to pass data from node to node. Switched Ethernet architectures enable nodes to communicate in parallel, all but eliminating the bottlenecks of a shared bus. However, processor speeds and capabilities today far outpace Ethernet speeds, making data paths and the CPU-intensive networking stack a key performance bottleneck yet again.

Alternative fabrics, such as Serial RapidIO (SRIO) and InfiniBand, have their own limitations. Few silicon vendors support SRIO, and its adoption has been hindered by a lack of software. Because no common software API [application programming interface] for SRIO has been widely adopted, system designers have typically needed to write their own custom software. Operating system support for SRIO has also been scarce, so SRIO device drivers usually require custom development as well. InfiniBand, for its part, has limited appeal in deployable defense systems because few real-time operating systems support it. It is also hindered by the high cost of silicon devices and by its dependence on a single-source vendor.

Today, almost every contemporary processor uses the PCI Express (PCIe) bus as a high-speed interconnect for onboard peripherals. In most processing systems, the PCIe interface also offers the fastest data path to and from the processor. The PCIe interface supports several data rates, each quoted per lane: Gen1 interfaces run at 2.5 Gbps, and Gen2 interfaces double that rate to 5.0 Gbps. Gen3 interfaces raise the speed to 8.0 Gbps and, by using a more efficient data-coding mechanism, deliver an effective data-transfer rate nearly double that of Gen2. The development of the PCIe Gen4 standard is just about complete, and Gen4 devices will begin to ship from vendors later in 2017, again aiming at doubling performance.
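
As a rough worked example of these figures, the sketch below computes usable bandwidth per lane from the signaling rates quoted above. It assumes the line encodings defined in the PCIe specifications (8b/10b for Gen1 and Gen2, 128b/130b for Gen3) and ignores packet and protocol overhead, so real throughput will be lower. The function name and the x4 link width are illustrative choices only.

```python
# Per-lane signaling rate and line encoding for each PCIe generation.
# Rates come from the text; encodings are assumed from the PCIe specifications.
# Tuple layout: (rate per lane, payload bits, bits on the wire)
GENERATIONS = {
    "Gen1": (2.5, 8, 10),     # 8b/10b encoding
    "Gen2": (5.0, 8, 10),     # 8b/10b encoding
    "Gen3": (8.0, 128, 130),  # 128b/130b encoding
}


def effective_gbps(gen, lanes=1):
    """Return usable bandwidth in Gbps after line-encoding overhead only."""
    rate, payload_bits, coded_bits = GENERATIONS[gen]
    return rate * payload_bits / coded_bits * lanes


for gen in GENERATIONS:
    per_lane = effective_gbps(gen)
    x4_gbytes = effective_gbps(gen, lanes=4) / 8  # bits to bytes
    print(f"{gen}: {per_lane:.2f} Gbps per lane, {x4_gbytes:.2f} GB/s on a x4 link")
```

Under these assumptions, Gen3 carries about 7.88 Gbps per lane against 4.0 Gbps for Gen2, which is why the effective rate roughly doubles even though the signaling rate rises by only 60 percent.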

Why use PCIe to bypass Ethernet or other fabric interface devices? Users report lower latency, increased throughput, and additional side benefits such as reduced power dissipation, increased MTBF [mean time between failures], and lower costs. Until recently, however, using PCIe for host-to-host communications required complex setup and configuration of PCIe devices and switches. It also called for custom PCIe shared-memory driver software, which greatly diminished its appeal.

Dolphin Interconnect Solutions, known by many for its StarFabric technology, has developed a solution with great promise for HPEC systems. Its eXpressWare software suite uses PCIe connections to create a faster and more flexible mechanism for messaging and processor-to-processor data transfer. The software is optimized to take advantage of hardware features such as DMA [direct memory access], PCIe multicast, and multicore processing. It also hides the complexities of PCIe, which simplifies the setup and configuration of host-to-host architectures.

When the required PCIe switch configurations are supported, the software can automatically detect and configure PCIe endpoints as transparent or nontransparent ports, set up message queues and data-transfer windows, and configure and manage data-transfer resources such as DMA engines. In addition, it comes with standard software APIs, which enable faster application development using paradigms already familiar to most software developers.
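
To make the idea of a data-transfer window more concrete, here is a minimal sketch of the address translation a nontransparent port performs: a range of local addresses is mapped onto a region of the remote host's memory. This is a simplified conceptual model, not eXpressWare code; the class name, field names, and addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NtbWindow:
    """Hypothetical model of one nontransparent-port translation window."""
    local_base: int   # start of the window in the local host's address space
    size: int         # window length in bytes
    remote_base: int  # where the window lands in the remote host's memory

    def translate(self, local_addr: int) -> int:
        """Map a local address inside the window to the remote address."""
        offset = local_addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError(f"address {local_addr:#x} is outside the window")
        return self.remote_base + offset


# Illustrative values only: a 1 MiB window onto a remote buffer.
window = NtbWindow(local_base=0x9000_0000, size=0x10_0000,
                   remote_base=0x4_2000_0000)
print(hex(window.translate(0x9000_1000)))
```

A write by the local processor to an address inside the window arrives in the remote node's memory at the translated address, with no networking stack involved. Setting up and tracking windows like this by hand is part of the configuration burden the software is described as hiding.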

Dolphin’s eXpressWare enables HPEC system designers to exploit the highest levels of data fabric performance for the defense industry’s ruggedized equipment needs. To bring the benefits of PCIe node-to-node data transfers to embedded system designers, Curtiss-Wright has recently added support for eXpressWare to embedded Intel SBCs and DSP engines running both Linux and Wind River VxWorks operating systems, and has also extended that support to Power Architecture-based boards (Figure 1).

Rugged embedded systems depend on high-performance fabrics to reduce data-transfer latency. PCIe offers today’s best solution for realizing low-latency, high-throughput processor-to-processor performance. By providing common software APIs and masking the complex details of programming PCIe devices, eXpressWare delivers a breakthrough for HPEC system designers and brings high-speed, low-latency, peer-to-peer communications to embedded hardware.

Figure 1: Curtiss-Wright’s VPX3-1258 and CHAMP-XD1 are examples of single-board computers that support PCIe.
