Network-centric performance: a key to mission-critical effectiveness
October 25, 2022
Today’s high-demand data environments depend on effective network-centric communications, which can determine mission success and the survivability of the warfighter. Conventional approaches have resulted in inefficient systems plagued by bottlenecks and stalled network traffic. A new generation of network acceleration systems is needed to relieve the pain.
Legacy protocols like TCP and UDP [transmission control protocol and user datagram protocol] are pervasive in computer networking. Historically, these protocols have placed a heavy processing burden on host CPUs and been a key cause of network traffic bottlenecks. As network speeds have increased from 1G to 10G to 40G/100G bps and beyond, host processors continue to be burdened by bottlenecks that throttle critical network traffic, resulting in stalled and inefficient systems. These bottlenecks can be detrimental to high-demand, mission-critical applications in contested environments. The problem gets even worse in a bursty network traffic environment where servers and clients are separated by layers of switches and routers, causing significant delay, jitter, and erratic behavior.
Industry attempts have focused on relieving network congestion and improving throughput by implementing some functions of the TCP/IP protocol suite in communications controller chips like media access controllers (MACs), integrating CPUs, or introducing an ASIC [application-specific integrated circuit] on the network interface card (NIC) to run the stack. Most of these have been in the form of partial offloads and have resulted in incremental improvements in network utilization from 10% to 30%, which still leaves 60% to 70% of capacity unused.
Technical advantages in the current landscape
Technology has evolved to support different approaches to tackling network-based challenges, including partial offload capability, which improves performance primarily where TCP connections are held open for a considerable period of time. The CPU software still handles connection setup, retries, and exceptions, causing it to stall execution of other application tasks. Partial offload also enables the operating system to move all TCP/IP data-segment traffic to specialized hardware on the network adapter while leaving TCP/IP control decisions to the host server.
The two popular methods to reduce CPU overhead are TCP/IP checksum offload, a technique that moves the calculation of TCP and IP packet checksums from the host CPU to the network adapter, reducing CPU utilization; and large send offload (LSO), which frees the operating system from segmenting the application’s transmit data into MTU [maximum transmission unit]-sized chunks.
These techniques deliver performance benefits for traffic being sent, but offer little improvement for traffic being received.
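To make the two offload techniques concrete, here is a minimal software sketch of what the adapter hardware takes over from the CPU: the RFC 1071 Internet checksum used by IP and TCP, and LSO-style segmentation of a transmit buffer into MTU-sized chunks. Real offload NICs do this in silicon; the function names and the 40-byte header assumption are illustrative only.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words.
    This per-packet arithmetic is what checksum offload moves off the CPU."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a trailing zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back into 16 bits
    return ~total & 0xFFFF

def segment_for_lso(payload: bytes, mtu: int = 1500, header_len: int = 40) -> list[bytes]:
    """Split an application transmit buffer into MTU-sized chunks, as LSO
    hardware would, assuming a combined IP+TCP header of header_len bytes."""
    mss = mtu - header_len  # maximum segment size left for payload
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]
```

For example, a 3,000-byte application write with a 1,500-byte MTU becomes three segments (1,460 + 1,460 + 80 bytes), each of which would then get its own headers and checksums; on a non-offload system the host CPU performs all of this work per packet.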
Full offload capability
In contrast, consider full offload capability, an approach taken to offload all of the components of the TCP communications stack:
- Improves efficiency and data integrity by handling all protocol and data processing-related tasks
- Enables multiple concurrent sessions (from 4 to 1,000 or more), extending network bandwidth and serviceability
- Minimizes network contention and interrupts, enabling emphasis on improved application I/O transaction performance
- Creates network efficiencies by involving the host processor only once for every I/O transfer via memory, substantially reducing the number of requests with no interrupts to the CPU
- Improves system performance by copying data directly from assigned buffers into application memory buffers, removing the three main causes of TCP/IP overhead: interrupt processing, memory copies, and protocol processing
This full offload architecture implements innovative approaches in different technology layers, delivering ultra-fast hardware search engine capability specially designed for efficient lookup of TCP states in a dynamic array; scalable depth and width of the search engine, enabling a greater number of “state fields” to be searched simultaneously, made possible by the ultrawide processing paths of field-programmable gate arrays (FPGAs); and highly parallel pipelined building blocks automatically scaled up or down, depending upon the number of active TCP sessions. Together, these approaches achieve no-jitter processing and data delivery.
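As a rough software analogy for the hardware search engine described above: each active TCP session is identified by its connection 4-tuple, and the offload engine must find the matching session state for every arriving packet at line rate. The FPGA does this with wide, parallel field matching; in software the closest analogue is a constant-time hash-table lookup, sketched below with illustrative names and fields.

```python
from dataclasses import dataclass

@dataclass
class TcpSession:
    """Per-connection state the offload engine must retrieve per packet.
    Fields are illustrative, not Intilop's actual state layout."""
    state: str      # e.g. "ESTABLISHED"
    next_seq: int   # next sequence number to send
    next_ack: int   # next expected acknowledgment number

# Hardware searches many state fields in parallel across all sessions;
# a dict keyed by the 4-tuple is the software stand-in for that lookup.
sessions: dict[tuple, TcpSession] = {}

def lookup(src_ip: str, src_port: int, dst_ip: str, dst_port: int):
    """Find the session state for an arriving packet's 4-tuple, or None."""
    return sessions.get((src_ip, src_port, dst_ip, dst_port))

# Register a session, as connection setup would
sessions[("10.0.0.1", 5000, "10.0.0.2", 80)] = TcpSession("ESTABLISHED", 1000, 2000)
```

The point of doing this in dedicated hardware rather than host software is that the lookup, and the protocol decisions that follow it, never consume host CPU cycles or generate per-packet interrupts, which is where the bullet-point gains above come from.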
Putting technology into practice
As industry experience verifies, CPU-centric silicon technology cannot deliver enough processing power to meet the most demanding network requirements, whereas this TCP/UDP acceleration technology, implemented in high-performance FPGAs, has been solving this problem for more than 13 years in commercial, industrial, and military applications.
Below are several defense and military application areas and use case examples. Target areas are the digital battlefield, network-centric warfare, ultra-fast mission-critical and precise communications among military command centers, theatre management, satellite base station-to-base station communications, urban warfare, and satellite-to-ground communications.
Real-world customer projects:
- Multiple channels receiving large, complex data sets from many sources that need to send/receive real-time, mission-data imagery
- Critical transfer of TCP/UDP communication data between ground stations, across multiple simultaneous channels with nanosecond-level latency and zero jitter
- Transfer of TCP/UDP encrypted data between ground stations requiring network security and high-speed data delivery
- Image-transfer applications that require real-time transfer of large images (gigabytes per image) at near 10G or 40G line rates
- Ground stations supporting satellite systems distributing data and images live to an active, complex network
In today’s remote work environment, where Zoom (or other remote-communication platforms) is used by hundreds of clients to share live data files, conventional TCP network architectures cause jitter and spikes in presentation. Using TCP offload solutions at the server level to handle multiple ranges of client interactions can remove jitter and spikes completely, eliminating incidents of data loss and misinterpretation. Now consider how critical that kind of network fidelity could be to a military operation.
These customer projects have experienced network acceleration performance gains of five to 60 times over conventional approaches, depending on the complexity of the target server/client network.
Kelly Masood, president and CTO, is the founder of Intilop, a company that develops and provides advanced high-complexity network-acceleration and network-security solutions. Since 2009, Kelly and his team at Intilop have developed and deployed worldwide 11 generations of full TCP and UDP offload IPs and system solutions from 1 G bps to 100 G bps. His industry experience includes Lockheed Martin, as well as leading projects with companies including L3, General Dynamics, AMD/Xilinx, and Intel/Altera. Readers may reach the author at [email protected].
Intilop Corporation https://www.intilop.com