Data-centric architecture for military embedded systems

October 27, 2009

Dr. Stan A. Schneider

Real Time Innovations (RTI)

Publish-subscribe middleware is the key technology that enables data-centric architecture. The DDS standard precisely defines how the middleware controls and transmits information. Successful applications include many shipboard and UAV systems.

Modern networked applications must connect and coordinate many resources. A typical distributed application integrates CPU-intensive servers, data stores, user interfaces, and real-time sensors and actuators. Each of these components demands a different set of information, at varying rates, with varying urgency, and with varying reliability.

Getting the right data to the right place at the right time in this complex environment is perhaps the greatest integration challenge of the new networked reality. The “traditional” way to connect these disparate systems is to first examine the data requirements of each device or processing endpoint, and then design point-to-point interfaces between those devices. Once the relationship has been established, the data can be passed between those two devices, likely using direct messaging. If many devices need the same information, a client-server design allows distributed access.

However, this design assumes the network is relatively static, servers are always present and accessible, the server/client relationships are clear, clients know where and – most importantly – when to request data, and all nodes have similar delivery requirements. This “server-centric” design quickly breaks down as complexity grows.

Data-centric design offers an alternative. With data-centric design, developers specify only the data requirements, inputs, and outputs of each subsystem. Integration middleware discovers the producers and consumers of information and provides the data immediately when needed. This design greatly simplifies integration of systems with complex data requirements. Driven by the rapid adoption of the Object Management Group (OMG) Data Distribution Service (DDS) standard[1], many fielded systems are adopting data-centric design built on publish-subscribe middleware. These include shipboard systems like Aegis, UAVs like Scan Eagle, and base stations like the Advanced Ground Control Station for Predator.

Data-centric design

Instead of focusing on endpoint applications (devices, processing nodes, console applications) and how they individually interact, the data-centric approach begins the design process from an information perspective. What data does this application produce? What does it need? When? These questions decouple the implementation of the application from the other parts of the system. Data-centric design can greatly simplify system concepts.

Thus, a data-centric designer first defines an information model that captures the essential state and events in the system, then creates data input and output specifications, and then develops components that can produce and process that information. Rather than deriving specific data-interface requirements between components, the designer determines how to represent the state of the system and the external or internal events that can affect it. This “data model” captures the essential elements of the physical system as well as the processing logic. The model decouples applications; data can be provided by any (authorized) process and used by any other process. Applications must specify when and how they can supply information, but they do not need to know when or where that data might be used.
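
As a rough illustration of this design order, the sketch below writes down a tiny information model first and only then states what each component produces and consumes. It is a minimal sketch, not DDS code: the type names, field names, topic names, and component names are all hypothetical, chosen only to echo the shipboard setting of this article.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    INPUT = "input"
    OUTPUT = "output"


@dataclass(frozen=True)
class Position:
    """State: where the platform is (hypothetical fields)."""
    latitude_deg: float
    longitude_deg: float
    timestamp_s: float


@dataclass(frozen=True)
class ContactDetected:
    """Event: a sensor reported a contact (hypothetical fields)."""
    contact_id: int
    bearing_deg: float
    timestamp_s: float


# Step 1: the information model maps each topic name to its data type.
INFORMATION_MODEL = {
    "Position": Position,
    "ContactDetected": ContactDetected,
}

# Step 2: each component declares only its own inputs and outputs.
# No component names any other component.
COMPONENT_SPECS = {
    "gps_receiver": {"Position": Direction.OUTPUT},
    "radar": {"ContactDetected": Direction.OUTPUT},
    "navigation": {"Position": Direction.INPUT},
    "targeting": {
        "Position": Direction.INPUT,
        "ContactDetected": Direction.INPUT,
    },
}


def components_with(topic, direction):
    """Return the sorted names of components using `topic` in `direction`."""
    return sorted(
        name
        for name, spec in COMPONENT_SPECS.items()
        if spec.get(topic) is direction
    )


def unsatisfied_inputs():
    """Return (component, topic) pairs whose input has no producer."""
    missing = []
    for name, spec in COMPONENT_SPECS.items():
        for topic, direction in spec.items():
            if topic not in INFORMATION_MODEL:
                raise KeyError(f"{name} uses undefined topic {topic!r}")
            if direction is Direction.INPUT and not components_with(
                topic, Direction.OUTPUT
            ):
                missing.append((name, topic))
    return missing


print(components_with("Position", Direction.OUTPUT))  # ['gps_receiver']
print(components_with("Position", Direction.INPUT))   # ['navigation', 'targeting']
print(unsatisfied_inputs())                           # []
```

Because the specifications refer only to topics, the producer and consumer lists are derived from the model rather than designed as pairwise interfaces.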

The enabling technology for data-centric design is publish-subscribe messaging, often shortened to “pub-sub.” In this model, data sources, or producers, publish data into an “information bus” or “cloud,” and data consumers subscribe to data from it (Figure 1). Although there is a concept of data “in the cloud,” the cloud is virtual: the actual data exists only in the publisher and subscriber endpoints. The pub-sub system connects endpoints by sending messages from publishers to subscribers over a variety of transports, including direct-memory transfers, switched fabrics, and multicast or unicast over Ethernet. Applications do not need to know about transports, operating systems, or other location details, which decouples the design and lets the system adapt to performance, scalability, and fault-tolerance requirements.[2]

Figure 1: Data-centric design revolves around the information itself. The information model captures the essential state and events in the system. Components are then built to interact with the information model “cloud,” rather than with each other directly. The pub-sub infrastructure connects all the pieces.

On a ship, for example, a GPS receiver can publish position data. Navigation computers and targeting systems can all subscribe to the GPS location data. The publications go directly from the GPS to the targeting system, even though both conceptually simply publish or subscribe to the “cloud.”
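
The GPS example can be sketched with a toy, in-process bus. This is only an illustration of the pattern under simplifying assumptions: the `InformationBus` class, the `GpsPosition` topic name, and the sample fields are hypothetical, and nothing here is the DDS API or a network transport. Note that the bus holds no data; each sample passes straight from the publisher to every subscriber.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[Any], None]


class InformationBus:
    """Toy in-process stand-in for pub-sub middleware (not a DDS API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register interest in a topic; the publisher is never named."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, sample: Any) -> int:
        """Hand the sample to every current subscriber of the topic.

        Returns the number of subscribers reached. The bus keeps no copy
        of the sample, so the "cloud" stays purely virtual.
        """
        callbacks = list(self._subscribers.get(topic, ()))
        for callback in callbacks:
            callback(sample)
        return len(callbacks)


bus = InformationBus()

# Subscribers: each one only knows the topic name.
navigation_log: List[Any] = []
targeting_log: List[Any] = []
bus.subscribe("GpsPosition", navigation_log.append)
bus.subscribe("GpsPosition", targeting_log.append)

# Publisher: the GPS receiver only knows the topic name too.
fix = {"lat_deg": 36.85, "lon_deg": -76.29}
delivered = bus.publish("GpsPosition", fix)

print(delivered)                        # 2
print(navigation_log == targeting_log)  # True
```

The publisher code contains no reference to navigation or targeting, and neither subscriber refers to the GPS receiver; the topic name is the only shared knowledge.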

By converting data within the middleware, publish-subscribe models can also connect systems with mismatched data formats. By enforcing quality-of-service parameters such as timing specifications and buffering, pub-sub models can connect systems with disparate delivery requirements, even trading off delivery reliability against timing constraints.
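
To make the timing and buffering idea concrete, here is a minimal sketch of two subscribers that receive the same stream under different delivery settings. The `ReaderQos` and `Reader` names and their parameters are hypothetical; they are loosely inspired by the kind of per-subscriber settings a pub-sub standard can define (a minimum time between samples and a bounded history) and are not taken from any real middleware API. Timestamps are integer milliseconds to keep the arithmetic exact.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass(frozen=True)
class ReaderQos:
    min_separation_ms: int = 0  # accept at most one sample per interval
    history_depth: int = 1      # keep only the newest N accepted samples


class Reader:
    """Toy subscriber endpoint that applies its own delivery settings."""

    def __init__(self, qos: ReaderQos) -> None:
        self.qos = qos
        self.samples: Deque[Tuple[int, float]] = deque(maxlen=qos.history_depth)
        self._last_accepted_ms: Optional[int] = None

    def offer(self, timestamp_ms: int, value: float) -> bool:
        """Accept or drop a sample; return True if it was accepted."""
        last = self._last_accepted_ms
        if last is not None and timestamp_ms - last < self.qos.min_separation_ms:
            return False  # arrived too soon for this reader
        self._last_accepted_ms = timestamp_ms
        self.samples.append((timestamp_ms, value))  # oldest entry falls off
        return True


# A slow display wants at most 2 updates per second and only the latest value.
display = Reader(ReaderQos(min_separation_ms=500, history_depth=1))
# A recorder wants every sample and keeps the last four.
recorder = Reader(ReaderQos(min_separation_ms=0, history_depth=4))

display_count = 0
recorder_count = 0
for t_ms in range(0, 1000, 100):  # one publisher sending at 10 Hz for 1 s
    heading_deg = 90.0 + t_ms / 100
    display_count += display.offer(t_ms, heading_deg)
    recorder_count += recorder.offer(t_ms, heading_deg)

print(display_count, recorder_count)     # 2 10
print(list(display.samples))             # [(500, 95.0)]
print([t for t, _ in recorder.samples])  # [600, 700, 800, 900]
```

The publisher sends one stream at one rate; each reader's settings decide what it actually sees, which is how endpoints with different delivery needs can share the same data.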

But the most important advantage of the publish-subscribe approach is decoupling. Since only the data interactions are specified, devices can be upgraded or added without the need to change code and exhaustively retest every configuration. If new data is available on the network, other devices might require additional code to make use of that data, but in practice this is significantly simpler than modifying and testing a large number of specific point-to-point connections.

Decoupling also makes distributed applications highly scalable. Because there are no fragile point-to-point data connections and devices can be added with little or no change to underlying code, expanding the application to include a larger network with more endpoints is simplified.

Designing with data-centric principles

Data-centricity provides a guide for how to design distributed applications in general. Many system architects of distributed applications today use procedural or object-oriented principles in creating the fundamental design, often using UML sequence, class, and state diagrams.

These design methodologies tend to treat transporting and consuming data as second-class citizens, focusing instead on the step-by-step processes by which devices make computations and produce actionable results. A data-oriented methodology, on the other hand, focuses on the flow of data through the application.

In general, the tenets of data-oriented programming include the following:

Expose the data. Ensure that the data is visible throughout the entire system. Hiding the data makes it difficult for new processing endpoints to identify data needs and gain access to that data.
Hide the code. Conversely, there is no reason for any of the computational endpoints to be cognizant of one another’s code. By abstracting the code, data is free to be used by any process, no matter where it was generated. This allows for data sharing across the distributed application and for the application to be modified and enhanced during its life cycle.
Separate data and code into data-handling and data-processing components. Data handling is required because of differing data formats, persistence, and timeliness, and is likely to change during the application life cycle. Conversely, data processing requirements are likely to remain much more stable. By separating the two, the application becomes easier to maintain and modify over time.
Clarify data interfaces. Interfaces define the data inputs and outputs of a given process. Having well-defined inputs and outputs makes it possible to understand and even automate the interfaces to data processing code. By specifying exactly how and when components produce or consume information, new applications can join and interact without impacting the system function.
Loosely couple all modules. With well-defined interfaces and abstracted computational processes, devices and their computation can be changed with little or no impact on the distributed application as a whole.
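
Two of these principles, separating data handling from data processing and clarifying data interfaces, can be sketched in a few lines. Everything here is an illustrative assumption: the `Position` and `RangeReport` types, the JSON and CSV wire formats, and the flat-earth range approximation are hypothetical stand-ins, not part of any system described in this article.

```python
import json
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


# Data interfaces: the exact input and output of the processing code.
@dataclass(frozen=True)
class Position:
    lat_deg: float
    lon_deg: float


@dataclass(frozen=True)
class RangeReport:
    range_km: float


# Data handling: format-specific adapters, expected to change over time.
def position_from_json(payload: str) -> Position:
    doc = json.loads(payload)
    return Position(float(doc["lat"]), float(doc["lon"]))


def position_from_csv(payload: str) -> Position:
    lat, lon = payload.split(",")
    return Position(float(lat), float(lon))


# Data processing: depends only on the typed interface, not on any format.
def range_to_target(own: Position, target: Position) -> RangeReport:
    """Approximate range using an equirectangular projection."""
    mean_lat = math.radians((own.lat_deg + target.lat_deg) / 2)
    x = math.radians(target.lon_deg - own.lon_deg) * math.cos(mean_lat)
    y = math.radians(target.lat_deg - own.lat_deg)
    return RangeReport(EARTH_RADIUS_KM * math.hypot(x, y))


# Two sources with different formats feed the same processing code.
own = position_from_json('{"lat": 0.0, "lon": 0.0}')
target = position_from_csv("0.0,1.0")
report = range_to_target(own, target)

print(round(report.range_km, 1))  # 111.2
```

Supporting a new wire format means adding one more adapter that returns a `Position`; `range_to_target` and its callers stay untouched.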

These and other principles are summarized in Table 1, along with a comparison with object-oriented development tenets.

Table 1: A comparison of data-oriented programming with object-oriented programming. The data-oriented approach enforces attention on the data rather than on the processes that manipulate the data.

A data-centric approach simplifies design

Data-centric architecture decouples designs. It simplifies communication while increasing capability and easing system evolution. It especially simplifies developing distributed applications with complex components on remote network nodes, integrating new functionality into those applications, and maintaining and changing the components independently.

Data-centric design requires a change in perspective. Rather than conceiving the system primarily as a collection of interacting programs, the designer must first regard it as an information model that supports applications that contribute and use information. Networking middleware enables this view by delivering information anonymously and consistently, while honoring the specified data delivery properties and, importantly, timing. The OMG Data Distribution Service networking standard is serving as the catalyst for adoption of this approach by many fielded systems, including shipboard systems, aircraft, and UAV base stations.

References

[1] Gerardo Pardo-Castellote. “OMG Data-Distribution Service: Architectural Overview,” Proceedings of the 23rd International Conference on Distributed Computing Systems, May 2003.
[2] Darby Mitchell. “Applying Publish-Subscribe to Communications-on-the-Move Node Control,” MIT Journal, Volume 16, Number 2, 2007.

Dr. Stan A. Schneider is the CEO of RTI. His expertise is in architectures and tools for real-time systems. Before RTI, Stan managed a Stanford laboratory’s intelligent mechanical systems, developed communication and computer systems, and researched automotive safety. He holds a PhD in EE/CS from Stanford and an MSEE and BS from the University of Michigan. He can be reached at stan@rti.com.

Real-Time Innovations, Inc. 408-990-7415

www.rti.com