Building trust in a model-based automatic code generator

August 16, 2016

S. Tucker Taft

AdaCore

How do you go about building trust in an automatic code generator used for safety-critical systems? For example, given a code generator that takes a real-time model for a flight control system represented in Simulink and Stateflow and turns it into MISRA C or the SPARK subset of Ada, what process could ensure that the generated code is a faithful representation of the original real-time model? The US Federal Aviation Administration (FAA) has a well-defined process for creating a qualified code generator, meaning a code generator whose output can be trusted to match exactly the semantics of the input model, with nothing left out and nothing added. This process is defined in DO-178C (Software Considerations in Airborne Systems and Equipment Certification) and its accompanying documents DO-330 (Software Tool Qualification Considerations) and DO-331 (Model-Based Development and Verification Supplement to DO-178C and DO-278A).

For a tool like a code generator, which could insert an error into an airborne system, the highest level of tool qualification, Tool Qualification Level 1 (TQL-1), is required if the tool is to be used for a subsystem whose failure could be catastrophic (a level-A subsystem).

Not surprisingly, this level of tool qualification can involve a great deal of time and effort, often estimated in the hundreds of hours per 1,000 source lines of code (KSLOC) of the tool. This is similar to the level of effort per line required for verifying a level-A, safety-critical embedded software component. But a tool can contain significantly more lines of code than such a component. For example, if the tool were 100 KSLOC, the traditional approach to verification at level A might cost in the ballpark of $5 million. Hence, there is a strong incentive to investigate alternative approaches to testing such a tool, while still achieving the TQL-1 objectives.
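The arithmetic behind that ballpark can be sketched as follows. Only the 100 KSLOC tool size and the rough $5 million total come from the text above; the figure of 500 hours per KSLOC and the $100 hourly rate are assumptions, picked as one plausible combination within "hundreds of hours" that lands on that total.

```python
# Back-of-the-envelope cost model for level-A style verification of a tool.
# The tool size is taken from the article; the other two values are ASSUMED.

TOOL_SIZE_KSLOC = 100      # from the article's example
HOURS_PER_KSLOC = 500      # assumption: one value within "hundreds of hours"
COST_PER_HOUR_USD = 100    # assumption: loaded engineering rate


def verification_cost(ksloc, hours_per_ksloc, rate_usd):
    """Return the estimated verification cost in US dollars."""
    return ksloc * hours_per_ksloc * rate_usd


if __name__ == "__main__":
    total_hours = TOOL_SIZE_KSLOC * HOURS_PER_KSLOC
    total_cost = verification_cost(TOOL_SIZE_KSLOC, HOURS_PER_KSLOC, COST_PER_HOUR_USD)
    print(f"Effort: {total_hours:,} hours")
    print(f"Cost:   ${total_cost:,} USD")
```

Because the cost scales linearly with tool size, any reduction in hours per KSLOC translates directly into savings across the whole tool.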

Traditional approaches to testing

The traditional approach to verifying a high-integrity application requires the tester to:

Carefully define and validate a set of high-level requirements for the application
Derive low-level (module-level) requirements, which are specific enough to determine the appropriate implementation, from the high-level requirements
Check each module of the implementation against its low-level requirements using unit testing
Perform integration-level testing of all high-level requirements

Coverage analysis is then performed to ensure that all code is covered by these tests, and to ensure there is no code remaining in the application that might provide extra, undesired functionality.

For an embedded software component, this combination of unit-level testing of each module and integration-level testing of the component as a whole can work well. In particular, unit testing of embedded software modules is practical because, in many cases, the number and complexity of inputs for each module are manageable and the outputs are relatively easily identified and checked. However, for a tool like an automatic code generator, which generally works in multiple phases that progressively transform the input model into the generated code, unit testing can be a real challenge. On the other hand, integration testing is not significantly harder for such a tool, as the number of intermediate phases does not affect the overall inputs and outputs of the tool.

This dichotomy between the complexity of unit testing and the relative ease of integration testing of a multi-phase tool like a code generator is illustrated in Figure 1.

[Figure 1 | Integration testing is preferred to unit testing thanks to ease of use.]

In Figure 1, we show the overall data flow of an optimizing automatic code generator, where the input model is referred to as the “User Language” and the output is referred to as the “Source Code.” Multiple phases are pipelined, with the first phase reading in the original model represented in the User Language (M0), and representing the model in some internal data structure (M1). This is then transformed into lower-level representations of the model (M2, M3, etc.), until the final phase produces actual Source Code in the desired programming language. To perform integration testing, one need only prepare a model represented in the User Language using the normal model creation tools, feed it through the code generator, and then examine the generated Source Code to determine whether it satisfies the high-level requirements in terms of form and functionality, using normal compilers, static analysis, and testing tools for that programming language.
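As a rough illustration of this data flow, the following toy pipeline has three phases and a one-line integration test. Everything in it is hypothetical: the `gain u 2.5` input syntax, the `Gain` and `Mul` representations, and the function names are invented for the sketch and are not taken from any real code generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gain:
    """Internal representation (M1) of a gain block on a named signal."""
    signal: str
    factor: float


@dataclass(frozen=True)
class Mul:
    """Lower-level representation (M2): a variable multiplied by a constant."""
    variable: str
    constant: float


def parse_model(text: str) -> list[Gain]:
    """Phase 0: User Language (M0) -> M1. Each line looks like 'gain u 2.5'."""
    blocks = []
    for line in text.strip().splitlines():
        kind, signal, factor = line.split()
        if kind != "gain":
            raise ValueError(f"unsupported block: {kind}")
        blocks.append(Gain(signal, float(factor)))
    return blocks


def lower(blocks: list[Gain]) -> list[Mul]:
    """Phase 1: M1 -> M2."""
    return [Mul(b.signal, b.factor) for b in blocks]


def emit(exprs: list[Mul]) -> str:
    """Phase 2: M2 -> Source Code (C-like assignment statements)."""
    return "\n".join(f"{e.variable}_out = {e.variable} * {e.constant};" for e in exprs)


def generate(text: str) -> str:
    """The whole tool: only M0 goes in and only Source Code comes out."""
    return emit(lower(parse_model(text)))


if __name__ == "__main__":
    # Integration test: write the model in the User Language, inspect the output.
    print(generate("gain u 2.5"))
```

The integration test touches only the two ends of the pipeline, so adding further intermediate phases would not change how the test is written.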

By contrast, performing unit testing of each phase of a multi-phase code generator is significantly more complex. An internal data structure must be constructed for each test of a given phase that conforms to the representation used for input to that phase, then the phase needs to be invoked on that input, then the output representation must be checked to see whether it has the expected form and content. Preparing such inputs and checking such outputs requires laborious manual processes or the creation of special tools, which might need qualification themselves.
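A conventional unit test of a single intermediate phase might look like the sketch below. The `Gain` and `Mul` structures and the `lower` phase are hypothetical stand-ins. The point is that the tester has to build the phase's input and its expected output by hand, in the tool's internal representation rather than in the User Language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gain:
    """Hypothetical input representation of the phase under test."""
    signal: str
    factor: float


@dataclass(frozen=True)
class Mul:
    """Hypothetical output representation of the phase under test."""
    variable: str
    constant: float


def lower(blocks):
    """Phase under test: one intermediate form to the next."""
    return [Mul(b.signal, b.factor) for b in blocks]


def test_lower_gain():
    # The tester must construct the internal input by hand, bypassing the
    # normal model creation tools and the earlier phases of the generator...
    phase_input = [Gain(signal="u", factor=2.5)]
    # ...and must also spell out the expected internal output by hand.
    expected_output = [Mul(variable="u", constant=2.5)]
    assert lower(phase_input) == expected_output


if __name__ == "__main__":
    test_lower_gain()
    print("phase-level unit test passed")
```

With realistic internal data structures, each such hand-built fixture is far larger than this one, which is where the manual effort or the need for special tooling comes from.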

Integrated unit testing

Given the complexity of unit testing, an alternative approach has been developed called integrated unit testing. Figure 2 illustrates this approach:

[Figure 2 | The integrated unit testing approach is a simpler alternative to unit testing]

In Figure 2, we show a process that embeds unit test requirement monitors and unit test oracles (checkers that “know” what the desired output is) directly into the structure of the tool. With these monitors and checkers embedded in the tool, we then follow the steps used for normal integration testing, preparing representative models (Test0 through Test4) and feeding them through the code generator. But now, rather than merely waiting for the tool to generate the final output, each embedded unit-test requirement monitor keeps track of whether an input to its associated phase matches its associated unit test. If it does match, the monitor logs that fact and then triggers a corresponding unit test oracle-based checker, which verifies that the output of the phase corresponds to the expected transformation of the input for the particular test pattern.

For example, imagine we have defined a particular transformation of a gain block at the model level into an expression at the code level that multiplies the value of a signal variable by a constant. We would have a unit-test-requirement monitor that logs every time a gain block shows up in its model-level input representation and, when one does, triggers the oracle-based checker to look at the code-level output representation to be sure it involves a multiplication of the appropriate signal variable by the appropriate constant. This is a very simple check to perform, and so long as enough models are passed through the tool as a whole, coverage of this particular unit-test pattern can be expected.
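The gain block example can be sketched in a few lines. This is only an illustration under assumed names: `Block`, `BinOp`, `gain_monitor`, `gain_oracle`, and the `tr_gain` and `to_gain` column labels are all hypothetical, and a real tool's representations would be much richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Model-level input: a block of some kind acting on a named signal."""
    kind: str        # e.g. "gain" or "offset"
    signal: str
    value: float


@dataclass(frozen=True)
class BinOp:
    """Code-level output: <variable> <op> <constant>."""
    op: str
    variable: str
    constant: float


def gain_monitor(blocks):
    """Test requirement: return the gain blocks found in the phase input."""
    return [b for b in blocks if b.kind == "gain"]


def gain_oracle(gains, exprs):
    """Oracle: every gain block must appear in the phase output as a
    multiplication of the same signal variable by the same constant."""
    return all(BinOp("*", g.signal, g.value) in exprs for g in gains)


def lower(blocks, test_name, log):
    """The phase itself, with the monitor and the oracle embedded in it."""
    ops = {"gain": "*", "offset": "+"}
    exprs = [BinOp(ops[b.kind], b.signal, b.value) for b in blocks]

    gains = gain_monitor(blocks)
    if gains:  # the test pattern occurred in this input
        log.append((test_name, "tr_gain", "SAT"))
        verdict = "PASS" if gain_oracle(gains, exprs) else "FAIL"
        log.append((test_name, "to_gain", verdict))
    return exprs


if __name__ == "__main__":
    log = []
    lower([Block("offset", "u", 1.0)], "Test0", log)  # no gain: nothing logged
    lower([Block("gain", "u", 2.5)], "Test1", log)    # gain: SAT, then oracle
    for entry in log:
        print(entry)
```

No special input is prepared for the gain pattern. The monitor simply fires whenever an ordinary model happens to contain a gain block, and a model without one leaves the log untouched.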

After running a number of models through the tool, we can end up with a table like the one in Figure 2. Along the left side we have the models, Test0 to Test4. Along the top we have the pairs of test requirement and test oracle for each distinct phase of the tool. For example, tr0,2 means the test requirement 2 for phase 0, while to2,1 means the test oracle 1 for phase 2. Each time a particular input to a phase satisfies the test pattern associated with some test requirement, we will see a SAT in the requirement’s column at the input model’s row. Each time a test oracle is invoked we will see either a PASS or FAIL in the oracle’s column at the input model’s row. If we end up with an empty column, the test pattern was never encountered (the corresponding low-level requirement was not covered). If we end up with a FAIL in a test-oracle column, that means we have a test failure (the corresponding low-level requirement was not properly implemented). In the table represented in Figure 2, we see that tr0,1 and tr2,0 were not covered, while to0,2 and to2,1 had failures. Such a table documents a thorough unit testing process while avoiding the expense of preparing special inputs for each test pattern.
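Reading such a table can itself be automated. The sketch below uses an invented subset of columns and invented cell values, arranged only so that the summary agrees with the outcome described above (tr0,1 and tr2,0 not covered, to0,2 and to2,1 with failures). It does not reproduce the actual contents of Figure 2.

```python
# Hypothetical column subset: requirement/oracle pairs for phases 0 and 2.
COLUMNS = ["tr0,1", "to0,1", "tr0,2", "to0,2",
           "tr2,0", "to2,0", "tr2,1", "to2,1"]

# Hypothetical results: one row per input model, one cell per column that
# was triggered. A missing key means the cell is empty.
RESULTS = {
    "Test0": {"tr0,2": "SAT", "to0,2": "PASS"},
    "Test1": {"tr0,2": "SAT", "to0,2": "FAIL", "tr2,1": "SAT", "to2,1": "PASS"},
    "Test2": {"tr2,1": "SAT", "to2,1": "FAIL"},
    "Test3": {},
    "Test4": {"tr0,2": "SAT", "to0,2": "PASS"},
}


def uncovered_requirements(columns, results):
    """Test requirements whose column has no SAT in any row."""
    return [c for c in columns
            if c.startswith("tr")
            and not any(row.get(c) == "SAT" for row in results.values())]


def failing_oracles(columns, results):
    """Test oracles whose column has at least one FAIL."""
    return [c for c in columns
            if c.startswith("to")
            and any(row.get(c) == "FAIL" for row in results.values())]


if __name__ == "__main__":
    print("Not covered:", uncovered_requirements(COLUMNS, RESULTS))
    print("Failures:   ", failing_oracles(COLUMNS, RESULTS))
```

An uncovered requirement calls for more or different input models, while a failing oracle points at a defect in the specific phase that owns the column.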

A trusted code generator

Building trust in a code generator is essential if we are going to rely more and more on such tools to help automate the generation of safety-critical software from higher-level models. However, innovative approaches are needed to manage the potentially prohibitive expense of achieving tool qualification for a modern, optimizing code generator at the highest level of trust, TQL-1. Integrated unit testing is one such approach. When combined with other systematic approaches for specifying requirements formally, and generating components such as requirement monitors and oracles from these requirements, it becomes possible to achieve TQL-1 in a way that not only is more cost effective, but also supports incremental qualification as the tool evolves. AdaCore is in the process of qualifying its QGen code generator using these approaches, and thereby providing the model-based development community with a new tool that can be a trusted part of an overall high-integrity, software-intensive system engineering process.

S. Tucker Taft is the Vice President and Director of Language Research at AdaCore.

AdaCore
www.adacore.com