Domain-specific property checking with advanced static analysis

Story

February 17, 2011

Paul Anderson

GrammaTech

Advanced static analysis tools are well known for being good at finding generic defects in programs. They can also be extended by end users to find domain-specific errors unique to a particular application, thanks to customized property checkers.

Advanced static analysis tools use sophisticated whole-program, path-sensitive techniques to find deep semantic problems including safety defects and security vulnerabilities. Several of the defects identified by these tools can be found in the CWE/SANS Top 25 most dangerous programming errors (http://cwe.mitre.org/top25), which lists the most important bugs to avoid in order to reduce security risk.

Specifically, out of the box, advanced static analysis tools can find generic programming defects. A typical generic report from a static analysis tool might show warnings such as uninitialized variables, buffer overruns, unreachable conditionals, and unreachable calls, among many others; however, not all security vulnerabilities or safety bugs are caused by instances of generic problems. Many defects are unique to a particular application.

An often underappreciated aspect of these tools is that they are extensible, so they can often be configured or programmed to find violations of domain-specific rules, too. So if a development team has its own internal rules for using a proprietary API, or requires programmers to use a particular idiom, then it is often possible to write a checker that signals violations of those rules. Hence, programmers can, often with little programming effort, dramatically increase the value they get from the tools.

A common use case is when a bug occurs and triggers a long and painful diagnosis and debugging period. Once the defect has been located, the wise action is to first find other places in the code where the error was repeated, then to take steps to detect the bug if it reoccurs. A custom check to be run on both existing code and new code can be an effective, low-cost way to achieve this. How these tools work and how their checkers can be customized is described using code snippets from CodeSonar, though the principles can be utilized with other advanced static analysis tools, too.

How advanced static analysis works

To understand the different ways in which extensions can be written, it first helps to know what static analysis is, and how static analysis tools work under the hood.

Static analysis tools work much like compilers. They take source code as input, which they then parse and convert to an Intermediate Representation (IR). A block diagram of the architecture of an advanced static analysis tool is shown in Figure 1. Whereas a compiler would use the IR to generate object code, static analysis tools retain the IR, and checkers are usually implemented by traversing or querying the IR, looking for particular properties or patterns that indicate defects.

Figure 1: The architecture of an advanced static analysis tool

(click graphic to zoom by 1.9x)

The advanced tools get their power from sophisticated symbolic execution techniques that explore paths through the control-flow graph. These algorithms keep track of the abstract state of the program and know how to use that state to exclude consideration of infeasible paths.

Checkers for domain-specific properties can access these representations and exploit the analysis algorithms in various ways.

Custom checkers

How does an end user author a custom property checker? This depends strongly on the nature of the property, so several mechanisms are available:

Existing checkers can be extended by adding directives to a configuration file.
The user can add annotations to their code that instruct the analysis to look for certain properties. These annotations can be done on the side in an aspect-oriented fashion if users do not wish to perturb the source code.
An API allows users access to all of the intermediate representations. Typically, a visitor pattern is used that allows extensions to exploit traversals the analysis is already doing.

Let’s take a closer look at these mechanisms.

Configuration files

These advanced static analysis tools implement dozens of checkers. Often, a user needs a checker that is only slightly different from a built-in one, and many of them have been designed to be extensible. One class of checkers finds functions whose use is forbidden. For example, the C library function gets is notoriously insecure. The checker is implemented by a phase that finds references to function names, and then matches them against a set of regular expressions. It is a simple matter to add additional regular expressions to this set by adding lines to a configuration file.

Code annotations

The second way to write checkers is to add annotations to the code.

Suppose, for example, that there is an internal function named foo that takes a single integer parameter, and that a potential security vulnerability is introduced when this parameter is -1. A check for this case could be implemented by adding some code to the body of foo as follows:

void foo(int x)

{

#ifdef __CSURF__

csonar_trigger(x, “==”, -1,

“Dangerous call to foo()”);

#endif __CSURF__

…

}

The #ifdef construct ensures that this new code is not seen by the regular compiler. However, when this code is analyzed by the tool, the call to csonar_trigger is seen. Thus, this call is never actually executed, but is instead interpreted by the tool as a directive to the underlying analysis engine. If the analysis concludes that the trigger condition may be satisfied, then it will issue a warning with the given message.

Most programmers prefer not to clutter their code with such annotations, so there is an alternative way to implement this kind of check that allows it to be written in a separate file. This approach is also appropriate when the source code for foo is not available, such as when it is in a third-party library. To do this, the author of the checker would write a replacement function as follows:

void csonar_replace_foo(int x)

{

csonar_trigger(x, “==”, -1,

“Dangerous call to foo()”);

foo();

}

When the analysis sees the definition of csonar_replace_foo, it treats all calls in the code to foo (except the one inside csonar_replace_foo) as if they were calls to csonar_replace_foo instead.

This trigger idiom is good for checking temporal properties, particularly sequences of function calls. Suppose there is a rule that says that bar should never be called while foo is executing. A check might be implemented as follows:

static int foo_is_executing = 0;

void csonar_replace_foo(int x) {

foo_is_executing = 1;

foo(x);

foo_is_executing = 0;

}

void csonar_replace_bar(void) {

csonar_trigger(foo_is_executing,

“==”, 1,

“Call to bar from foo”);

bar();

}

Note that a global state variable is used to record whether or not foo is active. Before entry to foo it is set to one, and then reset to zero after foo returns. This variable is then checked in the trigger in bar, and if set to one, a warning will be issued.

The previous example shows how a global property might be checked. The same mechanism can be used to write checks for properties of individual objects. Attributes can be attached to objects to track their state.

This approach allows users to author static checks almost as if they were writing dynamic checks. The API for this kind of check is small, and the language is C, so there is a gentle learning curve. This simplicity is deceptive: The technique can be used to implement quite sophisticated checks.

API for intermediate representations

The final way to implement a custom check is to use the API that gives access to the underlying representations (the IR). One company used this API to find violations of a custom idiom for handling hardware errors.

This API can be used for other program analysis tasks, too. For example, a medical device company uses one tool primarily to not only identify potential tasking problems in their code, but also to emit information that allows them to interactively explore properties of stack configurations. Other applications include gathering custom metrics on the code and generating input to program visualization tools.

Many checks can be written using a visitor pattern – a function invoked as a callback on every IR element of a given type. Visitors can be defined for files, identifiers, subprograms, and nodes in the control-flow graph and the syntax tree. The interface to a visitor allows for pattern matching against the construct.

CodeSonar has a special kind of visitor, which is invoked during the path exploration of the control-flow graph. This allows the checker author to write sophisticated checks that leverage the built-in path-sensitive program analysis capabilities of the tool. A key aspect of this kind of checker is that it uses the part of the analysis that eliminates the exploration of infeasible paths, which automatically reduces the probability of false positive results.

Increasing software quality

Advanced static analysis tools have become essential tools for software developers because they are capable of finding serious security and safety defects. Authoring custom checkers is a low-cost way to multiply the effectiveness of these tools.

Paul Anderson is VP of Engineering at GrammaTech, Inc. He received his B.Sc. from Kings College, University of London and his Ph.D. from City University London. He has worked on software analysis and transformation tools for 20 years. He has helped organizations such as NASA, the FAA, Lockheed Martin, and Boeing apply automated code analysis to improve the quality and security of their software. He can be contacted at paul@grammatech.com.

GrammaTech, Inc. 607 273-7340 www.grammatech.com