How good is your compiler (at finding coding defects)?
January 04, 2010
Many believe that if source code compiles cleanly, with all warnings activated, then it is ready to move on to a verification stage such as test or code review. However, it is dangerous to assume that a clean compile means any remaining errors lie in the interpretation of the requirements rather than their implementation. Wojciech evaluates this assumption empirically and shows that the range of warnings provided by any compiler is extremely limited when compared to those produced by a dedicated static analysis and Coding Standards Enforcement (CSE) tool.
It is a commonly held view that if source code compiles cleanly, with all warnings turned on, then it is ready for a verification stage such as test or code review. The danger with this assumption is the implication that, once the code has compiled cleanly, any errors present must be in the interpretation of the requirements and not in their implementation. However, an empirical evaluation of this assumption shows that the range of warnings provided by any compiler is severely limited when compared to those produced by a dedicated static analysis and Coding Standards Enforcement (CSE) tool.
The comparisons made herein use the GNU Common C++ “2” library, version 1.6.3, a real-world code base of around 42,000 lines of code. As this is a cross-platform library, it does not favor any particular compiler and can be used as a representative sample that any compiler might be expected to handle. Its modest size allows all compiler warnings to be manually reviewed for their accuracy, while ensuring at the same time that their diversity and number are non-trivial.
The four compilers examined are GCC, Visual C++, C++Builder, and Intel C++ Compiler, along with a static analysis and CSE tool to show that if developers rely too heavily on their compiler to identify coding defects, they might find that their code isn’t maintainable, reusable, or portable. In addition, Visual C++ “Team edition” supplements its standard compiler warnings with a “code analysis” feature, the output of which is included in these results.
Warning outputs generated
In practice, every one of the defects missed by one of these four compilers has an impact on the quality of the code base, be that in its maintainability, portability, or reusability. That represents a significant risk when deploying the code, despite the fact that the majority of the sample source passes the compilers’ checks without complaint.
As these compilers are based on different front-ends, different warnings can be anticipated from each. Table 1 presents a side-by-side comparison of distinct warnings generated by each compiler, and by the static analysis tool, for the code base used in our comparison: GNU Common C++ “2”. The latest version of each compiler available at the time our results were compiled was used with the maximum warning level enabled. (The header row of Table 1 indicates the exact compiler versions and options used.) Rather than benchmarking these compilers relative to one another, their warning outputs were compared with a static analyzer for C++.
Table 1: Default detection comparison – The basis of comparison and each percentage figure is the ratio between distinct warnings reported by a compiler and a static analysis tool within a given category. The header row details the exact compiler versions and options used to enable the maximum warning level.
As indicated in the last row of the table, the CSE tool generates in excess of 400 warnings, while none of the compilers tested manages to report even 20. In fact, static analysis identifies 25 times more warnings than the best of the four compilers: Visual C++ with Code Analysis enabled (/analyze option). It is interesting to note that without this feature enabled, Visual C++ generates the fewest warnings of all the compilers tested.
The first column of data in Table 1 shows the percentage of warnings generated by each compiler that were also detected by the static analysis tool. Note that the degree of overlap is high, with an average of 84 percent of compiler warnings replicated by the CSE tool. This side of the comparison is given only for completeness, as developers would be expected to enable compiler warnings regardless of whether or not static analysis is performed.
The remaining rows of Table 1 show the other side of the comparison: how much of what is statically detectable do compilers flag? It is evident that compiler warnings steer clear of the “Efficiency and use of C++” category. This is to be expected, as compiler optimizations are performed in the back-end, typically silently. It is worth noting, however, that dedicated CSE tools provide a range of checks in this category focused on inefficient design, which, unlike low-level compiler optimizations, cannot be corrected automatically.
Common warnings missed
Portability is a warning category commonly missing from a compiler’s arsenal. Only C++Builder generated a single warning that can be classified as a portability issue, compared to 17 flagged by the static analysis tool. These represent constructs that comply with the ISO C++ language definition but can behave differently across compiler implementations. It is not uncommon for compiler vendors to lock developers in by offering extensions to ISO C++, so it is unsurprising that portability is not high on their agenda. A related aspect of portability, conformance to ISO C++ itself, is addressed by a separate warning category in the static analysis tool.
To most compiler vendors, ISO C++ compliance boils down to accepting as much valid C++ code as possible, while sidestepping the issue of detecting non-conforming code, often their own language extensions. Detecting ISO C++ non-conformance is one of a CSE tool’s strengths, as is evident from Table 1. Most compiler warnings can be classified as (code) “Design Problems” and “Maintainability,” with some so minor that they amount to style issues. Even in these focus areas, however, the coverage compared to the static analysis tool is far from comprehensive, standing at 7 percent for the best contender: Visual C++ with its code analysis feature.
Other warning categories that compilers traditionally avoid include naming conventions, code layout, complexity metric thresholds, and banning certain keywords (for example, throw) and functions (for example, malloc). A notable exception is the Visual C++ code analysis feature, which has hardwired warnings for the use of the _alloca, _snprintf, and TerminateThread functions. As this is not as comprehensive as the static analysis tool’s configurable check, which allows any function to be specified, half a point was awarded, giving this compiler a score of 10 percent for Local (company-specific) Standards enforcement. The primary benefit of enforcement in these areas is enhanced reusability of code, and it is apparent from Table 1 that this is virtually untapped by compilers.
When comparing the actual warning instances produced by each tool, tabulating raw warning counts would not be particularly enlightening, so common C++ Coding Standards serve as an objective basis of comparison instead. As can be seen from Table 2, none of the compilers offers any noticeable enforcement of High Integrity C++, JSF++, or MISRA C++, compared to the violations recorded by the CSE tool.
Table 2: Coding Standard Enforcement (CSE) comparison
CSE tool: Most comprehensive/transferable route
It is a common misconception that compiler warnings are a sufficient means of statically analyzing source code. The range of warnings available from market-leading compilers is limited compared to a dedicated static analysis and CSE tool like PRQA’s QA•C++. Moreover, the few checks that are available tend to be focused on code misbehavior and maintainability problems, with reusability and portability issues completely overlooked. A dedicated CSE tool offers comprehensive enforcement of all of these areas, while remaining compiler agnostic, so that code bases and development environments do not have to be tied down to a particular compiler and platform.
Dr. Wojciech Basalaj has nine years of technical experience with PRQA in the Consulting Services Group and is the most senior Technical Consultant. Wojciech graduated from King’s College, London with a First Class BSc degree in Computer Science in 1997. As part of the course, he undertook a one-year industrial placement at Lucent Technologies Wireless in Winchester, UK. Wojciech obtained his Ph.D. in the field of Information Visualization at Trinity College, Cambridge. He can be contacted at [email protected]
PRQA 617-273-8448 www.programmingresearch.com