Print

## Using Design by Contract in C

by Charlie Mills
10/28/2004

Last spring I began a coding project that for various reasons I needed to write mostly in C. I wanted to do as much up-front work as possible to minimize bugs and debugging time later. At about the same time, I picked up The Pragmatic Programmer. This book introduces many important techniques for minimizing bugs and improving the quality of your code. One technique that I found particularly exciting was Design by Contract.

 Related Reading Secure Programming Cookbook for C and C++ Recipes for Cryptography, Authentication, Input Validation & More By John Viega, Matt Messier

### Design by Contract

Chances are you've heard of Design by Contract. If you're coding in Eiffel you're probably already using it. If you haven't heard of Design by Contract, then you're like I was six months ago. To quote Bertrand Meyer, the key concept of Design by Contract is "viewing the relationship between a class and its clients as a formal agreement, expressing each party's right and obligations" [3]. Because this discussion will segue into C coding and C doesn't have classes, we'll talk about functions instead.

#### Preconditions and Postconditions

To use a function (written by you or someone else), you need to understand the range of arguments allowed and the effects of calling the function. This is where contracts come into play. Most languages that support Design by Contract provide two types of statements to express the obligations of the function caller and the callee: preconditions and postconditions. The caller must meet all the preconditions of the function being called, and the callee or the function must meet its own postconditions. The failure of either party to live up to the terms of the contract is a bug in the software [3].

The quintessential examples of preconditions and postconditions are those of the square root function for positive real numbers. The precondition of the square root function is that the argument must be a real number greater than or equal to zero. The postcondition is that the return value is a real number such that its square must equal the argument (ignoring round-off issues).

``````/* square root function for positive real numbers */
double sqrt(double x)
{
/* ... */
}``````

When someone calls `sqrt()` with a value of `x` that is less than zero, `sqrt()` cannot meet its postcondition. Again, this contractual breach indicates a bug in the software. The best course of action usually is to abort [3].

#### Invariants

The third type of statement Design by Contract places at our disposal is the invariant. Invariants describe conditions which must always hold for an object (a structure or type in C). For example, once initialized, any instance of the dynamic array type `DArray_T`, shown below, should have the properties `length >= 0` and `capacity >= length`. These properties are invariants.

darray.c:

``````struct DArray_T {
void  **ptr;
long    length;
long    capacity;
};``````

darray.h:

``typedef struct DArray_T *DArray_T;``

In an object-oriented setting, calling a method on an object causes the object's invariants to be checked. Checking invariants happens before checking preconditions and after postconditions. So whenever someone calls the function

``void *DArray_get(DArray_T ary, long index);``

the following happens:

1. The invariants of `DArray_T` are checked for `ary`.
2. The preconditions of `DArray_get()` are checked.
3. `DArray_get()` executes.
4. The postconditions of `DArray_get()` are checked.
5. The invariants of `DArray_T` are checked for `ary`.

You're probably wondering how I went from discussing methods to the C function `DArray_get()`. As you know, C doesn't truly have objects or methods on objects. Instead it has structures, types, and functions. However, invariants apply just as well in C as they do in an object-oriented language. If a function has write access to a structure or type, then DBC should check the invariants of that structure or type before and after the function executes. The function `DArray_get()` has write access to all its parameters; therefore, the invariants of all its parameters should be checked. Obviously, `long` won't have any invariants, but `DArray_T` will.

The `const` qualifier can specify types as read only, but C's opaque types are a more powerful mechanism. Opaque types are pointers that client code cannot dereference. `DArray_T` as defined above is an opaque type. Defining `struct DArray_T` in darray.c and not darray.h hides its fields from clients. Therefore, only the functions defined in darray.c have access to the fields of ```struct DArray_T```. Only functions in darray.c need to check the invariants of `DArray_T`, eliminating the need to check the invariants of `DArray_T` (or any opaque type) in client code. (However, client code still needs to check the invariants of nonopaque types.)

#### Why Not Assert?

C doesn't provide any Design by Contract language features, but it does provide `assert()`. In the `sqrt()` example, an assert statement works fine for checking the precondition

``````double sqrt(double x)
{
assert(x >= 0);
/* ... */
}``````

Why not just stick with `assert()` and forget all this stuff about Design by Contract? Well, `assert()` does not quite live up to Design by Contract, for the following reasons:

• Side effects--Contract checking should not produce any side effects. The software should behave the same with or without contract checking in place. Unfortunately, adding `assert()` or any other type of condition requires more coding, and more coding introduces the possibility of more errors. A statement such as

``assert(size >= 0 && size = MAX_BUFFER_SIZE);``

is valid C code, but obviously it does not produce the desired effect.

• Lack of robustness--It takes significant extra coding to check postconditions and invariants using `assert()`.
• Lack of expressivity--The facilities for expressing contracts should be rich enough to express a specification [3]. In order to be expressive, assert statements must use C control structures.
• Mixing application code and contracts--Contracts' specifications should go in the function definition, but not as a part of the function definition. Assert statements tend to intermix with application code, mixing the specification and the implementation.
• Redundancy--It is very common to have a structure or type for which certain conditions must always hold. Ensuring these invariants requires checking them throughout your code; using `assert()` for this creates lots of duplicate code.

Because `assert()` can appear in arbitrary locations, assert statements serve better as sanity checks in confusing parts of your code or for checking loop invariants. However, if you have a section of code littered with asserts, it is usually a good sign that you need to do some refactoring [2].

### Design by Contract Meets C

Not satisfied with `assert()` and excited about Design by Contract, I set out to create my own Design by Contract implementation for C. After looking at some of the solutions available for Java [1] I decided to use a subset of the Object Constraint Language to express contracts [4]. Using Ruby and Racc, I created Design by Contract for C, a code generator that turns contracts embedded in C comments into C code to check the contracts.

The front end to this code generator is the `dbcparse.rb` Ruby script. Running `./dbcparse.rb -h` will print instructions on how to use the script. For starters, `./dbcparse.rb darray.c` will print to standard out darray.c with all contracts expanded into C code.

#### Contracts

Below is the contract for and implementation of `DArray_get()`:

darray.c:

``````/**
pre: index >= 0
pre: index < ary->length
post: return == ary->ptr[index]
*/
void *DArray_get(DArray_T ary, long index)
{
return ary->ptr[index];
}``````

Opening the comment with `/**` specifies a contract. Because the `DArray_get()` function directly follows the contract, the contract applies to the `DArray_get()` function (also called the context of the contract). Inside the comment, we have `pre:` and `post:` tags, which specify a precondition and postcondition, respectively. The `return` keyword is simply the value that `DArray_get()` returns.

This function is pretty simple; its preconditions and postconditions describe it nicely. Notice that the postcondition relies on the preconditions being true and that the function should not execute if the preconditions fail.

When defining a contract, each condition must be a Boolean expression; assignment is not allowed. The normal C comparison operators (`<`, `>`, `<=`, `>=`, `==`, and `!=`) all work. Instead of the `&&`, `||`, and `!` operators, we have `and`, `or`, and `not`, respectively. C control structures (`if`, `else`, `for`, `do`, `while`, `?:`) do not appear; instead there is an `implies` operator and two iterating operators, `forall` and `exists`. The iterating operators check a condition over a range; a `forall` statement is true if its condition holds true over the entire range, and an `exists` statement is true if its condition is true at least once in the range. The syntax is as follows:

```forall ( declaration in range | expression )``` ```exists ( declaration in range | expression )```

What do I mean by range? Many languages support a range idiom; I borrowed this syntax from Perl. (Other languages, such as Ruby, also use this syntax.) Ranges come in two forms: inclusive and exclusive. An inclusive range is `start..end`, and an exclusive range is `start...end`. All ranges include the start, but only inclusive ranges include the end.

Here is an example of the contract for a function that creates a shallow copy of a DArray_T:

darray.c:

``````/**
post: return != NULL implies return->length == ary->length and
forall (long i in 0...ary->length | return->ptr[i] == ary->ptr[i])
*/
DArray_T DArray_copy(DArray_T ary)
{
/* ... */
}``````

In words, this contract says that if the function returns a non-null `DArray_T`, then the returned `DArray_T` will have the same length and elements as `ary`.

You're probably wondering how this crazy syntax works. Let me explain. First, the expression `return != NULL` is checked. If it is false, the condition is true and we are done. If not, we must check what ```return != NULL``` implies. According to the condition, it implies:

``````return->length == ary->length and
forall (long i in 0...ary->length | return->ptr[i] == ary->ptr[i])``````

The expression `return->length == ary->length` is self-explanatory. The `forall` statement is more complex. Remember that the syntax of the `forall` statement is

```forall ( declaration in range | expression )```

`long i` is the declaration, `0...ary->length` is the range being iterated over, and ```return->ptr[i] == ary->ptr[i]``` is the expression to check at each iteration.

I mentioned before that all instances of `DArray_T` should have the properties `length >= 0` and ```capacity >= length```. How do we know that the `DArray_T` being passed to `DArray_get()` has these properties? By specifying them as invariants.

darray.c:

``````/**
context DArray_T
inv: self != NULL
inv: self->length >= 0
inv: self->capacity >= self->length
inv: (self->capacity == 0 and self->ptr == NULL)
or (self->capacity > 0 and self->ptr != NULL)
*/
struct DArray_T {
void  **ptr;
long    length;
long    capacity;
};``````

Now, calling every function in darray.c with a parameter of type `DArray_T` (the context) will check these invariants before and after the function executes. Note that defining invariants in a module (.c file) makes the invariants local to that module (as in darray.c), and defining invariants in a header file makes them global, causing the invariants to be checked for all applicable functions.

#### Syntax Review

Contract specifications occur in comments beginning with `/**`. If there is no context in the contract, the context is the next C statement following the contract. Below is the syntax for expressing contracts; statements in angle brackets are optional:

• contract
• <context> condition(s)
• context
• context declaration
• condition
• pre <message>: expression
• post <message>: expression
• inv <message>: expression
• expression
• valid C identifier, macro, or function call
• ( expression )
• forall ( declaration in range | expression )
• exists ( declaration in range | expression )
• not expression
• expression <, >, <=, >=, ==, != expression
• expression in range
• expression and expression
• expression or expression
• expression implies expression
• range
• start..end
• start...end

The comparison operators (`<`, `>`, `<=`, `>=`, `==`, and `!=`) have the same precedence as in C. The logical operators have the following precedence: `and`, `or`, `implies`.

#### Breach of Contract

Finding a contract breach calls `dbc_error()`. This function has the following prototype:

``void dbc_error(const char *message);``

`message` is simply a string with information about the condition that failed. `dbc_error()` should log this information--or print it to standard error--and not return. It is very important that `dbc_error()` not return. In my projects I usually define `dbc_error()` as:

``````#ifdef HAVE_RUBY_H
#include <ruby.h>
#define dbc_error(msg) rb_raise(rb_eFatal, msg)
#else
#include <stdio.h>
#include <stdlib.h>
#define dbc_error(msg) do { (void)fputs(msg, stderr); abort(); } while (0)
#endif``````

It is generally a good idea to stick to the fail-hard principle [3] and call `abort()`. (In Ruby, raising a fatal error is similar to calling `abort()`, but it also gives you a Ruby stack trace.)

### Wrap-Up

DBC for C has come a long way since I decided it would be a worthy hack last spring. The project that originally brought about the need for DBC for C has benefited greatly. The benefits include:

• Improved design--It is difficult to write contracts for code with poorly defined responsibilities. Using Design by Contract encourages code with well-defined responsibilities, which leads to better code and more modular design [3].
• Reduced debugging time--Well-defined contracts coupled with unit testing help find bugs early, and the vast majority of the time they find them at the source. (Remember, any breach of contract is a bug in the software.)
• Up-to-date documentation--Contracts make great documentation. It doesn't stop there, though. Because of testing contracts at runtime, out-of-date documentation is a bug in the software.

Hopefully you're now interested in using contracts in your C code. Here are some features of DBC for C that I will explore in the future:

• Integrating into your build process.
• Enabling different levels of contract checking.
• Unit testing. (Unfortunately I don't have room to discuss this here, but contracts are just fancy documentation unless you unit-test.)
• Generating Doxygen documentation from contracts and adding contracts to your existing Doxygen documentation.
• Accessing the value of a variable before function execution in postconditions using the `@pre` syntax.

In preparation for this article I spent some long nights preparing DBC for C for public consumption; hopefully you're ready to give it a try.

#### Resources

• DBC for C: generates Design by Contract runtime checks and documentation from tags embedded in C comments.
• Doxygen: documentation system for C, C++, IDL, Java, Objective-C, and other languages.
• Racc (Ruby yACC): an LALR parser generator for Ruby.
• Ruby: the powerful and dynamic object-oriented language that "makes programming fun again."
• Ruby Installer: a one-click Ruby installer for Windows. A Mac OS X installer is under development.

Charlie Mills develops data warehousing applications using C and Ruby.