Wednesday, March 21, 2007

Techniques to handle runtime errors in C++

Bail, return, jump, or . . . throw?

The common techniques for handling run-time errors in C leave something
to be desired, like maybe exception handling.

The exception handling machinery in C++ is designed to deal with program
errors, such as a resource allocation failure or a value out of range.
C++ exception handling provides a way to decouple error reporting from
error handling. However, it's not designed to handle asynchronous events
such as hardware interrupts.

C++ exception handling is designed to address the limitations of error
handling in C. In this installment, I'll look at some of the more common
techniques for handling run-time errors in C programs and show you why
these techniques leave something to be desired.

*Error reporting via return values*
Many C functions report failures through their function return values or
arguments. For example, in the Standard C library:

* *malloc* returns a null pointer if it fails to allocate memory.
* *strtoul* returns *ULONG_MAX* and stores the value *ERANGE* into the
object designated by *errno* if the converted value can't be
represented as an *unsigned long*.
* *printf* returns a negative value if it can't format and print
every operand specified in its format list.

(The macro *ULONG_MAX* is defined in the standard header *<limits.h>*.
Macros *ERANGE* and *errno* are defined in *<errno.h>*.)

If you want your C code to be reliable, you should write it so that it
checks the return values from calls to all such functions. In some
cases, adding code to check the return value isn't too burdensome. For
example, a typical call to *malloc* such as:

    p = malloc(sizeof(T));

becomes:

    p = malloc(sizeof(T));
    if (p == NULL) {
        // cope with the failure
    }

In other cases, writing a proper check is a bit tricky. For example, a
call to *strtoul* such as:

    n = strtoul(s, &e, 10);

becomes:

    errno = 0;
    n = strtoul(s, &e, 10);
    if (n == ULONG_MAX
        && errno == ERANGE) {
        // deal with the overflow
    }

(Clearing *errno* before the call ensures that a stale *ERANGE* left by
an earlier call isn't mistaken for an overflow.)

Having detected an error, you then have to decide what to do about it.
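
For what it's worth, the *strtoul* check above can be folded into a small helper so that callers have only one return value to test. The following is just a sketch under my own assumptions: the name *parse_ulong* and its return codes are hypothetical, not part of the Standard C library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not a Standard C function.
 * Converts the decimal text in s to an unsigned long.
 * Returns  0 and stores the value in *out on success,
 *         -1 if s holds no digits or has characters after the digits,
 *         -2 if the value can't be represented as an unsigned long. */
int parse_ulong(char const *s, unsigned long *out)
{
    char *e;
    unsigned long n;

    errno = 0;                  /* clear any stale error first */
    n = strtoul(s, &e, 10);
    if (e == s || *e != '\0')
        return -1;              /* nothing converted, or junk after the digits */
    if (n == ULONG_MAX && errno == ERANGE)
        return -2;              /* overflow */
    *out = n;
    return 0;
}
```

Note that *strtoul* itself accepts leading white space and an optional sign, so this sketch does too. A stricter helper would have to inspect the text before converting it.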

*Bailing*
Some errors, such as a value out of range, might be the result of
erroneous user input. If the input is interactive, the program can just
prod the user for a more acceptable value. With other errors, such as a
resource allocation failure, the system may have little choice other
than to shut down.

The most abrupt way to bail out is by calling the Standard C *abort*
function, as in:

    if (something_really_bad_happened)
        abort();

Calling *abort* terminates program execution with no promise of
cleaning anything up. Calling the Standard C *exit* function is not
quite as rude:

    if (something_really_bad_happened)
        exit(EXIT_FAILURE);

Calling *exit* closes all open files after flushing any unwritten
buffered data, removes temporary files created by *tmpfile*, and returns
an integer-valued exit status to the operating system. The standard
header *<stdlib.h>* defines the macro *EXIT_FAILURE* as the value
indicating unsuccessful termination.

You can use the Standard C *atexit* function to customize *exit* to
perform additional actions at program termination. For example, calling:

    atexit(turn_gizmo_off);

"registers" the *turn_gizmo_off* function so that a subsequent call to
*exit* will invoke:

    turn_gizmo_off();

as it terminates the program. The C standard requires every
implementation to support registering at least 32 functions. Some
implementations allow even more.
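
When you register more than one function, it helps to know that *exit* calls them in the reverse order of their registration, and that *atexit* returns nonzero if the registration fails. Here's a minimal sketch. The second handler, *close_log*, the two state variables, and *register_shutdown_handlers* are hypothetical names I've made up for illustration.

```c
#include <stdlib.h>

int gizmo_on = 1;               /* hypothetical device state */
int log_open = 1;               /* hypothetical log state */

void turn_gizmo_off(void) { gizmo_on = 0; }
void close_log(void)      { log_open = 0; }

/* Registers both handlers. Returns 0 on success, -1 if either
 * registration fails. Because exit calls registered functions in the
 * reverse order of registration, turn_gizmo_off (registered last)
 * runs before close_log (registered first). */
int register_shutdown_handlers(void)
{
    if (atexit(close_log) != 0)
        return -1;
    if (atexit(turn_gizmo_off) != 0)
        return -1;
    return 0;
}
```

A program would call *register_shutdown_handlers* once during startup. After that, any call to *exit*, or a return from *main*, turns the gizmo off and then closes the log.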

Embedded systems being as diverse as they are, I suspect that some don't
support either *abort* or *exit*. In those systems, you must use some
other platform-specific function(s) to shut things down.

More commonly, complete shutdown is not the appropriate response to an
error. Rather than shut down, the system should transition to a "safe"
state, whatever that is, and continue running. Here again, the details
of that transition are platform specific.

*Returning*
Some of the code in any embedded system is clearly application specific.
Many systems contain a good chunk of application-independent code as
well. The application-independent code could be from a library shipped
with the compiler or operating system, from a third-party library, or
from something developed in-house.

When an application-specific function detects an error, it can respond
on the spot with a specific action, as in:

    if (something_really_bad_happened)
        take_me_some_place_safe();

In contrast, when an application-independent function detects an error,
it can't respond on its own because it doesn't know how the application
wants to respond. (If it did know, it wouldn't be application
independent.) Rather than respond at the point where the error was
detected, an application-independent function can only announce that the
error has occurred and leave the error handling to some other function
further up the call chain. The announcement might appear as a return
value, an argument passed by address, a global object, or some
combination of these. As I described earlier, this is what most Standard
C library functions do.

Although conceptually simple, returning error indicators can quickly
become cumbersome. For example, suppose your application contains a
chain of calls in which *main* calls *f*, which calls *g*, which calls
*h*. Ignoring any concern for error handling, the code would be as shown
in Listing 1.
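
A minimal sketch of such a call chain might look like the following. It rests on a few assumptions of mine: *app_main* stands in for *main* so that the sketch can be called from a test, and the counter *steps_done* is a hypothetical stand-in for whatever work each function really does.

```c
int steps_done;             /* hypothetical: counts the work done so far */

void h(void)
{
    ++steps_done;           /* do h */
}

void g(void)
{
    h();
    ++steps_done;           /* do the rest of g */
}

void f(void)
{
    g();
    ++steps_done;           /* do the rest of f */
}

/* Stands in for main. Nothing in the chain reports or checks for errors. */
int app_main(void)
{
    f();
    ++steps_done;           /* do the rest of main */
    return 0;
}
```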

Now, suppose reality intrudes and function *h* has to check for a
condition it can't handle. In that case, you might rewrite *h* so that
it has a non-*void* return type, such as *int*, and appropriate return
statements for error and normal returns. The function might look like:

    int h(void)
    {
        if (something_really_bad_happened)
            return -1;
        // do h
        return 0;
    }

Now *g* is responsible for heeding the return value of *h* and acting
accordingly. However, more often than not, functions in the middle of a
call chain, such as *g* and *f*, aren't in the position to handle the
error. In that case, all they can do is look for error values coming
from the functions they call and return them up the call chain. This
means you must rewrite both *f* and *g* to have non-*void* return types
along with appropriate return statements, as in:

    int g(void)
    {
        int status;
        if ((status = h()) != 0)
            return status;
        // do the rest of g
        return 0;
    }

    int f(void)
    {
        int status;
        if ((status = g()) != 0)
            return status;
        // do the rest of f
        return 0;
    }

Finally, the buck stops with *main*:

    int main(void)
    {
        if (f() != 0) {
            // handle the error
        }
        // do the rest of main
        return 0;
    }

This approach--returning error codes via return values or
arguments--effectively decouples error detection from error handling,
but the costs can be high. Passing the error codes back up the call
chain increases the size of both the source code and object code and
slows execution time. It's been a while since I've used this approach to
any extent, but my recollection is that the last time I did, it
increased the non-comment source lines in my application by 15 to 20%,
with a comparable increase in the object code. Other programmers have
told me they've experienced increases to the tune of 30 to 40%.

This technique also increases coding effort and reduces readability.
It's usually difficult to be sure that your code checks for all possible
errors. Static analyzers, such as Lint, can tell you when you've ignored
a function's return value, but as far as I know, they can't tell you
when you've ignored the value of an argument passed by address. The
consistent application of this technique can easily break down when the
current maintainer of the code hands it off to a less experienced one.

*Jumping*
We could eliminate much of the error reporting code from the middle
layers of the call chain by transferring control directly from the
error-detection point to the error-handling point. Some languages let
you do this with a non-local goto. If you could do this in C, it might
look like:

    int h(void)
    {
        if (something_really_bad_happened)
            goto error_handler;
        // do h
        return 0;
    }

    ...

    int main(void)
    {
        f();
        // do the rest of main
        return 0;
    error_handler:
        // handle the error
    }

but you can't. It won't compile, because a *goto* can jump only to a
label within the same function. However, you can do something similar
using the facilities provided by the standard header *<setjmp.h>*. That
header declares three components: a type named *jmp_buf* and two
functions named *setjmp* and *longjmp*. (Actually, *setjmp* might be a
function-like macro, but for the most part, you can think of it as a
function.)

Calling *setjmp(jb)* stores a "snapshot" of the program's current
calling environment into *jmp_buf jb*. That snapshot typically includes
values such as the program counter, stack pointer, and possibly other
CPU registers that characterize the current state of the calling
environment.

Subsequently, calling *longjmp(jb, v)* (I'll explain *v* shortly)
effectively performs a non-local goto--it restores the calling
environment from snapshot *jb* and causes the program to resume
execution as if it were returning from the call to *setjmp* that took
the snapshot previously. It's like déjà vu all over again.

The function calling *setjmp* can use *setjmp*'s return value to
determine whether the return from *setjmp* is really that, or actually a
return from *longjmp*. When a function directly calls *setjmp(jb)* to
take a snapshot, *setjmp* returns 0. A later call to *longjmp(jb, v)*,
where *v* is non-zero, causes program execution to resume as if the
corresponding call to *setjmp* returned *v*. In the special case where
*v* is equal to 0, *longjmp(jb, v)* causes *setjmp* to return 1, so that
*setjmp* only returns 0 when called directly.

Listing 2 shows our hypothetical application with a *longjmp* from *h*
to *main*. Since the *longjmp* bypasses *g* and *f*, these two functions
no longer need to check for error return values, thus simplifying the
source code and reducing the object code.
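
A minimal sketch of that arrangement might look like the following. As before, it rests on assumptions of mine: *app_main* stands in for *main* so that the sketch can be called from a test, and the flag *trouble* is a hypothetical way to make *h* fail on demand.

```c
#include <setjmp.h>

static jmp_buf jb;          /* snapshot taken in app_main */
static int trouble;         /* hypothetical flag: nonzero makes h fail */

void h(void)
{
    if (trouble)
        longjmp(jb, 1);     /* jump straight back to app_main */
    /* do h */
}

void g(void)
{
    h();                    /* no error check needed here */
    /* do the rest of g */
}

void f(void)
{
    g();                    /* nor here */
    /* do the rest of f */
}

/* Stands in for main. Returns 0 normally, -1 if h jumped back. */
int app_main(int make_h_fail)
{
    trouble = make_h_fail;
    if (setjmp(jb) != 0) {
        /* handle the error */
        return -1;
    }
    f();
    /* do the rest of main */
    return 0;
}
```

The call to *setjmp* appears directly as the controlling expression of the *if* statement. The C standard allows *setjmp* in only a few contexts such as this one, so I've avoided assigning its result to a variable.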

Using *setjmp* and *longjmp* eliminates most, if not all, of the clutter
that accrues from checking and returning error codes. So what's not to
like about them?

The problem is that you must be extremely cautious with them to avoid
accessing invalid data or mismanaging resources. A *jmp_buf* need not
contain any more information than necessary to enable the program to
resume execution as if it were returning from a *setjmp* call. It need
not and probably will not preserve the state of any local or global
objects, files, or floating-point status flags.

Using *setjmp* and *longjmp* can easily lead to resource leaks. For
example, suppose functions *g* and *f* each allocate and deallocate a
resource, as in:

    void g(size_t n)
    {
        char *p;
        if ((p = malloc(n)) == NULL) {
            // deal with it
        }
        h();
        // do the rest of g
        free(p);
    }

    void f(char const *name, size_t n)
    {
        FILE *fp;
        if ((fp = fopen(name, "r")) == NULL) {
            // deal with it
        }
        g(n);
        // do the rest of f
        if (fclose(fp) == EOF) {
            // deal with it
        }
    }

A call to *longjmp* from *h* transfers control to *main*, completely
bypassing the remaining portions of *g* and *f*. When this happens, *g*
misses the opportunity to free its allocated memory, and *f* misses the
opportunity to close its *FILE*.
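
To make the leak visible, here's a small sketch that counts allocations that were never freed. The counter *blocks_outstanding*, the pointer *leaked*, and the function *run_and_count_leaks* (which stands in for *main*) are hypothetical names of my own, and *h* always fails just to keep the example short.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf jb;
int blocks_outstanding;     /* hypothetical: allocations not yet freed */
void *leaked;               /* lets the caller release the block afterward */

void h(void)
{
    longjmp(jb, 1);         /* something really bad happened */
}

void g(size_t n)
{
    char *p = malloc(n);
    if (p == NULL)
        return;             /* nothing was allocated */
    ++blocks_outstanding;
    leaked = p;
    h();                    /* control never comes back here */
    free(p);                /* bypassed by the longjmp */
    --blocks_outstanding;
}

/* Stands in for main. Returns the number of blocks still allocated
 * when control arrives back at the top of the call chain. */
int run_and_count_leaks(void)
{
    blocks_outstanding = 0;
    if (setjmp(jb) == 0)
        g(16);
    return blocks_outstanding;
}
```

After the jump, the count is still 1 because the call to *free* in *g* never ran. The only reason the block can be released at all is that the sketch stashed the pointer in *leaked*.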

C++ classes use destructors to provide automatic resource deallocation.
A common practice in C++ is to wrap pointers inside classes, and provide
destructors to ensure that the resources managed via these pointers are
eventually released. Unfortunately, *setjmp* and *longjmp* are strictly
C functions that know nothing about destructors. Calling *longjmp* in a
C++ program can bypass destructor calls, resulting in resource leaks.

*A forward pass*
Using exception handling in C++ can avoid these resource leaks. It
properly executes destructors as it transfers control from an
error-detection point to an error handler, and it will be the subject of
my next column.
