Wednesday, March 07, 2007

C++ Performance Tips

Introduction

These tips are based mainly on ideas from the book Efficient C++
by Dov Bulka and David Mayhew. For a more thorough treatment of
performance programming with C++, I highly recommend this book.

Constructors and Destructors

* Constructors and destructors are often slower than they look,
because an object's constructor (destructor) also calls the
constructors (destructors) of its member objects and base classes.
This can result in constructors (destructors) that take a long
time to execute, especially for objects in complex hierarchies or
objects that contain several member objects. As long as all of the
computations are necessary, there isn't really a way around this.
As a programmer, you should at least be aware of this
"silent execution".

If some of the computations mentioned above are not necessary, then
they should be avoided. This seems like an obvious statement, but
you should make sure that the constructor you are using does only
what you need.

* Objects should be created only when they are used. A good
technique is to defer object creation to the scope in which the
object is used. This prevents unnecessary constructor and
destructor calls.
* Using the member initializer list that C++ offers is very
important for efficiency. Any member object that is not in the
initializer list is created by the compiler using its default
constructor. By initializing a member in the initializer list, you
avoid both the call to its default constructor and the overhead of
an assignment operator inside the constructor body. Using the
initializer list may also reduce the number of temporaries needed
to construct the object. See the Temporaries section for more
information on this.
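
To make the initializer list point concrete, here is a minimal sketch.
The Name, PersonAssign and PersonInit types are made up for this
illustration (they are not from the book); Name simply counts which of
its operations run, so the two constructor styles can be compared.

```cpp
#include <cassert>
#include <string>

// Hypothetical member type that counts how it is constructed and assigned.
struct Name {
    static int default_ctors;
    static int converting_ctors;
    static int assignments;

    std::string text;

    Name() { ++default_ctors; }
    Name(const char* s) : text(s) { ++converting_ctors; }
    Name& operator=(const char* s) {
        text = s;
        ++assignments;
        return *this;
    }
};

int Name::default_ctors = 0;
int Name::converting_ctors = 0;
int Name::assignments = 0;

// No initializer list: name is default-constructed first,
// then assigned inside the constructor body.
struct PersonAssign {
    Name name;
    PersonAssign(const char* n) { name = n; }
};

// Initializer list: name is constructed once, directly from the argument.
struct PersonInit {
    Name name;
    PersonInit(const char* n) : name(n) {}
};

// Helper for the comparison: clears all three counters.
void reset_counters() {
    Name::default_ctors = 0;
    Name::converting_ctors = 0;
    Name::assignments = 0;
}
```

Constructing a PersonAssign runs one default constructor plus one
assignment on the member, while constructing a PersonInit runs a single
converting constructor and nothing else.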

------------------------------------------------------------------------


Virtual Functions

* Virtual functions negatively affect performance in three main ways:
1. The constructor of an object containing virtual functions
must initialize the vptr, the hidden pointer to the class's
table of virtual function pointers (the vtable).
2. Virtual functions are called using pointer indirection,
which results in a few extra instructions per method
invocation as compared to a non-virtual method invocation.
3. Virtual functions whose resolution is only known at run-time
cannot be inlined. (For more on inlining, see the Inlining
section.)
* Templates can be used to avoid the overhead of virtual functions
by using a templated class in place of inheritance. A templated
class does not need a vptr or vtable because the concrete type is
known at compile-time instead of having to be determined at
run-time. Also, the non-virtual methods in a templated class can
be inlined.
* The cost of using virtual functions is usually not a factor in
calling methods that take a long time to execute since the call
overhead is dominated by the method itself. In smaller methods,
for example accessor methods, the cost of virtual functions is
more important.
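
As a rough illustration of the template alternative, the sketch below
computes the same result twice: once through a virtual call and once
through a function template. All of the names (Shape, VirtualSquare,
PlainSquare, total_area_virtual, total_area_template) are invented for
this example, and it assumes a C++11 or later compiler.

```cpp
#include <cassert>

// Run-time polymorphism: area() is resolved through the vtable.
struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};

struct VirtualSquare : Shape {
    double side;
    explicit VirtualSquare(double s) : side(s) {}
    double area() const override { return side * side; }
};

double total_area_virtual(const Shape& shape, int repeat) {
    double sum = 0.0;
    for (int i = 0; i < repeat; ++i) {
        sum += shape.area();  // called through the base-class interface
    }
    return sum;
}

// Compile-time polymorphism: no base class, no virtual functions.
struct PlainSquare {
    double side;
    explicit PlainSquare(double s) : side(s) {}
    double area() const { return side * side; }
};

template <typename ShapeT>
double total_area_template(const ShapeT& shape, int repeat) {
    double sum = 0.0;
    for (int i = 0; i < repeat; ++i) {
        sum += shape.area();  // target known at compile time, can be inlined
    }
    return sum;
}
```

Both functions return 12.0 for a square of side 2.0 and a repeat count
of 3. On common compilers PlainSquare is also smaller than
VirtualSquare because it carries no vptr, although object layout is
implementation dependent.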

------------------------------------------------------------------------


Return Value

Methods that must return an object usually have to create an object to
return. Since constructing this object takes time, we want to avoid it
if possible. There are several ways to accomplish this.

* Instead of returning an object, add another parameter to the
method through which the caller passes in the object that should
receive the result. This way the method won't have to create an
extra object. It will simply use the object passed to it.
Compilers can apply the same transformation automatically to
methods that return by value, constructing the result directly in
the caller's object. This is called Return Value Optimization
(RVO).
* Whether or not RVO is actually applied is up to the compiler.
Different compilers handle this differently. One way to help the
compiler is to use a computational constructor. A computational
constructor can be used in place of a method that returns an
object. The computational constructor takes the same parameters as
the method to be optimized, but instead of returning an object
based on the parameters, it initializes the new object directly
from the values of the parameters.
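
The three approaches can be put side by side in a small sketch. The
Complex class, the AddTag marker type and the function names are all
hypothetical and chosen only for illustration. How many copies the
by-value versions really make depends on the compiler and the language
standard in use, so the code makes no claim about that.

```cpp
#include <cassert>

// Hypothetical value type used only for this illustration.
class Complex {
public:
    // Empty marker type; it only distinguishes the computational
    // constructor from the other constructors.
    struct AddTag {};

    Complex() : re_(0.0), im_(0.0) {}
    Complex(double re, double im) : re_(re), im_(im) {}

    // Computational constructor: initializes the new object
    // as the sum of a and b.
    Complex(const Complex& a, const Complex& b, AddTag)
        : re_(a.re_ + b.re_), im_(a.im_ + b.im_) {}

    double re() const { return re_; }
    double im() const { return im_; }
    void set(double re, double im) { re_ = re; im_ = im; }

private:
    double re_;
    double im_;
};

// 1. Return by value through a named local object.
//    Whether the copy out of 'result' is elided is up to the compiler.
Complex add_named(const Complex& a, const Complex& b) {
    Complex result;
    result.set(a.re() + b.re(), a.im() + b.im());
    return result;
}

// 2. Output parameter: the caller supplies the object that
//    receives the result, so no new object is created here.
void add_into(const Complex& a, const Complex& b, Complex& out) {
    out.set(a.re() + b.re(), a.im() + b.im());
}

// 3. Return an unnamed object built by the computational constructor,
//    which gives the compiler an easy case to optimize.
Complex add_computed(const Complex& a, const Complex& b) {
    return Complex(a, b, Complex::AddTag());
}
```

All three produce the same value. The difference lies only in how much
freedom the compiler has to build the result directly in the caller's
object.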

------------------------------------------------------------------------


Temporaries

Temporaries are objects that are "by-products" of a computation. They
are not explicitly declared, and as their name implies, they are
temporary. Still, you should know when the compiler is creating a
temporary object because it is often possible to prevent this from
happening.

* The most common place for temporaries to occur is in passing an
object to a method by value. The formal argument is created on the
stack as a copy of the actual argument. This can be prevented by
passing a pointer or a (const) reference instead.
* Compilers may create a temporary object when an object is
assigned a value of a different type. For example, an object whose
class has a constructor that takes an int as an argument may be
assigned an int. The compiler will create a temporary object using
the int as the parameter and then call the assignment operator
with that temporary. You can prevent the compiler from doing this
behind your back by using the explicit keyword in the declaration
of the constructor. The assignment then no longer compiles unless
the conversion is written out.
* When objects are returned by value, temporaries are often used.
See the Return Value section for more on this.
* Temporaries can be avoided by using compound assignment
operators such as +=. For example, the code
a = b + c;
could be written as
a = b;
a += c;
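
The points in this list can be checked with a type that counts how many
objects get constructed. Meters and the two read_ functions below are
invented for this sketch, which assumes a C++11 or later compiler. The
exact count for a = b + c depends on the compiler, but it is always at
least one, while the += version constructs nothing.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical value type that counts every object constructed.
struct Meters {
    static int constructed;
    double value;

    // explicit: a double is never silently converted into a Meters.
    explicit Meters(double v) : value(v) { ++constructed; }
    Meters(const Meters& other) : value(other.value) { ++constructed; }

    Meters& operator=(const Meters& other) {
        value = other.value;
        return *this;
    }
    Meters& operator+=(const Meters& rhs) {
        value += rhs.value;
        return *this;
    }
};

int Meters::constructed = 0;

// operator+ has to produce a new object to hold the result.
Meters operator+(const Meters& lhs, const Meters& rhs) {
    Meters result(lhs);
    result += rhs;
    return result;
}

// Pass by value copies the argument into the parameter.
double read_by_value(Meters m) { return m.value; }

// Pass by const reference creates no new object.
double read_by_reference(const Meters& m) { return m.value; }

// With the explicit constructor there is no implicit double -> Meters
// conversion, so "m = 2.0;" would be rejected at compile time.
static_assert(!std::is_convertible<double, Meters>::value,
              "double must not convert implicitly to Meters");
```

Resetting Meters::constructed to zero before each experiment shows the
difference: read_by_value adds one construction per call,
read_by_reference adds none, and a = b; a += c; leaves the counter at
zero.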

------------------------------------------------------------------------


Inlining

Inlining is one of the easiest optimizations to use in C++, and it can
result in some of the most dramatic improvements in execution speed.
The main thing to know is when you should inline a method and when you
shouldn't.

* There is always a trade-off between code size and execution speed
when inlining. In general, small methods (for example, accessors)
should be inlined and large methods should not be inlined.
* If you are not sure of whether or not a given method should be
inlined, the best way to decide is to profile the code. That is,
run test samples of the code, timing inlining and non-inlining
versions.
* Excessive inlining can drastically increase code size, which can
result in increased execution times because of a resulting lower
cache hit rate.
* Watch out for inlined methods that make calls to other inlined
methods. This can make the code size unexpectedly larger.
* Singleton methods, methods that are only called from one place in
a program, are ideal for inlining. The code size does not get any
bigger and execution speed only gets better.
* Using literal arguments with an inlined method allows the compiler
to make significant optimizations. (This is, however, compiler
dependent.)
* The preprocessor can be used to implement conditional inlining.
This is useful because code that is not inlined is easier to debug
during testing, while production builds can turn inlining on
without any changes to the source code. This is implemented by
using a preprocessor macro called INLINE. Inlined code is defined
within
#ifdef INLINE ...
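
The text above is cut off before it shows the mechanism, so the
following is only one possible arrangement and not necessarily the one
the author had in mind. The INLINE macro name comes from the text. The
MAYBE_INLINE helper macro and the Counter class are made up for this
sketch, which is squeezed into a single file for brevity.

```cpp
#include <cassert>

// Build switch: define INLINE (for example with -DINLINE) for
// production builds, leave it undefined for debug builds.
#ifdef INLINE
#define MAYBE_INLINE inline
#else
#define MAYBE_INLINE
#endif

// Hypothetical class whose small methods are conditionally inlined.
class Counter {
public:
    Counter() : count_(0) {}
    void increment();
    int count() const;

private:
    int count_;
};

// In a multi-file project these definitions would need to be visible
// in the header when INLINE is defined, and live in exactly one .cpp
// file otherwise. Here they simply follow the class.
MAYBE_INLINE void Counter::increment() { ++count_; }
MAYBE_INLINE int Counter::count() const { return count_; }
```

The behavior is identical either way. Only the compiler's freedom to
inline the two methods changes with the build flag.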
