6.2 Speed-space tradeoffs
While some forms of optimization, such as common subexpression
elimination, are able to increase the speed and reduce the size of a
program simultaneously, other types of optimization produce faster code
at the expense of increasing the size of the executable. This choice
between speed and memory is referred to as a speed-space tradeoff.
Optimizations with a speed-space tradeoff can also be used in reverse
to make an executable smaller, at the expense of making it run slower.
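To illustrate the first kind of optimization, the following sketch shows the sort of transformation performed by common subexpression elimination; the two functions are purely illustrative, and in practice the compiler applies the transformation to its internal representation rather than to the source code:
/* Before optimization: the subexpression (x + y) is evaluated twice.  */
double
unoptimized (double x, double y)
{
  return (x + y) * (x + y);
}

/* After common subexpression elimination: the sum is computed once and
   reused, which makes the code both smaller and faster.  */
double
optimized (double x, double y)
{
  double t = x + y;   /* single evaluation of the common subexpression */
  return t * t;
}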
6.2.1 Loop unrolling
A prime example of an optimization with a speed-space tradeoff is
loop unrolling. This form of optimization increases the speed of
loops by eliminating the "end of loop" condition on each iteration.
For example, the following loop from 0 to 7 tests the condition i < 8 on each iteration:
for (i = 0; i < 8; i++)
  {
    y[i] = i;
  }
At the end of the loop, this test will have been performed 9 times, and because the body of the loop is only a single assignment, a large fraction of the run time will have been spent checking it.
A more efficient way to write the same code is simply to unroll the
loop and execute the assignments directly:
y[0] = 0;
y[1] = 1;
y[2] = 2;
y[3] = 3;
y[4] = 4;
y[5] = 5;
y[6] = 6;
y[7] = 7;
This form of the code does not require any tests, and executes at
maximum speed. Since each assignment is independent, it also allows the
compiler to use parallelism on processors that support it. Loop
unrolling is an optimization that increases the speed of the resulting
executable but also generally increases its size (unless the loop is
very short, with only one or two iterations, for example).
Loop unrolling is also possible when the upper bound of the loop is
unknown, provided the start and end conditions are handled correctly.
For example, the same loop with an arbitrary upper bound,
for (i = 0; i < n; i++)
  {
    y[i] = i;
  }
can be rewritten by the compiler as follows:
for (i = 0; i < (n % 2); i++)
  {
    y[i] = i;
  }
for ( ; i + 1 < n; i += 2) /* no initializer */
  {
    y[i] = i;
    y[i+1] = i+1;
  }
The first loop handles the case i = 0 when n is odd, and the second loop handles all the remaining iterations. Note that the second loop does not use an initializer in the first argument of the for statement, since it continues where the first loop finishes. The assignments in the second loop can be parallelized, and the overall number of tests is reduced by a factor of approximately 2.
Higher factors can be achieved by unrolling more assignments inside the
loop, at the cost of greater code size.
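For instance, unrolling by a factor of 4 follows the same pattern as the factor of 2 version above: a first loop handles up to three leftover iterations, and a second loop processes four elements at a time. The fragment below is only an illustrative sketch of this transformation, reusing the variables y, i and n from the previous example; it is not output produced by the compiler:
for (i = 0; i < (n % 4); i++) /* up to 3 leftover iterations */
  {
    y[i] = i;
  }
for ( ; i + 3 < n; i += 4) /* four assignments per test */
  {
    y[i] = i;
    y[i+1] = i+1;
    y[i+2] = i+2;
    y[i+3] = i+3;
  }
With this version the loop condition is tested roughly once for every four assignments, but four copies of the loop body now appear in the executable instead of one.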