6.2 Speed-space tradeoffs
While some forms of optimization, such as common subexpression
elimination, are able to increase the speed and reduce the size of a
program simultaneously, other types of optimization produce faster code
at the expense of increasing the size of the executable. This choice
between speed and memory is referred to as a speed-space tradeoff.
Optimizations with a speed-space tradeoff can also be used in reverse
to make an executable smaller, at the expense of making it run slower.
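To illustrate the first kind of optimization, the following sketch shows the sort of transformation performed by common subexpression elimination; the two functions are purely illustrative, and in practice the compiler applies the transformation to its internal representation rather than to the source code:
/* Before optimization: the subexpression (x + y) is evaluated twice.  */
double
unoptimized (double x, double y)
{
  return (x + y) * (x + y);
}

/* After common subexpression elimination: the sum is computed once and
   reused, which makes the code both smaller and faster.  */
double
optimized (double x, double y)
{
  double t = x + y;   /* single evaluation of the common subexpression */
  return t * t;
}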
6.2.1 Loop unrolling
A prime example of an optimization with a speed-space tradeoff is
loop unrolling. This form of optimization increases the speed of
loops by eliminating the "end of loop" condition on each iteration.
For example, the following loop from 0 to 7 tests the condition i < 8 on each iteration:
for (i = 0; i < 8; i++)
  {
    y[i] = i;
  }
At the end of the loop, this test will have been performed 9 times, and because the body of the loop is only a single assignment, a large fraction of the run time will have been spent checking it.
A more efficient way to write the same code is simply to unroll the
loop and execute the assignments directly:
y[0] = 0;
y[1] = 1;
y[2] = 2;
y[3] = 3;
y[4] = 4;
y[5] = 5;
y[6] = 6;
y[7] = 7;
This form of the code does not require any tests, and executes at
maximum speed. Since each assignment is independent, it also allows the
compiler to use parallelism on processors that support it. Loop
unrolling is an optimization that increases the speed of the resulting
executable but also generally increases its size (unless the loop is
very short, with only one or two iterations, for example).
Loop unrolling is also possible when the upper bound of the loop is
unknown, provided the start and end conditions are handled correctly.
For example, the same loop with an arbitrary upper bound,
for (i = 0; i < n; i++)
  {
    y[i] = i;
  }
can be rewritten by the compiler as follows:
for (i = 0; i < (n % 2); i++)
  {
    y[i] = i;
  }
for ( ; i + 1 < n; i += 2) /* no initializer */
  {
    y[i] = i;
    y[i+1] = i+1;
  }
The first loop handles the case i = 0 when n is odd, and the second loop handles all the remaining iterations. Note that the second loop does not use an initializer in the first argument of the for statement, since it continues where the first loop finishes. The assignments in the second loop can be parallelized, and the overall number of tests is reduced by a factor of approximately 2.
Higher factors can be achieved by unrolling more assignments inside the
loop, at the cost of greater code size.
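For instance, unrolling by a factor of 4 follows the same pattern as the factor of 2 version above: a first loop handles up to three leftover iterations, and a second loop processes four elements at a time. The fragment below is only an illustrative sketch of this transformation, reusing the variables y, i and n from the previous example; it is not output produced by the compiler:
for (i = 0; i < (n % 4); i++) /* up to 3 leftover iterations */
  {
    y[i] = i;
  }
for ( ; i + 3 < n; i += 4) /* four assignments per test */
  {
    y[i] = i;
    y[i+1] = i+1;
    y[i+2] = i+2;
    y[i+3] = i+3;
  }
With this version the loop condition is tested roughly once for every four assignments, but four copies of the loop body now appear in the executable instead of one.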