An Introduction to GCC - for the GNU compilers gcc and g++ by Brian J. Gough, foreword by Richard M. Stallman. Paperback (6"x9"), 144 pages, ISBN 0954161793, RRP £12.95 ($19.95). "A wonderfully thorough guide... well-written, seriously usable information" --- Linux User and Developer Magazine (Issue 40, June 2004)
6.5 Examples
The following program will be used to demonstrate the effects of different optimization levels:
#include <stdio.h>

double
powern (double d, unsigned n)
{
  double x = 1.0;
  unsigned j;

  for (j = 1; j <= n; j++)
    x *= d;

  return x;
}

int
main (void)
{
  double sum = 0.0;
  unsigned i;

  for (i = 1; i <= 100000000; i++)
    {
      sum += powern (i, i % 5);
    }

  printf ("sum = %g\n", sum);
  return 0;
}
The main program contains a loop calling the powern function.
This function computes the n-th power of a floating point number by
repeated multiplication. It was chosen because it is suitable for
both inlining and loop-unrolling. The run-time of the program can be
measured using the time command in the GNU Bash shell.
Here are some results for the program above, compiled on a 566MHz Intel Celeron with 16KB L1-cache and 128KB L2-cache, using GCC 3.3.1 on a GNU/Linux system:
$ gcc -Wall -O0 test.c -lm
$ time ./a.out
  real    0m13.388s
  user    0m13.370s
  sys     0m0.010s

$ gcc -Wall -O1 test.c -lm
$ time ./a.out
  real    0m10.030s
  user    0m10.030s
  sys     0m0.000s

$ gcc -Wall -O2 test.c -lm
$ time ./a.out
  real    0m8.388s
  user    0m8.380s
  sys     0m0.000s

$ gcc -Wall -O3 test.c -lm
$ time ./a.out
  real    0m6.742s
  user    0m6.730s
  sys     0m0.000s

$ gcc -Wall -O3 -funroll-loops test.c -lm
$ time ./a.out
  real    0m5.412s
  user    0m5.390s
  sys     0m0.000s
The relevant entry in the output for comparing the speed of the resulting executables is the ‘user’ time, which gives the CPU time the process spent executing its own code. The other rows, ‘real’ and ‘sys’, record the total elapsed time for the process to run (including times when other processes were using the CPU) and the CPU time spent in the operating system kernel on behalf of the process. Although only one run is shown for each case above, the benchmarks were executed several times to confirm the results.
These results show that, in this case, increasing the
optimization level with -O1, -O2 and -O3
produces an increasing speedup relative to the unoptimized code
compiled with -O0. The additional option
-funroll-loops produces a further speedup. Overall, the program
runs more than twice as fast at the highest level of optimization
as it does unoptimized (5.39 versus 13.37 seconds of user time).
Note that for a small program such as this there can be considerable
variation between systems and compiler versions. For example, on a
Mobile 2.0GHz Intel Pentium 4M system the trend of the results
using the same version of GCC is similar except that the performance
with -O2 is slightly worse than with -O1. This
illustrates an important point: optimizations may not necessarily make a
program faster in every case.