Sample of 'An Introduction to GCC'
This is the first chapter of the book "An Introduction to GCC" (ISBN: 0-9541617-9-3), published by Network Theory Ltd.
For more information, visit the webpage for the book itself.
Introduction
The purpose of this book is to explain the use of the GNU C and C++
compilers, gcc and g++. After reading this book you
should understand how to compile a program, and how to use basic
compiler options for optimization and debugging. This book does not
attempt to teach the C or C++ language itself, since this material can
be found in many other places (see section Further reading).
A brief history of GCC
The original author of the GNU C Compiler (GCC) is Richard Stallman, the founder of the GNU Project.
The GNU project was started in 1984 to create a complete Unix-like operating system as free software, in order to promote freedom and cooperation among computer users and programmers. Every Unix-like operating system needs a C compiler, and as there were no free compilers in existence at that time the GNU Project had to develop one from scratch. The work was funded by donations from individuals and companies to the Free Software Foundation, a non-profit organization set up to support the work of the GNU Project.
The first release of GCC was made in 1987. This was a significant breakthrough, being the first portable ANSI C optimizing compiler released as free software. Since that time GCC has become one of the most important tools in the development of free software.
A major revision of the compiler came with the 2.0 series in 1992, which added the ability to compile C++. In 1997 an experimental branch of the compiler (EGCS) was created to improve optimization and C++ support. These features were integrated back into the main-line of GCC development and became widely available in the 3.0 release of GCC in 2001.
Today GCC has been extended to support many additional languages, including Fortran, Ada and Java. The acronym GCC is now used to refer to the "GNU Compiler Collection". Its development is guided by the GCC Steering Committee, a group composed of representatives from GCC user communities in industry, research and academia.
Major features of GCC
This section describes some of the most important features of GCC.
First of all, GCC is a portable compiler -- it runs on most platforms available today and can produce output for many types of chips, from 8-bit microcontrollers through to processors used by supercomputers.
GCC is not only a native compiler -- it can also cross-compile any program, producing executable files for a different system from the one where the compiler is running. This is particularly useful for embedded systems which are not capable of running a compiler. GCC is written in C and can compile itself, so it can be ported to new systems as they appear.
GCC has multiple language front-ends, for parsing different languages. Programs in each language can be compiled, or cross-compiled, for any architecture. For example, an Ada program can be compiled for a microcontroller, or a C program for a supercomputer.
GCC has a modular design, allowing support for new languages and architectures to be added easily. Adding a new language front-end to GCC enables the use of that language immediately on any architecture. Similarly, adding support for a new architecture makes it immediately available to all languages.
Finally, and most importantly, GCC is free software. This means you have the freedom to use and to modify GCC, as with all GNU software. If you need support for a new type of CPU, a new language, or a new language feature, you can hire someone to enhance GCC for you. You can also hire someone to fix a bug if it is important for your work.
Furthermore you have the freedom to share any enhancements you make to GCC. As a result of this freedom you can also make use of enhancements to GCC developed by others. The many features offered by GCC today show how this freedom to cooperate works to benefit you and everyone else who uses GCC.
Programming in C and C++
C and C++ are languages which allow direct access to the computer's memory. As such they are important for writing low-level systems software, and applications where high performance or control over resource usage is critical. However, great care is required to ensure that memory is accessed correctly, to avoid corrupting other data structures. This book describes techniques to help detect potential errors when a program is compiled, but the risk in using C or C++ can never be eliminated.
In addition to C and C++ the GNU operating system also provides other
high-level languages such as GNU Common Lisp (gcl), GNU Smalltalk
(gst), the GNU Scheme extension language (guile) and the
GNU Compiler for Java (gcj) -- these languages do not require
the user to access memory directly, eliminating the possibility of memory
access errors. They provide a safer alternative to C and C++ for many
applications.
Conventions used in this manual
This manual contains many examples which can be typed at the keyboard. A command entered at the terminal is shown like this,
$ command
For example,
$ echo hello world
hello world
The first character on the line is the terminal prompt, and should not be typed. The dollar sign `$' is used as the standard prompt in this manual, although some systems may use a different character.
When a command in an example is too long to fit in a single line it is wrapped and then indented on subsequent lines, like this:
$ echo "an example of a line which is too long to fit
in this manual"
When entered at the keyboard the entire command should be typed on a single line.
The example source files used in this manual can be downloaded from the
publisher's website, or entered by hand using the standard GNU editor
emacs. The example commands use gcc and g++ as the
names of the GNU C and C++ compilers, and cc to refer to other
compilers. The example programs should work with any version of GCC.
Any command-line options which are only available in recent versions of
GCC are noted in the text.
The examples assume the use of the GNU operating system. There may be
minor differences in the output on other systems. The commands for
setting environment variables use the Bourne shell syntax of the
standard GNU shell (bash).
Compiling a C program
This chapter describes how to compile C programs using
gcc. Programs can be compiled from a single file or from multiple
files, using system libraries and header files.
Compilation refers to the process of converting a program from the textual source code in a programming language such as C or C++ into machine code, the sequence of 1's and 0's used to control the central processing unit (CPU) of the computer. This machine code is stored in a file known as an executable file, sometimes referred to as a binary file.
Compiling a simple C program
The classic example program for the C language is Hello World. Here is the source code for our version of the program:
#include <stdio.h>

int
main (void)
{
  printf ("Hello, world!\n");
  return 0;
}
We'll assume the source code is stored in a file called `hello.c'.
To compile the file `hello.c' with gcc, use the following
command:
$ gcc -Wall hello.c -o hello
This compiles the source code in `hello.c' to machine code and
stores it in an executable file `hello'. The output file for the
machine code is specified using the -o option. This option is
usually given as the last argument on the command-line. If it is
omitted, the output is written to a default file called `a.out'.
Note that if a file with the same name as the executable file already exists in the directory it will be overwritten.
The option -Wall turns on the most commonly-used compiler
warnings -- it is recommended that you always use this option.
Without it the compiler will not warn about many common problems. There
are other -W warning options for special cases which will be
discussed in later chapters, but -Wall is the most important.
Compiler warnings are an essential aid in detecting possible problems
when programming in C and C++.
In this case the compiler does not produce any warnings with the
-Wall option, since the program is completely valid. Source code
which does not produce any warnings is said to compile cleanly.
To run the program, type the path name of the executable:
$ ./hello
Hello, world!
This loads the executable file into memory and causes the CPU to begin
executing the instructions contained within it. The path ./
refers to the current directory, so ./hello loads and runs the
executable file `hello' located in the current directory.
Finding errors in a simple program
As mentioned above, compiler warnings are an essential aid when
programming in C and C++. To demonstrate this the program below contains
a subtle error: it uses the function printf incorrectly by
specifying a floating-point format %f for an integer value:
#include <stdio.h>

int
main (void)
{
  printf ("Two plus two is %f\n", 4);
  return 0;
}
This error is not obvious at first sight but can be detected by the
compiler. However, in order for it to be reported the warning option
-Wall must be turned on.
If the program above is stored in a file `bad.c' and compiled with
the warning option -Wall the compiler produces the following
message:
$ gcc -Wall bad.c -o bad
bad.c: In function `main':
bad.c:6: warning: double format, different type arg (arg 2)
The text of the warning indicates that a format string has been used
incorrectly in the file `bad.c' at line 6. The messages produced
by gcc always have the form file:line-number:message. The
compiler distinguishes between error messages, which prevent the
program being compiled, and warning messages which indicate likely
problems (but do not stop the program from compiling).
In this case the correct format specifier for displaying integers with
printf would be %d (the format specifiers for printf
can be found in any general book on C, such as the GNU C Library
Reference Manual, see section Further reading).
Without the warning option -Wall the program appears to
compile without problems, but produces corrupted results:
$ gcc bad.c -o bad
$ ./bad
Two plus two is 2.585495 (corrupted output)
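The incorrect format specifier causes the output to be corrupted, because
the function printf is passed an integer instead of a
floating-point number. Since printf accepts a variable number of
arguments, the compiler does not convert the integer to a double, so
printf interprets whatever data it finds where a double was expected.
Integers and floating-point numbers are stored
in different formats in memory, and occupy different numbers of bytes,
leading to a spurious result.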
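The difference in storage can be made concrete with a short sketch that copies the bit pattern of the double value 4.0 into a 64-bit integer. It assumes the common case of an 8-byte IEEE 754 double and a 4-byte int, which holds on typical GCC targets but is not required by the C standard; the function name double_bits is illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* This sketch assumes an 8-byte double (IEEE 754 binary64). */
_Static_assert (sizeof (double) == sizeof (uint64_t),
                "double is assumed to be 8 bytes");

/* Return the raw bit pattern of a double.  Copying with memcpy is
   the portable way to inspect the bytes of an object. */
uint64_t
double_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}
```

On such a system double_bits (4.0) returns 0x4010000000000000, while the int value 4 occupies only 4 bytes holding the pattern 0x00000004. When printf expects a double but receives an int, it therefore ends up reading data that was never laid out as a floating-point number.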
Clearly it is very dangerous to compile a program without checking for
compiler warnings. If there are any functions which are not used
correctly they can cause the program to crash or to produce incorrect
results. However, turning on the compiler warning option -Wall
will catch many of the common errors which occur in C programming.
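As a minimal sketch of the correction, the function below formats the same message with %d, which matches its int argument, so gcc -Wall accepts it without a warning. It writes into a caller-supplied buffer with the standard snprintf function instead of printing, an assumption made only so the result is easy to check; the name format_sum is hypothetical. In `bad.c' itself the fix is simply to change %f to %d in the printf call.

```c
#include <stdio.h>

/* Write "Two plus two is N\n" into BUF, using the integer format
   specifier %d to match the int argument SUM.  Returns the result
   of snprintf: the length of the full message, excluding the
   terminating null character. */
int
format_sum (char *buf, size_t size, int sum)
{
  return snprintf (buf, size, "Two plus two is %d\n", sum);
}
```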
Compiling multiple source files
A large program can be split up into multiple files. This makes it easier to edit and understand, and allows the files to be compiled independently.
Here we have split up the Hello World program into three files: `main.c', `fn_hello.c' and the header file `fn_hello.h'. First we will look at the main program `main.c',
#include "fn_hello.h"

int
main (void)
{
  hello ("world");
  return 0;
}
The original call to the system printf function in the previous
program `hello.c' has been replaced by a call to a new external
function hello, which we will define in a separate file
`fn_hello.c'.
The main program also includes the header file `fn_hello.h' which will
contain the declaration of the function hello. The declaration
is used to ensure that the types of the arguments and return value match
up correctly between the function call and the function definition. We
no longer need to include the system header file `stdio.h' in
`main.c' to declare the function printf since the file
`main.c' does not use printf directly.
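A short sketch shows what a declaration buys. Because the prototype for the hypothetical function cube is in scope at the point of the call, the compiler knows the argument must be a double and converts the integer 2 to 2.0 automatically. In a real program the prototype would live in a header such as `fn_hello.h' and the definition in a separate source file; here all three parts sit in one file for brevity.

```c
/* Prototype: in a multi-file program this line would go in a header. */
double cube (double x);

/* Definition: would normally live in its own source file. */
double
cube (double x)
{
  return x * x * x;
}

/* Caller: with the prototype visible, the int argument 2 is
   converted to the double 2.0 before cube is called. */
double
cube_of_two (void)
{
  return cube (2);
}
```

Here cube_of_two () returns 8.0. Without any declaration in scope, the int would be passed unconverted (and the return value assumed to be an int), which is the kind of mismatch examined in the section on detecting missing header files.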
The declaration in `fn_hello.h' is a single line specifying the
prototype of the function hello,
void hello (const char * name);
The definition of the function hello itself is contained in the
file `fn_hello.c',
#include <stdio.h>
#include "fn_hello.h"

void
hello (const char * name)
{
  printf ("Hello, %s!\n", name);
}
This function prints the message "Hello, name!" using the
function argument as the value of name.
Incidentally, the difference between the two forms of the include
statement #include "FILE.h" and #include <FILE.h> is that
the former searches for `FILE.h' in the directory containing the
current source file before looking in the system header file
directories. The include statement #include <FILE.h>
searches the system header file directories only.
To compile these files together with gcc, use the following
command:
$ gcc -Wall main.c fn_hello.c -o newhello
In this case we use the -o option to specify a different output
file for the executable, `newhello'.
Note that the header file `fn_hello.h' does not need to be specified
in the list of files on the command line. The directive #include
"fn_hello.h" in the source files instructs the compiler to include it
automatically.
To run the program, type the path name of the executable:
$ ./newhello
Hello, world!
All the parts of the program have been combined into a single executable file, which produces the same result as the executable created from the single source file used earlier.
Compiling files independently
The most important reason for using multiple source files is that they can be compiled independently, so that only the parts of a program which have changed need to be recompiled.
If a program is stored in a single file then any change to an individual function within the program requires the whole file to be recompiled to produce a new executable. Recompiling large source files can be very time-consuming.
When functions are stored in separate files they can be compiled separately and then linked together. This is a two-stage process. When a file is compiled without creating an executable the result is referred to as an object file, and has the extension `.o'.
To create an object file from a source file the compiler produces machine code where any references to the memory addresses of external functions and variables in other files are left undefined. This allows source files to be compiled without reference to each other. The missing addresses are filled in later by a separate program called the linker which merges all the object files together to create a single executable.
Creating object files from source files
The option -c is used to compile a source file to an object file.
For example, to compile the source file for `main.c' to an object
file, we use the following command:
$ gcc -Wall -c main.c
This produces an object file `main.o' containing the machine code
for the main function. The main function contains a reference to
the external function hello, but this is left undefined in the
object file at this stage (to be filled in later by linking).
Note that there is no need to use the option -o to specify the
output file in this case. When compiling with -c the compiler
automatically creates an object file with the extension `.o' from
the original source file.
The corresponding command for the source file `fn_hello.c' is:
$ gcc -Wall -c fn_hello.c
and produces the object file `fn_hello.o'.
There is no need to compile the header file `fn_hello.h' on the
command-line since it is automatically included by the #include
statements in `main.c' and `fn_hello.c'.
Creating executables from object files
The final step in creating an executable file is to use gcc to
link the object files together and fill in the missing addresses of
external functions. To link object files together they are simply
listed on the command line:
$ gcc main.o fn_hello.o -o hello
There is no need to use the -Wall warning option here, since
the individual source files have already been successfully compiled to
object code. Once the source files have been compiled linking is an
unambiguous process, which either succeeds or fails (if there are
references which cannot be resolved).
To perform the linking process gcc actually uses the GNU Linker
ld, which is a separate program. The linker itself will be
discussed later. By running the linker gcc creates an executable
file from the object files.
The resulting executable file can now be run,
$ ./hello
Hello, world!
It produces exactly the same output as the version of the program using a single source file in the previous section.
Link order of object files
On Unix systems the traditional behavior of compilers is to search for external functions from left to right in the object files specified on the command line. This means that the object file which contains the definition of a function should appear after any files which call that function.
In this case the file `fn_hello.o' containing the function hello
should be specified after `main.o' itself, since main calls
hello:
$ gcc main.o fn_hello.o -o hello (correct order)
With some compilers the opposite ordering would result in an error,
$ cc fn_hello.o main.o -o hello (incorrect order)
main.o: In function `main':
main.o(.text+0xf): undefined reference to `hello'
because there is no object file containing hello after
`main.o'.
Some compilers will search all object files, regardless of order, but since not all compilers do this it is best to follow the convention of ordering object files from left to right.
This is worth keeping in mind if you ever encounter unexpected problems with undefined references.
Recompiling and relinking
To show how the source files can be compiled independently we will edit
the main program `main.c' and modify it to print a
greeting to everyone instead of world,
#include "fn_hello.h"

int
main (void)
{
  hello ("everyone");
  return 0;
}
The updated file `main.c' can then be recompiled with the following command,
$ gcc -Wall -c main.c
This produces a new object file `main.o'. There is no need to create a new object file for `fn_hello.c' since that file has not changed.
The new object file can be relinked with the hello function to
create a new executable file,
$ gcc main.o fn_hello.o -o hello
The resulting executable `hello' now uses the new main
function to produce the following output,
$ ./hello
Hello, everyone!
Note that only the file `main.c' has been recompiled, and then
relinked with the existing object file for the hello function.
If the file `fn_hello.c' had been modified instead, we could have
recompiled `fn_hello.c' to create a new object file
`fn_hello.o' and relinked this with the existing file
`main.o'.
Linking is faster than compilation, and in a large project with many source files recompiling only those that have been modified makes a significant saving.
The process of recompiling only the modified files in a project can be automated using GNU Make (see section Further reading).
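The decision that GNU Make automates can be sketched as a timestamp comparison: an object file needs rebuilding if it is missing or older than its source file. The helper needs_rebuild below is hypothetical and far simpler than Make itself (it ignores header dependencies, for instance); it assumes a POSIX system providing the stat function.

```c
#include <sys/stat.h>

/* Hypothetical helper: decide whether TARGET (e.g. "main.o") must be
   rebuilt from SOURCE (e.g. "main.c").
   Returns  1 if TARGET is missing or older than SOURCE,
            0 if TARGET is up to date,
           -1 if SOURCE itself cannot be found. */
int
needs_rebuild (const char *source, const char *target)
{
  struct stat src, tgt;

  if (stat (source, &src) != 0)
    return -1;
  if (stat (target, &tgt) != 0)
    return 1;                          /* no object file yet */
  return src.st_mtime > tgt.st_mtime;  /* source modified since last build */
}
```

For the example above, needs_rebuild ("main.c", "main.o") would return 1 after editing `main.c', while the same check for `fn_hello.c' would return 0, so only `main.c' is recompiled.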
Linking with external libraries
A library is a collection of precompiled object files which can be
linked into new programs. The most common use of libraries is to
provide system functions, such as the square root function sqrt
found in the C math library.
Libraries are stored in special archive files with the extension
`.a'. They are created from object files with a separate tool, the
GNU archiver ar. We will see later how to create libraries using
this command.
The standard system libraries are usually found in the directory `/usr/lib'. For example the C math library is typically stored in the file `/usr/lib/libm.a'. The prototype declarations for the functions in this library are given in the header file `/usr/include/math.h'.
Here is an example of a program, stored in the file `calc.c', which makes a call to the external
function sqrt in the math library `libm.a',
#include <math.h>
#include <stdio.h>

int
main (void)
{
  double x = sqrt (2.0);
  printf ("The square root of 2 is %f\n", x);
  return 0;
}
Trying to create an executable from this source file alone causes the compiler to give an error at the link stage,
$ gcc -Wall calc.c -o calc
/tmp/ccbR6Ojm.o: In function `main':
/tmp/ccbR6Ojm.o(.text+0x19): undefined reference to `sqrt'
The problem is that the reference to the sqrt function cannot be
matched to an object file without the library `libm.a'. The
function sqrt is not defined in the program, and the compiler
does not link to any external libraries, such as `libm.a', unless
they are explicitly selected. Incidentally, the file mentioned in the
error message `/tmp/ccbR6Ojm.o' is a temporary object file created
by the compiler from `calc.c' in order to carry out the linking
process.
To enable the compiler to link the main program in `calc.c' to the
sqrt function we need to supply the library `libm.a' on the
command line,
$ gcc -Wall calc.c /usr/lib/libm.a -o calc
The library `libm.a' contains object files for all the mathematical
functions, such as sin, cos, exp, log and
sqrt. The compiler searches through these to find the object
file containing the sqrt function.
Once the object file for the sqrt function has been found the
main program can be linked and a complete executable produced:
$ ./calc
The square root of 2 is 1.414214
The executable file includes the machine code for the main function and
the machine code for the sqrt function, copied from the object
file in the library `libm.a'.
The compiler provides a short-cut option `-l' for linking to
libraries. The option -lm will link to the library
`libm.a'. For example, the following command,
$ gcc -Wall calc.c -lm -o calc
is equivalent to the original command above using the full library name `/usr/lib/libm.a'.
In general, the compiler option -lNAME will attempt to
link object files with a library file `libNAME.a' in the
system library directories, such as `/usr/lib' and
`/usr/local/lib'. A large program will typically use many
-l options to link libraries such as the math library, graphics
libraries and networking libraries.
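The naming rule for -l can be written out as a tiny function. The helper archive_name below is hypothetical: it only builds the static archive name `libNAME.a' described here and does not search any directories, and a real linker may also consider shared libraries.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn an option such as "-lm" into the archive
   file name "libm.a".  Returns 0 on success, or -1 if OPTION is not
   of the form -lNAME or the result does not fit in BUF. */
int
archive_name (const char *option, char *buf, size_t size)
{
  if (strncmp (option, "-l", 2) != 0 || option[2] == '\0')
    return -1;

  int n = snprintf (buf, size, "lib%s.a", option + 2);
  if (n < 0 || (size_t) n >= size)
    return -1;                 /* formatting error or truncated name */
  return 0;
}
```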
Link order of libraries
The ordering of libraries on the command-line follows the
same convention as object files: they are searched from left to right.
The library containing the definition of a function should appear after
any source file or object file which uses it. This includes libraries
specified with the short-cut -l option:
$ gcc -Wall calc.c -lm -o calc (correct order)
With some compilers the opposite ordering would result in an error,
$ cc -Wall -lm calc.c -o calc (incorrect order)
main.o: In function `main':
main.o(.text+0xf): undefined reference to `sqrt'
because there is no library or object file containing sqrt after
`calc.c'. The option -lm should appear after the file
`calc.c'.
When several libraries are being used the same convention should be followed for the libraries themselves. A library which calls an external function defined in another library should appear before the library containing the function itself.
For example, a program `data.c' using the GNU Linear Programming library `libglpk.a', which in turn uses the math library `libm.a', should be compiled as:
$ gcc -Wall data.c -lglpk -lm
since the object files in `libglpk.a' use functions defined in `libm.a'. With some compilers the opposite ordering would result in an error,
$ cc -Wall data.c -lm -lglpk (incorrect order)
main.o: In function `main':
main.o(.text+0xf): undefined reference to `exp'
because there is no library containing mathematical functions used by
`libglpk.a' (such as exp) after the -lglpk option.
As for object files, some compilers will search all libraries, regardless of order. However, since not all compilers do this it is best to follow the convention of ordering libraries from left to right.
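The left-to-right rule can be modelled with a toy single-pass linker. This is a deliberately simplified sketch, not how GNU ld is implemented: each hypothetical library defines exactly one symbol, and symbol names such as glpk_solve are made up for illustration. A library is only used if it supplies a symbol that is still undefined when the scan reaches it, which is why placing -lm before -lglpk leaves exp unresolved.

```c
#include <stddef.h>
#include <string.h>

#define MAX_UNDEFINED 16

/* Hypothetical, simplified library: it defines one symbol and may
   itself need up to two others (NULL marks an unused slot). */
struct toy_lib
{
  const char *defines;
  const char *needs[2];
};

/* Symbols that are referenced but not yet defined. */
struct undefined_set
{
  const char *names[MAX_UNDEFINED];
  size_t count;
};

/* Index of NAME in the set, or s->count if it is absent. */
static size_t
set_find (const struct undefined_set *s, const char *name)
{
  for (size_t i = 0; i < s->count; i++)
    if (strcmp (s->names[i], name) == 0)
      return i;
  return s->count;
}

static void
set_add (struct undefined_set *s, const char *name)
{
  if (set_find (s, name) == s->count && s->count < MAX_UNDEFINED)
    s->names[s->count++] = name;
}

/* Scan LIBS once, left to right.  Returns the number of symbols
   left unresolved (0 means the link would succeed). */
size_t
toy_link (const char *const *program_needs, size_t n_needs,
          const struct toy_lib *libs, size_t n_libs)
{
  struct undefined_set undef = { { NULL }, 0 };

  for (size_t i = 0; i < n_needs; i++)
    set_add (&undef, program_needs[i]);

  for (size_t i = 0; i < n_libs; i++)
    {
      size_t pos = set_find (&undef, libs[i].defines);
      if (pos == undef.count)
        continue;               /* nothing needed from this library yet */

      /* Resolve the symbol by moving the last entry into its slot. */
      undef.count--;
      undef.names[pos] = undef.names[undef.count];

      /* The library's own references become new undefined symbols. */
      for (size_t j = 0; j < 2; j++)
        if (libs[i].needs[j] != NULL)
          set_add (&undef, libs[i].needs[j]);
    }
  return undef.count;
}
```

With a program needing glpk_solve, a "glpk" library defining glpk_solve and needing exp, and an "m" library defining exp, the order glpk, m returns 0 while the order m, glpk returns 1, mirroring the two command lines above.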
Detecting missing header files
When calling an external library it is essential to include the appropriate header files, in order to declare the function arguments with the correct types. Without declarations the arguments can be passed to a function with the wrong type, causing corrupted results.
The following example shows another program which makes a function call
to the C math library. In this case the function pow is used to
compute the cube of two (2 raised to the power of 3):
#include <stdio.h>

int
main (void)
{
  double x = pow (2.0, 3.0);
  printf ("Two cubed is %f\n", x);
  return 0;
}
However, the program contains an error -- the include statement for the
pow function #include <math.h> is missing, so the
necessary prototype double pow (double x, double y); will not be
seen by the compiler. Note that in this case the format specifier
%f is correct, since x is a floating point variable.
Compiling the program (stored in the file `badpow.c') without any warning options will produce an executable file which gives incorrect results,
$ gcc badpow.c -lm
$ ./a.out
Two cubed is 2.851120 (incorrect result, should be 8)
The results are corrupted because the arguments and return value of the
call to pow are passed with incorrect types. This can be detected
by turning on the warning option -Wall:
$ gcc -Wall badpow.c -lm
badpow.c: In function `main':
badpow.c:6: warning: implicit declaration of function `pow'
The error is now detected and can be fixed by adding the line
#include <math.h> to the beginning of the source file. This
example shows again the importance of using the warning option
-Wall to detect problems that could otherwise easily be
overlooked.
Bibliographic details:
Title: "An Introduction to GCC"
Author: Brian J. Gough
Published by Network Theory Ltd, May 2004
Paperback (6"x9"), 124 pages
Retail Price: $19.95 (£12.95 in UK)
ISBN: 0-9541617-9-3
Webpage: http://www.network-theory.co.uk/gcc/intro/