This is the first chapter of the book "An Introduction to GCC" (ISBN: 0-9541617-9-3), published by Network Theory Ltd.

For more information, visit the webpage for the book itself.

Introduction

The purpose of this book is to explain the use of the GNU C and C++ compilers, gcc and g++. After reading this book you should understand how to compile a program, and how to use basic compiler options for optimization and debugging. This book does not attempt to teach the C or C++ language itself, since this material can be found in many other places (see section Further reading).

A brief history of GCC

The original author of the GNU C Compiler (GCC) is Richard Stallman, the founder of the GNU Project.

The GNU Project was started in 1984 to create a complete Unix-like operating system as free software, in order to promote freedom and cooperation among computer users and programmers. Every Unix-like operating system needs a C compiler, and as there were no free compilers in existence at that time the GNU Project had to develop one from scratch. The work was funded by donations from individuals and companies to the Free Software Foundation, a non-profit organization set up to support the work of the GNU Project.

The first release of GCC was made in 1987. This was a significant breakthrough, being the first portable ANSI C optimizing compiler released as free software. Since that time GCC has become one of the most important tools in the development of free software.

A major revision of the compiler came with the 2.0 series in 1992, which added the ability to compile C++. In 1997 an experimental branch of the compiler (EGCS) was created to improve optimization and C++ support. These features were integrated back into the main-line of GCC development and became widely available in the 3.0 release of GCC in 2001.

Today GCC has been extended to support many additional languages, including Fortran, Ada and Java. The acronym GCC is now used to refer to the "GNU Compiler Collection". Its development is guided by the GCC Steering Committee, a group composed of representatives from GCC user communities in industry, research and academia.

Major features of GCC

This section describes some of the most important features of GCC.

First of all, GCC is a portable compiler -- it runs on most platforms available today and can produce output for many types of chips, from 8-bit microcontrollers through to processors used by supercomputers.

GCC is not only a native compiler -- it can also cross-compile any program, producing executable files for a different system from the one where the compiler is running. This is particularly useful for embedded systems which are not capable of running a compiler. GCC is written in C and can compile itself, so it can be ported to new systems in the future.

GCC has multiple language front-ends, for parsing different languages. Programs in each language can be compiled, or cross-compiled, for any architecture. For example an Ada program can be compiled for a microcontroller, or a C program for a supercomputer.

GCC has a modular design, allowing support for new languages and architectures to be added easily. Adding a new language front-end to GCC enables the use of that language immediately on any architecture. Similarly, adding support for a new architecture makes it immediately available to all languages.

Finally and most importantly GCC is free software. This means you have the freedom to use and to modify GCC, as with all GNU software. If you need support for a new type of CPU, a new language, or new language feature you can hire someone to enhance GCC for you. You can hire someone to fix a bug if it is important for your work.

Furthermore you have the freedom to share any enhancements you make to GCC. As a result of this freedom you can also make use of enhancements to GCC developed by others. The many features offered by GCC today show how this freedom to cooperate works to benefit you and everyone else who uses GCC.

Programming in C and C++

C and C++ are languages which allow direct access to the computer's memory. As such they are important for writing low-level systems software, and applications where high-performance or control over resource usage are critical. However, great care is required to ensure that memory is accessed correctly, to avoid corrupting other data-structures. This book describes techniques to help detect potential errors when a program is compiled, but the risk in using C or C++ can never be eliminated.
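As a small illustration of the kind of care meant here, the sketch below guards an array write with an explicit bounds check. The function name store_checked and its return convention are hypothetical, chosen only for this example; C itself performs no such check automatically.

```c
#include <stddef.h>

/* Hypothetical helper: write VALUE into ARRAY[INDEX] only if INDEX
   lies within the LENGTH elements of the array.
   Returns 1 on success and 0 if the write was refused. */
int
store_checked (int *array, size_t length, size_t index, int value)
{
  if (array == NULL || index >= length)
    return 0;   /* an unchecked write here could corrupt other data */

  array[index] = value;
  return 1;
}
```

A caller would test the return value, for example with if (!store_checked (data, 10, i, 42)), rather than assuming that the index was valid.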

In addition to C and C++ the GNU operating system also provides other high-level languages such as GNU Common Lisp (gcl), GNU Smalltalk (gst), the GNU Scheme extension language (guile) and the GNU Compiler for Java (gcj) -- these languages do not require the user to access memory directly, eliminating the possibility of memory access errors. They provide a safer alternative to C and C++ for many applications.

Conventions used in this manual

This manual contains many examples which can be typed at the keyboard. A command entered at the terminal is shown like this,

$ command

For example,

$ echo hello world
hello world

The first character on the line is the terminal prompt, and should not be typed. The dollar sign `$' is used as the standard prompt in this manual, although some systems may use a different character.

When a command in an example is too long to fit in a single line it is wrapped and then indented on subsequent lines, like this:

$ echo "an example of a line which is too long to fit 
    in this manual"

When entered at the keyboard the entire command should be typed on a single line.

The example source files used in this manual can be downloaded from the publisher's website, or entered by hand using the standard GNU editor emacs. The example commands use gcc and g++ as the names of the GNU C and C++ compilers, and cc to refer to other compilers. The example programs should work with any version of GCC. Any command-line options which are only available in recent versions of GCC are noted in the text.

The examples assume the use of the GNU operating system. There may be minor differences in the output on other systems. The commands for setting environment variables use the Bourne shell syntax of the standard GNU shell (bash).

Compiling a C program

This chapter describes how to compile C programs using gcc. Programs can be compiled from a single file or from multiple files, using system libraries and header files.

Compilation refers to the process of converting a program from the textual source code in a programming language such as C or C++ into machine code, the sequence of 1's and 0's used to control the central processing unit (CPU) of the computer. This machine code is stored in a file known as an executable file, sometimes referred to as a binary file.

Compiling a simple C program

The classic example program for the C language is Hello World. Here is the source code for our version of the program:

#include <stdio.h>

int
main (void)
{
  printf ("Hello, world!\n");
  return 0;
}

We'll assume the source code is stored in a file called `hello.c'. To compile the file `hello.c' with gcc, use the following command:

$ gcc -Wall hello.c -o hello

This compiles the source code in `hello.c' to machine code and stores it in an executable file `hello'. The output file for the machine code is specified using the -o option. This option is usually given as the last argument on the command-line. If it is omitted, the output is written to a default file called `a.out'.

Note that if a file with the same name as the executable file already exists in the directory it will be overwritten.

The option -Wall turns on the most commonly-used compiler warnings -- it is recommended that you always use this option. Without it the compiler will not produce any warning messages. There are other -W warning options for special cases which will be discussed in later chapters, but -Wall is the most important. Compiler warnings are an essential aid in detecting possible problems when programming in C and C++.

In this case the compiler does not produce any warnings with the -Wall option, since the program is completely valid. Source code which does not produce any warnings is said to compile cleanly.

To run the program, type the path name of the executable:

$ ./hello
Hello, world!

This loads the executable file into memory and causes the CPU to begin executing the instructions contained within it. The path ./ refers to the current directory, so ./hello loads and runs the executable file `hello' located in the current directory.

Finding errors in a simple program

As mentioned above, compiler warnings are an essential aid when programming in C and C++. To demonstrate this the program below contains a subtle error: it uses the function printf incorrectly by specifying a floating-point format %f for an integer value:

#include <stdio.h>

int
main (void)
{
  printf ("Two plus two is %f\n", 4);
  return 0;
}

This error is not obvious at first sight but can be detected by the compiler. However, in order for it to be reported the warning option -Wall must be turned on.

If the program above is stored in a file `bad.c' and compiled with the warning option -Wall the compiler produces the following message:

$ gcc -Wall bad.c -o bad
bad.c: In function `main':
bad.c:6: warning: double format, different 
  type arg (arg 2)

The text of the warning indicates that a format string has been used incorrectly in the file `bad.c' at line 6. The messages produced by gcc always have the form file:line-number:message. The compiler distinguishes between error messages, which prevent the program being compiled, and warning messages which indicate likely problems (but do not stop the program from compiling).

In this case the correct format specifier for displaying integers with printf would be %d (the format specifiers for printf can be found in any general book on C, such as the GNU C Library Reference Manual, see section Further reading).

Without the warning option -Wall the program appears to compile without problems, but produces corrupted results:

$ gcc bad.c -o bad
$ ./bad
Two plus two is 2.585495    (corrupted output)

The incorrect format specifier causes the output to be corrupted, because the function printf is passed an integer instead of a floating-point number. Integers and floating-point numbers are stored in different formats in memory, and occupy different numbers of bytes, leading to a spurious result.
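The difference in storage can be seen by looking at the raw bits of a value. The following sketch assumes a system where float is a 32-bit IEEE 754 value, which is common but not guaranteed by the C language; the function name float_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bit pattern used to store the
   float X.  This assumes that float occupies 32 bits in IEEE 754
   format.  The bytes are copied with memcpy, so the value is not
   converted in any way. */
uint32_t
float_bits (float x)
{
  uint32_t bits = 0;
  memcpy (&bits, &x, sizeof x);
  return bits;
}
```

On such a system float_bits (4.0f) gives 0x40800000, whereas the integer 4 is stored as 0x00000004, so code that reads one as if it were the other sees a completely different value. A double typically occupies 8 bytes and an int 4 bytes, which adds a size mismatch to the difference in format.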

Clearly it is very dangerous to compile a program without checking for compiler warnings. If there are any functions which are not used correctly they can cause the program to crash or to produce incorrect results. However, turning on the compiler warning option -Wall will catch many of the common errors which occur in C programming.
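For comparison, a corrected version of the call might look like the sketch below. It uses snprintf, which accepts the same format specifiers as printf, so that the result can be inspected by the caller. The function names format_sum and format_sum_as_double are hypothetical and are not part of the example files for this manual.

```c
#include <stdio.h>

/* Hypothetical helper: write "Two plus two is N" into BUFFER, using
   the integer format %d for an integer argument.  Returns the value
   returned by snprintf. */
int
format_sum (char *buffer, size_t size)
{
  int sum = 2 + 2;
  return snprintf (buffer, size, "Two plus two is %d\n", sum);
}

/* If a floating-point format is really wanted, the argument must be
   converted explicitly so that its type matches %f. */
int
format_sum_as_double (char *buffer, size_t size)
{
  int sum = 2 + 2;
  return snprintf (buffer, size, "Two plus two is %f\n", (double) sum);
}
```

In both functions the type of the argument now matches what its format specifier expects, so the format warning shown earlier should no longer appear.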

Compiling multiple source files

A large program can be split up into multiple files. This makes it easier to edit and understand, and allows the files to be compiled independently.

Here we have split up the program Hello World into three files: `main.c', `fn_hello.c' and the header file `fn_hello.h'. First we will look at the main program `main.c',

#include "fn_hello.h"

int
main (void)
{
  hello ("world");
  return 0;
}

The original call to the system printf function in the previous program `hello.c' has been replaced by a call to a new external function hello, which we will define in a separate file `fn_hello.c'.

The main program also includes the header file `fn_hello.h' which will contain the declaration of the function hello. The declaration is used to ensure that the types of the arguments and return value match up correctly between the function call and the function definition. We no longer need to include the system header file `stdio.h' in `main.c' to declare the function printf since the file `main.c' does not use printf directly.

The declaration in `fn_hello.h' is a single line specifying the prototype of the function hello,

void hello (const char * name);

The definition of the function hello itself is contained in the file `fn_hello.c',

#include <stdio.h>
#include "fn_hello.h"

void 
hello (const char * name)
{
  printf ("Hello, %s!\n", name);
}

This function prints the message `Hello, name!' using the function argument as the value of name.

Incidentally, the difference between the two forms of the include statement #include "FILE.h" and #include <FILE.h> is that the former searches for `FILE.h' in the current directory before looking in the system header file directories. The include statement #include <FILE.h> searches the system header files only.

To compile these files together with gcc, use the following command:

$ gcc -Wall main.c fn_hello.c -o newhello

In this case we use the -o option to specify a different output file for the executable, `newhello'.

Note that the header file `fn_hello.h' does not need to be specified in the list of files on the command line. The directive #include "fn_hello.h" in the source files instructs the compiler to include it automatically.

To run the program, type the path name of the executable:

$ ./newhello
Hello, world!

All the parts of the program have been combined into a single executable file, which produces the same result as the executable created from the single source file used earlier.

Compiling files independently

The most important reason for using multiple source files is that they can be compiled independently, so that only the parts of a program which have changed need to be recompiled.

If a program is stored in a single file then any change to an individual function within the program requires the whole file to be recompiled to produce a new executable. The recompilation of large source files can be very time-consuming.

When functions are stored in separate files they can be compiled separately and then linked together. This is a two-stage process. When a file is compiled without creating an executable the result is referred to as an object file, and has the extension `.o'.

To create an object file from a source file the compiler produces machine code where any references to the memory addresses of external functions and variables in other files are left undefined. This allows source files to be compiled without reference to each other. The missing addresses are filled in later by a separate program called the linker which merges all the object files together to create a single executable.

Creating object files from source files

The option -c is used to compile a source file to an object file. For example, to compile the source file for `main.c' to an object file, we use the following command:

$ gcc -Wall -c main.c

This produces an object file `main.o' containing the machine code for the main function. The main function contains a reference to the external function hello, but this is left undefined in the object file at this stage (to be filled in later by linking).

Note that there is no need to use the option -o to specify the output file in this case. When compiling with -c the compiler automatically creates an object file with the extension `.o' from the original source file.

The corresponding command for the source file `fn_hello.c' is:

$ gcc -Wall -c fn_hello.c

and produces the object file `fn_hello.o'.

There is no need to compile the header file `fn_hello.h' on the command-line since it is automatically included by the #include statements in `main.c' and `fn_hello.c'.

Creating executables from object files

The final step in creating an executable file is to use gcc to link the object files together and fill in the missing addresses of external functions. To link object files together they are simply listed on the command line:

$ gcc main.o fn_hello.o -o hello

There is no need to use the -Wall warning option here, since the individual source files have already been successfully compiled to object code. Once the source files have been compiled linking is an unambiguous process, which either succeeds or fails (if there are references which cannot be resolved).

To perform the linking process gcc actually uses the GNU Linker ld, which is a separate program. The linker itself will be discussed later. By running the linker gcc creates an executable file from the object files.

The resulting executable file can now be run,

$ ./hello
Hello, world!

It produces exactly the same output as the version of the program using a single source file in the previous section.

Link order of object files

On Unix systems the traditional behavior of compilers is to search for external functions from left to right in the object files specified on the command line. This means that the object file which contains the definition of a function should appear after any files which call that function.

In this case the file `fn_hello.o' containing the function hello should be specified after `main.o' itself, since main calls hello:

$ gcc main.o fn_hello.o -o hello   (correct order)

With some compilers the opposite ordering would result in an error,

$ cc fn_hello.o main.o -o hello    (incorrect order)
main.o: In function `main':
main.o(.text+0xf): undefined reference to `hello'

because there is no object file containing hello after `main.o'.

Some compilers will search all object files, regardless of order, but since not all compilers do this it is best to follow the convention of ordering object files from left to right.

This is worth keeping in mind if you ever encounter unexpected problems with undefined references.

Recompiling and relinking

To show how the source files can be compiled independently we will edit the main program `main.c' and modify it to print a greeting to everyone instead of world,

#include "fn_hello.h"

int
main (void)
{
  hello ("everyone");
  return 0;
}

The updated file `main.c' can then be recompiled with the following command,

$ gcc -Wall -c main.c

This produces a new object file `main.o'. There is no need to create a new object file for `fn_hello.c' since that file has not changed.

The new object file can be relinked with the hello function to create a new executable file,

$ gcc main.o fn_hello.o -o hello

The resulting executable `hello' now uses the new main function to produce the following output,

$ ./hello
Hello, everyone!

Note that only the file `main.c' has been recompiled, and then relinked with the existing object file for the hello function. If the file `fn_hello.c' had been modified instead, we could have recompiled `fn_hello.c' to create a new object file `fn_hello.o' and relinked this with the existing file `main.o'.

In general, linking is faster than compilation, and in a large project with many source files recompiling only those that have been modified can make a significant saving.

The process of recompiling only the modified files in a project can be automated using GNU Make (see section Further reading).

Linking with external libraries

A library is a collection of precompiled object files which can be linked into new programs. The most common use of libraries is to provide system functions, such as the square root function sqrt found in the C math library.

Libraries are stored in special archive files with the extension `.a'. They are created from object files with a separate tool, the GNU archiver ar. We will see later how to create libraries using this command.

The standard system libraries are usually found in the directory `/usr/lib'. For example the C math library is typically stored in the file `/usr/lib/libm.a'. The prototype declarations for the functions in this library are given in the header file `/usr/include/math.h'.

Here is an example of a program which makes a call to the external function sqrt in the math library `libm.a',

#include <math.h>
#include <stdio.h>

int
main (void)
{
  double x = sqrt(2.0);
  printf ("The square root of 2 is %f\n", x);
  return 0;
}

Trying to create an executable from this source file alone causes the compiler to give an error at the link stage,

$ gcc -Wall calc.c -o calc
/tmp/ccbR6Ojm.o: In function `main':
/tmp/ccbR6Ojm.o(.text+0x19): undefined reference 
  to `sqrt'

The problem is that the reference to the sqrt function cannot be matched to an object file without the library `libm.a'. The function sqrt is not defined in the program, and the compiler does not link to any external libraries, such as `libm.a', unless they are explicitly selected. Incidentally, the file mentioned in the error message `/tmp/ccbR6Ojm.o' is a temporary object file created by the compiler from `calc.c' in order to carry out the linking process.

To enable the compiler to link the main program in `calc.c' to the sqrt function we need to supply the library `libm.a' on the command line,

$ gcc -Wall calc.c /usr/lib/libm.a -o calc

The library `libm.a' contains object files for all the mathematical functions, such as sin, cos, exp, log and sqrt. The compiler searches through these to find the object file containing the sqrt function.

Once the object file for the sqrt function has been found the main program can be linked and a complete executable produced:

$ ./calc 
The square root of 2 is 1.414214

The executable file includes the machine code for the main function and the machine code for the sqrt function, copied from the object file in the library `libm.a'.

The compiler provides a short-cut option `-l' for linking to libraries. The option -lm will link to the library `libm.a'. For example, the following command,

$ gcc -Wall calc.c -lm -o calc

is equivalent to the original command above using the full library name `/usr/lib/libm.a'.

In general, the compiler option -lNAME will attempt to link object files with a library file `libNAME.a' in the system library directories, such as `/usr/lib' and `/usr/local/lib'. A large program will typically use many -l options to link libraries such as the math library, graphics libraries and networking libraries.

Link order of libraries

The ordering of libraries on the command-line follows the same convention as object files: they are searched from left to right. The library containing the definition of a function should appear after any source file or object file which uses it. This includes libraries specified with the short-cut -l option:

$ gcc -Wall calc.c -lm -o calc   (correct order)

With some compilers the opposite ordering would result in an error,

$ cc -Wall -lm calc.c -o calc    (incorrect order)
main.o: In function `main':
main.o(.text+0xf): undefined reference to `sqrt'

because there is no library or object file containing sqrt after `calc.c'. The option -lm should appear after the file `calc.c'.

When several libraries are being used the same convention should be followed for the libraries themselves. A library which calls an external function defined in another library should appear before the library containing the function itself.

For example, a program `data.c' using the GNU Linear Programming library `libglpk.a', which in turn uses the math library `libm.a', should be compiled as:

$ gcc -Wall data.c -lglpk -lm

since the object files in `libglpk.a' use functions defined in `libm.a'. With some compilers the opposite ordering would result in an error,

$ cc -Wall data.c -lm -lglpk    (incorrect order)
main.o: In function `main':
main.o(.text+0xf): undefined reference to `exp'

because there is no library containing mathematical functions used by `libglpk.a' (such as exp) after the -lglpk option.

As for object files, some compilers will search all libraries, regardless of order. However, since not all compilers do this it is best to follow the convention of ordering libraries from left to right.

Detecting missing header files

When calling an external library it is essential to include the appropriate header files, in order to declare the function arguments with the correct types. Without declarations the arguments can be passed to a function with the wrong type, causing corrupted results.

The following example shows another program which makes a function call to the C math library. In this case the function pow is used to compute the cube of two (2 raised to the power of 3):

#include <stdio.h>

int
main (void)
{
  double x = pow (2.0, 3.0);
  printf ("Two cubed is %f\n", x);
  return 0;
}

However, the program contains an error -- the include statement for the pow function #include <math.h> is missing, so the necessary prototype double pow (double x, double y); will not be seen by the compiler. Note that in this case the format specifier %f is correct, since x is a floating point variable.

Compiling the program without any warning options will produce an executable file which gives incorrect results,

$ gcc badpow.c -lm
$ ./a.out
Two cubed is 2.851120    (incorrect result, should be 8)

The results are corrupted because the arguments and return value of the call to pow are passed with incorrect types. This can be detected by turning on the warning option -Wall:

$ gcc -Wall badpow.c -lm
badpow.c: In function `main':
badpow.c:6: warning: implicit declaration of 
  function `pow'

The error is now detected and can be fixed by adding the line #include <math.h> to the beginning of the source file. This example shows again the importance of using the warning option -Wall to detect problems that could otherwise easily be overlooked.
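The same principle applies to functions defined in your own program: a prototype should be visible before the first call. In the following sketch the names cube and cube_of_int are hypothetical. Because the prototype of cube is in scope, the compiler converts the int argument to double before the call is made.

```c
/* Prototype: seen by the compiler before any call to cube.  In a
   larger program this line would normally live in a header file. */
double cube (double x);

/* The int argument N is converted to double automatically, because
   the prototype above tells the compiler what type cube expects. */
double
cube_of_int (int n)
{
  return cube (n);
}

/* Definition of cube, which could equally be in a separate file. */
double
cube (double x)
{
  return x * x * x;
}
```

Without a prototype in scope, the compiler falls back on an implicit declaration, as in the warning for pow above, and cannot check or convert the arguments and return value.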

Bibliographic details:

Title: "An Introduction to GCC"
Author: Brian J. Gough
Published by Network Theory Ltd, May 2004
Paperback (6"x9"), 124 pages
Retail Price: $19.95 (£12.95 in UK)
ISBN: 0-9541617-9-3
Webpage: http://www.network-theory.co.uk/gcc/intro/