An Introduction to GCC - for the GNU compilers gcc and g++
by Brian J. Gough, foreword by Richard M. Stallman
Paperback (6"x9"), 144 pages
ISBN 0954161793
RRP £12.95 ($19.95)

"A wonderfully thorough guide... well-written, seriously usable information" --- Linux User and Developer Magazine (Issue 40, June 2004)

8.7 Portability of signed and unsigned types

The C and C++ standards allow the character type char to be signed or unsigned, depending on the platform and compiler. Most systems, including x86 GNU/Linux and Microsoft Windows, use signed char, but those based on PowerPC and ARM processors typically use unsigned char.(29) This can lead to unexpected results when porting programs between platforms that have different defaults for the type of char.

The following code demonstrates the difference between platforms with signed and unsigned char types:

#include <stdio.h>

int 
main (void)
{
  char c = 255;
  if (c > 128) {
    printf ("char is unsigned (c = %d)\n", c);
  } else {
    printf ("char is signed (c = %d)\n", c);
  }
  return 0;
}

With an unsigned char, the variable c takes the value 255. With a signed char the value 255 is out of range, and on the usual 8-bit two's-complement platforms it is converted to -1.

The correct way to manipulate char variables in C is through the portable functions declared in ‘ctype.h’, such as isalpha, isdigit and isblank, rather than by their numerical values. The behavior of non-portable conditional expressions such as c > 'a' depends on the signedness of the char type. If the signed or unsigned version of char is explicitly required at certain points in a program, it can be specified using the declarations signed char or unsigned char.

For existing programs which assume that char is signed or unsigned, GCC provides the options -fsigned-char and -funsigned-char to set the default type of char. Using these options, the example code above compiles cleanly when char is unsigned:

$ gcc -Wall -funsigned-char signed.c 
$ ./a.out 
char is unsigned (c = 255)

However, when char is signed the value 255 wraps around to -1, giving a warning:

$ gcc -Wall -fsigned-char signed.c 
signed.c: In function `main':
signed.c:7: warning: comparison is always false due to 
  limited range of data type
$ ./a.out 
char is signed (c = -1)

The warning message "comparison is always true/false due to limited range of data type" is one symptom of code which assumes a definition of char which is different from the actual type.

The most common problem with code written assuming signed char types occurs with the functions getc, fgetc and getchar (which read a character from a file). They have a return type of int, not char, and this allows them to use the special value EOF (a negative integer, -1 on GNU systems) to indicate end-of-file or a read error. Unfortunately, many programs have been written which incorrectly store this return value straight into a char variable. Here is a typical example:

#include <stdio.h>

int
main (void)
{
  char c;
  while ((c = getchar()) != EOF) /* not portable */
    {
      printf ("read c = '%c'\n", c);
    }
  return 0;
}

This only works on platforms which default to a signed char type.(30) On platforms which use an unsigned char the same code will fail, because the value -1 becomes 255 when stored in an unsigned char, and 255 never compares equal to EOF. This usually causes an infinite loop because the end of the file cannot be recognized.(31) To be portable, the program should test the return value as an integer before coercing it to a char, as follows:

#include <stdio.h>

int
main (void)
{
  int i;
  while ((i = getchar()) != EOF)
    {
      unsigned char c = i;
      printf ("read c = '%c'\n", c);
    }
  return 0;
}

The same considerations described in this section apply to the definitions of bitfields in structs, which can be signed or unsigned by default when they are declared without an explicit signed or unsigned keyword. In GCC, the default type of such bitfields can be controlled using the options -fsigned-bitfields and -funsigned-bitfields.
