Valgrind 3.3 - Advanced Debugging and Profiling for GNU/Linux applications
by J. Seward, N. Nethercote, J. Weidendorfer and the Valgrind Development Team
Paperback (6"x9"), 164 pages
ISBN 0954612051
RRP £12.95 ($19.95)

9.4.1 A Simple Data Race

Perhaps the simplest possible example of a race is the following program, in which it is impossible to know the value of ‘var’ at the end of the run. Is it 2? Or 1?

#include <pthread.h>

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent : line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child : line 13 */
   pthread_join(child, NULL);
   return 0;
}
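
To see how the final value can be 1, it helps to spell out what ‘var++’ involves. A compiler typically implements it as a separate load, add and store, and the steps of the two threads can interleave. The following is a single-threaded simulation of two such interleavings, not real concurrent code; the function names and the ‘parent_tmp’ and ‘child_tmp’ variables (standing in for per-thread registers) are hypothetical and do not appear in the original program.

```c
/* Deterministic, single-threaded simulation of two threads that each
   execute "var++". parent_tmp and child_tmp are hypothetical stand-ins
   for the register each thread would use. */

/* Both threads load before either stores: one increment is lost. */
int lost_update ( void ) {
   int var = 0;
   int parent_tmp = var;     /* parent loads 0 */
   int child_tmp  = var;     /* child loads 0 */
   var = parent_tmp + 1;     /* parent stores 1 */
   var = child_tmp + 1;      /* child stores 1, overwriting the parent */
   return var;
}

/* The parent finishes before the child starts: both increments survive. */
int serial_update ( void ) {
   int var = 0;
   int parent_tmp = var;     /* parent loads 0 */
   var = parent_tmp + 1;     /* parent stores 1 */
   int child_tmp = var;      /* child loads 1 */
   var = child_tmp + 1;      /* child stores 2 */
   return var;
}
```

Here ‘lost_update()’ returns 1 and ‘serial_update()’ returns 2, matching the two outcomes mentioned above. Which of them a real run produces depends on how the threads happen to be scheduled.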

The problem is that nothing stops ‘var’ from being updated simultaneously by both threads. A correct program would protect ‘var’ with a lock of type ‘pthread_mutex_t’, acquired before each access and released afterwards. Helgrind's output for this program is:

Thread #1 is the program's root thread

Thread #2 was created
   at 0x510548E: clone (in /lib64/libc-2.5.so)
   by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
   by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
   by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
   by 0x4005F1: main (simple_race.c:12)

Possible data race during write of size 4 at 0x601034
   at 0x4005F2: main (simple_race.c:13)
  Old state: shared-readonly by threads #1, #2
  New state: shared-modified by threads #1, #2
  Reason:    this thread, #1, holds no consistent locks
  Location 0x601034 has never been protected by any lock

This is quite a lot of detail for an apparently simple error. The last block is the main error message. It reports a race caused by a write of size 4 (bytes) to address 0x601034, which is presumably the address of ‘var’, in function ‘main’ at line 13 of the program.

Note that it is purely by chance that the race is reported for the parent thread's access. It could equally have been reported for the child's access, at line 6. The error is reported for only one of the locations, since neither the parent nor the child is, by itself, incorrect. An error exists only when both access ‘var’ without a lock.

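The error message shows some other interesting details, which the sections below explain. Here we merely note their presence. The ‘Old state’ and ‘New state’ lines show that Helgrind tracks a state for the memory location in question. The references to threads #1 and #2 show that Helgrind records which threads have accessed the location, and it announces those threads before the main error message so you can see which ones it means. The final line, ‘Location 0x601034 has never been protected by any lock’, is Helgrind's attempt to explain why the race exists.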

Understanding the memory state machine is central to understanding Helgrind's race-detection algorithm. The next three subsections explain this.
