Valgrind 3.3 - Advanced Debugging and Profiling for GNU/Linux applications
by J. Seward, N. Nethercote, J. Weidendorfer and the Valgrind Development Team
Paperback (6"x9"), 164 pages
RRP £12.95 ($19.95)
9.4.6 Interpreting Race Error Messages
Helgrind's race detection algorithm collects a lot of information, and tries to present it in a helpful way when a race is detected. Here's an example:
Thread #2 was created
   at 0x510548E: clone (in /lib64/libc-2.5.so)
   by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
   by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
   by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
   by 0x400CEF: main (tc17_sembar.c:195)

// And the same for threads #3, #4 and #5

Possible data race during read of size 4 at 0x602174
   at 0x400BE5: gomp_barrier_wait (tc17_sembar.c:122)
   by 0x400C44: child (tc17_sembar.c:161)
   by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
   by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
   by 0x51054CC: clone (in /lib64/libc-2.5.so)
  Old state: shared-modified by threads #2, #3, #4, #5
  New state: shared-modified by threads #2, #3, #4, #5
  Reason:    this thread, #2, holds no consistent locks
  Last consistently used lock for 0x602174 was first observed
   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
   by 0x4009E4: gomp_barrier_init (tc17_sembar.c:46)
   by 0x400CBC: main (tc17_sembar.c:192)
Helgrind first announces the creation points of any threads referenced in the error message. This is so it can speak concisely about threads and sets of threads without repeatedly printing their creation point call stacks. Each thread is only ever announced once, the first time it appears in any Helgrind error message.
The main error message begins at the text “Possible data race during read”. At the start is the information you would expect to see: the address and size of the racing access, whether it is a read or a write, and the call stack at the point where it was detected.
More interesting is the state transition caused by this access. This memory is already in the shared-modified state, and up to now has been consistently protected by at least one lock. However, the thread making the access in question (thread #2, here) does not hold any locks in common with those held during all previous accesses to the location; in other words, it holds “no consistent locks”.
Finally, Helgrind shows the lock which has protected this location in all previous accesses. (If there is more than one, only one is shown.) This can be a useful hint, because it typically shows the lock that the programmers intended to use to protect the location, but in this case forgot to use.
Here are some more examples of race reports. This is not an exhaustive list of combinations, but it should give you some insight into how to interpret the output.
Possible data race during write ...
  Old state: shared-readonly by threads #1, #2, #3
  New state: shared-modified by threads #1, #2, #3
  Reason:    this thread, #3, holds no consistent locks
  Location ... has never been protected by any lock
The location is shared by 3 threads, all of which have been reading it without locking (“has never been protected by any lock”). Now one of them is writing it. Regardless of whether the writer has a lock or not, this is still an error, because the write races against the previously observed reads.
Possible data race during read ...
  Old state: shared-modified by threads #1, #2, #3
  New state: shared-modified by threads #1, #2, #3
  Reason:    this thread, #3, holds no consistent locks
  Last consistently used lock for ... was first observed ...
The location is shared by 3 threads, all of which have been reading and writing it while (as required) holding at least one lock in common. Now it is being read without that lock being held. In the “Last consistently used lock” part, Helgrind offers its best guess as to the identity of the lock that should have been used.
Possible data race during write ...
  Old state: owned exclusively by thread #4
  New state: shared-modified by threads #4, #5
  Reason:    this thread, #5, holds no locks at all
A location that has so far been accessed exclusively by thread #4 has now been written by thread #5, without use of any lock. This can be a sign that the programmer did not consider the possibility of the location being shared between threads, or, alternatively, forgot to use the appropriate lock.
Note that thread #4 exclusively owns the location, and so has the right to access it without holding a lock. However, this message does not say that thread #4 is not using a lock for this location. Indeed, it could be using a lock for the location because it intends to make it available to other threads, one of which is thread #5, and thread #5 has forgotten to use that lock.
Also, this message implies that Helgrind did not see any synchronisation event between threads #4 and #5 that would have allowed #5 to acquire exclusive ownership from #4. See 9.4.3 for a discussion of transfers of exclusive ownership states between threads.
ISBN 0954612051