Valgrind 3.3 - Advanced Debugging and Profiling for GNU/Linux applications
by J. Seward, N. Nethercote, J. Weidendorfer and the Valgrind Development Team
Paperback (6"x9"), 164 pages
ISBN 0954612051
RRP £12.95 ($19.95)

9.5 Hints and Tips for Effective Use of Helgrind

Helgrind can be very helpful in finding and resolving threading-related problems. Like all sophisticated tools, it is most effective when you understand how to play to its strengths.

Helgrind will be less effective when you merely throw an existing threaded program at it and try to make sense of any reported errors. It will be more effective if you design threaded programs from the start in a way that helps Helgrind verify correctness. The same is true for finding memory errors with Memcheck, but applies more here, because thread checking is a harder problem. Consequently it is much easier to write a correct program for which Helgrind falsely reports (threading) errors than it is to write a correct program for which Memcheck falsely reports (memory) errors.

With that in mind, here are some tips, listed most important first, for getting reliable results and avoiding false errors. The first two are critical. Any violations of them will swamp you with huge numbers of false data-race errors.

  1. Make sure your application, and all the libraries it uses, use the POSIX threading primitives. Helgrind needs to be able to see all events pertaining to thread creation, exit, locking and other synchronisation. To do so it intercepts many POSIX pthread_ functions.
    Do not roll your own threading primitives (mutexes, etc.) from combinations of the Linux futex syscall, counters and the like. These throw Helgrind's internal model of what is going on way off course and will give bogus results.
    Also, do not reimplement existing POSIX abstractions using other POSIX abstractions. For example, don't build your own semaphore routines or reader-writer locks from POSIX mutexes and condition variables. Instead use POSIX reader-writer locks and semaphores, since Helgrind supports them directly.
    Helgrind directly supports the following POSIX threading abstractions: mutexes, reader-writer locks, condition variables (but see below), and semaphores. Currently spinlocks and barriers are not supported, although they could be in future. A prototype “safe” implementation of barriers, based on semaphores, is available: please contact the Valgrind authors for details.
    At the time of writing, the following popular Linux packages are known to implement their own threading primitives:
    • Qt version 4.X. Qt 3.X is fine, but not 4.X. Helgrind contains partial direct support for Qt 4.X threading, but this is not yet in a usable state. Assistance from folks knowledgeable in Qt 4 threading internals would be appreciated.
    • Runtime support library for GNU OpenMP (part of GCC), at least GCC versions 4.2 and 4.3. With some minor effort of modifying the GNU OpenMP runtime support sources, it is possible to use Helgrind on GNU OpenMP compiled codes. Please contact the Valgrind authors for details.
  2. Avoid memory recycling. If you can't avoid it, you must tell Helgrind what is going on via the VALGRIND_HG_CLEAN_MEMORY client request (in ‘helgrind.h’).
    Helgrind is aware of standard memory allocation and deallocation that occurs via malloc/free/new/delete and from entry and exit of stack frames. In particular, when memory is deallocated via free, delete, or function exit, Helgrind considers that memory clean, so when it is eventually reallocated, its history is irrelevant.
    However, it is common practice to implement memory recycling schemes. In these, memory to be freed is not handed to free/delete, but instead put into a pool of free buffers to be handed out again as required. The problem is that Helgrind has no way to know that such memory is logically no longer in use, and that its history is irrelevant. Hence you must make that explicit, using the VALGRIND_HG_CLEAN_MEMORY client request to specify the relevant address ranges. It's easiest to put these requests into the pool manager code, and use them either when memory is returned to the pool, or when it is allocated from it.
  3. Avoid POSIX condition variables. If you can, use POSIX semaphores (sem_t, sem_post, sem_wait) to do inter-thread event signalling. Semaphores with an initial value of zero are particularly useful for this.
    Helgrind handles POSIX condition variables only partially correctly. This is because Helgrind can see inter-thread dependencies between a pthread_cond_wait call and a pthread_cond_signal/broadcast call only if the waiting thread actually gets to the rendezvous first (so that it actually calls pthread_cond_wait). It can't see dependencies between the threads if the signaller arrives first. In the latter case, POSIX guidelines imply that the associated boolean condition still provides an inter-thread synchronisation event, but one which is invisible to Helgrind.
    When Helgrind misses some inter-thread synchronisation events, it reports false positives. That's because missing such events reduces the extent to which it can transfer exclusive memory ownership between threads. So memory may end up in a shared-modified state when that was not intended by the application programmers.
    The root cause of this lost synchronisation is particularly hard to understand, so an example is helpful. It was discussed at length by Arndt Muehlenfeld (Runtime Race Detection in Multi-Threaded Programs, Dissertation, TU Graz, Austria). The canonical POSIX-recommended usage scheme for condition variables is as follows:
    b   is a Boolean condition (False most of the time)
    cv  is a condition variable
    mx  is its associated mutex
    
    Signaller:                     Waiter:
    
    lock(mx)                       lock(mx)
    b = True                       while (b == False)
    signal(cv)                        wait(cv,mx)
    unlock(mx)                     unlock(mx)
    
    Assume ‘b’ is False most of the time. If the waiter arrives at the rendezvous first, it enters its while-loop, waits for the signaller to signal, and eventually proceeds. Helgrind sees the signal, notes the dependency, and all is well.
    If the signaller arrives first, ‘b’ is set to True, and the signal disappears into nowhere. When the waiter later arrives, it does not enter its while-loop and simply carries on. But even in this case, the waiter code following the while-loop cannot execute until the signaller sets ‘b’ to True. Hence there is still the same inter-thread dependency, but this time it is through an arbitrary in-memory condition, and Helgrind cannot see it.
    By comparison, Helgrind's detection of inter-thread dependencies caused by semaphore operations is believed to be exactly correct. As far as I know, a solution to this problem that does not require source-level annotation of condition-variable wait loops is beyond the current state of the art.
  4. Make sure you are using a supported Linux distribution. At present, Helgrind only properly supports x86-linux and amd64-linux with glibc-2.3 or later. The latter restriction means we only support glibc's NPTL threading implementation. The old LinuxThreads implementation is not supported. Unsupported targets may work to varying degrees. In particular ppc32-linux and ppc64-linux running NPTL should work, but you will get false race errors because Helgrind does not know how to properly handle atomic instruction sequences created using the lwarx/stwcx instructions.
  5. Round up all finished threads using pthread_join. Avoid detaching threads: don't create threads in the detached state, and don't call pthread_detach on existing threads. Using pthread_join to round up finished threads provides a clear synchronisation point that both Helgrind and programmers can see. This synchronisation point allows Helgrind to adjust its memory ownership models as described extensively above (see 9.4.3), which helps Helgrind produce more accurate error reports.
    If you don't call pthread_join on a thread, Helgrind has no way to know when it finishes, relative to any significant synchronisation points for other threads in the program. So it assumes that the thread lingers indefinitely and can potentially interfere indefinitely with the memory state of the program. It has every right to assume that. After all, it might really be the case that, for scheduling reasons, the exiting thread did run very slowly in the last stages of its life.
  6. Perform thread debugging (with Helgrind) and memory debugging (with Memcheck) together. Helgrind tracks the state of memory in detail, and memory management bugs in the application are liable to cause confusion. In extreme cases, applications which do many invalid reads and writes (particularly to freed memory) have been known to crash Helgrind. So, ideally, you should make your application Memcheck-clean before using Helgrind.
    It may be impossible to make your application Memcheck-clean unless you first remove threading bugs. In particular, it may be difficult to remove all reads and writes to freed memory in multithreaded C++ destructor sequences at program termination. So, ideally, you should make your application Helgrind-clean before using Memcheck.
    Since this circularity is obviously unresolvable, at least bear in mind that Memcheck and Helgrind are to some extent complementary, and you may need to use them together.
  7. POSIX requires that implementations of standard I/O (printf, fprintf, fwrite, fread, etc) are thread safe. Unfortunately GNU libc implements this by using internal locking primitives that Helgrind is unable to intercept. Consequently Helgrind generates many false race reports when you use these functions. Helgrind attempts to hide these errors using the standard Valgrind error-suppression mechanism. So, at least for simple test cases, you don't see any. Nevertheless, some may slip through. Just something to be aware of.
  8. Helgrind's error checks do not work properly inside the system threading library itself (‘libpthread.so’), and it usually observes large numbers of (false) errors in there. Valgrind's suppression system then filters these out, so you should not see them. If you see any race errors reported where ‘libpthread.so’ or ‘ld.so’ is the object associated with the innermost stack frame, please file a bug report at http://www.valgrind.org.
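
Tips 2, 3 and 5 can be combined in one small program. The following is a minimal sketch under stated assumptions, not code from the Valgrind sources. A producer thread takes a buffer from a recycling pool, fills it, and signals the consumer through a POSIX semaphore initialised to zero. The consumer returns the buffer to the pool, marking it clean with VALGRIND_HG_CLEAN_MEMORY, and rounds up the producer with pthread_join. The names pool_get, pool_put and run_handoff, the pool layout, and the include path <valgrind/helgrind.h> are assumptions made for illustration. If the header is not installed, the macro is replaced by a no-op stand-in so that the code still builds.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Use the real client request if the Valgrind headers are present
   (header path is an assumption); otherwise use a no-op stand-in. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#endif

#define BUF_SIZE   64
#define POOL_SLOTS 2

/* Hypothetical recycling pool: buffers are reused, never freed. */
static char pool_mem[POOL_SLOTS * BUF_SIZE];
static int slot_free[POOL_SLOTS] = {1, 1};
static pthread_mutex_t pool_mx = PTHREAD_MUTEX_INITIALIZER;

static char *pool_get(void) {
    char *buf = NULL;
    pthread_mutex_lock(&pool_mx);
    for (int i = 0; i < POOL_SLOTS && buf == NULL; i++) {
        if (slot_free[i]) {
            slot_free[i] = 0;
            buf = pool_mem + (size_t)i * BUF_SIZE;
        }
    }
    pthread_mutex_unlock(&pool_mx);
    return buf;
}

static void pool_put(char *buf) {
    /* Tip 2: the buffer's access history is irrelevant from here on. */
    VALGRIND_HG_CLEAN_MEMORY(buf, BUF_SIZE);
    pthread_mutex_lock(&pool_mx);
    slot_free[(size_t)(buf - pool_mem) / BUF_SIZE] = 1;
    pthread_mutex_unlock(&pool_mx);
}

static sem_t ready;      /* Tip 3: starts at 0, meaning "nothing produced yet" */
static char *ready_buf;  /* written before sem_post, read after sem_wait */

static void *producer(void *arg) {
    (void)arg;
    char *buf = pool_get();
    assert(buf != NULL);
    snprintf(buf, BUF_SIZE, "hello from producer");
    ready_buf = buf;
    sem_post(&ready);    /* visible to Helgrind whichever thread arrives first */
    return NULL;
}

/* Runs one hand-off; returns the message length, or -1 on setup failure. */
int run_handoff(char *out, size_t out_len) {
    pthread_t tid;
    if (sem_init(&ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, producer, NULL) != 0) {
        sem_destroy(&ready);
        return -1;
    }
    while (sem_wait(&ready) == -1 && errno == EINTR)
        continue;                    /* retry if interrupted by a signal */
    snprintf(out, out_len, "%s", ready_buf);
    pool_put(ready_buf);             /* recycle instead of free */
    pthread_join(tid, NULL);         /* Tip 5: join, do not detach */
    sem_destroy(&ready);
    return (int)strlen(out);
}
```

To try it, call run_handoff from a main function, build with gcc -pthread, and run the result under valgrind --tool=helgrind.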