|Valgrind 3.3 - Advanced Debugging and Profiling for GNU/Linux applications|
by J. Seward, N. Nethercote, J. Weidendorfer and the Valgrind Development Team
Paperback (6"x9"), 164 pages
RRP £12.95 ($19.95)
9.3 Detected errors: Inconsistent Lock Orderings
In this section, and in general, to “acquire” a lock simply means to lock that lock, and to “release” a lock means to unlock it.
Helgrind monitors the order in which threads acquire locks. This allows it to detect potential deadlocks which could arise from the formation of cycles of locks. Detecting such inconsistencies is useful because, whilst actual deadlocks are fairly obvious, potential deadlocks may never be discovered during testing and could later lead to hard-to-diagnose in-service failures.
The simplest example of such a problem is as follows.
- Imagine some shared resource R, which, for whatever reason, is guarded by two locks, L1 and L2, which must both be held when R is accessed.
- Suppose a thread acquires L1, then L2, and proceeds to access R. The implication of this is that all threads in the program must acquire the two locks in the order first L1 then L2. Not doing so risks deadlock.
- The deadlock could happen if two threads--call them T1 and T2--both want to access R. Suppose T1 acquires L1 first, and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries to acquire L1, but those locks are both already held. So T1 and T2 become deadlocked.
Helgrind builds a directed graph indicating the order in which locks have been acquired in the past. When a thread acquires a new lock, the graph is updated, and then checked to see if it now contains a cycle. The presence of a cycle indicates a potential deadlock involving the locks in the cycle.
In simple situations, where the cycle only contains two locks, Helgrind will show where the required order was established:
Thread #1: lock order "0x7FEFFFAB0 before 0x7FEFFFA80" violated at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388) by 0x40081F: main (tc13_laog1.c:24) Required order was established by acquisition of lock at 0x7FEFFFAB0 at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388) by 0x400748: main (tc13_laog1.c:17) followed by a later acquisition of lock at 0x7FEFFFA80 at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388) by 0x400773: main (tc13_laog1.c:18)
When there are more than two locks in the cycle, the error is equally serious. However, at present Helgrind does not show the locks involved, so as to avoid flooding you with information. That could be fixed in future. For example, here is a an example involving a cycle of five locks from a naive implementation the famous Dining Philosophers problem (see ‘helgrind/tests/tc14_laog_dinphils.c’). In this case Helgrind has detected that all 5 philosophers could simultaneously pick up their left fork and then deadlock whilst waiting to pick up their right forks.
Thread #6: lock order "0x6010C0 before 0x601160" violated at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388) by 0x4007C0: dine (tc14_laog_dinphils.c:19) by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178) by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so) by 0x51054CC: clone (in /lib64/libc-2.5.so)
|ISBN 0954612051||Valgrind 3.3 - Advanced Debugging and Profiling for GNU/Linux applications||See the print edition|