Valgrind 3.3 - Advanced Debugging and Profiling for GNU/Linux applications
by J. Seward, N. Nethercote, J. Weidendorfer and the Valgrind Development Team
Paperback (6"x9"), 164 pages
RRP £12.95 ($19.95)
The following list of limitations seems long. However, most programs actually work fine.
Valgrind will run Linux ELF binaries, on a kernel 2.4.X or 2.6.X system, on the x86, amd64, ppc32 and ppc64 architectures, subject to the following constraints:
- On x86 and amd64, there is no support for 3DNow! instructions. If the translator encounters these, Valgrind will generate a SIGILL when the instruction is executed. Apart from that, on x86 and amd64, essentially all instructions are supported, up to and including SSE3. On ppc32 and ppc64, almost all integer, floating point and Altivec instructions are supported. Specifically: the integer and FP instructions that are mandatory for PowerPC, the “General-purpose optional” group (fsqrt, fsqrts, stfiwx), the “Graphics optional” group (fre, fres, frsqrte, frsqrtes), and the Altivec (also known as VMX) SIMD instruction set are supported.
- Atomic instruction sequences are not properly supported, in the sense that their atomicity is not preserved. This will affect any use of synchronization via memory shared between processes: such programs will appear to work, but fail sporadically.
- If your program does its own memory management, rather than using malloc/new/free/delete, it should still work, but Memcheck's error checking won't be as effective. If you describe your program's memory management scheme using “client requests” (see 4.1), Memcheck can do better. Nevertheless, using malloc/new and free/delete is still the best approach.
- Valgrind's signal simulation is not as robust as it could be. Basic POSIX-compliant sigaction and sigprocmask functionality is supplied, but it's conceivable that things could go badly awry if you do weird things with signals. Workaround: don't. Programs that do non-POSIX signal tricks are in any case inherently unportable, so should be avoided if possible.
- Machine instructions, and system calls, have been implemented on demand. So it's possible, although unlikely, that a program will fall over with a message to that effect. If this happens, please report all the details printed out, so we can try to implement the missing feature.
- Memory consumption of your program is greatly increased whilst running under Valgrind. This is due to the large amount of administrative information maintained behind the scenes. Another cause is that Valgrind dynamically translates the original executable. Translated, instrumented code is 12-18 times larger than the original, so you can easily end up with 50+ MB of translations when running (e.g.) a web browser.
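The atomicity limitation above affects code like the following minimal sketch. It assumes Linux and GCC or Clang (`__atomic_fetch_add` is a compiler builtin, not a Valgrind API), and the helper name `shared_counter_demo` is hypothetical. Several forked processes each increment one counter that lives in an anonymous `MAP_SHARED` mapping. Run natively, it should always return `nprocs * iters`; per the text, under Valgrind 3.3 this kind of cross-process synchronization may appear to work but fail sporadically.

```c
#define _GNU_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork nprocs children that each atomically add 1 to a shared counter
   iters times. Returns the final count, or -1 if setup failed. */
long shared_counter_demo(int nprocs, long iters)
{
    assert(nprocs > 0 && iters >= 0);

    /* One long in memory shared between the parent and its children. */
    long *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    *counter = 0;

    int spawned = 0;
    for (; spawned < nprocs; spawned++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                       /* stop spawning; reap what we have */
        if (pid == 0) {
            for (long j = 0; j < iters; j++)
                __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
            _exit(0);                    /* child: skip atexit/stdio cleanup */
        }
    }

    /* Wait for every child that was actually started. */
    for (int i = 0; i < spawned; i++)
        wait(NULL);

    long result = (spawned == nprocs) ? *counter : -1;
    munmap(counter, sizeof *counter);
    return result;
}
```

Because the failures are sporadic, one correct result under Valgrind proves little; repeated runs with a large `iters` are more likely to show a lost update.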
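For the custom memory management case, here is a minimal sketch of describing an allocator to Memcheck with client requests. It assumes the macros `VALGRIND_MALLOCLIKE_BLOCK`, `VALGRIND_FREELIKE_BLOCK` and `VALGRIND_MAKE_MEM_NOACCESS` from `<valgrind/memcheck.h>`; if that header is not installed, the sketch defines them as no-ops so it still compiles. The pool itself (`pool_init`, `pool_alloc`, `pool_free`, `pool_available`, 8 slots of 64 bytes) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Use the real Memcheck client requests if the Valgrind headers are
   installed; otherwise compile them away so the sketch still builds. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rz, zeroed) ((void)0)
#  define VALGRIND_FREELIKE_BLOCK(addr, rz) ((void)0)
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, size) ((void)0)
#endif

#define POOL_SLOTS 8
#define SLOT_SIZE 64

/* Hypothetical fixed-size pool: blocks live in pool_arena, while the
   stack of free slot indices lives outside it. */
static unsigned char pool_arena[POOL_SLOTS * SLOT_SIZE];
static int free_stack[POOL_SLOTS];
static int free_top;

void pool_init(void)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        free_stack[i] = POOL_SLOTS - 1 - i;   /* slot 0 is handed out first */
    free_top = POOL_SLOTS;
    /* Nothing is allocated yet, so any access to the arena is an error. */
    VALGRIND_MAKE_MEM_NOACCESS(pool_arena, sizeof pool_arena);
}

void *pool_alloc(void)
{
    if (free_top == 0)
        return NULL;                          /* pool exhausted */
    void *p = pool_arena + (size_t)free_stack[--free_top] * SLOT_SIZE;
    /* Describe the block as if malloc had returned it:
       no redzone, contents undefined (not zeroed). */
    VALGRIND_MALLOCLIKE_BLOCK(p, SLOT_SIZE, 0, 0);
    return p;
}

void pool_free(void *p)
{
    ptrdiff_t offset = (unsigned char *)p - pool_arena;
    assert(offset >= 0 && offset % SLOT_SIZE == 0 &&
           offset < (ptrdiff_t)sizeof pool_arena);
    /* Like free(): later reads or writes through p can be reported. */
    VALGRIND_FREELIKE_BLOCK(p, 0);
    free_stack[free_top++] = (int)(offset / SLOT_SIZE);
}

int pool_available(void)
{
    return free_top;
}
```

Keeping the free list outside the arena is a deliberate choice: the allocator never has to read memory that Memcheck has been told is freed, so it does not trip over its own annotations.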
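As an illustration of the “basic POSIX-compliant” signal handling that the text says is supported, the sketch below stays within plain POSIX calls. It installs a handler with `sigaction`, blocks `SIGUSR1` with `sigprocmask`, raises the signal while it is blocked, and then restores the old mask so the pending signal is delivered. The function name and the choice of `SIGUSR1` are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_usr1;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;                /* only async-signal-safe work in handlers */
}

/* Returns 1 if SIGUSR1 stayed pending while blocked and was delivered
   once the old mask was restored, 0 if not, -1 on a setup error. */
int posix_signal_demo(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    got_usr1 = 0;
    raise(SIGUSR1);                          /* blocked: becomes pending */
    int delivered_while_blocked = got_usr1;

    /* Unblocking delivers the pending signal before sigprocmask returns. */
    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0)
        return -1;
    return !delivered_while_blocked && got_usr1;
}
```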
Valgrind can handle dynamically-generated code just fine. If you regenerate code over the top of old code (i.e. at the same memory addresses) and the code is on the stack, Valgrind will realise the code has changed and work correctly. This is necessary to handle the trampolines GCC uses to implement nested functions. If you regenerate code somewhere other than the stack, you will need to use the --smc-check=all flag, and Valgrind will run more slowly than normal.
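A minimal sketch of the non-stack case described above: the hypothetical helper `run_generated` writes the x86/amd64 instruction bytes for `mov eax, imm32; ret` into a single `mmap`ed buffer, overwrites the same addresses on every call, and then calls the code. It assumes Linux, an x86 or amd64 CPU, GCC or Clang (for `__builtin___clear_cache`), and a system that permits a writable and executable mapping; otherwise it returns -1. Without `--smc-check=all`, Valgrind could keep running its translation of the first version of the code.

```c
#define _GNU_SOURCE              /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

static unsigned char *code_buf;  /* reused: new code goes over the old */

/* Generate "return value;" as machine code in a heap buffer and run it.
   Returns -1 if the architecture or the mapping is not supported. */
int32_t run_generated(int32_t value)
{
#if defined(__x86_64__) || defined(__i386__)
    if (code_buf == NULL) {
        void *m = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (m == MAP_FAILED)
            return -1;
        code_buf = m;
    }
    code_buf[0] = 0xB8;                          /* mov eax, imm32 */
    memcpy(code_buf + 1, &value, sizeof value);  /* little-endian immediate */
    code_buf[5] = 0xC3;                          /* ret */
    __builtin___clear_cache((char *)code_buf, (char *)code_buf + 6);

    int32_t (*fn)(void) = (int32_t (*)(void))(void *)code_buf;
    return fn();
#else
    (void)value;
    return -1;                                   /* unsupported architecture */
#endif
}
```

Calling `run_generated(42)` and then `run_generated(7)` rewrites the same six bytes; natively the second call returns 7.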
As of version 3.0.0, Valgrind has the following limitations in its implementation of x86/AMD64 floating point relative to IEEE754.
Precision: There is no support for 80-bit arithmetic. Internally, Valgrind represents all such “long double” numbers in 64 bits, and so there may be some differences in results. Whether or not this is critical remains to be seen. Note that the x86/amd64 fldt/fstpt instructions (read/write 80-bit numbers) are correctly simulated, using conversions to/from 64 bits, so that in-memory images of 80-bit numbers look correct if anyone wants to see.
Experience with many FP regression tests suggests that the accuracy differences aren't significant. Generally speaking, if a program relies on 80-bit precision, there may be difficulties porting it to non-x86/amd64 platforms which only support 64-bit FP precision. Even on x86/amd64, the program may get different results depending on whether it is compiled to use SSE2 instructions (64-bit only) or x87 instructions (80-bit). The net effect is to make FP programs behave as if they had been run on a machine with 64-bit IEEE floats, for example PowerPC. On amd64, FP arithmetic is done by default on SSE2, so amd64 looks more like PowerPC than x86 from an FP perspective, and there are far fewer noticeable accuracy differences than with x86.
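To make the precision point concrete, here is a minimal sketch (the helper names are illustrative) that evaluates (a + b) - a once in `double` and once in `long double`. With a = 1e16 and b = 1, the exact sum 10000000000000001 needs 54 significand bits. A `double` has 53, so the sum rounds back to 1e16 and the result is 0, whereas the x87 80-bit format (64-bit significand) keeps the sum and gives 1. Natively on amd64 Linux one would expect 0 and 1. If `long double` is carried in 64 bits, as the text describes for Valgrind, both come out as 0. On an x87-only build (`FLT_EVAL_METHOD` of 2) the `double` version may also give 1, which is the SSE2-versus-x87 difference mentioned above.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* (a + b) - a with double operands: 53-bit significand when the build
   uses SSE2 (FLT_EVAL_METHOD == 0, the amd64 default); with
   FLT_EVAL_METHOD == 2 (x87) the intermediate may keep 80-bit precision. */
double add_sub_double(double a, double b)
{
    return (a + b) - a;
}

/* The same expression in long double: a 64-bit significand on native x87. */
long double add_sub_long_double(long double a, long double b)
{
    return (a + b) - a;
}

void print_precision_demo(void)
{
    printf("FLT_EVAL_METHOD=%d LDBL_MANT_DIG=%d\n",
           (int)FLT_EVAL_METHOD, LDBL_MANT_DIG);
    printf("double:      %g\n", add_sub_double(1e16, 1.0));
    printf("long double: %Lg\n", add_sub_long_double(1e16L, 1.0L));
}
```

Calling `print_precision_demo()` from `main`, once natively and once under Valgrind, is a quick way to compare the two environments.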
Rounding: Valgrind does observe the 4 IEEE-mandated rounding modes (to nearest, to +infinity, to -infinity, to zero) for the following conversions: float to integer, integer to float where there is a possibility of loss of precision, and float-to-float rounding. For all other FP operations, only the IEEE default mode (round to nearest) is supported.
Numeric exceptions in FP code: IEEE754 defines five types of numeric exception that can happen: invalid operation (sqrt of negative number, etc.), division by zero, overflow, underflow, inexact (loss of precision).
For each exception, two courses of action are defined by IEEE754: either (1) a user-defined exception handler may be called, or (2) a default action is defined, which “fixes things up” and allows the computation to proceed without throwing an exception.
Currently Valgrind only supports the default fixup actions. Feedback on the importance of exception support would be appreciated.
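The default fixups are what a C program sees when it leaves the floating-point exceptions masked, which is the IEEE754 default. The sketch below (struct and function names are illustrative) triggers four of the five exceptions and records the fixed-up results: +infinity for division by zero and for overflow, NaN for an invalid operation, and a subnormal number for underflow. `isinf` and `isnan` are standard `<math.h>` macros.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

struct fp_fixups {
    double div_by_zero;   /* 1/0       -> +infinity */
    double invalid;       /* 0/0       -> NaN */
    double overflow;      /* DBL_MAX*2 -> +infinity */
    double underflow;     /* DBL_MIN/3 -> subnormal (inexact, so underflow) */
};

/* Perform each exceptional operation with exceptions masked and return
   the results; no handler runs and execution simply continues. */
struct fp_fixups default_fixups(void)
{
    /* volatile keeps the compiler from evaluating these at compile time */
    volatile double zero = 0.0, one = 1.0, big = DBL_MAX, small = DBL_MIN;
    struct fp_fixups f;
    f.div_by_zero = one / zero;
    f.invalid     = zero / zero;
    f.overflow    = big * 2.0;
    f.underflow   = small / 3.0;
    return f;
}
```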
When Valgrind detects that the program is trying to exceed any of these limitations (setting exception handlers, rounding mode, or precision control), it can print a message giving a traceback of where this has happened, and continue execution. This behaviour used to be the default, but the messages are annoying and so showing them is now disabled by default. Use --show-emwarns=yes to see them.
The above limitations define precisely the IEEE754 'default' behaviour: default fixup on all exceptions, round-to-nearest operations, and 64-bit precision.
- As of version 3.0.0, Valgrind has the following limitations in its implementation of x86/AMD64 SSE2 FP arithmetic, relative to IEEE754. Essentially the same: no exceptions, and limited observance of rounding mode. Also, SSE2 has control bits which make it treat denormalised numbers as zero (DAZ) and a related action, flush denormals to zero (FTZ). Both of these cause SSE2 arithmetic to be less accurate than IEEE requires. Valgrind detects, ignores, and can warn about, attempts to enable either mode.
- As of version 3.2.0, Valgrind has the following limitations in its implementation of PPC32 and PPC64 floating point arithmetic, relative to IEEE754. Scalar (non-Altivec): Valgrind provides a bit-exact emulation of all floating point instructions, except for ‘fre’ and ‘fres’, which are done more precisely than required by the PowerPC architecture specification. All floating point operations observe the current rounding mode. However, ‘fpscr[FPRF]’ is not set after each operation. That could be done but would give measurable performance overheads, and so far no need for it has been found. As on x86/AMD64, IEEE754 exceptions are not supported: all floating point exceptions are handled using the default IEEE fixup actions. Valgrind detects, ignores, and can warn about, attempts to unmask the 5 IEEE FP exception kinds by writing to the floating-point status and control register (fpscr). Vector (Altivec, VMX): essentially as with x86/AMD64 SSE/SSE2: no exceptions, and limited observance of rounding mode. For Altivec, FP arithmetic is done in IEEE/Java mode, which is more accurate than the Linux default setting. “More accurate” means that denormals are handled properly, rather than simply being flushed to zero.
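A minimal sketch of the SSE2 FTZ case on x86/amd64: `tiny_quotient` computes DBL_MIN / 3, a result too small to be a normal double, with the MXCSR flush-to-zero bit switched on or off. It uses the `_mm_getcsr`, `_mm_setcsr` and `_MM_SET_FLUSH_ZERO_MODE` intrinsics from `<xmmintrin.h>` and assumes GCC or Clang, whose `__SSE2_MATH__` macro indicates that `double` arithmetic goes through SSE2; on other builds FTZ is not applied. Both function names are hypothetical. Natively, enabling FTZ turns the result into 0.0. Because Valgrind ignores attempts to enable FTZ, under Valgrind the result should stay subnormal, and Valgrind can warn about the attempt.

```c
#include <assert.h>
#include <float.h>
#if defined(__SSE2_MATH__)
#  include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE */
#endif

/* 1 if double arithmetic in this build uses SSE2, so that the MXCSR
   FTZ bit actually affects it (the usual case on amd64). */
int ftz_applies(void)
{
#if defined(__SSE2_MATH__)
    return 1;
#else
    return 0;
#endif
}

/* Compute DBL_MIN / 3 with FTZ on or off, then restore MXCSR. */
double tiny_quotient(int enable_ftz)
{
    volatile double small = DBL_MIN;
#if defined(__SSE2_MATH__)
    unsigned int saved = _mm_getcsr();
    _MM_SET_FLUSH_ZERO_MODE(enable_ftz ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    volatile double q = small / 3.0;   /* volatile: compute before restoring */
    _mm_setcsr(saved);
    return q;
#else
    (void)enable_ftz;                  /* no SSE2 FP in this build */
    return small / 3.0;
#endif
}
```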
Programs which are known not to work are:
- emacs starts up but immediately concludes it is out of memory and aborts. It may be that Memcheck does not provide a good enough emulation of the ‘mallinfo’ function. Emacs works fine if you build it to use the standard malloc/free routines.
ISBN 0954612051