| Valgrind 3.3 - Advanced Debugging and Profiling for GNU/Linux applications by J. Seward, N. Nethercote, J. Weidendorfer and the Valgrind Development Team Paperback (6"x9"), 164 pages ISBN 0954612051 RRP £12.95 ($19.95) |
6.1 Cache and branch profiling
To use this tool, you must specify ‘--tool=cachegrind’ on the Valgrind command line.
Cachegrind is a tool for finding places where programs interact badly with typical modern superscalar processors and run slowly as a result. In particular, it will do a cache simulation of your program, and optionally a branch-predictor simulation, and can then annotate your source line-by-line with the number of cache misses and branch mispredictions. The following statistics are collected:
- L1 instruction cache reads and misses.
- L1 data cache reads and read misses, writes and write misses.
- L2 unified cache reads and read misses, writes and write misses.
- Conditional branches and mispredicted conditional branches.
- Indirect branches and mispredicted indirect branches. An indirect branch is a jump or call to a destination only known at run time.
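To make the difference between the two kinds of branch concrete, here is a minimal sketch in C. The names `add_one`, `twice`, `op_fn` and `run_ops` are hypothetical and exist only for this illustration. The `if` test and the loop condition would normally compile to conditional branches, while the call through the function-pointer table would normally compile to an indirect call. An optimizing compiler is free to generate different code, so treat this as an approximation of what the simulated branch predictor sees.

```c
#include <stddef.h>

/* Hypothetical handlers, used only to illustrate an indirect call. */
static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

typedef int (*op_fn)(int);

/* Applies one handler per non-negative selector value, starting from
 * `start`. Even selectors pick add_one, odd selectors pick twice. */
int run_ops(const int *selector, size_t n, int start)
{
    static const op_fn table[2] = { add_one, twice };
    int acc = start;

    for (size_t i = 0; i < n; i++) {        /* loop test: conditional branch */
        if (selector[i] < 0)                /* conditional branch */
            continue;
        /* The target is chosen from data at run time, so this is
         * typically an indirect call. */
        acc = table[selector[i] & 1](acc);
    }
    return acc;
}
```

If the selector values follow no regular pattern, both the `if` test and the indirect call become harder to predict, which is the kind of behaviour the branch statistics are meant to expose.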
On a modern machine, an L1 miss will typically cost around 10 cycles, an L2 miss can cost as much as 200 cycles, and a mispredicted branch costs in the region of 10 to 30 cycles. Detailed cache and branch profiling can be very useful for improving the performance of your program.
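The figures above can be turned into a back-of-the-envelope estimate of how many cycles a program loses to misses and mispredictions. The sketch below is not something Cachegrind computes itself. The function `estimate_stall_cycles` and its parameters are hypothetical, and the simple additive model is an assumption. The constants are the approximate costs quoted above, with the L2 figure being the upper bound ("as much as 200 cycles").

```c
#include <stdint.h>

/* Approximate per-event costs in cycles, taken from the text above.
 * The L2 value is the quoted worst case, not an average. */
enum {
    L1_MISS_CYCLES = 10,
    L2_MISS_CYCLES = 200
};

/* Rough estimate of cycles lost, assuming the costs simply add up.
 * cycles_per_mispredict should be somewhere in the quoted 10 to 30 range. */
uint64_t estimate_stall_cycles(uint64_t l1_misses,
                               uint64_t l2_misses,
                               uint64_t mispredicts,
                               uint64_t cycles_per_mispredict)
{
    return l1_misses * L1_MISS_CYCLES
         + l2_misses * L2_MISS_CYCLES
         + mispredicts * cycles_per_mispredict;
}
```

For example, 1000 L1 misses, 50 L2 misses and 200 mispredictions give an estimate of 22000 cycles at 10 cycles per misprediction and 26000 cycles at 30. Even in this small example the 50 L2 misses cost as much as the 1000 L1 misses, which is why L2 misses usually deserve attention first.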
Also, since one instruction cache read is performed per instruction executed, you can find out how many instructions are executed per line, which can be useful for traditional profiling and test coverage.
Branch profiling is not enabled by default. To use it, you must additionally specify ‘--branch-sim=yes’ on the command line.
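As a small test subject for trying the tool, the following sketch sums the same matrix in two different orders. All names here (`N`, `grid`, `fill_grid`, `sum_row_major`, `sum_col_major`) are hypothetical. The row-by-row loop visits memory sequentially, while the column-by-column loop jumps a whole row ahead on every access, so it would be expected to show more data cache read misses. The exact counts depend on the simulated cache configuration and on what the compiler does with the loops.

```c
#include <stddef.h>

#define N 512                /* hypothetical size, chosen for illustration */

static int grid[N][N];

/* Fill the matrix with small, deterministic values. */
void fill_grid(void)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            grid[i][j] = (int)((i + j) & 0xff);
}

/* Inner loop walks along a row: consecutive addresses. */
long sum_row_major(void)
{
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += grid[i][j];
    return sum;
}

/* Inner loop walks down a column: each access is N ints further on. */
long sum_col_major(void)
{
    long sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            sum += grid[i][j];
    return sum;
}

/* To try it, call fill_grid() and both sum functions from a main() of
 * your own, build with debugging information, and run (assuming the
 * executable is named ./prog):
 *
 *     valgrind --tool=cachegrind --branch-sim=yes ./prog
 */
```

Both functions return the same total and execute a similar number of instructions, so any difference in their miss counts comes from the access pattern alone.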