| Valgrind 3.3 - Advanced Debugging and Profiling for GNU/Linux applications by J. Seward, N. Nethercote, J. Weidendorfer and the Valgrind Development Team Paperback (6"x9"), 164 pages ISBN 0954612051 RRP £12.95 ($19.95) |
6.1.2 Cache simulation specifics
Cachegrind simulates a machine with independent first level instruction and data caches (I1 and D1), backed by a unified second level cache (L2). This configuration is used by almost all modern machines. Some old Cyrix CPUs had a unified I and D L1 cache, but they are ancient history now.
Specific characteristics of the simulation are as follows:
- Write-allocate: when a write miss occurs, the block written to is brought into the D1 cache. Most modern caches have this property.
- Bit-selection hash function: the set of line(s) in the cache to which a memory block maps is chosen by the middle bits M--(M+N-1) of the byte address, where:
  - line size = 2^M bytes
  - (cache size / line size / associativity) = 2^N sets
- Inclusive L2 cache: the L2 cache replicates all the entries of the L1 caches. This is standard on Pentium chips, but AMD Opterons, Athlons and Durons use an exclusive L2 cache that only holds blocks evicted from L1, as do most modern VIA CPUs.
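The bit-selection mapping described above can be sketched in a few lines of C. This is only an illustration, not Cachegrind's own code: the `cache_cfg` structure and the `log2_pow2` and `set_index` functions are hypothetical names. It assumes a set-associative cache in which N is derived from the number of sets (cache size / line size / associativity); for a direct-mapped cache the associativity is 1 and this reduces to cache size / line size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache description; the field names are illustrative
   and are not taken from Cachegrind. All sizes are in bytes. */
typedef struct {
    uint32_t size;       /* total cache size */
    uint32_t assoc;      /* number of lines per set */
    uint32_t line_size;  /* bytes per line, must be a power of two */
} cache_cfg;

/* Integer log2 of a value that must be a non-zero power of two. */
static unsigned log2_pow2(uint32_t x)
{
    unsigned n = 0;
    assert(x != 0 && (x & (x - 1)) == 0);
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Return the set selected by bits M..(M+N-1) of the byte address,
   where line size = 2^M and the number of sets = 2^N. */
static uint32_t set_index(const cache_cfg *c, uint64_t addr)
{
    unsigned m = log2_pow2(c->line_size);
    uint32_t sets = c->size / c->line_size / c->assoc;
    unsigned n = log2_pow2(sets);

    /* Drop the M offset bits, then keep the low N bits. */
    return (uint32_t)((addr >> m) & ((UINT64_C(1) << n) - 1));
}
```

For a 32768-byte, 8-way cache with 64-byte lines, M = 6 and N = 6, so addresses 0 to 63 map to set 0, address 64 maps to set 1, and address 4096 wraps around to set 0 again.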
The cache configuration simulated (cache size, associativity and line size) is determined automagically using the CPUID instruction. If you have an old machine that (a) doesn't support the CPUID instruction, or (b) supports it in an early incarnation that doesn't give any cache information, then Cachegrind will fall back to using a default configuration (that of a model 3/4 Athlon). Cachegrind will tell you if this happens. You can manually specify one, two or all three levels (I1/D1/L2) of the cache from the command line using the ‘--I1’, ‘--D1’ and ‘--L2’ options, each of which takes a value of the form ‘<size>,<associativity>,<line size>’.
On PowerPC platforms Cachegrind cannot automatically determine the cache configuration, so you will need to specify it with the ‘--I1’, ‘--D1’ and ‘--L2’ options.
Other noteworthy behaviour:
- References that straddle two cache lines are treated as follows:
  - If both blocks hit => counted as one hit.
  - If one block hits and the other misses => counted as one miss.
  - If both blocks miss => counted as one miss (not two).
- Instructions that modify a memory location (e.g. ‘inc’ and ‘dec’) are counted as doing just a read, i.e. a single data reference. This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache), counting it separately is not very interesting. Thus the data reference count measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur.
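The counting rule for references that straddle two cache lines can be illustrated with a small C sketch. The function names `straddles` and `misses_for_ref` are hypothetical and do not come from the Cachegrind sources. The sketch assumes the line size is a power of two and that no access is larger than one line, so a reference touches at most two lines.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes starting at `addr` spans two
   cache lines. Assumes line_size is a power of two and
   1 <= size <= line_size. */
static bool straddles(uint64_t addr, uint32_t size, uint32_t line_size)
{
    uint64_t mask  = ~(uint64_t)(line_size - 1);
    uint64_t first = addr & mask;               /* line of first byte */
    uint64_t last  = (addr + size - 1) & mask;  /* line of last byte */
    return first != last;
}

/* Number of misses charged to a single reference. `second_hit` is
   ignored unless the reference touches two lines. A reference never
   counts as more than one miss, even if both lines miss. */
static unsigned misses_for_ref(bool two_lines, bool first_hit, bool second_hit)
{
    if (!two_lines)
        return first_hit ? 0u : 1u;
    return (first_hit && second_hit) ? 0u : 1u;
}
```

With 64-byte lines, an 8-byte access at address 60 covers bytes 60 to 67 and so straddles two lines, whereas the same access at address 56 ends at byte 63 and stays within one line.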
If you are interested in simulating a cache with different properties, it is not particularly hard to write your own cache simulator, or to modify the existing ones in ‘vg_cachesim_I1.c’, ‘vg_cachesim_D1.c’, ‘vg_cachesim_L2.c’ and ‘vg_cachesim_gen.c’. We'd be interested to hear from anyone who does.