- publishing free software manuals
The Apache HTTP Server Reference Manual
by Apache Software Foundation
Paperback (6"x9"), 862 pages
ISBN 9781906966034
RRP £19.95 ($29.95)

Get a printed copy>>>

8.6  Disk-based Caching





Related Modules

Related Directives





mod_disk_cache

CacheEnable
CacheDisable




mod_disk_cache provides a disk-based caching mechanism for mod_cache. As with mod_mem_cache this cache is intelligent and content will be served from the cache only as long as it is considered valid.

Typically the module will be configured as so;

CacheRoot   /var/cache/apache/  
CacheEnable disk /  
CacheDirLevels 2  
CacheDirLength 1

Importantly, as the cached files are locally stored, operating system in-memory caching will typically be applied to their access also. So although the files are stored on disk, if they are frequently accessed it is likely the operating system will ensure that they are actually served from memory.

8.6.1  Understanding the Cache-Store

To store items in the cache, mod_disk_cache creates a 22 character hash of the URL being requested. This hash incorporates the hostname, protocol, port, path and any CGI arguments to the URL, to ensure that multiple URLs do not collide.

Each character may be any one of 64-different characters, which mean that overall there are 64ˆ22 possible hashes. For example, a URL might be hashed to xyTGxSMO2b68mBCykqkp1w. This hash is used as a prefix for the naming of the files specific to that URL within the cache, however first it is split up into directories as per the CacheDirLevels and CacheDirLength directives.

CacheDirLevels specifies how many levels of subdirectory there should be, and CacheDirLength specifies how many characters should be in each directory. With the example settings given above, the hash would be turned into a filename prefix as /var/cache/apache/x/y/TGxSMO2b68mBCykqkp1w.

The overall aim of this technique is to reduce the number of subdirectories or files that may be in a particular directory, as most file-systems slow down as this number increases. With setting of "1" for CacheDirLength there can at most be 64 subdirectories at any particular level. With a setting of 2 there can be 64 * 64 subdirectories, and so on. Unless you have a good reason not to, using a setting of "1" for CacheDirLength is recommended.

Setting CacheDirLevels depends on how many files you anticipate to store in the cache. With the setting of "2" used in the above example, a grand total of 4096 subdirectories can ultimately be created. With 1 million files cached, this works out at roughly 245 cached URLs per directory.

Each URL uses at least two files in the cache-store. Typically there is a ".header" file, which includes meta-information about the URL, such as when it is due to expire and a ".data" file which is a verbatim copy of the content to be served.

In the case of a content negotiated via the "Vary" header, a ".vary" directory will be created for the URL in question. This directory will have multiple ".data" files corresponding to the differently negotiated content.

8.6.2  Maintaining the Disk Cache

Although mod_disk_cache will remove cached content as it is expired, it does not maintain any information on the total size of the cache or how little free space may be left.

Instead, provided with Apache is the htcacheclean (p. 59) tool which, as the name suggests, allows you to clean the cache periodically. Determining how frequently to run htcacheclean (p. 59) and what target size to use for the cache is somewhat complex and trial and error may be needed to select optimal values.

htcacheclean (p. 59) has two modes of operation. It can be run as persistent daemon, or periodically from cron. htcacheclean (p. 59) can take up to an hour or more to process very large (tens of gigabytes) caches and if you are running it from cron it is recommended that you determine how long a typical run takes, to avoid running more than one instance at a time.


(Picture in printed edition)

Figure 8.1: Typical cache growth / clean sequence.


Because mod_disk_cache does not itself pay attention to how much space is used you should ensure that htcacheclean (p. 59) is configured to leave enough "grow room" following a clean.

ISBN 9781906966034The Apache HTTP Server Reference ManualSee the print edition