The Apache HTTP Server Reference Manual
by Apache Software Foundation
There are two main stages in mod_cache that can occur in the lifetime of a request. First, mod_cache is a URL mapping module, which means that if a URL has been cached, and the cached version of that URL has not expired, the request will be served directly by mod_cache.
This means that any other stages that might ordinarily happen in the process of serving a request – for example being handled by mod_proxy, or mod_rewrite – won’t happen. But then this is the point of caching content in the first place.
If the URL is not found within the cache, mod_cache will add a filter (p. 1497) to the request handling. After Apache has located the content by the usual means, the filter will be run as the content is served. If the content is determined to be cacheable, the content will be saved to the cache for future serving.
If the URL is found within the cache, but also found to have expired, the filter is added anyway, but mod_cache will create a conditional request to the backend, to determine if the cached version is still current. If the cached version is still current, its meta-information will be updated and the request will be served from the cache. If the cached version is no longer current, the cached version will be deleted and the filter will save the updated content to the cache as it is served.
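The three outcomes described above (not cached, cached and fresh, cached but expired) can be sketched as a small decision function. This is an illustration only, not mod_cache's implementation: the `Entry` type, the `lookup` function and the returned labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    body: bytes
    expires_at: float  # absolute expiry time, in seconds (hypothetical field)


def lookup(cache: dict, url: str, now: float) -> str:
    """Return which of the three paths a request for `url` would take."""
    entry = cache.get(url)
    if entry is None:
        # Not cached: add the save filter and let normal handling run.
        return "miss"
    if now < entry.expires_at:
        # Cached and not expired: serve directly from the cache.
        return "fresh"
    # Cached but expired: add the filter and revalidate with the backend.
    return "stale"


cache = {"/a": Entry(b"hello", expires_at=100.0)}
```

In the "stale" case the cache would then send a conditional request to the backend and either refresh the entry's meta-information or replace the entry.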
When caching locally generated content, ensuring that UseCanonicalName is set to On can dramatically improve the ratio of cache hits. This is because the hostname of the virtual host serving the content forms part of the cache key. With the setting On, virtual hosts with multiple server names or aliases will not produce differently cached entities; instead, content will be cached under the canonical hostname.
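To see why the hostname matters, consider a deliberately simplified key made of hostname plus path. The real key format used by mod_cache is not reproduced here; `cache_key` and its parameters are assumptions made purely for illustration.

```python
def cache_key(request_host: str, path: str,
              canonical_name: str, use_canonical_name: bool) -> str:
    """Build a simplified, hypothetical cache key from hostname and path."""
    # With UseCanonicalName On, the canonical server name replaces whatever
    # hostname the client happened to use in its request.
    host = canonical_name if use_canonical_name else request_host
    return f"{host.lower()}{path}"


aliases = ["www.example.com", "example.com"]
keys_on = {cache_key(h, "/index.html", "www.example.com", True) for h in aliases}
keys_off = {cache_key(h, "/index.html", "www.example.com", False) for h in aliases}
```

With the canonical name in use, both aliases collapse to one key, so a single cached entity serves both; without it, each alias produces its own entry.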
Because caching is performed within the URL to filename translation phase, cached documents will only be served in response to URL requests. Ordinarily this is of little consequence, but it does matter if you are using Server Side Includes (p. 1341):
<!-- The following include can be cached -->
<!--#include virtual="/footer.html" -->
<!-- The following include can not be cached -->
<!--#include file="/path/to/footer.html" -->
If you are using Server Side Includes, and want the benefit of speedy serves from the cache, you should use virtual include types.
The default expiry period for cached entities is one hour; however, this can easily be overridden with the CacheDefaultExpire directive. This default is only used when the original source of the content does not specify an expiry time or a time of last modification.
When content expires from the cache and is re-requested from the backend or content provider, rather than pass on the original request, Apache will use a conditional request instead.
HTTP offers a number of headers which allow a client or cache to distinguish between different versions of the same content. For example, if a resource was served with an "ETag:" header, it is possible to make a conditional request with an "If-None-Match:" header. If a resource was served with a "Last-Modified:" header, it is possible to make a conditional request with an "If-Modified-Since:" header, and so on.
When such a conditional request is made, the response differs depending on whether the content matches the conditions. If a request is made with an "If-Modified-Since:" header, and the content has not been modified since the time indicated in the request then a terse "304 Not Modified" response is issued.
If the content has changed, then it is served as if the request were not conditional to begin with.
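As a rough sketch of how a content provider might answer such a request, the function below returns only the status code. The name `conditional_status` and the exact-match comparison of entity tags are simplifying assumptions; real servers also handle tag lists, weak validators and malformed dates.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(etag: str, mtime: datetime, headers: dict) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if_none_match = headers.get("If-None-Match")
    if if_none_match is not None:
        # Simplified: compare a single entity tag for exact equality.
        return 304 if if_none_match == etag else 200
    if_modified_since = headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # Unchanged since the stated time: a terse 304 is enough.
        return 304 if mtime <= since else 200
    # Not a conditional request: serve the content normally.
    return 200


mtime = datetime(2020, 1, 1, tzinfo=timezone.utc)
```

A 200 result stands for the full response, served as though the request had not been conditional.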
The benefits of conditional requests in relation to caching are twofold. Firstly, when making such a request to the backend, if the content from the backend matches the content in the store, this can be determined easily and without the overhead of transferring the entire resource.
Secondly, conditional requests are usually less strenuous on the backend. For static files, typically all that is involved is a call to stat() or a similar system call to see whether the file has changed in size or modification time. As such, even when Apache is caching local content, expired content may still be served faster from the cache if it has not changed, as long as reading from the cache store is faster than reading from the backend (e.g. an in-memory cache compared to reading from disk).
As mentioned already, the two styles of caching in Apache work differently. mod_file_cache maintains file contents as they were when Apache was started. When a request is made for a file that is cached by this module, it is intercepted and the cached file is served.
mod_cache caching, on the other hand, is more complex. When serving a request that has not been cached previously, the caching module will determine whether the content is cacheable. The conditions for determining the cacheability of a response are:
- Caching must be enabled for this URL. See the CacheEnable and CacheDisable directives.
- The response must have an HTTP status code of 200, 203, 300, 301 or 410.
- The request must be an HTTP GET request.
- If the request contains an "Authorization:" header, the response will not be cached.
- If the response contains an "Authorization:" header, it must also contain an "s-maxage", "must-revalidate" or "public" option in the "Cache-Control:" header.
- If the URL included a query string (e.g. from an HTML form GET method) it will not be cached unless the response specifies an explicit expiration by including an "Expires:" header or the max-age or s-maxage directive of the "Cache-Control:" header, as per RFC2616 sections 13.9 and 13.2.1.
- If the response has a status of 200 (OK), the response must also include at least one of the "ETag", "Last-Modified" or "Expires" headers, or the max-age or s-maxage directive of the "Cache-Control:" header, unless the CacheIgnoreNoLastMod directive has been used to require otherwise.
- If the response includes the "private" option in a "Cache-Control:" header, it will not be stored unless the CacheStorePrivate directive has been used to require otherwise.
- Likewise, if the response includes the "no-store" option in a "Cache-Control:" header, it will not be stored unless the CacheStoreNoStore directive has been used.
- A response will not be stored if it includes a "Vary:" header containing the match-all "*".
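Most of the conditions above can be read as a single predicate. The sketch below is an approximation written for illustration, not the module's code: it skips the rule about an "Authorization:" header in the response, assumes header names arrive in the exact capitalisation shown, and uses hypothetical keyword arguments to stand in for the CacheEnable, CacheIgnoreNoLastMod, CacheStorePrivate and CacheStoreNoStore settings.

```python
CACHEABLE_STATUSES = {200, 203, 300, 301, 410}


def cc_directives(headers: dict) -> dict:
    """Parse a Cache-Control header into {lower-cased directive: value}."""
    out = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            out[name.lower()] = value
    return out


def is_cacheable(method, status, req_headers, resp_headers, *,
                 enabled=True, has_query=False, ignore_no_last_mod=False,
                 store_private=False, store_no_store=False) -> bool:
    cc = cc_directives(resp_headers)
    explicit_expiry = ("Expires" in resp_headers
                       or "max-age" in cc or "s-maxage" in cc)
    if not enabled or method != "GET" or status not in CACHEABLE_STATUSES:
        return False
    if "Authorization" in req_headers:
        return False
    if has_query and not explicit_expiry:
        return False
    if status == 200 and not ignore_no_last_mod:
        has_validator = "ETag" in resp_headers or "Last-Modified" in resp_headers
        if not (has_validator or explicit_expiry):
            return False
    if "private" in cc and not store_private:
        return False
    if "no-store" in cc and not store_no_store:
        return False
    vary = [v.strip() for v in resp_headers.get("Vary", "").split(",")]
    return "*" not in vary
```

Returning early on the first failed condition keeps each rule readable on its own; the order of the checks does not affect the result.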
In short, any content which is highly time-sensitive, or which varies depending on the particulars of the request that are not covered by HTTP negotiation, should not be cached.
If you have dynamic content which changes depending on the IP address of the requester, or changes every 5 minutes, it should almost certainly not be cached.
If on the other hand, the content served differs depending on the values of various HTTP headers, it might be possible to cache it intelligently through the use of a "Vary" header.
If a response with a "Vary" header is received by mod_cache when requesting content from the backend, it will attempt to handle it intelligently. If possible, mod_cache will detect the headers named in the "Vary" response in future requests and serve the correct cached response.
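One way to picture this is a secondary cache key built from the values of the request headers that the "Vary" header names. The sketch below assumes that model; `variant_key` is a hypothetical helper and does not describe how mod_cache stores variants internally.

```python
def variant_key(url: str, vary: str, request_headers: dict) -> tuple:
    """Combine the URL with the request headers listed in a Vary value."""
    # Header names are case-insensitive, so normalise both sides.
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # A header absent from the request contributes an empty value.
    return (url, tuple((n, lowered.get(n, "")) for n in names))


english = variant_key("/page", "Accept-Language", {"Accept-Language": "en"})
french = variant_key("/page", "Accept-Language", {"Accept-Language": "fr"})
```

Two requests whose named headers match map to the same key and can share a cached response; a request with a different value gets its own entry.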
If, for example, a response is received with a "Vary" header such as:
The Apache HTTP Server Reference Manual, ISBN 9781906966034 (see the print edition)