Apache::AxKit::CacheCleaner - delete old files from AxKit's cache directory



NAME

Apache::AxKit::CacheCleaner - delete old files from AxKit's cache directory


SYNOPSIS

Method 1: run it from a cron job (recommended method):

    axkit-cache-cleaner --max-age="1 month" --cache-dir=/var/cache/axkit

Method 2: configure it as an Apache cleanup handler in httpd.conf:

    AxCacheDir /var/cache/axkit
    PerlCleanupHandler +Apache::AxKit::CacheCleaner
    PerlSetVar AxCacheCleanerFrequency "1 day"
    PerlSetVar AxCacheCleanerMaxAge    "1 month"

To enable file locking (for either method):

    AxCacheModule +Apache::AxKit::CacheCleaner

You also need to patch AxKit's cache module to enable file locking.


DESCRIPTION

This module searches AxKit's cache directory and deletes files whose last access time is older than a specified maximum age. For example, you could run it nightly in a cron job and instruct it to delete files older than one month. Alternatively, you can configure it to run periodically during Apache's cleanup phase.

The maximum age of files to keep can be specified in terms of the following units:

    second, minute, hour, day, month, year

Fractional units are allowed, and spaces between the number and the unit are optional, that is, ``1.5months'' is allowed as well as ``1.5 months''. The unit can be specified as singular (``1 month'') or plural (``2 months'').
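A minimal sketch of how such a max-age string could be converted to seconds follows. It is not the module's actual parser: the function name parse_max_age is hypothetical, and the lengths used for a month (30 days) and a year (365 days) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Apache::AxKit::CacheCleaner.
# Seconds per unit; the 30-day month and 365-day year are assumptions.
my %SECONDS_PER = (
    second => 1,
    minute => 60,
    hour   => 60 * 60,
    day    => 24 * 60 * 60,
    month  => 30 * 24 * 60 * 60,
    year   => 365 * 24 * 60 * 60,
);

# Convert strings such as "1 month", "1.5months" or "2 days" to seconds.
# The space is optional, fractions are allowed, and a trailing "s" is accepted.
sub parse_max_age {
    my ($spec) = @_;
    my ($number, $unit) =
        $spec =~ /^\s*(\d+(?:\.\d+)?|\.\d+)\s*(second|minute|hour|day|month|year)s?\s*$/i
        or die "Invalid max age: '$spec'\n";
    return $number * $SECONDS_PER{ lc $unit };
}

printf "%s => %d seconds\n", $_, parse_max_age($_)
    for ('1 month', '1.5months', '2 days', '90 minutes');
```

Anything that does not match the number-plus-unit pattern is rejected with an error rather than silently treated as zero, so a typo in the configuration cannot cause the whole cache to be deleted.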

File locking can be enabled so that a file is not deleted from the cache just as it is about to be used again to serve a request. This locking prevents the browser from displaying a ``404 Not Found'' error message. To enable it, two steps are required:

  1. Configure AxKit to use this CacheCleaner as its caching module in httpd.conf:
        AxCacheModule +Apache::AxKit::CacheCleaner

  2. Patch AxKit to allow a custom module to override the initialization of the cache. See below for details.

Note about AxCacheDir

It is recommended that you set the configuration directive AxCacheDir to use a specific location for AxKit's cache directory. If you do not set AxCacheDir, then AxKit automatically creates a cache subdirectory named ``.xmlstyle_cache'' under the current directory whenever it processes a document. This CacheCleaner will clean out these subdirectories when configured as an Apache cleanup handler (assuming some document located above the cache subdirectory is visited periodically). However, when run from a cron job, you must specify all the cache subdirectories as in this example:

    axkit-cache-cleaner --max-age=1month \
        --cache-dir=/var/www/.xmlstyle_cache \
        --cache-dir=/var/www/subdir1/.xmlstyle_cache \
        --cache-dir=/var/www/subdir2/.xmlstyle_cache
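If there are many such subdirectories, a short script can collect them instead of listing each one by hand. The following is a sketch using Perl's core File::Find module; the function name find_cache_dirs and the document root /var/www are assumptions, not part of this distribution.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every .xmlstyle_cache directory under $root.
sub find_cache_dirs {
    my ($root) = @_;
    my @dirs;
    find(sub {
        return unless $_ eq '.xmlstyle_cache' && -d $_;
        push @dirs, $File::Find::name;
        $File::Find::prune = 1;    # no need to descend into the cache itself
    }, $root);
    return sort @dirs;
}

# Print a command line with one --cache-dir option per cache directory.
# The document root /var/www is an assumption; adjust it for your site.
my $docroot = '/var/www';
if (-d $docroot) {
    my @options = map { "--cache-dir=$_" } find_cache_dirs($docroot);
    print join(" \\\n    ", 'axkit-cache-cleaner --max-age=1month', @options), "\n";
}
```

Because the list is regenerated each time, cache directories created by new parts of the site are picked up without editing the cron job.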

Overview of AxKit Caching

Before describing how the cache cleaner operates, this section presents a short overview of how AxKit stores cache files.

AxKit assigns a 32-character hexadecimal key for caching a document. This key is generated based on the pathname of the source document and on other data such as hostname, port, media, and style. The first four characters of this key are used to create two subdirectory levels under the cache directory, and the document is stored under this subdirectory with a filename using the remaining 28 characters of the key. For example, a document could have a key of 8c6577c54cef4c2ca1cbdf319516e6d5, and AxKit would store the document in this location:

  cachedir/8c/65/77c54cef4c2ca1cbdf319516e6d5

Additional files related to this cache entry are also stored in the cache. The content type is stored in a file with the same key and an extension of ``.type''. A list of dependencies is stored in a file with a different key, which is derived from the given key appended with '.depends' (but this string is not used as the file name extension).
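The mapping from a key to file names can be illustrated with a short sketch. The helper cache_path is hypothetical and not part of AxKit's API; it only reproduces the directory layout described above, including the ``.type'' companion file. It does not attempt to compute derived keys such as the dependency key.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: map a 32-character hex cache key to its location,
# optionally with an extension such as ".type".
sub cache_path {
    my ($cache_dir, $key, $extension) = @_;
    die "Not a 32-character hex key: $key\n" unless $key =~ /^[0-9a-f]{32}$/;
    # First two characters, next two characters, remaining 28 characters.
    my ($primary, $secondary, $rest) = $key =~ /^(..)(..)(.{28})$/;
    my $file = defined $extension ? "$rest$extension" : $rest;
    return File::Spec->catfile($cache_dir, $primary, $secondary, $file);
}

my $key = '8c6577c54cef4c2ca1cbdf319516e6d5';
print cache_path('cachedir', $key), "\n";           # the cached document
print cache_path('cachedir', $key, '.type'), "\n";  # its content type
```

On a Unix system this prints cachedir/8c/65/77c54cef4c2ca1cbdf319516e6d5 followed by the same path with ``.type'' appended.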

Furthermore, XSP stores its Perl code in a cache file with a key derived from the self-generated Perl package name appended with 'compiled XSP'. Output from the XSP script is stored in a cache file with a key derived from the self-generated Perl package name appended with any user-defined cache parameters (or '' if none are defined). However, the output cache file is not used unless it is separately enabled by defining a has_changed method in your XSP script, as described in the XSP Guide at http://axkit.org/wiki/view/AxKit/XSPGuide.

Cache Cleaning Details

This CacheCleaner module searches for files under the two levels of subdirectories, but only in subdirectories with 2-character hexadecimal names and only for files with 28-character hexadecimal names (with an optional extension). It compares each file's last access time with the current time and deletes the file if it is older than the maximum age. All types of old files are deleted: cached documents, content-type files, dependency files, XSP cache files, and anything else that matches the 28-character hexadecimal pattern.

While the files are being deleted, a list of empty subdirectories is generated, and these subdirectories are removed after all the old files are deleted.
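A simplified sketch of this scan follows. It is not the module's code: the names clean_cache, hex_entries and is_empty are hypothetical, locking is deliberately omitted, and only entries matching the hexadecimal naming patterns described above are touched.

```perl
use strict;
use warnings;
use File::Spec;

# List the entries of $dir whose names match $pattern (empty list on error).
sub hex_entries {
    my ($dir, $pattern) = @_;
    opendir(my $dh, $dir) or return;
    my @names = grep { $_ =~ $pattern } readdir $dh;
    closedir $dh;
    return sort @names;
}

# True if $dir contains nothing but "." and "..".
sub is_empty {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return 0;
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return !@entries;
}

# Hypothetical sketch of the scan (no locking): delete matching files whose
# last access time is more than $max_age seconds ago, then remove the
# subdirectories left empty. Returns the number of files deleted.
sub clean_cache {
    my ($cache_dir, $max_age) = @_;
    my $cutoff  = time - $max_age;
    my $deleted = 0;
    my (@empty_dirs, @primary_dirs);

    for my $primary (hex_entries($cache_dir, qr/^[0-9a-f]{2}$/)) {
        my $pdir = File::Spec->catdir($cache_dir, $primary);
        next unless -d $pdir;
        push @primary_dirs, $pdir;
        for my $secondary (hex_entries($pdir, qr/^[0-9a-f]{2}$/)) {
            my $sdir = File::Spec->catdir($pdir, $secondary);
            next unless -d $sdir;
            for my $name (hex_entries($sdir, qr/^[0-9a-f]{28}(?:\.\w+)?$/)) {
                my $file  = File::Spec->catfile($sdir, $name);
                my $atime = (stat $file)[8];    # last access time
                next unless defined $atime && -f _ && $atime < $cutoff;
                $deleted++ if unlink $file;
            }
            push @empty_dirs, $sdir if is_empty($sdir);
        }
    }

    # Empty directories are removed only after all old files are deleted.
    rmdir $_ for @empty_dirs;
    rmdir $_ for grep { is_empty($_) } @primary_dirs;
    return $deleted;
}
```

Deferring the rmdir calls until the end keeps the deletion loop simple and matches the two-phase order described above; rmdir would refuse to remove a non-empty directory in any case.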

When configured as an Apache Cleanup handler, this module uses the timestamp of a ``last_cleaned'' file in the cache directory to keep track of when the cache was last cleaned. It will not clean the cache more often than the period specified by AxCacheCleanerFrequency. When it is time to clean the cache, an exclusive lock is obtained on the ``last_cleaned'' file so that only one Apache process/thread is used to clean the cache and the others are free to continue serving requests.
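The frequency check and the single-cleaner guarantee could look like the sketch below, assuming the modification time of the ``last_cleaned'' file records when the last run started. The function name should_clean is hypothetical, and the real module may differ in detail (for example, in how it treats the very first run).

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use File::Spec;

# Hypothetical sketch. Returns a filehandle holding an exclusive lock on
# cachedir/last_cleaned if this process should clean the cache now, or
# nothing if the cache was cleaned recently or another process is cleaning it.
sub should_clean {
    my ($cache_dir, $frequency) = @_;    # $frequency in seconds
    my $stamp = File::Spec->catfile($cache_dir, 'last_cleaned');

    # Too early: the last run started less than $frequency seconds ago.
    my $before = (stat $stamp)[9];
    return if defined $before && time - $before < $frequency;

    sysopen(my $fh, $stamp, O_RDWR | O_CREAT) or return;

    # Non-blocking exclusive lock: if another process is already cleaning,
    # give up at once so this one can go on serving requests.
    flock($fh, LOCK_EX | LOCK_NB) or return;

    # If the timestamp moved after we checked it, another process has just run.
    my $after = (stat $fh)[9];
    return if defined $before && $after != $before;

    utime undef, undef, $stamp;    # record the start of this run
    return $fh;
}
```

A caller would clean the cache only when a handle is returned and keep that handle open until it finishes; closing it (or letting it go out of scope) releases the lock. Using LOCK_NB rather than a blocking lock is what keeps the other Apache processes free to serve requests.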

Do I Really Need to Clean AxKit's Cache?

The AxKit cache improves performance, so you should not clean it unless it is really necessary. If your web site is made up of static XML files, then there should be no reason to clean the cache (except perhaps after a large update in which many old source documents are deleted or renamed). Even if your web site includes some dynamic pages such as XSP, there should still be no reason to clean the cache, because you would have only a limited number of source files. (Only the source code for your dynamic pages is cached by default; the generated output document is not served from the cache unless it is specifically configured to be.)

The main reason to clean AxKit's cache is if your web site caches individualized (per-user) pages and you have a large user base. In this situation, the cache will continue to grow as more and more users visit your web site. This is especially true if the dynamically generated individual pages are dated. For example, if your web site generates monthly account statements for thousands of users on the fly, then you should probably clean old statements out of the cache directory periodically before they fill up the disk.

Locking Details

Without some form of cache file locking, there is a race condition in which this CacheCleaner module could delete a cache file just as AxKit is preparing to serve it to the browser. Here is the scenario: assume there is a cache file old enough to be deleted, and a user requests the resource corresponding to this cache file. AxKit checks the dependencies and decides to use the cache file. If the cache cleaner deletes the file right after AxKit has performed its checks and right before AxKit hands the cache file name to Apache to be served, a ``404 Not Found'' error is generated.

To avoid this possibility, locking has been implemented so that this module does not delete an old cache file if AxKit is in the process of considering whether to use it.

Two stages of locking are used. A primary lock file is used to lock out the removal of secondary cache directories. This lock file is located in the top-level cache directory. Secondary lock files are used to lock out the deletion of cache files under the secondary directories. These lock files are located under the secondary directories (one in each directory).

Locking During Request

For each request, a shared lock is obtained on the primary lock file just before AxKit initializes the cache object. This ensures that a secondary subdirectory will not be removed before AxKit has a chance to review a potential cache file under it. After AxKit has initialized the cache object (and created the secondary subdirectory if it did not already exist), a shared lock is obtained on the secondary lock file, and then the primary lock is released. The secondary lock ensures that a cache file will not be deleted while AxKit considers whether to use it. After AxKit has finished serving the request, that is, after it has sent the document to the browser, the secondary lock is released.
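The request-time sequence might look like the following sketch. The lock file name ``.lock'' and the functions open_lock and lock_for_request are hypothetical, and the callback passed as $init_cache merely stands in for AxKit's cache initialization; the real module performs these steps through the overridden initialization described under ``Patching AxKit''.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

my $LOCK_NAME = '.lock';    # hypothetical lock file name

# Open (creating if needed) the lock file in $dir.
sub open_lock {
    my ($dir) = @_;
    my $path = File::Spec->catfile($dir, $LOCK_NAME);
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "Cannot open $path: $!\n";
    return $fh;
}

# Hypothetical sketch of the request-time sequence. $init_cache stands in
# for AxKit's cache initialization, which may create the secondary directory.
# Returns the secondary lock handle; hold it until the response is sent.
sub lock_for_request {
    my ($cache_dir, $key, $init_cache) = @_;

    # 1. Shared primary lock: the cleaner cannot remove secondary directories.
    my $primary = open_lock($cache_dir);
    flock($primary, LOCK_SH) or die "flock: $!\n";

    # 2. Initialize the cache entry (creates cachedir/xx/yy if needed).
    my $secondary_dir =
        File::Spec->catdir($cache_dir, substr($key, 0, 2), substr($key, 2, 2));
    $init_cache->($secondary_dir);

    # 3. Shared secondary lock: the cleaner cannot delete files in this directory.
    my $secondary = open_lock($secondary_dir);
    flock($secondary, LOCK_SH) or die "flock: $!\n";

    # 4. Release the primary lock; the secondary lock now protects the entry.
    close $primary;
    return $secondary;
}

my $cache_dir = tempdir(CLEANUP => 1);
my $lock = lock_for_request($cache_dir, '8c6577c54cef4c2ca1cbdf319516e6d5',
                            sub { make_path($_[0]) });
# ... AxKit would check dependencies and serve the cached file here ...
close $lock;    # release the secondary lock once the response is sent
```

Shared locks let any number of concurrent requests proceed together; only the cleaner, which needs exclusive locks, has to wait for them.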

Locking During Cleanup

When the cache cleaner is scheduled to run, it scans each secondary subdirectory one at a time and makes a list of cache files with a last access time older than the specified maximum age. For each subdirectory, it obtains an exclusive lock on the secondary lock file, deletes the old cache files, and releases the lock. If all the files were deleted, it adds the subdirectory to the list of empty directories which will be removed later. After the cache cleaner finishes scanning all the subdirectories and deleting all the old cache files, it then obtains an exclusive lock on the primary lock file, removes any empty subdirectories, and releases the lock.
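The cleanup side can be sketched in the same way with exclusive locks. This is an illustration of the lock ordering only, under stated assumptions: the ``.lock'' file name and the functions exclusive_lock, leftover_entries and delete_with_locks are hypothetical, and the list of old files per secondary directory is passed in rather than produced by a scan. As a design choice of this sketch, each empty directory is re-checked under the primary lock, and skipped if a request has locked it since the scan.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use File::Spec;

my $LOCK_NAME = '.lock';    # hypothetical lock file name

# Open (creating if needed) the lock file in $dir and lock it exclusively.
# Blocks until requests holding shared locks have released them.
sub exclusive_lock {
    my ($dir) = @_;
    my $path = File::Spec->catfile($dir, $LOCK_NAME);
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "Cannot open $path: $!\n";
    flock($fh, LOCK_EX) or die "Cannot lock $path: $!\n";
    return $fh;
}

# Number of entries in $dir other than ".", ".." and the lock file.
sub leftover_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return 0;
    my @left = grep { $_ ne '.' && $_ ne '..' && $_ ne $LOCK_NAME } readdir $dh;
    closedir $dh;
    return scalar @left;
}

# $old_files maps each secondary directory to the old files found in it.
# Returns the number of secondary directories removed.
sub delete_with_locks {
    my ($cache_dir, $old_files) = @_;
    my @empty;

    for my $dir (sort keys %$old_files) {
        my $lock = exclusive_lock($dir);          # secondary lock
        unlink @{ $old_files->{$dir} };
        push @empty, $dir unless leftover_entries($dir);
        close $lock;                              # release secondary lock
    }

    my $primary = exclusive_lock($cache_dir);     # primary lock
    my $removed = 0;
    for my $dir (@empty) {
        my $path = File::Spec->catfile($dir, $LOCK_NAME);
        sysopen(my $fh, $path, O_RDWR) or next;
        # Skip the directory if a request locked it or added files since the scan.
        flock($fh, LOCK_EX | LOCK_NB) or next;
        next if leftover_entries($dir);
        unlink $path;
        close $fh;
        $removed++ if rmdir $dir;
    }
    close $primary;                               # release primary lock
    return $removed;
}
```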

Patching AxKit

It is necessary to patch AxKit to allow this module to override the initialization of the cache so that locks can be obtained before cache subdirectories are created. The included patch can be applied to versions 1.6.X and 1.7.0 of AxKit. Without the patch, locking cannot be enabled and a race condition exists where this cleanup module may delete a cache file just as it is about to be served, leading to a ``404 Not Found'' error message reported to the browser.

Why is the patch needed?

AxKit provides four directives ``to specify a new class to replace the one being used for certain operations.'' These are:

    AxConfigReader     default: Apache::AxKit::ConfigReader
    AxContentProvider  default: Apache::AxKit::Provider
    AxStyleProvider    default: Apache::AxKit::Provider
    AxCacheModule      default: Apache::AxKit::Cache

The key phrase is ``certain operations''. Unlike the other modules, the default AxCacheModule does not provide an overridable method for the initialization of that object. Specifically, the default AxConfigReader provides a get_config method, and the default AxContentProvider and AxStyleProvider modules provide an init method. However, the default AxCacheModule does its initialization in the new method which cannot be overridden.

What does the patch do?

The included patch moves most of the initialization logic out of the AxCacheModule's new method and into an init method that can be overridden. This allows the CacheCleaner module to obtain locks on the cache before it is initialized.
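The idea behind the patch can be illustrated with plain Perl classes. The class names My::Cache and My::LockingCache are hypothetical stand-ins, not AxKit's real classes, and their arguments are simplified: the base class's new delegates to init, so a subclass can run its own code (such as taking a lock) before and after the inherited initialization.

```perl
use strict;
use warnings;

package My::Cache;    # hypothetical stand-in for the patched cache class

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->init(%args);    # initialization lives in an overridable method
    return $self;
}

sub init {
    my ($self, %args) = @_;
    $self->{key} = $args{key};
    push @{ $self->{log} }, 'init';
}

package My::LockingCache;    # hypothetical stand-in for a cleaner-aware subclass
our @ISA = ('My::Cache');

sub init {
    my ($self, %args) = @_;
    push @{ $self->{log} }, 'lock primary';    # e.g. take the shared primary lock
    $self->SUPER::init(%args);
    push @{ $self->{log} }, 'lock secondary, release primary';
}

package main;

my $cache = My::LockingCache->new(key => '8c6577c54cef4c2ca1cbdf319516e6d5');
print join(' -> ', @{ $cache->{log} }), "\n";
```

This prints ``lock primary -> init -> lock secondary, release primary''. When all the work happens inside new itself, a subclass has no comparable hook between object creation and initialization.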

Do I really need to install the patch?

All the patch does is eliminate the possibility of a ``404 Not Found'' error message appearing. To decide whether you need to install it, consider: (1) how likely this error is to happen, and (2) what ill effects it will have if it does happen. If the error is unlikely, and it is acceptable for a user to receive the error message once in a while, then you do not need to install the patch.

Here are some recommendations if you decide not to install the patch:


SEE ALSO

AxKit


AUTHOR

Ken Neighbors <ken@nsds.com>


COPYRIGHT and LICENSE

Copyright (c) 2006 Ken Neighbors. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
