Apache::AxKit::CacheCleaner - delete old files from AxKit's cache directory
Method 1: run it from a cron job (recommended method):
axkit-cache-cleaner --max-age="1 month" --cache-dir=/var/cache/axkit
Method 2: configure it as an Apache cleanup handler in httpd.conf:
AxCacheDir /var/cache/axkit
PerlCleanupHandler +Apache::AxKit::CacheCleaner
PerlSetVar AxCacheCleanerFrequency "1 day"
PerlSetVar AxCacheCleanerMaxAge "1 month"
To enable file locking (for either method):
AxCacheModule +Apache::AxKit::CacheCleaner
You also need to patch AxKit's cache module to enable file locking.
This module searches AxKit's cache directory and deletes files whose last access time is older than a specific maximum age. For example, you could run it nightly in a cron job and instruct it to delete files older than one month. Alternatively, you can configure it to run periodically during Apache's cleanup phase.
The maximum age of files to keep can be specified in terms of the following units:
second, minute, hour, day, month, year
Fractional units are allowed, and spaces between the number and the
unit are optional; that is, ``1.5months'' is allowed as well as
``1.5 months''. The unit can be specified as singular (``1 month'')
or plural (``2 months'').
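The age syntax described above can be illustrated with a short parser. This is a minimal sketch in Python, not the module's own Perl code: the function name parse_max_age is hypothetical, and the lengths used for ``month'' (30 days) and ``year'' (365 days) are assumptions, since this document does not say how the module defines them.

```python
import re

# Seconds per unit. The month and year lengths are assumptions made for
# this sketch; the module's own definitions are not stated here.
SECONDS_PER_UNIT = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "month": 30 * 86400,   # assumption: 30-day month
    "year": 365 * 86400,   # assumption: 365-day year
}

# A number (optionally fractional), optional spaces, then a unit name
# with an optional plural "s".
_AGE_RE = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*([a-z]+?)s?\s*$", re.IGNORECASE)


def parse_max_age(text):
    """Convert a string such as '1.5 months' into a number of seconds."""
    match = _AGE_RE.match(text)
    if not match:
        raise ValueError(f"unrecognized age: {text!r}")
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unknown unit in age: {text!r}")
    return float(number) * SECONDS_PER_UNIT[unit]
```

Under these assumptions, ``1.5months'' and ``1.5 months'' parse to the same value, and singular and plural unit names are treated alike.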
File locking can be enabled so that a file is not deleted from the
cache when it is about to be used again to serve a request. This
locking prevents the browser from displaying a ``404 Not Found''
error message. To enable it, two steps are required:
1. Set the cache module in httpd.conf: AxCacheModule +Apache::AxKit::CacheCleaner
2. Patch AxKit to allow a custom module to override the initialization of the cache. See below for details.
It is recommended that you set the configuration directive AxCacheDir to use a specific location for AxKit's cache directory. If you do not set AxCacheDir, then AxKit automatically creates a cache subdirectory named ``.xmlstyle_cache'' under the current directory whenever it processes a document. This CacheCleaner will clean out these subdirectories when configured as an Apache cleanup handler (assuming some document located above the cache subdirectory is visited periodically). However, when run from a cron job, you must specify all the cache subdirectories as in this example:
axkit-cache-cleaner --max-age=1month \
    --cache-dir=/var/www/.xmlstyle_cache \
    --cache-dir=/var/www/subdir1/.xmlstyle_cache \
    --cache-dir=/var/www/subdir2/.xmlstyle_cache
Before describing how the cache cleaner operates, this section presents a short overview of how AxKit stores cache files.
AxKit assigns a 32-character hexadecimal key for caching a document. This key is generated based on the pathname of the source document and on other data such as hostname, port, media, and style. The first four characters of this key are used to create two subdirectory levels under the cache directory, and the document is stored under this subdirectory with a filename using the remaining 28 characters of the key. For example, a document could have a key of 8c6577c54cef4c2ca1cbdf319516e6d5, and AxKit would store the document in this location:
cachedir/8c/65/77c54cef4c2ca1cbdf319516e6d5
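The mapping from key to file location can be sketched as follows. This is a Python illustration of the layout described above, not AxKit's code; the function name cache_path is hypothetical, and the check for lowercase hexadecimal is an assumption based on the example key.

```python
import posixpath
import re

# Assumption: keys are 32 lowercase hexadecimal characters, as in the example.
_KEY_RE = re.compile(r"^[0-9a-f]{32}$")


def cache_path(cache_dir, key):
    """Return the cache file location for a 32-character hexadecimal key."""
    if not _KEY_RE.match(key):
        raise ValueError(f"not a 32-character hexadecimal key: {key!r}")
    # Characters 1-2 and 3-4 name the two subdirectory levels;
    # the remaining 28 characters are the file name.
    return posixpath.join(cache_dir, key[0:2], key[2:4], key[4:])
```

Calling cache_path("cachedir", "8c6577c54cef4c2ca1cbdf319516e6d5") yields the location shown in the example above.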
Additional files related to this cache entry are also stored in the cache. The content type is stored in a file with the same key and an extension of ``.type''. A list of dependencies is stored in a file with a different key, which is derived from the given key appended with '.depends' (but this string is not used as the file name extension).
Furthermore, XSP stores its Perl code in a cache file with a key
derived from the self-generated Perl package name appended with
'compiled XSP'. Output from the XSP script is stored in a cache file
with a key derived from the self-generated Perl package name appended
with any user-defined cache parameters (or '' if none defined).
However, the output cache file is not used unless it is separately
enabled by defining a has_changed method in your XSP script, as
described in the XSP guide at http://axkit.org/wiki/view/AxKit/XSPGuide.
This CacheCleaner module searches for files under the two levels of subdirectories, but only for 2-character hexadecimal subdirectory names and only for 28-character hexadecimal filenames (with an optional extension). It compares the access timestamp to the current time, and if it is too old then the file is deleted. All types of old files are deleted: cached documents, content-type files, dependency files, XSP cache files, and anything else that matches the 28-character hexadecimal pattern.
While the files are being deleted, a list of empty subdirectories is generated, and these subdirectories are removed after all the old files are deleted.
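The scan-and-delete pass described above might look roughly like this. It is a Python sketch under stated assumptions, not the module's Perl implementation: the function name clean_cache is hypothetical, hexadecimal names are assumed to be lowercase, and only second-level subdirectories are removed when empty. Locking is omitted here.

```python
import os
import re
import time

SUBDIR_RE = re.compile(r"^[0-9a-f]{2}$")            # 2-character hex directory
FILE_RE = re.compile(r"^[0-9a-f]{28}(\.\w+)?$")     # 28-character hex file, optional extension


def clean_cache(cache_dir, max_age_seconds, now=None):
    """Delete old cache files, then remove second-level directories left empty."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    deleted, empty_dirs = [], []
    for first in sorted(os.listdir(cache_dir)):
        first_path = os.path.join(cache_dir, first)
        if not (SUBDIR_RE.match(first) and os.path.isdir(first_path)):
            continue
        for second in sorted(os.listdir(first_path)):
            second_path = os.path.join(first_path, second)
            if not (SUBDIR_RE.match(second) and os.path.isdir(second_path)):
                continue
            remaining = 0
            for name in sorted(os.listdir(second_path)):
                path = os.path.join(second_path, name)
                # Compare the last access time with the cutoff.
                if (FILE_RE.match(name) and os.path.isfile(path)
                        and os.stat(path).st_atime < cutoff):
                    os.remove(path)
                    deleted.append(path)
                else:
                    remaining += 1
            if remaining == 0:
                empty_dirs.append(second_path)
    # Directories are removed only after all old files have been deleted.
    for path in empty_dirs:
        os.rmdir(path)
    return deleted, empty_dirs
```

Files that do not match the 28-character pattern are left alone, and a directory that still contains any file is kept.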
When configured as an Apache Cleanup handler, this module uses the timestamp of a ``last_cleaned'' file in the cache directory to keep track of when the cache was last cleaned. It will not clean the cache more often than the period specified by AxCacheCleanerFrequency. When it is time to clean the cache, an exclusive lock is obtained on the ``last_cleaned'' file so that only one Apache process/thread is used to clean the cache and the others are free to continue serving requests.
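The frequency check can be sketched as below. This Python illustration is not the module's code: the function name try_begin_cleaning is hypothetical, and the use of a non-blocking flock() and of the file's modification time are assumptions about how ``timestamp'' and ``exclusive lock'' are realized.

```python
import fcntl
import os
import time


def try_begin_cleaning(cache_dir, frequency_seconds, now=None):
    """Return a locked handle on ``last_cleaned`` if cleaning is due, else None.

    The caller cleans the cache and then closes the handle, which
    releases the lock.
    """
    now = time.time() if now is None else now
    marker = os.path.join(cache_dir, "last_cleaned")
    is_new = not os.path.exists(marker)
    handle = open(marker, "a")  # creates the file without truncating it
    try:
        # Non-blocking (an assumption): if another process holds the lock,
        # give up at once so this process can go back to serving requests.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    last = os.fstat(handle.fileno()).st_mtime
    if not is_new and now - last < frequency_seconds:
        handle.close()  # cleaned recently enough; closing releases the lock
        return None
    os.utime(marker, (now, now))  # record the time of this cleaning
    return handle
```

A cleanup handler built this way would call the function on each request and run the cleaner only when a handle is returned.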
The AxKit cache improves performance, so you should not clean it unless really necessary. If your web site is made up of static XML files, then there should be no reason to clean the cache (except perhaps after a large update in which many old source documents are deleted or renamed). Even if your web site includes some dynamic pages such as XSP, there should still be no reason to clean the cache because you would have only a limited number of source files. (Only the source code for your dynamic page is cached by default. The generated output document is not served from the cache without specific configuration to support it.)
The main reason to clean AxKit's cache would be if your web site caches individualized pages (per user) and you have a large user base. In this situation, the cache would continue to grow as more and more users made use of your web site. This is especially true if the dynamically generated individual pages are dated. For example, if your web site generates monthly account statements for thousands of users on the fly, then you should probably clean out old statements from the cache directory periodically before they fill up the disk.
Without some form of cache file locking, a race condition exists where
it is possible that this CacheCleaner module will delete a cache file
just as AxKit is preparing to serve it to the browser. Here is the
scenario: assume there is a cache file old enough to be deleted.
Suppose a user requests the resource corresponding to this cache file.
AxKit checks the dependencies and decides to use the cache file. But
if the cache cleaner deletes it right after AxKit has performed its
checks and right before AxKit turns over the cache file name to Apache
to be served, a ``404 Not Found'' error will be generated.
To avoid this possibility, locking has been implemented so that this module does not delete an old cache file if AxKit is in the process of considering whether to use it.
Two stages of locking are used. A primary lock file is used to lock out the removal of secondary cache directories. This lock file is located in the top-level cache directory. Secondary lock files are used to lock out the deletion of cache files under the secondary directories. These lock files are located under the secondary directories (one in each directory).
For each request, a shared lock is obtained on the primary lock file just before AxKit initializes the cache object. This ensures that a secondary subdirectory will not be removed before AxKit has a chance to review a potential cache file under it. After AxKit has initialized the cache object (and created the secondary subdirectory if it did not already exist), a shared lock is obtained on the secondary lock file, and then the primary lock is released. The secondary lock ensures that a cache file will not be deleted while AxKit considers whether to use it. After AxKit has finished serving the request, that is, after it has sent the document to the browser, the secondary lock is released.
When the cache cleaner is scheduled to run, it scans each secondary subdirectory one at a time and makes a list of cache files with a last access time older than the specified maximum age. For each subdirectory, it obtains an exclusive lock on the secondary lock file, deletes the old cache files, and releases the lock. If all the files were deleted, it adds the subdirectory to the list of empty directories which will be removed later. After the cache cleaner finishes scanning all the subdirectories and deleting all the old cache files, it then obtains an exclusive lock on the primary lock file, removes any empty subdirectories, and releases the lock.
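The two-stage protocol can be sketched with flock(). This is a Python illustration only, not the module's Perl code: the lock file name ``.lock'' and the function names are hypothetical, os.makedirs() stands in for AxKit's cache initialization, and removing the secondary lock file before removing its directory is an assumption.

```python
import fcntl
import os
from contextlib import contextmanager

LOCK_NAME = ".lock"  # hypothetical name; the real lock file names are not given here


@contextmanager
def request_lock(cache_dir, key):
    """Request side: hold a shared secondary lock while a request is served."""
    secondary_dir = os.path.join(cache_dir, key[0:2], key[2:4])
    primary = open(os.path.join(cache_dir, LOCK_NAME), "a")
    fcntl.flock(primary, fcntl.LOCK_SH)   # keeps secondary directories from being removed
    try:
        os.makedirs(secondary_dir, exist_ok=True)  # stand-in for cache initialization
        secondary = open(os.path.join(secondary_dir, LOCK_NAME), "a")
        fcntl.flock(secondary, fcntl.LOCK_SH)      # keeps cache files from being deleted
    finally:
        primary.close()                   # closing releases the primary lock
    try:
        yield secondary_dir               # AxKit would consider and serve the file here
    finally:
        secondary.close()                 # released after the response is sent


def delete_files_locked(secondary_dir, old_files):
    """Cleaner side: delete old files under an exclusive secondary lock."""
    with open(os.path.join(secondary_dir, LOCK_NAME), "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # waits until no request holds a shared lock
        for path in old_files:
            os.remove(path)


def remove_dirs_locked(cache_dir, empty_dirs):
    """Cleaner side: remove empty directories under an exclusive primary lock."""
    with open(os.path.join(cache_dir, LOCK_NAME), "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        for directory in empty_dirs:
            lock_path = os.path.join(directory, LOCK_NAME)
            if os.path.exists(lock_path):
                os.remove(lock_path)      # assumption: drop the lock file first
            if not os.listdir(directory):
                os.rmdir(directory)
```

Because requests take shared locks, any number of them can proceed at once; only the cleaner's exclusive locks wait for them to finish.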
It is necessary to patch AxKit to allow this module to override the
initialization of the cache so that locks can be obtained before cache
subdirectories are created. The included patch can be applied to
versions 1.6.X and 1.7.0 of AxKit. Without the patch, locking cannot
be enabled and a race condition exists where this cleanup module may
delete a cache file just as it is about to be served, leading to a
``404 Not Found'' error message reported to the browser.
AxKit provides four directives ``to specify a new class to replace the one being used for certain operations.'' These are:
AxConfigReader     default: Apache::AxKit::ConfigReader
AxContentProvider  default: Apache::AxKit::Provider
AxStyleProvider    default: Apache::AxKit::Provider
AxCacheModule      default: Apache::AxKit::Cache
The key phrase is ``certain operations''. Unlike the other modules, the
default AxCacheModule does not provide an overridable method for the
initialization of that object. Specifically, the default
AxConfigReader provides a get_config method, and the default
AxContentProvider and AxStyleProvider modules provide an init method.
However, the default AxCacheModule does its initialization in the new
method, which cannot be overridden.
The included patch moves most of the initialization logic out of the
AxCacheModule's new method and places it into an init method, which
can be overridden. This patch allows this CacheCleaner module to
obtain locks on the cache before it is initialized.
All the patch does is get rid of the possibility of a ``404 Not
Found'' error message appearing. To decide whether you need to install
the patch, consider: (1) how likely is this error to happen, and
(2) if it does happen, what ill effects will it have? If it is
unlikely that the error will happen, and if it is acceptable for a
user to receive the error message once in a while, then you do not
need to install the patch.
Here are some recommendations if you decide not to install the patch:
Ken Neighbors <ken@nsds.com>
Copyright (c) 2006 Ken Neighbors. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.