- publishing free software manuals
PostgreSQL Reference Manual - Volume 3 - Server Administration Guide
by The PostgreSQL Global Development Group
Paperback (6"x9"), 204 pages
ISBN 0954612043
RRP £13.95 ($24.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

10.3.1 Setting up WAL archiving

In an abstract sense, a running PostgreSQL system produces an indefinitely long sequence of WAL records. The system physically divides this sequence into WAL segment files, which are normally 16MB apiece (although the size can be altered when building PostgreSQL). The segment files are given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then “recycles” them by renaming no-longer-needed segment files to higher segment numbers. It's assumed that a segment file whose contents precede the checkpoint-before-last is no longer of interest and can be recycled.

When archiving WAL data, we want to capture the contents of each segment file once it is filled, and save that data somewhere before the segment file is recycled for reuse. Depending on the application and the available hardware, there could be many different ways of “saving the data somewhere”: we could copy the segment files to an NFS-mounted directory on another machine, write them onto a tape drive (ensuring that you have a way of identifying the original name of each file), or batch them together and burn them onto CDs, or something else entirely. To provide the database administrator with as much flexibility as possible, PostgreSQL tries not to make any assumptions about how the archiving will be done. Instead, PostgreSQL lets the administrator specify a shell command to be executed to copy a completed segment file to wherever it needs to go. The command could be as simple as a cp, or it could invoke a complex shell script--it's all up to you.

The shell command to use is specified by the archive_command configuration parameter, which in practice will always be placed in the ‘postgresql.conf’ file. In this string, any %p is replaced by the path name of the file to archive, while any %f is replaced by the file name only. (The path name is relative to the working directory of the server, i.e., the cluster's data directory.) Write %% if you need to embed an actual % character in the command. The simplest useful command is something like

archive_command = 'cp -i %p /mnt/server/archivedir/%f </dev/null'

which will copy archivable WAL segments to the directory ‘/mnt/server/archivedir’. (This is an example, not a recommendation, and may not work on all platforms.)

The archive command will be executed under the ownership of the same user that the PostgreSQL server is running as. Since the series of WAL files being archived contains effectively everything in your database, you will want to be sure that the archived data is protected from prying eyes; for example, archive into a directory that does not have group or world read access.

It is important that the archive command return zero exit status if and only if it succeeded. Upon getting a zero result, PostgreSQL will assume that the WAL segment file has been successfully archived, and will remove or recycle it. However, a nonzero status tells PostgreSQL that the file was not archived; it will try again periodically until it succeeds.

The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory). It is advisable to test your proposed archive command to ensure that it indeed does not overwrite an existing file, and that it returns nonzero status in this case. We have found that cp -i does this correctly on some platforms but not others. If the chosen command does not itself handle this case correctly, you should add a command to test for pre-existence of the archive file. For example, something like

archive_command = 'test ! -f .../%f && cp %p .../%f'

works correctly on most Unix variants.

While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved relatively quickly. The ‘pg_xlog/’ directory will continue to fill with WAL segment files until the situation is resolved.

The speed of the archiving command is not important, so long as it can keep up with the average rate at which your server generates WAL data. Normal operation continues even if the archiving process falls a little behind. If archiving falls significantly behind, this will increase the amount of data that would be lost in the event of a disaster. It will also mean that the ‘pg_xlog/’ directory will contain large numbers of not-yet-archived segment files, which could eventually exceed available disk space. You are advised to monitor the archiving process to ensure that it is working as you intend.

In writing your archive command, you should assume that the file names to be archived may be up to 64 characters long and may contain any combination of ASCII letters, digits, and dots. It is not necessary to remember the original relative path (%p) but it is necessary to remember the file name (%f).

Note that although WAL archiving will allow you to restore any modifications made to the data in your PostgreSQL database, it will not restore changes made to configuration files (that is, ‘postgresql.conf’, ‘pg_hba.conf’ and ‘pg_ident.conf’), since those are edited manually rather than through SQL operations. You may wish to keep the configuration files in a location that will be backed up by your regular file system backup procedures. See section 4.2 File Locations for how to relocate the configuration files.

The archive command is only invoked on completed WAL segments. Hence, if your server generates only little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To put a limit on how old unarchived data can be, you can set archive_timeout to force the server to switch to a new WAL segment file at least that often. Note that archived files that are ended early due to a forced switch are still the same length as completely full files. It is therefore unwise to set a very short archive_timeout---it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable.

Also, you can force a segment switch manually with pg_switch_xlog, if you want to ensure that a just-finished transaction is archived immediately. Other utility functions related to WAL management are listed in Volume 1: Table 7-47.

ISBN 0954612043PostgreSQL Reference Manual - Volume 3 - Server Administration GuideSee the print edition