| PostgreSQL Reference Manual - Volume 3 - Server Administration Guide by The PostgreSQL Global Development Group Paperback (6"x9"), 204 pages ISBN 0954612043 RRP £13.95 ($24.95) Sales of this book support the PostgreSQL project! Get a printed copy>>> |
10.3.1 Setting up WAL archiving
In an abstract sense, a running PostgreSQL system produces an indefinitely long sequence of WAL records. The system physically divides this sequence into WAL segment files, which are normally 16MB apiece (although the size can be altered when building PostgreSQL). The segment files are given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then “recycles” them by renaming no-longer-needed segment files to higher segment numbers. It's assumed that a segment file whose contents precede the checkpoint-before-last is no longer of interest and can be recycled.
When archiving WAL data, we want to capture the contents of each segment
file once it is filled, and save that data somewhere before the segment
file is recycled for reuse. Depending on the application and the
available hardware, there could be many different ways of “saving
the data somewhere”: we could copy the segment files to an NFS-mounted
directory on another machine, write them onto a tape drive (ensuring that
you have a way of identifying the original name of each file), or batch
them together and burn them onto CDs, or something else entirely. To
provide the database administrator with as much flexibility as possible,
PostgreSQL tries not to make any assumptions about how
the archiving will be done. Instead, PostgreSQL lets
the administrator specify a shell command to be executed to copy a
completed segment file to wherever it needs to go. The command could be
as simple as a cp, or it could invoke a complex shell
script--it's all up to you.
The shell command to use is specified by the archive_command configuration parameter, which in practice
will always be placed in the ‘postgresql.conf’ file.
In this string,
any %p is replaced by the path name of the file to
archive, while any %f is replaced by the file name only.
(The path name is relative to the working directory of the server,
i.e., the cluster's data directory.)
Write %% if you need to embed an actual %
character in the command. The simplest useful command is something
like
archive_command = 'cp -i %p /mnt/server/archivedir/%f </dev/null'
which will copy archivable WAL segments to the directory ‘/mnt/server/archivedir’. (This is an example, not a recommendation, and may not work on all platforms.)
The archive command will be executed under the ownership of the same user that the PostgreSQL server is running as. Since the series of WAL files being archived contains effectively everything in your database, you will want to be sure that the archived data is protected from prying eyes; for example, archive into a directory that does not have group or world read access.
It is important that the archive command return zero exit status if and only if it succeeded. Upon getting a zero result, PostgreSQL will assume that the WAL segment file has been successfully archived, and will remove or recycle it. However, a nonzero status tells PostgreSQL that the file was not archived; it will try again periodically until it succeeds.
The archive command should generally be designed to refuse to overwrite
any pre-existing archive file. This is an important safety feature to
preserve the integrity of your archive in case of administrator error
(such as sending the output of two different servers to the same archive
directory).
It is advisable to test your proposed archive command to ensure that it
indeed does not overwrite an existing file, and that it returns
nonzero status in this case. We have found that cp -i does
this correctly on some platforms but not others. If the chosen command
does not itself handle this case correctly, you should add a command
to test for pre-existence of the archive file. For example, something
like
archive_command = 'test ! -f .../%f && cp %p .../%f'
works correctly on most Unix variants.
While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved relatively quickly. The ‘pg_xlog/’ directory will continue to fill with WAL segment files until the situation is resolved.
The speed of the archiving command is not important, so long as it can keep up with the average rate at which your server generates WAL data. Normal operation continues even if the archiving process falls a little behind. If archiving falls significantly behind, this will increase the amount of data that would be lost in the event of a disaster. It will also mean that the ‘pg_xlog/’ directory will contain large numbers of not-yet-archived segment files, which could eventually exceed available disk space. You are advised to monitor the archiving process to ensure that it is working as you intend.
In writing your archive command, you should assume that the file names to
be archived may be up to 64 characters long and may contain any
combination of ASCII letters, digits, and dots. It is not necessary to
remember the original relative path (%p) but it is necessary to
remember the file name (%f).
Note that although WAL archiving will allow you to restore any modifications made to the data in your PostgreSQL database, it will not restore changes made to configuration files (that is, ‘postgresql.conf’, ‘pg_hba.conf’ and ‘pg_ident.conf’), since those are edited manually rather than through SQL operations. You may wish to keep the configuration files in a location that will be backed up by your regular file system backup procedures. See section 4.2 File Locations for how to relocate the configuration files.
The archive command is only invoked on completed WAL segments. Hence,
if your server generates only little WAL traffic (or has slack periods
where it does so), there could be a long delay between the completion
of a transaction and its safe recording in archive storage. To put
a limit on how old unarchived data can be, you can set
archive_timeout to force the server to switch
to a new WAL segment file at least that often. Note that archived
files that are ended early due to a forced switch are still the same
length as completely full files. It is therefore unwise to set a very
short archive_timeout---it will bloat your archive
storage. archive_timeout settings of a minute or so are
usually reasonable.
Also, you can force a segment switch manually with
pg_switch_xlog, if you want to ensure that a
just-finished transaction is archived immediately. Other utility
functions related to WAL management are listed in Volume 1: Table 7-47.
| ISBN 0954612043 | PostgreSQL Reference Manual - Volume 3 - Server Administration Guide | See the print edition |