| The PostgreSQL 9.0 Reference Manual - Volume 3 - Server Administration Guide
by The PostgreSQL Global Development Group Paperback (6"x9"), 274 pages ISBN 9781906966072 RRP £9.95 ($14.95) Sales of this book support the PostgreSQL project! Get a printed copy>>> |
10.3.1 Setting up WAL archiving
In an abstract sense, a running PostgreSQL system produces an indefinitely long sequence of WAL records. The system physically divides this sequence into WAL segment files, which are normally 16MB apiece (although the segment size can be altered when building PostgreSQL). The segment files are given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then “recycles” them by renaming no-longer-needed segment files to higher segment numbers. It's assumed that segment files whose contents precede the checkpoint-before-last are no longer of interest and can be recycled.
When archiving WAL data, we need to capture the contents of each segment
file once it is filled, and save that data somewhere before the segment
file is recycled for reuse. Depending on the application and the
available hardware, there could be many different ways of “saving
the data somewhere”: we could copy the segment files to an NFS-mounted
directory on another machine, write them onto a tape drive (ensuring that
you have a way of identifying the original name of each file), or batch
them together and burn them onto CDs, or something else entirely. To
provide the database administrator with flexibility,
PostgreSQL tries not to make any assumptions about how
the archiving will be done. Instead, PostgreSQL lets
the administrator specify a shell command to be executed to copy a
completed segment file to wherever it needs to go. The command could be
as simple as a cp, or it could invoke a complex shell
script--it's all up to you.
To enable WAL archiving, set the wal_level
configuration parameter to archive (or hot_standby),
archive_mode to on,
and specify the shell command to use in the archive_command configuration parameter. In practice
these settings will always be placed in the
‘postgresql.conf’ file.
In archive_command,
%p is replaced by the path name of the file to
archive, while %f is replaced by only the file name.
(The path name is relative to the current working directory,
i.e., the cluster's data directory.)
Use %% if you need to embed an actual %
character in the command. The simplest useful command is something
like:
archive_command = 'cp -i %p /mnt/server/archivedir/%f </dev/null' # Unix archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"' # Windows
which will copy archivable WAL segments to the directory
‘/mnt/server/archivedir’. (This is an example, not a
recommendation, and might not work on all platforms.) After the
%p and %f parameters have been replaced,
the actual command executed might look like this:
cp -i pg_xlog/00000001000000A900000065 /mnt/server/archivedir/ 00000001000000A900000065 </dev/null
A similar command will be generated for each new file to be archived.
The archive command will be executed under the ownership of the same user that the PostgreSQL server is running as. Since the series of WAL files being archived contains effectively everything in your database, you will want to be sure that the archived data is protected from prying eyes; for example, archive into a directory that does not have group or world read access.
It is important that the archive command return zero exit status if and only if it succeeds. Upon getting a zero result, PostgreSQL will assume that the file has been successfully archived, and will remove or recycle it. However, a nonzero status tells PostgreSQL that the file was not archived; it will try again periodically until it succeeds.
The archive command should generally be designed to refuse to overwrite
any pre-existing archive file. This is an important safety feature to
preserve the integrity of your archive in case of administrator error
(such as sending the output of two different servers to the same archive
directory).
It is advisable to test your proposed archive command to ensure that it
indeed does not overwrite an existing file, and that it returns
nonzero status in this case. On many Unix platforms, cp
-i causes copy to prompt before overwriting a file, and
< /dev/null causes the prompt (and overwriting) to
fail. If your platform does not support this behavior, you should
add a command to test for the existence of the archive file. For
example, something like:
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
works correctly on most Unix variants.
While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved reasonably quickly. The ‘pg_xlog/’ directory will continue to fill with WAL segment files until the situation is resolved. (If the file system containing ‘pg_xlog/’ fills up, PostgreSQL will do a PANIC shutdown. No committed transactions will be lost, but the database will remain offline until you free some space.)
The speed of the archiving command is unimportant as long as it can keep up with the average rate at which your server generates WAL data. Normal operation continues even if the archiving process falls a little behind. If archiving falls significantly behind, this will increase the amount of data that would be lost in the event of a disaster. It will also mean that the ‘pg_xlog/’ directory will contain large numbers of not-yet-archived segment files, which could eventually exceed available disk space. You are advised to monitor the archiving process to ensure that it is working as you intend.
In writing your archive command, you should assume that the file names to
be archived can be up to 64 characters long and can contain any
combination of ASCII letters, digits, and dots. It is not necessary to
preserve the original relative path (%p) but it is necessary to
preserve the file name (%f).
Note that although WAL archiving will allow you to restore any modifications made to the data in your PostgreSQL database, it will not restore changes made to configuration files (that is, ‘postgresql.conf’, ‘pg_hba.conf’ and ‘pg_ident.conf’), since those are edited manually rather than through SQL operations. You might wish to keep the configuration files in a location that will be backed up by your regular file system backup procedures. See section 4.2 File Locations for how to relocate the configuration files.
The archive command is only invoked on completed WAL segments. Hence,
if your server generates only little WAL traffic (or has slack periods
where it does so), there could be a long delay between the completion
of a transaction and its safe recording in archive storage. To put
a limit on how old unarchived data can be, you can set
archive_timeout to force the server to switch
to a new WAL segment file at least that often. Note that archived
files that are archived early due to a forced switch are still the same
length as completely full files. It is therefore unwise to set a very
short archive_timeout---it will bloat your archive
storage. archive_timeout settings of a minute or so are
usually reasonable.
Also, you can force a segment switch manually with
pg_switch_xlog if you want to ensure that a
just-finished transaction is archived as soon as possible. Other utility
functions related to WAL management are listed in Volume 1A: Table 7-56.
When wal_level is minimal some SQL commands
are optimized to avoid WAL logging, as described in Volume 1A: 12.4.7 Disable WAL archival and streaming replication. If archiving or streaming replication were
turned on during execution of one of these statements, WAL would not
contain enough information for archive recovery. (Crash recovery is
unaffected.) For this reason, wal_level can only be changed at
server start. However, archive_command can be changed with a
configuration file reload. If you wish to temporarily stop archiving,
one way to do it is to set archive_command to the empty
string ('').
This will cause WAL files to accumulate in ‘pg_xlog/’ until a
working archive_command is re-established.
| ISBN 9781906966072 | The PostgreSQL 9.0 Reference Manual - Volume 3 - Server Administration Guide | See the print edition |