- publishing free software manuals
PostgreSQL Reference Manual - Volume 3 - Server Administration Guide
by The PostgreSQL Global Development Group
Paperback (6"x9"), 204 pages
ISBN 0954612043
RRP £13.95 ($24.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

10.3.3 Recovering using a Continuous Archive Backup

Okay, the worst has happened and you need to recover from your backup. Here is the procedure:

  1. Stop the server, if it's running.
  2. If you have the space to do so, copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later. Note that this precaution will require that you have enough free space on your system to hold two copies of your existing database. If you do not have enough space, you need at the least to copy the contents of the ‘pg_xlog’ subdirectory of the cluster data directory, as it may contain logs which were not archived before the system went down.
  3. Clean out all existing files and subdirectories under the cluster data directory and under the root directories of any tablespaces you are using.
  4. Restore the database files from your backup dump. Be careful that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in ‘pg_tblspc/’ were correctly restored.
  5. Remove any files present in ‘pg_xlog/’; these came from the backup dump and are therefore probably obsolete rather than current. If you didn't archive ‘pg_xlog/’ at all, then recreate it, and be sure to recreate the subdirectory ‘pg_xlog/archive_status/’ as well.
  6. If you had unarchived WAL segment files that you saved in step 2, copy them into ‘pg_xlog/’. (It is best to copy them, not move them, so that you still have the unmodified files if a problem occurs and you have to start over.)
  7. Create a recovery command file ‘recovery.conf’ in the cluster data directory (see section Recovery Settings). You may also want to temporarily modify ‘pg_hba.conf’ to prevent ordinary users from connecting until you are sure the recovery has worked.
  8. Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename ‘recovery.conf’ to ‘recovery.done’ (to prevent accidentally re-entering recovery mode in case of a crash later) and then commence normal database operations.
  9. Inspect the contents of the database to ensure you have recovered to where you want to be. If not, return to step 1. If all is well, let in your users by restoring ‘pg_hba.conf’ to normal.

The key part of all this is to setup a recovery command file that describes how you want to recover and how far the recovery should run. You can use ‘recovery.conf.sample’ (normally installed in the installation ‘share/’ directory) as a prototype. The one thing that you absolutely must specify in ‘recovery.conf’ is the restore_command, which tells PostgreSQL how to get back archived WAL file segments. Like the archive_command, this is a shell command string. It may contain %f, which is replaced by the name of the desired log file, and %p, which is replaced by the path name to copy the log file to. (The path name is relative to the working directory of the server, i.e., the cluster's data directory.) Write %% if you need to embed an actual % character in the command. The simplest useful command is something like

restore_command = 'cp /mnt/server/archivedir/%f %p'

which will copy previously archived WAL segments from the directory ‘/mnt/server/archivedir’. You could of course use something much more complicated, perhaps even a shell script that requests the operator to mount an appropriate tape.

It is important that the command return nonzero exit status on failure. The command will be asked for log files that are not present in the archive; it must return nonzero when so asked. This is not an error condition. Be aware also that the base name of the %p path will be different from %f; do not expect them to be interchangeable.

WAL segments that cannot be found in the archive will be sought in ‘pg_xlog/’; this allows use of recent un-archived segments. However segments that are available from the archive will be used in preference to files in ‘pg_xlog/’. The system will not overwrite the existing contents of ‘pg_xlog/’ when retrieving archived files.

Normally, recovery will proceed through all available WAL segments, thereby restoring the database to the current point in time (or as close as we can get given the available WAL segments). But if you want to recover to some previous point in time (say, right before the junior DBA dropped your main transaction table), just specify the required stopping point in ‘recovery.conf’. You can specify the stop point, known as the “recovery target”, either by date/time or by completion of a specific transaction ID. As of this writing only the date/time option is very usable, since there are no tools to help you identify with any accuracy which transaction ID to use.

Note: The stop point must be after the ending time of the base backup (the time of pg_stop_backup). You cannot use a base backup to recover to a time when that backup was still going on. (To recover to such a time, you must go back to your previous base backup and roll forward from there.)

If recovery finds a corruption in the WAL data then recovery will complete at that point and the server will not start. In such a case the recovery process could be re-run from the beginning, specifying a “recovery target” before the point of corruption so that recovery can complete normally. If recovery fails for an external reason, such as a system crash or if the WAL archive has become inaccessible, then the recovery can simply be restarted and it will restart almost from where it failed. Recovery restart works much like checkpointing in normal operation: the server periodically forces all its state to disk, and then updates the ‘pg_control’ file to indicate that the already-processed WAL data need not be scanned again.

ISBN 0954612043PostgreSQL Reference Manual - Volume 3 - Server Administration GuideSee the print edition