- publishing free software manuals
The PostgreSQL 9.0 Reference Manual - Volume 3 - Server Administration Guide
by The PostgreSQL Global Development Group
Paperback (6"x9"), 274 pages
ISBN 9781906966072
RRP £9.95 ($14.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

10.3.3 Recovering using a Continuous Archive Backup

Okay, the worst has happened and you need to recover from your backup. Here is the procedure:

  1. Stop the server, if it's running.
  2. If you have the space to do so, copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later. Note that this precaution will require that you have enough free space on your system to hold two copies of your existing database. If you do not have enough space, you should at least save the contents of the cluster's ‘pg_xlog’ subdirectory, as it might contain logs which were not archived before the system went down.
  3. Remove all existing files and subdirectories under the cluster data directory and under the root directories of any tablespaces you are using.
  4. Restore the database files from your file system backup. Be sure that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in ‘pg_tblspc/’ were correctly restored.
  5. Remove any files present in ‘pg_xlog/’; these came from the file system backup and are therefore probably obsolete rather than current. If you didn't archive ‘pg_xlog/’ at all, then recreate it with proper permissions, being careful to ensure that you re-establish it as a symbolic link if you had it set up that way before.
  6. If you have unarchived WAL segment files that you saved in step 2, copy them into ‘pg_xlog/’. (It is best to copy them, not move them, so you still have the unmodified files if a problem occurs and you have to start over.)
  7. Create a recovery command file ‘recovery.conf’ in the cluster data directory (see section 12 Recovery Configuration). You might also want to temporarily modify ‘pg_hba.conf’ to prevent ordinary users from connecting until you are sure the recovery was successful.
  8. Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename ‘recovery.conf’ to ‘recovery.done’ (to prevent accidentally re-entering recovery mode later) and then commence normal database operations.
  9. Inspect the contents of the database to ensure you have recovered to the desired state. If not, return to step 1. If all is well, allow your users to connect by restoring ‘pg_hba.conf’ to normal.

The key part of all this is to set up a recovery configuration file that describes how you want to recover and how far the recovery should run. You can use ‘recovery.conf.sample’ (normally located in the installation's ‘share/’ directory) as a prototype. The one thing that you absolutely must specify in ‘recovery.conf’ is the restore_command, which tells PostgreSQL how to retrieve archived WAL file segments. Like the archive_command, this is a shell command string. It can contain %f, which is replaced by the name of the desired log file, and %p, which is replaced by the path name to copy the log file to. (The path name is relative to the current working directory, i.e., the cluster's data directory.) Write %% if you need to embed an actual % character in the command. The simplest useful command is something like:

restore_command = 'cp /mnt/server/archivedir/%f %p'

which will copy previously archived WAL segments from the directory ‘/mnt/server/archivedir’. Of course, you can use something much more complicated, perhaps even a shell script that requests the operator to mount an appropriate tape.

It is important that the command return nonzero exit status on failure. The command will be called requesting files that are not present in the archive; it must return nonzero when so asked. This is not an error condition. Not all of the requested files will be WAL segment files; you should also expect requests for files with a suffix of .backup or .history. Also be aware that the base name of the %p path will be different from %f; do not expect them to be interchangeable.

WAL segments that cannot be found in the archive will be sought in ‘pg_xlog/’; this allows use of recent un-archived segments. However, segments that are available from the archive will be used in preference to files in ‘pg_xlog/’. The system will not overwrite the existing contents of ‘pg_xlog/’ when retrieving archived files.

Normally, recovery will proceed through all available WAL segments, thereby restoring the database to the current point in time (or as close as possible given the available WAL segments). Therefore, a normal recovery will end with a “file not found” message, the exact text of the error message depending upon your choice of restore_command. You may also see an error message at the start of recovery for a file named something like ‘00000001.history’. This is also normal and does not indicate a problem in simple recovery situations; see section 10.3.4 Timelines for discussion.

If you want to recover to some previous point in time (say, right before the junior DBA dropped your main transaction table), just specify the required stopping point in ‘recovery.conf’. You can specify the stop point, known as the “recovery target”, either by date/time or by completion of a specific transaction ID. As of this writing only the date/time option is very usable, since there are no tools to help you identify with any accuracy which transaction ID to use.

Note: The stop point must be after the ending time of the base backup, i.e., the end time of pg_stop_backup. You cannot use a base backup to recover to a time when that backup was in progress. (To recover to such a time, you must go back to your previous base backup and roll forward from there.)

If recovery finds corrupted WAL data, recovery will halt at that point and the server will not start. In such a case the recovery process could be re-run from the beginning, specifying a “recovery target” before the point of corruption so that recovery can complete normally. If recovery fails for an external reason, such as a system crash or if the WAL archive has become inaccessible, then the recovery can simply be restarted and it will restart almost from where it failed. Recovery restart works much like checkpointing in normal operation: the server periodically forces all its state to disk, and then updates the ‘pg_control’ file to indicate that the already-processed WAL data need not be scanned again.

ISBN 9781906966072The PostgreSQL 9.0 Reference Manual - Volume 3 - Server Administration GuideSee the print edition