- publishing free software manuals
PostgreSQL Reference Manual - Volume 3 - Server Administration Guide
by The PostgreSQL Global Development Group
Paperback (6"x9"), 204 pages
ISBN 0954612043
RRP £13.95 ($24.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

10.4.3 Failover

If the primary server fails then the standby server should begin failover procedures.

If the standby server fails then no failover need take place. If the standby server can be restarted, even some time later, then the recovery process can also be immediately restarted, taking advantage of restartable recovery. If the standby server cannot be restarted, then a full new standby server should be created.

If the primary server fails and then immediately restarts, you must have a mechanism for informing it that it is no longer the primary. This is sometimes known as STONITH (Shoot the Other Node In The Head), which is necessary to avoid situations where both systems think they are the primary, which can lead to confusion and ultimately data loss.

Many failover systems use just two systems, the primary and the standby, connected by some kind of heartbeat mechanism to continually verify the connectivity between the two and the viability of the primary. It is also possible to use a third system (called a witness server) to avoid some problems of inappropriate failover, but the additional complexity may not be worthwhile unless it is set-up with sufficient care and rigorous testing.

Once failover to the standby occurs, we have only a single server in operation. This is known as a degenerate state. The former standby is now the primary, but the former primary is down and may stay down. To return to normal operation we must fully recreate a standby server, either on the former primary system when it comes up, or on a third, possibly new, system. Once complete the primary and standby can be considered to have switched roles. Some people choose to use a third server to provide backup to the new primary until the new standby server is recreated, though clearly this complicates the system configuration and operational processes.

So, switching from primary to standby server can be fast but requires some time to re-prepare the failover cluster. Regular switching from primary to standby is encouraged, since it allows regular downtime on each system for maintenance. This also acts as a test of the failover mechanism to ensure that it will really work when you need it. Written administration procedures are advised.

ISBN 0954612043PostgreSQL Reference Manual - Volume 3 - Server Administration GuideSee the print edition