|The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 454 pages
RRP £14.95 ($19.95)
Sales of this book support the PostgreSQL project! Get a printed copy>>>
3.9.2 Implementing Partitioning
To set up a partitioned table, do the following:
- Create the “master” table, from which all of the partitions will inherit. This table will contain no data. Do not define any check constraints on this table, unless you intend them to be applied equally to all partitions. There is no point in defining any indexes or unique constraints on it, either.
- Create several “child” tables that each inherit from the master table. Normally, these tables will not add any columns to the set inherited from the master. We will refer to the child tables as partitions, though they are in every way normal PostgreSQL tables.
Add table constraints to the partition tables to define the
allowed key values in each partition.
Typical examples would be:
CHECK ( x = 1 ) CHECK ( county IN ( 'Oxfordshire', 'Buckinghamshire', 'Warwickshire' )) CHECK ( outletID >= 100 AND outletID < 200 )Ensure that the constraints guarantee that there is no overlap between the key values permitted in different partitions. A common mistake is to set up range constraints like:
CHECK ( outletID BETWEEN 100 AND 200 ) CHECK ( outletID BETWEEN 200 AND 300 )This is wrong since it is not clear which partition the key value 200 belongs in. Note that there is no difference in syntax between range and list partitioning; those terms are descriptive only.
- For each partition, create an index on the key column(s), as well as any other indexes you might want. (The key index is not strictly necessary, but in most scenarios it is helpful. If you intend the key values to be unique then you should always create a unique or primary-key constraint for each partition.)
- Optionally, define a trigger or rule to redirect data inserted into the master table to the appropriate partition.
Ensure that the
constraint_exclusionconfiguration parameter is not disabled in ‘postgresql.conf’. If it is, queries will not be optimized as desired.
For example, suppose we are constructing a database for a large ice cream company. The company measures peak temperatures every day as well as ice cream sales in each region. Conceptually, we want a table like:
CREATE TABLE measurement ( city_id int not null, logdate date not null, peaktemp int, unitsales int );
We know that most queries will access just the last week's, month's or quarter's data, since the main use of this table will be to prepare online reports for management. To reduce the amount of old data that needs to be stored, we decide to only keep the most recent 3 years worth of data. At the beginning of each month we will remove the oldest month's data.
In this situation we can use partitioning to help us meet all of our different requirements for the measurements table. Following the steps outlined above, partitioning can be set up as follows:
The master table is the
measurementtable, declared exactly as above.
Next we create one partition for each active month:
CREATE TABLE measurement_y2006m02 ( ) INHERITS (measurement); CREATE TABLE measurement_y2006m03 ( ) INHERITS (measurement); ... CREATE TABLE measurement_y2007m11 ( ) INHERITS (measurement); CREATE TABLE measurement_y2007m12 ( ) INHERITS (measurement); CREATE TABLE measurement_y2008m01 ( ) INHERITS (measurement);Each of the partitions are complete tables in their own right, but they inherit their definitions from the
measurementtable. This solves one of our problems: deleting old data. Each month, all we will need to do is perform a
DROP TABLEon the oldest child table and create a new child table for the new month's data.
We must provide non-overlapping table constraints. Rather than
just creating the partition tables as above, the table creation
script should really be:
CREATE TABLE measurement_y2006m02 ( CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m03 ( CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' ) ) INHERITS (measurement); ... CREATE TABLE measurement_y2007m11 ( CHECK ( logdate >= DATE '2007-11-01' AND logdate < DATE '2007-12-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2007m12 ( CHECK ( logdate >= DATE '2007-12-01' AND logdate < DATE '2008-01-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2008m01 ( CHECK ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' ) ) INHERITS (measurement);
We probably need indexes on the key columns too:
CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate); CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate); ... CREATE INDEX measurement_y2007m11_logdate ON measurement_y2007m11 (logdate); CREATE INDEX measurement_y2007m12_logdate ON measurement_y2007m12 (logdate); CREATE INDEX measurement_y2008m01_logdate ON measurement_y2008m01 (logdate);We choose not to add further indexes at this time.
We want our application to be able to say
INSERT INTO measurement ...and have the data be redirected into the appropriate partition table. We can arrange that by attaching a suitable trigger function to the master table. If data will be added only to the latest partition, we can use a very simple trigger function:
CREATE OR REPLACE FUNCTION measurement_insert_trigger() RETURNS TRIGGER AS $$ BEGIN INSERT INTO measurement_y2008m01 VALUES (NEW.*); RETURN NULL; END; $$ LANGUAGE plpgsql;After creating the function, we create a trigger which calls the trigger function:
CREATE TRIGGER insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();We must redefine the trigger function each month so that it always points to the current partition. The trigger definition does not need to be updated, however. We might want to insert data and have the server automatically locate the partition into which the row should be added. We could do this with a more complex trigger function, for example:
CREATE OR REPLACE FUNCTION measurement_insert_trigger() RETURNS TRIGGER AS $$ BEGIN IF ( NEW.logdate >= DATE '2006-02-01' AND NEW.logdate < DATE '2006-03-01' ) THEN INSERT INTO measurement_y2006m02 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-03-01' AND NEW.logdate < DATE '2006-04-01' ) THEN INSERT INTO measurement_y2006m03 VALUES (NEW.*); ... ELSIF ( NEW.logdate >= DATE '2008-01-01' AND NEW.logdate < DATE '2008-02-01' ) THEN INSERT INTO measurement_y2008m01 VALUES (NEW.*); ELSE RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!'; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql;The trigger definition is the same as before. Note that each
IFtest must exactly match the
CHECKconstraint for its partition. While this function is more complex than the single-month case, it doesn't need to be updated as often, since branches can be added in advance of being needed.
Note: In practice it might be best to check the newest partition first, if most inserts go into that partition. For simplicity we have shown the trigger's tests in the same order as in other parts of this example.
As we can see, a complex partitioning scheme could require a substantial amount of data definition language (DDL). In the above example we would be creating a new partition each month, so it might be wise to write a script that generates the required data definition language (DDL) automatically.
|ISBN 9781906966041||The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference||See the print edition|