| The PostgreSQL 9.0 Reference Manual - Volume 3 - Server Administration Guide
by The PostgreSQL Global Development Group Paperback (6"x9"), 274 pages ISBN 9781906966072 RRP £9.95 ($14.95) Sales of this book support the PostgreSQL project! Get a printed copy>>> |
9.1.3 Updating Planner Statistics
The PostgreSQL query planner relies on
statistical information about the contents of tables in order to
generate good plans for queries. These statistics are gathered by
the ANALYZE command,
which can be invoked by itself or
as an optional step in VACUUM. It is important to have
reasonably accurate statistics, otherwise poor choices of plans might
degrade database performance.
The autovacuum daemon, if enabled, will automatically issue
ANALYZE commands whenever the content of a table has
changed sufficiently. However, administrators might prefer to rely
on manually-scheduled ANALYZE operations, particularly
if it is known that update activity on a table will not affect the
statistics of “interesting” columns. The daemon schedules
ANALYZE strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
ones. But even for a heavily-updated table, there might be no need for
statistics updates if the statistical distribution of the data is
not changing much. A simple rule of thumb is to think about how much
the minimum and maximum values of the columns in the table change.
For example, a timestamp column that contains the time
of row update will have a constantly-increasing maximum value as
rows are added and updated; such a column will probably need more
frequent statistics updates than, say, a column containing URLs for
pages accessed on a website. The URL column might receive changes just
as often, but the statistical distribution of its values probably
changes relatively slowly.
It is possible to run ANALYZE on specific tables and even
just specific columns of a table, so the flexibility exists to update some
statistics more frequently than others if your application requires it.
In practice, however, it is usually best to just analyze the entire
database, because it is a fast operation. ANALYZE uses a
statistically random sampling of the rows of a table rather than reading
every single row.
Tip: Although per-column tweaking of
ANALYZEfrequency might not be very productive, you might find it worthwhile to do per-column adjustment of the level of detail of the statistics collected byANALYZE. Columns that are heavily used inWHEREclauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. SeeALTER TABLE SET STATISTICS, or change the database-wide default using thedefault_statistics_targetconfiguration parameter.Also, by default there is limited information available about the selectivity of functions. However, if you create an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.
| ISBN 9781906966072 | The PostgreSQL 9.0 Reference Manual - Volume 3 - Server Administration Guide | See the print edition |