PostgreSQL Reference Manual - Volume 1 - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 716 pages
RRP £32.00 ($49.95)
9.9 Examining Index Usage
Although indexes in PostgreSQL do not need
maintenance and tuning, it is still important to check
which indexes are actually used by the real-life query workload.
Examining index usage for an individual query is done with the
EXPLAIN command; its application for this purpose is
illustrated in section 11.1 Using EXPLAIN.
It is also possible to gather overall statistics about index usage
in a running server, as described in Volume 3: The Statistics Collector.
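For illustration, here is a minimal sketch of both approaches, assuming a reasonably recent PostgreSQL release. It builds a throwaway table (test1 and test1_id_index are hypothetical names), asks EXPLAIN whether one query would use the index, and then reads the cumulative per-index counters from the standard pg_stat_user_indexes view. Those counters may lag slightly behind recent activity, so they are most informative after the server has handled its real workload for a while.

```sql
-- Hypothetical table and index, used only for this illustration.
CREATE TEMP TABLE test1 (id integer, content text);
INSERT INTO test1
    SELECT g, md5(g::text) FROM generate_series(1, 100000) AS g;
CREATE INDEX test1_id_index ON test1 (id);
ANALYZE test1;

-- Per-query view: does the planner choose test1_id_index for this query?
EXPLAIN SELECT content FROM test1 WHERE id = 42;

-- Server-wide view: cumulative usage counters for each index.
-- An idx_scan value that stays at 0 under the real workload suggests
-- the planner never chooses that index.
SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan;
```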
It is difficult to formulate a general procedure for determining which indexes to set up. There are a number of typical cases that have been shown in the examples throughout the previous sections. A good deal of experimentation will be necessary in most cases. The rest of this section gives some tips for that.
- Always run ANALYZE first. This command collects statistics about the distribution of the values in the table. This information is required to estimate the number of rows returned by a query, which the planner needs in order to assign realistic costs to each possible query plan. In the absence of real statistics, some default values are assumed, which are almost certain to be inaccurate. Examining an application's index usage without having run ANALYZE is therefore a lost cause.
- Use real data for experimentation. Using test data for setting up indexes will tell you what indexes you need for the test data, but that is all. It is especially fatal to use very small test data sets. While selecting 1000 out of 100000 rows could be a candidate for an index, selecting 1 out of 100 rows will hardly be, because the 100 rows will probably fit within a single disk page, and there is no plan that can beat sequentially fetching 1 disk page. Also be careful when making up test data, which is often unavoidable when the application is not in production use yet. Values that are very similar, completely random, or inserted in sorted order will skew the statistics away from the distribution that real data would have.
- When indexes are not used, it can be useful for testing to force their use. There are run-time parameters that can turn off various plan types (see Volume 3: Planner Method Configuration). For instance, turning off sequential scans (enable_seqscan) and nested-loop joins (enable_nestloop), which are the most basic plans, will push the system toward a different plan. Turning these parameters off does not forbid those plan types outright; it makes them look very expensive, so they are still used when no alternative exists. If the system still chooses a sequential scan or nested-loop join, then there is probably a more fundamental reason why the index is not used; for example, the query condition does not match the index. (What kind of query can use what kind of index is explained in the previous sections.)
- If forcing index usage does use the index, then there are two possibilities: either the system is right and using the index is indeed not appropriate, or the cost estimates of the query plans do not reflect reality. So you should time your query both with and without the index being used. The EXPLAIN ANALYZE command can be useful here, since it actually executes the query and reports the real run time alongside the planner's estimates.
- If it turns out that the cost estimates are wrong, there are, again, two possibilities: the per-node costs may be off, or the selectivity estimates may be off. The total cost is computed from the per-row costs of each plan node times the selectivity estimate of the plan node. The costs estimated for the plan nodes can be adjusted via run-time parameters (described in Volume 3: Planner Cost Constants). An inaccurate selectivity estimate, on the other hand, is due to insufficient statistics; it may be possible to improve it by tuning the statistics-gathering parameters (see ALTER TABLE). If you do not succeed in adjusting the costs to be more appropriate, then you may have to resort to forcing index usage explicitly. You may also want to contact the PostgreSQL developers to examine the issue.
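The tips above can be combined into one experiment session. The following is a minimal sketch, not a recipe: it assumes a reasonably recent PostgreSQL release, uses a hypothetical table test1 with an index on id, and picks arbitrary illustrative values (a random_page_cost of 2.0 and a statistics target of 500). The generated rows exist only so the script runs on its own; as noted above, conclusions drawn from synthetic data do not necessarily carry over to real data. All SET commands affect only the current session.

```sql
-- Hypothetical table; generated rows stand in for real data only so
-- that this script is self-contained.
CREATE TEMP TABLE test1 (id integer, content text);
INSERT INTO test1
    SELECT g, md5(g::text) FROM generate_series(1, 100000) AS g;
CREATE INDEX test1_id_index ON test1 (id);

-- Run ANALYZE first so the planner works from real statistics.
ANALYZE test1;

-- Baseline: the plan the planner picks on its own, with actual timings.
EXPLAIN ANALYZE SELECT content FROM test1 WHERE id BETWEEN 1000 AND 50000;

-- Discourage the most basic plan types for this session only, then
-- compare plan and run time with the baseline.
-- (enable_nestloop only matters for queries that join tables.)
SET enable_seqscan = off;
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT content FROM test1 WHERE id BETWEEN 1000 AND 50000;
RESET enable_seqscan;
RESET enable_nestloop;

-- If the per-node costs look wrong, experiment with a planner cost
-- constant for this session (2.0 is an arbitrary illustrative value).
SET random_page_cost = 2.0;
EXPLAIN SELECT content FROM test1 WHERE id BETWEEN 1000 AND 50000;
RESET random_page_cost;

-- If the row-count (selectivity) estimate is off, raise the per-column
-- statistics target (500 is arbitrary) and gather statistics again.
ALTER TABLE test1 ALTER COLUMN id SET STATISTICS 500;
ANALYZE test1;

-- Re-check whether estimated and actual row counts now agree.
EXPLAIN ANALYZE SELECT content FROM test1 WHERE id BETWEEN 1000 AND 50000;
```

The query selects roughly half of the table, so the planner will probably prefer a sequential scan on its own; comparing the two EXPLAIN ANALYZE timings shows whether that choice was right. Because EXPLAIN ANALYZE actually executes the statement, data-modifying queries should be examined inside a transaction that is rolled back afterwards.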
ISBN 0954612027