- publishing free software manuals
The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 454 pages
ISBN 9781906966041
RRP £14.95 ($19.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

10.2.2 Creating Indexes

We can create a GIN index (section 10.9 GiST and GIN Index Types) to speed up text searches:

CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', 
  body));

Notice that the 2-argument version of to_tsvector is used. Only text search functions that specify a configuration name can be used in expression indexes (section 9.7 Indexes on Expressions). This is because the index contents must be unaffected by default_text_search_config. If they were affected, the index contents might be inconsistent because different entries could contain tsvectors that were created with different text search configurations, and there would be no way to guess which was which. It would be impossible to dump and restore such an index correctly.

Because the two-argument version of to_tsvector was used in the index above, only a query reference that uses the 2-argument version of to_tsvector with the same configuration name will use that index. That is, WHERE to_tsvector('english', body) @@ 'a & b' can use the index, but WHERE to_tsvector(body) @@ 'a & b' cannot. This ensures that an index will be used only with the same configuration used to create the index entries.

It is possible to set up more complex expression indexes wherein the configuration name is specified by another column, e.g.:

CREATE INDEX pgweb_idx ON pgweb USING
   gin(to_tsvector(config_name, body));

where config_name is a column in the pgweb table. This allows mixed configurations in the same index while recording which configuration was used for each index entry. This would be useful, for example, if the document collection contained documents in different languages. Again, queries that are meant to use the index must be phrased to match, e.g., WHERE to_tsvector(config_name, body) @@ 'a & b'.

Indexes can even concatenate columns:

CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', 
  title || ' ' || body));

Another approach is to create a separate tsvector column to hold the output of to_tsvector. This example is a concatenation of title and body, using coalesce to ensure that one field will still be indexed when the other is NULL:

ALTER TABLE pgweb ADD COLUMN textsearchable_index_col tsvector;
UPDATE pgweb SET textsearchable_index_col =
     to_tsvector('english', coalesce(title,”) || ' ' || 
  coalesce(body,”));

Then we create a GIN index to speed up the search:

CREATE INDEX textsearch_idx ON pgweb USING 
  gin(textsearchable_index_col);

Now we are ready to perform a fast full text search:

SELECT title
FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('create & table')
ORDER BY last_mod_date DESC
LIMIT 10;

When using a separate column to store the tsvector representation, it is necessary to create a trigger to keep the tsvector column current anytime title or body changes. section 10.4.3 Triggers for Automatic Updates explains how to do that.

One advantage of the separate-column approach over an expression index is that it is not necessary to explicitly specify the text search configuration in queries in order to make use of the index. As shown in the example above, the query can depend on default_text_search_config. Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches. (This is more important when using a GiST index than a GIN index; see section 10.9 GiST and GIN Index Types.) The expression-index approach is simpler to set up, however, and it requires less disk space since the tsvector representation is not stored explicitly.

ISBN 9781906966041The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language ReferenceSee the print edition