- publishing free software manuals
The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 454 pages
ISBN 9781906966041
RRP £14.95 ($19.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

10.6.3 Synonym Dictionary

This dictionary template is used to create dictionaries that replace a word with a synonym. Phrases are not supported (use the thesaurus template (section 10.6.4 Thesaurus Dictionary) for that). A synonym dictionary can be used to overcome linguistic problems, for example, to prevent an English stemmer dictionary from reducing the word 'Paris' to 'pari'. It is enough to have a Paris paris line in the synonym dictionary and put it before the english_stem dictionary. For example:

SELECT * FROM ts_debug('english', 'Paris');
   alias   |   description   | token |  dictionaries  |
-----------+-----------------+-------+----------------+
 asciiword | Word, all ASCII | Paris | {english_stem}
  dictionary  | lexemes
--------------+---------
 | english_stem | {pari}

CREATE TEXT SEARCH DICTIONARY my_synonym (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms
);

ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR asciiword
    WITH my_synonym, english_stem;

SELECT * FROM ts_debug('english', 'Paris');
   alias   |   description   | token |
-----------+-----------------+-------+
 asciiword | Word, all ASCII | Paris |
       dictionaries        | dictionary | lexemes
---------------------------+------------+---------
 {my_synonym,english_stem} | my_synonym | {paris}

The only parameter required by the synonym template is SYNONYMS, which is the base name of its configuration file---my_synonyms in the above example. The file's full name will be ‘$SHAREDIR/tsearch_data/my_synonyms.syn’ (where $SHAREDIR means the PostgreSQL installation's shared-data directory). The file format is just one line per word to be substituted, with the word followed by its synonym, separated by white space. Blank lines and trailing spaces are ignored.

The synonym template also has an optional parameter CaseSensitive, which defaults to false. When CaseSensitive is false, words in the synonym file are folded to lower case, as are input tokens. When it is true, words and tokens are not folded to lower case, but are compared as-is.

An asterisk (*) can be placed at the end of a synonym in the configuration file. This indicates that the synonym is a prefix. The asterisk is ignored when the entry is used in to_tsvector(), but when it is used in to_tsquery(), the result will be a query item with the prefix match marker (see section 10.3.2 Parsing Queries). For example, suppose we have these entries in ‘$SHAREDIR/tsearch_data/synonym_sample.syn’:

postgres        pgsql
postgresql      pgsql
postgre pgsql
gogle   googl
indices index*

Then we will get these results:

mydb=# CREATE TEXT SEARCH DICTIONARY syn (template=synonym, 
  synonyms='synonym_sample');
mydb=# SELECT ts_lexize('syn','indices');
 ts_lexize
-----------
 {index}
(1 row)

mydb=# CREATE TEXT SEARCH CONFIGURATION tst (copy=simple);
mydb=# ALTER TEXT SEARCH CONFIGURATION tst ALTER MAPPING FOR 
  asciiword WITH syn;
mydb=# SELECT to_tsvector('tst','indices');
 to_tsvector
-------------
 'index':1
(1 row)

mydb=# SELECT to_tsquery('tst','indices');
 to_tsquery
------------
 'index':*
(1 row)

mydb=# SELECT 'indexes are very useful'::tsvector;
            tsvector
---------------------------------
 'are' 'indexes' 'useful' 'very'
(1 row)

mydb=# SELECT 'indexes are very useful'::tsvector @@
 to_tsquery('tst','indices');
 ?column?
----------
 t
(1 row)
ISBN 9781906966041The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language ReferenceSee the print edition