The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group. Paperback (6"x9"), 454 pages, ISBN 9781906966041, RRP £14.95 ($19.95). Sales of this book support the PostgreSQL project!
10.6.3 Synonym Dictionary
This dictionary template is used to create dictionaries that replace a
word with a synonym. Phrases are not supported (use the thesaurus
template (section 10.6.4 Thesaurus Dictionary) for that). A synonym
dictionary can be used to overcome linguistic problems, for example, to
prevent an English stemmer dictionary from reducing the word 'Paris' to
'pari'. It is enough to have a line reading Paris paris in the
synonym file and to place the synonym dictionary before the english_stem
dictionary in the configuration's mapping. For example:
SELECT * FROM ts_debug('english', 'Paris');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | Paris | {english_stem} | english_stem | {pari}
CREATE TEXT SEARCH DICTIONARY my_synonym (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms
);
ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR asciiword
    WITH my_synonym, english_stem;
SELECT * FROM ts_debug('english', 'Paris');
   alias   |   description   | token |       dictionaries        | dictionary | lexemes
-----------+-----------------+-------+---------------------------+------------+---------
 asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris}
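The effect of the ordering can also be checked by calling each dictionary directly with ts_lexize(). This is only a sketch: it assumes the my_synonym dictionary from the example exists and that its synonym file contains the line Paris paris. The results noted in the comments are the lexemes reported by the ts_debug output above, not separately verified output.

```sql
-- Ask each dictionary on its own what it would do with the token.
-- The first dictionary in the mapping that recognizes a token wins,
-- which is why my_synonym has to be listed before english_stem.

-- Assumes my_synonyms.syn contains the line "Paris paris".
SELECT ts_lexize('my_synonym', 'Paris');    -- ts_debug above reports {paris}

-- The stemmer alone strips the trailing "s".
SELECT ts_lexize('english_stem', 'Paris');  -- ts_debug above reports {pari}
```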
The only parameter required by the synonym template is
SYNONYMS, which is the base name of its configuration file (my_synonyms in the above example).
The file's full name will be
‘$SHAREDIR/tsearch_data/my_synonyms.syn’
(where $SHAREDIR means the
PostgreSQL installation's shared-data directory).
The file format is just one line
per word to be substituted, with the word followed by its synonym,
separated by white space. Blank lines and trailing spaces are ignored.
The synonym template also has an optional parameter
CaseSensitive, which defaults to false. When
CaseSensitive is false, words in the synonym file
are folded to lower case, as are input tokens. When it is
true, words and tokens are not folded to lower case,
but are compared as-is.
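As a minimal configuration sketch, the parameter can be set when the dictionary is created. The dictionary name my_synonym_cs is hypothetical, and the example assumes the my_synonyms file from the earlier example is installed and contains the line Paris paris.

```sql
-- Hypothetical dictionary name; reuses the my_synonyms file assumed above.
CREATE TEXT SEARCH DICTIONARY my_synonym_cs (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms,
    CaseSensitive = true
);

-- With CaseSensitive = true, tokens are compared as-is against the file.
-- Assuming the file contains "Paris paris", the first call should find
-- the synonym, while the second has no exact match and should yield NULL.
SELECT ts_lexize('my_synonym_cs', 'Paris');
SELECT ts_lexize('my_synonym_cs', 'paris');
```

With the default setting (CaseSensitive = false), both calls would be expected to return the same synonym, because both the file entry and the token are folded to lower case before comparison.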
An asterisk (*) can be placed at the end of a synonym
in the configuration file. This indicates that the synonym is a prefix.
The asterisk is ignored when the entry is used in
to_tsvector(), but when it is used in
to_tsquery(), the result will be a query item with
the prefix match marker (see
section 10.3.2 Parsing Queries).
For example, suppose we have these entries in
‘$SHAREDIR/tsearch_data/synonym_sample.syn’:
postgres        pgsql
postgresql      pgsql
postgre         pgsql
gogle           googl
indices         index*
Then we will get these results:
mydb=# CREATE TEXT SEARCH DICTIONARY syn (template=synonym,
synonyms='synonym_sample');
mydb=# SELECT ts_lexize('syn','indices');
ts_lexize
-----------
{index}
(1 row)
mydb=# CREATE TEXT SEARCH CONFIGURATION tst (copy=simple);
mydb=# ALTER TEXT SEARCH CONFIGURATION tst ALTER MAPPING FOR
asciiword WITH syn;
mydb=# SELECT to_tsvector('tst','indices');
to_tsvector
-------------
'index':1
(1 row)
mydb=# SELECT to_tsquery('tst','indices');
to_tsquery
------------
'index':*
(1 row)
mydb=# SELECT 'indexes are very useful'::tsvector;
tsvector
---------------------------------
'are' 'indexes' 'useful' 'very'
(1 row)
mydb=# SELECT 'indexes are very useful'::tsvector @@
to_tsquery('tst','indices');
?column?
----------
t
(1 row)
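To see what the prefix match marker contributes to that last result, the same comparison can be made without any synonym dictionary. The sketch below uses only the built-in simple configuration, so it does not depend on the synonym_sample file. The :* suffix is the tsquery prefix-match syntax.

```sql
-- Without the prefix marker, 'index' must equal a lexeme exactly,
-- so it does not match 'indexes'.
SELECT 'indexes are very useful'::tsvector @@ to_tsquery('simple', 'index');

-- With the prefix marker, any lexeme beginning with 'index' matches.
-- This is the form of query item that the synonym entry "indices index*"
-- produces in to_tsquery().
SELECT 'indexes are very useful'::tsvector @@ to_tsquery('simple', 'index:*');
```

The first query is expected to return f and the second t.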