| The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group Paperback (6"x9"), 454 pages ISBN 9781906966041 RRP £14.95 ($19.95) Sales of this book support the PostgreSQL project! Get a printed copy>>> |
10.8.1 Configuration Testing
The function ts_debug allows easy testing of a
text search configuration.
ts_debug([ configregconfig, ] documenttext, OUT aliastext, OUT descriptiontext, OUT tokentext, OUT dictionariesregdictionary[], OUT dictionaryregdictionary, OUT lexemestext[]) returns setof record
ts_debug displays information about every token of
document as produced by the
parser and processed by the configured dictionaries. It uses the
configuration specified by config,
or default_text_search_config if that argument is
omitted.
ts_debug returns one row for each token identified in the text
by the parser. The columns returned are
-
alias
text---short name of the token type -
description
text---description of the token type -
token
text---text of the token -
dictionaries
regdictionary[]---the dictionaries selected by the configuration for this token type -
dictionary
regdictionary---the dictionary that recognized the token, orNULLif none did -
lexemes
text[]---the lexeme(s) produced by the dictionary that recognized the token, orNULLif none did; an empty array ({}) means it was recognized as a stop word
Here is a simple example:
SELECT * FROM ts_debug('english','a fat cat sat on a mat -
it ate a fat rats');
alias | description | token | dictionaries |
-----------+-----------------+-------+----------------+
asciiword | Word, all ASCII | a | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | fat | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | cat | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | sat | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | on | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | a | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | mat | {english_stem}
blank | Space symbols | | {}
blank | Space symbols | - | {}
asciiword | Word, all ASCII | it | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | ate | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | a | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | fat | {english_stem}
blank | Space symbols | | {}
asciiword | Word, all ASCII | rats | {english_stem}
dictionary | lexemes
--------------+---------
| english_stem | {}
| |
| english_stem | {fat}
| |
| english_stem | {cat}
| |
| english_stem | {sat}
| |
| english_stem | {}
| |
| english_stem | {}
| |
| english_stem | {mat}
| |
| |
| english_stem | {}
| |
| english_stem | {ate}
| |
| english_stem | {}
| |
| english_stem | {fat}
| |
| english_stem | {rat}
For a more extensive demonstration, we
first create a public.english configuration and
Ispell dictionary for the English language:
CREATE TEXT SEARCH CONFIGURATION public.english ( COPY =
pg_catalog.english );
CREATE TEXT SEARCH DICTIONARY english_ispell (
TEMPLATE = ispell,
DictFile = english,
AffFile = english,
StopWords = english
);
ALTER TEXT SEARCH CONFIGURATION public.english
ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;
SELECT * FROM ts_debug('public.english','The Brightest
supernovaes');
alias | description | token |
-----------+-----------------+-------------+
asciiword | Word, all ASCII | The |
blank | Space symbols | |
asciiword | Word, all ASCII | Brightest |
blank | Space symbols | |
asciiword | Word, all ASCII | supernovaes |
dictionaries | dictionary | lexemes
-------------------------------+----------------+-------------
{english_ispell,english_stem} | english_ispell | {}
{} | |
{english_ispell,english_stem} | english_ispell | {bright}
{} | |
{english_ispell,english_stem} | english_stem | {supernova}
In this example, the word Brightest was recognized by the
parser as an ASCII word (alias asciiword).
For this token type the dictionary list is
english_ispell and
english_stem. The word was recognized by
english_ispell, which reduced it to the noun
bright. The word supernovaes is
unknown to the english_ispell dictionary so it
was passed to the next dictionary, and, fortunately, was recognized (in
fact, english_stem is a Snowball dictionary which
recognizes everything; that is why it was placed at the end of the
dictionary list).
The word The was recognized by the
english_ispell dictionary as a stop word (section 10.6.1 Stop Words) and will not be indexed.
The spaces are discarded too, since the configuration provides no
dictionaries at all for them.
You can reduce the width of the output by explicitly specifying which columns you want to see:
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english','The Brightest supernovaes');
alias | token | dictionary | lexemes
-----------+-------------+----------------+-------------
asciiword | The | english_ispell | {}
blank | | |
asciiword | Brightest | english_ispell | {bright}
blank | | |
asciiword | supernovaes | english_stem | {supernova}
| ISBN 9781906966041 | The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference | See the print edition |