- publishing free software manuals
The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 454 pages
ISBN 9781906966041
RRP £14.95 ($19.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

10.8.1 Configuration Testing

The function ts_debug allows easy testing of a text search configuration.

ts_debug([ config regconfig, ] document text,
         OUT alias text,
         OUT description text,
         OUT token text,
         OUT dictionaries regdictionary[],
         OUT dictionary regdictionary,
         OUT lexemes text[])
         returns setof record

ts_debug displays information about every token of document as produced by the parser and processed by the configured dictionaries. It uses the configuration specified by config, or default_text_search_config if that argument is omitted.

ts_debug returns one row for each token identified in the text by the parser. The columns returned are

Here is a simple example:

SELECT * FROM ts_debug('english','a fat  cat sat on a mat - 
  it ate a fat rats');
   alias   |   description   | token |  dictionaries  |
-----------+-----------------+-------+----------------+
 asciiword | Word, all ASCII | a     | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | fat   | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | cat   | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | sat   | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | on    | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | a     | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | mat   | {english_stem}
 blank     | Space symbols   |       | {}
 blank     | Space symbols   | -     | {}
 asciiword | Word, all ASCII | it    | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | ate   | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | a     | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | fat   | {english_stem}
 blank     | Space symbols   |       | {}
 asciiword | Word, all ASCII | rats  | {english_stem}
  dictionary  | lexemes
--------------+---------
 | english_stem | {}
 |              |
 | english_stem | {fat}
 |              |
 | english_stem | {cat}
 |              |
 | english_stem | {sat}
 |              |
 | english_stem | {}
 |              |
 | english_stem | {}
 |              |
 | english_stem | {mat}
 |              |
 |              |
 | english_stem | {}
 |              |
 | english_stem | {ate}
 |              |
 | english_stem | {}
 |              |
 | english_stem | {fat}
 |              |
 | english_stem | {rat}

For a more extensive demonstration, we first create a public.english configuration and Ispell dictionary for the English language:

CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = 
  pg_catalog.english );

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);

ALTER TEXT SEARCH CONFIGURATION public.english
   ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;
SELECT * FROM ts_debug('public.english','The Brightest 
  supernovaes');
   alias   |   description   |    token    |
-----------+-----------------+-------------+
 asciiword | Word, all ASCII | The         |
 blank     | Space symbols   |             |
 asciiword | Word, all ASCII | Brightest   |
 blank     | Space symbols   |             |
 asciiword | Word, all ASCII | supernovaes |
         dictionaries          |   dictionary   |   lexemes
-------------------------------+----------------+-------------
 {english_ispell,english_stem} | english_ispell | {}
 {}                            |                |
 {english_ispell,english_stem} | english_ispell | {bright}
 {}                            |                |
 {english_ispell,english_stem} | english_stem   | {supernova}

In this example, the word Brightest was recognized by the parser as an ASCII word (alias asciiword). For this token type the dictionary list is english_ispell and english_stem. The word was recognized by english_ispell, which reduced it to the noun bright. The word supernovaes is unknown to the english_ispell dictionary so it was passed to the next dictionary, and, fortunately, was recognized (in fact, english_stem is a Snowball dictionary which recognizes everything; that is why it was placed at the end of the dictionary list).

The word The was recognized by the english_ispell dictionary as a stop word (section 10.6.1 Stop Words) and will not be indexed. The spaces are discarded too, since the configuration provides no dictionaries at all for them.

You can reduce the width of the output by explicitly specifying which columns you want to see:

SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english','The Brightest supernovaes');
   alias   |    token    |   dictionary   |   lexemes
-----------+-------------+----------------+-------------
 asciiword | The         | english_ispell | {}
 blank     |             |                |
 asciiword | Brightest   | english_ispell | {bright}
 blank     |             |                |
 asciiword | supernovaes | english_stem   | {supernova}
ISBN 9781906966041The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language ReferenceSee the print edition