The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group, ISBN 9781906966041
10.6.1 Stop Words
Stop words are words that are very common, appear in almost every
document, and have no discrimination value. Therefore, they can be ignored
in the context of full text searching. For example, every English text
contains words like "a" and "the", so it is
useless to store them in an index. However, stop words do affect the
positions in tsvector, which in turn affect ranking:
SELECT to_tsvector('english','in the list of stop words');
        to_tsvector
----------------------------
 'list':3 'stop':5 'word':6
Positions 1, 2, and 4 are missing because they were occupied by the stop words "in", "the", and "of". Ranks calculated for documents with and without stop words are quite different:
SELECT ts_rank_cd (to_tsvector('english','in the list of stop words'),
                   to_tsquery('list & stop'));
 ts_rank_cd
------------
       0.05
SELECT ts_rank_cd (to_tsvector('english','list stop words'),
                   to_tsquery('list & stop'));
 ts_rank_cd
------------
        0.1
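The gap between the two ranks follows from the lexeme positions: ts_rank_cd takes the proximity of matching lexemes into account, and the stop words push "list" and "stop" further apart in the first document. As a quick check (a sketch that only reuses the sample strings above), the two vectors can be compared directly:

```sql
-- With stop words: positions 1, 2 and 4 are consumed by 'in', 'the' and 'of',
-- so 'list' and 'stop' end up two positions apart.
SELECT to_tsvector('english', 'in the list of stop words');
-- 'list':3 'stop':5 'word':6

-- Without stop words: 'list' and 'stop' are adjacent.
SELECT to_tsvector('english', 'list stop words');
-- 'list':1 'stop':2 'word':3
```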
How stop words are treated is up to the specific dictionary. For example,
ispell dictionaries first normalize words and then
look at the list of stop words, while Snowball stemmers
check the list of stop words first. The reason for the different
behavior is an attempt to decrease noise.
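One way to see how a particular dictionary treats a stop word is to pass the word to ts_lexize directly. The sketch below assumes the stock english_stem Snowball dictionary and the built-in simple configuration, whose dictionary has no stop word list configured by default. Results may differ if either has been altered in your installation.

```sql
-- The Snowball dictionary recognizes 'the' as a stop word and returns
-- an empty array, meaning the token is discarded.
SELECT ts_lexize('english_stem', 'the');
-- {}

-- A word that is not a stop word is stemmed to its lexeme instead.
SELECT ts_lexize('english_stem', 'words');
-- {word}

-- The 'simple' configuration keeps every word, so no positions are
-- skipped and nothing is stemmed.
SELECT to_tsvector('simple', 'in the list of stop words');
-- 'in':1 'list':3 'of':4 'stop':5 'the':2 'words':6
```

Comparing the last result with the english output at the start of this section shows that the choice of dictionary, not the text itself, determines which words count as stop words.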