- publishing free software manuals
The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 454 pages
ISBN 9781906966041
RRP £14.95 ($19.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

10.6.1 Stop Words

Stop words are words that are very common, appear in almost every document, and have no discrimination value. Therefore, they can be ignored in the context of full text searching. For example, every English text contains words like a and the, so it is useless to store them in an index. However, stop words do affect the positions in tsvector, which in turn affect ranking:

SELECT to_tsvector('english','in the list of stop words');
        to_tsvector
----------------------------
 'list':3 'stop':5 'word':6

The missing positions 1,2,4 are because of stop words. Ranks calculated for documents with and without stop words are quite different:

SELECT ts_rank_cd (to_tsvector('english','in the list of stop 
  words'), to_tsquery('list & stop'));
 ts_rank_cd
------------
       0.05

SELECT ts_rank_cd (to_tsvector('english','list stop words'), 
  to_tsquery('list & stop'));
 ts_rank_cd
------------
        0.1

It is up to the specific dictionary how it treats stop words. For example, ispell dictionaries first normalize words and then look at the list of stop words, while Snowball stemmers first check the list of stop words. The reason for the different behavior is an attempt to decrease noise.

ISBN 9781906966041The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language ReferenceSee the print edition