The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 454 pages
RRP £14.95 ($19.95)
10.4.1 Manipulating Documents
Section 10.3.1, Parsing Documents, showed how raw textual documents can be converted into tsvector values. PostgreSQL also provides functions and operators that can be used to manipulate documents that are already in tsvector form.
tsvector || tsvector
The tsvector concatenation operator returns a vector which combines the lexemes and positional information of the two vectors given as arguments. Positions and weight labels are retained during the concatenation. Positions appearing in the right-hand vector are offset by the largest position mentioned in the left-hand vector, so that the result is nearly equivalent to the result of performing to_tsvector on the concatenation of the two original document strings. (The equivalence is not exact, because any stop-words removed from the end of the left-hand argument will not affect the result, whereas they would have affected the positions of the lexemes in the right-hand argument if textual concatenation were used.)
One advantage of using concatenation in the vector form, rather than concatenating text before applying to_tsvector, is that you can use different configurations to parse different sections of the document. Also, because the setweight function marks all lexemes of the given vector the same way, it is necessary to parse the text and do setweight before concatenating if you want to label different parts of the document with different weights.
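To make the position offset concrete, here is a toy Python model of the concatenation operator. It is a sketch under stated assumptions, not PostgreSQL's implementation: a vector is represented as a dict mapping each lexeme to a sorted list of (position, weight) pairs, the helper names parse, render and concat are hypothetical, and parse accepts only a simplified subset of the real tsvector literal syntax (no quoting or escapes). Because the largest left-hand position is 2, the right-hand positions 1, 2 and 3 become 3, 4 and 5:

```python
# Toy model of a tsvector: lexeme -> sorted list of (position, weight).
# Not PostgreSQL's implementation: quoting, escaping and the real type's
# upper bound on positions are deliberately ignored.
from typing import Dict, List, Tuple

TsVector = Dict[str, List[Tuple[int, str]]]


def parse(literal: str) -> TsVector:
    """Parse a simplified literal such as "a:1 b:2,5A" (hypothetical helper)."""
    vec: TsVector = {}
    for token in literal.split():
        lexeme, _, pos_part = token.partition(":")
        entries = vec.setdefault(lexeme, [])
        for item in filter(None, pos_part.split(",")):
            has_weight = item[-1] in "ABCD"
            weight = item[-1] if has_weight else "D"  # D is the default weight
            digits = item[:-1] if has_weight else item
            entries.append((int(digits), weight))
        entries.sort()
    return vec


def render(vec: TsVector) -> str:
    """Format like PostgreSQL output; the default weight D is not shown."""
    parts = []
    for lexeme in sorted(vec):
        text = f"'{lexeme}'"
        if vec[lexeme]:
            text += ":" + ",".join(
                f"{p}{'' if w == 'D' else w}" for p, w in vec[lexeme]
            )
        parts.append(text)
    return " ".join(parts)


def concat(left: TsVector, right: TsVector) -> TsVector:
    """Model of tsvector || tsvector: shift right-hand positions by the
    largest left-hand position, keep weights, merge shared lexemes."""
    offset = max((p for entries in left.values() for p, _ in entries), default=0)
    result: TsVector = {lex: list(entries) for lex, entries in left.items()}
    for lexeme, entries in right.items():
        shifted = [(p + offset, w) for p, w in entries]
        result[lexeme] = sorted(result.get(lexeme, []) + shifted)
    return result


left = parse("a:1 b:2")
right = parse("c:1 d:2 b:3")
print(render(concat(left, right)))  # 'a':1 'b':2,5 'c':3 'd':4
```

In PostgreSQL itself, SELECT 'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector; yields the same 'a':1 'b':2,5 'c':3 'd':4. Note that the lexeme b, present in both inputs, ends up with both of its positions.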
setweight(vector tsvector, weight "char") returns tsvector
setweight returns a copy of the input vector in which every position has been labeled with the given weight, either A, B, C, or D. (D is the default for new vectors and as such is not displayed on output.) These labels are retained when vectors are concatenated, allowing words from different parts of a document to be weighted differently by ranking functions.
Note that weight labels apply to positions, not lexemes. If the input vector has been stripped of positions then setweight does nothing.
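The following sketch models setweight in the same toy representation (lexeme mapped to a list of (position, weight) pairs, with an empty list meaning a stripped lexeme) and pairs it with concatenation, following the advice to weight each part before joining: a title is labeled A and then concatenated with a body that keeps the default D. The functions setweight, concat and render are hypothetical Python stand-ins for the SQL function and operator, not PostgreSQL code.

```python
# Toy model (not PostgreSQL's implementation): a tsvector is a dict mapping
# each lexeme to a sorted list of (position, weight); [] means "stripped".
from typing import Dict, List, Tuple

TsVector = Dict[str, List[Tuple[int, str]]]


def setweight(vec: TsVector, weight: str) -> TsVector:
    """Return a copy in which every position carries the given weight."""
    if weight not in ("A", "B", "C", "D"):
        raise ValueError("weight must be one of A, B, C, D")
    # Weights live on positions, so stripped lexemes are copied unchanged.
    return {lex: [(pos, weight) for pos, _ in entries] for lex, entries in vec.items()}


def concat(left: TsVector, right: TsVector) -> TsVector:
    """Model of tsvector || tsvector: shift right-hand positions by the
    largest left-hand position; weights travel with their positions."""
    offset = max((p for entries in left.values() for p, _ in entries), default=0)
    result: TsVector = {lex: list(entries) for lex, entries in left.items()}
    for lex, entries in right.items():
        result[lex] = sorted(result.get(lex, []) + [(p + offset, w) for p, w in entries])
    return result


def render(vec: TsVector) -> str:
    """Format like PostgreSQL output; the default weight D is not shown."""
    out = []
    for lex in sorted(vec):
        positions = ",".join(f"{p}{'' if w == 'D' else w}" for p, w in vec[lex])
        out.append(f"'{lex}'" + (f":{positions}" if positions else ""))
    return " ".join(out)


# Weight the title A and leave the body at the default D, then concatenate.
title = setweight({"fat": [(1, "D")], "cat": [(2, "D")]}, "A")
body = {"fat": [(1, "D")], "rat": [(2, "D")]}
print(render(concat(title, body)))  # 'cat':2A 'fat':1A,3 'rat':4

# A stripped vector has no positions to label, so setweight changes nothing.
stripped = {"fat": [], "cat": []}
assert setweight(stripped, "A") == stripped
```

The lexeme fat ends up with one A-labeled position from the title and one unlabeled position from the body, which is exactly the per-position distinction a ranking function can exploit.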
length(vector tsvector) returns integer
Returns the number of lexemes stored in the vector.
strip(vector tsvector) returns tsvector
Returns a vector which lists the same lexemes as the given vector, but which lacks any position or weight information. While the returned vector is much less useful than an unstripped vector for relevance ranking, it will usually be much smaller.
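A minimal sketch of length and strip in the same toy model, assuming a vector is a dict from lexeme to a list of (position, weight) pairs. The functions are hypothetical Python stand-ins, not PostgreSQL code, and stored_positions is an invented helper used only to show that stripping discards every position, which is what makes the result smaller and less useful for ranking.

```python
# Toy model (not PostgreSQL's implementation): lexeme -> list of
# (position, weight); an empty list means the lexeme has no positions.
from typing import Dict, List, Tuple

TsVector = Dict[str, List[Tuple[int, str]]]


def length(vec: TsVector) -> int:
    """Number of distinct lexemes; repeated positions do not add to it."""
    return len(vec)


def strip(vec: TsVector) -> TsVector:
    """Same lexemes, with all position and weight information dropped."""
    return {lex: [] for lex in vec}


def stored_positions(vec: TsVector) -> int:
    """Hypothetical helper: how many position entries the vector carries."""
    return sum(len(entries) for entries in vec.values())


# Models the literal 'fat:2,4 cat:3 rat:5A'
doc = {"fat": [(2, "D"), (4, "D")], "cat": [(3, "D")], "rat": [(5, "A")]}
print(length(doc))                   # prints 3 (lexemes), though 4 positions are stored
print(sorted(strip(doc)))            # ['cat', 'fat', 'rat']
print(stored_positions(strip(doc)))  # 0: nothing left for ranking to use
```

For the same literal, PostgreSQL's length returns 3 and strip returns 'cat' 'fat' 'rat'.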
ISBN 9781906966041