| The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group Paperback (6"x9"), 454 pages ISBN 9781906966041 RRP £14.95 ($19.95) Sales of this book support the PostgreSQL project! Get a printed copy>>> |
7.7.3 POSIX Regular Expressions
Table 7-11 lists the available operators for pattern matching using POSIX regular expressions.
| Operator | Description
|
~ | Matches regular expression, case sensitive e.g. 'thomas' ~ '.*thomas.*'
|
~* | Matches regular expression, case insensitive e.g. 'thomas' ~* '.*Thomas.*'
|
!~ | Does not match regular expression, case sensitive e.g. 'thomas' !~ '.*Thomas.*'
|
!~* | Does not match regular expression, case insensitive e.g. 'thomas' !~* '.*vadim.*'
|
POSIX regular expressions provide a more
powerful means for pattern matching than the LIKE and
SIMILAR TO operators.
Many Unix tools such as egrep,
sed, or awk use a pattern
matching language that is similar to the one described here.
A regular expression is a character sequence that is an
abbreviated definition of a set of strings (a regular
set). A string is said to match a regular expression
if it is a member of the regular set described by the regular
expression. As with LIKE, pattern characters
match string characters exactly unless they are special characters
in the regular expression language--but regular expressions use
different special characters than LIKE does.
Unlike LIKE patterns, a
regular expression is allowed to match anywhere within a string, unless
the regular expression is explicitly anchored to the beginning or
end of the string.
Some examples:
'abc' ~ 'abc' true 'abc' ~ '^a' true 'abc' ~ '(b|d)' true 'abc' ~ '^(b|c)' false
The POSIX pattern language is described in much greater detail below.
The substring function with two parameters,
substring(string from
pattern), provides extraction of a
substring
that matches a POSIX regular expression pattern. It returns null if
there is no match, otherwise the portion of the text that matched the
pattern. But if the pattern contains any parentheses, the portion
of the text that matched the first parenthesized subexpression (the
one whose left parenthesis comes first) is
returned. You can put parentheses around the whole expression
if you want to use parentheses within it without triggering this
exception. If you need parentheses in the pattern before the
subexpression you want to extract, see the non-capturing parentheses
described below.
Some examples:
substring('foobar' from 'o.b') oob
substring('foobar' from 'o(.)b') o
The regexp_replace function provides substitution of
new text for substrings that match POSIX regular expression patterns.
It has the syntax
regexp_replace(source,
pattern, replacement
[, flags ]).
The source string is returned unchanged if
there is no match to the pattern. If there is a
match, the source string is returned with the
replacement string substituted for the matching
substring. The replacement string can contain
\n, where n is 1
through 9, to indicate that the source substring matching the
n'th parenthesized subexpression of the pattern should be
inserted, and it can contain \& to indicate that the
substring matching the entire pattern should be inserted. Write
\\ if you need to put a literal backslash in the replacement
text. (As always, remember to double backslashes written in literal
constant strings, assuming escape string syntax is used.)
The flags parameter is an optional text
string containing zero or more single-letter flags that change the
function's behavior. Flag i specifies case-insensitive
matching, while flag g specifies replacement of each matching
substring rather than only the first one. Other supported flags are
described in Table 7-19.
Some examples:
regexp_replace('foobarbaz', 'b..', 'X')
fooXbaz
regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g')
fooXarYXazY
The regexp_matches function returns a text array of
all of the captured substrings resulting from matching a POSIX
regular expression pattern. It has the syntax
regexp_matches(string, pattern
[, flags ]).
The function can return no rows, one row, or multiple rows (see
the g flag below). If the pattern
does not match, the function returns no rows. If the pattern
contains no parenthesized subexpressions, then each row
returned is a single-element text array containing the substring
matching the whole pattern. If the pattern contains parenthesized
subexpressions, the function returns a text array whose
n'th element is the substring matching the
n'th parenthesized subexpression of the pattern
(not counting “non-capturing” parentheses; see below for
details).
The flags parameter is an optional text
string containing zero or more single-letter flags that change the
function's behavior. Flag g causes the function to find
each match in the string, not only the first one, and return a row for
each such match. Other supported
flags are described in Table 7-19.
Some examples:
SELECT regexp_matches('foobarbequebaz', '(bar)(beque)');
regexp_matches
----------------
{bar,beque}
(1 row)
SELECT regexp_matches('foobarbequebazilbarfbonk',
'(b[^b]+)(b[^b]+)', 'g');
regexp_matches
----------------
{bar,beque}
{bazil,barf}
(2 rows)
SELECT regexp_matches('foobarbequebaz', 'barbeque');
regexp_matches
----------------
{barbeque}
(1 row)
It is possible to force regexp_matches() to always
return one row by using a sub-select; this is particularly useful
in a SELECT target list when you want all rows
returned, even non-matching ones:
SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
The regexp_split_to_table function splits a string using a POSIX
regular expression pattern as a delimiter. It has the syntax
regexp_split_to_table(string, pattern
[, flags ]).
If there is no match to the pattern, the function returns the
string. If there is at least one match, for each match it returns
the text from the end of the last match (or the beginning of the string)
to the beginning of the match. When there are no more matches, it
returns the text from the end of the last match to the end of the string.
The flags parameter is an optional text string containing
zero or more single-letter flags that change the function's behavior.
regexp_split_to_table supports the flags described in
Table 7-19.
The regexp_split_to_array function behaves the same as
regexp_split_to_table, except that regexp_split_to_array
returns its result as an array of text. It has the syntax
regexp_split_to_array(string, pattern
[, flags ]).
The parameters are the same as for regexp_split_to_table.
Some examples:
SELECT foo FROM regexp_split_to_table('the quick brown fox
jumped over the lazy dog', E'\\s+') AS foo;
foo
--------
the
quick
brown
fox
jumped
over
the
lazy
dog
(9 rows)
SELECT regexp_split_to_array('the quick brown fox jumped over
the lazy dog', E'\\s+');
regexp_split_to_array
------------------------------------------------
{the,quick,brown,fox,jumped,over,the,lazy,dog}
(1 row)
SELECT foo FROM regexp_split_to_table('the quick brown fox',
E'\\s*') AS foo;
foo
-----
t
h
e
q
u
i
c
k
b
r
o
w
n
f
o
x
(16 rows)
As the last example demonstrates, the regexp split functions ignore
zero-length matches that occur at the start or end of the string
or immediately after a previous match. This is contrary to the strict
definition of regexp matching that is implemented by
regexp_matches, but is usually the most convenient behavior
in practice. Other software systems such as Perl use similar definitions.
| ISBN 9781906966041 | The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference | See the print edition |