PostgreSQL Reference Manual - Volume 1 - SQL Language Reference, by The PostgreSQL Global Development Group
7.7.3.1 Regular Expression Details
Regular expressions (REs), as defined in POSIX 1003.2, come in two forms: extended REs or EREs (roughly those of egrep), and basic REs or BREs (roughly those of ed). PostgreSQL supports both forms, and also implements some extensions that are not in the POSIX standard but have become widely used because of their availability in programming languages such as Perl and Tcl. REs using these non-POSIX extensions are called advanced REs or AREs in this documentation. AREs are almost an exact superset of EREs, but BREs have several notational incompatibilities (as well as being much more limited). We first describe the ARE and ERE forms, noting features that apply only to AREs, and then describe how BREs differ.
Note: The form of regular expressions accepted by PostgreSQL can be chosen by setting the regex_flavor run-time parameter. The usual setting is advanced, but one might choose extended for maximum backwards compatibility with pre-7.4 releases of PostgreSQL.
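As a minimal sketch of the note above, the following statements inspect and change the parameter for the current session only. It assumes a server release that still provides regex_flavor; the choice of values shown is taken from the note.

```sql
-- Show the flavor currently in effect for this session.
SHOW regex_flavor;

-- Treat patterns as POSIX extended REs for the rest of the session.
SET regex_flavor = 'extended';

-- Return to the configured default (normally advanced).
RESET regex_flavor;
```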
A regular expression is defined as one or more branches, separated by |. It matches anything that matches one of the branches.
A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first, followed by a match for the second, and so on; an empty branch matches the empty string.
A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. With a quantifier, it can match some number of matches of the atom. An atom can be any of the possibilities shown in Table 7-12. The possible quantifiers and their meanings are shown in Table 7-13.
A constraint matches an empty string, but matches only when specific conditions are met. A constraint can be used where an atom could be used, except it may not be followed by a quantifier. The simple constraints are shown in Table 7-14; some more constraints are described later.
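The definitions above can be illustrated with a few queries. This is only a sketch: it uses the ~ match operator, which tests whether a pattern matches anywhere within a string, and the sample strings are invented for the example.

```sql
-- Two branches separated by |: the RE matches if either branch matches.
SELECT 'hotdog' ~ 'cat|dog';    -- true: the second branch matches "dog"
SELECT 'cow' ~ 'cat|dog';       -- false: neither branch matches

-- One branch made of three concatenated atoms: c, then a, then t,
-- which must match in that order.
SELECT 'concat' ~ 'cat';        -- true
SELECT 'act' ~ 'cat';           -- false
```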
Table 7-12. Regular Expression Atoms

| Atom | Description |
|---|---|
| (re) | (where re is any regular expression) matches a match for re, with the match noted for possible reporting |
| (?:re) | as above, but the match is not noted for reporting (a “non-capturing” set of parentheses) (AREs only) |
| . | matches any single character |
| [chars] | a bracket expression, matching any one of the chars (see section 7.7.3.2 Bracket Expressions for more detail) |
| \k | (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g. \\ matches a backslash character |
| \c | where c is alphanumeric (possibly followed by other characters) is an escape, see section 7.7.3.3 Regular Expression Escapes (AREs only; in EREs and BREs, this matches c) |
| { | when followed by a character other than a digit, matches the left-brace character {; when followed by a digit, it is the beginning of a bound (see below) |
| x | where x is a single character with no other significance, matches that character |
An RE may not end with a backslash (\).
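A short sketch of several atoms in use, assuming the default advanced flavor. It relies on substring(string from pattern), which returns the text matched by the first capturing set of parentheses if the pattern has one, and the whole match otherwise. The sample strings are invented.

```sql
-- . matches any single character; with no parentheses the whole match is returned.
SELECT substring('foobar' from 'o.b');            -- oob

-- (re) notes its match for reporting, so only the captured part is returned.
SELECT substring('foobar' from 'o(.)b');          -- o

-- (?:re) groups without capturing, so (o+) is the first reported subexpression.
SELECT substring('foobar' from '(?:f|x)(o+)b');   -- oo

-- A bracket expression matches any one of the listed characters.
SELECT 'gray' ~ 'gr[ae]y';                        -- true
SELECT 'groy' ~ 'gr[ae]y';                        -- false
```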
Note: Remember that the backslash (\) already has a special meaning in PostgreSQL string literals. To write a pattern constant that contains a backslash, you must write two backslashes in the statement, assuming escape string syntax is used.
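For instance, to match a literal period the pattern must reach the regular expression engine as a\.b, so the backslash is doubled inside an escape string constant. A sketch, assuming the E'...' escape string syntax is available:

```sql
-- The literal E'a\\.b' yields the pattern a\.b, where \. matches a period only.
SELECT 'a.b' ~ E'a\\.b';    -- true
SELECT 'axb' ~ E'a\\.b';    -- false

-- Without the escape, . is the "any single character" atom.
SELECT 'axb' ~ 'a.b';       -- true
```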
Table 7-13. Regular Expression Quantifiers

| Quantifier | Matches |
|---|---|
| * | a sequence of 0 or more matches of the atom |
| + | a sequence of 1 or more matches of the atom |
| ? | a sequence of 0 or 1 matches of the atom |
| {m} | a sequence of exactly m matches of the atom |
| {m,} | a sequence of m or more matches of the atom |
| {m,n} | a sequence of m through n (inclusive) matches of the atom; m may not exceed n |
| *? | non-greedy version of * |
| +? | non-greedy version of + |
| ?? | non-greedy version of ? |
| {m}? | non-greedy version of {m} |
| {m,}? | non-greedy version of {m,} |
| {m,n}? | non-greedy version of {m,n} |
The forms using {...} are known as bounds. The numbers m and n within a bound are unsigned decimal integers with permissible values from 0 to 255 inclusive.
Non-greedy quantifiers (available in AREs only) match the same possibilities as their corresponding normal (greedy) counterparts, but prefer the smallest number rather than the largest number of matches. See section 7.7.3.5 Regular Expression Matching Rules for more detail.
Note: A quantifier cannot immediately follow another quantifier. A quantifier cannot begin an expression or subexpression or follow ^ or |.
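The difference between greedy and non-greedy quantifiers, and the effect of bounds, can be seen in a small sketch. It assumes the default advanced flavor (non-greedy forms are ARE-only), and the sample strings are invented.

```sql
-- Greedy + takes as many matches of the atom as possible.
SELECT substring('xaaab' from 'a+');       -- aaa

-- Non-greedy +? prefers the smallest number of matches.
SELECT substring('xaaab' from 'a+?');      -- a

-- Bounds: exactly 2, 2 or more, and 1 through 2 matches of the atom.
SELECT substring('xaaab' from 'a{2}');     -- aa
SELECT substring('xaaab' from 'a{2,}');    -- aaa
SELECT substring('xaaab' from 'a{1,2}');   -- aa

-- ? allows 0 or 1 matches of the atom, but not 2.
SELECT 'xb' ~ 'xa?b';                      -- true
SELECT 'xaab' ~ 'xa?b';                    -- false
```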
Table 7-14. Regular Expression Constraints

| Constraint | Description |
|---|---|
| ^ | matches at the beginning of the string |
| $ | matches at the end of the string |
| (?=re) | positive lookahead matches at any point where a substring matching re begins (AREs only) |
| (?!re) | negative lookahead matches at any point where no substring matching re begins (AREs only) |
Lookahead constraints may not contain back references (see section 7.7.3.3 Regular Expression Escapes), and all parentheses within them are considered non-capturing.
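A brief sketch of the simple constraints, again assuming the default advanced flavor for the lookahead forms. Constraints match the empty string, so a lookahead tests what follows without including it in the match. The sample strings are invented.

```sql
-- ^ and $ anchor the match to the beginning and end of the string.
SELECT 'foobar' ~ '^foo';                       -- true
SELECT 'foobar' ~ 'foo$';                       -- false
SELECT 'foobar' ~ 'bar$';                       -- true

-- Positive lookahead: "foo" must be followed by "bar", which is not part of the match.
SELECT substring('foobar' from 'foo(?=bar)');   -- foo
SELECT 'foobaz' ~ 'foo(?=bar)';                 -- false

-- Negative lookahead: "foo" must not be followed by "bar".
SELECT 'foobaz' ~ 'foo(?!bar)';                 -- true
SELECT 'foobar' ~ 'foo(?!bar)';                 -- false
```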