- publishing free software manuals
PostgreSQL Reference Manual - Volume 1 - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 716 pages
ISBN 0954612027
RRP £32.00 ($49.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

7.7.3.1 Regular Expression Details

Regular expressions (REs), as defined in POSIX 1003.2, come in two forms: extended REs or EREs (roughly those of egrep), and basic REs or BREs (roughly those of ed). PostgreSQL supports both forms, and also implements some extensions that are not in the POSIX standard, but have become widely used anyway due to their availability in programming languages such as Perl and Tcl. REs using these non-POSIX extensions are called advanced REs or AREs in this documentation. AREs are almost an exact superset of EREs, but BREs have several notational incompatibilities (as well as being much more limited). We first describe the ARE and ERE forms, noting features that apply only to AREs, and then describe how BREs differ.

Note: The form of regular expressions accepted by PostgreSQL can be chosen by setting the regex_flavor run-time parameter. The usual setting is advanced, but one might choose extended for maximum backwards compatibility with pre-7.4 releases of PostgreSQL.

A regular expression is defined as one or more branches, separated by |. It matches anything that matches one of the branches.

A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first, followed by a match for the second, etc; an empty branch matches the empty string.

A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. With a quantifier, it can match some number of matches of the atom. An atom can be any of the possibilities shown in Table 7-12. The possible quantifiers and their meanings are shown in Table 7-13.

A constraint matches an empty string, but matches only when specific conditions are met. A constraint can be used where an atom could be used, except it may not be followed by a quantifier. The simple constraints are shown in Table 7-14; some more constraints are described later.

Table 7-12: Regular Expression Atoms
Atom Description
(re) (where re is any regular expression) matches a match for re, with the match noted for possible reporting
(?:re) as above, but the match is not noted for reporting (a “non-capturing” set of parentheses) (AREs only)
. matches any single character
[chars] a bracket expression, matching any one of the chars (see section 7.7.3.2 Bracket Expressions for more detail)
\k (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g. \\ matches a backslash character
\c where c is alphanumeric (possibly followed by other characters) is an escape, see section 7.7.3.3 Regular Expression Escapes (AREs only; in EREs and BREs, this matches c)
{ when followed by a character other than a digit, matches the left-brace character {; when followed by a digit, it is the beginning of a bound (see below)
x where x is a single character with no other significance, matches that character

An RE may not end with \.

Note: Remember that the backslash (\) already has a special meaning in PostgreSQL string literals. To write a pattern constant that contains a backslash, you must write two backslashes in the statement, assuming escape string syntax is used.

Table 7-13: Regular Expression Quantifiers
Quantifier Matches
* a sequence of 0 or more matches of the atom
+ a sequence of 1 or more matches of the atom
? a sequence of 0 or 1 matches of the atom
{m} a sequence of exactly m matches of the atom
{m,} a sequence of m or more matches of the atom
{m,n} a sequence of m through n (inclusive) matches of the atom; m may not exceed n
*? non-greedy version of *
+? non-greedy version of +
?? non-greedy version of ?
{m}? non-greedy version of {m}
{m,}? non-greedy version of {m,}
{m,n}? non-greedy version of {m,n}

The forms using {...} are known as bounds. The numbers m and n within a bound are unsigned decimal integers with permissible values from 0 to 255 inclusive.

Non-greedy quantifiers (available in AREs only) match the same possibilities as their corresponding normal (greedy) counterparts, but prefer the smallest number rather than the largest number of matches. See section 7.7.3.5 Regular Expression Matching Rules for more detail.

Note: A quantifier cannot immediately follow another quantifier. A quantifier cannot begin an expression or subexpression or follow ^ or |.

Table 7-14: Regular Expression Constraints
Constraint Description
^ matches at the beginning of the string
$ matches at the end of the string
(?=re) positive lookahead matches at any point where a substring matching re begins (AREs only)
(?!re) negative lookahead matches at any point where no substring matching re begins (AREs only)

Lookahead constraints may not contain back references (see section 7.7.3.3 Regular Expression Escapes), and all parentheses within them are considered non-capturing.

ISBN 0954612027PostgreSQL Reference Manual - Volume 1 - SQL Language ReferenceSee the print edition