The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 454 pages
ISBN 9781906966041
RRP £14.95 ($19.95)

Regular Expression Details

PostgreSQL's regular expressions are implemented using a software package written by Henry Spencer. Much of the description of regular expressions below is copied verbatim from his manual.

Regular expressions (REs), as defined in POSIX 1003.2, come in two forms: extended REs or EREs (roughly those of egrep), and basic REs or BREs (roughly those of ed). PostgreSQL supports both forms, and also implements some extensions that are not in the POSIX standard, but have become widely used due to their availability in programming languages such as Perl and Tcl. REs using these non-POSIX extensions are called advanced REs or AREs in this documentation. AREs are almost an exact superset of EREs, but BREs have several notational incompatibilities (as well as being much more limited). We first describe the ARE and ERE forms, noting features that apply only to AREs, and then describe how BREs differ.

Note: PostgreSQL always initially presumes that a regular expression follows the ARE rules. However, the more limited ERE or BRE rules can be chosen by prepending an embedded option to the RE pattern, as described in section Regular Expression Metasyntax. This can be useful for compatibility with applications that expect exactly the POSIX 1003.2 rules.
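As a rough illustration of the note above, the queries below compare the default ARE reading of a pattern with a BRE reading. The (?b) prefix is the embedded option covered in section Regular Expression Metasyntax; the sample strings are made up for this sketch.

```sql
-- Default (ARE) rules: + is a quantifier, so the pattern matches one or more a's.
SELECT 'aa' ~ 'a+';        -- true

-- BRE rules, selected with the embedded option (?b): + is an ordinary
-- character, so the pattern matches only the literal text "a+".
SELECT 'aa' ~ '(?b)a+';    -- false
SELECT 'a+' ~ '(?b)a+';    -- true
```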

A regular expression is defined as one or more branches, separated by |. It matches anything that matches one of the branches.

A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first, followed by a match for the second, and so on; an empty branch matches the empty string.
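The definitions of regular expression and branch can be tried out with the ~ match operator. This is a minimal sketch using invented sample strings, not an exhaustive demonstration.

```sql
-- Two branches separated by |: the RE matches if either branch matches.
SELECT 'hotdog' ~ 'cat|dog';    -- true  (the second branch matches)
SELECT 'bird'   ~ 'cat|dog';    -- false (neither branch matches)

-- Within a branch the atoms are concatenated: c, then a, then t, in that order.
SELECT 'act' ~ 'cat';           -- false

-- An empty branch matches the empty string, so the leading "un" is optional.
SELECT 'do' ~ '^(un|)do$';      -- true
```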

A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. With a quantifier, it can match some number of matches of the atom. An atom can be any of the possibilities shown in Table 7-12. The possible quantifiers and their meanings are shown in Table 7-13.

A constraint matches an empty string, but matches only when specific conditions are met. A constraint can be used where an atom could be used, except it cannot be followed by a quantifier. The simple constraints are shown in Table 7-14; some more constraints are described later.

Table 7-12: Regular Expression Atoms
Atom      Description
(re)      (where re is any regular expression) matches a match for re, with the match noted for possible reporting
(?:re)    as above, but the match is not noted for reporting (a “non-capturing” set of parentheses) (AREs only)
.         matches any single character
[chars]   a bracket expression, matching any one of the chars (see section Bracket Expressions for more detail)
\k        (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g., \\ matches a backslash character
\c        where c is alphanumeric (possibly followed by other characters) is an escape, see section Regular Expression Escapes (AREs only; in EREs and BREs, this matches c)
{         when followed by a character other than a digit, matches the left-brace character {; when followed by a digit, it is the beginning of a bound (see below)
x         where x is a single character with no other significance, matches that character

An RE cannot end with \.
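A few of the atoms from Table 7-12 can be seen in action in the sketch below. It uses the regexp_matches function, which returns the captured substrings as a text array, and the ~ operator; the sample strings are invented for illustration.

```sql
-- (re) captures: one array element is reported per capturing group.
SELECT regexp_matches('foobar', '(foo)(bar)');      -- {foo,bar}

-- (?:re) groups without capturing, so only the first group is reported.
SELECT regexp_matches('foobar', '(foo)(?:bar)');    -- {foo}

-- . matches any single character; [chars] matches any one listed character.
SELECT 'cot' ~ '^c.t$';       -- true
SELECT 'cot' ~ '^c[au]t$';    -- false

-- { followed by a non-digit is an ordinary left brace.
SELECT 'a{b' ~ 'a{b';         -- true
```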

Note: Remember that the backslash (\) already has a special meaning in PostgreSQL string literals. To write a pattern constant that contains a backslash, you must write two backslashes in the statement, assuming escape string syntax is used (see section String Constants).
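The doubling rule in the note above can be sketched as follows. The examples use the explicit E'...' escape string syntax so that the backslash handling does not depend on server settings; the sample strings are invented.

```sql
-- In an escape string, the two backslashes reach the regular expression
-- engine as one, so the pattern is a\.b and the dot is matched literally.
SELECT 'a.b' ~ E'a\\.b';    -- true
SELECT 'axb' ~ E'a\\.b';    -- false

-- Without the backslash, . is the any-single-character atom.
SELECT 'axb' ~ 'a.b';       -- true
```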

Table 7-13: Regular Expression Quantifiers
Quantifier  Matches
*           a sequence of 0 or more matches of the atom
+           a sequence of 1 or more matches of the atom
?           a sequence of 0 or 1 matches of the atom
{m}         a sequence of exactly m matches of the atom
{m,}        a sequence of m or more matches of the atom
{m,n}       a sequence of m through n (inclusive) matches of the atom; m cannot exceed n
*?          non-greedy version of *
+?          non-greedy version of +
??          non-greedy version of ?
{m}?        non-greedy version of {m}
{m,}?       non-greedy version of {m,}
{m,n}?      non-greedy version of {m,n}

The forms using {...} are known as bounds. The numbers m and n within a bound are unsigned decimal integers with permissible values from 0 to 255 inclusive.

Non-greedy quantifiers (available in AREs only) match the same possibilities as their corresponding normal (greedy) counterparts, but prefer the smallest number rather than the largest number of matches. See section Regular Expression Matching Rules for more detail.
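The difference between greedy and non-greedy quantifiers is easiest to see by extracting the matched text. The sketch below uses substring(string from pattern), which returns the part of the string that matched; the input 'aaaa' is an arbitrary example.

```sql
-- Bounds: exactly two, or two to three, matches of the atom a.
SELECT substring('aaaa' from 'a{2}');       -- aa
SELECT substring('aaaa' from 'a{2,3}');     -- aaa (greedy: prefers the most matches)

-- Non-greedy forms accept the same possibilities but prefer the fewest matches.
SELECT substring('aaaa' from 'a{2,3}?');    -- aa
SELECT substring('aaaa' from 'a+?');        -- a

-- * allows zero matches of the atom, + requires at least one.
SELECT 'b' ~ '^a*b$';    -- true
SELECT 'b' ~ '^a+b$';    -- false
```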

Note: A quantifier cannot immediately follow another quantifier, e.g., ** is invalid. A quantifier cannot begin an expression or subexpression or follow ^ or |.
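A short sketch of the restriction in the note above, along with one way to work within it by wrapping the quantified atom in parentheses. The exact wording of the error is not reproduced here.

```sql
-- Rejected: the second * would have to quantify another quantifier.
SELECT 'aaa' ~ 'a**';         -- fails with an "invalid regular expression" error

-- Accepted: the parentheses turn a* into an atom, which can take its own quantifier.
SELECT 'aaa' ~ '^(a*)*$';     -- true
```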

Table 7-14: Regular Expression Constraints
Constraint  Description
^           matches at the beginning of the string
$           matches at the end of the string
(?=re)      positive lookahead matches at any point where a substring matching re begins (AREs only)
(?!re)      negative lookahead matches at any point where no substring matching re begins (AREs only)

Lookahead constraints cannot contain back references (see section Regular Expression Escapes), and all parentheses within them are considered non-capturing.
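The constraints in Table 7-14 match no characters themselves; they only restrict where a match may occur. The following sketch, with invented sample strings, shows the anchors and both lookahead forms.

```sql
-- ^ and $ tie the match to the beginning and end of the string.
SELECT 'abc' ~ '^ab';    -- true
SELECT 'abc' ~ '^bc';    -- false
SELECT 'abc' ~ 'bc$';    -- true

-- Positive lookahead: "bar" must follow, but it is not part of the match.
SELECT substring('foobar' from 'foo(?=bar)');    -- foo

-- Negative lookahead: the match succeeds only where "bar" does not follow.
SELECT 'foobaz' ~ 'foo(?!bar)';    -- true
SELECT 'foobar' ~ 'foo(?!bar)';    -- false
```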
