Perl Language Reference Manual by Larry Wall and others Paperback (6"x9"), 724 pages ISBN 9781906966027 RRP £29.95 ($39.95) Sales of this book support The Perl Foundation!
11.9 Combining RE Pieces
Each of the elementary pieces of regular expressions which were described before (such as ab or \Z) could match at most one substring at the given position of the input string. However, in a typical regular expression these elementary pieces are combined into more complicated patterns using combining operators ST, S|T, S* etc. (in these examples S and T are regular subexpressions).
Such combinations can include alternatives, leading to a problem of choice: if we match a regular expression a|ab against "abc", will it match substring "a" or "ab"? One way to describe which substring is actually matched is the concept of backtracking (see 11.5). However, this description is too low-level and makes you think in terms of a particular implementation.
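The question about a|ab and "abc" can be checked directly. The following is a minimal sketch; the variable names are illustrative and do not come from the manual. It also swaps the two alternatives to show that the order in which they are written affects which substring is chosen.

```perl
use strict;
use warnings;

my $input = "abc";

# a|ab: the first alternative, "a", already matches at position 0,
# so it is the substring that gets chosen.
my ($first_alt) = $input =~ /(a|ab)/;

# ab|a: with the alternatives swapped, "ab" is now the preferred one.
my ($swapped) = $input =~ /(ab|a)/;

print "a|ab matched '$first_alt'\n";   # a|ab matched 'a'
print "ab|a matched '$swapped'\n";     # ab|a matched 'ab'
```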
Another description starts with the notions of "better" and "worse". All the substrings which may be matched by the given regular expression can be sorted from the "best" match to the "worst" match, and it is the "best" match which is chosen. This replaces the question "what is chosen?" with the question "which matches are better, and which are worse?".
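One way to observe this ordering is to record every candidate match and then reject it, so that the engine moves on to the next-best one. The sketch below assumes a Perl that supports the (*FAIL) verb (5.10 or later) and lexical variables inside an embedded (?{ ... }) code block; the array name @candidates is hypothetical. It is an illustration of the ordering, not a description of how any particular program should pick a match.

```perl
use strict;
use warnings;

my @candidates;

# The ^ anchor restricts the search to position 0. The code block
# records the text matched so far, then (*FAIL) rejects that candidate,
# which forces the engine to move on to the next one in its ordering.
"abc" =~ /^(?:a|ab)(?{ push @candidates, $& })(*FAIL)/;

# Candidates appear best first.
print join(", ", @candidates), "\n";   # a, ab
```

Because every candidate is rejected, the match as a whole fails; only the recorded list is of interest here.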
Again, for elementary pieces there is no such question, since at most one match at a given position is possible. This section describes the notion of better/worse for combining operators. In the description below S and T are regular subexpressions.
ST

Consider two possible matches, AB and A'B', where A and A' are substrings which can be matched by S, and B and B' are substrings which can be matched by T. If A is a better match for S than A', then AB is a better match than A'B'. If A and A' coincide, then AB is a better match than AB' if B is a better match for T than B'.

S|T

When S can match, it is a better match than when only T can match. The ordering of two matches for S within S|T is the same as their ordering for S alone. Similarly for two matches for T.

S{REPEAT_COUNT}

Matches as SSS...S (repeated as many times as necessary).

S{min,max}

Matches as S{max}|S{max-1}|...|S{min+1}|S{min}.

S{min,max}?

Matches as S{min}|S{min+1}|...|S{max-1}|S{max}.

S?, S*, S+

Same as S{0,1}, S{0,BIG_NUMBER}, S{1,BIG_NUMBER} respectively.

S??, S*?, S+?

Same as S{0,1}?, S{0,BIG_NUMBER}?, S{1,BIG_NUMBER}? respectively.

(?>S)

Matches the best match for S and only that.

(?=S), (?<=S)

Only the best match for S is considered. (This is important only if S has capturing parentheses, and backreferences are used somewhere else in the whole regular expression.)

(?!S), (?<!S)

For this grouping operator there is no need to describe the ordering, since only whether or not S can match is important.

(??{ EXPR }), (?PARNO)

The ordering is the same as for the regular expression which is the result of EXPR, or the pattern contained by capture buffer PARNO.

(?(condition)yes-pattern|no-pattern)

Recall that which of yes-pattern or no-pattern actually matches is already determined. The ordering of the matches is the same as for the chosen subexpression.
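Several of the rules in the list above can be observed with small patterns. The following is a sketch only: the input strings and variable names are invented for illustration, and each pattern is anchored with ^ so that only matches at position 0 are compared.

```perl
use strict;
use warnings;

# ST: the preference of S dominates. S = (a|ab) prefers "a", and T can
# still complete the match from there, so "a" + "bcd" beats "ab" + "cd".
my ($s_part, $t_part) = "abcd" =~ /^(a|ab)(cd|bcd)/;

# S{min,max} prefers more repetitions; S{min,max}? prefers fewer.
my ($greedy) = "aaa" =~ /^(a{1,3})/;
my ($lazy)   = "aaa" =~ /^(a{1,3}?)/;

# (?>S) keeps only the best match for S. Here a* takes all three "a"s
# and will not give one back, so the following "ab" can never match.
my $plain_ok  = "aaab" =~ /^a*ab/     ? 1 : 0;
my $atomic_ok = "aaab" =~ /^(?>a*)ab/ ? 1 : 0;

# (?=S) also considers only the best match for S. The backreference
# makes this visible: the first lookahead captures "a" and never "ab".
my $look_a_first  = "abc" =~ /^(?=(a|ab))\1c/ ? 1 : 0;
my $look_ab_first = "abc" =~ /^(?=(ab|a))\1c/ ? 1 : 0;

print "ST: '$s_part' + '$t_part'\n";                  # ST: 'a' + 'bcd'
print "greedy '$greedy', lazy '$lazy'\n";             # greedy 'aaa', lazy 'a'
print "plain $plain_ok, atomic $atomic_ok\n";         # plain 1, atomic 0
print "lookahead a|ab: $look_a_first, ab|a: $look_ab_first\n";
# lookahead a|ab: 0, ab|a: 1
```

In the lookahead case the two patterns differ only in the order of the alternatives, which is exactly the situation the parenthetical remark about capturing parentheses and backreferences describes.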
The above recipes describe the ordering of matches at a given position. One more rule is needed to understand how a match is determined for the whole regular expression: a match at an earlier position is always better than a match at a later position.
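The position rule takes priority over the ordering of alternatives, as the following sketch suggests. The input string and variable names are invented for illustration; $-[0] holds the offset at which the last successful match started.

```perl
use strict;
use warnings;

my $input = "the cat sat";

# "sat" is the first (preferred) alternative, but it can only match at
# offset 8. "cat" matches at offset 4, and the earlier position wins.
$input =~ /sat|cat/ or die "no match";
my ($matched, $offset) = ($&, $-[0]);

print "matched '$matched' at offset $offset\n";   # matched 'cat' at offset 4
```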