Perl Language Reference Manual
by Larry Wall and others
Paperback (6"x9"), 724 pages
ISBN 9781906966027
RRP £29.95 ($39.95)

Sales of this book support The Perl Foundation!

11.9 Combining RE Pieces

Each of the elementary pieces of regular expressions described earlier (such as ab or \Z) can match at most one substring at a given position of the input string. However, in a typical regular expression these elementary pieces are combined into more complicated patterns using combining operators such as ST, S|T and S* (in these examples S and T are regular subexpressions).

Such combinations can include alternatives, leading to a problem of choice: if we match the regular expression a|ab against "abc", will it match the substring "a" or "ab"? One way to describe which substring is actually matched is the concept of backtracking (see 11.5). However, this description is too low-level and makes you think in terms of a particular implementation.
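As a small illustration of this choice (a sketch only; the variable names are chosen for this example and are not part of the manual), the following captures what each arrangement of the alternatives matches against "abc":

```perl
use strict;
use warnings;

# The left alternative is tried first, so "a" is chosen over "ab".
my ($first) = "abc" =~ /(a|ab)/;     # "a"

# Swapping the alternatives changes which substring is matched.
my ($second) = "abc" =~ /(ab|a)/;    # "ab"

# If the rest of the pattern rules out "a", the next candidate is used.
my ($forced) = "abc" =~ /(a|ab)c/;   # "ab"

print "a|ab     matched '$first'\n";
print "ab|a     matched '$second'\n";
print "(a|ab)c  captured '$forced'\n";
```

In the last case the match for the alternation is not the first one tried, but it is the first one that lets the whole expression succeed.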

Another description starts with the notions of "better" and "worse". All the substrings which may be matched by the given regular expression can be sorted from the "best" match to the "worst" match, and it is the "best" match which is chosen. This replaces the question "what is chosen?" with the question "which matches are better, and which are worse?".

Again, for elementary pieces there is no such question, since at most one match at a given position is possible. This section describes the notion of better/worse for combining operators. In the description below S and T are regular subexpressions.

ST
Consider two possible matches, AB and A'B', where A and A' are substrings which can be matched by S, and B and B' are substrings which can be matched by T. If A is a better match for S than A', then AB is a better match than A'B'. If A and A' coincide, AB is a better match than AB' if B is a better match for T than B'.
S|T
When S can match, it is a better match than when only T can match. The ordering of two matches for S is the same as their ordering for S alone, and similarly for two matches for T.
S{REPEAT_COUNT}
Matches as SSS...S (repeated as many times as necessary).
S{min,max}
Matches as S{max}|S{max-1}|...|S{min+1}|S{min}.
S{min,max}?
Matches as S{min}|S{min+1}|...|S{max-1}|S{max}.
S?, S*, S+
Same as S{0,1}, S{0,BIG_NUMBER}, S{1,BIG_NUMBER} respectively.
S??, S*?, S+?
Same as S{0,1}?, S{0,BIG_NUMBER}?, S{1,BIG_NUMBER}? respectively.
(?>S)
Matches the best match for S and only that.
(?=S), (?<=S)
Only the best match for S is considered. (This is important only if S has capturing parentheses, and backreferences are used somewhere else in the whole regular expression.)
(?!S), (?<!S)
For this grouping operator there is no need to describe the ordering, since only whether or not S can match is important.
(??{ EXPR }), (?PARNO)
The ordering is the same as for the regular expression which is the result of EXPR, or the pattern contained by capture buffer PARNO.
(?(condition)yes-pattern|no-pattern)
Recall that which of yes-pattern or no-pattern actually matches is already determined. The ordering of the matches is the same as for the chosen subexpression.
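
Several of the rules in the list above can be checked with short matches. The following is a minimal sketch, not taken from the manual; the variable names are illustrative, and each comment gives the value that the corresponding rule predicts.

```perl
use strict;
use warnings;

# Concatenation: the best match for the first piece is preferred,
# and the second piece gets whatever is left.
my ($s, $t) = "aaa" =~ /(a*)(a*)/;               # ("aaa", "")

# S{min,max} prefers more repetitions; S{min,max}? prefers fewer.
my ($greedy) = "aaaaa" =~ /(a{2,4})/;            # "aaaa"
my ($lazy)   = "aaaaa" =~ /(a{2,4}?)/;           # "aa"

# S+? behaves as S{1,BIG_NUMBER}?, so one repetition is best.
my ($plus_lazy) = "aaa" =~ /(a+?)/;              # "a"

# (?>S) keeps only the best match for S: a+ takes every "a"
# and none is given back for the literal "a" that follows.
my $plain  = "aaab" =~ /a+ab/     ? 1 : 0;       # 1
my $atomic = "aaab" =~ /(?>a+)ab/ ? 1 : 0;       # 0

# (?=S) also considers only the best match for S, which matters
# here because the capture is reused through \1.
my $look_a  = "abc" =~ /(?=(a|ab))\1c/ ? 1 : 0;  # 0: \1 is "a"
my $look_ab = "abc" =~ /(?=(ab|a))\1c/ ? 1 : 0;  # 1: \1 is "ab"

print "ST: S='$s' T='$t'\n";
print "a{2,4}='$greedy' a{2,4}?='$lazy' a+?='$plus_lazy'\n";
print "a+ab=$plain (?>a+)ab=$atomic\n";
print "lookahead a|ab=$look_a ab|a=$look_ab\n";
```

The two lookahead cases differ only in the order of the alternatives, which shows that the second-best match for S inside (?=S) is never tried.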

The above recipes describe the ordering of matches at a given position. One more rule is needed to understand how a match is determined for the whole regular expression: a match at an earlier position is always better than a match at a later position.
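
The position rule can be seen in a few matches where a "better looking" substring exists further along the string. This is a sketch with illustrative variable names; the offsets are read from the built-in @- array, which holds the start offset of the last successful match in $-[0].

```perl
use strict;
use warnings;

# "c" is the first alternative, but "a" occurs earlier in the string,
# and an earlier position beats the ordering of the alternatives.
my ($alt) = "xyz abc" =~ /(c|b|a)/;    # "a"
my $alt_pos = $-[0];                   # 4

# A longer run of "a" exists later, but the match at offset 0 wins.
my ($run) = "a aaa" =~ /(a+)/;         # "a"
my $run_pos = $-[0];                   # 0

# o* can match the empty string at offset 0, so it never reaches "oo".
my ($empty) = "foo" =~ /(o*)/;         # ""
my $empty_pos = $-[0];                 # 0

print "c|b|a matched '$alt' at offset $alt_pos\n";
print "a+    matched '$run' at offset $run_pos\n";
print "o*    matched '$empty' at offset $empty_pos\n";
```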
