Perl Language Reference Manual
by Larry Wall and others
Paperback (6"x9"), 724 pages
ISBN 9781906966027
RRP £29.95 ($39.95)

Sales of this book support The Perl Foundation!

11.4 Special Backtracking Control Verbs

WARNING: These patterns are experimental and subject to change or removal in a future version of Perl. Their usage in production code should be noted to avoid problems during upgrades.

These special patterns are generally of the form (*VERB:ARG). Unless otherwise stated, the ARG argument is optional; in some cases, it is forbidden.

Any pattern containing a special backtracking verb that allows an argument has the special behaviour that, when executed, it sets the current package's $REGERROR and $REGMARK variables. When doing so, the following rules apply:

On failure, the $REGERROR variable will be set to the ARG value of the verb pattern, if the verb was involved in the failure of the match. If the ARG part of the pattern was omitted, then $REGERROR will be set to the name of the last (*MARK:NAME) pattern executed, or to TRUE if there was none. Also, the $REGMARK variable will be set to FALSE.

On a successful match, the $REGERROR variable will be set to FALSE, and the $REGMARK variable will be set to the name of the last (*MARK:NAME) pattern executed. See the explanation for the (*MARK:NAME) verb below for more details.

NOTE: $REGERROR and $REGMARK are not magic variables like $1 and most other regex-related variables. They are not local to a scope, nor read-only, but instead are volatile package variables similar to $AUTOLOAD. Use local to localize changes to them to a specific scope if necessary.

If a pattern does not contain a special backtracking verb that allows an argument, then $REGERROR and $REGMARK are not touched at all.
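The rules above can be seen in a short sketch, assuming Perl 5.10 or later (the release that introduced these verbs). It prints both variables after a successful match, then uses local, as the NOTE recommends, so that a helper's own match leaves the caller's values untouched. The helper has_digit and the mark names are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# The engine writes the current package's copies of these (here, main's).
our ($REGMARK, $REGERROR);

# Successful match: $REGMARK gets the last (*MARK) name, $REGERROR a false value.
my $ok = 'abbc' =~ /b+(*MARK:found_b)c/;
print "REGMARK=$REGMARK\n";                    # REGMARK=found_b
print "REGERROR is false\n" unless $REGERROR;

# Hypothetical helper: it matches internally but localizes both variables,
# so the caller's values are restored when it returns.
sub has_digit {
    my ($s) = @_;
    local ($REGMARK, $REGERROR);
    return $s =~ /\d(*MARK:digit)/ ? 1 : 0;
}

my $found = has_digit('a1');
print "has_digit=$found, REGMARK=$REGMARK\n";  # has_digit=1, REGMARK=found_b
```

Without the local line, the call to has_digit would leave $REGMARK set to digit for the rest of the program.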

Verbs that take an argument
(*PRUNE) (*PRUNE:NAME)
This zero-width pattern prunes the backtracking tree at the current point when backtracked into on failure. Consider the pattern A (*PRUNE) B, where A and B are complex patterns. Until the (*PRUNE) verb is reached, A may backtrack as necessary to match. Once it is reached, matching continues in B, which may also backtrack as necessary; however, should B not match, then no further backtracking will take place, and the pattern will fail outright at the current starting position. The following example counts all the possible matching strings in a pattern (without actually matching any of them).
'aaab' =~ /a+b?(?{print "$&\n"; $count++})(*FAIL)/;
print "Count=$count\n";
which produces:
aaab
aaa
aa
a
aab
aa
a
ab
a
Count=9
If we add a (*PRUNE) before the count, like the following:
'aaab' =~ /a+b?(*PRUNE)(?{print "$&\n"; $count++})(*FAIL)/;
print "Count=$count\n";
we prevent backtracking and find the count of the longest matching string at each matching starting point, like so:
aaab
aab
ab
Count=3
Any number of (*PRUNE) assertions may be used in a pattern. See also (?>pattern) and possessive quantifiers for other ways to control backtracking. In some cases, the use of (*PRUNE) can be replaced with a (?>pattern) with no functional difference; however, (*PRUNE) can be used to handle cases that cannot be expressed using a (?>pattern) alone.
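As a small comparison of the overlap with (?>pattern) described above (a sketch; the hash keys are arbitrary labels), the three patterns below differ only in how a+ may backtrack. In this simple case (*PRUNE) behaves like an atomic group: once the verb has been passed, a+ cannot give back an 'a', so the match fails at every start position.

```perl
use strict;
use warnings;

# 'aaab' only matches a+ab if a+ gives back one 'a' on backtracking.
my %re = (
    plain  => qr/a+ab/,           # a+ may backtrack and release an 'a'
    atomic => qr/(?>a+)ab/,       # a+ inside (?>...) never gives anything back
    prune  => qr/a+(*PRUNE)ab/,   # backtracking into (*PRUNE) fails this start position
);

my %matched;
for my $name (qw(plain atomic prune)) {
    $matched{$name} = 'aaab' =~ $re{$name} ? 1 : 0;
    printf "%-6s %s\n", $name, $matched{$name} ? 'matches' : 'fails';
}
```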
(*SKIP) (*SKIP:NAME)
This zero-width pattern is similar to (*PRUNE), except that on failure it also signifies that whatever text was matched leading up to the (*SKIP) pattern being executed cannot be part of any match of this pattern. This effectively means that the regex engine "skips" forward to this position on failure and tries to match again (assuming that there is sufficient room to match). The name of the (*SKIP:NAME) pattern has special significance. If a (*MARK:NAME) was encountered while matching, then it is that position which is used as the "skip point". If no (*MARK) of that name was encountered, then the (*SKIP) operator has no effect. When used without a name, the "skip point" is where the match point was when executing the (*SKIP) pattern. Compare the following to the examples in (*PRUNE); note that the string is twice as long:
'aaabaaab' =~ /a+b?(*SKIP)(?{print "$&\n"; $count++})(*FAIL)/;
print "Count=$count\n";
outputs
aaab
aaab
Count=2
Once the 'aaab' at the start of the string has matched, and the (*SKIP) executed, the next starting point will be where the cursor was when the (*SKIP) was executed.
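A common idiom built from (*SKIP) matches something only to discard it together with everything inside it. The sketch below (the sample string is made up, and quotes are assumed never to be nested or escaped) collects bare words while ignoring words inside double quotes; it also uses the (*FAIL) verb documented later in this section.

```perl
use strict;
use warnings;

my $text = 'alpha "beta gamma" delta';

# First alternative: a quoted string. (*SKIP)(*FAIL) rejects it and moves the
# next start position past the closing quote, so its words are never tried.
# Second alternative: a bare word, which is what we actually collect.
my @words = $text =~ /"[^"]*"(*SKIP)(*FAIL)|\w+/g;

print "@words\n";   # alpha delta
```

Without the (*SKIP), the failure at the opening quote would only advance the start by one character, and beta and gamma would be collected as well.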
(*MARK:NAME) (*:NAME)
This zero-width pattern can be used to mark the point reached in a string when a certain part of the pattern has been successfully matched. This mark may be given a name. A later (*SKIP) pattern will then skip forward to that point if backtracked into on failure. Any number of (*MARK) patterns are allowed, and the NAME portion may be duplicated. In addition to interacting with the (*SKIP) pattern, (*MARK:NAME) can be used to "label" a pattern branch, so that after matching, the program can determine which branches of the pattern were involved in the match. When a match is successful, the $REGMARK variable will be set to the name of the most recently executed (*MARK:NAME) that was involved in the match. This can be used to determine which branch of a pattern was matched without using a separate capture buffer for each branch, which in turn can result in a performance improvement, as perl cannot optimize /(?:(x)|(y)|(z))/ as efficiently as something like /(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/. When a match has failed, and unless another verb has been involved in failing the match and has provided its own name to use, the $REGERROR variable will be set to the name of the most recently executed (*MARK:NAME). See (*SKIP) for more details. As a shortcut, (*MARK:NAME) can be written (*:NAME).
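Branch labelling can be sketched with a small hypothetical tokenizer; the mark names integer, decimal and identifier are made up for the example. For '3.14' the integer branch executes its (*MARK) and then fails at $, but since that mark is backtracked out of, $REGMARK reports only the mark involved in the successful match.

```perl
use strict;
use warnings;

our $REGMARK;   # set to the name of the last (*MARK) involved in the match

# Label each branch with (*MARK:NAME) instead of giving it its own capture group.
my $token_re = qr/^(?:\d+(*MARK:integer)|\d+\.\d+(*MARK:decimal)|[A-Za-z_]\w*(*MARK:identifier))$/;

my %kind;
for my $token ('42', '3.14', 'foo_bar', '4x') {
    $kind{$token} = $token =~ $token_re ? $REGMARK : 'unknown';
    print "$token => $kind{$token}\n";
}
```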
(*THEN) (*THEN:NAME)
This is similar to the "cut group" operator :: from Perl 6. Like (*PRUNE), this verb always matches, and when backtracked into on failure, it causes the regex engine to try the next alternation in the innermost enclosing group (capturing or otherwise). Its name comes from the observation that this operation combined with the alternation operator (|) can be used to create what is essentially a pattern-based if/then/else block:
( COND (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ )
Note that if this operator is used outside of an alternation, then it acts exactly like the (*PRUNE) operator.
/ A (*PRUNE) B /
is the same as
/ A (*THEN) B /
but
/ ( A (*THEN) B | C (*THEN) D ) /
is not the same as
/ ( A (*PRUNE) B | C (*PRUNE) D ) /
as after matching the A but failing on the B, the (*THEN) verb will backtrack and try C, whereas the (*PRUNE) verb will simply fail.
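A concrete version of that comparison, as a sketch with a made-up input: A and C are both \d+, B is 'y' and D is 'x', and only the first branch carries the verb. With (*THEN) the failure on 'y' moves on to the second alternative; with (*PRUNE) the whole attempt at that start position fails, and no later start position succeeds either.

```perl
use strict;
use warnings;

# Shape: ( A (*THEN) B | C D ) with A = C = \d+, B = 'y', D = 'x'.
my ($with_then)  = '12x' =~ /(\d+(*THEN)y|\d+x)/;
my ($with_prune) = '12x' =~ /(\d+(*PRUNE)y|\d+x)/;

print 'THEN:  ', defined $with_then  ? $with_then  : '(no match)', "\n";   # 12x
print 'PRUNE: ', defined $with_prune ? $with_prune : '(no match)', "\n";   # (no match)
```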
(*COMMIT)
This is the Perl 6 "commit pattern" <commit> or :::. It's a zero-width pattern similar to (*SKIP), except that when backtracked into on failure it causes the match to fail outright. No further attempts to find a valid match by advancing the start pointer will be made. For example,
'aaabaaab' =~ 
          /a+b?(*COMMIT)(?{print "$&\n"; $count++})(*FAIL)/;
print "Count=$count\n";
outputs
aaab
Count=1
In other words, once the (*COMMIT) has been entered, and if the pattern does not match, the regex engine will not try any further matching on the rest of the string.
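The difference from (*PRUNE) shows up when a later start position could succeed. In this sketch (the input string is made up), both patterns fail at the first position, but only the (*PRUNE) version goes on to find 'ab' further along the string.

```perl
use strict;
use warnings;

my $text = 'aaxab';

# (*PRUNE): fail at this start position, then let the engine try later ones.
my $pruned    = $text =~ /a+(*PRUNE)b/  ? $& : '(no match)';
# (*COMMIT): fail the whole match the first time it is backtracked into.
my $committed = $text =~ /a+(*COMMIT)b/ ? $& : '(no match)';

print "PRUNE:  $pruned\n";
print "COMMIT: $committed\n";
```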
Verbs without an argument
(*FAIL) (*F)
This pattern matches nothing and always fails. It can be used to force the engine to backtrack. It is equivalent to (?!), but easier to read. In fact, (?!) gets optimised into (*FAIL) internally. It is probably useful only when combined with (?{}) or (??{}).
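One way to pair (*FAIL) with (?{}), sketched here with an arbitrary variable name, is counting overlapping occurrences of a substring. A plain m//g match cannot do this, because each match resumes after the end of the previous one and would report only 2 for this input.

```perl
use strict;
use warnings;

my $overlaps = 0;

# (*FAIL) forces a backtrack after every successful 'aa', so the engine
# retries from each start position and the code block sees every occurrence,
# including overlapping ones.
'aaaa' =~ /aa(?{ $overlaps++ })(*FAIL)/;

print "overlapping 'aa' in 'aaaa': $overlaps\n";   # 3
```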
(*ACCEPT)
WARNING: This feature is highly experimental. It is not recommended for production code.
This pattern matches nothing and causes the end of successful matching at the point at which the (*ACCEPT) pattern was encountered, regardless of whether there is actually more to match in the string. When inside a nested pattern, such as recursion, or in a subpattern dynamically generated via (??{}), only the innermost pattern is ended immediately. If the (*ACCEPT) is inside capturing buffers, then the buffers are marked as ended at the point at which the (*ACCEPT) was encountered. For instance:
'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
will match; $1 will be AB, $2 will be B, and $3 will not be set. If another branch in the inner parentheses had been matched, such as in the string 'ACDE', then the D and E would have to be matched as well.
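A minimal sketch of the "regardless of whether there is actually more to match" behaviour, using made-up input strings: when the STOP branch is taken, the match ends at (*ACCEPT), so $& and $1 stop there even though ' now' follows in the string; the GO branch must still match ' now' and the end anchor.

```perl
use strict;
use warnings;

my %matched;
for my $s ('STOP now', 'GO now') {
    # If the STOP branch is taken, (*ACCEPT) ends the match immediately,
    # so ' now' and $ are never checked; the GO branch must match in full.
    if ($s =~ /^(STOP(*ACCEPT)|GO) now$/) {
        $matched{$s} = $&;
        print "'$s' matched '$&' (\$1 = '$1')\n";
    }
}
```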