Perl Language Reference Manual
by Larry Wall and others
Paperback (6"x9"), 724 pages
ISBN 9781906966027
RRP £29.95 ($39.95)


7.30 Regexp Quote-Like Operators

Here are the quote-like operators that apply to pattern matching and related activities.

qr/STRING/msixpo
This operator quotes (and possibly compiles) its STRING as a regular expression. STRING is interpolated the same way as PATTERN in m/PATTERN/. If "'" is used as the delimiter, no interpolation is done. Returns a Perl value which may be used instead of the corresponding /STRING/msixpo expression. The returned value is a normalized version of the original pattern. It magically differs from a string containing the same characters: ref(qr/x/) returns "Regexp", even though dereferencing the result returns undef. For example,
$rex = qr/my.STRING/is;
print $rex;                 # prints (?si-xm:my.STRING)
s/$rex/foo/;
is equivalent to
s/my.STRING/foo/is;
The result may be used as a subpattern in a match:
$re = qr/$pattern/;
$string =~ /foo${re}bar/;   # can be interpolated in other patterns
$string =~ $re;             # or used standalone
$string =~ /$re/;           # or this way
Since Perl may compile the pattern at the moment the qr() operator is executed, using qr() may have speed advantages in some situations, notably if the result of qr() is used standalone:
sub match {
    my $patterns = shift;
    my @compiled = map qr/$_/i, @$patterns;
    grep {
        my $success = 0;
        foreach my $pat (@compiled) {
            $success = 1, last if /$pat/;
        }
        $success;
    } @_;
}
Precompilation of the pattern into an internal representation at the moment of qr() avoids the need to recompile the pattern every time a match /$pat/ is attempted. (Perl has many other internal optimizations, but none would be triggered in the above example if we did not use the qr() operator.)
Options are:
m Treat string as multiple lines.
s Treat string as single line. (Make . match a newline)
i Do case-insensitive pattern matching.
x Use extended regular expressions.
p When matching preserve a copy of the matched string so that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
o Compile pattern only once.
If a precompiled pattern is embedded in a larger pattern then the effect of 'msixp' will be propagated appropriately. The effect of the 'o' modifier is not propagated, being restricted to those patterns explicitly using it.
See 11 for additional information on valid syntax for STRING, and for a detailed look at the semantics of regular expressions.
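As a minimal sketch of that propagation, the fragment below embeds a case-insensitive qr// object in an outer pattern that has no /i of its own. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# A case-insensitive fragment compiled with qr//.
my $word = qr/perl/i;

# The outer pattern has no /i, so " version" must match exactly,
# while the embedded fragment keeps its own case-insensitivity.
my $outer = qr/^${word} version$/;

my @results = map { $_ =~ $outer ? 1 : 0 }
    ('PERL version', 'perl version', 'perl VERSION');

print "@results\n";    # prints "1 1 0"
```

Only the third string fails, because the /i travels with the embedded fragment and does not spread to the rest of the outer pattern.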
m/PATTERN/msixpogc
/PATTERN/msixpogc
Searches a string for a pattern match, and in scalar context returns true if it succeeds, false if it fails. If no string is specified via the =~ or !~ operator, the $_ string is searched. (The string specified with =~ need not be an lvalue--it may be the result of an expression evaluation, but remember the =~ binds rather tightly.) See also 11. See "Perl locale handling (internationalization and localization)" (perllocale) in the Perl Unicode and Locales Manual for discussion of additional considerations that apply when use locale is in effect. Options are as described in qr//; in addition, the following match process modifiers are available:
g Match globally, i.e., find all occurrences.
c Do not reset search position on a failed match when /g is in effect.
If "/" is the delimiter then the initial m is optional. With the m you can use any pair of non-whitespace characters as delimiters. This is particularly useful for matching path names that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is the delimiter, then the match-only-once rule of ?PATTERN? applies. If "'" is the delimiter, no interpolation is performed on the PATTERN. When using a character valid in an identifier, whitespace is required after the m.
PATTERN may contain variables, which will be interpolated (and the pattern recompiled) every time the pattern search is evaluated, except for when the delimiter is a single quote. (Note that $(, $), and $| are not interpolated because they look like end-of-string tests.) If you want such a pattern to be compiled only once, add a /o after the trailing delimiter. This avoids expensive run-time recompilations, and is useful when the value you are interpolating won't change over the life of the script. However, mentioning /o constitutes a promise that you won't change the variables in the pattern. If you change them, Perl won't even notice. See also 7.30.
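A short sketch of the delimiter rules, using an invented path and address: the same path test is written with slashes and with braces, and a single-quoted pattern shows that "@" is taken literally when nothing is interpolated.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';

# With "/" as the delimiter every slash in the pattern must be escaped.
my $toothpicks = $path =~ /^\/usr\/local\/bin\// ? 1 : 0;

# With braces as delimiters the same pattern needs no escaping.
my $braces = $path =~ m{^/usr/local/bin/} ? 1 : 0;

# With "'" as the delimiter nothing is interpolated, so "@example"
# is plain text rather than an array to be expanded.
my $literal_at = 'user@example.com' =~ m'user@example' ? 1 : 0;

print "$toothpicks $braces $literal_at\n";    # prints "1 1 1"
```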
The empty pattern //
If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. In this case, only the g and c flags on the empty pattern are honoured - the other flags are taken from the original pattern. If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match).
Note that it's possible to confuse Perl into thinking // (the empty regex) is really // (the defined-or operator). Perl is usually pretty good about this, but some pathological cases might trigger this, such as $a/// (is that ($a) / (//) or $a // /?) and print $fh // (print $fh(// or print($fh //?). In all of these examples, Perl will assume you meant defined-or. If you meant the empty regex, just use parentheses or spaces to disambiguate, or even prefix the empty regex with an m (so // becomes m//).
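The reuse of the last successful pattern is easy to overlook, so here is a small sketch of it. The strings are arbitrary, and the empty pattern is written as m// to sidestep the defined-or ambiguity.

```perl
use strict;
use warnings;

my @log;

# A successful match makes /b/ the last successfully matched pattern.
push @log, ('abc' =~ /b/ ? 'hit' : 'miss');

# The empty pattern now behaves like /b/, so it fails on 'xyz'
# even though a genuine empty pattern would match any string.
push @log, ('xyz' =~ m// ? 'hit' : 'miss');

# It succeeds again on a string that contains "b".
push @log, ('cab' =~ m// ? 'hit' : 'miss');

print "@log\n";    # prints "hit miss hit"
```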
Matching in list context
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3...). (Note that here $1 etc. are also set, and that this differs from Perl 4's behavior.) When there are no parentheses in the pattern, the return value is the list (1) for success. With or without parentheses, an empty list is returned upon failure. Examples:
open(TTY, '/dev/tty');
<TTY> =~ /^y/i && foo();    # do foo if desired
if (/Version: *([0-9.]*)/) { $version = $1; }
next if m#^/usr/spool/uucp#;
# poor man's grep
$arg = shift;
while (<>) {
    print if /$arg/o;       # compile only once
}
if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
This last example splits $foo into the first two words and the remainder of the line, and assigns those three fields to $F1, $F2, and $Etc. The conditional is true if any variables were assigned, i.e., if the pattern matched.
The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (e.g. m//gc). Modifying the target string also resets the search position.
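The three behaviours can be compared side by side. This is only an illustrative sketch, and the sample string and variable names are invented.

```perl
use strict;
use warnings;

my $text = 'x=10, y=20, z=30';

# List context without /g: the captures of the first match only.
my ($name, $value) = $text =~ /(\w)=(\d+)/;

# List context with /g: the captures of every match, flattened.
my @all = $text =~ /(\w)=(\d+)/g;

# Scalar context with /g: one match per evaluation; pos() reports
# the offset at which the next search will start.
my @positions;
while ($text =~ /\d+/g) {
    push @positions, pos($text);
}

print "$name $value\n";    # prints "x 10"
print "@all\n";            # prints "x 10 y 20 z 30"
print "@positions\n";      # prints "4 10 16"
```

Once the while loop ends on a failed match, the search position is reset, so a later m//g on $text starts from the beginning again.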
\G assertion
You can intermix m//g matches with m/\G.../g, where \G is a zero-width assertion that matches the exact position where the previous m//g, if any, left off. Without the /g modifier, the \G assertion still anchors at pos(), but the match is of course only attempted once. Using \G without /g on a target string that has not previously had a /g match applied to it is the same as using the \A assertion to match the beginning of the string. Note also that, currently, \G is only properly supported when anchored at the very beginning of the pattern. Examples:
# list context
($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
# scalar context
$/ = "";
while (defined($paragraph = <>)) {
    while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
        $sentences++;
    }
}
print "$sentences\n";
# using m//gc with \G
$_ = "ppooqppqq";
while ($i++ < 2) {
    print "1: '";
    print $1 while /(o)/gc; print "', pos=", pos, "\n";
    print "2: '";
    print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
    print "3: '";
    print $1 while /(p)/gc; print "', pos=", pos, "\n";
}
print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
The last example should print:
1: 'oo', pos=4
2: 'q', pos=5
3: 'pp', pos=7
1: '', pos=7
2: 'q', pos=8
3: '', pos=8
Final: 'q', pos=8
Notice that the final match matched q instead of p, which a match without the \G anchor would have done. Also note that the final match did not update pos. pos is only updated on a /g match. If the final match did indeed match p, it's a good bet that you're running an older (pre-5.6.0) Perl.
A useful idiom for lex-like scanners is /\G.../gc. You can combine several regexps like this to process a string part-by-part, doing different actions depending on which regexp matched. Each regexp tries to match where the previous one leaves off.
$_ = <<'EOL';
     $url = URI::URL->new( "http://example.com/" ); 
     die if $url eq "xXx";
EOL
LOOP:
{
    print(" digits"), redo LOOP
      if /\G\d+\b[,.;]?\s*/gc;
    print(" lowercase"), redo LOOP
      if /\G[a-z]+\b[,.;]?\s*/gc;
    print(" UPPERCASE"), redo LOOP
      if /\G[A-Z]+\b[,.;]?\s*/gc;
    print(" Capitalized"), redo LOOP
      if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
    print(" MiXeD"), redo LOOP
      if /\G[A-Za-z]+\b[,.;]?\s*/gc;
    print(" alphanumeric"), redo LOOP
      if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
    print(" line-noise"), redo LOOP
      if /\G[^A-Za-z0-9]+/gc;
    print ". That's all!\n";
}
Here is the output (split into several lines):
line-noise lowercase line-noise lowercase UPPERCASE line-noise
UPPERCASE line-noise lowercase line-noise lowercase line-noise
lowercase lowercase line-noise lowercase lowercase line-noise
MiXeD line-noise. That's all!
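The same /\G.../gc idiom can be packaged as a small tokenizer. The sketch below is one possible arrangement, not a canonical one: the tokenize() helper and its NUM/OP token labels are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a simple arithmetic expression into tokens.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    while (1) {
        # Each alternative is anchored with \G, so it can only match
        # where the previous successful match left off. The /c flag
        # keeps pos() intact when an alternative fails.
        if    ($input =~ /\G\s+/gc)         { next }
        elsif ($input =~ /\G(\d+)/gc)       { push @tokens, "NUM:$1" }
        elsif ($input =~ /\G([-+*\/()])/gc) { push @tokens, "OP:$1" }
        elsif ($input =~ /\G\z/gc)          { last }
        else {
            die "unexpected character at offset "
                . (pos($input) || 0) . "\n";
        }
    }
    return @tokens;
}

my @tokens = tokenize('12 + (3*4)');
print "@tokens\n";    # prints "NUM:12 OP:+ OP:( NUM:3 OP:* NUM:4 OP:)"
```

Ordering matters in such a chain: the first alternative that matches at the current position wins, just as in the LOOP block above.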
?PATTERN?
This is just like the /pattern/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only ?? patterns local to the current package are reset.
while (<>) {
    if (?^$?) {
                        # blank line between header and body
    }
} continue {
    reset if eof;       # clear ?? status for next file
}
This usage is vaguely deprecated, which means it just might possibly be removed in some distant future version of Perl, perhaps somewhere around the year 2168.
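A minimal in-memory sketch of the match-once rule follows. It writes the operator as m?PATTERN?, the spelling with an explicit m, which is the form later Perl releases expect. The list of lines is invented for the example.

```perl
use strict;
use warnings;

my @lines = ('alpha', 'beta', 'alphabet', 'alpha again');
my @hits;

for my $round (1, 2) {
    for my $line (@lines) {
        # Matches at most once until reset() is called.
        push @hits, "$round:$line" if $line =~ m?alpha?;
    }
    reset;    # re-arm the ?? match for the next round
}

print "@hits\n";    # prints "1:alpha 2:alpha"
```

Without the reset call, the second round would record nothing, because the pattern would already have matched once.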
s/PATTERN/REPLACEMENT/msixpogce
Searches a string for a pattern, and if found, replaces that pattern with the replacement text and returns the number of substitutions made. Otherwise it returns false (specifically, the empty string). If no string is specified via the =~ or !~ operator, the $_ variable is searched and modified. (The string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.)
If the delimiter chosen is a single quote, no interpolation is done on either the PATTERN or the REPLACEMENT. Otherwise, if the PATTERN contains a $ that looks like a variable rather than an end-of-string test, the variable will be interpolated into the pattern at run-time. If you want the pattern compiled only once the first time the variable is interpolated, use the /o option. If the pattern evaluates to the empty string, the last successfully executed regular expression is used instead. See 11 for further explanation on these. See "Perl locale handling (internationalization and localization)" (perllocale) in the Perl Unicode and Locales Manual for discussion of additional considerations that apply when use locale is in effect.
Options are as with m// with the addition of the following replacement specific options:
e   Evaluate the right side as an expression.
ee  Evaluate the right side as a string, then eval the result.
Any non-whitespace delimiter may replace the slashes. Add space after the s when using a character allowed in identifiers. If single quotes are used, no interpretation is done on the replacement string (the /e modifier overrides this, however). Unlike Perl 4, Perl 5 treats backticks as normal delimiters; the replacement text is not evaluated as a command. If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own pair of quotes, which may or may not be bracketing quotes, e.g., s(foo)(bar) or s<foo>/bar/. A /e will cause the replacement portion to be treated as a full-fledged Perl expression and evaluated right then and there. It is, however, syntax checked at compile-time. A second e modifier will cause the replacement portion to be evaled before being run as a Perl expression. Examples:
s/\bgreen\b/mauve/g;                # don't change wintergreen
$path =~ s|/usr/bin|/usr/local/bin|;
s/Login: $foo/Login: $bar/; # run-time pattern
($foo = $bar) =~ s/this/that/;      # copy first, then change
$count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count
$_ = 'abc123xyz';
s/\d+/$&*2/e;               # yields 'abc246xyz'
s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'
s/%(.)/$percent{$1}/g;      # change percent escapes; no /e
s/%(.)/$percent{$1} || $&/ge;       # expr now, so /e
s/^=(\w+)/pod($1)/ge;       # use function call
# expand variables in $_, but dynamics only, using
# symbolic dereferencing
s/\$(\w+)/${$1}/g;
# Add one to the value of any numbers in the string
s/(\d+)/1 + $1/eg;
# This will expand any embedded scalar variable
# (including lexicals) in $_ : First $1 is interpolated
# to the variable name, and then evaluated
s/(\$\w+)/$1/eeg;
# Delete (most) C comments.
$program =~ s {
    /\*     # Match the opening delimiter.
    .*?     # Match a minimal number of characters.
    \*/     # Match the closing delimiter.
} []gsx;
s/^\s*(.*?)\s*$/$1/;        # trim whitespace in $_, expensively
for ($variable) {           # trim whitespace in $variable, cheap
    s/^\s+//;
    s/\s+$//;
}
s/([^ ]*) *([^ ]*)/$2 $1/;  # reverse 1st two fields
Note the use of $ instead of \ in the last example. Unlike sed, we use the \<digit> form in only the left hand side. Anywhere else it's $<digit>.
Occasionally, you can't use just a /g to get all the changes to occur that you might want. Here are two common cases:
# put commas in the right places in an integer
1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
# expand tabs to 8-column spacing
1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
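To see why the loop is needed, the comma idiom can be wrapped in a function and traced. The commify() name is a hypothetical choice for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the comma-insertion idiom.
sub commify {
    my ($number) = @_;
    # Each pass inserts at most one comma per run of digits, working
    # from the right, so the substitution is repeated until it fails.
    1 while $number =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
    return $number;
}

print commify(1234567), "\n";    # prints "1,234,567"
print commify(999), "\n";        # prints "999"
```

For 1234567 the first pass produces 1234,567 and the second 1,234,567. A single s///g cannot do both, because the earlier group only becomes matchable once the first comma is in place.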