Perl Language Reference Manual
by Larry Wall and others
Paperback (6"x9"), 724 pages
ISBN 9781906966027
RRP £29.95 ($39.95)

11.10 Creating Custom RE Engines

Overloaded constants (see "Package for overloading Perl operations" (overload) in the Perl Library Reference Manual (Volume 1)) provide a simple way to extend the functionality of the RE engine.

Suppose that we want to enable a new RE escape sequence \Y| that matches at a boundary between whitespace characters and non-whitespace characters. Note that (?=\S)(?<!\S)|(?!\S)(?<=\S) matches exactly at these positions, so we want each \Y| to stand in for this more complicated expression. We can create a module customre to do this:

package customre;
use overload;

sub import {
  shift;    # discard the package name passed in by "use customre"
  die "No argument to customre::import allowed" if @_;
  overload::constant 'qr' => \&convert;
}

sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}

# We must also take care of not escaping the legitimate \\Y|
# sequence, hence the presence of '\\' in the conversion rules.
my %rules = ( '\\' => '\\\\',
              'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );

sub convert {
  my $re = shift;
  $re =~ s{
            \\ ( \\ | Y . )
          }
          { $rules{$1} or invalid($re,$1) }sgex;
  return $re;
}
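
The boundary pattern stored under 'Y|' in %rules can be checked on its own, without any overloading. The following is a minimal sketch; the names $boundary, $text and @positions and the sample string are assumptions made for illustration. It collects every offset at which the pattern matches:

```perl
use strict;
use warnings;

# The lookaround expression that \Y| is meant to stand for.
my $boundary = qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/;

# Hypothetical sample text with a single and a double space.
my $text = "foo bar  baz";

# Every match is zero-length, so pos() gives the boundary offset itself.
my @positions;
while ($text =~ /$boundary/g) {
    push @positions, pos($text);
}

print "@positions\n";    # prints: 0 3 4 7 9 12
```

Offset 8, between the two spaces, is not reported because whitespace lies on both sides of it, while the start and end of the string count as boundaries because a non-whitespace character touches them.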

Now use customre enables the new escape in constant regular expressions, i.e., those without any runtime variable interpolations. As documented in "Package for overloading Perl operations" (overload) in the Perl Library Reference Manual (Volume 1), this conversion will work only over literal parts of regular expressions. For \Y|$re\Y| the variable part of this regular expression needs to be converted explicitly (but only if the special meaning of \Y| should be enabled inside $re):

use customre;
$re = <>;
chomp $re;
$re = customre::convert $re;
/\Y|$re\Y|/;
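
The whole arrangement can be tried in a single file. The sketch below inlines a copy of the customre module in a BEGIN block and sets $INC{'customre.pm'} so that use customre does not look for a file on disk; that inlining, the variable $input, and the sample strings are assumptions made for illustration, not part of the manual's example.

```perl
use strict;
use warnings;

# Inline copy of the customre module so the sketch runs as one file.
BEGIN {
    package customre;
    use overload;

    my %rules = ( '\\' => '\\\\',
                  'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );

    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'" }

    sub convert {
        my $re = shift;
        $re =~ s{ \\ ( \\ | Y . ) }
                { $rules{$1} or invalid($re, $1) }sgex;
        return $re;
    }

    sub import {
        shift;    # discard the package name
        die "No argument to customre::import allowed" if @_;
        overload::constant 'qr' => \&convert;
    }

    # Assumption for this sketch: mark the module as already loaded.
    $INC{'customre.pm'} = __FILE__;
}

use customre;

# Hypothetical runtime input; the manual's example reads it with <>.
my $input = 'foo\Y|\s+\Y|bar';
my $re    = customre::convert($input);

# The literal \Y| pieces go through the qr handler at compile time;
# the interpolated $re was converted explicitly above.
for my $line ('x foo  bar y', 'xfoo bar y') {
    my $verdict = $line =~ /\Y|$re\Y|/ ? 'match' : 'no match';
    print "'$line': $verdict\n";
}
```

If the handler is installed as intended, the first string should match and the second should not, since in 'xfoo bar y' there is no whitespace boundary before foo.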