Perl Language Reference Manual
by Larry Wall and others
Paperback (6"x9"), 724 pages
ISBN 9781906966027
RRP £29.95 ($39.95)

Sales of this book support The Perl Foundation.

11.2.4 Character Classes and other Special Escapes

In addition, Perl defines the following:

\w Match a "word" character (alphanumeric plus "_")
\W Match a non-"word" character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
\pP Match P, named property. Use \p{Prop} for longer names.
\PP Match non-P
\X Match Unicode "eXtended grapheme cluster"
\C Match a single C char (octet), even under Unicode. NOTE: this breaks characters up into their UTF-8 bytes, so you may end up with malformed pieces of UTF-8. Unsupported in lookbehind.
\1 Backreference to a specific group. '1' may actually be any positive integer.
\g1, \g{-1} Backreference to a specific or previous group. The number may be negative, indicating a group counted backwards from this point in the pattern (\g{-1} is the immediately preceding group), and may optionally be wrapped in curly brackets for safer parsing.
\g{name} Named backreference
\k<name> Named backreference
\K Keep the text matched to the left of the \K; don't include it in $&
\N Any character but \n (experimental)
\v Vertical whitespace
\V Not vertical whitespace
\h Horizontal whitespace
\H Not horizontal whitespace
\R Linebreak
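The script below is a minimal sketch that exercises several of the escapes listed above. The sample strings and variable names are illustrative assumptions, and named captures, \K and \R assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# \w+ matches runs of word characters: letters, digits and "_".
my @words = ("foo_bar, baz9!" =~ /(\w+)/g);
print "words: @words\n";                 # prints "words: foo_bar baz9"

# \s+ splits on any whitespace; \h matches only horizontal whitespace.
my @fields = split /\s+/, "one\ttwo  three";
print "fields: @fields\n";               # prints "fields: one two three"
print "tab is \\h\n"         if "\t" =~ /^\h$/;
print "newline is not \\h\n" unless "\n" =~ /\h/;
print "newline is \\v\n"     if "\n" =~ /^\v$/;

# \R treats a CRLF pair as a single linebreak.
print "CRLF is one \\R\n" if "a\r\nb" =~ /^a\Rb$/;

# \X matches "e" plus a combining acute accent as one grapheme.
print "one grapheme\n" if "e\x{301}" =~ /^\X$/;

# Backreferences: numbered \1, relative \g{-1}, and named \k<name>.
print "numbered\n" if "abab"        =~ /^(ab)\1$/;
print "relative\n" if "xyzxyz"      =~ /^(xyz)\g{-1}$/;
print "named\n"    if "hello hello" =~ /^(?<w>\w+) \k<w>$/;

# \K: the text left of \K must match but is kept, so only the digits are replaced.
(my $s = "price: 42") =~ s/price: \K\d+/99/;
print "$s\n";                            # prints "price: 99"
```

The \K substitution is the typical use case: it avoids having to capture the prefix and write it back into the replacement.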
See 13.2 for details on \w, \W, \s, \S, \d, \D, \p, \P, \N, \v, \V, \h, and \H. See 12.8 for details on \R and \X.

Note that \N has two meanings. When of the form \N{NAME}, it matches the character whose name is NAME; similarly, when of the form \N{U+hex}, it matches the character whose Unicode code point is the hexadecimal number hex. Otherwise it matches any character but \n.

The POSIX character class syntax

[:class:]

is also available. Note that the [ and ] brackets in [:class:] are literal parts of the syntax, not a character class of their own; the construct must always be used within a character class expression.
# this is correct:
$string =~ /[[:alpha:]]/;
# this is not, and will generate a warning:
$string =~ /[:alpha:]/;
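A small sketch of the two meanings of \N, together with the correct bracketed POSIX form, might look like this. The test strings are illustrative assumptions; bare \N requires Perl 5.12 or later, and the explicit charnames import is only needed for \N{NAME} on Perls older than 5.16.

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{NAME} before Perl 5.16; harmless later

# Bare \N: any character except a newline.
my $same_line  = ("a-b"  =~ /^a\Nb$/) ? 1 : 0;
my $cross_line = ("a\nb" =~ /^a\Nb$/) ? 1 : 0;

# \N{U+hex}: the character with that Unicode code point (U+263A, WHITE SMILING FACE).
my $smiley = ("\x{263A}" =~ /^\N{U+263A}$/) ? 1 : 0;

# \N{NAME}: the character with that Unicode name (U+00E9).
my $e_acute = ("\x{E9}" =~ /^\N{LATIN SMALL LETTER E WITH ACUTE}$/) ? 1 : 0;

# A POSIX class only works inside a bracketed character class: [[:alpha:]].
my $alpha = ("x" =~ /^[[:alpha:]]$/) ? 1 : 0;

print "$same_line $cross_line $smiley $e_acute $alpha\n";
```

Only the bare \N form is affected by newlines; the braced forms each name exactly one character.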
The following POSIX-style character classes are available:
[[:alpha:]] Any alphabetical character.
[[:alnum:]] Any alphanumerical character.
[[:ascii:]] Any character in the ASCII character set.
[[:blank:]] A GNU extension, equal to a space or a horizontal tab.
[[:cntrl:]] Any control character.
[[:digit:]] Any decimal digit, equivalent to "\d".
[[:graph:]] Any printable character, excluding a space.
[[:lower:]] Any lowercase character.
[[:print:]] Any printable character, including a space.
[[:punct:]] Any graphical character excluding "word" characters.
[[:space:]] Any whitespace character: "\s" plus the vertical tab ("\cK").
[[:upper:]] Any uppercase character.
[[:word:]] A Perl extension, equivalent to "\w".
[[:xdigit:]] Any hexadecimal digit.
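As a rough illustration of the table, the following sketch counts how many characters of a sample string fall into a few of these classes. The sample text and the %count hash are hypothetical; the pattern is assembled at run time from each class name.

```perl
use strict;
use warnings;

# Hypothetical sample text; any string would do.
my $text = "Hi 0x1F, ok!";
my %count;
for my $class (qw(alpha digit xdigit upper punct space)) {
    # Build a pattern such as [[:alpha:]] from the class name.
    my $pattern = "[[:" . $class . ":]]";
    # A list assignment in scalar context yields the number of matches.
    $count{$class} = () = $text =~ /$pattern/g;
}
printf "%-6s %d\n", $_, $count{$_} for sort keys %count;
```

Note that "x" counts as alphabetic but not as a hexadecimal digit, while "F", "0" and "1" count as hexadecimal digits.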
You can negate the [:class:] character classes by prefixing the class name with a '^', as in [[:^digit:]]. This is a Perl extension. The POSIX character classes [.cc.] and [=cc=] are recognized but not supported; trying to use them will cause an error. Details on POSIX character classes are in 13.3.5.
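The sketch below contrasts negating a single POSIX class with negating the whole bracket expression, and checks that a [.cc.] class is rejected. The input strings are illustrative assumptions; the pattern with [.a.] is compiled at run time inside eval so that the error can be caught instead of aborting compilation.

```perl
use strict;
use warnings;

# [[:^digit:]] negates one POSIX class (like \D): strip every non-digit.
(my $digits = "tel: +44 (0)20 7946 0000") =~ s/[[:^digit:]]//g;

# A leading ^ on the bracket expression negates the whole class instead:
# remove everything that is neither a letter nor whitespace.
(my $letters = "R2-D2 and C-3PO") =~ s/[^[:alpha:][:space:]]//g;

# [.cc.] is recognized but not supported, so compiling it dies.
my $collating = '[[.a.]]';
my $ok = eval { my $re = qr/$collating/; 1 } ? 1 : 0;

print "$digits\n$letters\ncollating class accepted: $ok\n";
```

The per-class ^ is handy when a negated POSIX class has to be combined with other members in the same bracket expression.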