Perl Language Reference Manual
by Larry Wall and others
Paperback (6"x9"), 724 pages
ISBN 9781906966027
RRP £29.95 ($39.95)

Sales of this book support The Perl Foundation.

12.8 Misc

Here we document the backslash sequences that don't fall in one of the categories above. They are:

\C always matches a single octet, even if the source string is encoded in UTF-8 format, and the character to be matched is a multi-octet character. \C was introduced in perl 5.6. Mnemonic: oCtet.
\K is new in perl 5.10.0. Anything that is matched left of \K is not included in $&, and will not be replaced if the pattern is used in a substitution. This allows you to write s/PAT1 \K PAT2/REPL/x instead of s/(PAT1) PAT2/${1}REPL/x or s/(?<=PAT1) PAT2/REPL/x. Mnemonic: Keep.
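As a rough illustration of the three equivalent forms described for \K, the sketch below rewrites a number that follows a fixed prefix. The input string and the variable names are hypothetical, chosen only for this example, and perl 5.10.0 or later is assumed.

```perl
use strict;
use warnings;

# Hypothetical input, used only for illustration.
my $text = "price: 100 USD";

# With \K: "price: " must match, but it is kept out of $& and is not replaced.
(my $with_k = $text) =~ s/price:\s*\K\d+/250/;

# The same substitution written with a capture group and ${1}.
(my $with_capture = $text) =~ s/(price:\s*)\d+/${1}250/;

# The same substitution written with a fixed-width lookbehind.
(my $with_lookbehind = $text) =~ s/(?<=price: )\d+/250/;

# $& holds only the part matched to the right of \K.
my $matched = $text =~ /price:\s*\K\d+/ ? $& : undef;

print "$with_k\n";     # price: 250 USD
print "$matched\n";    # 100
```

All three substitutions give the same result here; the \K form avoids both the capture group and the lookbehind.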
\N is a new experimental feature in perl 5.12.0. It matches any character that is not a newline. It is shorthand for [^\n], and is identical to the . metasymbol, except under the /s flag, which changes the meaning of . but not of \N. Note that \N{...} can mean a named or numbered character (see section 12.3.3). Mnemonic: Complement of \n.
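The difference between . and \N under the /s flag can be sketched as follows. The two-line input and the variable names are hypothetical, and perl 5.12.0 or later is assumed.

```perl
use strict;
use warnings;

# Hypothetical two-line input, used only for illustration.
my $text = "first\nsecond";

# Without /s, both . and \N stop at the newline.
my ($dot)    = $text =~ /\A(.+)/;
my ($not_nl) = $text =~ /\A(\N+)/;

# With /s, . also matches the newline, but \N still does not.
my ($dot_s)    = $text =~ /\A(.+)/s;
my ($not_nl_s) = $text =~ /\A(\N+)/s;

print "same without /s\n" if $dot eq $not_nl;     # both are "first"
print "differ with /s\n"  if $dot_s ne $not_nl_s; # whole string vs. "first"
```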
\R matches a generic newline, that is, anything that is considered a newline by Unicode. This includes all characters matched by \v (vertical whitespace), and the multi-character sequence "\x0D\x0A" (carriage return followed by a line feed, aka the network newline, or the newline used in Windows text files). \R is equivalent to (?>\x0D\x0A|\v). Since \R can match a sequence of more than one character, it cannot be put inside a bracketed character class; /[\R]/ is an error; use \v instead. \R was introduced in perl 5.10.0. Mnemonic: none really. \R was picked because PCRE already uses \R, and more importantly because Unicode recommends such a regular expression metacharacter, and suggests \R as the notation.
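A minimal sketch of \R in practice: splitting and normalizing text that mixes line-ending conventions, with a bracketed class using \v for comparison. The sample string and variable names are hypothetical, and perl 5.10.0 or later is assumed.

```perl
use strict;
use warnings;

# Hypothetical text mixing "\n", "\r\n", and "\r" line endings.
my $text = "one\ntwo\r\nthree\rfour";

# Split on any generic newline; "\r\n" is consumed as one separator.
my @lines = split /\R/, $text;

# Normalize every generic newline to "\n".
(my $normalized = $text) =~ s/\R/\n/g;

# \R counts "\r\n" once; [\v] counts each vertical-whitespace character.
my $generic  = () = $text =~ /\R/g;
my $vertical = () = $text =~ /[\v]/g;

print scalar(@lines), "\n";         # 4
print "$generic $vertical\n";       # 3 4
```

The counts differ because \R treats the carriage return and line feed pair as a single newline, while the character class sees two separate characters.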
\X matches a Unicode extended grapheme cluster. It corresponds quite well to what normal (non-Unicode-programmer) usage would consider a single character. As an example, consider a G with some sort of diacritic mark, such as an arrow. There is no such single character in Unicode, but one can be composed by using a G followed by a Unicode "COMBINING UPWARDS ARROW BELOW", and it would be displayed by Unicode-aware software as if it were a single character. Mnemonic: eXtended Unicode character.
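The G-with-arrow example for \X can be sketched as below, comparing what . and \X each treat as one unit. The surrounding letters and the variable names are hypothetical. The charnames pragma is loaded explicitly so that \N{...} works on older perls.

```perl
use strict;
use warnings;
use charnames ':full';   # enables \N{NAME} on older perls

# A "G" followed by the combining mark named in the text.
my $cluster = "G\N{COMBINING UPWARDS ARROW BELOW}";

# Hypothetical surrounding text, used only for illustration.
my $text = "a${cluster}b";

# . matches one code point at a time; \X matches one grapheme cluster.
my @code_points = $text =~ /./g;
my @clusters    = $text =~ /\X/g;

print scalar(@code_points), "\n";   # 4
print scalar(@clusters), "\n";      # 3
```

Here the second element of @clusters is the two-code-point sequence in $cluster, which is what a reader would perceive as one character.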