Perl Language Reference Manual
by Larry Wall and others
Paperback (6"x9"), 724 pages
ISBN 9781906966027
RRP £29.95 ($39.95)

Sales of this book support The Perl Foundation!

12.3.3 Named or numbered characters

Every Unicode character has a Unicode name and a numeric ordinal value (its code point). Use the \N{} construct to specify a character by either one.

To specify a character by name, put the name between the curly braces. In this case you must load the charnames pragma (use charnames) to make the Unicode character names available; otherwise Perl will complain.
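As a minimal sketch of the named form, the following loads the full character names and uses \N{} both in a string and in a pattern. The variable names are illustrative only. The explicit use charnames line follows the text above; more recent Perl releases load the names automatically.

```perl
use strict;
use warnings;
use charnames ':full';   # make the full Unicode character names available

# A named character in a double-quoted string.
my $sigma = "\N{GREEK SMALL LETTER SIGMA}";

# The same construct inside a regular expression.
my $word    = "caf\N{LATIN SMALL LETTER E WITH ACUTE}";
my $matched = ($word =~ /\N{LATIN SMALL LETTER E WITH ACUTE}$/) ? 1 : 0;

printf "U+%04X\n", ord $sigma;   # prints U+03C3
print "matched\n" if $matched;
```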

To specify by Unicode ordinal number, use the form \N{U+wide hex character}, where wide hex character is a number in hexadecimal that gives the ordinal number that Unicode has assigned to the desired character. It is customary (but not required) to use leading zeros to pad the number to 4 digits. Thus \N{U+0041} means Latin Capital Letter A, and you will rarely see it written without the two leading zeros. \N{U+0041} means A even on EBCDIC machines (where the ordinal value of A is not 0x41).
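The numbered form can be sketched as follows. The example only illustrates that the zero padding is optional; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Customary four-digit form and the equivalent unpadded form.
my $padded   = "\N{U+0041}";
my $unpadded = "\N{U+41}";

# The numbered form also works inside a pattern.
my $in_regex = ("ABC" =~ /^\N{U+0041}/) ? 1 : 0;

print "same character\n" if $padded eq $unpadded && $padded eq 'A';
print "regex matched\n"  if $in_regex;
```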

It is even possible to give your own names to characters, and even to short sequences of characters. For details, see "Define character names for \N{named} string literal escapes" (charnames) in the Perl Library Reference Manual (Volume 1).
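A rough sketch of a custom name, using the :alias option of charnames. The alias e_ACUTE is a hypothetical name chosen for this example, and the sketch covers only a single-character alias, not sequences; the charnames documentation remains the reference for the details.

```perl
use strict;
use warnings;

# Define a custom alias (e_ACUTE is a made-up name) for an existing character.
use charnames ':full', ':alias' => {
    e_ACUTE => 'LATIN SMALL LETTER E WITH ACUTE',
};

my $cafe = "caf\N{e_ACUTE}";

# The alias yields the same character as the numbered form.
my $same = ($cafe eq "caf\N{U+00E9}") ? 1 : 0;
print "alias resolved\n" if $same;
```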

(There is an expanded internal form that you may see in debug output: \N{U+wide hex character.wide hex character...}. The ... means any number of these wide hex characters separated by dots. This represents the sequence formed by the characters. This is an internal form only, subject to change, and you should not try to use it yourself.)

Mnemonic: Named character.

Note that a character expressed as a named or numbered character is treated by the regex engine as a character without special meaning, and will match "as is".
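To illustrate the "as is" behaviour, this sketch compares a bare dot, which is a metacharacter, with the same character written in numbered and named form. The pattern and variable names are illustrative only.

```perl
use strict;
use warnings;
use charnames ':full';

# FULL STOP written three ways: by number, by name, and as a bare metacharacter.
my $dot_by_number = qr/^a\N{U+002E}b$/;
my $dot_by_name   = qr/^a\N{FULL STOP}b$/;
my $plain_dot     = qr/^a.b$/;

# "axb" contains no literal dot, so only the bare metacharacter matches it.
my @results = map { ("axb" =~ $_) ? 1 : 0 }
              $dot_by_number, $dot_by_name, $plain_dot;

# All three patterns match a string that really contains a dot.
my @literal = map { ("a.b" =~ $_) ? 1 : 0 }
              $dot_by_number, $dot_by_name, $plain_dot;

print "axb: @results\n";
print "a.b: @literal\n";
```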
