The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group. Paperback (6"x9"), 454 pages, ISBN 9781906966041, RRP £14.95 ($19.95). Sales of this book support the PostgreSQL project.
2.1.2.3 String Constants with Unicode Escapes
PostgreSQL also supports another type
of escape syntax for strings that allows specifying arbitrary
Unicode characters by code point. A Unicode escape string
constant starts with U& (upper or lower case
letter U followed by ampersand) immediately before the opening
quote, without any spaces in between, for
example U&'foo'. (Note that this creates an
ambiguity with the operator &. Use spaces
around the operator to avoid this problem.) Inside the quotes,
Unicode characters can be specified in escaped form by writing a
backslash followed by a four-digit hexadecimal code point
number, or alternatively a backslash followed by a plus sign
and a six-digit hexadecimal code point number. For
example, the string 'data' could be written as:
U&'d\0061t\+000061'
The following less trivial example writes the Russian word “slon” (elephant) in Cyrillic letters:
U&'\0441\043B\043E\043D'
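The two escape forms can be modeled outside the database to check what a given constant denotes. The following Python sketch is an illustration only, not PostgreSQL's scanner: the function name decode_unicode_escapes is hypothetical, it takes just the text between the quotes, and it assumes the default backslash escape character.

```python
import re

# Matches a backslash followed by "+" and six hex digits, or by four hex digits.
_ESCAPE = re.compile(r"\\(?:\+([0-9A-Fa-f]{6})|([0-9A-Fa-f]{4}))")


def decode_unicode_escapes(body: str) -> str:
    """Expand \\XXXX and \\+XXXXXX escapes in the body of a U&'...' constant.

    Hypothetical helper for illustration; it does not handle a doubled
    escape character, UESCAPE, or surrogate pairs.
    """
    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(replace, body)


print(decode_unicode_escapes(r"d\0061t\+000061"))
print(decode_unicode_escapes(r"\0441\043B\043E\043D"))
```

Both escapes in the first call denote code point U+0061, so the result is the plain string data. The second call yields the four Cyrillic letters of the example above.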
If an escape character other than backslash is desired, it can
be specified using the UESCAPE clause after the string, for example:
U&'d!0061t!+000061' UESCAPE '!'
The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character.
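The restrictions on the escape character can be written as a simple predicate. The Python sketch below is an assumption-laden illustration: is_valid_uescape_char is a hypothetical name, and Python's str.isspace() is used as an approximation of what the server treats as whitespace.

```python
import string


def is_valid_uescape_char(ch: str) -> bool:
    """Return True if ch satisfies the restrictions listed above (sketch)."""
    if len(ch) != 1:
        return False  # must be exactly one character
    if ch in string.hexdigits:
        return False  # 0-9, a-f, A-F would be ambiguous with code point digits
    if ch in "+'\"":
        return False  # plus sign, single quote, double quote
    if ch.isspace():
        return False  # whitespace (approximated with str.isspace())
    return True


for candidate in ["!", "#", "g", "a", "F", "7", "+", "'", '"', " "]:
    print(repr(candidate), is_valid_uescape_char(candidate))
```

Note that a letter such as g passes the check because it is not a hexadecimal digit, while a and F do not.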
The Unicode escape syntax works fully only when the server encoding is
UTF8. When other server encodings are used, only
code points in the ASCII range (up to \007F)
can be specified. Both the 4-digit and the 6-digit form can be
used to specify UTF-16 surrogate pairs to compose characters with
code points larger than U+FFFF, although the availability of the
6-digit form technically makes this unnecessary. (When surrogate
pairs are used and the server encoding is UTF8, they
are first combined into a single code point that is then encoded
in UTF-8.)
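The combination of a surrogate pair into one code point follows the standard UTF-16 arithmetic. As a worked example, the pair D83D, DE00 and the six-digit form +01F600 name the same character. The Python sketch below shows the calculation; combine_surrogates is a hypothetical helper name, not a PostgreSQL function.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high (leading) surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low (trailing) surrogate")
    # Each surrogate carries 10 bits of the code point's offset from U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDE00)
print(f"U+{code_point:04X}")                  # the combined code point
print(chr(code_point).encode("utf-8").hex())  # its UTF-8 byte sequence
```

The combined value is U+1F600, which UTF-8 encodes as the four bytes f0 9f 98 80.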
Also, the Unicode escape syntax for string constants works only
when the configuration
parameter standard_conforming_strings is
turned on. Otherwise, this syntax could confuse
clients that parse SQL statements, to the point that it could
lead to SQL injection and similar security issues. If the
parameter is set to off, this syntax is rejected with an
error message.
To include the escape character in the string literally, write it twice.
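Going in the other direction, a program that generates such constants has to double the escape character wherever it occurs literally. The Python sketch below is illustrative only and rests on several assumptions: to_unicode_escape_literal is a hypothetical helper, it assumes the chosen escape character is a valid one, and it does nothing special with ASCII control characters. It is not a substitute for passing values as query parameters.

```python
def to_unicode_escape_literal(text: str, escape: str = "\\") -> str:
    """Render text as a U&'...' constant, escaping all non-ASCII characters."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch == escape:
            parts.append(escape * 2)            # literal escape character: write it twice
        elif ch == "'":
            parts.append("''")                  # standard SQL doubling of a single quote
        elif cp < 0x80:
            parts.append(ch)                    # other ASCII passes through unchanged
        elif cp <= 0xFFFF:
            parts.append(f"{escape}{cp:04X}")   # four-digit form
        else:
            parts.append(f"{escape}+{cp:06X}")  # plus sign and six-digit form
    literal = "U&'" + "".join(parts) + "'"
    if escape != "\\":
        literal += f" UESCAPE '{escape}'"       # only needed for a non-default escape
    return literal


print(to_unicode_escape_literal("a\\b"))
print(to_unicode_escape_literal("50!", escape="!"))
```

In the first call the single backslash in the input appears doubled in the constant. In the second, the exclamation mark is doubled because it is the escape character named in the UESCAPE clause.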