8.2.1 Supported Character Sets

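Table 8-1 lists the character sets available for use in PostgreSQL. The "Server?" column indicates whether a character set can be used as a server (database) encoding; sets marked No are available only as client encodings. The "Bytes/Char" column gives the range of bytes used to store a single character.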

Table 8-1: PostgreSQL Character Sets
Name (Aliases) | Description (Language) | Server? | Bytes/Char
BIG5 (WIN950, Windows950) | Big Five (Traditional Chinese) | No | 1-2
EUC_CN | Extended UNIX Code-CN (Simplified Chinese) | Yes | 1-3
EUC_JP | Extended UNIX Code-JP (Japanese) | Yes | 1-3
EUC_JIS_2004 | Extended UNIX Code-JP, JIS X 0213 (Japanese) | Yes | 1-3
EUC_KR | Extended UNIX Code-KR (Korean) | Yes | 1-3
EUC_TW | Extended UNIX Code-TW (Traditional Chinese, Taiwanese) | Yes | 1-3
GB18030 | National Standard (Chinese) | No | 1-2
GBK (WIN936, Windows936) | Extended National Standard (Simplified Chinese) | No | 1-2
ISO_8859_5 | ISO 8859-5, ECMA 113 (Latin/Cyrillic) | Yes | 1
ISO_8859_6 | ISO 8859-6, ECMA 114 (Latin/Arabic) | Yes | 1
ISO_8859_7 | ISO 8859-7, ECMA 118 (Latin/Greek) | Yes | 1
ISO_8859_8 | ISO 8859-8, ECMA 121 (Latin/Hebrew) | Yes | 1
JOHAB | JOHAB (Korean (Hangul)) | No | 1-3
KOI8R (KOI8) | KOI8-R (Cyrillic (Russian)) | Yes | 1
KOI8U | KOI8-U (Cyrillic (Ukrainian)) | Yes | 1
LATIN1 (ISO88591) | ISO 8859-1, ECMA 94 (Western European) | Yes | 1
LATIN2 (ISO88592) | ISO 8859-2, ECMA 94 (Central European) | Yes | 1
LATIN3 (ISO88593) | ISO 8859-3, ECMA 94 (South European) | Yes | 1
LATIN4 (ISO88594) | ISO 8859-4, ECMA 94 (North European) | Yes | 1
LATIN5 (ISO88599) | ISO 8859-9, ECMA 128 (Turkish) | Yes | 1
LATIN6 (ISO885910) | ISO 8859-10, ECMA 144 (Nordic) | Yes | 1
LATIN7 (ISO885913) | ISO 8859-13 (Baltic) | Yes | 1
LATIN8 (ISO885914) | ISO 8859-14 (Celtic) | Yes | 1
LATIN9 (ISO885915) | ISO 8859-15 (LATIN1 with Euro and accents) | Yes | 1
LATIN10 (ISO885916) | ISO 8859-16, ASRO SR 14111 (Romanian) | Yes | 1
MULE_INTERNAL | Mule internal code (Multilingual Emacs) | Yes | 1-4
SJIS (Mskanji, ShiftJIS, WIN932, Windows932) | Shift JIS (Japanese) | No | 1-2
SHIFT_JIS_2004 | Shift JIS, JIS X 0213 (Japanese) | No | 1-2
SQL_ASCII | unspecified (see text) (any) | Yes | 1
UHC (WIN949, Windows949) | Unified Hangul Code (Korean) | No | 1-2
UTF8 (Unicode) | Unicode, 8-bit (all) | Yes | 1-4
WIN866 (ALT) | Windows CP866 (Cyrillic) | Yes | 1
WIN874 | Windows CP874 (Thai) | Yes | 1
WIN1250 | Windows CP1250 (Central European) | Yes | 1
WIN1251 (WIN) | Windows CP1251 (Cyrillic) | Yes | 1
WIN1252 | Windows CP1252 (Western European) | Yes | 1
WIN1253 | Windows CP1253 (Greek) | Yes | 1
WIN1254 | Windows CP1254 (Turkish) | Yes | 1
WIN1255 | Windows CP1255 (Hebrew) | Yes | 1
WIN1256 | Windows CP1256 (Arabic) | Yes | 1
WIN1257 | Windows CP1257 (Baltic) | Yes | 1
WIN1258 (ABC, TCVN, TCVN5712, VSCII) | Windows CP1258 (Vietnamese) | Yes | 1
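
To make the Bytes/Char column concrete, the following Python sketch counts how many bytes each character of a string occupies under a given encoding. It runs entirely on the client side and uses Python's own codec names ("utf-8", "latin-1", "big5"), which are assumed here to correspond to the UTF8, LATIN1, and BIG5 rows of the table; the helper name byte_lengths is hypothetical and not part of PostgreSQL.

```python
from typing import List


def byte_lengths(text: str, codec: str) -> List[int]:
    """Return the encoded size in bytes of each character in text.

    codec is a Python codec name (an assumption of this sketch), not a
    PostgreSQL character set name.
    """
    return [len(ch.encode(codec)) for ch in text]


if __name__ == "__main__":
    # UTF-8 is a variable-width encoding: 1 to 4 bytes per character.
    print(byte_lengths("A\u00e9\u20ac\U0001F600", "utf-8"))
    # LATIN1 is a single-byte encoding.
    print(byte_lengths("\u00e9", "latin-1"))
    # BIG5 stores a Traditional Chinese character in 2 bytes.
    print(byte_lengths("\u4e2d", "big5"))
```

A single-byte character set such as LATIN1 always reports 1, whereas a multibyte set reports different values depending on the character, which is why the table gives a range for those rows.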

Not all client APIs support all of the listed character sets. For example, the PostgreSQL JDBC driver does not support MULE_INTERNAL, LATIN6, LATIN8, or LATIN10.

The SQL_ASCII setting behaves considerably differently from the other settings. When the server character set is SQL_ASCII, the server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is SQL_ASCII. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.
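
The SQL_ASCII behavior described above can be illustrated with a short Python sketch. This is only a model of the rule (bytes 0-127 read as ASCII, bytes 128-255 left uninterpreted), not PostgreSQL's implementation; the function name describe_sql_ascii and its output labels are hypothetical.

```python
from typing import List


def describe_sql_ascii(data: bytes) -> List[str]:
    """Model how a SQL_ASCII server is described as viewing each byte.

    Bytes 0-127 are interpreted as ASCII characters; bytes 128-255 are
    carried along unchanged with no meaning attached to them.
    """
    result = []
    for b in data:
        if b < 128:
            result.append("ascii:" + chr(b))
        else:
            result.append("opaque:0x%02x" % b)
    return result


if __name__ == "__main__":
    stored = b"caf\xe9"  # bytes as some client happened to send them
    print(describe_sql_ascii(stored))
    # The same stored bytes mean different things to different clients,
    # and nothing on the server side records which reading was intended.
    print(stored.decode("latin-1"))
    print(stored.decode("koi8_r"))
```

Because the high byte is stored without interpretation, two clients that assume different encodings will read different text from the same row, and neither is flagged as wrong. This is the practical cost of the "declaration of ignorance".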

