| PostgreSQL Reference Manual - Volume 1 - SQL Language Reference by The PostgreSQL Global Development Group Paperback (6"x9"), 716 pages ISBN 0954612027 RRP £32.00 ($49.95) Sales of this book support the PostgreSQL project! Get a printed copy>>> |
8.1 Overview
SQL is a strongly typed language. That is, every data item has an associated data type which determines its behavior and allowed usage. PostgreSQL has an extensible type system that is much more general and flexible than other SQL implementations. Hence, most type conversion behavior in PostgreSQL is governed by general rules rather than by ad hoc heuristics. This allows mixed-type expressions to be meaningful even with user-defined types.
The PostgreSQL scanner/parser divides lexical elements into only five fundamental categories: integers, non-integer numbers, strings, identifiers, and key words. Constants of most non-numeric types are first classified as strings. The SQL language definition allows specifying type names with strings, and this mechanism can be used in PostgreSQL to start the parser down the correct path. For example, the query
SELECT text 'Origin' AS "label", point '(0,0)' AS "value"; label | value --------+------- Origin | (0,0) (1 row)
has two literal constants, of type text and point.
If a type is not specified for a string literal, then the placeholder type
unknown is assigned initially, to be resolved in later
stages as described below.
There are four fundamental SQL constructs requiring distinct type conversion rules in the PostgreSQL parser:
- Function calls
- Much of the PostgreSQL type system is built around a rich set of functions. Functions can have one or more arguments. Since PostgreSQL permits function overloading, the function name alone does not uniquely identify the function to be called; the parser must select the right function based on the data types of the supplied arguments.
- Operators
- PostgreSQL allows expressions with prefix and postfix unary (one-argument) operators, as well as binary (two-argument) operators. Like functions, operators can be overloaded, and so the same problem of selecting the right operator exists.
- Value Storage
-
SQL
INSERTandUPDATEstatements place the results of expressions into a table. The expressions in the statement must be matched up with, and perhaps converted to, the types of the target columns. UNION,CASE, and related constructs-
Since all query results from a unionized
SELECTstatement must appear in a single set of columns, the types of the results of eachSELECTclause must be matched up and converted to a uniform set. Similarly, the result expressions of aCASEconstruct must be converted to a common type so that theCASEexpression as a whole has a known output type. The same holds forARRAYconstructs, and for theGREATESTandLEASTfunctions.
The system catalogs store information about which conversions, called
casts, between data types are valid, and how to
perform those conversions. Additional casts can be added by the user
with the CREATE CAST command. (This is usually
done in conjunction with defining new data types. The set of casts
between the built-in types has been carefully crafted and is best not
altered.)
An additional heuristic is provided in the parser to allow better guesses
at proper behavior for SQL standard types. There are
several basic type categories defined: boolean,
numeric, string, bitstring, datetime, timespan, geometric, network,
and user-defined. Each category, with the exception of user-defined, has
one or more preferred types which are preferentially
selected when there is ambiguity.
In the user-defined category, each type is its own preferred type.
Ambiguous expressions (those with multiple candidate parsing solutions)
can therefore often be resolved when there are multiple possible built-in types, but
they will raise an error when there are multiple choices for user-defined
types.
All type conversion rules are designed with several principles in mind:
- Implicit conversions should never have surprising or unpredictable outcomes.
- User-defined types, of which the parser has no a priori knowledge, should be “higher” in the type hierarchy. In mixed-type expressions, native types shall always be converted to a user-defined type (of course, only if conversion is necessary).
- User-defined types are not related. Currently, PostgreSQL does not have information available to it on relationships between types, other than hardcoded heuristics for built-in types and implicit relationships based on available functions and casts.
- There should be no extra overhead from the parser or executor if a query does not need implicit type conversion. That is, if a query is well formulated and the types already match up, then the query should proceed without spending extra time in the parser and without introducing unnecessary implicit conversion calls into the query. Additionally, if a query usually requires an implicit conversion for a function, and if then the user defines a new function with the correct argument types, the parser should use this new function and will no longer do the implicit conversion using the old function.
| ISBN 0954612027 | PostgreSQL Reference Manual - Volume 1 - SQL Language Reference | See the print edition |