| The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference
by The PostgreSQL Global Development Group Paperback (6"x9"), 454 pages ISBN 9781906966041 RRP £14.95 ($19.95) Sales of this book support the PostgreSQL project! Get a printed copy>>> |
5.2.3 The GROUP BY and HAVING Clauses
After passing the WHERE filter, the derived input
table might be subject to grouping, using the GROUP BY
clause, and elimination of group rows using the HAVING
clause.
SELECT select_list
FROM ...
[WHERE ...]
GROUP BY grouping_column_reference [,
grouping_column_reference]...
The Volume 1B: GROUP BY Clause is used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows having common values into one group row that represents all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups. For instance:
=> SELECT * FROM test1; x | y ---+--- a | 3 c | 2 b | 5 a | 1 (4 rows) => SELECT x FROM test1 GROUP BY x; x --- a b c (3 rows)
In the second query, we could not have written SELECT *
FROM test1 GROUP BY x, because there is no single value
for the column y that could be associated with each
group. The grouped-by columns can be referenced in the select list since
they have a single value in each group.
In general, if a table is grouped, columns that are not
listed in GROUP BY cannot be referenced except in aggregate
expressions. An example with aggregate expressions is:
=> SELECT x, sum(y) FROM test1 GROUP BY x; x | sum ---+----- a | 4 b | 5 c | 2 (3 rows)
Here sum is an aggregate function that
computes a single value over the entire group. More information
about the available aggregate functions can be found in section 7.18 Aggregate Functions.
Tip: Grouping without aggregate expressions effectively calculates the set of distinct values in a column. This can also be achieved using the
DISTINCTclause (seeDISTINCT).
Here is another example: it calculates the total sales for each product (rather than the total sales of all products):
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
FROM products p LEFT JOIN sales s USING (product_id)
GROUP BY product_id, p.name, p.price;
In this example, the columns product_id,
p.name, and p.price must be
in the GROUP BY clause since they are referenced in
the query select list. (Depending on how the products
table is set up, name and price might be fully dependent on the
product ID, so the additional groupings could theoretically be
unnecessary, though this is not implemented.) The column
s.units does not have to be in the GROUP
BY list since it is only used in an aggregate expression
(sum(...)), which represents the sales
of a product. For each product, the query returns a summary row about
all sales of the product.
In strict SQL, GROUP BY can only group by columns of
the source table but PostgreSQL extends
this to also allow GROUP BY to group by columns in the
select list. Grouping by value expressions instead of simple
column names is also allowed.
If a table has been grouped using GROUP BY,
but only certain groups are of interest, the
HAVING clause can be used, much like a
WHERE clause, to eliminate groups from the result.
The syntax is:
SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING boolean_expression
Expressions in the HAVING clause can refer both to
grouped expressions and to ungrouped expressions (which necessarily
involve an aggregate function).
Example:
=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3; x | sum ---+----- a | 4 b | 5 (2 rows) => SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c'; x | sum ---+----- a | 4 b | 5 (2 rows)
Again, a more realistic example:
SELECT product_id, p.name, (sum(s.units) * (p.price -
p.cost)) AS profit
FROM products p LEFT JOIN sales s USING (product_id)
WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
GROUP BY product_id, p.name, p.price, p.cost
HAVING sum(p.price * s.units) > 5000;
In the example above, the WHERE clause is selecting
rows by a column that is not grouped (the expression is only true for
sales during the last four weeks), while the HAVING
clause restricts the output to groups with total gross sales over
5000. Note that the aggregate expressions do not necessarily need
to be the same in all parts of the query.
If a query contains aggregate function calls, but no GROUP BY
clause, grouping still occurs: the result is a single group row (or
perhaps no rows at all, if the single row is then eliminated by
HAVING).
The same is true if it contains a HAVING clause, even
without any aggregate function calls or GROUP BY clause.
| ISBN 9781906966041 | The PostgreSQL 9.0 Reference Manual - Volume 1A - SQL Language Reference | See the print edition |