| PostgreSQL Reference Manual - Volume 1 - SQL Language Reference by The PostgreSQL Global Development Group Paperback (6"x9"), 716 pages ISBN 0954612027 RRP £32.00 ($49.95) Sales of this book support the PostgreSQL project! Get a printed copy>>> |
5.2.3 The GROUP BY and HAVING Clauses
After passing the WHERE filter, the derived input
table may be subject to grouping, using the GROUP BY
clause, and elimination of group rows using the HAVING
clause.
SELECT select_list
FROM ...
[WHERE ...]
GROUP BY grouping_column_reference [,
grouping_column_reference]...
The GROUP BY Clause is used to group together those rows in a table that share the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows sharing common values into one group row that is representative of all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups. For instance:
=> SELECT * FROM test1; x | y ---+--- a | 3 c | 2 b | 5 a | 1 (4 rows) => SELECT x FROM test1 GROUP BY x; x --- a b c (3 rows)
In the second query, we could not have written SELECT *
FROM test1 GROUP BY x, because there is no single value
for the column y that could be associated with each
group. The grouped-by columns can be referenced in the select list since
they have a single value in each group.
In general, if a table is grouped, columns that are not used in the grouping cannot be referenced except in aggregate expressions. An example with aggregate expressions is:
=> SELECT x, sum(y) FROM test1 GROUP BY x; x | sum ---+----- a | 4 b | 5 c | 2 (3 rows)
Here sum is an aggregate function that
computes a single value over the entire group. More information
about the available aggregate functions can be found in section 7.15 Aggregate Functions.
Tip: Grouping without aggregate expressions effectively calculates the set of distinct values in a column. This can also be achieved using the
DISTINCTclause (seeDISTINCT).
Here is another example: it calculates the total sales for each product (rather than the total sales on all products).
SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
FROM products p LEFT JOIN sales s USING (product_id)
GROUP BY product_id, p.name, p.price;
In this example, the columns product_id,
p.name, and p.price must be
in the GROUP BY clause since they are referenced in
the query select list. (Depending on how exactly the products
table is set up, name and price may be fully dependent on the
product ID, so the additional groupings could theoretically be
unnecessary, but this is not implemented yet.) The column
s.units does not have to be in the GROUP
BY list since it is only used in an aggregate expression
(sum(...)), which represents the sales
of a product. For each product, the query returns a summary row about
all sales of the product.
In strict SQL, GROUP BY can only group by columns of
the source table but PostgreSQL extends
this to also allow GROUP BY to group by columns in the
select list. Grouping by value expressions instead of simple
column names is also allowed.
If a table has been grouped using a GROUP BY
clause, but then only certain groups are of interest, the
HAVING clause can be used, much like a
WHERE clause, to eliminate groups from a grouped
table. The syntax is:
SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING boolean_expression
Expressions in the HAVING clause can refer both to
grouped expressions and to ungrouped expressions (which necessarily
involve an aggregate function).
Example:
=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3; x | sum ---+----- a | 4 b | 5 (2 rows) => SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c'; x | sum ---+----- a | 4 b | 5 (2 rows)
Again, a more realistic example:
SELECT product_id, p.name, (sum(s.units) * (p.price -
p.cost)) AS profit
FROM products p LEFT JOIN sales s USING (product_id)
WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
GROUP BY product_id, p.name, p.price, p.cost
HAVING sum(p.price * s.units) > 5000;
In the example above, the WHERE clause is selecting
rows by a column that is not grouped (the expression is only true for
sales during the last four weeks), while the HAVING
clause restricts the output to groups with total gross sales over
5000. Note that the aggregate expressions do not necessarily need
to be the same in all parts of the query.
| ISBN 0954612027 | PostgreSQL Reference Manual - Volume 1 - SQL Language Reference | See the print edition |