- publishing free software manuals
PostgreSQL Reference Manual - Volume 1 - SQL Language Reference
by The PostgreSQL Global Development Group
Paperback (6"x9"), 716 pages
ISBN 0954612027
RRP £32.00 ($49.95)

Sales of this book support the PostgreSQL project! Get a printed copy>>>

5.2.3 The GROUP BY and HAVING Clauses

After passing the WHERE filter, the derived input table may be subject to grouping, using the GROUP BY clause, and elimination of group rows using the HAVING clause.

SELECT select_list
    FROM ...
    [WHERE ...]
    GROUP BY grouping_column_reference [,

The GROUP BY Clause is used to group together those rows in a table that share the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows sharing common values into one group row that is representative of all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups. For instance:

=> SELECT * FROM test1;
 x | y
 a | 3
 c | 2
 b | 5
 a | 1
(4 rows)

=> SELECT x FROM test1 GROUP BY x;
(3 rows)

In the second query, we could not have written SELECT * FROM test1 GROUP BY x, because there is no single value for the column y that could be associated with each group. The grouped-by columns can be referenced in the select list since they have a single value in each group.

In general, if a table is grouped, columns that are not used in the grouping cannot be referenced except in aggregate expressions. An example with aggregate expressions is:

=> SELECT x, sum(y) FROM test1 GROUP BY x;
 x | sum
 a |   4
 b |   5
 c |   2
(3 rows)

Here sum is an aggregate function that computes a single value over the entire group. More information about the available aggregate functions can be found in section 7.15 Aggregate Functions.

Tip: Grouping without aggregate expressions effectively calculates the set of distinct values in a column. This can also be achieved using the DISTINCT clause (see DISTINCT).

Here is another example: it calculates the total sales for each product (rather than the total sales on all products).

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY product_id, p.name, p.price;

In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in the query select list. (Depending on how exactly the products table is set up, name and price may be fully dependent on the product ID, so the additional groupings could theoretically be unnecessary, but this is not implemented yet.) The column s.units does not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(...)), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product.

In strict SQL, GROUP BY can only group by columns of the source table but PostgreSQL extends this to also allow GROUP BY to group by columns in the select list. Grouping by value expressions instead of simple column names is also allowed.

If a table has been grouped using a GROUP BY clause, but then only certain groups are of interest, the HAVING clause can be used, much like a WHERE clause, to eliminate groups from a grouped table. The syntax is:

SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING

Expressions in the HAVING clause can refer both to grouped expressions and to ungrouped expressions (which necessarily involve an aggregate function).


=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;
 x | sum
 a |   4
 b |   5
(2 rows)

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';
 x | sum
 a |   4
 b |   5
(2 rows)

Again, a more realistic example:

SELECT product_id, p.name, (sum(s.units) * (p.price -
 p.cost)) AS profit
    FROM products p LEFT JOIN sales s USING (product_id)
    WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
    GROUP BY product_id, p.name, p.price, p.cost
    HAVING sum(p.price * s.units) > 5000;

In the example above, the WHERE clause is selecting rows by a column that is not grouped (the expression is only true for sales during the last four weeks), while the HAVING clause restricts the output to groups with total gross sales over 5000. Note that the aggregate expressions do not necessarily need to be the same in all parts of the query.

ISBN 0954612027PostgreSQL Reference Manual - Volume 1 - SQL Language ReferenceSee the print edition