Databases - Practical PostgreSQL - Grouping Rows with GROUP BY

Databases - Practical PostgreSQL
Previous Page	Home	Next Page

Grouping Rows with GROUP BY

The GROUP BY clause introduces a powerful SQL concept: aggregation . To aggregate means to gather into a sum, or whole. The practical effect of aggregating in a SQL query is that any rows whose results from the GROUP BY expression match identically are grouped together into a single aggregate row . The GROUP BY expression may define a column, but it may also be any operation upon a column as well. If several columns or expressions are specified (delimited by commas), the entire set of specified criteria must be identical for rows to be grouped together.

To effectively use aggregation you must understand that any target columns requested by an aggregating query which are not specified in the GROUP BY clause will be inaccessible, unless selected through an aggregate function . An aggregate function accepts a column name (or expression involving at least one column name) which can represent several values (i.e., from several grouped rows), performs an operation on those values, and returns a single value.

Common aggregate functions include count(), which returns the number of rows in the set, max(), which returns the maximum value in the column, and min(), which returns the minimum value in the column. An aggregate function operates only on rows in the query's result set, and is therefore executed after conditional joins and WHERE conditions have been processed.

Imagine that you wanted to know how many books Book Town stores in its database for each known publisher. You could perform a simple join between the editions and publishers tables in order to associate each publisher name with a title that they publish. It would be tedious to manually count how many titles each publisher maintained, and in cases of larger data sets, it can become difficult to manage larger result sets.

Example 4-40 demonstrates a join between these two Book Town tables, but also introduces two new elements: the count() function, and the GROUP BY clause.

Example 4-40. Using GROUP BY

booktown=# 
SELECT count(e.isbn) AS "number of books",

booktown-# 
       p.name AS publisher

booktown-# 
       FROM editions AS e INNER JOIN publishers AS p

booktown-# 
            ON (e.publisher_id = p.id)

booktown-# 
       GROUP BY p.name;

 number of books |          publisher
-----------------+-----------------------------
               2 | Ace Books
               1 | Books of Wonder
               2 | Doubleday
               1 | HarperCollins
               1 | Henry Holt & Company, Inc.
               1 | Kids Can Press
               1 | Mojo Press
               1 | O'Reilly & Associates
               1 | Penguin
               3 | Random House
               2 | Roc
               1 | Watson-Guptill Publications
(12 rows)

The GROUP BY clause in Example 4-40 instructs PostgreSQL to group the rows in the joined data set by p.name, which in this query is a reference to the name column in the publishers table. Therefore, any rows that have the same publisher name will be grouped together, or aggregated. The count() function then counts the number of isbn values from the editions table that are in each aggregated row, and returns a single numeric value representing the number of rows that were aggregated for each unique publisher.

Note that in Example 4-40 the argument of the editions table's isbn column is chosen simply to indicate the objective of the example (to count how many books there are per publisher). Any column name will return the same number, as count() will always return the number of rows grouped in the current aggregate row.

Something to watch out for when designing aggregate queries is that the WHERE clause cannot accept criteria involving aggregate functions. Instead, use the HAVING clause. It functions identically to the WHERE clause, but its conditions must be on aggregate functions rather than single-row conditions. Syntactically, the HAVING clause follows the GROUP BY clause, as demonstrated in Example 4-41.

Example 4-41. Using the HAVING clause

booktown=# 
SELECT count(e.isbn) AS "number of books",

booktown-# 
       p.name AS publisher

booktown-# 
       FROM editions AS e INNER JOIN publishers AS p

booktown-# 
            ON (e.publisher_id = p.id)

booktown-# 
       GROUP BY publisher

booktown-# 
       HAVING count(e.isbn) > 1;

 number of books |  publisher
-----------------+--------------
               2 | Ace Books
               2 | Doubleday
               3 | Random House
               2 | Roc
(4 rows)

Both Example 4-40 and Example 4-41 create a data set through an inner join between the editions and publishers table. However, Example 4-41 constrains the final result to publishers having more than a single book in the Book Town database, as set by the HAVING clause.

Note: If a result set's column is aliased via an AS clause to a name that overlaps with a real column in one of the source data sets, and used in the GROUP BY clause, PostgreSQL will assume that you are referring to the input column, not the output alias.

Databases - Practical PostgreSQL
Previous Page	Home	Next Page