The optional DISTINCT keyword excludes duplicate rows
from the result set. If
supplied without the ON clause, a query that specifies
DISTINCT will exclude any row whose target columns have already been retrieved
identically. Only columns in the SELECT's target list will be evaluated.
For example, the books table has 15 rows, each with an
author_id. Some authors may have several entries in the
books table, causing there to be several rows with the same
author_id. Supplying the DISTINCT clause,
as shown in the first query in Example 4-31, ensures that the result set will not have two identical rows.
Example 4-31. Using DISTINCT
booktown=#
SELECT DISTINCT author_id
booktown-#
FROM books;
author_id
-----------
16
115
1212
1644
1809
1866
2001
2031
4156
7805
7806
15990
25041
(13 rows)
booktown=#
SELECT DISTINCT ON (author_id)
booktown-#
author_id, title
booktown-#
FROM books;
author_id | title
-----------+-----------------------
16 | Little Women
115 | The Tell-Tale Heart
1212 | Practical PostgreSQL
1644 | Dynamic Anatomy
1809 | The Cat in the Hat
1866 | Dune
2001 | 2001: A Space Odyssey
2031 | Goodnight Moon
4156 | The Shining
7805 | Programming Python
7806 | Perl Cookbook
15990 | Franklin in the Dark
25041 | The Velveteen Rabbit
(13 rows)
As you can see, the first query in Example 4-31 returns only 13 rows from the
books table, even though there are 15 total rows within it. Two authors
with two books each end up being displayed only once.
The second query in Example 4-31 uses a different form of
DISTINCT, which specifies the columns (or expressions) to be checked for redundancies.
In this case, 13 rows are still returned, as the ON clause specifies to
use the author_id column as the basis for determining if a row is redundant or not.
Without the ON clause, the second query would return all 15 rows, because the
DISTINCT clause would cause PostgreSQL to look for rows that are
completely unique.
The titles that are omitted from the resultant data set by ON are arbitrarily determined by PostgreSQL, unless an
ORDER BY clause is specified. If the ORDER BY clause is used with DISTINCT, you can specify the order in which columns are selected; hence, you can select which rows will be considered
distinct first. See the Section called Sorting Rows with ORDER BY
" for information about sorting rows.
If you are interested in grouping rows which have non-unique criteria, rather than omitting all rows but one, see
the description of the GROUP BY clause in the Section called Grouping Rows with GROUP BY
."