The heart of all SQL queries is the SELECT command.
SELECT is used to build queries (also known as SELECT
statements). Queries are the only SQL instructions by which your data can be retrieved from tables and views. The data
returned via a query is called a result set and consists of rows, with columns, similar to a
table.
The columns of a result set are not stored on the disk in any fixed form. They are purely a temporary result of the
query's requested data. A query on a table may return a result set with the same column structure as the table, or it may
differ drastically. Result sets may even have columns which are drawn from several other tables by a single query.
Since it is central to PostgreSQL, SELECT is easily the most complicated single
command, having the most available clauses and parameters. The following is the syntax for
SELECT. The terms used are summarized and described in greater detail within the
following sections. The term expression is used to refer to either a column name, or a general
expression (such as a column being operated upon by a constant, or another column).
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
target [ AS name ] [, ...]
[ FROM source [, ...] ]
[ [ NATURAL ] join_typesource
[ ON condition | USING ( column_list ) ] ]
[, ...]
[ WHERE condition ]
[ GROUP BY expression [, ...] ]
[ HAVING condition [, ...] ]
[ { UNION | INTERSECT | EXCEPT } [ ALL ] sub-query ]
[ ORDER BY expression
[ ASC | DESC | USING operator ]
[, ...] ]
[ FOR UPDATE [ OF table [, ...] ] ]
[ LIMIT { count | ALL } [ { OFFSET | , } start ] ]
In this syntax diagram, source may be either a table name or a subselect.
The syntax for these general forms is as follows:
FROM { [ ONLY ] table [ [ AS ] alias [ ( column_alias [, ...] ) ] ] |
( query ) [ AS ] alias [ ( column_alias [, ...] ) ] }
ALL
The ALL keyword may be specified as a noise term to make it
clear that all rows should be returned.
DISTINCT [ ON ( expression [, ...] ) ]
The DISTINCT clause specifies a
column (or expression) for which to retrieve only one row per unique value of expression.
target [ AS name ] [, ...]
The SELECT targets are usually column names, though they can be constants, identifier,
function or general expression. Each target requested must be separated by commas, and may be named
dynamically to name via the AS clause. Supplying the asterisk symbol
(*) as a target is shorthand for requesting all non-system columns, and may be
listed along with other targets.
FROM source [, ...]
The FROM clause dictates the source that PostgreSQL will look in for the specified targets.
The source, in this case, may be a table name or a sub-query.
You can specify numerous sources, separated by commas. (This is roughly equivalent to a cross join).
The syntax for the FROM clause is described in more detail later in this section.
[ NATURAL ] join_typesource [ ON condition | USING ( column_list ) ]
The FROM sources may be joined together via the
JOIN clause, which requires a join_type (e.g.,
INNER, FULL OUTER,
CROSS) and may require a condition or
column_list to further define the nature of the join, depending on the
join_type.
WHERE condition
The WHERE clause constrains the result set from the SELECT statement to specified
criteria, which are defined by condition. Conditions must return a single Boolean value (true
or false), but may consist of several checks combined with logical operators (e.g., with
AND, and OR) to indicate that available
rows must meet all supplied conditions to be included in the statement's results.
GROUP BY expression [, ...]
The GROUP BY clause aggregates (groups) rows together by the criteria described in expression. This can be as
simple as a column name (and often is) or an arbitrary expression applied to values of the result set.
HAVING condition [, ...]
The HAVING clause is similar to the WHERE clause, but checks its conditions on aggregated
(grouped) sets instead of atomic rows.
{ UNION | INTERSECT | EXCEPT } [ ALL ] sub-query
Performs one of three set operations between the
SELECT statement and a second query, returning their result sets in uniform column
structure (which must be compatible).
UNION
Returns the set of collected rows.
INTERSECT
Returns the set of rows where the values of the two sets overlap.
EXCEPT
Returns the set of rows which are found in the SELECT
statement, but not found in the secondary query.
ORDER BY expression
Sorts the results of the SELECT statement by expression.
[ ASC | DESC | USING operator ]
Determines whether or not the ORDER BYexpression
proceeds in ascending order (ASC), or descending order (DESC).
An operator may alternatively be specified with the USING keyword
(e.g., < or >).
FOR UPDATE [ OF table [, ...] ]
Allows for exclusive locking of the returned rows. When used within a transaction block, FOR UPDATE
locks the rows of the specified table until the transaction is committed. While locked, the rows cannot be updated by other transactions.
LIMIT { count | ALL }
Limits the number of rows returned to a maximum of count, or explicitly allows ALL rows.
{ OFFSET | , } start
Instructs the LIMIT clause at what point to begin limiting the
results. For example, a LIMIT with a count set to 100, and
an OFFSET clause with a start value of 50
would return the rows from 50 to 150 (if there are that many results to return).
Terms used in the FROM clause's syntax description are as follows:
[ ONLY ] table
The table name specifies what table to use as a source for the
SELECT statement. Specifying the ONLY
clause causes the rows of any child's table to be omitted from the query.
[ AS ] alias
An alias may optionally be assigned to a FROM
source, in order to simplify a query (e.g., books might be temporarily referenced
with an alias of b). The AS term is considered
noise, and is optional.
( query ) [ AS ] alias
Any valid SELECT statement may be placed in parentheses as the
query. This causes the result set created by the query to be used as a
FROM source, as if it had been a static table. This use of a sub-query requires a specified
alias.
( column_alias [, ...] )
The FROM sources which have assigned aliases may also alias columns by
specifying arbitrary column aliases. Each column_alias must be separated by commas, and grouped
within parentheses following the FROM source's alias. These aliases must match the
order of the defined columns in the table to which it is applied.
A SELECT statement may be as simple as a request for all rows and all columns
from a specified table. Use the following syntax to retrieve all rows and columns from a table:
SELECT * FROM table_name;
The asterisk (*) character, as mentioned in the explanation of SELECT's syntax,
is short-hand for all non-system columns. In essence, the SELECT * requests all
non-system data in the table named table_name ; this retrieves all columns and all rows, because no row limit is specified. To demonstrate,
Example 4-23 requests all columns (*) from
Book Town's books table.
Example 4-23. Selecting all from the books table
booktown=# SELECT * FROM books;
id | title | author_id | subject_id
-------+-----------------------------+-----------+------------
7808 | The Shining | 4156 | 9
4513 | Dune | 1866 | 15
4267 | 2001: A Space Odyssey | 2001 | 15
1608 | The Cat in the Hat | 1809 | 2
1590 | Bartholomew and the Oobleck | 1809 | 2
25908 | Franklin in the Dark | 15990 | 2
1501 | Goodnight Moon | 2031 | 2
190 | Little Women | 16 | 6
1234 | The Velveteen Rabbit | 25041 | 3
2038 | Dynamic Anatomy | 1644 | 0
156 | The Tell-Tale Heart | 115 | 9
41472 | Practical PostgreSQL | 1212 | 4
41473 | Programming Python | 7805 | 4
41477 | Learning Python | 7805 | 4
41478 | Perl Cookbook | 7806 | 4
(15 rows)
While SELECT * is a good example of a basic query, and is sometimes very useful,
you will probably be interested in retrieving only a few columns worth of information at a time. To stay
efficient, and to keep your queries clear, it is a good idea to explicitly specify the intended target columns rather than to
use the asterisk. This is especially true when using the JOIN clause, as will be
discussed in the Section called Joining Data Sets with JOIN."
To specify the target columns for a query, list the names of the columns following the SELECT keyword. The query
will return data for only those columns that you list. The order of these columns need not match their literal order in the table,
and columns may be listed more than once, or not at all, as shown in Example 4-24.
Example 4-24. Re-Ordering columns
booktown=# SELECT id, author_id, title, id
booktown-# FROM books;
id | author_id | title | id
-------+-----------+-----------------------------+-------
7808 | 4156 | The Shining | 7808
4513 | 1866 | Dune | 4513
4267 | 2001 | 2001: A Space Odyssey | 4267
1608 | 1809 | The Cat in the Hat | 1608
1590 | 1809 | Bartholomew and the Oobleck | 1590
25908 | 15990 | Franklin in the Dark | 25908
1501 | 2031 | Goodnight Moon | 1501
190 | 16 | Little Women | 190
1234 | 25041 | The Velveteen Rabbit | 1234
2038 | 1644 | Dynamic Anatomy | 2038
156 | 115 | The Tell-Tale Heart | 156
41472 | 1212 | Practical PostgreSQL | 41472
41473 | 7805 | Programming Python | 41473
41477 | 7805 | Learning Python | 41477
41478 | 7806 | Perl Cookbook | 41478
(15 rows)
As you can see, the data sets returned in both Example 4-24 and Example 4-23 are nearly identical. The second set is returned in a different column arrangement,
(omitting the subject_id column, and repeating the
id column twice) as a result of the target list.
In addition to plain column names, targets in the SELECT statement may be
arbitrary expressions (e.g., involving functions, or operators acting upon identifiers), or constants. The syntax is
simple, and only requires that each identifier, expression, or constant be separated by commas. Conveniently, different types
of targets may be arbitrarily mixed in the target list.
In fact, the SELECT command may be used to retrieve expressions and constants
without the use of a FROM clause or specified columns, as in Example 4-25.
Example 4-25. Using expressions and constants
testdb=# SELECT 2 + 2,
testdb-# pi(),
testdb-# 'PostgreSQL is more than a calculator!';
?column? | pi | ?column?
----------+------------------+---------------------------------------
4 | 3.14159265358979 | PostgreSQL is more than a calculator!
(1 row)
The target list allows the use of an optional AS clause for each
specified target, which re-names a column in the returned result set to an arbitrary name specified in the clause. The rules
and limitations for the specified name are the same as for normal identifiers (e.g., they may be quoted to contain spaces,
may not be keywords unless quoted, and so on).
Using AS has no lasting effect on the column itself, but only on the result set
which is returned by the query. AS can be particularly useful when selecting expressions or constants, rather than
plain columns. Naming result set columns with AS can clarify the meaning of an otherwise
ambiguous expression or constant. This technique is demonstrated in Example 4-26, which shows the same
results as Example 4-25, but with different column headings.
Example 4-26. Using the AS clause with expressions and constants
booktown=# SELECT 2 + 2 AS "2 plus 2",
booktown-# pi() AS "the pi function",
booktown-# 'PostgreSQL is more than a calculator!' AS comments;
2 plus 2 | the pi function | comments
----------+------------------+---------------------------------------
4 | 3.14159265358979 | PostgreSQL is more than a calculator!
(1 row)
The FROM clause allows you to choose either a table or a result set as a source
for your specified target list. Multiple sources may be entered following the FROM
clause, separated by commas. Specifying multiple sources in this fashion is functionally similar to a
CROSS JOIN, discussed in the Section called Joining Data Sets with JOIN."
Take care when specifying multiple FROM sources to PostgreSQL. The result
of performing a SELECT on several comma-delimited sources without a
WHERE or JOIN clause to qualify the relationship
between the sources is that the complete Cartesian product of the sources will be returned. This is a
result set where each column from each source is combined in every possible combination of rows between each other
source.
You must be careful when identifying column names and using multiple sources in the FROM
clause, as it can introduce ambiguity between identifiers. Consider a SELECT that draws from
both the books table and the authors table. Each of these
tables has a column called id. If specified, PostgreSQL will be unable to determine if
the id column refers to the book, or the author:
booktown=# SELECT id FROM books, authors;
ERROR: Column reference "id" is ambiguous
As a result of the potential for ambiguity, "complete" column names can be referenced through a special syntax
called dot-notation. Dot-notation refers to the placement of a dot, or
period, between the table name and a column name, in order to explicitly reference a particular column. For example,
books.id refers to the id column within the
books table.
Dot-notation is only required in instances of ambiguity between data sets. As shown in Example 4-27, you can use the column name as an identifier source, as long as
it is unique among the available sets defined by the FROM clause. (In this case, the
title column, which is unique to the books table, and
the last_name column, which is unique to the authors
tables).
Example 4-27. Selecting from multiple table sources
booktown=# SELECT books.id, title, authors.id, last_name
booktown-# FROM books, authors
booktown-# WHERE books.author_id = authors.id;
id | title | id | last_name
-------+-----------------------------+-------+--------------
190 | Little Women | 16 | Alcott
156 | The Tell-Tale Heart | 115 | Poe
41472 | Practical PostgreSQL | 1212 | Worsley
2038 | Dynamic Anatomy | 1644 | Hogarth
1608 | The Cat in the Hat | 1809 | Geisel
1590 | Bartholomew and the Oobleck | 1809 | Geisel
4513 | Dune | 1866 | Herbert
4267 | 2001: A Space Odyssey | 2001 | Clarke
1501 | Goodnight Moon | 2031 | Brown
7808 | The Shining | 4156 | King
41473 | Programming Python | 7805 | Lutz
41477 | Learning Python | 7805 | Lutz
41478 | Perl Cookbook | 7806 | Christiansen
25908 | Franklin in the Dark | 15990 | Bourgeois
1234 | The Velveteen Rabbit | 25041 | Bianco
(15 rows)
If you wish to use a sub-query to generate a result set as a source for your FROM
clause, the entire query must be surrounded by parentheses. This instructs PostgreSQL to correctly interpret the query as a
sub-SELECT statement and to execute it before the SELECT
statement within which it resides.
Example 4-28 demonstrates a peculiar query which retrieves all column values
(*) from the books table via a
sub-query. The query then retrieves a string constant of test and the id
values from that result set (derived from the sub-query).
Example 4-28. Selecting from a sub-query
booktown=# SELECT 'test' AS test, id
booktown-# FROM (SELECT * FROM books)
booktown-# AS example_sub_query;
test | id
------+-------
test | 7808
test | 4513
test | 4267
test | 1608
test | 1590
test | 25908
test | 1501
test | 190
test | 1234
test | 2038
test | 156
test | 41472
test | 41473
test | 41477
test | 41478
(15 rows)
The query in Example 4-28 is rather peculiar because the net effect is no different than if you
had selected from the books table. This occurs because the result set from the sub-query is identical
to the set of values in the books table. The use of this query demonstrates
the combination of a string constant from one SELECT statement with a value drawn from the result
set of a second SELECT statement. See the Section called Using Sub-Queries" for more
realistic examples of sub-queries once you have a better understanding of the SELECT
statement itself.
Note: When specifying a table that is inherited by other tables, you may provide the optional
ONLY keyword before the table name to indicate that you do not want to draw from any
sub-tables. (See Chapter 7 for more information on inheritance.)
Like columns, FROM sources (e.g., tables, or sub-queries) may be aliased with the
AS clause. This is usually applied as a convenient shorthand for the dot-notation
described in the preceding section. Aliasing a data set allows you to refer to it via dot-notation, which provides a more succinct
and readable SQL statement. Example 4-29 demonstrates the same query used in Example 4-27, however you can see that it simplifies the dot-notation with the
AS clause.
Example 4-29. Aliasing FROM sources
booktown=# SELECT b.id, title, a.id, last_name
booktown-# FROM books AS b, authors AS a
booktown-# WHERE b.author_id = a.id;
id | title | id | last_name
-------+-----------------------------+-------+--------------
190 | Little Women | 16 | Alcott
156 | The Tell-Tale Heart | 115 | Poe
41472 | Practical PostgreSQL | 1212 | Worsley
2038 | Dynamic Anatomy | 1644 | Hogarth
1608 | The Cat in the Hat | 1809 | Geisel
1590 | Bartholomew and the Oobleck | 1809 | Geisel
4513 | Dune | 1866 | Herbert
4267 | 2001: A Space Odyssey | 2001 | Clarke
1501 | Goodnight Moon | 2031 | Brown
7808 | The Shining | 4156 | King
41473 | Programming Python | 7805 | Lutz
41477 | Learning Python | 7805 | Lutz
41478 | Perl Cookbook | 7806 | Christiansen
25908 | Franklin in the Dark | 15990 | Bourgeois
1234 | The Velveteen Rabbit | 25041 | Bianco
(15 rows)
In addition to placing aliases on the FROM clause's data sources, you can place
aliases on the columns within that source. This is done by following a valid data source's alias
with a list of column aliases, grouped in parentheses and separated by commas. A column alias list
therefore consists of a sequence of identifier aliases for each column, which correspond to the literal columns in the
order that the table is defined with (from left to right).
When describing a column alias list, you do not need to specify each column; any column that is left unspecified
is accessible via its normal name within such a query. If the only column you wish to alias is to the right of any
other columns that you do not necessarily wish to alias, you will need to explicitly list the preceding columns (it is
valid to list the same name for an existing column as its "alias"). Otherwise, PostgreSQL will have no way of knowing
which column you were attempting to alias and will assume you were addressing the first column from the left.
Note: The AS keyword is technically considered noise, and may be omitted in practice;
PostgreSQL determines that any stray identifiers following a FROM source may be used as aliases.
Example 4-30 illustrates the same query that is used in Example 4-29 but aliases the id columns in each table to
unique identifiers in order to reference them directly (i.e., without dot-notation). The syntax is functionally
identical, aliasing only the books table's id column,
thus making the authors table's id column
non-ambiguous:
Example 4-30. Aliasing columns
booktown=# SELECT the_book_id, title, id, last_name
booktown-# FROM books AS b (the_book_id), authors
booktown-# WHERE author_id = id;
the_book_id | title | id | last_name
-------------+-----------------------------+-------+--------------
190 | Little Women | 16 | Alcott
156 | The Tell-Tale Heart | 115 | Poe
41472 | Practical PostgreSQL | 1212 | Worsley
2038 | Dynamic Anatomy | 1644 | Hogarth
1608 | The Cat in the Hat | 1809 | Geisel
1590 | Bartholomew and the Oobleck | 1809 | Geisel
4513 | Dune | 1866 | Herbert
4267 | 2001: A Space Odyssey | 2001 | Clarke
1501 | Goodnight Moon | 2031 | Brown
7808 | The Shining | 4156 | King
41473 | Programming Python | 7805 | Lutz
41477 | Learning Python | 7805 | Lutz
41478 | Perl Cookbook | 7806 | Christiansen
25908 | Franklin in the Dark | 15990 | Bourgeois
1234 | The Velveteen Rabbit | 25041 | Bianco
(15 rows)
The optional DISTINCT keyword excludes duplicate rows
from the result set. If
supplied without the ON clause, a query that specifies
DISTINCT will exclude any row whose target columns have already been retrieved
identically. Only columns in the SELECT's target list will be evaluated.
For example, the books table has 15 rows, each with an
author_id. Some authors may have several entries in the
books table, causing there to be several rows with the same
author_id. Supplying the DISTINCT clause,
as shown in the first query in Example 4-31, ensures that the result set will not have two identical rows.
Example 4-31. Using DISTINCT
booktown=# SELECT DISTINCT author_id
booktown-# FROM books;
author_id
-----------
16
115
1212
1644
1809
1866
2001
2031
4156
7805
7806
15990
25041
(13 rows)
booktown=# SELECT DISTINCT ON (author_id)
booktown-# author_id, title
booktown-# FROM books;
author_id | title
-----------+-----------------------
16 | Little Women
115 | The Tell-Tale Heart
1212 | Practical PostgreSQL
1644 | Dynamic Anatomy
1809 | The Cat in the Hat
1866 | Dune
2001 | 2001: A Space Odyssey
2031 | Goodnight Moon
4156 | The Shining
7805 | Programming Python
7806 | Perl Cookbook
15990 | Franklin in the Dark
25041 | The Velveteen Rabbit
(13 rows)
As you can see, the first query in Example 4-31 returns only 13 rows from the
books table, even though there are 15 total rows within it. Two authors
with two books each end up being displayed only once.
The second query in Example 4-31 uses a different form of
DISTINCT, which specifies the columns (or expressions) to be checked for redundancies.
In this case, 13 rows are still returned, as the ON clause specifies to
use the author_id column as the basis for determining if a row is redundant or not.
Without the ON clause, the second query would return all 15 rows, because the
DISTINCT clause would cause PostgreSQL to look for rows that are
completely unique.
The titles that are omitted from the resultant data set by ON are arbitrarily determined by PostgreSQL, unless an
ORDER BY clause is specified. If the ORDER BY clause is used with DISTINCT, you can specify the order in which columns are selected; hence, you can select which rows will be considered
distinct first. See the Section called Sorting Rows with ORDER BY" for information about sorting rows.
If you are interested in grouping rows which have non-unique criteria, rather than omitting all rows but one, see
the description of the GROUP BY clause in the Section called Grouping Rows with GROUP BY."
The WHERE clause allows you to provide Boolean (true or false) conditions that rows must satisfy
to be included in the resulting row set. In practice, a SELECT statement will almost always contain at least one qualification
via the WHERE clause.
For example, suppose that you want to see all of the books in Book Town's Computers section. The
subject_id for the Computers subject is 4.
Therefore, the WHERE clause can be applied with an equivalence operation
(the = operator) to check for all books in the books
table with a subject_id equal to 4. This is demonstrated in
Example 4-32.
The query in Example 4-32 returns only rows whose subject_id
column matches the integer constant value of 4. Thus, only the four rows for computer books are returned, rather than the 15 rows
shown by the simple query in Example 4-23.
The WHERE clause accepts numerous conditions, provided that they are joined by
valid logical keywords (e.g., the AND, and OR
keywords) and returns a single Boolean condition. For example, you may be interested in seeing all Book Town titles
that fall under the Computers subject which are also by the author Mark Lutz, thus joining two
conditions to narrow the focus of your query. Alternatively, you might be interested in seeing each of Book Town's titles
that fall under either the Computers subject or the Arts subject, thereby joining two conditions to broaden the focus of
your intended result set. Example 4-33 demonstrates each of these scenarios using the
AND keyword and OR keyword, respectively.
Example 4-33. Combining conditions in the WHERE clause
booktown=# SELECT title FROM books
booktown-# WHERE subject_id = 4
booktown-# AND author_id = 7805;
title
--------------------
Programming Python
Learning Python
(2 rows)
booktown=# SELECT title FROM books
booktown-# WHERE subject_id = 4
booktown-# OR subject_id = 0;
title
----------------------
Dynamic Anatomy
Practical PostgreSQL
Programming Python
Learning Python
Perl Cookbook
(5 rows)
The first SELECT statement in Example 4-33
combines one condition, which checks for titles in the Computers subject (with a
subject_id of 4), with another condition, which checks if the author
is Mark Lutz (with an author_id of 7805) via the
AND keyword. The result is a smaller data set, constrained to two rows that fit
both specified conditions.
The second SELECT statement in Example 4-33
combines the same first condition (books in the Computers subject) with a second condition: if the title falls
under the Arts subject (with a subject_id of 0). The result is a
slightly larger data set of five rows that matched at least one of these conditions.
WHERE conditions may be grouped together indefinitely, though after two conditions
you may wish to group the conditions with parentheses. Doing so explicitly indicates how the conditions are interrelated.
As a demonstration, the two statements in Example 4-34 have different effects based merely on the addition of parentheses.
Example 4-34. Grouping WHERE conditions with parentheses
booktown=# SELECT * FROM books
booktown-# WHERE author_id = 1866
booktown-# AND subject_id = 15
booktown-# OR subject_id = 3;
id | title | author_id | subject_id
------+----------------------+-----------+------------
4513 | Dune | 1866 | 15
1234 | The Velveteen Rabbit | 25041 | 3
(2 rows)
booktown=# SELECT * FROM books
booktown-# WHERE author_id = 1866
booktown-# AND (subject_id = 15
booktown(# OR subject_id = 3);
id | title | author_id | subject_id
------+-------+-----------+------------
4513 | Dune | 1866 | 15
(1 row)
The preceding example demonstrates two attempts to look up Book Town titles with an
author_id of 1866. The titles also have a
subject_id of either 15, or 3.
As you can see from the first statement, when the three conditions are used without parentheses, the intent
of the statement is ambiguous, and interpreted incorrectly. The addition of parentheses will cause the
evaluations within parentheses to be considered before any surrounding condition.
As demonstrated by the use of the WHERE clause on two table sources in the Section called Selecting Sources with the FROM Clause," you have the ability to retrieve data from different data sources by
combining their columns into joined rows. In SQL, this process is formally called a join.
The essential concept behind a join is that two or more data sets, when joined, have their columns combined into a
new set of rows containing each of the columns requested from each of the data sets. The foundation
of all joins is the Cartesian product, which is the set of all possible combinations between two
data sets. That product may then be refined into a smaller subset by a set of criteria in the
JOIN syntax. These criteria describe a relationship between data sets, though such a definition
is not required.
There are three general types of joins:
Cross joins
Creates a Cartesian product (or cross product) between two sets of data. It is called a product
because it does not define a relationship between the sets; instead, it returns every possible combination of rows
between the joined sets, essentially multiplying the sources by one another.
Inner joins
Creates a subset of the Cartesian product between two sets of data, requiring a
conditional clause to specify criteria upon which to join records. The condition must
return a Boolean value to determine whether or not a row is included in the joined set.
Outer joins
Similar to an inner join, in that it accepts criteria which will match rows between two sets of
data, but returns at least one instance of each row from a specified set. This is either the left set (the
data source to the left of the JOIN keyword), the right set (the data source to
the right of the JOIN keyword), or both sets, depending on the variety of outer
join employed. The missing column values for the empty half of the row which does not meet the join condition are returned as
NULL values.
A cross join is functionally identical to listing comma-delimited sources. It therefore should almost always be
accompanied by a WHERE clause to qualify the relationship between the joined data sets.
Example 4-35 demonstrates the same functional query used in Example 4-27, substituting the comma for the formal JOIN
syntax.
Example 4-35. A simple CROSS JOIN
booktown=# SELECT b.id, title, a.id, last_name
booktown-# FROM books AS b CROSS JOIN authors AS a
booktown-# WHERE b.author_id = a.id;
id | title | id | last_name
-------+-----------------------------+-------+--------------
190 | Little Women | 16 | Alcott
156 | The Tell-Tale Heart | 115 | Poe
41472 | Practical PostgreSQL | 1212 | Worsley
2038 | Dynamic Anatomy | 1644 | Hogarth
1608 | The Cat in the Hat | 1809 | Geisel
1590 | Bartholomew and the Oobleck | 1809 | Geisel
4513 | Dune | 1866 | Herbert
4267 | 2001: A Space Odyssey | 2001 | Clarke
1501 | Goodnight Moon | 2031 | Brown
7808 | The Shining | 4156 | King
41473 | Programming Python | 7805 | Lutz
41477 | Learning Python | 7805 | Lutz
41478 | Perl Cookbook | 7806 | Christiansen
25908 | Franklin in the Dark | 15990 | Bourgeois
1234 | The Velveteen Rabbit | 25041 | Bianco
(15 rows)
This syntax is merely a more formal way of stating the relationship between the two data sets. There is no functional
difference between the CROSS JOIN syntax and using a simple comma delimited list of columns.
More useful are the inner and outer joins, which require a qualification of the relationship
between joined data sets in the JOIN syntax itself. The following is the syntax for an
inner or outer join:
source1 [ NATURAL ] join_typesource2
[ ON ( condition [, ...] ) | USING ( column [, ...] ) ]
source1
Identifies the first data set that is being joined (i.e., a table name or sub-query).
[ NATURAL ]
Implies that the two data sets should be joined on equivalent values between like-named columns
(e.g., if two tables have a column called id, it will join rows where the
id values are equivalent). The NATURAL clause
will respect column aliases, if applied. The use of the NATURAL clause makes it both
unnecessary and invalid to try to specify either of the ON or
USING clauses.
join_type
Specifies the type of JOIN intended. Valid values in this context
are [ INNER ] JOIN (specifying just JOIN implies
an INNER JOIN), LEFT [ OUTER] JOIN,
RIGHT [ OUTER ] JOIN, and FULL [ OUTER ] JOIN.
source2
Identifies the second data set that is being joined (i.e., a table name, or sub-query).
[ ON ( condition [, ...] ) |
Identifies the second data set that is being joined (i.e., a table name, or sub-query).
Specifies the relationship between source1 and
source2. Any arbitrary criteria may be specified within the
ON clause, just as you would specify conditions following a
WHERE clause. Column and table aliases are allowed in this criteria.
USING ( column [, ...] ) ]
Specifies like-named columns between source1 and
source2 with which to join rows by equivalent values. Similar to a
NATURAL JOIN, but allows you to indicate what specific columns to join on,
whereas NATURAL will join on all like-named columns.
Similar to NATURAL joins, column aliases are respected in the
USING clause's parameters.
The SQL92 INNER JOIN syntax is a tool that helps differentiate the conditions with
which you are joining data sources (the JOIN conditions) from the conditions with which
you are evaluating rows for inclusion in your data set (the WHERE conditions). For
example, consider the two SELECT statements in Example 4-36.
Example 4-36. Comparing INNER JOIN to WHERE
booktown=# SELECT title, last_name, first_name
booktown-# FROM books, authors
booktown-# WHERE (books.author_id = authors.id)
booktown-# AND last_name = 'Geisel';
title | last_name | first_name
-----------------------------+-----------+---------------
The Cat in the Hat | Geisel | Theodor Seuss
Bartholomew and the Oobleck | Geisel | Theodor Seuss
(2 rows)
booktown=# SELECT title, last_name, first_name
booktown-# FROM books AS b INNER JOIN authors AS a
booktown-# ON (b.author_id = a.id)
booktown-# WHERE last_name = 'Geisel';
title | last_name | first_name
-----------------------------+-----------+---------------
The Cat in the Hat | Geisel | Theodor Seuss
Bartholomew and the Oobleck | Geisel | Theodor Seuss
(2 rows)
The two forms of syntax in Example 4-36 are functionally identical, and return the same results.
The INNER JOIN syntax allows you to segregate the
relational criteria from your evaluation criteria by only defining the set relationships in the
ON clause. This can make involved queries much easier to read and maintain, as you do
not need to interpret what each condition described by the WHERE clause is conceptually
achieving.
Notice that the second query demonstrates the use of aliases b and
a in the ON clause for the
books and authors tables, respectively. The use of
these aliases in the ON clause is perfectly valid, and often preferable from a
perspective of improved readability.
In cases of simple equivalence joins, it may be more convenient for you to use either the
USING or NATURAL clauses instead of the
ON clause. These are only applicable on data sets with identically named columns. If you
have columns that define a relationship between two sets that are not identically named, you may still use the
USING or NATURAL clauses by employing column aliases,
as demonstrated in Example 4-37, to re-name one or both of the columns to a uniform
name.
Example 4-37. The NATURAL and USING clauses
booktown=# SELECT title, last_name, first_name
booktown-# FROM books INNER JOIN authors AS a (author_id)
booktown-# USING (author_id)
booktown-# WHERE last_name = 'Geisel';
title | last_name | first_name
-----------------------------+-----------+---------------
The Cat in the Hat | Geisel | Theodor Seuss
Bartholomew and the Oobleck | Geisel | Theodor Seuss
(2 rows)
booktown=# SELECT title, last_name, first_name
booktown-# FROM books NATURAL INNER JOIN authors AS a (author_id)
booktown-# WHERE last_name = 'Geisel';
title | last_name | first_name
-----------------------------+-----------+---------------
The Cat in the Hat | Geisel | Theodor Seuss
Bartholomew and the Oobleck | Geisel | Theodor Seuss
(2 rows)
The first SELECT statement in Example 4-37 assigns
the alias of author_id to the first column in the authors
table (which is actually named id). By passing the author_id
identifier to the USING clause, PostgreSQL then searches for a column identifier in each data
set with that name to join rows on values found to be equivalent.
Inner joins are adequate for a wide variety of queries, but there are times when an outer join is required to get
all of the data you need. The key to understanding the difference between inner and outer joins is in knowing how each
type of join handles rows that do not meet their defined relationship.
In short, an inner join will discard any row for which it cannot find a corresponding value between the sets being
joined (as specified by either the ON or USING clause).
In contrast to inner joins, an outer join can retain rows where corresponding values between sets
are not found, populating the missing columns with
NULL values. Whether or not the outer join does retain that row
depends on which set is missing the value and the kind of outer join that is specified.
There are three forms of outer joins:
Left outer joins
Will always return at least one instance of each row in the set of rows to the
left of the JOIN keyword. Missing columns in the right set
are populated with NULL values.
Right outer joins
Will always return at least one instance of each row in the set of rows to the
right of the JOIN keyword. Missing columns in the left set
are populated with NULL values.
Full outer joins
Will always return at least one instance of each row in each joined set. Missing
columns on either side of the new set will be populated with NULL
values.
Consider again Book Town's books table, and another Book Town table called
editions. While the books table stores general information
on a given title, the editions table stores specific information pertaining to each edition,
such as an the book's ISBN, publisher, and publication date. The editions table has a
column called book_id which corresponds to the books table's
primary key column, id.
Suppose that you want to retrieve each of Book Town's titles, along with its isbn, if applicable. Performing a
query with an inner join between the books and editions
tables will correctly return a data set with title and isbn columns. However, as demonstrated in Example 4-38, if a book does not yet have a printed edition (or if that edition has not yet been
entered into Book Town's database), those titles will not be displayed.
In contrast, the statement immediately following the inner join in Example 4-38
employs an outer join, returning 20 rows. Three of the returned rows do not have ISBN numbers, but are not omitted due to
the definition of the join.
Example 4-38. Inner joins versus outer joins
booktown=# SELECT title, isbn
booktown-# FROM books INNER JOIN editions
booktown-# ON (books.id = editions.book_id);
title | isbn
-----------------------------+------------
The Tell-Tale Heart | 1885418035
The Tell-Tale Heart | 0929605942
Little Women | 0760720002
The Velveteen Rabbit | 0679803335
Goodnight Moon | 0694003611
Bartholomew and the Oobleck | 0394800753
The Cat in the Hat | 039480001X
The Cat in the Hat | 0394900014
Dynamic Anatomy | 0823015505
2001: A Space Odyssey | 0451457994
2001: A Space Odyssey | 0451198492
Dune | 0441172717
Dune | 044100590X
The Shining | 0451160916
The Shining | 0385121679
Franklin in the Dark | 0590445065
Programming Python | 0596000855
(17 rows)
booktown=# SELECT title, isbn
booktown-# FROM books LEFT OUTER JOIN editions
booktown-# ON (books.id = editions.book_id);
title | isbn
-----------------------------+------------
The Tell-Tale Heart | 1885418035
The Tell-Tale Heart | 0929605942
Little Women | 0760720002
The Velveteen Rabbit | 0679803335
Goodnight Moon | 0694003611
Bartholomew and the Oobleck | 0394800753
The Cat in the Hat | 039480001X
The Cat in the Hat | 0394900014
Dynamic Anatomy | 0823015505
2001: A Space Odyssey | 0451457994
2001: A Space Odyssey | 0451198492
Dune | 0441172717
Dune | 044100590X
The Shining | 0451160916
The Shining | 0385121679
Franklin in the Dark | 0590445065
Practical PostgreSQL |
Programming Python | 0596000855
Learning Python |
Perl Cookbook |
(20 rows)
The join specified by the second query in Example 4-38 uses the
LEFT OUTER JOIN clause to define its join type. This is because
the query focuses on titles from the books table that
have ISBN numbers, and not those editions having ISBN numbers that do not correspond to titles. As the
books table is to the left of the JOIN
keyword, it is defined as a left outer join to achieve this. If the focus of the query was to see both ISBN numbers
without titles as well as titles without ISBN numbers, the same query could instead be modified to be a full outer join
with the FULL OUTER JOIN clause.
The difference between inner and outer joins illustrated in Example 4-38 is a vital
concept to understand, as misuse of joins can lead to both omitted and unexpected rows.
Note: The actual OUTER keyword is an optional term in a PostgreSQL outer join. Specifying a
join as either a LEFT JOIN, RIGHT JOIN or
FULL JOIN implicitly defines it as an outer join.
It should be understood that while a single JOIN clause connects only two sets of
data, in practice, joins are not restricted to only two data sources. You may arbitrarily specify numerous
JOIN clauses following sets that are themselves constructed from joins, just as you may
specify numerous data sources separated by commas.
When connecting several joins together, it is a good practice to group each join and sub-join within parentheses.
Explicitly grouping joins in this fashion insures that there is no ambiguity, to either PostgreSQL or a developer, as to
which data sets are joined, and in what order.
Example 4-39. Joining many data sources
booktown=# SELECT a.last_name, p.name AS publisher, e.isbn, s.subject
booktown-# FROM ((((authors AS a INNER JOIN books AS b
booktown(# ON (a.id = b.author_id))
booktown(# INNER JOIN editions AS e ON (e.book_id = b.id))
booktown(# INNER JOIN publishers AS p ON (p.id = e.publisher_id))
booktown(# INNER JOIN subjects AS s ON (s.id = b.subject_id));
last_name | publisher | isbn | subject
-----------+-----------------------------+------------+------------------
Hogarth | Watson-Guptill Publications | 0823015505 | Arts
Brown | HarperCollins | 0694003611 | Children's Books
Geisel | Random House | 0394800753 | Children's Books
Geisel | Random House | 039480001X | Children's Books
Geisel | Random House | 0394900014 | Children's Books
Bourgeois | Kids Can Press | 0590445065 | Children's Books
Bianco | Penguin | 0679803335 | Classics
Lutz | O'Reilly & Associates | 0596000855 | Computers
Alcott | Henry Holt & Company, Inc. | 0760720002 | Drama
Poe | Mojo Press | 1885418035 | Horror
Poe | Books of Wonder | 0929605942 | Horror
King | Doubleday | 0451160916 | Horror
King | Doubleday | 0385121679 | Horror
Clarke | Roc | 0451457994 | Science Fiction
Clarke | Roc | 0451198492 | Science Fiction
Herbert | Ace Books | 0441172717 | Science Fiction
Herbert | Ace Books | 044100590X | Science Fiction
(17 rows)
An interesting observation to be made about Example 4-39 is that, while the
books table is itself deeply involved in the join, none of its columns are retrieved in
the final result set. The books table is included in the JOIN clauses
in order to provide criteria through which other tables are joined together. Each of the tables whose columns are retrieved in the query rely on
the books table in order to draw relationships with any other table through the id
column (with the exception of the publishers table, which relates to the publisher_id
column in the editions table).
The GROUP BY clause introduces a powerful SQL concept:
aggregation. To aggregate means to gather into a sum, or whole. The practical effect of aggregating
in a SQL query is that any rows whose results from the GROUP BY expression match
identically are grouped together into a single aggregate row. The GROUP BY
expression may define a column, but it may also be any operation upon a column as well. If several columns or expressions
are specified (delimited by commas), the entire set of specified criteria must be identical for rows to be
grouped together.
To effectively use aggregation you must understand that any target columns requested by an aggregating query
which are not specified in the GROUP BY clause will be
inaccessible, unless selected through an aggregate function. An aggregate function accepts a column
name (or expression involving at least one column name) which can represent several values (i.e., from
several grouped rows), performs an operation on those values, and returns a single value.
Common aggregate functions include count(), which returns the number of rows in the
set, max(), which returns the maximum value in the column, and
min(), which returns the minimum value in the column. An aggregate function operates
only on rows in the query's result set, and is therefore executed after conditional joins and
WHERE conditions have been processed.
Imagine that you wanted to know how many books Book Town stores in its database for each known publisher. You
could perform a simple join between the editions and
publishers tables in order to associate each publisher name with a title that they publish.
It would be tedious to manually count how many titles each publisher maintained, and in cases of larger data sets,
it can become difficult to manage larger result sets.
Example 4-40 demonstrates a join between these two Book Town tables, but also introduces two new
elements: the count() function, and the GROUP BY clause.
Example 4-40. Using GROUP BY
booktown=# SELECT count(e.isbn) AS "number of books",
booktown-# p.name AS publisher
booktown-# FROM editions AS e INNER JOIN publishers AS p
booktown-# ON (e.publisher_id = p.id)
booktown-# GROUP BY p.name;
number of books | publisher
-----------------+-----------------------------
2 | Ace Books
1 | Books of Wonder
2 | Doubleday
1 | HarperCollins
1 | Henry Holt & Company, Inc.
1 | Kids Can Press
1 | Mojo Press
1 | O'Reilly & Associates
1 | Penguin
3 | Random House
2 | Roc
1 | Watson-Guptill Publications
(12 rows)
The GROUP BY clause in Example 4-40 instructs PostgreSQL to group
the rows in the joined data set by p.name, which in this query is a reference to the
name column in the publishers table. Therefore, any
rows that have the same publisher name will be grouped together, or aggregated. The
count() function then counts the number of isbn values
from the editions table that are in each aggregated row, and returns a single numeric
value representing the number of rows that were aggregated for each unique publisher.
Note that in Example 4-40 the argument of the editions table's
isbn column is chosen simply to indicate the objective of the example
(to count how many books there are per publisher). Any column name will return the same number, as
count() will always return the number of rows grouped in the current aggregate row.
Something to watch out for when designing aggregate queries is that the WHERE clause
cannot accept criteria involving aggregate functions. Instead, use the HAVING clause. It functions identically to the WHERE clause, but its conditions must be on aggregate
functions rather than single-row conditions. Syntactically, the HAVING clause follows the
GROUP BY clause, as demonstrated in Example 4-41.
Example 4-41. Using the HAVING clause
booktown=# SELECT count(e.isbn) AS "number of books",
booktown-# p.name AS publisher
booktown-# FROM editions AS e INNER JOIN publishers AS p
booktown-# ON (e.publisher_id = p.id)
booktown-# GROUP BY publisher
booktown-# HAVING count(e.isbn) > 1;
number of books | publisher
-----------------+--------------
2 | Ace Books
2 | Doubleday
3 | Random House
2 | Roc
(4 rows)
Both Example 4-40 and Example 4-41 create a data set through an inner join
between the editions and publishers table. However,
Example 4-41 constrains the final result to publishers having more than a single book in the
Book Town database, as set by the HAVING clause.
Note: If a result set's column is aliased via an AS clause to a name that overlaps
with a real column in one of the source data sets, and used in the GROUP BY clause,
PostgreSQL will assume that you are referring to the input column, not the output alias.
As described in Chapter 3, row data is not stored in a consistent order
within tables. In fact, an identical query executed twice is in no way guaranteed to return the rows in the same order
each time. As order is commonly an important part of retrieving data for database-dependent applications, use the
ORDER BY clause to allow flexible sorting of your result set.
The ORDER BY clause accepts as its parameters a list of comma-delimited column
names (or expressions upon columns), which are used as sorting criteria. For each sort criteria, you may optionally apply either the
ASC, DESC, or USING
keywords to control the type of sorting employed:
ASC
Causes the rows to sort by the related criteria in an ascending fashion (e.g., numbers will be
sorted lowest to highest, text will be sorted alphabetically from a to z). ASC is
equivalent to specifying USING <. Since it is the default behavior, specifying
ASC is only useful for explicit readability.
DESC
Causes the rows to sort by the related criteria in a descending fashion (e.g., numbers will be
sorted highest to lowest, text will be sorted alphabetically from z to a). DESC is
equivalent to specifying USING >.
USING operator
Allows the specification of the operator operator to be used to compare each
column for precedence. This can be particularly useful for custom operators.
Example 4-42 demonstrates the use of the ORDER BY clause on the
editions table. It specifies the publication column as
the source of values to sort by, and explicitly declares the ordering method as an ascending (ASC)
sort.
As you can see in the result set from Example 4-42, the rows return in ascending order, from
the oldest date to the newest. It should be noted that even columns and expressions that do not appear in the target
list of the SELECT statement may be used to sort the retrieved rows. Furthermore,
aggregate functions and expressions are allowed by the ORDER BY clause if the
query involves aggregation. The ability to sort by such a wide scope of sources thus allows for a great deal of
flexibility in ordering results from a variety of query approaches.
Warning
If a column alias in the result set has the same name as a literal column in an input source from which it is
drawing rows, and it is used in the ORDER BY clause, PostgreSQL will assume that it is
a reference to the named column in the result set, not the column in the source set. This is an accepted inconsistency
compared against the default behavior of the GROUP BY clause, as specified by the
SQL92 standard.
When specifying multiple expressions to sort by, the result set will be ordered by the first criteria (from left to
right), and will only process subsequent sorting criteria if the first condition's sort is inconclusive. For example,
consider the sorting performed in Example 4-43.
Example 4-43. Using ORDER BY with multiple expressions
The query in Example 4-43 selects the numeric
edition and publication date of each book
from the editions table. The ORDER BY clause
then specifies two columns to sort by: edition, in ascending order, and
publication, in descending order.
As you can see in the result set for Example 4-43, each row is first sorted
by edition, proceeding from the lower editions to the higher editions. Subsequently, wherever the editions are identical,
the publication date is used to then sort again, from the most recent publication date to the least recent.
Sorting is extremely relevant when using the DISTINCT keyword, as discussed in
the Section called Removing Duplicate Rows with DISTINCT." If you are only interested in seeing the most recently published
copy of each edition in the editions table, the ORDER BY
and DISTINCT clauses can be combined to achieve an effect somewhat similar to the
GROUP BY clause, as shown in Example 4-44.
Example 4-44. Using DISTINCT with ORDER BY
booktown=# SELECT DISTINCT ON (edition)
booktown-# edition, publication
booktown-# FROM editions
booktown-# ORDER BY edition ASC,
booktown-# publication DESC;
edition | publication
---------+-------------
1 | 1995-03-28
2 | 2001-03-01
3 | 2000-09-12
(3 rows)
booktown=# SELECT edition, max(publication)
booktown-# FROM editions
booktown-# GROUP BY edition;
edition | max
---------+------------
1 | 1995-03-28
2 | 2001-03-01
3 | 2000-09-12
(3 rows)
Since the ORDER BY occurring before the DISTINCT
clause eliminates duplicate rows, the net effect can be very similar to using the max() or
min() with a GROUP BY clause. This technique can sometimes be
more efficient, depending on the complexity of the aggregation and sorting involved.
Note: While never strictly necessary, PostgreSQL can accept integer constants as expressions in the
ORDER BY clause, instead of column names or expressions. Such a constant will be
interpreted as representing the column that is at the numbered position in the target list, from left to right,
starting at 1 (e.g., ORDER BY 1 ASC references the first column in the result
set).
PostgreSQL enforces no limit upon the number of rows retrievable from a SQL query. If you attempt to execute a
query that returns several million rows, it may take a while, but the server will not stop until it has returned the
entire result set (or until it is interrupted).
Applications could conceivably be written to programmatically "page" through large sets of data after retrieval, but SQL
provides as a convenience the LIMIT and OFFSET clauses,
which allow for the retrieval of a specified portion of the generated result set.
When the LIMIT clause is specified, no more than the requested number of rows will
be returned (though there may be fewer if the result set is smaller than the passed parameter). When the
OFFSET clause is specified, it skips the number of rows defined by its parameters before
returning rows. If both are specified, the number of rows to be included as per the LIMIT
clause will not be counted until the number of rows dictated by the OFFSET clause have
been skipped.
Example 4-45. Using LIMIT and OFFSET
booktown=# SELECT isbn, title, publication
booktown-# FROM editions NATURAL JOIN books AS b (book_id)
booktown-# ORDER BY publication DESC
booktown-# LIMIT 5;
isbn | title | publication
------------+-----------------------+-------------
0596000855 | Programming Python | 2001-03-01
0451457994 | 2001: A Space Odyssey | 2000-09-12
0451198492 | 2001: A Space Odyssey | 1999-10-01
044100590X | Dune | 1999-10-01
0929605942 | The Tell-Tale Heart | 1998-12-01
(5 rows)
booktown=# SELECT isbn, title, publication
booktown-# FROM editions NATURAL JOIN books AS b (book_id)
booktown-# ORDER BY publication DESC
booktown-# LIMIT 5
booktown-# OFFSET 2;
isbn | title | publication
------------+-----------------------+-------------
0451198492 | 2001: A Space Odyssey | 1999-10-01
044100590X | Dune | 1999-10-01
0929605942 | The Tell-Tale Heart | 1998-12-01
0441172717 | Dune | 1998-09-01
1885418035 | The Tell-Tale Heart | 1995-03-28
(5 rows)
Example 4-45 demonstrates, in the first query, a simple use of
LIMIT, by retrieving only 5 rows from the joined set of the
editions and books table. Ordinarily, such a join would
result in 17 rows.
The second query in Example 4-45 shows the use of the
OFFSET clause, to shift the scope of the result set down by two rows. You can see that
the last three rows of the first query's result set overlap with the first three rows of the second query's result set. The
ORDER BY clause in each of these queries insures the consistency of the sets returned.
Note: The ORDER BY clause can be a helpful tool for making sure that the
results of a limited query are relevant. This is because sorting occurs before limiting, allowing you to determine which rows end up being limited.
While joins are used in SQL to combine column values into a single row, the UNION,
INTERSECT and EXCEPT clauses exist to merge or omit row
data by comparing column values, returning a new result set based on this comparison. Each of these keywords may be used
at the end of a valid SQL query and followed by a second query, in order to compare the resultant data sets, and then either
merge or omit rows based on that comparison.
When comparing data sets in this manner, it is required that they each have the same number of columns, as well as the same column type.
Note that they do not need to have the same name, or be queried from the same table or data source.
UNION
A pair of queries merged with the UNION keyword will combine all
non-distinct rows into a single data set. Like rows will not be duplicated.
INTERSECT
A pair of queries merged with the INTERSECT keyword will cause
any rows not found in both data sets to be omitted. As such, the only rows returned are those that overlap
between the two query result sets.
EXCEPT
A pair of queries merged with the EXCEPT keyword will cause any rows
found in both data sets to be omitted from the returned data set. As such, only rows found in the query to the
left of the EXCEPT clause that are not
found in the query to the right of the clause will be returned.
Example 4-46, Example 4-47, and Example 4-48 each demonstrate
these keywords by combining and omitting rows from comparative data sets. Example 4-46 creates a result
set by combining several authors' last names with book titles via the UNION
keyword.
Example 4-47 demonstrates the selection of ISBN numbers from the books
table, limited to rows which intersect with the query on the shipments table for books which
have records of more than two shipments. Finally, Example 4-48 demonstrates the removal of any rows from the
first query which are matched completely in the second.
Example 4-46. Using UNION
booktown=# SELECT title FROM books
booktown-# UNION
booktown-# SELECT last_name FROM authors
booktown-# LIMIT 11;
title
-----------------------------
2001: A Space Odyssey
Alcott
Bartholomew and the Oobleck
Bianco
Bourgeois
Brautigan
Brite
Brown
Christiansen
Clarke
Denham
(11 rows)
Example 4-47. Using INTERSECT
booktown=# SELECT isbn FROM editions
booktown-# INTERSECT
booktown-# SELECT isbn FROM shipments
booktown-# GROUP BY isbn
booktown-# HAVING count(id) > 2;
isbn
------------
039480001X
0394800753
0451160916
0590445065
0694003611
(5 rows)
Example 4-48. Using EXCEPT
booktown=# SELECT last_name, first_name
booktown-# FROM authors
booktown-# EXCEPT
booktown-# SELECT last_name, first_name
booktown-# FROM authors AS a (author_id)
booktown-# NATURAL INNER JOIN books
booktown-# ORDER BY first_name ASC;
last_name | first_name
-----------+------------
Denham | Ariel
Gorey | Edward
Brite | Poppy Z.
Brautigan | Richard
(4 rows)
In Example 4-48, only rows that do not match the second query are returned. Notice that the
effective result of this is that only authors who do not have a book in the books table
are returned. This is due to the INNER JOIN clause, which causes the second query to
omit any authors whose author_id is not found in the
books table.
While the use of these keywords in a single SQL query precludes the ability to use the
LIMIT clause, this limitation can be circumvented by PostgreSQL's support for
sub-queries. By grouping in parentheses each of the queries involved between a UNION,
EXCEPT, or EXCEPT clause, the returned result sets
from the sub-queries are compared, as demonstrated in Example 4-49.
Example 4-49. Comparing sub-query result sets
booktown=# (SELECT title FROM books ORDER BY title DESC LIMIT 7)
booktown-# EXCEPT
booktown-# (SELECT title FROM books ORDER BY title ASC LIMIT 11)
booktown-# ORDER BY title DESC;
title
----------------------
The Velveteen Rabbit
The Tell-Tale Heart
The Shining
The Cat in the Hat
(4 rows)
Notice that the query used in Example 4-49 creates a set from the books
table that is constrained to the last seven rows and sorted alphabetically by title.
The EXCEPT clause then removes from that data set the first eleven rows, sorted
alphabetically in an ascending fashion. The result consists of the
last four rows from the table, sorted from the bottom by the final ORDER BY clause on
the new exception set.
In order to achieve simple programmatic transformations without having to call out to a procedural language,
PostgreSQL supports standard SQL case expressions. These use the SQL keywords
CASE, WHEN, THEN,
and END to allow basic conditional transformations per each row.
The entirety of a case expression is syntactically placed within the SELECT
statement's target list. A case expression's result column is named case by default, but
it may be aliased in the same manner as any normal target list. The general syntax for a case expression in a
SELECT statement's target list is as follows:
CASE WHEN condition1 THEN result1
WHEN condition2 THEN result2
[ ... ]
[ ELSE default_result ]
END [ AS alias ]
The CASE, WHEN,
THEN, and ELSE keywords are somewhat similar to the
if-then-else logic in programming languages. The condition of a
WHEN clause must return a Boolean result.
When a WHEN condition is met, the result from its corresponding
THEN clause will return in the result column for that row. If no conditions are
met, the ELSE clause may be used to specify a default result value. If there are no
results found for a case expression, NULL is returned.
Example 4-50. Using case expressions in statements
booktown=# SELECT isbn,
booktown-# CASE WHEN cost > 20 THEN 'over $20.00 cost'
booktown-# WHEN cost = 20 THEN '$20.00 cost'
booktown-# ELSE 'under $20.00 cost'
booktown-# END AS cost_range
booktown-# FROM stock
booktown-# LIMIT 8;
isbn | cost_range
------------+-------------------
0385121679 | over $20.00 cost
039480001X | over $20.00 cost
044100590X | over $20.00 cost
0451198492 | over $20.00 cost
0394900014 | over $20.00 cost
0441172717 | under $20.00 cost
0451160916 | over $20.00 cost
0679803335 | $20.00 cost
(8 rows)
Adding to the power of case expressions are PostgreSQL's sub-queries, described in the Section called Using Sub-Queries." As demonstrated in Example 4-51, a sub-query
may be provided as a result within a conditional expression.
Example 4-51. Using case expressions with sub-queries
booktown=# SELECT isbn,
booktown-# CASE WHEN cost > 20 THEN 'N/A - (Out of price range)'
booktown-# ELSE (SELECT title FROM books b JOIN editions e
booktown(# ON (b.id = e.book_id)
booktown(# WHERE e.isbn = stock.isbn)
booktown-# END AS cost_range
booktown-# FROM stock
booktown-# ORDER BY cost_range ASC
booktown-# LIMIT 8;
isbn | cost_range
------------+-----------------------------
0451457994 | 2001: A Space Odyssey
0394800753 | Bartholomew and the Oobleck
0441172717 | Dune
0760720002 | Little Women
0385121679 | N/A - (Out of price range)
039480001X | N/A - (Out of price range)
044100590X | N/A - (Out of price range)
0451198492 | N/A - (Out of price range)
(8 rows)
In Example 4-51, any book found to have a cost of less than
20 has its title returned via a sub-select to the books table,
along with its ISBN from the main query to the stock table.
The INTO TABLE clause may be used with any valid SELECT query in order to create a
new table with the column structure and row data of the returned result set. The syntax for this is as follows:
SELECT select_targets
INTO [ TABLE ] new_table
FROM old_table;
This syntax performs an implicit CREATE TABLE command, creating a table with the same
column names, value types, and row data as the result set from the original table. When the message
SELECT is returned, you will know that the statement was successfully performed, and
the new table created. This is demonstrated in Example 4-52, which creates a backup table
called stock_backup out of the data in the stock table.
Example 4-52. Using SELECT INTO
booktown=# SELECT * INTO stock_backup
booktown-# FROM stock;
SELECT
The table specified by the INTO clause must not exist, or else an error will be
returned. Upon the error, the values of the query will not be inserted and the
query will fail. Note that the TABLE keyword, in this query, is an optional noise term.