PostgreSQL is a sophisticated object-relational database management system (ORDBMS).
An ORDBMS is an extension of the
more traditional relational database management systems (RDBMS).
An RDBMS enables users to store related
pieces of data in two-dimensional data structures called
tables. This data may consist of many defined
types, such as integers, floating-point numbers,
character strings, and timestamps. Data inserted in the table can be
categorized using a grid-like system of vertical
columns, and horizontal rows.
The relational model was built on a strong premise of conceptual
simplicity, which is arguably both its most prominent strength and
weakness.
The object-relational aspect of PostgreSQL adds numerous enhancements to
the straight relational data model. These include support for
arrays (multiple values in a single column),
inheritance (child-parent relationships between
tables), and functions (programmatic methods
invoked by SQL statements). For the advanced developer, PostgreSQL even
supports extensibility of its data types and procedural languages.
Due to this object-relational concept, tables are sometimes called
classes, while rows and columns can be referred to as object-instances
and object-attributes, respectively. We will use this terminology
interchangeably in this book. Other SQL data structures, such as indices
and views, can be referred to as database objects.
Note: Take care to observe that object-relational is not synonymous with
object-oriented, a term pertaining to many modern programming languages. While PostgreSQL supports
several objective improvements to the relational model, it is still accurate to refer to PostgreSQL as a relational
database management system (RDBMS).
While PostgreSQL is commonly considered an RDBMS, or a
"database," it may not be commonly understood what is meant
specifically by the word database. A database
within PostgreSQL is an object-relational implementation of what is
formally called a schema in SQL99.
Put simply, a database is a stored set of data that is
logically interrelated. Typically, this data can be accessed in a
multiuser environment. This is the case with PostgreSQL, though there
are well-defined rights and restrictions enforced with that
access.
It may not be commonly understood that PostgreSQL can
have several databases concurrently
available, each with their own owner, and each with their own unique
tables, views, indices, sequences, and functions.
In order to create a table, function, or any other database
object, you must connect to a specific database via a PostgreSQL
client. Once connected, you can create an object,
which is then owned by the connected database,
and therefore is inaccessible from any other database (though a
client may have several connections open to different
databases).
By keeping fundamental data objects segregated into their own
databases in this fashion, you run a smaller risk of running into a
naming collision by choosing a table name already chosen for another
purpose (e.g., if two users each wanted to have a table called
products for two separate applications). This is because neither
database has any knowledge of the other database's components, and
will not attempt to make any kind of logical relationship between
them. Furthermore, as the same rule applies to object-relational data
objects, users may even create functions and language definitions
within their database that are inaccessible to other users connected
to other databases running within PostgreSQL.
By default, PostgreSQL installs only one functional database, which is called
template1 to represent the
template nature of the database. Any database created after template1 is essentially a clone, inheriting any of its
database objects, including table structure, functions, languages, etc. It is not uncommon to create a default database for
new PostgreSQL users with the same name as their PostgreSQL username, as PostgreSQL will attempt to connect to a database
with the same name as the connecting user if a database name is not specified.
Tables are quite possibly the most important aspect of SQL to
understand inside and out, as all of your data will reside within
them. In order to be able to correctly plan and design your SQL data
structures, and any programmatic routines toward accessing and
applying that data, a thorough understanding of tables is an absolute
pre-requisite.
A table is composed of columns and rows, and their intersections are fields. If you have ever used spreadsheet software before (such as Excel), these two terms are visually represented in the same manner, and the fields within a table are equivalent to the cells within a spreadsheet. From a general perspective, columns within a table describe the name and type of data that will be found (and can be entered) by row for that column's fields. Rows within a table represent records composed of fields that are described from left to right by their corresponding column's name and type. Each field in a row is implicitly correlated with each other field in that row. In this sense, columns can be thought of as descriptors for the discrete, sequential elements of a row, and each row can be thought of as a stored record matching that description.
Table 3-1 illustrates a simple table called books, used by
our imaginary bookstore, Book Town. We will frequently refer to this table in later examples. Each of its stored records describes a book by a numeric identifier, title, author identifier, and subject
identifier. These characteristics, from left to right, are described by its columns (id,
title, author_id, and
subject_id).
Table 3-1. An example SQL table
id | title | author_id | subject_id |
---|
7808 | The Shining | 4156 | 9 |
156 | The Tell-Tale Heart | 15 | 9 |
4513 | Dune | 1866 | 15 |
4267 | 2001: A Space Odyssey | 2001 | 15 |
1608 | The Cat in the Hat | 1809 | 2 |
1590 | Bartholomew and the Oobleck | 1809 | 2 |
As you can see, this describes a table with four columns, in a fixed, left-to-right order, currently populated by six
rows (also known as tuples, or records). It is essential to understand that in a relational database, while a table has a
fixed column order, rows themselves are inherently unordered. You will see later, as the SQL's query structure is explained
in Chapter 4, that there are ways within SQL to order selected rows. However, the rows in the database
itself are not automatically ordered in any consistently predictable way. When order is meaningful for a SQL query, you must carefully consider and explicitly order records.
Every table must have at least one column, but tables may at times contain no rows, because each vertical
column corresponds to a relatively fixed attribute of the data represented in that table (such as the
title column in the previous example's books table).
Without a column, a row's contents would be ambiguous; without a row, a table is merely lacking recorded
data. As of PostgreSQL 7.1, there is a maximum of 1600 columns to a table, and an unlimited number of rows (i.e., you are limited
only by hardware limitations, such as disk space).
In Table 3-1, the column names fairly clearly indicate the significance of each column. The
decision of how to name columns is fairly arbitrary, though, and care must be taken in planning table names and conventions
to avoid ambiguity.
Though it may not be immediately obvious, each of the columns of a table have an associated data type.
While a column's data type helps to further describe the sort of information it contains, it
constrains the kind of data that may be inserted into the column. For example, the
author_id column is of type integer; this signifies that
any insertion attempts not consisting of pure a integer (e.g., 110a) will fail. These types are described in more detail
in the Section called Data Types."
This section introduced the general concepts of how data is logically arranged in a relational database and within
tables. The next section explains why statements are the basis for all interactions
with the database.