While PostgreSQL abstracts the literal data files from its users, all of the data within PostgreSQL databases can be
found in normal system files. These files are constantly in a state of flux when the database is running, as not all
changes made in PostgreSQL are immediately written to disk. These files are stored within the PostgreSQL directory (e.g.,
/usr/local/pgsql/data
, or whatever path is specified by the
PGDATA environment variable).
Rather than creating a set of SQL instructions to re-create a database, you may instead opt to stop the PostgreSQL
server (in order to ensure that all changes are synchronized with the hard disk) and create a backup of that area of the
filesystem. Typically this is done with the
tar
utility, and optionally compressed with a compression
utility such as
gzip
.
Example 9-25 demonstrates the backing up of a filesystem in which PostgreSQL
keeps its data, as performed by the user who owns the data files (which is the same user that runs PostgreSQL's backend). In
this case, the path is
/usr/local/pgsql/data
, and the system user is
postgres
.
Example 9-25. Backing up the PostgreSQL filesystem
[postgres@booktown ~]$
cd /usr/local/pgsql
[postgres@booktown pgsql]$
pg_ctl stop
Smart Shutdown request at Fri Sep 14 14:54:15 2001
DEBUG: shutting down
waiting for postmaster to shut down......DEBUG: database system is shut down
done
postmaster successfully shut down
[postgres@booktown pgsql]$
tar czf pgsql.bak.tar.gz data/
[postgres@booktown pgsql]$
ls -l *.tar.gz
-rw-rw-r- - 1 postgres postgres 66124795 Sep 14 14:36 pgsql.bak.tar.gz
Notice that the
pg_ctl
command is called before the
tar
process, to stop
the PostgreSQL backend (if installed, the SysV script may be invoked with the
service
command to the
same end). As stated, this is to ensure that the most recent changes to the database have been synchronized with the hard
disk, as well as to verify that none of the data files are modified while backed up.
The primary advantage to backing up your data in this fashion is that you have a
literal
backup of
PostgreSQL's data files. In order to restore a crashed database from this kind of file, it needs to be decompressed in the
appropriate location, and for the backend to be re-started. There is no need for negotiation of options, ownership, or
potential conflicts between the
pg_dump
output and restoring it to a live PostgreSQL server through
sequential SQL statements.
However, while this method is easier to implement, it presents several limitations. First, the database must be shut
down completely to backup or restore the data, eliminating the primary advantage of a hot backup-capable DBMS, which is
limited downtime. Further, it is not possible to backup only specific databases, or tables. The entire data directory must
be backed up for a complete restoration of the filesystem. This is because there are many files associated with a
particular database, and it is not obvious which files correlate to which databases.
Finally, because more than abstract information is represented on disk by a live database, a much greater amount of
disk space is required to backup even a compressed copy of the entire data directory.