Version Control with Subversion - Repository Maintenance - Repository Backup
Repository Backup
Despite numerous advances in technology since the birth of
the modern computer, one thing unfortunately rings true with
crystalline clarity—sometimes, things go very, very
awry. Power outages, network connectivity dropouts, corrupt
RAM and crashed hard drives are but a taste of the evil that
Fate is poised to unleash on even the most conscientious
administrator. And so we arrive at a very important
topic—how to make backup copies of your repository
data.
There are generally two types of backup methods available
for Subversion repository administrators: incremental and
full. We discussed in an earlier section of this chapter how
to use svnadmin dump --incremental to perform an incremental
backup (see the section called “Migrating a Repository”).
Essentially, the idea is to back up, at a given time, only the
changes made to the repository since the last backup.
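As a rough illustration of that idea, the following sketch dumps
only the revisions committed since the previous run. The function
name, the last-backed-up-rev state file, and the dump file naming
scheme are assumptions made for this example, not part of
Subversion; only svnlook youngest and svnadmin dump are real
commands.

```shell
#!/bin/sh
# Sketch: incremental backup of the revisions added since the last run.
# The state file and dump file names below are illustrative assumptions.

incremental_backup() {
    repos=$1        # path to the live repository
    backup_dir=$2   # directory that collects the dump files
    state_file="$backup_dir/last-backed-up-rev"

    # Youngest (most recent) revision currently in the repository.
    youngest=$(svnlook youngest "$repos") || return 1

    # Last revision already dumped; -1 means no backup has been made yet.
    if [ -f "$state_file" ]; then
        last=$(cat "$state_file")
    else
        last=-1
    fi

    if [ "$youngest" -le "$last" ]; then
        echo "nothing new to back up" >&2
        return 0
    fi

    first=$((last + 1))
    dump_file="$backup_dir/dump-$first-$youngest"

    # Dump only the new revision range, as deltas against earlier revisions.
    svnadmin dump "$repos" -r "$first:$youngest" --incremental \
        > "$dump_file" || return 1

    # Record progress only after the dump succeeded.
    echo "$youngest" > "$state_file"
    echo "$dump_file"
}
```

Calling incremental_backup /path/to/repos /path/to/backups prints
the name of the dump file it wrote, or nothing if no new revisions
exist.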
A full backup of the repository is quite literally a
duplication of the entire repository directory (which includes
either a Berkeley DB or an FSFS environment). Now, unless
you temporarily disable all other access to your repository,
simply doing a recursive directory copy runs the risk of
generating a faulty backup, since someone might be currently
writing to the database.
In the case of Berkeley DB, Sleepycat's documentation describes
a certain order in which database files can be copied that will
guarantee a valid backup copy. A similar ordering exists
for FSFS data. Better still, you don't have to implement
these algorithms yourself, because the Subversion development
team has already done so. The hot-backup.py script is found in
the tools/backup/ directory of the Subversion source
distribution. Given a repository path and a backup location,
hot-backup.py (which is really just a more intelligent wrapper
around the svnadmin hotcopy command) will perform the necessary
steps for backing up your live repository, without requiring
that you bar public repository access at all, and then will
clean out the dead Berkeley log files from your live repository.
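To give a feel for what such a wrapper does, here is a minimal
sketch built directly on svnadmin hotcopy. It is not the
hot-backup.py script itself: the function name and the
name-plus-revision naming of the backup directory are assumptions
of this example. The --clean-logs option asks svnadmin hotcopy to
remove unused Berkeley DB log files from the source repository.

```shell
#!/bin/sh
# Sketch: a tiny stand-in for a hot-backup wrapper.
# The backup directory naming (<name>-<youngest>) is an assumption.

hot_backup() {
    repos=$1         # path to the live repository
    backup_root=$2   # directory under which backups are stored

    # Tag the backup with the youngest revision it contains.
    youngest=$(svnlook youngest "$repos") || return 1
    target="$backup_root/$(basename "$repos")-$youngest"

    # Refuse to overwrite a backup of the same revision.
    if [ -e "$target" ]; then
        echo "backup already exists: $target" >&2
        return 1
    fi

    # Copy the live repository safely, then prune dead log files.
    svnadmin hotcopy --clean-logs "$repos" "$target" || return 1
    echo "$target"
}
```

Running hot_backup /path/to/repos /path/to/backups should print
the path of the newly created copy.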
Even if you also have an incremental backup, you might
want to run this program on a regular basis. For example, you
might consider adding hot-backup.py to a program scheduler
(such as cron on Unix systems). Or, if you prefer fine-grained
backup solutions, you could have your post-commit hook script
call hot-backup.py (see the section called “Hook Scripts”),
which will then cause a new backup of your repository to occur
with every new revision created. Simply add the following to
the hooks/post-commit script in your live repository directory:
(cd /path/to/hook/scripts; ./hot-backup.py "${REPOS}" /path/to/backups &)
The resulting backup is a fully functional Subversion
repository, able to be dropped in as a replacement for your
live repository should something go horribly wrong.
There are benefits to both types of backup methods. The
easiest is by far the full backup, which will always result in
a perfect working replica of your repository. This again
means that should something bad happen to your live
repository, you can restore from the backup with a simple
recursive directory copy. Unfortunately, if you are
maintaining multiple backups of your repository, these full
copies will each eat up just as much disk space as your live
repository.
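The restore step mentioned above might look something like the
following sketch. The function name and the choice to move the
damaged repository aside (rather than delete it) are assumptions
of this example; the essential part is the recursive copy.

```shell
#!/bin/sh
# Sketch: restore a live repository from a full backup copy.
# Moving the damaged repository aside is this example's own precaution.

restore_from_full_backup() {
    backup=$1   # path to the backup repository directory
    live=$2     # path where the live repository should be

    if [ ! -d "$backup" ]; then
        echo "no such backup: $backup" >&2
        return 1
    fi

    # Keep the damaged repository around for later inspection.
    if [ -e "$live" ]; then
        mv "$live" "$live.damaged.$$" || return 1
    fi

    # Recursive copy that preserves permissions and timestamps.
    cp -Rp "$backup" "$live"
}
```

After restoring, running svnadmin verify on the new live
repository is a reasonable sanity check before reopening access.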
Incremental backups using the repository dump format are
excellent to have on hand if the database schema changes
between successive versions of Subversion itself. Since a
complete repository dump and load are generally required to
upgrade your repository to the new schema, it's very
convenient to already have half of that process (the dump
part) finished. Unfortunately, the creation of—and
restoration from—incremental backups takes longer, as
each commit is effectively replayed into either the dump file
or the repository.
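Restoring from a series of incremental dump files could be
sketched as follows. The function name is hypothetical, and the
caller is assumed to pass the dump files in ascending revision
order, since each one builds on the revisions loaded before it.

```shell
#!/bin/sh
# Sketch: rebuild a repository by replaying dump files in order.
# Assumes the dump files are given oldest first.

restore_from_dumps() {
    new_repos=$1   # path of the repository to create
    shift          # remaining arguments are dump files, oldest first

    # Start from a fresh, empty repository.
    svnadmin create "$new_repos" || return 1

    # Replay each dump; stop at the first failure so that later
    # dumps are never loaded on top of a gap in the history.
    for dump_file in "$@"; do
        svnadmin load "$new_repos" < "$dump_file" || return 1
    done
}
```

Because every commit is replayed one at a time, expect this to
take noticeably longer than copying a full backup into place.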
In either backup scenario, repository administrators need
to be aware of how modifications to unversioned revision
properties affect their backups. Since these changes do not
themselves generate new revisions, they will not trigger
post-commit hooks, and may not even trigger the
pre-revprop-change and post-revprop-change hooks.
[19]
And since you can change revision properties without respect
to chronological order—you can change any revision's
properties at any time—an incremental backup of the
latest few revisions might not catch a property modification
to a revision that was included as part of a previous
backup.
Generally speaking, only the truly paranoid would need to
back up their entire repository, say, every time a commit
occurred. However, assuming that a given repository has some
other redundancy mechanism in place with relatively fine
granularity (like per-commit emails), a hot backup of the
database might be something that a repository administrator
would want to include as part of a system-wide nightly backup.
For most repositories, archived commit emails alone provide
sufficient redundancy as restoration sources, at least for the
most recent few commits. But it's your data—protect it
as much as you'd like.
Often, the best approach to repository backups is a
diversified one. You can leverage combinations of full and
incremental backups, plus archives of commit emails. The
Subversion developers, for example, back up the Subversion
source code repository after every new revision is created,
and keep an archive of all the commit and property change
notification emails. Your solution might be similar, but
should be tailored to your needs and that delicate balance of
convenience with paranoia. And while all of this might not
save your hardware from the iron fist of Fate,
[20]
it should certainly help you recover from those trying
times.