Version Control with Subversion - Chapter 5. Repository Administration - Berkeley DB
Berkeley DB
When the initial design phase of Subversion was in
progress, the developers decided to use Berkeley DB for a
variety of reasons, including its open-source license,
transaction support, reliability, performance, API
simplicity, thread-safety, support for cursors, and so
on.
Berkeley DB provides real transaction
support—perhaps its most powerful feature. Multiple
processes accessing your Subversion repositories don't have
to worry about accidentally clobbering each other's data.
The isolation provided by the transaction system is such
that for any given operation, the Subversion repository code
sees a static view of the database—not a database that
is constantly changing at the hand of some other
process—and can make decisions based on that view. If
the decision made happens to conflict with what another
process is doing, the entire operation is rolled back as if
it never happened, and Subversion gracefully retries the
operation against a new, updated (and yet still static) view
of the database.
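
The snapshot-and-retry behavior described above can be sketched in a few lines. This is a toy in-memory model, not Berkeley DB's or Subversion's actual API: the names `VersionedStore`, `ConflictError`, and `run_with_retry` are hypothetical and exist only for illustration. Each attempt works from a static copy of the data, and a commit based on a stale copy is rejected so the operation can be retried against a fresh one.

```python
import threading


class ConflictError(Exception):
    """Raised when a commit was computed from a stale snapshot."""


class VersionedStore:
    """Toy in-memory store (hypothetical; not Berkeley DB's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
        self._version = 0

    def snapshot(self):
        # Return the current version plus a private, static copy of the data.
        with self._lock:
            return self._version, dict(self._data)

    def commit(self, base_version, changes):
        # Apply the changes only if nobody else committed since the snapshot.
        with self._lock:
            if base_version != self._version:
                raise ConflictError("snapshot is out of date")
            self._data.update(changes)
            self._version += 1
            return self._version


def run_with_retry(store, operation, max_attempts=5):
    """Run operation(view) against a static view, retrying on conflict."""
    for _ in range(max_attempts):
        version, view = store.snapshot()
        changes = operation(view)  # decisions are based on the static view
        try:
            return store.commit(version, changes)
        except ConflictError:
            continue  # discard the work and retry with a fresh snapshot
    raise ConflictError("gave up after %d attempts" % max_attempts)
```

A caller would pass a function that computes its changes from the view, for example `run_with_retry(store, lambda view: {"count": view.get("count", 0) + 1})`. A real database tracks conflicts at a much finer granularity than a single version counter; the counter is an assumption made to keep the sketch short.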
Another great feature of Berkeley DB is hot
backups: the ability to back up the database
environment without taking it “offline”. We'll
discuss how to back up your repository in
the section called “Repository Backup”, but the benefits of being
able to make fully functional copies of your repositories
without any downtime should be obvious.
Berkeley DB is also a very reliable database system.
Subversion uses Berkeley DB's logging facilities, which
means that the database first writes to on-disk log files a
description of any modifications it is about to make, and
then makes the modification itself. This is to ensure that
if anything goes wrong, the database system can back up to
a previous checkpoint—a
location in the log files known not to be corrupt—and
replay transactions until the data is restored to a usable
state. See
the section called “Managing Disk Space” for more
about Berkeley DB log files.
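
The log-first discipline described above can be illustrated with a minimal sketch. It is a simplified model under stated assumptions, not Berkeley DB's log format or API: the `LoggedStore` class and its JSON-lines log are hypothetical, and recovery here replays the whole log from the beginning rather than from a checkpoint.

```python
import json
import os


class LoggedStore:
    """Toy key-value store that logs each change before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}

    def put(self, key, value):
        # 1. Describe the modification in the on-disk log and force it to disk.
        record = json.dumps({"key": key, "value": value})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then make the modification itself.
        self.data[key] = value

    @classmethod
    def recover(cls, log_path):
        # Rebuild the state by replaying every intact log record in order.
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record from a crash: stop here
                    store.data[record["key"]] = record["value"]
        return store
```

Because every change reaches the log before it is applied, a crash between the two steps loses nothing: `LoggedStore.recover(path)` rebuilds the data from the records that were written completely and ignores a partially written last record.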
But every rose has its thorn, and so we must note some
known limitations of Berkeley DB. First, Berkeley DB
environments are not portable. You cannot simply copy a
Subversion repository that was created on a Unix system onto
a Windows system and expect it to work. While much of the
Berkeley DB database format is architecture independent,
there are other aspects of the environment that are not.
Second, Subversion uses Berkeley DB in a way that will not
operate on Windows 95/98 systems—if you need to house
a repository on a Windows machine, stick with Windows 2000
or Windows XP. Also, you should never keep a Berkeley DB
repository on a network share. While Berkeley DB promises
to behave correctly on network shares that meet a particular
set of specifications, almost no known shares actually meet
all those specifications.
Finally, because Berkeley DB is a library linked
directly into Subversion, it's more sensitive to
interruptions than a typical relational database system.
Most SQL systems, for example, have a dedicated server
process that mediates all access to tables. If a program
accessing the database crashes for some reason, the database
daemon notices the lost connection and cleans up any mess
left behind. And because the database daemon is the only
process accessing the tables, applications don't need to
worry about permission conflicts. These things are not the
case with Berkeley DB, however. Subversion, like any program
that uses the Subversion libraries, accesses the database tables
directly, which means that a program crash can leave the
database in a temporarily inconsistent, inaccessible state.
When this happens, an administrator needs to ask Berkeley DB
to restore to a checkpoint, which is a bit of an annoyance.
Other things can cause a repository to “wedge”
besides crashed processes, such as programs conflicting over
ownership and permissions on the database files. So while a
Berkeley DB repository is quite fast and scalable, it's best
used by a single server process running as one user, such as
Apache's httpd or svnserve (see Chapter 6, Server Configuration),
rather than accessed by many different users via file:/// or
svn+ssh:// URLs. If you access a Berkeley DB
repository directly as multiple users, be sure to read
the section called “Supporting Multiple Repository Access Methods”.
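
Asking Berkeley DB to restore a wedged repository to a checkpoint is done with the `svnadmin recover` subcommand. The following is a minimal sketch of wrapping that command in a script. The helper names and the repository path are hypothetical, and the sketch assumes that `svnadmin` is on the `PATH`, that no other process is accessing the repository, and that the script runs as the user who owns the repository files.

```python
import subprocess


def build_recover_command(repos_path):
    """Return the argument list for recovering the repository at repos_path."""
    return ["svnadmin", "recover", repos_path]


def recover_repository(repos_path):
    """Run the recovery; raises CalledProcessError if svnadmin fails."""
    return subprocess.run(build_recover_command(repos_path), check=True)


if __name__ == "__main__":
    # Hypothetical path; substitute the location of your own repository.
    recover_repository("/path/to/repos")
```

Running recovery as a different user than the one that normally serves the repository can itself cause the ownership and permission conflicts described above, so it is worth checking the ownership of the database files afterward.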