Version Control with Subversion - Repository Maintenance - svndumpfilter
svndumpfilter
Since Subversion stores everything in an opaque database
system, attempting manual tweaks is unwise, not to mention
quite difficult. And once data has been stored in your
repository, Subversion generally doesn't provide an
easy way to remove that data.[15]
But inevitably, there will be times when you would like to
manipulate the history of your repository. You might need
to strip out all instances of a file that was accidentally
added to the repository (and shouldn't be there for whatever
reason). Or, perhaps you have multiple projects sharing a
single repository, and you decide to split them up into
their own repositories. To accomplish tasks like this,
administrators need a more manageable and malleable
representation of the data in their repositories—the
Subversion repository dump format.
The Subversion repository dump format is a
human-readable representation of the changes that you've
made to your versioned data over time. You use the
svnadmin dump command to generate the dump data, and
svnadmin load to populate a new repository with it (see
the section called “Migrating a Repository”). The great thing about the
human-readability aspect of the dump format is that, if you
aren't careless about it, you can manually inspect and
modify it. Of course, the downside is that if you have two
years' worth of repository activity encapsulated in what is
likely to be a very large dump file, it could take you a
long, long time to manually inspect and modify it.
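For a first look at a large dump file, it often helps to print only the header lines that name revisions and paths instead of paging through file contents. The following is a minimal sketch; the function name dump_headers is hypothetical, and it assumes that no versioned file happens to contain lines starting with these header names (such lines would be printed too).

```shell
# dump_headers FILE: print the revision and node headers of a dump file,
# skipping most property and file contents (hypothetical helper).
dump_headers() {
  # -a treats the (possibly binary) dump as text; LC_ALL=C avoids
  # locale-related matching problems with non-UTF-8 bytes.
  LC_ALL=C grep -a -E \
    '^(Revision-number|Node-path|Node-kind|Node-action|Node-copyfrom-path): ' "$1"
}
```

For example, dump_headers repos-dumpfile | less gives a compact outline of what changed in each revision.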
While it won't be the most commonly used tool at the
administrator's disposal, svndumpfilter provides a very
particular brand of useful functionality: the ability to
quickly and easily modify that dump data by acting as a
path-based filter. Simply give it either a list of paths
you wish to keep or a list of paths you do not want to
keep, then pipe your repository dump data through this
filter. The result will be a modified stream of dump data
that contains only the versioned paths you (explicitly or
implicitly) requested.
The syntax of svndumpfilter is as follows:
$ svndumpfilter help
general usage: svndumpfilter SUBCOMMAND [ARGS & OPTIONS ...]
Type "svndumpfilter help <subcommand>" for help on a specific subcommand.
Available subcommands:
exclude
include
help (?, h)
There are only two interesting subcommands. They allow
you to make the choice between explicit or implicit
inclusion of paths in the stream:
- exclude: Filter out a set of paths from the dump data
  stream.
- include: Allow only the requested set of paths to pass
  through the dump data stream.
Let's look at a realistic example of how you might use this
program. We discuss elsewhere (see
the section called “Choosing a Repository Layout”) how to
choose a layout for the data in your repositories: whether
to use one repository per project or combine them, how to
arrange things within your repository, and so on. But
sometimes, after new revisions start flying in, you rethink
your layout and would like to make some changes. A common
change is the decision to move multiple projects that share
a single repository into separate repositories, one per
project.
Our imaginary repository contains three projects:
calc, calendar, and spreadsheet. They have been living
side-by-side in a layout like this:
/
   calc/
      trunk/
      branches/
      tags/
   calendar/
      trunk/
      branches/
      tags/
   spreadsheet/
      trunk/
      branches/
      tags/
To get these three projects into their own repositories,
we first dump the whole repository:
$ svnadmin dump /path/to/repos > repos-dumpfile
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
* Dumped revision 3.
…
$
Next, run that dump file through the filter, each time
including only one of our top-level directories, and
resulting in three new dump files:
$ cat repos-dumpfile | svndumpfilter include calc > calc-dumpfile
…
$ cat repos-dumpfile | svndumpfilter include calendar > cal-dumpfile
…
$ cat repos-dumpfile | svndumpfilter include spreadsheet > ss-dumpfile
…
$
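The three filtering commands above follow the same pattern, so they can also be written as a loop. This is a sketch only: the function name split_dump is hypothetical, and it names every output file PROJECT-dumpfile (so calendar-dumpfile rather than the cal-dumpfile used above). Redirecting the dump file straight into svndumpfilter also makes the cat unnecessary.

```shell
# split_dump DUMPFILE PROJECT...: write one filtered dump per project
# (hypothetical helper; output files are named PROJECT-dumpfile).
split_dump() {
  dumpfile=$1
  shift
  for project in "$@"; do
    # Keep only the paths under this top-level directory.
    svndumpfilter include "$project" < "$dumpfile" > "$project-dumpfile" || return 1
  done
}
```

Usage: split_dump repos-dumpfile calc calendar spreadsheet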
At this point, you have to make a decision. Each of
your dump files will create a valid repository,
but will preserve the paths exactly as they were in the
original repository. This means that even though you would
have a repository solely for your calc project, that
repository would still have a top-level directory named
calc. If you want your trunk, tags, and branches
directories to live in the root of your repository, you
might wish to edit your dump files, tweaking the Node-path
and Node-copyfrom-path headers so that they no longer have
that first calc/ path component. Also, you'll want to
remove the section of dump data that creates the calc
directory. It will look something like:
Node-path: calc
Node-action: add
Node-kind: dir
Content-length: 0
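If you prefer not to edit the path headers by hand, a stream editor can rewrite them. This is a minimal sketch, not a general tool: the function name strip_prefix is hypothetical, it only rewrites lines that begin with Node-path: PREFIX/ or Node-copyfrom-path: PREFIX/, and it will also rewrite any line of file content that happens to start that way, so check with grep -a first. It does not remove the node that adds the calc directory itself; delete that node by hand as described above. Because the Content-length headers count only the property and text content that follow each header block, shortening the paths in the headers does not invalidate them.

```shell
# strip_prefix PREFIX < IN > OUT: drop a leading "PREFIX/" from the
# Node-path and Node-copyfrom-path headers of a dump stream (hypothetical helper).
# PREFIX must not contain '#' or regex metacharacters, since it is used in a sed pattern.
strip_prefix() {
  LC_ALL=C sed \
    -e "s#^Node-path: $1/#Node-path: #" \
    -e "s#^Node-copyfrom-path: $1/#Node-copyfrom-path: #"
}
```

Usage: strip_prefix calc < calc-dumpfile > calc-dumpfile.new. Some sed implementations may not pass binary file contents through unchanged, so inspect the result before loading it.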
Warning
If you do plan on manually editing the dump file to
remove a top-level directory, make sure that your editor is
not set to automatically convert end-of-line characters to
the native format (e.g., \r\n to \n), as the content would
then no longer agree with the metadata that describes it,
rendering the dump file useless.
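One way to catch an unwanted line-ending conversion is to count the carriage-return bytes in the dump file before and after editing; editing path headers alone should not change that number. A minimal sketch, with a hypothetical function name:

```shell
# count_cr FILE: print the number of carriage-return bytes in FILE.
count_cr() {
  # Delete every byte except \r, then count the bytes that remain.
  LC_ALL=C tr -cd '\r' < "$1" | wc -c | tr -d ' '
}
```

For example, run count_cr calc-dumpfile before and after editing and compare the two numbers; a difference means your editor has touched the line endings.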
All that remains now is to create your three new
repositories, and load each dump file into the right
repository:
$ svnadmin create calc; svnadmin load calc < calc-dumpfile
<<< Started new transaction, based on original revision 1
* adding path : Makefile ... done.
* adding path : button.c ... done.
…
$ svnadmin create calendar; svnadmin load calendar < cal-dumpfile
<<< Started new transaction, based on original revision 1
* adding path : Makefile ... done.
* adding path : cal.c ... done.
…
$ svnadmin create spreadsheet; svnadmin load spreadsheet < ss-dumpfile
<<< Started new transaction, based on original revision 1
* adding path : Makefile ... done.
* adding path : ss.c ... done.
…
$
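The create-and-load steps can likewise be scripted. This is a sketch that assumes (hypothetically) each dump file is named PROJECT-dumpfile and that the target repository directories do not exist yet; the function name load_projects is illustrative. Using && instead of ; means svnadmin load does not run if svnadmin create fails.

```shell
# load_projects PROJECT...: create a repository per project and load
# PROJECT-dumpfile into it (hypothetical helper and naming scheme).
load_projects() {
  for project in "$@"; do
    svnadmin create "$project" &&
      svnadmin load "$project" < "$project-dumpfile" || return 1
  done
}
```

Usage: load_projects calc calendar spreadsheet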
Both of svndumpfilter's subcommands accept options for
deciding how to deal with “empty” revisions. If a given
revision contained only changes to paths that were filtered
out, that now-empty revision could be considered
uninteresting or even unwanted. So to give the user control
over what to do with those revisions, svndumpfilter provides
the following command-line options:
- --drop-empty-revs: Do not generate empty revisions at
  all; just omit them.
- --renumber-revs: If empty revisions are dropped (using
  the --drop-empty-revs option), change the revision
  numbers of the remaining revisions so that there are no
  gaps in the numeric sequence.
- --preserve-revprops: If empty revisions are not dropped,
  preserve the revision properties (log message, author,
  date, custom properties, etc.) for those empty revisions.
  Otherwise, empty revisions will only contain the original
  datestamp, and a generated log message that indicates
  that this revision was emptied by svndumpfilter.
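To see what effect --drop-empty-revs had, you can compare the number of revision records before and after filtering. This sketch uses a hypothetical helper name and counts lines beginning with Revision-number:, assuming no versioned file content starts that way.

```shell
# count_revs FILE: count the revision records in a dump file.
count_revs() {
  LC_ALL=C grep -a -c '^Revision-number: ' "$1"
}
```

For example, after running svndumpfilter include --drop-empty-revs --renumber-revs calc < repos-dumpfile > calc-dumpfile, compare count_revs repos-dumpfile with count_revs calc-dumpfile.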
While svndumpfilter can be very useful, and a huge
timesaver, there are unfortunately a couple of gotchas.
First, this utility is overly sensitive to path semantics.
Pay attention to whether paths in your dump file are
specified with or without leading slashes.
You'll want to look at the Node-path and
Node-copyfrom-path headers.
…
Node-path: spreadsheet/Makefile
…
If the paths have leading slashes, you should include
leading slashes in the paths you pass to svndumpfilter
include and svndumpfilter exclude (and if they don't, you
shouldn't). Further, if your dump file has an inconsistent
usage of leading slashes for some reason,[16] you should
probably normalize those paths so they all have, or lack,
leading slashes.
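A quick way to see which convention a dump file uses is to count path headers with and without a leading slash; if both counts are nonzero, the usage is inconsistent. This sketch uses hypothetical function names and, like any line-oriented tool, also matches file contents that happen to begin with these header names. The second function removes leading slashes from those headers, which is one way to normalize them.

```shell
# slash_report FILE: count Node-path/Node-copyfrom-path headers
# with and without a leading slash (hypothetical helper).
slash_report() {
  # grep -c prints 0 but exits nonzero when nothing matches; "|| true" keeps going.
  with=$(LC_ALL=C grep -a -c -E '^Node-(copyfrom-)?path: /' "$1" || true)
  without=$(LC_ALL=C grep -a -c -E '^Node-(copyfrom-)?path: [^/]' "$1" || true)
  echo "with leading slash: $with, without: $without"
}

# strip_leading_slashes < IN > OUT: rewrite those headers without leading slashes.
strip_leading_slashes() {
  LC_ALL=C sed -E 's#^(Node-(copyfrom-)?path: )/+#\1#'
}
```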
Also, copied paths can give you some trouble. Subversion
supports copy operations in the repository, where a new
path is created by copying some already existing path. It
is possible that at some point in the lifetime of your
repository, you might have copied a file or directory from
some location that svndumpfilter is excluding to a location
that it is including. To make the dump data
self-sufficient, svndumpfilter needs to still show the
addition of the new path (including the contents of any
files created by the copy) and not represent that addition
as a copy from a source that won't exist in your filtered
dump data stream. But because the Subversion repository
dump format only shows what was changed in each revision,
the contents of the copy source might not be readily
available. If you suspect that you have any copies of this
sort in your repository, you might want to rethink your set
of included/excluded paths.
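Before settling on your include or exclude list, you can search the dump for copies whose source lies outside the paths you plan to keep. This sketch (hypothetical name find_outside_copies) assumes paths without leading slashes and that, as in dumps written by svnadmin dump, each node's Node-path header precedes its Node-copyfrom-path header.

```shell
# find_outside_copies PREFIX FILE: list copies whose destination is under
# PREFIX but whose source is not (hypothetical helper).
find_outside_copies() {
  LC_ALL=C awk -v keep="$1" '
    # Is path p equal to, or below, the directory k?
    function inside(p, k) { return p == k || index(p, k "/") == 1 }
    # Remember the path of the node currently being read.
    /^Node-path: / { path = substr($0, 12) }
    /^Node-copyfrom-path: / {
      src = substr($0, 21)
      if (inside(path, keep) && !inside(src, keep)) print path " <- " src
    }
  ' "$2"
}
```

Usage: find_outside_copies calc repos-dumpfile. Each line of output names a copy that svndumpfilter would have to turn into a plain addition.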