Binary Files and Translation
In the most general sense, Subversion handles binary files
more gracefully than CVS does. Because CVS uses RCS, it can
only store successive full copies of a changing binary file.
Subversion, however, expresses differences between files using a
binary-differencing algorithm, regardless of whether they
contain textual or binary data. That means that all files are
stored differentially (compressed) in the repository.
CVS users have to mark binary files with -kb flags to prevent
data from being garbled (due to keyword expansion and line-ending
translations). They sometimes forget to do this.
Subversion takes the more paranoid route. First, it never
performs any kind of keyword or line-ending translation unless
you explicitly ask it to do so (see the section called
“svn:keywords” and the section called “svn:eol-style” for more
details). By default,
Subversion treats all file data as literal byte strings, and
files are always stored in the repository in an untranslated
state.
Second, Subversion maintains an internal notion of whether a
file is “text” or “binary” data, but
this notion is only extant in the working copy. During an
svn update, Subversion will
perform contextual merges on locally modified text files, but
will not attempt to do so for binary files.
To determine whether a contextual merge is possible,
Subversion examines the svn:mime-type
property. If the file has no svn:mime-type
property, or has a MIME type that is textual (e.g. text/*),
Subversion assumes it is text. Otherwise, Subversion assumes
the file is binary. Subversion also helps users by running a
binary-detection algorithm in the svn import and svn add
commands. These commands will make a good guess and then
(possibly) set a binary svn:mime-type property on the file being
added. (If Subversion guesses wrong, the user can always remove
or hand-edit the property.)
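
The text-versus-binary rule described above can be sketched in a few
lines of Python. This is only an illustration of the rule as stated
here, not Subversion's actual implementation: the function name
is_treated_as_text is hypothetical, and the sketch assumes that
text/* is the only family of textual MIME types.

```python
from typing import Optional


def is_treated_as_text(mime_type: Optional[str]) -> bool:
    """Illustrative only: mirror the rule described in this section.

    mime_type is the value of the svn:mime-type property, or None if
    the file has no such property. The function name is hypothetical.
    """
    # No svn:mime-type property at all: the file is assumed to be text.
    if mime_type is None:
        return True
    # Assumption: only text/* counts as a textual MIME type here.
    return mime_type.strip().lower().startswith("text/")


# Print how a few sample property values would be classified.
for value in (None, "text/plain", "image/png", "application/octet-stream"):
    kind = "text" if is_treated_as_text(value) else "binary"
    print(f"{value!r}: {kind}")
```

Under this rule, a file is eligible for a contextual merge during
svn update only when the function returns True; any other value of
svn:mime-type marks the file as binary.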