This option causes every file put in the archive to be tested for
sparseness and, if it is sparse, handled specially. The --sparse
(-S) option is useful when many dbm files, for example, are being
backed up. Using this option dramatically decreases the amount of
space needed to store such a file.
In later versions, this option may be removed, and the testing and
treatment of sparse files may be done automatically with any special
GNU options. For now, the option must be specified on the command
line when creating or updating an archive.
Files in the file system occasionally have “holes.” A hole in a file
is a section of the file's contents which was never written. The
contents of a hole read as all zeros. On many operating systems,
actual disk storage is not allocated for holes, but they are counted
in the length of the file. If you archive such a file, tar
could create an archive that takes more space than the original occupies on disk. To have tar
attempt to recognize the holes in a file, use --sparse (-S). When
you use this option, then, for any file using less disk space than
would be expected from its length, tar searches the file for
consecutive stretches of zeros. It then records in the archive for
the file where the consecutive stretches of zeros are, and only
archives the “real contents” of the file. On extraction, tar
recreates holes in any such file wherever the consecutive stretches
of zeros were found; --sparse need not be given on extraction.
Thus, if you use --sparse, tar archives
won't take more space than the original.
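The scan described above can be illustrated with a short Python sketch. It is not GNU tar's code: the function names, the 512-byte scan granularity, and the in-memory representation are assumptions made for illustration only. It records where the non-zero stretches are, keeps only those bytes, and rebuilds the full contents from the map.

```python
BLOCK_SIZE = 512  # assumed scan granularity, chosen for illustration


def find_data_regions(data, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering every block that is not all zeros."""
    regions = []
    start = None  # offset where the current run of non-zero blocks began
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if any(block):  # at least one non-zero byte: real contents
            if start is None:
                start = offset
        elif start is not None:  # an all-zero block ends the current run
            regions.append((start, offset - start))
            start = None
    if start is not None:  # the data ran up to the end of the input
        regions.append((start, len(data) - start))
    return regions


def rebuild(regions, chunks, total_length):
    """Recreate the original bytes from the map, the stored chunks, and the length."""
    out = bytearray(total_length)  # starts as all zeros, like a file that is one big hole
    for (offset, length), chunk in zip(regions, chunks):
        out[offset:offset + length] = chunk
    return bytes(out)


# Hypothetical file contents: 512 bytes of data, 4096 zero bytes, 100 bytes of data.
original = b"A" * 512 + bytes(4096) + b"B" * 100
regions = find_data_regions(original)                 # [(0, 512), (4608, 100)]
chunks = [original[o:o + n] for o, n in regions]      # only the "real contents"
restored = rebuild(regions, chunks, len(original))
```

In this example only 612 of the 4708 bytes would need to be stored, together with the two-entry map and the total length.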
A file is sparse if it contains blocks of zeros whose existence is
recorded, but that have no space allocated on disk. When you specify
the --sparse option in conjunction with the --create
(-c) operation, tar tests all files for sparseness
while archiving. If tar finds a file to be sparse, it uses a
sparse representation of the file in the archive. See create, for
more information about creating archives.
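As a rough illustration of what testing for sparseness can mean, the following Python sketch compares a file's length with the storage allocated to it. This is a sketch under stated assumptions, not tar's implementation: the helper names are hypothetical, st_blocks is assumed to count 512-byte units (as it does on Linux and most Unix systems), and platforms that do not report st_blocks are simply treated as having no sparse files.

```python
import os

ST_BLOCKS_UNIT = 512  # assumed unit of st_blocks; true on Linux and most Unix systems


def looks_sparse(size_bytes, blocks, unit=ST_BLOCKS_UNIT):
    """True if the allocated storage is smaller than the file's length."""
    return blocks * unit < size_bytes


def file_looks_sparse(path):
    """Apply the check to a real file; returns False where st_blocks is unavailable."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # not provided on every platform
    if blocks is None:
        return False
    return looks_sparse(st.st_size, blocks)


# Hypothetical numbers: a 400 MiB file with only 3 MiB (6144 units) allocated.
print(looks_sparse(400 * 2**20, 6144))  # True
# A 4096-byte file with all 8 units allocated is not sparse.
print(looks_sparse(4096, 8))            # False
```

A check of this kind only says that holes probably exist. It does not say where they are, which is why the contents still have to be examined.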
--sparse is useful when archiving files, such as dbm files,
likely to contain many nulls. This option dramatically
decreases the amount of space needed to store such an archive.
Please Note: Always use --sparse when performing file
system backups, to avoid archiving the expanded forms of files stored
sparsely in the system.
Even if your system has no sparse files currently, some may be
created in the future. If you use --sparse while making file
system backups as a matter of course, you can be assured the archive
will never take more space on the media than the files take on disk
(otherwise, archiving a disk filled with sparse files might take
hundreds of tapes).
tar ignores the --sparse option when reading an archive.
--sparse
-S
Files stored sparsely in the file system are represented sparsely in
the archive. Use in conjunction with write operations.
However, users should be well aware that at archive creation time,
GNU tar still has to read the whole disk file to
locate the holes, and so, even if sparse files use little space
on disk and in the archive, they may sometimes require an inordinate
amount of time for reading and examining the all-zero blocks of a file.
Although it works, it's painfully slow for a large (sparse) file, even
though the resulting tar archive may be small. (One user reports that
dumping a core file of over 400 megabytes, but with only about
3 megabytes of actual data, took about 9 minutes on a Sun Sparcstation
ELC, with full CPU utilization.)
This reading is required in all cases, whether or not the --sparse
option is used, so merely omitting the option does not save
any time [1].
Programs like dump do not have to read the entire file; by
examining the file system directly, they can determine in advance
exactly where the holes are and thus avoid reading through them. The
only data they need to read are the actual allocated data blocks.
GNU tar uses a more portable and straightforward
archiving approach; it would be fairly difficult for it to do
otherwise. Elizabeth Zwicky writes to comp.unix.internals, on
1990-12-10:
What I did say is that you cannot tell the difference between a hole and an
equivalent number of nulls without reading raw blocks. st_blocks at
best tells you how many holes there are; it doesn't tell you where.
Just as programs may, conceivably, care what st_blocks is (care
to name one that does?), they may also care where the holes are (I have
no examples of this one either, but it's equally imaginable).
I conclude from this that good archivers are not portable. One can
arguably conclude that if you want a portable program, you can in good
conscience restore files with as many holes as possible, since you can't
get it right.
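The point made in the quoted message, that a hole and an equivalent run of written nulls look the same to a program reading the file, can be checked with a small Python sketch. The helper names, file names, and the 1 MiB gap are arbitrary choices for the demonstration. Whether the skipped range actually becomes a hole depends on the file system, but the bytes read back are the same either way.

```python
import os
import tempfile


def write_with_hole(path, gap):
    """Write one byte, skip `gap` bytes without writing them, write one more byte."""
    with open(path, "wb") as f:
        f.write(b"x")
        f.seek(gap, os.SEEK_CUR)  # skipped range is never written: a hole where supported
        f.write(b"y")


def write_with_zeros(path, gap):
    """Same layout, but the gap is filled with explicitly written zero bytes."""
    with open(path, "wb") as f:
        f.write(b"x" + bytes(gap) + b"y")


def read_all(path):
    """Return the whole contents of the file as bytes."""
    with open(path, "rb") as f:
        return f.read()


GAP = 1 << 20  # 1 MiB, an arbitrary size chosen for the demonstration

with tempfile.TemporaryDirectory() as tmp:
    holey = os.path.join(tmp, "holey")
    zeroed = os.path.join(tmp, "zeroed")
    write_with_hole(holey, GAP)
    write_with_zeros(zeroed, GAP)
    # Reading cannot distinguish the two files: same bytes, same length.
    same_bytes = read_all(holey) == read_all(zeroed)
    same_length = os.path.getsize(holey) == os.path.getsize(zeroed) == GAP + 2
```

Only the allocation reported by the file system (st_blocks, where available) may differ between the two files, and that figure gives no positions.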
Footnotes
[1] Well! We should tell
the whole truth here. When --sparse is selected while creating
an archive, the current tar algorithm requires sparse files to be
read twice, not once. We hope to develop a new archive format for saving
sparse files in which one pass will be sufficient.
Published under the terms of the GNU General Public License