The cpio archive formats, like tar, do have maximum
pathname lengths. The binary and old ASCII formats have a max path
length of 256, and the new ASCII and CRC ASCII formats have a max
path length of 1024. GNU cpio can read and write archives
with arbitrary pathname lengths, but other cpio implementations
may crash inexplicably when trying to read them.
tar handles symbolic links in the form in which it comes in BSD;
cpio doesn't handle symbolic links in the form in which it comes
in System V prior to SVR4, and some vendors may have added symlinks
to their system without enhancing cpio to know about them.
Others may have enhanced it in a way other than the way I did it
at Sun, and which was adopted by AT&T (and which is, I think, also
present in the cpio that Berkeley picked up from AT&T and put
into a later BSD release—I think I gave them my changes).
(SVR4 does some funny stuff with tar; basically, its cpio
can handle tar format input, and write it on output, and it
probably handles symbolic links. They may not have bothered doing
anything to enhance tar as a result.)
cpio handles special files; traditional tar doesn't.
tar comes with V7, System III, System V, and BSD source;
cpio comes only with System III, System V, and later BSD
(4.3-tahoe and later).
tar's way of handling multiple hard links to a file can handle
file systems that support 32-bit inumbers (e.g., the BSD file system);
cpio's way requires you to play some games (in its "binary"
format, i-numbers are only 16 bits, and in its "portable ASCII" format,
they're 18 bits—it would have to play games with the "file system ID"
field of the header to make sure that the file system ID/i-number pairs
of different files were always different), and I don't know which
cpio implementations, if any, play those games. Those that don't might get
confused and think two files are the same file when they're not, and
make hard links between them.
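The collision risk described above can be shown with a little arithmetic. The following is only a sketch: the field widths (16 and 18 bits) come from the text, while the assumption that an archiver would simply keep the low-order bits of a larger i-number, the helper name truncate_inode, and the sample i-numbers are all hypothetical.

```python
BINARY_INO_BITS = 16    # "binary" cpio format, per the text above
PORTABLE_INO_BITS = 18  # "portable ASCII" cpio format, per the text above


def truncate_inode(ino: int, bits: int) -> int:
    """Keep only the low `bits` bits, as a fixed-width header field would.

    Assumption: the archiver stores the low-order bits and silently drops
    the rest. Real implementations may behave differently.
    """
    return ino & ((1 << bits) - 1)


# Two distinct (made-up) i-numbers on a file system with 32-bit i-numbers.
ino_a = 70_000
ino_b = ino_a + (1 << PORTABLE_INO_BITS)

# Different files on disk, yet identical once squeezed into 18 bits.
same_in_archive = (
    truncate_inode(ino_a, PORTABLE_INO_BITS)
    == truncate_inode(ino_b, PORTABLE_INO_BITS)
)
print(ino_a != ino_b, same_in_archive)
```

An extractor that relies on the file system ID/i-number pair alone would see these two files as links to the same file, which is exactly the confusion the text warns about.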
tar's way of handling multiple hard links to a file places only
one copy of the link on the tape, but the name attached to that copy
is the only one you can use to retrieve the file; cpio's
way puts one copy for every link, but you can retrieve it using any
of the names.
What type of checksum (if any) is used, and how is this calculated?
See the attached manual pages for tar and cpio format.
tar uses a checksum which is the sum of all the bytes in the
tar header for a file; cpio uses no checksum.
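As an illustration of the tar checksum, here is a minimal sketch that recomputes it for a header produced by Python's standard tarfile module. One detail the summary above leaves out: while the sum is taken, the 8-byte checksum field itself (bytes 148 to 155 of the 512-byte header) is counted as if it held spaces. The helper names tar_header_checksum and stored_checksum are made up for this example.

```python
import tarfile


def tar_header_checksum(header: bytes) -> int:
    """Sum all 512 header bytes, counting the checksum field as spaces."""
    if len(header) != 512:
        raise ValueError("a tar header block is exactly 512 bytes")
    # Bytes 148..155 hold the checksum and are treated as blanks
    # while the sum is computed.
    blanked = header[:148] + b" " * 8 + header[156:]
    return sum(blanked)


def stored_checksum(header: bytes) -> int:
    """Parse the octal checksum that the archiver wrote into the header."""
    return int(header[148:156].strip(b"\0 ").decode("ascii"), 8)


# Build a header with the standard library so there is something to check.
info = tarfile.TarInfo(name="hello.txt")
info.size = 5
header = info.tobuf(format=tarfile.USTAR_FORMAT)

print(tar_header_checksum(header) == stored_checksum(header))
```

Note that the checksum covers only the header, not the file data that follows it.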
If anyone knows why cpio was made when tar was present
at the unix scene,
It wasn't. cpio first showed up in PWB/UNIX 1.0; no
generally-available version of UNIX had tar at the time. I don't
know whether any version that was generally available within AT&T
had tar, or, if so, whether the people within AT&T who did
cpio knew about it.
On restore, if there is corruption on a tape, tar will stop at
that point, while cpio will skip over it and try to restore the
rest of the files.
The main difference is just in the command syntax and header format.
tar is a little more tape-oriented in that everything is blocked
to start on a record boundary.
Are there any differences in the ability to recover crashed
archives between the two of them? (Is there any chance of recovering
crashed archives at all?)
Theoretically it should be easier under tar since the blocking
lets you find a header with some variation of ‘dd skip=nn’.
However, modern cpio implementations and their variants have an option to just
search for the next file header after an error, with a reasonable chance
of resyncing. Still, lots of tape driver software won't allow you to
continue past a media error, which should be the only reason for getting
out of sync unless a file changed size while you were writing the
archive.
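To make the resynchronization idea concrete for tar, here is a rough sketch that steps through an archive one 512-byte block at a time and stops at the first block whose stored checksum matches its contents. It is not how any particular tar or cpio implements recovery; the function names looks_like_header and next_header_offset are invented for the example, and a data block could in principle pass the test by accident.

```python
import io
import tarfile

BLOCK = 512


def looks_like_header(block: bytes) -> bool:
    """True if the block's stored checksum matches its computed checksum."""
    if len(block) != BLOCK:
        return False
    field = block[148:156].strip(b"\0 ")
    if not field:
        return False  # all-zero or blank field: not a header
    try:
        stored = int(field.decode("ascii"), 8)
    except ValueError:
        return False  # not an octal number, so not a header
    # The checksum field itself counts as eight spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
    return stored == computed


def next_header_offset(archive: bytes, start: int = 0) -> int:
    """Offset of the first plausible header at or after `start`, or -1."""
    offset = -(-start // BLOCK) * BLOCK  # round up to a block boundary
    while offset + BLOCK <= len(archive):
        if looks_like_header(archive[offset:offset + BLOCK]):
            return offset
        offset += BLOCK
    return -1


# Build a small two-member archive in memory, then damage the first header.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for name, data in (("first.txt", b"aaaa"), ("second.txt", b"bbbb")):
        member = tarfile.TarInfo(name)
        member.size = len(data)
        tar.addfile(member, io.BytesIO(data))

archive = bytearray(buf.getvalue())
archive[0:16] = b"\xff" * 16  # simulated media damage

print(next_header_offset(bytes(archive)))
```

The scan only works because every tar header starts on a block boundary; a cpio archive would have to be searched byte by byte for its header magic instead.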
If anyone knows why cpio was made when tar was present
at the unix scene, please tell me about this too.
Probably because it is more media efficient (by not blocking everything
and using only the space needed for the headers where tar
always uses 512 bytes per file header) and it knows how to archive
special files.
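The space argument can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes the portable ASCII cpio header has a 76-byte fixed part followed by the NUL-terminated name and unpadded data, and it ignores end-of-archive trailers and any record-level padding. Both function names are hypothetical.

```python
TAR_BLOCK = 512
CPIO_ODC_HEADER = 76  # assumed fixed part of the portable ASCII header


def tar_member_bytes(size: int) -> int:
    """One 512-byte header plus the data rounded up to whole blocks."""
    data_blocks = -(-size // TAR_BLOCK)  # ceiling division
    return TAR_BLOCK + data_blocks * TAR_BLOCK


def cpio_odc_member_bytes(name: str, size: int) -> int:
    """Fixed header, NUL-terminated name, then the data with no padding."""
    return CPIO_ODC_HEADER + len(name.encode()) + 1 + size


# A one-byte file named "a.txt" under each model.
print(tar_member_bytes(1), cpio_odc_member_bytes("a.txt", 1))
```

The gap matters most for archives made up of many small files; for large files the per-member overhead of either format is negligible.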
You might want to look at the freely available alternatives. The
major ones are afio, GNU tar, and
pax, each of which has its own extensions with some
backwards compatibility.
Sparse files were tarred as sparse files (which you can
easily test, because the resulting archive gets smaller, and
GNU cpio can no longer read it).
Published under the terms of the GNU General Public License