The standard UNIX archiving utility.
Originally a
Tape ARchiving program, it has
developed into a general-purpose package that can handle
all manner of archiving with all types of destination
devices, ranging from tape drives to regular files to even
stdout (see Example 3-4). GNU
tar has been patched to accept various compression
filters, as in tar czvf archive_name.tar.gz
*, which recursively archives and gzips all files in a directory
tree, except dotfiles
in the current working directory ($PWD).
Some useful tar options:
-c create (a new archive)
-x extract (files from existing archive)
--delete delete (files from existing archive)
This option will not work on magnetic tape devices.
It may be difficult to recover data from a
corrupted gzipped tar
archive. When archiving important files, make multiple
backups.
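The three core operations above can be sketched end-to-end; the names project/, backup.tar.gz, and restore/ are hypothetical.

```shell
mkdir -p project
echo "data" > project/notes.txt
tar czvf backup.tar.gz project       # -c create, -z gzip filter, -v verbose, -f archive file
tar tzvf backup.tar.gz               # -t lists the contents without extracting
mkdir -p restore
tar xzvf backup.tar.gz -C restore    # -x extract, -C changes to the target directory first
```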
shar
Shell archiving utility. The files in a shell archive
are concatenated without compression, and the
resultant archive is essentially a shell script,
complete with #!/bin/sh header,
and containing all the necessary unarchiving
commands. Shar archives
still show up in Internet newsgroups, but otherwise
shar has been pretty well replaced by
tar/gzip. The
unshar command unpacks
shar archives.
ar
Creation and manipulation utility for archives, mainly
used for binary object file libraries.
rpm
The Red Hat Package Manager, or
rpm utility, provides a wrapper for
source or binary archives. It includes commands for
installing and checking the integrity of packages, among
other things.
A simple rpm -i package_name.rpm
usually suffices to install a package, though there are many
more options available.
rpm -qf identifies which package a
file originates from.
bash$ rpm -qf /bin/ls
coreutils-5.2.1-31
rpm -qa gives a
complete list of all installed rpm packages
on a given system. An rpm -qa package_name
lists only the package(s) corresponding to
package_name.
cpio
This specialized archiving copy command
(copy
input and output)
is rarely seen any more, having been supplanted by
tar/gzip. It still
has its uses, such as moving a directory tree.
Example 12-27. Using cpio to move a directory tree
#!/bin/bash
# Copying a directory tree using 'cpio.'
# Advantages of using 'cpio':
# Speed of copying. It's faster than 'tar' with pipes.
# Well suited for copying special files (named pipes, etc.)
#+ that 'cp' may choke on.
ARGS=2
E_BADARGS=65
if [ $# -ne "$ARGS" ]
then
echo "Usage: `basename $0` source destination"
exit $E_BADARGS
fi
source=$1
destination=$2
find "$source" -depth | cpio -admvp "$destination"
# ^^^^^ ^^^^^
# Read the 'find' and 'cpio' man page to decipher these options.
# Exercise:
# --------
# Add code to check the exit status ($?) of the 'find | cpio' pipe
#+ and output appropriate error messages if anything went wrong.
exit 0
rpm2cpio
This command extracts a
cpio archive from an rpm one.
Example 12-28. Unpacking an rpm archive
#!/bin/bash
# de-rpm.sh: Unpack an 'rpm' archive
: ${1?"Usage: `basename $0` target-file"}
# Must specify 'rpm' archive name as an argument.
TEMPFILE=$$.cpio # Tempfile with "unique" name.
# $$ is process ID of script.
rpm2cpio < $1 > $TEMPFILE # Converts rpm archive into cpio archive.
cpio --make-directories -F $TEMPFILE -i # Unpacks cpio archive.
rm -f $TEMPFILE # Deletes cpio archive.
exit 0
# Exercise:
# Add check for whether 1) "target-file" exists and
#+ 2) it is really an rpm archive.
# Hint: parse output of 'file' command.
Compression
gzip
The standard GNU/UNIX compression utility, replacing
the inferior and proprietary
compress. The corresponding decompression
command is gunzip, which is the equivalent of
gzip -d.
The zcat filter decompresses a
gzipped file to
stdout, as possible input to a pipe or
redirection. This is, in effect, a cat
command that works on compressed files (including files
processed with the older compress
utility). The zcat command is equivalent to
gzip -dc.
On some commercial UNIX systems, zcat
is a synonym for uncompress -c,
and will not work on gzipped
files.
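A minimal sketch of zcat as a compressed-file cat; the file name sample.txt is hypothetical.

```shell
echo "hello zcat" > sample.txt
gzip sample.txt                      # replaces sample.txt with sample.txt.gz
zcat sample.txt.gz                   # decompresses to stdout; the .gz file is untouched
zcat sample.txt.gz | grep -c hello   # usable as input to a pipe, just like cat
```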
bzip2
An alternate compression utility, usually more efficient
(but slower) than gzip, especially on
large files. The corresponding decompression command is
bunzip2.
Newer versions of tar have been patched with
bzip2 support.
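A short sketch, assuming bzip2 and a patched tar are installed; bigfile.txt is a hypothetical name.

```shell
echo "compress me" > bigfile.txt
bzip2 -k bigfile.txt                  # -k keeps the original next to bigfile.txt.bz2
bunzip2 -c bigfile.txt.bz2            # -c decompresses to stdout, leaving the .bz2 alone
tar cjvf archive.tar.bz2 bigfile.txt  # -j routes the tar stream through bzip2
```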
compress, uncompress
This is an older, proprietary compression
utility found in commercial UNIX distributions. The
more efficient gzip has largely
replaced it. Linux distributions generally include a
compress workalike for compatibility,
although gunzip can unarchive files
treated with compress.
The znew command transforms
compressed files into
gzipped ones.
sq
Yet another compression utility, a filter that works
only on sorted ASCII word lists. It uses the standard
invocation syntax for a filter, sq < input-file >
output-file. Fast, but not nearly as efficient
as gzip. The corresponding
uncompression filter is unsq, invoked
like sq.
The output of sq may be
piped to gzip for further
compression.
zip, unzip
Cross-platform file archiving and compression utility
compatible with DOS pkzip.exe.
"Zipped" archives seem to be a more
acceptable medium of exchange on the Internet than
"tarballs".
unarc, unarj, unrar
These Linux utilities permit unpacking archives
compressed with the DOS arc.exe,
arj.exe, and
rar.exe programs.
File Information
file
A utility for identifying file types. The command
file file-name will return a
file specification for file-name,
such as ascii text or
data. It references
the magic numbers
found in /usr/share/magic,
/etc/magic, or
/usr/lib/magic, depending on the
Linux/UNIX distribution.
The -f option causes
file to run in batch mode, to read from
a designated file a list of filenames to analyze. The
-z option, when used on a compressed
target file, forces an attempt to analyze the uncompressed
file type.
bash$ file test.tar.gz
test.tar.gz: gzip compressed data, deflated, last modified: Sun Sep 16 13:34:51 2001, os: Unix

bash$ file -z test.tar.gz
test.tar.gz: GNU tar archive (gzip compressed data, deflated, last modified: Sun Sep 16 13:34:51 2001, os: Unix)
# Find sh and Bash scripts in a given directory:
DIRECTORY=/usr/local/bin
KEYWORD=Bourne
# Bourne and Bourne-Again shell scripts
file $DIRECTORY/* | fgrep $KEYWORD
# Output:
# /usr/local/bin/burn-cd: Bourne-Again shell script text executable
# /usr/local/bin/burnit: Bourne-Again shell script text executable
# /usr/local/bin/cassette.sh: Bourne shell script text executable
# /usr/local/bin/copy-cd: Bourne-Again shell script text executable
# . . .
Example 12-29. Stripping comments from C program files
#!/bin/bash
# strip-comment.sh: Strips out the comments (/* COMMENT */) in a C program.
E_NOARGS=0
E_ARGERROR=66
E_WRONG_FILE_TYPE=67
if [ $# -eq "$E_NOARGS" ]
then
echo "Usage: `basename $0` C-program-file" >&2 # Error message to stderr.
exit $E_ARGERROR
fi
# Test for correct file type.
type=`file $1 | awk '{ print $2, $3, $4, $5 }'`
# "file $1" echoes file type . . .
# Then awk removes the first field of this, the filename . . .
# Then the result is fed into the variable "type".
correct_type="ASCII C program text"
if [ "$type" != "$correct_type" ]
then
echo
echo "This script works on C program files only."
echo
exit $E_WRONG_FILE_TYPE
fi
# Rather cryptic sed script:
#--------
sed '
/^\/\*/d
/.*\*\//d
' $1
#--------
# Easy to understand if you take several hours to learn sed fundamentals.
# Need to add one more line to the sed script to deal with
#+ case where line of code has a comment following it on same line.
# This is left as a non-trivial exercise.
# Also, the above code deletes non-comment lines with a "*/" --
#+ not a desirable result.
exit 0
# ----------------------------------------------------------------
# Code below this line will not execute because of 'exit 0' above.
# Stephane Chazelas suggests the following alternative:
usage() {
echo "Usage: `basename $0` C-program-file" >&2
exit 1
}
WEIRD=`echo -n -e '\377'` # or WEIRD=$'\377'
[[ $# -eq 1 ]] || usage
case `file "$1"` in
*"C program text"*) sed -e "s%/\*%${WEIRD}%g;s%\*/%${WEIRD}%g" "$1" \
| tr '\377\n' '\n\377' \
| sed -ne 'p;n' \
| tr -d '\n' | tr '\377' '\n';;
*) usage;;
esac
# This is still fooled by things like:
# printf("/*");
# or
# /* /* buggy embedded comment */
#
# To handle all special cases (comments in strings, comments in string
#+ where there is a \", \\" ...) the only way is to write a C parser
#+ (using lex or yacc perhaps?).
exit 0
which
which command-xxx gives the full path
to "command-xxx". This is useful for finding
out whether a particular command or utility is installed
on the system.
bash$ which rm
/usr/bin/rm
whereis
Similar to which, above,
whereis command-xxx gives the
full path to "command-xxx", but also to its
manpage.
bash$ whereis rm
rm: /bin/rm /usr/share/man/man1/rm.1.bz2
whatis
whatis filexxx looks up
"filexxx" in the
whatis database. This is useful
for identifying system commands and important configuration
files. Consider it a simplified man
command.
bash$ whatis whatis
whatis (1) - search the whatis database for complete words
Example 12-30. Exploring /usr/X11R6/bin
#!/bin/bash
# What are all those mysterious binaries in /usr/X11R6/bin?
DIRECTORY="/usr/X11R6/bin"
# Try also "/bin", "/usr/bin", "/usr/local/bin", etc.
for file in $DIRECTORY/*
do
whatis `basename $file` # Echoes info about the binary.
done
exit 0
# You may wish to redirect output of this script, like so:
# ./what.sh >>whatis.db
# or view it a page at a time on stdout,
# ./what.sh | less
locate, slocate
The locate command searches for files using a
database stored for just that purpose. The
slocate command is the secure version of
locate (which may be aliased to
slocate).
bash$ locate hickson
/usr/lib/xephem/catalogs/hickson.edb
readlink
Disclose the file that a symbolic link points to.
bash$ readlink /usr/bin/awk
../../bin/gawk
strings
Use the strings command to find
printable strings in a binary or data file. It will list
sequences of printable characters found in the target
file. This might be handy for a quick 'n dirty examination
of a core dump or for looking at an unknown graphic image
file (strings image-file | more might
show something like JFIF,
which would identify the file as a jpeg
graphic). In a script, you would probably
parse the output of strings
with grep or sed. See Example 10-7
and Example 10-9.
Example 12-31. An "improved" strings command
#!/bin/bash
# wstrings.sh: "word-strings" (enhanced "strings" command)
#
# This script filters the output of "strings" by checking it
#+ against a standard word list file.
# This effectively eliminates gibberish and noise,
#+ and outputs only recognized words.
# ===========================================================
# Standard Check for Script Argument(s)
ARGS=1
E_BADARGS=65
E_NOFILE=66
if [ $# -ne $ARGS ]
then
echo "Usage: `basename $0` filename"
exit $E_BADARGS
fi
if [ ! -f "$1" ] # Check if file exists.
then
echo "File \"$1\" does not exist."
exit $E_NOFILE
fi
# ===========================================================
MINSTRLEN=3 # Minimum string length.
WORDFILE=/usr/share/dict/linux.words # Dictionary file.
# May specify a different
#+ word list file
#+ of one-word-per-line format.
wlist=`strings "$1" | tr A-Z a-z | tr '[:space:]' Z | \
tr -cs '[:alpha:]' Z | tr -s '\173-\377' Z | tr Z ' '`
# Translate output of 'strings' command with multiple passes of 'tr'.
# "tr A-Z a-z" converts to lowercase.
# "tr '[:space:]' Z" converts whitespace characters to Z's.
# "tr -cs '[:alpha:]' Z" converts non-alphabetic characters to Z's,
#+ and squeezes multiple consecutive Z's.
# "tr -s '\173-\377' Z" converts all characters past 'z' to Z's
#+ and squeezes multiple consecutive Z's,
#+ which gets rid of all the weird characters that the previous
#+ translation failed to deal with.
# Finally, "tr Z ' '" converts all those Z's to whitespace,
#+ which will be seen as word separators in the loop below.
# ****************************************************************
# Note the technique of feeding the output of 'tr' back to itself,
#+ but with different arguments and/or options on each pass.
# ****************************************************************
for word in $wlist # Important:
# $wlist must not be quoted here.
# "$wlist" does not work.
# Why not?
do
strlen=${#word} # String length.
if [ "$strlen" -lt "$MINSTRLEN" ] # Skip over short strings.
then
continue
fi
grep -Fw $word "$WORDFILE" # Match whole words only.
# ^^^ # "Fixed strings" and
#+ "whole words" options.
done
exit $?
Comparison
diff, patch
diff: flexible file comparison
utility. It compares the target files line-by-line
sequentially. In some applications, such as comparing
word dictionaries, it may be helpful to filter the
files through sort
and uniq before piping them
to diff. diff file-1
file-2 outputs the lines in the files that
differ, with carets showing which file each particular
line belongs to.
The --side-by-side option to
diff outputs each compared file, line by
line, in separate columns, with non-matching lines marked. The
-c and -u options likewise
make the output of the command easier to interpret.
Various fancy frontends for
diff are available, such as sdiff,
wdiff, xdiff, and
mgdiff.
The diff command returns an exit
status of 0 if the compared files are identical, and 1 if
they differ. This permits use of diff
in a test construct within a shell script (see
below).
A common use for diff is generating
difference files to be used with patch.
The -e option outputs files suitable
for ed or ex
scripts.
patch: flexible versioning
utility. Given a difference file generated by
diff, patch can
upgrade a previous version of a package to a newer version.
It is much more convenient to distribute a relatively
small "diff" file than the entire body of a
newly revised package. Kernel "patches" have
become the preferred method of distributing the frequent
releases of the Linux kernel.
patch -p1 <patch-file
# Takes all the changes listed in 'patch-file'
# and applies them to the files referenced therein.
# This upgrades to a newer version of the package.
Patching the kernel:
cd /usr/src
gzip -cd patchXX.gz | patch -p0
# Upgrading kernel source using 'patch'.
# From the Linux kernel docs "README",
# by anonymous author (Alan Cox?).
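The diff/patch round trip described above can be sketched end-to-end; the file names old.txt, new.txt, and changes.diff are hypothetical.

```shell
printf 'alpha\nbeta\n' > old.txt
printf 'alpha\nBETA\n' > new.txt
diff -u old.txt new.txt > changes.diff || true   # -u unified format; diff exits 1 when files differ
patch old.txt < changes.diff                     # old.txt is now identical to new.txt
cmp -s old.txt new.txt && echo "Upgrade complete."
```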
The diff command can also
recursively compare directories (for the filenames
present).
bash$ diff -r ~/notes1 ~/notes2
Only in /home/bozo/notes1: file02
Only in /home/bozo/notes1: file03
Only in /home/bozo/notes2: file04
Use zdiff to compare
gzipped files.
diff3
An extended version of diff that compares
three files at a time. This command returns an exit value
of 0 upon successful execution, but unfortunately this gives
no information about the results of the comparison.
bash$ diff3 file-1 file-2 file-3
====
1:1c
This is line 1 of "file-1".
2:1c
This is line 1 of "file-2".
3:1c
This is line 1 of "file-3"
sdiff
Compare and/or edit two files in order to merge
them into an output file. Because of its interactive nature,
this command would find little use in a script.
cmp
The cmp command is a simpler version of
diff, above. Whereas diff
reports the differences between two files,
cmp merely shows at what point they
differ.
Like diff, cmp
returns an exit status of 0 if the compared files are
identical, and 1 if they differ. This permits use in a test
construct within a shell script.
Example 12-32. Using cmp to compare two files
within a script.
#!/bin/bash
ARGS=2 # Two args to script expected.
E_BADARGS=65
E_UNREADABLE=66
if [ $# -ne "$ARGS" ]
then
echo "Usage: `basename $0` file1 file2"
exit $E_BADARGS
fi
if [[ ! -r "$1" || ! -r "$2" ]]
then
echo "Both files to be compared must exist and be readable."
exit $E_UNREADABLE
fi
cmp $1 $2 &> /dev/null # /dev/null buries the output of the "cmp" command.
# cmp -s $1 $2 has same result ("-s" silent flag to "cmp")
# Thank you Anders Gustavsson for pointing this out.
#
# Also works with 'diff', i.e., diff $1 $2 &> /dev/null
if [ $? -eq 0 ] # Test exit status of "cmp" command.
then
echo "File \"$1\" is identical to file \"$2\"."
else
echo "File \"$1\" differs from file \"$2\"."
fi
exit 0
Use zcmp on
gzipped files.
comm
Versatile file comparison utility. The files must be
sorted for this to be useful.
comm -options first-file second-file
comm file-1 file-2 outputs three columns:
column 1 = lines unique to file-1
column 2 = lines unique to file-2
column 3 = lines common to both.
The options allow suppressing output of one or more columns.
-1 suppresses column
1
-2 suppresses column
2
-3 suppresses column
3
-12 suppresses both columns
1 and 2, etc.
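The column behavior above can be sketched with two small sorted lists (list1 and list2 are hypothetical names).

```shell
printf '%s\n' apple banana cherry > list1
printf '%s\n' banana cherry date  > list2
comm list1 list2        # three columns: unique to list1, unique to list2, common
comm -12 list1 list2    # suppress columns 1 and 2: only the common lines remain
```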
Utilities
basename
Strips the path information from a file name, printing
only the file name. The construction basename
$0 lets the script know its name, that is, the name it
was invoked by. This can be used for "usage" messages if,
for example, a script is called with missing arguments:
echo "Usage: `basename $0` arg1 arg2 ... argn"
dirname
Strips the basename from
a filename, printing only the path information.
basename and dirname
can operate on any arbitrary string. The argument
does not need to refer to an existing file, or
even be a filename for that matter (see Example A-7).
Example 12-33. basename and dirname
#!/bin/bash
a=/home/bozo/daily-journal.txt
echo "Basename of /home/bozo/daily-journal.txt = `basename $a`"
echo "Dirname of /home/bozo/daily-journal.txt = `dirname $a`"
echo
echo "My own home is `basename ~/`." # `basename ~` also works.
echo "The home of my home is `dirname ~/`." # `dirname ~` also works.
exit 0
split, csplit
These are utilities for splitting a file into smaller
chunks. They are usually used for splitting up large files
in order to back them up on floppies or preparatory to
e-mailing or uploading them.
The csplit command splits a file
according to context, the split occurring
where patterns are matched.
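A minimal sketch of splitting and rejoining a file; numbers.txt and the chunk. prefix are hypothetical names.

```shell
seq 1 100 > numbers.txt
split -l 30 numbers.txt chunk.    # 30 lines per piece: chunk.aa, chunk.ab, ...
cat chunk.* > rejoined.txt        # concatenating the pieces restores the original
cmp numbers.txt rejoined.txt && echo "Files match."
```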
sum, cksum, md5sum, sha1sum
These are utilities for generating checksums. A
checksum is a number mathematically
calculated from the contents of a file, for the purpose
of checking its integrity. A script might refer to a list
of checksums for security purposes, such as ensuring
that the contents of key system files have not been
altered or corrupted. For security applications, use the
md5sum (message
digest 5
checksum) command, or better yet,
the newer sha1sum (Secure Hash
Algorithm).
The cksum command shows the size,
in bytes, of its target, whether file or
stdout.
The md5sum and
sha1sum commands display a
dash when they receive their input from
stdin.
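A short sketch of recording and later verifying a checksum; target.txt is a hypothetical name.

```shell
echo "some data" > target.txt
md5sum target.txt > target.md5   # record checksum and filename
md5sum -c target.md5             # verify later; prints "target.txt: OK"
echo "some data" | md5sum        # reading stdin: the filename field is a dash
```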
Example 12-34. Checking file integrity
#!/bin/bash
# file-integrity.sh: Checking whether files in a given directory
# have been tampered with.
E_DIR_NOMATCH=70
E_BAD_DBFILE=71
dbfile=File_record.md5
# Filename for storing records (database file).
set_up_database ()
{
echo "$directory" > "$dbfile"
# Write directory name to first line of file.
md5sum "$directory"/* >> "$dbfile"
# Append md5 checksums and filenames.
}
check_database ()
{
local n=0
local filename
local checksum
# ------------------------------------------- #
# This file check should be unnecessary,
#+ but better safe than sorry.
if [ ! -r "$dbfile" ]
then
echo "Unable to read checksum database file!"
exit $E_BAD_DBFILE
fi
# ------------------------------------------- #
while read record[n]
do
directory_checked="${record[0]}"
if [ "$directory_checked" != "$directory" ]
then
echo "Directories do not match up!"
# Tried to use file for a different directory.
exit $E_DIR_NOMATCH
fi
if [ "$n" -gt 0 ] # Not directory name.
then
filename[n]=$( echo ${record[$n]} | awk '{ print $2 }' )
# md5sum writes records backwards,
#+ checksum first, then filename.
checksum[n]=$( md5sum "${filename[n]}" )
if [ "${record[n]}" = "${checksum[n]}" ]
then
echo "${filename[n]} unchanged."
elif [ "`basename ${filename[n]}`" != "$dbfile" ]
# Skip over checksum database file,
#+ as it will change with each invocation of script.
# ---
# This unfortunately means that when running
#+ this script on $PWD, tampering with the
#+ checksum database file will not be detected.
# Exercise: Fix this.
then
echo "${filename[n]} : CHECKSUM ERROR!"
# File has been changed since last checked.
fi
fi
let "n+=1"
done <"$dbfile" # Read from checksum database file.
}
# =================================================== #
# main ()
if [ -z "$1" ]
then
directory="$PWD" # If not specified,
else #+ use current working directory.
directory="$1"
fi
clear # Clear screen.
echo " Running file integrity check on $directory"
echo
# ------------------------------------------------------------------ #
if [ ! -r "$dbfile" ] # Need to create database file?
then
echo "Setting up database file, \""$directory"/"$dbfile"\"."; echo
set_up_database
fi
# ------------------------------------------------------------------ #
check_database # Do the actual work.
echo
# You may wish to redirect the stdout of this script to a file,
#+ especially if the directory checked has many files in it.
exit 0
# For a much more thorough file integrity check,
#+ consider the "Tripwire" package,
#+ https://sourceforge.net/projects/tripwire/.
shred
Securely erase a file by overwriting it multiple times with
random bit patterns before deleting it. This command has
the same effect as Example 12-55, but does it
in a more thorough and elegant manner.
This is one of the GNU fileutils.
Advanced forensic technology may still be able to
recover the contents of a file, even after application of
shred.
Encoding and Encryption
uuencode
This utility encodes binary files into ASCII characters, making them
suitable for transmission in the body of an e-mail message or in a
newsgroup posting.
uudecode
This reverses the encoding, decoding uuencoded files back into the
original binaries.
Example 12-35. Uudecoding encoded files
#!/bin/bash
# Uudecodes all uuencoded files in current working directory.
lines=35 # Allow 35 lines for the header (very generous).
for File in * # Test all the files in $PWD.
do
search1=`head -$lines $File | grep begin | wc -w`
search2=`tail -$lines $File | grep end | wc -w`
# Uuencoded files have a "begin" near the beginning,
#+ and an "end" near the end.
if [ "$search1" -gt 0 ]
then
if [ "$search2" -gt 0 ]
then
echo "uudecoding - $File -"
uudecode $File
fi
fi
done
# Note that running this script upon itself fools it
#+ into thinking it is a uuencoded file,
#+ because it contains both "begin" and "end".
# Exercise:
# --------
# Modify this script to check each file for a newsgroup header,
#+ and skip to next if not found.
exit 0
The fold -s command
may be useful (possibly in a pipe) to process long uudecoded
text messages downloaded from Usenet newsgroups.
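The wrapping behavior can be sketched briefly; long.txt is a hypothetical name.

```shell
printf 'one very long line of text that would certainly overflow a narrow terminal if it were not wrapped somewhere sensible\n' > long.txt
fold -s -w 40 long.txt    # -w sets the width; -s breaks at spaces, not mid-word
```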
mimencode, mmencode
The mimencode and
mmencode commands process
multimedia-encoded e-mail attachments. Although
mail user agents (such as
pine or kmail)
normally handle this automatically, these particular
utilities permit manipulating such attachments manually
from the command line or in a batch by means of a shell
script.
crypt
At one time, this was the standard UNIX file encryption
utility.
[3]
Politically motivated government regulations
prohibiting the export of encryption software resulted
in the disappearance of crypt
from much of the UNIX world, and it is still
missing from most Linux distributions. Fortunately,
programmers have come up with a number of decent
alternatives to it, among them the author's very own cruft
(see Example A-4).
Miscellaneous
mktemp
Create a temporary file
with a "unique" filename. When invoked
from the command line without additional arguments,
it creates a zero-length file in the /tmp directory.
bash$ mktemp
/tmp/tmp.zzsvql3154
PREFIX=filename
tempfile=`mktemp $PREFIX.XXXXXX`
# ^^^^^^ Need at least 6 placeholders
#+ in the filename template.
# If no filename template supplied,
#+ "tmp.XXXXXXXXXX" is the default.
echo "tempfile name = $tempfile"
# tempfile name = filename.QA2ZpY
# or something similar...
# Creates a file of that name in the current working directory
#+ with 600 file permissions.
# A "umask 177" is therefore unnecessary,
# but it's good programming practice anyhow.
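One common pattern, sketched with a hypothetical "demo" template, is pairing mktemp with a trap so the tempfile is removed even if the script aborts.

```shell
tempfile=$(mktemp /tmp/demo.XXXXXX)   # the "demo" prefix is hypothetical
trap 'rm -f "$tempfile"' EXIT         # remove the tempfile on any exit, normal or not
echo "scratch data" > "$tempfile"
ls -l "$tempfile"
```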
make
Utility for building and compiling binary packages.
This can also be used for any set of operations that is
triggered by incremental changes in source files.
The make command checks a
Makefile, a list of file dependencies and
operations to be carried out.
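The dependency check can be sketched with a one-rule Makefile (all names hypothetical); note that make requires the recipe line to begin with a literal tab, which printf's \t guarantees here.

```shell
printf 'output.txt: input.txt\n\ttr a-z A-Z < input.txt > output.txt\n' > Makefile
echo "hello make" > input.txt
make     # first run: the rule fires, building output.txt
make     # second run: the target is up to date, so nothing happens
```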
install
Special purpose file copying command, similar to
cp, but capable of setting permissions
and attributes of the copied files. This command seems
tailor-made for installing software packages, and as such it
shows up frequently in Makefiles
(in the make install :
section). It could likewise find use in installation
scripts.
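A minimal sketch of copying and setting permissions in one step; myscript and destdir/ are hypothetical names.

```shell
printf '#!/bin/sh\necho hello\n' > myscript
mkdir -p destdir
install -m 755 myscript destdir/    # copy and chmod 755 in a single command
destdir/myscript
```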
dos2unix
This utility, written by Benjamin Lin and collaborators,
converts DOS-formatted text files (lines terminated by
CR-LF) to UNIX format (lines terminated by LF only),
and vice-versa.
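When dos2unix is not installed, the same conversion can be sketched with tr; dosfile.txt and unixfile.txt are hypothetical names.

```shell
printf 'line one\r\nline two\r\n' > dosfile.txt
tr -d '\r' < dosfile.txt > unixfile.txt   # strip the carriage returns by hand
# dos2unix dosfile.txt                    # or convert in place, if installed
```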
ptx
The ptx [targetfile] command
outputs a permuted index (cross-reference list) of the
targetfile. This may be further filtered and formatted in a
pipe, if necessary.
more, less
Pagers that display a text file or stream to
stdout, one screenful at a time.
These may be used to filter the output of
stdout . . . or of a script.
An interesting application of more
is to "test drive" a command sequence,
to forestall potentially unpleasant consequences.
ls /home/bozo | awk '{print "rm -rf " $1}' | more
# ^^^^
# Testing the effect of the following (disastrous) command line:
# ls /home/bozo | awk '{print "rm -rf " $1}' | sh
# Hand off to the shell to execute . . . ^^
[3] This is a symmetric block cipher, used to
encrypt files on a single system or local network, as opposed
to the "public key" cipher class, of which
pgp is a well-known
example.