Binary Files and Forcing Text Comparisons
If diff thinks that either of the two files it is comparing is binary (a non-text file), it normally treats that pair of files much as if the summary output format had been selected (see Brief), and reports only that the binary files are different. This is because line-by-line comparisons are usually not meaningful for binary files.
diff determines whether a file is text or binary by checking the first few bytes in the file; the exact number of bytes is system-dependent, but it is typically several thousand. If every byte in that part of the file is non-null, diff considers the file to be text; otherwise it considers the file to be binary.
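The test described above can be modelled in a few lines. The following is a minimal Python sketch, not diff's actual code. The function name looks_binary and the SAMPLE_SIZE value of 4096 are hypothetical; the real number of bytes diff examines is system-dependent.

```python
# Sketch of the text/binary heuristic: a file counts as binary if a null
# byte appears within its first few bytes.
# SAMPLE_SIZE is a hypothetical value; diff's real limit is system-dependent.
SAMPLE_SIZE = 4096


def looks_binary(data: bytes, sample_size: int = SAMPLE_SIZE) -> bool:
    """Return True if a null byte occurs within the first sample_size bytes."""
    return b"\0" in data[:sample_size]


if __name__ == "__main__":
    print(looks_binary(b"hello\nworld\n"))     # False
    print(looks_binary(b"hello\0world\n"))     # True
    # A null byte past the sampled prefix is never seen, so this counts as text.
    print(looks_binary(b"a" * 5000 + b"\0"))   # False
```

The last case shows why the heuristic is only a guess. Data whose first null byte lies beyond the sampled prefix is classified as text.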
Sometimes you might want to force diff to consider files to be text. For example, you might be comparing text files that contain null characters; diff would erroneously decide that those are non-text files. Or you might be comparing documents in the format of a word processing system that uses null characters to indicate special formatting. You can force diff to consider all files to be text files, and compare them line by line, by using the -a or --text option. If the files you compare using this option do not in fact contain text, they will probably contain few newline characters, and the diff output will consist of hunks showing differences between long lines of whatever characters the files contain.
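Comparing data as text amounts to splitting it only at newline bytes and ignoring null bytes. The following Python sketch illustrates the idea with the standard library's difflib.diff_bytes and difflib.unified_diff. Both are Python's own diff routines, not diff's algorithm, so the hunks may differ from what diff -a would print. The helper names split_lines and text_diff are hypothetical.

```python
import difflib
import sys


def split_lines(data: bytes) -> list[bytes]:
    """Split data after each newline byte, keeping the newline.

    Null bytes and lone carriage returns are ordinary characters here.
    """
    parts = data.split(b"\n")
    lines = [part + b"\n" for part in parts[:-1]]
    if parts[-1]:                 # final line without a trailing newline
        lines.append(parts[-1])
    return lines


def text_diff(old: bytes, new: bytes) -> list[bytes]:
    """Unified diff of two byte strings, treating both as text."""
    return list(difflib.diff_bytes(difflib.unified_diff,
                                   split_lines(old), split_lines(new),
                                   fromfile=b"old", tofile=b"new"))


if __name__ == "__main__":
    # The first line contains a null byte, yet the comparison is line by line.
    for line in text_diff(b"header\0\nsame\n", b"header\0\nchanged\n"):
        sys.stdout.buffer.write(line)
```

If the inputs contain almost no newlines, each "line" is most of the file, which is the long-line effect described above.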
You can also force diff to consider all files to be binary files, and report only whether they differ (but not how). Use the -q or --brief option for this.
Differing binary files are considered to cause trouble because the resulting diff output does not capture all the differences. This trouble causes diff to exit with status 2. However, this trouble cannot occur with the -a or --text option, or with the -q or --brief option. With -a, diff compares the files line by line, so its output represents every difference. With -q, you have asked only whether the files differ, and the output answers exactly that.
In operating systems that distinguish between text and binary files, diff normally reads and writes all data as text. Use the --binary option to force diff to read and write binary data instead. This option has no effect on a POSIX-compliant system like GNU or traditional Unix. However, many personal computer operating systems represent the end of a line with a carriage return followed by a newline. On such systems, diff normally ignores these carriage returns on input and generates them at the end of each output line. With the --binary option, diff instead treats each carriage return as just another input character, and does not generate a carriage return at the end of each output line. This can be useful when dealing with non-text files that are meant to be interchanged with POSIX-compliant systems.
The --strip-trailing-cr option causes diff to treat input lines that end in a carriage return followed by a newline as if they ended in a plain newline. This can be useful when comparing text that was imperfectly imported from many personal computer operating systems. This option affects how lines are read, which in turn affects how they are compared and output.
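The effect of --strip-trailing-cr on input lines can be sketched as a normalization step applied before comparison. This Python model follows the description above and is not diff's implementation. The helper names split_lines and strip_trailing_cr are hypothetical. The sketch rewrites only lines ending in a carriage return followed by a newline, and leaves carriage returns elsewhere in a line untouched.

```python
def split_lines(data: bytes) -> list[bytes]:
    """Split data after each newline byte, keeping the newline."""
    parts = data.split(b"\n")
    lines = [part + b"\n" for part in parts[:-1]]
    if parts[-1]:                 # final line without a trailing newline
        lines.append(parts[-1])
    return lines


def strip_trailing_cr(lines: list[bytes]) -> list[bytes]:
    """Rewrite lines ending in CR LF so that they end in a plain LF."""
    return [line[:-2] + b"\n" if line.endswith(b"\r\n") else line
            for line in lines]


if __name__ == "__main__":
    dos = split_lines(b"one\r\ntwo\r\n")
    unix = split_lines(b"one\ntwo\n")
    print(dos == unix)                     # False: every line differs by a CR
    print(strip_trailing_cr(dos) == unix)  # True once the CRs are stripped
```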
If you want to compare two files byte by byte, you can use the cmp program with the -l option to show the values of each differing byte in the two files. With GNU cmp, you can also use the -b option to show the ASCII representation of those bytes. See Invoking cmp, for more information.
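The kind of listing that cmp -l produces can be approximated as follows. Each entry gives the 1-based position of a differing byte and the two byte values in octal. This Python sketch is only an illustration. The name cmp_l is hypothetical, the column layout of real cmp output may differ, and the sketch compares only the bytes the two inputs have in common, ignoring any difference in length.

```python
def cmp_l(a: bytes, b: bytes) -> list[tuple[int, str, str]]:
    """Return (position, octal byte in a, octal byte in b) for each differing
    byte in the common prefix of a and b; positions start at 1."""
    return [(pos, format(x, "o"), format(y, "o"))
            for pos, (x, y) in enumerate(zip(a, b), start=1)
            if x != y]


if __name__ == "__main__":
    for pos, old, new in cmp_l(b"abcd", b"abxd"):
        print(pos, old, new)   # prints: 3 143 170
```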
If diff3 thinks that any of the files it is comparing is binary (a non-text file), it normally reports an error, because such comparisons are usually not useful. diff3 uses the same test as diff to decide whether a file is binary. As with diff, if the input files contain a few non-text bytes but otherwise are like text files, you can force diff3 to consider all files to be text files and compare them line by line by using the -a or --text option.