The zlib, gzip and
bz2 modules provide essential data and file
compression tools. Data files are often built for speedy processing, and
may contain characters which are meaningless spacing. This extraneous
data can be reduced in size, or compressed.
For example, a .tar file is often compressed
using GZip to create a .tar.gz file, sometimes
called a .tgz file. In the case of the ZIP file
archive, the compression algorithms are already part of the
zipfile module.
These modules are very flexible and can be used in a variety of
ways by an application program. The interfaces to zlib and bz2 are very
similar. The interface to gzip is greatly simplified.
Data Compression and Decompression. The zlib and bz2
modules provide essential data compression and decompression
algorithms. Each module provides a compress function, which
compresses a string into a sequence of bytes, and a
decompress function, which restores the original string from
those bytes. For relatively short strings, there may be no
reduction in size.
Note that the interfaces for the zlib and
bz2 modules are designed to be nearly identical.
The results, however, are not compatible at all; these are different
algorithms for data compression.
zlib.compress(string, 〈level〉)
→ string
Compress the given string. The
level provides a speed vs. size
tradeoff: a value of 1 is very fast, but may not compress very
well; a value of 9 will be slower but provide the best
compression. The default value for
level is 6.
zlib.decompress(string)
→ string
Decompress the given
string.
bz2.compress(string, 〈level〉)
→ string
Compress the given string. The
level provides a speed vs. size
tradeoff: a value of 1 is very fast, but may not compress very
well; a value of 9 will be slower but provide the best
compression. The default value for
level is 9.
bz2.decompress(string)
→ string
Decompress the given
string.
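The round trip through both modules can be sketched as follows. This is a minimal illustration that assumes Python 3, where the data must be a bytes object rather than a text string; the sample data and the variable names are made up for the example.

```python
import bz2
import zlib

# Repetitive sample data; Python 3 requires bytes, not str.
text = b"spam and eggs, " * 200

z_data = zlib.compress(text, 9)
b_data = bz2.compress(text, 9)

# Both round trips restore the original bytes exactly.
assert zlib.decompress(z_data) == text
assert bz2.decompress(b_data) == text
print(len(text), len(z_data), len(b_data))

# A very short input can come out larger than it went in.
short = b"hi"
print(len(short), len(zlib.compress(short)), len(bz2.compress(short)))

# The two formats are not interchangeable.
try:
    zlib.decompress(b_data)
    incompatible = False
except zlib.error as exc:
    incompatible = True
    print("zlib cannot read bz2 data:", exc)
```

The exact sizes printed depend on the library versions, so none are shown here.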
In addition to simple compression and decompression,
zlib provides some additional features for
computing various kinds of checksums of the data.
zlib.adler32(string)
→ number
Compute the 32-bit Adler checksum of the given
string.
zlib.crc32(string)
→ number
Compute the 32-bit CRC (Cyclic Redundancy Check) checksum
of the given string.
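As a small sketch of the checksum functions, assuming Python 3 and bytes input: both functions also accept an optional second argument holding a running checksum, which lets a checksum be built up one block at a time. The sample data and the split point are arbitrary.

```python
import zlib

data = b"The quick brown fox jumps over the lazy dog"

adler = zlib.adler32(data)
crc = zlib.crc32(data)
print(hex(adler), hex(crc))

# Build the same CRC incrementally, one block at a time.
running = 0
for chunk in (data[:10], data[10:]):
    running = zlib.crc32(chunk, running)

# A single changed byte produces a different checksum.
damaged = b"The quick brown fox jumps over the lazy cog"
changed = zlib.crc32(damaged) != crc
```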
Simple File Compression and Decompression. The zlib and bz2
modules provide functions for compression and decompression of files
or streams. Each module provides a way to create a compressor object,
as well as process a file while compressing or decompressing
it.
The following functions will create a compressor object or a
decompressor object. A compressor object has methods which will
compress a sequence of bytes incrementally. A decompressor object,
conversely, has methods to uncompress a sequence of bytes. These objects
can work with chunks of data rather than the complete contents of a
file, making it possible to compress or uncompress large amounts of
data.
zlib.compressobj(〈level〉)
→ Compress
Creates a new compressor object, an instance of class
Compress. A compressor object has two
methods that can be used to incrementally compress a sequence of
bytes. The level is the integer
compression level, between 1 and 9. In the following summaries,
c is a compressor.
c.compress(bytes)
→ bytes
Compresses the given sequence of bytes; returns the
next sequence of compressed bytes. The idea is to feed
blocks of data into the compressor, getting blocks of
compressed data out. Some data is retained as part of the
compression, so flush must be
called to finalize the compression and get the last of the
bytes.
c.flush(〈mode〉)
→ bytes
Finishes compression processing, returning the
remaining bytes. While there are three values for
mode, offering some data recovery
capability, the default is zlib.Z_FINISH,
which completely finishes all compression. Providing no
mode is compatible with the
BZ2Compressor.
zlib.decompressobj()
→ Decompress
Creates a new decompressor object, an instance of class
Decompress. A decompressor object has two
methods that can be used to decompress a sequence of bytes.
d.decompress(bytes)
→ bytes
Decompresses the given sequence of bytes; returns the
next sequence of uncompressed bytes. The idea is to feed
blocks of compressed data into the decompressor, getting
blocks of uncompressed data out. Some data is retained as
part of the decompression, so flush
must be called to finalize the decompression and get the
last of the bytes.
d.flush()
→ bytes
Finishes decompression processing, returning the
remaining bytes.
bz2.BZ2Compressor(〈level〉)
→ BZ2Compressor
Creates a new compressor object, an instance of class
BZ2Compressor. A compressor object has two
methods that can be used to incrementally compress a sequence of
bytes. The level is the integer
compression level, between 1 and 9. A
bz2.BZ2Compressor
has the same methods as a
zlib.Compress,
described above.
bz2.BZ2Decompressor()
→ BZ2Decompressor
Creates a new decompressor object, an instance of class
BZ2Decompressor. A
bz2.BZ2Decompressor
has a decompress method that works like the one on
zlib.Decompress,
described above. Unlike zlib.Decompress, it has no
flush method, so all of the output comes from
decompress.
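The incremental interface can be sketched as follows, assuming Python 3. The helper name compress_chunks and the sample data are invented for this illustration; the same helper works with either kind of compressor object because both offer compress and flush.

```python
import bz2
import zlib

def compress_chunks(compressor, chunks):
    """Feed each chunk to the compressor, then flush to get the tail."""
    pieces = [compressor.compress(chunk) for chunk in chunks]
    pieces.append(compressor.flush())
    return b"".join(pieces)

chunks = [b"line %d of sample data\n" % i for i in range(1000)]
original = b"".join(chunks)

z_bytes = compress_chunks(zlib.compressobj(9), chunks)
b_bytes = compress_chunks(bz2.BZ2Compressor(9), chunks)

# Decompress the zlib stream in two pieces to show the incremental side.
d = zlib.decompressobj()
half = len(z_bytes) // 2
z_restored = d.decompress(z_bytes[:half]) + d.decompress(z_bytes[half:]) + d.flush()

# bz2.BZ2Decompressor has decompress() but no flush().
b_restored = bz2.BZ2Decompressor().decompress(b_bytes)
```

Individual calls to compress may return an empty bytes object while the compressor buffers input, which is why the final flush is needed.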
We define a function, compDecomp,
which applies a compressor object to a source file and produces
a destination file. This reads the file in small blocks of 2048
bytes, compresses each block, and writes the compressed block to
the destination file.
The final block will generally be less than 2048 bytes.
The final call to
compObj.flush notifies
the compressor object that there will be no more data; the value
returned is the tail end of the compressed data.
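A function along these lines might look like the following sketch, which assumes Python 3. The parameter names srcName, dstName and blockSize, the temporary sample file, and its contents are assumptions made for the illustration, not the original listing.

```python
import os
import tempfile
import zlib

def compDecomp(compObj, srcName, dstName, blockSize=2048):
    """Compress the file srcName into dstName using compObj."""
    with open(srcName, "rb") as source, open(dstName, "wb") as dest:
        while True:
            block = source.read(blockSize)
            if not block:
                break  # end of file reached
            dest.write(compObj.compress(block))
        # No more data: collect the tail end of the compressed stream.
        dest.write(compObj.flush())

# Build a throwaway sample file so the sketch can run on its own.
workDir = tempfile.mkdtemp()
srcName = os.path.join(workDir, "sample.txt")
dstName = os.path.join(workDir, "sample.txt.z")
with open(srcName, "wb") as f:
    f.write(b"some repetitive sample text\n" * 5000)

compDecomp(zlib.compressobj(), srcName, dstName)
print(os.path.getsize(srcName), os.path.getsize(dstName))
```

Passing bz2.BZ2Compressor() in place of zlib.compressobj() should work the same way, since the function relies only on the compress and flush methods.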
We create a compressor, compObj1, from
the zlib.compressobj function. This object
is used by our compDecomp function to
compress a sample file. In this case, the sample file is 1,208K and
the resulting file is 301K, about ¼ the original size.
We create a compressor, compObj2, from
the bz2.BZ2Compressor class. This object
is used by our compDecomp function to
compress a sample file. The resulting file is 228K, about ⅕ the
original size.
Decompression uses a function nearly identical to the
compDecomp function. The decompress version calls
the decompress method of a
decompObj instead of the
compress method of a
compObj.
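The decompression counterpart can be sketched the same way, again assuming Python 3. The function name decompFile, its parameters, and the temporary files are hypothetical. Because a zlib decompressor offers flush while a bz2.BZ2Decompressor does not, the sketch calls flush only when the object provides it.

```python
import bz2
import os
import tempfile
import zlib

def decompFile(decompObj, srcName, dstName, blockSize=2048):
    """Decompress the file srcName into dstName using decompObj."""
    with open(srcName, "rb") as source, open(dstName, "wb") as dest:
        while True:
            block = source.read(blockSize)
            if not block:
                break  # end of file reached
            dest.write(decompObj.decompress(block))
        # zlib decompressors offer flush(); bz2.BZ2Decompressor does not.
        flush = getattr(decompObj, "flush", None)
        if flush is not None:
            dest.write(flush())

sample = b"some repetitive sample text\n" * 5000
workDir = tempfile.mkdtemp()
results = {}
for suffix, compress, decompObj in (
    (".z", zlib.compress, zlib.decompressobj()),
    (".bz2", bz2.compress, bz2.BZ2Decompressor()),
):
    packed = os.path.join(workDir, "sample" + suffix)
    unpacked = packed + ".out"
    with open(packed, "wb") as f:
        f.write(compress(sample))
    decompFile(decompObj, packed, unpacked)
    with open(unpacked, "rb") as f:
        results[suffix] = f.read()
```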
Gzip File Handling. The gzip module provides function and
class definitions that make it easy to handle simple Gzip files. These
allow you to open a compressed file and read it as if it were already
decompressed. They also allow you to open a file and write to it,
having the data automatically compressed as you write.
The gzip.open function opens the file named filename with
the given mode ('r',
'w' or 'a'). The compression
level is an integer that provides a
preference for speed versus size. As an alternative, you can open
a file or socket separately and provide the resulting file object
to the gzip.GzipFile class as a fileobj, for example, f=
open('somefile','rb'); zf= gzip.GzipFile( fileobj=f ).
Once this file is open, it can perform ordinary
read, readline,
readlines, write,
and writelines
operations. If you open the file with 'rb' mode, the
various read functions will decompress the file's contents as the file
is read. If you open the file with 'wb' or
'ab' modes, the various write functions will compress the
data as they write to the file.
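A short sketch of this, assuming Python 3 and a temporary file whose name is invented for the example:

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")

# Writing: data is compressed as it is written (level 9 here).
with gzip.open(path, "wb", 9) as zf:
    zf.write(b"first line\n")
    zf.writelines([b"second line\n", b"third line\n"])

# Reading: data is decompressed as it is read.
with gzip.open(path, "rb") as zf:
    first = zf.readline()
    rest = zf.readlines()

# Alternative: wrap a file object that was opened separately.
with open(path, "rb") as f:
    with gzip.GzipFile(fileobj=f) as zf:
        everything = zf.read()
```

In binary modes the file works with bytes, so the lines written and read here are bytes objects rather than text strings.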
Published under the terms of the Open Publication License