The Data Compression Modules: zlib, gzip, bz2

The zlib, gzip and bz2 modules provide essential data and file compression tools. Data files are often laid out for speedy processing and may contain redundant content, such as runs of spacing characters that carry no meaning. Compression reduces the size of this redundant data.

For example, a .tar file is often compressed using GZip to create a .tar.gz file, sometimes called a .tgz file. In the case of the ZIP file archive, the compression algorithms are already part of the zipfile module.

These modules are very flexible and can be used in a variety of ways by an application program. The interfaces to zlib and bz2 are very similar. The interface to gzip is greatly simplified.

Data Compression and Decompression. The zlib and bz2 modules provide essential data compression and decompression algorithms. Each module provides a compress function, which compresses a string into a sequence of bytes, and a decompress function, which reverses the process. For relatively short strings, there may be no reduction in size.

Note that the interfaces for the zlib and bz2 modules are designed to be nearly identical. The results, however, are not compatible at all; these are different algorithms for data compression.

zlib.compress( string , 〈 level 〉) → string

Compress the given string . The level provides a speed vs. size tradeoff: a value of 1 is very fast, but may not compress very well; a value of 9 will be slower but provide the best compression. The default value for level is 6.

zlib.decompress( string ) → string

Decompress the given string .

bz2.compress( string , 〈 level 〉) → string

Compress the given string . The level provides a speed vs. size tradeoff: a value of 1 is very fast, but may not compress very well; a value of 9 will be slower but provide the best compression. The default value for level is 9.

bz2.decompress( string ) → string

Decompress the given string .
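As a rough illustration of the four functions above, here is a minimal round-trip sketch. It assumes Python 3, where the data passed in and returned are bytes objects rather than strings; the sample data and variable names are made up for the example.

```python
import bz2
import zlib

# Sample data is an assumption for illustration; repetitive input compresses well.
original = b"spam and eggs " * 200

z_data = zlib.compress(original, 9)   # level 9: slowest, smallest
b_data = bz2.compress(original, 9)

# Each module restores its own output exactly.
assert zlib.decompress(z_data) == original
assert bz2.decompress(b_data) == original
print(len(original), len(z_data), len(b_data))

# The two formats are not interchangeable: zlib rejects bz2 output.
incompatible = False
try:
    zlib.decompress(b_data)
except zlib.error as err:
    incompatible = True
    print("zlib cannot read bz2 output:", err)

# A very short input can grow, because each format adds a fixed overhead.
short = b"hi"
print(len(short), len(zlib.compress(short)))
```

The exact compressed sizes depend on the data and on the library version, so they are printed rather than assumed.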

In addition to simple compression and decompression, zlib provides some additional features for computing various kinds of checksums of the data.

zlib.adler32( string ) → number

Compute the 32-bit Adler checksum of the given string .

zlib.crc32( string ) → number

Compute the 32-bit CRC (Cyclic Redundancy Check) checksum of the given string .
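The checksum functions can be tried out with a short sketch like the following. It assumes Python 3 (bytes input, unsigned integer results) and uses an arbitrary sample sentence; the optional second argument, a running checksum value, is part of both functions and allows the data to be processed in pieces.

```python
import zlib

# Arbitrary sample data, chosen only for illustration.
data = b"The quick brown fox jumps over the lazy dog"

adler = zlib.adler32(data)
crc = zlib.crc32(data)
print("adler32: %08x" % adler)
print("crc32:   %08x" % crc)

# A checksum can be built up block by block by passing the running value.
running = zlib.crc32(data[:10])
running = zlib.crc32(data[10:], running)
assert running == crc

# Changing a single byte changes the CRC.
assert zlib.crc32(b"the" + data[3:]) != crc
```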

Simple File Compression and Decompression. The zlib and bz2 modules provide functions for compression and decompression of files or streams. Each module provides a way to create a compressor object, as well as process a file while compressing or decompressing it.

The following functions will create a compressor object or a decompressor object. A compressor object has methods which will compress a sequence of bytes incrementally. A decompressor object, conversely, has methods to uncompress a sequence of bytes. These objects can work with chunks of data rather than the complete contents of a file, making it possible to compress or uncompress large amounts of data.

zlib.compressobj( 〈 level 〉) → Compress

Creates a new compressor object, an instance of class Compress. A compressor object has two methods that can be used to incrementally compress a sequence of bytes. The level is the integer compression level, between 1 and 9. In the following summaries, c is a compressor.

c.compress( bytes ) → bytes

Compresses the given sequence of bytes; returns the next sequence of compressed bytes. The idea is to feed blocks of data into the compressor, getting blocks of compressed data out. Some data is retained as part of the compression, so flush must be called to finalize the compression and get the last of the bytes.

c.flush( 〈 mode 〉) → bytes

Finishes compression processing, returning the remaining bytes. While there are three values for mode , offering some data recovery capability, the default is zlib.Z_FINISH, which completely finishes all compression. Calling flush with no mode keeps the call compatible with the flush method of a BZ2Compressor.

zlib.decompressobj() → Decompress

Creates a new decompressor object, an instance of class Decompress. A decompressor object has two methods that can be used to decompress a sequence of bytes. In the following summaries, d is a decompressor.

d.decompress( bytes ) → bytes

Decompresses the given sequence of bytes; returns the next sequence of uncompressed bytes. The idea is to feed blocks of compressed data into the decompressor, getting blocks of uncompressed data out. Some data is retained as part of the decompression, so flush must be called to finalize the decompression and get the last of the bytes.

d.flush() → bytes

Finishes decompression processing, returning the remaining bytes.
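The block-at-a-time pattern described above might look like the following sketch. It assumes Python 3; the chunks helper, the block sizes and the sample data are hypothetical and exist only for this example.

```python
import zlib

def chunks(data, size):
    """Yield successive blocks of at most size bytes (hypothetical helper)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]

original = b"0123456789" * 1000   # sample data for illustration

# Feed blocks in, collect compressed blocks out, then flush the remainder.
c = zlib.compressobj(6)
compressed = b"".join(c.compress(block) for block in chunks(original, 2048))
compressed += c.flush()

# Decompression follows the same shape, here with much smaller blocks.
d = zlib.decompressobj()
restored = b"".join(d.decompress(block) for block in chunks(compressed, 64))
restored += d.flush()

assert restored == original
print(len(original), "->", len(compressed))
```

Individual calls to compress may return an empty result while the compressor buffers input, which is why the final flush is needed.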

bz2.BZ2Compressor( 〈 level 〉) → BZ2Compressor

Creates a new compressor object, an instance of class BZ2Compressor. A compressor object has two methods that can be used to incrementally compress a sequence of bytes. The level is the integer compression level, between 1 and 9. A bz2.BZ2Compressor has the same methods as a zlib.Compress, described above.

bz2.BZ2Decompressor() → BZ2Decompressor

Creates a new decompressor object, an instance of class BZ2Decompressor. A decompressor object has a decompress method that can be used to incrementally decompress a sequence of bytes; it works like the decompress method of a zlib.Decompress, described above. Unlike a zlib.Decompress, a bz2.BZ2Decompressor has no flush method.
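A comparable sketch for the bz2 classes follows. It assumes Python 3.3 or later (for the eof attribute); the sample data and block sizes are arbitrary. Note that the sketch never calls flush on the decompressor, since BZ2Decompressor does not provide that method.

```python
import bz2

original = b"lorem ipsum dolor sit amet " * 500   # sample data for illustration

# Compress in 2048-byte blocks, then flush to obtain the tail of the stream.
c = bz2.BZ2Compressor(9)
parts = []
for start in range(0, len(original), 2048):
    parts.append(c.compress(original[start:start + 2048]))
parts.append(c.flush())
compressed = b"".join(parts)

# Decompress in 100-byte blocks; there is no flush step on this side.
d = bz2.BZ2Decompressor()
restored = b"".join(
    d.decompress(compressed[i:i + 100])
    for i in range(0, len(compressed), 100)
)

assert restored == original
assert d.eof   # the end-of-stream marker was seen
```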

Example 33.4. compdecomp.py

#!/usr/bin/env python
"""Compress a file using a compressor object."""
import zlib, bz2, os

def compDecomp( compObj, srcName, dstName ):    # (1)
    # Binary mode: compressed data is bytes, not text.
    source= open( srcName, "rb" )
    dest= open( dstName, "wb" )
    block= source.read( 2048 )
    while block:
        cBlock= compObj.compress( block )
        dest.write(cBlock)
        block= source.read( 2048 )
    # No more input: flush returns the tail end of the compressed data.
    cBlock= compObj.flush()
    dest.write( cBlock )
    source.close()
    dest.close()

# Note: the output is a zlib stream; despite the .gz name, it is not a gzip file.
compObj1= zlib.compressobj()    # (2)
compDecomp( compObj1, "../python.xml", "python.xml.gz" )
print( "source %d k" % ( os.stat("../python.xml").st_size//1024 ) )
print( "dest %d k" % ( os.stat("python.xml.gz").st_size//1024 ) )

compObj2= bz2.BZ2Compressor()    # (3)
compDecomp( compObj2, "../python.xml", "python.xml.bz" )
print( "source %d k" % ( os.stat("../python.xml").st_size//1024 ) )
print( "dest %d k" % ( os.stat("python.xml.bz").st_size//1024 ) )
1

We define a function, compDecomp, which applies a compressor object to a source file and produces a destination file. This reads the file in small blocks of 2048 bytes, compresses each block, and writes the compressed block to the destination file.

The final block will generally be less than 2048 bytes. The final call to compObj.flush notifies the compressor object that there will be no more data; the value returned is the tail end of the compressed data.

2

We create a compressor, compObj1, from the zlib.compressobj function. This object is used by our compDecomp function to compress a sample file. In this case, the sample file is 1,208K and the resulting file is 301K, about ¼ the original size.

3

We create a compressor, compObj2, from the bz2.BZ2Compressor class. This object is used by our compDecomp function to compress a sample file. The resulting file is 228K, about ⅕ the original size.

Decompression uses a function nearly identical to the compDecomp function. The decompress version calls the decompress method of a decompObj instead of the compress method of a compObj.
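One possible shape for that decompression function is sketched below, assuming Python 3. The name decomp, the temporary file names and the sample text are hypothetical. Because a bz2.BZ2Decompressor has no flush method, the sketch only calls flush when the object provides one.

```python
import os
import tempfile
import zlib

def decomp(decompObj, srcName, dstName):
    """Apply a decompressor object to srcName, writing the result to dstName."""
    with open(srcName, "rb") as source, open(dstName, "wb") as dest:
        block = source.read(2048)
        while block:
            dest.write(decompObj.decompress(block))
            block = source.read(2048)
        # zlib decompressors have flush; bz2.BZ2Decompressor does not.
        flush = getattr(decompObj, "flush", None)
        if flush is not None:
            dest.write(flush())

# Round trip on a small temporary file (paths are made up for the example).
workdir = tempfile.mkdtemp()
packed = os.path.join(workdir, "sample.z")
unpacked = os.path.join(workdir, "sample.txt")
text = b"one line of sample text\n" * 400

with open(packed, "wb") as f:
    f.write(zlib.compress(text))

decomp(zlib.decompressobj(), packed, unpacked)

with open(unpacked, "rb") as f:
    result = f.read()
assert result == text
```

The same function can be given a bz2.BZ2Decompressor() to read a file written with a BZ2Compressor.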

Gzip File Handling. The gzip module provides function and class definitions that make it easy to handle simple Gzip files. These allow you to open a compressed file and read it as if it were already decompressed. They also allow you to open a file and write to it, having the data automatically compressed as you write.

gzip.open( filename , 〈 mode , level 〉) → gzip.GzipFile

Open the file named filename with the given mode ('r', 'w' or 'a'). The compression level is an integer that provides a preference for speed versus size. As an alternative, you can open a file or socket separately and wrap it using the gzip.GzipFile class and its fileobj parameter, for example, f= open('somefile','rb'); zf= gzip.GzipFile( fileobj=f ).

Once this file is open, it can perform ordinary read, readline, readlines, write, and writelines operations. If you open the file with 'rb' mode, the various read functions will decompress the file's contents as the file is read. If you open the file with 'wb' or 'ab' modes, the various write functions will compress the data as they write to the file.
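A short sketch of gzip file handling follows, assuming Python 3 and a made-up temporary file name. It writes a few lines through gzip.open, reads them back, and then reads the same file again by wrapping an already open file object with gzip.GzipFile.

```python
import gzip
import os
import tempfile

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
lines = [b"first line\n", b"second line\n", b"third line\n"]

# Data is compressed as it is written.
with gzip.open(path, "wb") as zf:
    zf.writelines(lines)

# Data is decompressed as it is read.
with gzip.open(path, "rb") as zf:
    first = zf.readline()
    rest = zf.readlines()

# Wrap a file object that was opened separately.
with open(path, "rb") as f:
    with gzip.GzipFile(fileobj=f) as zf:
        everything = zf.read()

print(first, rest)
```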


 
 
Published under the terms of the Open Publication License