Data Compression and Archiving Using Python

August 30th, 2010

bzip2 compression

bzip2 is a freely available, patent-free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), while being around twice as fast at compression and six times faster at decompression.

Compressing a file:

import bz2

# Open the source in binary mode: BZ2File.write() expects bytes.
with open('a.txt', 'rb') as source, bz2.BZ2File('a.txt.bz2', 'wb') as output:
    for line in source:
        output.write(line)

This will compress a.txt to a.txt.bz2.

Decompressing a file:

import bz2

# BZ2File returns bytes; decode before printing (UTF-8 is assumed here).
with bz2.BZ2File('a.txt.bz2', 'rb') as input_file:
    print(input_file.read().decode('utf-8'))
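
For data that is already in memory, the bz2 module also has one-shot functions. Here is a minimal sketch, assuming the whole payload fits in memory; the sample bytes are made up for illustration.

```python
import bz2

# Hypothetical sample payload; repetitive data compresses well.
original = b"hello bzip2 " * 1000

compressed = bz2.compress(original)    # one-shot compression
restored = bz2.decompress(compressed)  # one-shot decompression

print(len(original), "bytes ->", len(compressed), "bytes")
print("round trip ok:", restored == original)
```

This skips the file objects entirely, which can be handy for small payloads such as cached values or network messages.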

gzip compression

gzip (GNU zip) is a compression utility designed to be a replacement for compress. Its main advantages over compress are much better compression and freedom from patented algorithms.

Compressing a file with gzip:

import gzip

# Open the source in binary mode: the gzip file object expects bytes.
with open('a.txt', 'rb') as source, gzip.open('a.txt.gz', 'wb') as output:
    for line in source:
        output.write(line)

Decompressing the file:

import gzip

# In 'rb' mode read() returns bytes; decode before printing (UTF-8 is assumed here).
with gzip.open('a.txt.gz', 'rb') as input_file:
    print(input_file.read().decode('utf-8'))
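
Calling read() with no arguments loads the whole decompressed file into memory. For larger files, one possible approach is to copy in chunks with shutil.copyfileobj. The sketch below assumes that approach; the helper names gzip_file and gunzip_file and the temporary file names are made up for this example.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(src_path, dst_path):
    """Compress src_path into dst_path, copying chunk by chunk."""
    with open(src_path, 'rb') as src, gzip.open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)


def gunzip_file(src_path, dst_path):
    """Decompress src_path into dst_path, copying chunk by chunk."""
    with gzip.open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)


# Demo in a throwaway directory so no real files are touched.
workdir = tempfile.mkdtemp()
plain = os.path.join(workdir, 'a.txt')
packed = plain + '.gz'
unpacked = os.path.join(workdir, 'a_copy.txt')

with open(plain, 'wb') as f:
    f.write(b"some sample text\n" * 500)

gzip_file(plain, packed)
gunzip_file(packed, unpacked)

plain_size = os.path.getsize(plain)
packed_size = os.path.getsize(packed)
with open(plain, 'rb') as a, open(unpacked, 'rb') as b:
    same = a.read() == b.read()

print(plain_size, "bytes ->", packed_size, "bytes; round trip ok:", same)
shutil.rmtree(workdir)
```

The same pattern should work with bz2.BZ2File in place of gzip.open, since both behave like ordinary binary file objects.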

Tar archive access

List the contents of a tar file.

import tarfile

with tarfile.open("sample.tar", "r") as tar:
    for tarinfo in tar:
        # Describe each member by its type.
        if tarinfo.isreg():
            kind = "a regular file."
        elif tarinfo.isdir():
            kind = "a directory."
        else:
            kind = "something else."
        print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", kind)

Untar an archive file.

import tarfile
tar = tarfile.open("sample.tar")
tar.extractall()
tar.close()
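
By default extractall() unpacks into the current working directory. Here is a small sketch that extracts into a chosen directory instead, using the path argument. It builds its own tiny archive in a temporary directory first; all file and directory names are made up for the example.

```python
import os
import shutil
import tarfile
import tempfile

# Work in a throwaway directory so no real files are touched.
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, 'note.txt')
archive = os.path.join(workdir, 'sample.tar')
target = os.path.join(workdir, 'extracted')

with open(src_file, 'w') as f:
    f.write("hello tar\n")

# arcname stores the member under a short relative name.
with tarfile.open(archive, 'w') as tar:
    tar.add(src_file, arcname='note.txt')

# Unpack into the target directory rather than the current one.
os.makedirs(target)
with tarfile.open(archive) as tar:
    tar.extractall(path=target)

extracted = sorted(os.listdir(target))
print(extracted)
shutil.rmtree(workdir)
```

Only extract archives you trust this way. Recent Python versions also accept a filter argument to extractall() that restricts what an archive may write; check the tarfile documentation for your version.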

Create an archive file.

import fnmatch
import os
import tarfile

with tarfile.open("sample.tar", "w") as tar:
    for name in os.listdir('.'):
        if os.path.isdir(name):
            print(name, 'is a dir.')
        # Add only the .txt files to the archive.
        if fnmatch.fnmatch(name, '*.txt'):
            print(name)
            tar.add(name)

Using tar with gzip and bz2

tarfile can also work with compressed archives. Pass a mode that names the compression, for example

tar = tarfile.open("sample.tar.gz", "r:gz")

or

tar = tarfile.open("sample.tar.bz2", "r:bz2")

to open a gzip- or bzip2-compressed tar file.
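
The write side works the same way: the modes "w:gz" and "w:bz2" create compressed archives. A minimal round-trip sketch follows, assuming a temporary directory and a made-up a.txt so it can run anywhere.

```python
import os
import shutil
import tarfile
import tempfile

# Work in a throwaway directory so no real files are touched.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, 'a.txt')
with open(text_path, 'w') as f:
    f.write("compress me\n" * 100)

names_by_mode = {}
for comp in ('gz', 'bz2'):
    archive = os.path.join(workdir, 'sample.tar.' + comp)
    # "w:gz" / "w:bz2" write a compressed archive.
    with tarfile.open(archive, 'w:' + comp) as tar:
        tar.add(text_path, arcname='a.txt')
    # "r:gz" / "r:bz2" read it back.
    with tarfile.open(archive, 'r:' + comp) as tar:
        names_by_mode[comp] = tar.getnames()

print(names_by_mode)
shutil.rmtree(workdir)
```

When reading, plain tarfile.open(name) with the default mode also detects the compression on its own, so the explicit "r:gz" form is mostly useful when you want to be strict about the expected format.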
