Archive

Archive for the ‘PYTHON’ Category

Shorten URLs using goo.gl and Python

October 1st, 2010 4 comments

As many of you know, Google has its own URL shortening service, goo.gl. It still doesn’t have an official API and doesn’t offer all the features available at bit.ly, but it works well.

Here is a Python script that shortens a URL using goo.gl.

#!/usr/bin/python
# use Google's http://goo.gl/ URL shortener
# requires urllib, urllib2, re, simplejson
def shorten(url):
  try:
    from re import match
    from urllib2 import urlopen, Request, HTTPError
    from urllib import quote
    from simplejson import loads
  except ImportError, e:
    raise Exception('Required module missing: %s' % e.args[0])
  if not match('http://',url):
    raise Exception('URL must start with "http://"')
  try:
    urlopen(Request('http://goo.gl/api/url','url=%s'%quote(url),{'User-Agent':'toolbar'}))
  except HTTPError, e:
    j = loads(e.read())
    if 'short_url' not in j:
      try:
        from pprint import pformat
        j = pformat(j)
      except ImportError:
        j = j.__dict__
      raise Exception('Didn\'t get a correct-looking response. How\'s it look to you?\n\n%s'%j)
    return j['short_url']
  raise Exception('Unknown error forming short URL.')
 
if __name__ == '__main__':
  from sys import argv
  print shorten(argv[1])

Usage:

$ python g.py http://segfault.in
http://goo.gl/Uh5h
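
The body that the script posts is simply the target URL, percent-encoded and prefixed with url=. Here is a minimal sketch of that one step. It is written for Python 3, which is an assumption, since the script above targets Python 2. build_request_body is a hypothetical helper name, and no network request is made.

```python
from urllib.parse import quote


def build_request_body(url):
    """Return the form body the script sends: 'url=' plus the quoted URL."""
    if not url.startswith('http://'):
        raise ValueError('URL must start with "http://"')
    # quote() keeps '/' by default and percent-encodes ':' as %3A
    return 'url=%s' % quote(url)


body = build_request_body('http://segfault.in')
print(body)  # url=http%3A//segfault.in
```

Using startswith() instead of re.match() here is only a simplification for the sketch; both checks accept the same URLs.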

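Update: Expanding URLs

To expand a short URL, open it and read the final address after the redirects have been followed: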

def expand(url):
  try:
    import urllib
  except ImportError, e:
    raise Exception('Required module missing: %s' % e.args[0])
 
  f = urllib.urlopen(url)
  return f.geturl()

Data Compression and Archiving Using Python

August 30th, 2010 No comments

bzip2 compression

bzip2 is a freely available, patent-free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.

Compressing a file:

import bz2
import fileinput
 
output = bz2.BZ2File('a.txt.bz2', 'wb')
 
for line in fileinput.input('a.txt'):
        output.write(line)
 
output.close()

This will compress a.txt to a.txt.bz2.

Decompressing the file:

import bz2
 
input_file = bz2.BZ2File('a.txt.bz2', 'rb')
try:
    print input_file.read()
finally:
    input_file.close()
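
For readers on Python 3 (an assumption; the listings above are Python 2), the same compress-then-read cycle can be sketched with bz2.open and with-blocks, which close the files even if a write fails. bz2_round_trip is a hypothetical helper name, and the file lives in a temporary directory so nothing is left behind.

```python
import bz2
import os
import tempfile


def bz2_round_trip(data):
    """Compress bytes to a temporary .bz2 file and read them back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'a.txt.bz2')
        # Write the compressed file
        with bz2.open(path, 'wb') as output:
            output.write(data)
        # Read it back, decompressing on the fly
        with bz2.open(path, 'rb') as input_file:
            return input_file.read()


sample = b'hello bzip2\n' * 100
restored = bz2_round_trip(sample)
print(restored == sample)  # True
```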

gzip compression

gzip (GNU zip) is a compression utility designed to be a replacement for compress. Its main advantages over compress are much better compression and freedom from patented algorithms.

Compressing a file using gzip:

import gzip
import fileinput
 
output = gzip.open('a.txt.gz', 'wb')
 
for line in fileinput.input('a.txt'):
        output.write(line)
 
output.close()

Decompressing the file:

import gzip
 
input_file = gzip.open('a.txt.gz', 'rb')
try:
    print input_file.read()
finally:
    input_file.close()
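
When the input is large, copying it in chunks avoids holding the whole file in memory. The sketch below assumes Python 3 (the listings above are Python 2) and uses shutil.copyfileobj for the copy. gzip_file and gunzip_bytes are hypothetical helper names, and the sample file is created in a temporary directory.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(src_path, dst_path):
    """Compress src_path into dst_path, copying in chunks."""
    with open(src_path, 'rb') as src, gzip.open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)


def gunzip_bytes(path):
    """Return the decompressed contents of a .gz file as bytes."""
    with gzip.open(path, 'rb') as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, 'a.txt')
    dst_path = os.path.join(tmp_dir, 'a.txt.gz')
    with open(src_path, 'wb') as f:
        f.write(b'line one\nline two\n')
    gzip_file(src_path, dst_path)
    restored = gunzip_bytes(dst_path)

print(restored == b'line one\nline two\n')  # True
```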

Tar archive access
List the contents of a tar file.

import tarfile
tar = tarfile.open("sample.tar", "r")
for tarinfo in tar:
    print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
    if tarinfo.isreg():
        print "a regular file."
    elif tarinfo.isdir():
        print "a directory."
    else:
        print "something else."
tar.close()

Untar an archive file.

import tarfile
tar = tarfile.open("sample.tar")
tar.extractall()
tar.close()

Create an archive file

import tarfile
import os, fnmatch
 
tar = tarfile.open("sample.tar", "w")
files = os.listdir('.')
for file in files:
    if os.path.isdir(file):
        print file,' is a dir.'
    if fnmatch.fnmatch ( file, '*.txt' ):
        print file
        tar.add(file)
tar.close()

Using tar with gzip and bz2
The tarfile module can also open compressed archives directly. Use

tar = tarfile.open("sample.tar.gz", "r:gz")

Or

tar = tarfile.open("sample.tar.bz2", "r:bz2")

to work with a gzip- or bz2-compressed archive.
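
Writing works the same way with the "w:gz" and "w:bz2" modes. Here is a minimal sketch, assuming Python 3, that archives the *.txt files of a temporary directory into a gzip-compressed tar file and lists the result. make_tar_gz and list_tar_gz are hypothetical helper names, and the file names are made up for the example.

```python
import os
import tarfile
import tempfile


def make_tar_gz(archive_path, directory):
    """Add every *.txt file in directory to a gzip-compressed tar archive."""
    with tarfile.open(archive_path, 'w:gz') as tar:
        for name in sorted(os.listdir(directory)):
            if name.endswith('.txt'):
                # arcname stores only the file name, not the temporary path
                tar.add(os.path.join(directory, name), arcname=name)


def list_tar_gz(archive_path):
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(archive_path, 'r:gz') as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ('a.txt', 'b.txt', 'notes.log'):
        with open(os.path.join(tmp_dir, name), 'w') as f:
            f.write('sample\n')
    archive = os.path.join(tmp_dir, 'sample.tar.gz')
    make_tar_gz(archive, tmp_dir)
    names = list_tar_gz(archive)

print(names)  # ['a.txt', 'b.txt']
```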


Playing With Python And Gmail – Part 2

August 19th, 2010 4 comments

This is the second part of the article series ‘Playing With Python And Gmail’. If you haven’t read the first part, I recommend reading it first.

This time we will see how to fetch mails from Gmail using Python.

Reading Mails

The IMAP4.fetch method fetches (parts of) messages. Its message_parts argument should be a string of message part names enclosed in parentheses, e.g. “(UID BODY[TEXT])”. The returned data are tuples of message part envelope and data.

Here is a minimal example (without error checking) that opens a mailbox and retrieves and prints all messages:

import imaplib
 
M = imaplib.IMAP4_SSL('imap.gmail.com', 993)  # port 993 is IMAP over SSL
M.login('myname@gmail.com', 'pa$$word')
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    typ, data = M.fetch(num, '(RFC822)')
    print 'Message %s\n%s\n' % (num, data[0][1])
M.close()
M.logout()

The email package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a string or a file object, and the parser will return to you the root Message instance of the object structure. For simple, non-MIME messages the payload of this root object will likely be a string containing the text of the message. For MIME messages, the root object will return True from its is_multipart() method, and the subparts can be accessed via the get_payload() and walk() methods.
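
The difference between the two cases can be tried out without a mail server. The sketch below assumes Python 3 and parses two hand-written messages; the addresses, subjects and the boundary string XYZ are made up for the example.

```python
import email

SIMPLE = (
    'From: alice@example.com\n'
    'To: bob@example.com\n'
    'Subject: Hello\n'
    '\n'
    'Just a plain message.\n'
)

MULTIPART = (
    'From: alice@example.com\n'
    'Subject: Two formats\n'
    'MIME-Version: 1.0\n'
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    '\n'
    '--XYZ\n'
    'Content-Type: text/plain\n'
    '\n'
    'plain body\n'
    '--XYZ\n'
    'Content-Type: text/html\n'
    '\n'
    '<p>html body</p>\n'
    '--XYZ--\n'
)

simple = email.message_from_string(SIMPLE)
multi = email.message_from_string(MULTIPART)

print(simple.is_multipart())  # False
print(simple.get_payload())   # the message text as a string
print(multi.is_multipart())   # True

# walk() yields the container first, then each subpart
types = [part.get_content_type() for part in multi.walk()]
print(types)  # ['multipart/alternative', 'text/plain', 'text/html']
```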

Extract Mail Headers
Here is a way to retrieve the From, To and Subject headers from an email message:

from email.parser import HeaderParser
 
resp, data = M.FETCH(1, '(RFC822)')
msg = HeaderParser().parsestr(data[0][1])
 
print msg['From']
print msg['To']
print msg['Subject']
 
M.LOGOUT()

The output will look something like this:

Gmail Team
My Name
Gmail is different. Here's what you need to know.
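
A From or To header usually carries both a display name and an address. The sketch below assumes Python 3 and shows how email.utils.parseaddr splits the two. The message is hand-written and the example.com addresses are made up, so it runs without a Gmail connection.

```python
from email.parser import HeaderParser
from email.utils import parseaddr

RAW = (
    'From: Gmail Team <mail-noreply@example.com>\n'
    'To: My Name <myname@example.com>\n'
    'Subject: Welcome\n'
    '\n'
    'HeaderParser keeps this body as one string and does not parse it.\n'
)

msg = HeaderParser().parsestr(RAW)

# parseaddr() splits 'Display Name <address>' into its two halves
name, address = parseaddr(msg['From'])
print(name)     # Gmail Team
print(address)  # mail-noreply@example.com

# Header lookup is case-insensitive
subject = msg['subject']
print(subject)  # Welcome
```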

Identifying the content type
The Content-Type header indicates the Internet media type of the message content. It consists of a type and a subtype; text/plain is the default value.
Gmail sends messages with alternative content, that is, the same message in both plain text and another format such as HTML (multipart/alternative with the same content in text/plain and text/html forms).

import email
 
resp, data = M.FETCH(1, '(RFC822)')
mail = email.message_from_string(data[0][1])
 
for part in mail.walk():
  print 'Content-Type:',part.get_content_type()
  print 'Main Content:',part.get_content_maintype()
  print 'Sub Content:',part.get_content_subtype()

The output will be:

Content-Type: multipart/alternative
Main Content: multipart
Sub Content: alternative
Content-Type: text/plain
Main Content: text
Sub Content: plain
Content-Type: text/html
Main Content: text
Sub Content: html

Extract Message Body
Using the walk() method we can iterate through Message parts. The get_payload() method will return the current payload, which will be a list of Message objects when is_multipart() is True, or a string when is_multipart() is False.

import email
 
resp, data = M.FETCH(1, '(RFC822)')
mail = email.message_from_string(data[0][1])
 
for part in mail.walk():
  # multipart are just containers, so we skip them
  if part.get_content_maintype() == 'multipart':
      continue
 
  # we are interested only in the simple text messages
  if part.get_content_subtype() != 'plain':
    continue
 
  payload = part.get_payload()
  print payload
 
M.LOGOUT()
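
Text parts are often base64 or quoted-printable encoded, in which case get_payload() returns the encoded form. The sketch below assumes Python 3. It passes decode=True to undo the transfer encoding and then decodes the bytes with the charset declared by the part. plain_text_body is a hypothetical helper name, and the message is built locally so the example runs without a mailbox.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def plain_text_body(mail):
    """Return the decoded text of the first text/plain part, or None."""
    for part in mail.walk():
        # Containers and HTML parts are skipped by this single check
        if part.get_content_type() != 'text/plain':
            continue
        raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or 'utf-8'
        return raw.decode(charset, errors='replace')
    return None


# Build a multipart/alternative message similar to what Gmail sends
outer = MIMEMultipart('alternative')
outer['Subject'] = 'Two formats'
outer.attach(MIMEText('caf\u00e9 at noon', 'plain', 'utf-8'))
outer.attach(MIMEText('<p>caf\u00e9 at noon</p>', 'html', 'utf-8'))

mail = email.message_from_string(outer.as_string())
body = plain_text_body(mail)
print(body)
```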

Extracting Attachments
The code below extracts attached images and saves them to disk.

import os
import re
 
name_pat = re.compile('name=\".*\"')
counter = 1  # numbers the images that have no filename at all
 
for part in mail.walk():
  if part.get_content_maintype() != 'image':
    continue
 
  file_type = part.get_content_type().split('/')[1]
  if not file_type:
    file_type = 'jpg'
 
  filename = part.get_filename()
  if not filename:
    # fall back to the name parameter of the Content-Type header
    names = name_pat.findall(part.get('Content-Type', ''))
    if names:
      filename = names[0][6:-1]
 
  if not filename:
    filename = 'img-%03d.%s' % (counter, file_type)
    counter += 1
 
  payload = part.get_payload(decode=True)
 
  if not os.path.isfile(filename) :
      # finally write the stuff
      fp = open(filename, 'wb')
      fp.write(payload)
      fp.close()
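
Attachment names come from the sender, so it is worth stripping any directory components before writing to disk. The sketch below assumes Python 3 and saves image parts into a chosen directory. save_images is a hypothetical helper name, and the "image" is a few placeholder bytes attached to a locally built message.

```python
import email
import os
import tempfile
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart


def save_images(mail, target_dir):
    """Write every image part of mail into target_dir; return saved names."""
    saved = []
    for index, part in enumerate(mail.walk(), 1):
        if part.get_content_maintype() != 'image':
            continue
        # basename() drops directory components such as '../' from the name
        filename = os.path.basename(part.get_filename() or '')
        if not filename:
            filename = 'img-%03d.%s' % (index, part.get_content_subtype())
        with open(os.path.join(target_dir, filename), 'wb') as fp:
            fp.write(part.get_payload(decode=True))
        saved.append(filename)
    return saved


# Placeholder bytes, not a real image
fake_png = b'\x89PNG\r\n\x1a\n' + b'\x00' * 16
outer = MIMEMultipart()
image = MIMEImage(fake_png, _subtype='png')
image.add_header('Content-Disposition', 'attachment', filename='../logo.png')
outer.attach(image)
mail = email.message_from_string(outer.as_string())

with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_images(mail, tmp_dir)
    with open(os.path.join(tmp_dir, saved[0]), 'rb') as fp:
        written = fp.read()

print(saved)  # ['logo.png']
```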

That’s it. In the next part I will explain how to search and move your mails using Python. Don’t forget to subscribe :-)
