Playing With Python And Gmail – Part 2

This is the second part of the article series ‘Playing With Python And Gmail’. If you haven’t read the first part, I recommend reading it first.
This time we will see how to fetch mails from Gmail using Python.
Reading Mails
The IMAP4.fetch method fetches (parts of) messages. Its message_parts argument should be a string of message part names enclosed in parentheses, e.g. "(UID BODY[TEXT])". The returned data is a list of tuples, each holding a message part envelope and its data.
Here is a minimal example (without error checking) that opens a mailbox and retrieves and prints all messages:
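import imaplib

# Gmail serves IMAP only over SSL on port 993, so use IMAP4_SSL
M = imaplib.IMAP4_SSL('imap.gmail.com', 993)
M.login('myname@gmail.com', 'pa$$word')
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    # fetch the complete message (headers and body)
    typ, msg_data = M.fetch(num, '(RFC822)')
    print('Message %s\n%s\n' % (num.decode(), msg_data[0][1].decode('utf-8', 'replace')))
M.close()
M.logout()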
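The data returned by fetch is not only tuples: imaplib usually mixes in bare bytes items such as a closing parenthesis. The following is a minimal sketch of a helper that keeps only the tuples; the name raw_messages and the sample response are made up for illustration and were not captured from a real server.

```python
def raw_messages(fetch_data):
    """Yield the raw message bytes from the data list returned by IMAP4.fetch().

    Only the (envelope, literal) tuples carry message content; bare bytes
    items such as b')' are skipped.
    """
    for item in fetch_data:
        if isinstance(item, tuple) and len(item) == 2:
            yield item[1]


# Hand-written sample imitating a fetch response for one message
# (an assumption for illustration only).
sample = [(b'1 (RFC822 {38}', b'Subject: Hi\r\n\r\nHello from the sample\r\n'), b')']
bodies = list(raw_messages(sample))
print(bodies[0].decode('ascii'))
```

With a live connection you would pass the second value returned by M.fetch(num, '(RFC822)') instead of the sample list.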
The email package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a string or a file object, and the parser will return to you the root Message instance of the object structure. For simple, non-MIME messages the payload of this root object will likely be a string containing the text of the message. For MIME messages, the root object will return True from its is_multipart() method, and the subparts can be accessed via the get_payload() and walk() methods.
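To make the difference between simple and MIME messages concrete, here is a small offline sketch. The two messages are built by hand for illustration (the example.com address and the body texts are made up), so no mailbox is needed to run it.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A simple, non-MIME message: the payload of the root object is the body text.
simple = email.message_from_string(
    "From: someone@example.com\nSubject: Hello\n\nJust text.\n")
print(simple.is_multipart())      # False
print(repr(simple.get_payload()))

# A MIME message with two alternative bodies, serialized and parsed again.
container = MIMEMultipart('alternative')
container['Subject'] = 'Hello'
container.attach(MIMEText('plain body', 'plain'))
container.attach(MIMEText('<p>html body</p>', 'html'))
parsed = email.message_from_string(container.as_string())

print(parsed.is_multipart())      # True
subparts = parsed.get_payload()   # a list of Message objects
types = [part.get_content_type() for part in parsed.walk()]
print(types)
```

Note that walk() visits the container itself first and then each subpart, which is why the container's type shows up in the list.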
Extract Mail Headers
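Here is a way to retrieve the From, To and Subject headers from an email message:
from email.parser import BytesHeaderParser

# M is a logged-in IMAP connection with a mailbox selected
resp, data = M.fetch('1', '(RFC822)')
# fetch returns bytes, so parse with the bytes variant of the header parser
msg = BytesHeaderParser().parsebytes(data[0][1])
print(msg['From'])
print(msg['To'])
print(msg['Subject'])
M.logout()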
The output will look something like this:
Gmail Team
My Name
Gmail is different. Here's what you need to know.
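Headers that contain non-ASCII characters usually arrive in an encoded form (RFC 2047), so printing msg['Subject'] directly can show something like =?utf-8?q?...?= instead of readable text. A minimal sketch of decoding such a value with the standard email.header module follows; the helper name decode_mime_header and the sample subject are made up for illustration.

```python
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Return a header value as readable text, decoding RFC 2047 words."""
    return str(make_header(decode_header(value)))


# Sample encoded subject (an assumption for illustration)
encoded = '=?utf-8?q?Caf=C3=A9_menu?='
print(decode_mime_header(encoded))
print(decode_mime_header('Plain subject'))
```

In the header example you could wrap each lookup, for instance decode_mime_header(msg['Subject']).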
Identifying the content type
The Content-Type header indicates the Internet media type of the message content, consisting of a type and a subtype. For example, text/plain is the default value of Content-Type.
Gmail sends messages with alternative content, that is, the same message in both plain text and another format such as HTML (multipart/alternative, with the same content in text/plain and text/html forms).
import email

# M is a logged-in IMAP connection with a mailbox selected
resp, data = M.fetch('1', '(RFC822)')
mail = email.message_from_bytes(data[0][1])
for part in mail.walk():
    print('Content-Type:', part.get_content_type())
    print('Main Content:', part.get_content_maintype())
    print('Sub Content:', part.get_content_subtype())
The output will be:
Content-Type: multipart/alternative
Main Content: multipart
Sub Content: alternative
Content-Type: text/plain
Main Content: text
Sub Content: plain
Content-Type: text/html
Main Content: text
Sub Content: html
Extract Message Body
Using the walk() method we can iterate through Message parts. The get_payload() method will return the current payload, which will be a list of Message objects when is_multipart() is True, or a string when is_multipart() is False.
import email

# M is a logged-in IMAP connection with a mailbox selected
resp, data = M.fetch('1', '(RFC822)')
mail = email.message_from_bytes(data[0][1])
for part in mail.walk():
    # multipart are just containers, so we skip them
    if part.get_content_maintype() == 'multipart':
        continue
    # we are interested only in the simple text messages
    if part.get_content_subtype() != 'plain':
        continue
    payload = part.get_payload()
    print(payload)
M.logout()
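One caveat: get_payload() without arguments returns the body as it was transferred, which may still be base64 or quoted-printable encoded. The sketch below shows one way to get readable text instead, assuming the first text/plain part is the one you want; the helper name plain_text_body and the sample message are made up for illustration.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def plain_text_body(msg):
    """Return the first text/plain part as str, or None if there is none."""
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes
        raw = part.get_payload(decode=True)
        # fall back to utf-8 when the part declares no charset (an assumption)
        charset = part.get_content_charset() or 'utf-8'
        return raw.decode(charset, errors='replace')
    return None


# Hand-built sample with non-ASCII text, so the body is transfer-encoded
sample = MIMEMultipart('alternative')
sample.attach(MIMEText('Grüße aus Köln', 'plain', 'utf-8'))
sample.attach(MIMEText('<p>Grüße</p>', 'html', 'utf-8'))
parsed = email.message_from_bytes(sample.as_bytes())
print(plain_text_body(parsed))
```

A text/plain attachment would also match here, so a stricter version might check the Content-Disposition header as well.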
Extracting Attachments
The code below extracts attached images and saves them to disk.
import os
import re

# fallback: pick the name="..." parameter out of the Content-Type header
name_pat = re.compile(r'name="([^"]*)"')
counter = 1  # numbers the images that carry no file name at all
# mail is the parsed message from the previous example
for part in mail.walk():
    if part.get_content_maintype() != 'image':
        continue
    file_type = part.get_content_subtype() or 'jpg'
    filename = part.get_filename()
    if not filename:
        names = name_pat.findall(part.get('Content-Type', ''))
        filename = names[0] if names else None
    if not filename:
        filename = 'img-%03d.%s' % (counter, file_type)
        counter += 1
    if not os.path.isfile(filename):
        # finally write the stuff
        with open(filename, 'wb') as fp:
            fp.write(part.get_payload(decode=True))
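File names in a message are chosen by the sender, so it is safer not to write them to disk unchanged. Here is a hedged sketch of the same idea wrapped in a function that saves into a directory of your choice and strips any path components from the name. The function name save_images, the sample message and the fake image bytes are all made up for illustration.

```python
import os
import tempfile
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def save_images(msg, directory):
    """Save every image part of msg into directory; return the paths written."""
    saved = []
    for index, part in enumerate(msg.walk(), start=1):
        if part.get_content_maintype() != 'image':
            continue
        name = part.get_filename() or 'img-%03d.%s' % (
            index, part.get_content_subtype())
        # basename() drops directory components from the sender-supplied name
        path = os.path.join(directory, os.path.basename(name))
        with open(path, 'wb') as fp:
            fp.write(part.get_payload(decode=True))
        saved.append(path)
    return saved


# Hand-built sample message; the bytes are not a real image
sample = MIMEMultipart()
sample.attach(MIMEText('see attached', 'plain'))
image = MIMEImage(b'\x89PNG fake image bytes', _subtype='png')
image.add_header('Content-Disposition', 'attachment', filename='pixel.png')
sample.attach(image)

with tempfile.TemporaryDirectory() as tmp:
    saved_names = [os.path.basename(p) for p in save_images(sample, tmp)]
print(saved_names)
```

Unlike the loop above, this version overwrites an existing file with the same name; add an os.path.isfile() check if you want to keep the original behaviour.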
That’s it. In the next part I will explain searching and moving your mails using Python. Don’t forget to subscribe!
That’s a nice blog you have here
Keep up the good job and thank you for the code !
This is what I was looking for. Thanks!
I want to make a copy of an inbox somewhere else, but I don’t want to affect
the unread state of any unread messages. How do I do that?
@CS
Sorry for the delay.
You can use the copy() method. The copy happens on the server and the messages are never fetched, so unread messages stay unread.
import imaplib

IMAP_SERVER = 'imap.gmail.com'
IMAP_PORT = 993
# username and password must hold your own account details
G = imaplib.IMAP4_SSL(IMAP_SERVER, IMAP_PORT)
rc, response = G.login(username, password)
# Find the "All" messages in INBOX
G.select('INBOX')
typ, [response] = G.search(None, 'ALL')
if typ != 'OK':
    raise RuntimeError(response)
# search returns space-separated ids; copy expects them comma-separated
msg_ids = ','.join(response.decode().split())
# Create a new mailbox, "NewFolder"
typ, create_response = G.create('NewFolder')
G.copy(msg_ids, 'NewFolder')
# Look at the results
G.select('NewFolder')
typ, [response] = G.search(None, 'ALL')
print('COPIED:', response.decode())
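If you also need to read message contents while leaving them unread, one option is to open the mailbox read-only and fetch with BODY.PEEK[], which returns the message without setting the \Seen flag. The sketch below is only an illustration: fetch_keep_unread is a made-up helper, and FakeConnection is a hypothetical stand-in for a real connection so that the example runs offline.

```python
def fetch_keep_unread(conn, mailbox, num):
    """Return the raw bytes of one message without marking it as read."""
    # readonly=True opens the mailbox with EXAMINE, so no flags can change
    conn.select(mailbox, readonly=True)
    # BODY.PEEK[] returns the whole message but does not set the \Seen flag
    typ, data = conn.fetch(num, '(BODY.PEEK[])')
    if typ != 'OK':
        raise RuntimeError(data)
    return data[0][1]


class FakeConnection:
    """Hypothetical stand-in for imaplib.IMAP4_SSL, used only for this demo."""

    def __init__(self):
        self.calls = []

    def select(self, mailbox='INBOX', readonly=False):
        self.calls.append(('select', mailbox, readonly))
        return 'OK', [b'1']

    def fetch(self, message_set, message_parts):
        self.calls.append(('fetch', message_set, message_parts))
        return 'OK', [(b'1 (BODY[] {12}', b'Subject: x\r\n'), b')']


fake = FakeConnection()
raw = fetch_keep_unread(fake, 'INBOX', '1')
print(raw)
```

With a real account you would pass the logged-in IMAP4_SSL object instead of the fake one.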