Jul 25, 2010
How to remove specific mails from mbox with python

This is an improved version of one of my previous Perl scripts:


#!/usr/bin/env python
# Created by Evaggelos Balaskas on Sun Jul 25 06:36:29 EEST 2010
# Remove mails from mbox by subject
 
import sys
import mailbox
import re
 
SUBJECTS = (
        'automatically rejected mail',
        'delivery failure',
        'delivery notification',
        'delivery status notification',
        'failure notice',
        'mail delivery failed',
        'mail delivery failure',
        'nondeliverable',
        'returned mail',
        'undeliverable',
        'undelivered',
        'warning: could not send message for past'
)
 
if len(sys.argv) == 2:
    for message in mailbox.mbox(sys.argv[1]):
        s = message['subject']
        flag = 0
        for i in SUBJECTS:
            m = re.search(i, str(s), re.I)
            if m is not None:
                flag = 1
                break
        # Print only the messages whose subject matched none of SUBJECTS.
        if flag == 0:
            print message
else:
    print "Usage should be: " + sys.argv[0] + " mbox > new.mbox"
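
The script above is written for Python 2. Under Python 3, print is a function and str(message) no longer includes the "From " line, so the same filter might look like the sketch below. The names BOUNCE_RE, is_bounce and filter_mbox are made up for this example and are not part of the original script; joining the subjects into a single regular expression is also a choice of this sketch, not something the script does.

```python
import mailbox
import re
import sys

# Same subject fragments as in the script above.
SUBJECTS = (
    'automatically rejected mail',
    'delivery failure',
    'delivery notification',
    'delivery status notification',
    'failure notice',
    'mail delivery failed',
    'mail delivery failure',
    'nondeliverable',
    'returned mail',
    'undeliverable',
    'undelivered',
    'warning: could not send message for past',
)

# Hypothetical helper: one case-insensitive pattern for all fragments.
BOUNCE_RE = re.compile('|'.join(SUBJECTS), re.IGNORECASE)


def is_bounce(subject):
    """Return True if the subject contains any of the SUBJECTS fragments."""
    # str() also copes with a missing Subject header (None).
    return BOUNCE_RE.search(str(subject)) is not None


def filter_mbox(path, out=sys.stdout):
    """Write every message whose subject is not a bounce to `out`."""
    for message in mailbox.mbox(path):
        if not is_bounce(message['subject']):
            # unixfrom=True writes a "From " separator line before the
            # message, which the mbox format needs.
            out.write(message.as_string(unixfrom=True))
            out.write('\n')
```

Calling filter_mbox(sys.argv[1]) and redirecting standard output to new.mbox would then mirror the usage line of the original script.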

  1. Giorgos Keramidas

    Sunday, July 25, 2010 - 10:36:43

    Nice script. Thanks for sharing…

    Reading it, I couldn’t help but notice that there is a lot of looping that could probably be written in a cleaner, more ‘Pythonic’ manner. There is a nice pattern in Common Lisp that you can copy to reduce the number of ‘boilerplate’ lines in Python code like this. Start with two functions that each accept a function argument and check a list of items one by one for a match:

    <pre>def every(check, args):
        for a in args:
            if not check(a):
                return False
        return True

    def some(check, args):
        for a in args:
            if check(a):
                return True
        return False</pre>

    You can write code like this:

    <pre>def null(arg):
        return arg is None

    # Check if *every* item of a list matches a predicate/check function.
    if every(null, [None, None, None]):
        print "Only null items found."

    # Check if at least *one* item of a list matches a predicate/check function.
    if some(null, [1, 2, 3, None, 5]):
        print "At least one null item found."</pre>
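
    A side note on the helpers above: Python's built-in all() and any() cover the same ground as every() and some() once the predicate is applied inside a generator expression. The sketch below (Python 3 syntax) redefines the helpers only so that it runs on its own; the variable names are illustrative.

```python
def null(arg):
    return arg is None


def every(check, args):
    for a in args:
        if not check(a):
            return False
    return True


def some(check, args):
    for a in args:
        if check(a):
            return True
    return False


items = [1, 2, 3, None, 5]

# The built-ins take an iterable of truth values, so the predicate is
# applied inside a generator expression instead of being passed in.
every_builtin = all(null(a) for a in items)
some_builtin = any(null(a) for a in items)

print(every(null, items), every_builtin)  # False False
print(some(null, items), some_builtin)    # True True
```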

    Having a ‘higher order’ function like every() and some() means that you can construct matching functions for each regexp pattern on the fly, e.g.:

    <pre> patterns = (
        r'automatically rejected mail',
        r'delivery failure',
        r'delivery notification',
        r'delivery status notification',
        r'failure notice',
        r'mail delivery failed',
        r'mail delivery failure',
        r'nondeliverable',
        r'returned mail',
        r'undeliverable',
        r'undelivered',
        r'warning: could not send message for past'
    )

    for mboxfile in sys.argv[1:]:
        for msg in mailbox.mbox(mboxfile):
            # str() guards against messages that have no Subject header.
            subject = str(msg['subject'])
            # Skip the message if any pattern matches its subject.
            if some(lambda pat: re.search(pat, subject, re.I), patterns):
                continue
            print msg</pre>

    The resulting code generates many anonymous lambda functions (one per message) and may not be the fastest option. It uses a more ‘functional’ style though. Looking at this version of the code, it then seems more ‘natural’ to work with list comprehensions and check the regexp patterns with something like this:

    <pre>[re.search(pat, subject, re.I) for pat in patterns]</pre>

    The third iteration of the script code is then:

    <pre> def null(arg):
        return arg is None

    def every(check, args):
        for a in args:
            if not check(a):
                return False
        return True

    patterns = ( r'foo', r'bar' … )

    for mboxfile in sys.argv[1:]:
        for msg in mailbox.mbox(mboxfile):
            # str() guards against messages that have no Subject header.
            s = str(msg['subject'])
            # Print the message only if no pattern matched its subject.
            if every(null, [re.search(pat, s, re.I) for pat in patterns]):
                print msg</pre>

    I think this version of the script looks moderately cleaner. More supporting functions, but a much smaller ‘main loop’. Smaller main loops are usually good, because they are easier to read, with less boilerplate code to ‘hide’ the actual logic of the program.
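
    One difference between the list-comprehension form and a generator expression is worth a small illustration: the list is built in full before it is inspected, while a generator handed to any() stops at the first match. The sketch below (Python 3 syntax) counts the calls to show this; the search() wrapper, the calls list and the shortened patterns tuple are assumptions made for the example.

```python
import re

# Shortened pattern list, for illustration only.
patterns = (r'delivery failure', r'failure notice', r'undeliverable')

calls = []


def search(pat, subject):
    """re.search wrapper that records which patterns were tried."""
    calls.append(pat)
    return re.search(pat, subject, re.I)


subject = 'Delivery failure: mailbox full'

# List comprehension: every pattern is tried before any() sees the list.
matched_list = any([search(pat, subject) for pat in patterns])
tried_by_list = len(calls)

# Generator expression: any() stops at the first pattern that matches.
del calls[:]
matched_gen = any(search(pat, subject) for pat in patterns)
tried_by_gen = len(calls)

print(matched_list, tried_by_list)  # True 3
print(matched_gen, tried_by_gen)    # True 1
```

    For a dozen short patterns the saving is small, so this is a matter of taste more than speed.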

  2. Giorgos Keramidas

    Sunday, July 25, 2010 - 10:41:24

    Argh! The comment was mangled by the web UI. I’ve posted a plain text version of the same code at http://paste.lisp.org/display/112806