A simple Python script to deduplicate a mailbox (mbox format).
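#!/usr/bin/env python3
# Created by Evaggelos Balaskas on Thu Jul 29 21:22:41 EEST 2010
# Remove duplicate mails from mbox using message-id
import sys
import mailbox

if len(sys.argv) == 2:
    out = sys.stdout.buffer  # write bytes, so 8-bit message content passes through
    seen = set()             # Message-IDs already written (set: O(1) lookups)
    for message in mailbox.mbox(sys.argv[1]):
        mid = message['message-id']
        if mid is not None:
            mid = str(mid).strip()
            if mid in seen:
                continue     # duplicate: skip it
            seen.add(mid)
        # Messages without a Message-ID are always kept.
        # Re-emit the "From " separator line so the output is a valid mbox.
        out.write(b'From ' + message.get_from().encode('ascii') + b'\n')
        out.write(message.as_bytes() + b'\n')
else:
    print("Usage should be: " + sys.argv[0] + " mbox > new.mbox")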
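If you would rather not rely on shell redirection, here is a minimal sketch that writes the unique messages straight into a second mbox file using the standard library's `mailbox.mbox.add()`, which writes the `From ` separator line and escapes body lines that begin with `From `. The function name `dedupe_mbox` and its two-path interface are assumptions for illustration; messages are appended if the destination file already exists.

```python
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first message per Message-ID.

    Hypothetical helper. Messages without a Message-ID are always kept.
    Returns a (kept, dropped) tuple.
    """
    src = mailbox.mbox(src_path, create=False)  # raises NoSuchMailboxError if missing
    dst = mailbox.mbox(dst_path)                # created if it does not exist
    seen = set()
    kept = dropped = 0
    try:
        for message in src:
            mid = message['message-id']
            if mid is not None:
                mid = str(mid).strip()
                if mid in seen:
                    dropped += 1
                    continue
                seen.add(mid)
            # add() writes the "From " line and mangles body lines starting with "From "
            dst.add(message)
            kept += 1
    finally:
        dst.close()  # close() flushes pending writes to disk
        src.close()
    return kept, dropped
```

Usage: `dedupe_mbox('mbox', 'new.mbox')` returns how many messages were kept and how many duplicates were dropped.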
You can also take a look at my other Python script: How to remove specific mails from a mbox by subject
This is a better, improved version of one of my previous Perl scripts:
#!/usr/bin/env python3
# Created by Evaggelos Balaskas on Sun Jul 25 06:36:29 EEST 2010
# Remove mails from mbox by subject
import sys
import mailbox
import re

SUBJECTS = (
    'automatically rejected mail',
    'delivery failure',
    'delivery notification',
    'delivery status notification',
    'failure notice',
    'mail delivery failed',
    'mail delivery failure',
    'nondeliverable',
    'returned mail',
    'undeliverable',
    'undelivered',
    'warning: could not send message for past'
)

if len(sys.argv) == 2:
    out = sys.stdout.buffer
    for message in mailbox.mbox(sys.argv[1]):
        subject = str(message['subject'] or '')
        # Drop the message if any phrase appears in its subject (case-insensitive);
        # re.escape() makes each phrase match literally.
        if any(re.search(re.escape(p), subject, re.I) for p in SUBJECTS):
            continue
        # Keep everything else, re-emitting the "From " separator line
        out.write(b'From ' + message.get_from().encode('ascii') + b'\n')
        out.write(message.as_bytes() + b'\n')
else:
    print("Usage should be: " + sys.argv[0] + " mbox > new.mbox")
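The subject filter above matches against the raw Subject header, so a bounce whose subject arrives MIME-encoded (for example `=?UTF-8?Q?Undeliverable:_hello?=`) slips through. A minimal sketch, assuming you want to decode such headers first with the standard library's `email.header.decode_header()` and `make_header()`. The helper names `decoded_subject` and `is_bounce` are hypothetical, and `SUBJECTS` here is only a subset of the tuple above.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header, make_header

# Subset of the SUBJECTS tuple above, for illustration
SUBJECTS = (
    'delivery status notification',
    'mail delivery failed',
    'returned mail',
    'undeliverable',
    'undelivered',
)
# One literal, case-insensitive alternation of the phrases above
BOUNCE_RE = re.compile('|'.join(re.escape(s) for s in SUBJECTS), re.I)


def decoded_subject(raw):
    """Return the Subject as readable text, decoding RFC 2047 encoded words."""
    if raw is None:
        return ''
    try:
        return str(make_header(decode_header(str(raw))))
    except (HeaderParseError, UnicodeDecodeError, LookupError):
        # Malformed encoding or unknown charset: fall back to the raw header text
        return str(raw)


def is_bounce(raw_subject):
    """True if the decoded subject contains one of the bounce phrases."""
    return BOUNCE_RE.search(decoded_subject(raw_subject)) is not None
```

In the script above, the `any(...)` test could then become `is_bounce(message['subject'])`.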
A couple of days ago, I wrote a Perl script to remove backscatter mails from a mailbox file.
You can take a look at the code here:
Remove backscatter mails from mailbox.
Today I wanted to remove duplicate mails from a mailbox. Until now I had used mergembox, but I wanted to write something of my own.
So, without further ado:
#!/usr/bin/perl -w

use strict;
use Mail::MboxParser;

die $0 =~ /([^\/]+)$/, " <mbox>\n" unless @ARGV == 1;

my $mb    = Mail::MboxParser->new($ARGV[0]);
my $field = "message-id";
my %seen  = ();    # full Message-IDs already printed (hash lookup, no regex)

while ( my $msg = $mb->next_message ) {

    my $msgid = $msg->header->{$field};

    # Messages without a Message-ID cannot be compared, so always keep them
    if ( defined $msgid ) {
        $msgid =~ s/^\s+|\s+$//g;
        if ( $seen{$msgid}++ ) {
            warn "Duplicate Message-ID: " . $msgid . ", already exists !\n";
            next;
        }
    }
    print $msg . "\n";
}
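Before rewriting a mailbox, it can help to see what would be dropped, much like the `warn` lines the Perl script above prints. A minimal dry-run sketch in Python, assuming you only want a report: it counts Message-IDs with `collections.Counter` and returns those seen more than once, without writing anything. The function name `duplicate_report` is hypothetical.

```python
import mailbox
from collections import Counter


def duplicate_report(path):
    """Return {message_id: count} for every Message-ID that occurs more than once."""
    box = mailbox.mbox(path, create=False)
    try:
        counts = Counter(
            str(m['message-id']).strip()
            for m in box
            if m['message-id'] is not None  # messages without an ID are never counted
        )
    finally:
        box.close()
    return {mid: n for mid, n in counts.items() if n > 1}
```

For example, `duplicate_report('mbox')` returning `{'<1@example.com>': 3}` means two copies of that message would be removed.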
You can see the code with syntax highlighting here:
Remove duplicate mails from mailbox
A friend asked me for help cleaning various backscatter mails out of his mailbox. One of the best-known attacks is for a spammer to forge someone else's address as the sender, or even as the reply address, so that the bounces (the backscatter) never reach the spammer and land in someone else's mailbox instead.
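Subject matching is only one way to spot backscatter. As a hedged alternative, the sketch below flags a message as a probable bounce if its envelope sender is empty (`Return-Path: <>`, the null reverse-path that RFC 5321 requires for non-delivery notifications) or if it is a `multipart/report` with `report-type=delivery-status` (the DSN format of RFC 3464). The helper name `looks_like_bounce` is hypothetical, and this is a heuristic: not every setup stores `Return-Path` in the mailbox, and read receipts or auto-replies may also carry a null sender.

```python
from email.message import Message


def looks_like_bounce(msg: Message) -> bool:
    """Heuristic: True if msg looks like a delivery status notification (bounce)."""
    # Bounces are sent with a null envelope sender, usually stored as "Return-Path: <>"
    return_path = msg.get('Return-Path')
    if return_path is not None and str(return_path).strip() == '<>':
        return True
    # RFC 3464 DSNs are multipart/report with report-type=delivery-status
    if msg.get_content_type() == 'multipart/report':
        report_type = msg.get_param('report-type')
        if isinstance(report_type, str) and report_type.lower() == 'delivery-status':
            return True
    return False
```

Because `mailbox.mboxMessage` is a subclass of `email.message.Message`, the function can be called directly on each message while iterating over `mailbox.mbox(...)`.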
The article mbox_selective_deletion helped me a lot, and I based my own variation on it:
#!/usr/bin/perl -w
# Created by Ben Okopnik on Thu Jan 14 21:55:46 EST 2010
# Updated by Evaggelos Balaskas on Sun Jun 27 20:50:11 EEST 2010
use strict;
use Mail::MboxParser;

die $0 =~ /([^\/]+)$/, " <mbox>\n" unless @ARGV == 1;

my $mb = Mail::MboxParser->new($ARGV[0]);

# Subjects of bounce messages to drop (matched literally, anywhere in the subject)
my @subjects = (
    "Undeliverable",
    "Warning: could not send message for past 12 hours",
    "Returned mail: see transcript for details",
    "Delivery Status Notification (Failure)",
    "Undelivered Mail Returned to Sender"
);

while ( my $msg = $mb->next_message ) {
    my $s = $msg->header->{subject};
    $s ||= "empty_subject";
    my $flag = 0;
    foreach (@subjects) {
        # \Q...\E quotes regex metacharacters such as the parentheses in "(Failure)"
        if ( $s =~ /\Q$_\E/ ) {
            $flag = 1;
            last;
        }
    }
    print $msg . "\n" unless $flag;
}
Usage is as follows:
./remove.pl mailbox > newmailbox
Of course, you can add as many subjects as you like to the @subjects list.
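After running `./remove.pl mailbox > newmailbox`, it is worth checking how many messages were actually removed. A small sketch using Python's standard `mailbox` module, where `len()` on an `mbox` object returns the number of messages it contains; the function names `count_messages` and `report` are hypothetical.

```python
import mailbox


def count_messages(path):
    """Return the number of messages in the mbox file at path."""
    box = mailbox.mbox(path, create=False)  # do not create a missing file
    try:
        return len(box)
    finally:
        box.close()


def report(old_path, new_path):
    """Summarize message counts before and after filtering."""
    before, after = count_messages(old_path), count_messages(new_path)
    return f"{old_path}: {before}, {new_path}: {after}, removed: {before - after}"
```

For example, `print(report('mailbox', 'newmailbox'))` prints both counts and the difference.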
The complete code is here: How to remove a specific mail from a mbox
P.S. I'd be glad to hear your comments.