Jul 09 2010
Remove duplicate mails from mailbox

A couple of days ago, I wrote a Perl script to remove backscatter mails from a mailbox file.
You can take a look at the code here:
Remove backscatter mails from mailbox.

Today I wanted to remove the duplicate mails from a mailbox. Until now I've used mergembox, but I wanted to write something of my own.

So, without further ado:

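    #!/usr/bin/perl -w

    use strict;
    use Mail::MboxParser;

    # Expect exactly one argument; otherwise print the script name and a usage hint.
    die $0 =~ m{([^/]+)$}, " <mailbox>\n" unless @ARGV == 1;

    my $mb = Mail::MboxParser->new($ARGV[0]);
    my $field = "message-id";
    my %seen;    # Message-IDs (part before the "@") already printed

    while ( my $msg = $mb->next_message ) {

            my $id = $msg->header->{$field};

            # A message without a Message-ID cannot be compared, so keep it.
            if ( !defined $id ) {
                    print $msg . "\n";
                    next;
            }

            # Only the part before the "@" is compared.
            ( my $msgid = $id ) =~ s/\@.*//s;

            # Exact hash lookup, so regex metacharacters in an ID cannot cause false matches.
            if ( $seen{$msgid}++ ) {
                    warn "Duplicate Message-ID: $msgid, already exists!\n";
            } else {
                    print $msg . "\n";
            }
    }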
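
The duplicate check in the loop above comes down to remembering which Message-IDs have already been seen. Here is a minimal sketch of that idea on its own, using a hash for exact-match lookups and no mailbox parsing at all. The sub name first_occurrences and the sample IDs are made up for illustration; they are not part of Mail::MboxParser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the given IDs in their original order,
# keeping only the first occurrence of each one.
sub first_occurrences {
    my @ids = @_;
    my %seen;
    # $seen{$_}++ is false the first time an ID shows up, true afterwards
    return grep { !$seen{$_}++ } @ids;
}

# Made-up sample Message-IDs, for illustration only
my @ids = (
    '<1@example.org>',
    '<2@example.org>',
    '<1@example.org>',      # exact duplicate of the first entry
    '<1+x@example.org>',    # contains a regex metacharacter, still unique
);

print "$_\n" for first_occurrences(@ids);
```

Because the lookup is a plain string comparison, an ID such as 1+x is never interpreted as a pattern, and a short ID cannot accidentally match inside a longer one.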

You can see the code with syntax highlighting here:
Remove duplicate mails from mailbox