SameMail - Anti-Spam tool

SameMail is a tool that identifies mail that is the same as mail received by other people, which is very likely spam. It consists of a server program and a client program.

We aim to propose a protocol for handling the contents of mail securely and detecting spam easily. We provide an Open Source (GPL) implementation of the server and client programs, but other implementations are possible.


definition of spam

There are various definitions of spam. Here we define spam as almost identical electronic mail sent to many unspecified people.

This definition makes it possible to identify spam just by comparing a mail with mails received by others, without text analysis or blacklists. The exception is mail from a mailing list, which is also delivered in almost identical form to many recipients.


mechanism of SameMail

SameMail is similar to a collaborative spam filter, but it does not depend on human judgment. All actions are performed automatically.

When a SameMail client receives an e-mail, it asks the SameMail server whether the same e-mail has been received by other clients. To protect privacy, the client should not send the e-mail itself to the server. Instead, it sends a hash of the e-mail (made with a method such as MD5).

The SameMail server replies after comparing the query with the hashes it has stored. It then stores the hash from the query for future queries.
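The query-then-store flow described above can be sketched as follows. This is only a minimal in-memory illustration: the class name `InMemorySameMailServer` and its `query` method are hypothetical, and it matches hashes exactly, whereas the real server compares hashes approximately.

```python
from collections import Counter


class InMemorySameMailServer:
    """Toy stand-in for a SameMail server (hypothetical name and API)."""

    def __init__(self):
        # Maps a hashed e-mail to the number of queries that carried it.
        self._seen = Counter()

    def query(self, mail_hash: str) -> int:
        """Return how many earlier queries carried this hash, then store it."""
        earlier = self._seen[mail_hash]  # the reply uses only what is already stored
        self._seen[mail_hash] += 1       # store the query for future requests
        return earlier


server = InMemorySameMailServer()
print(server.query("45a76b12"))  # first client to ask: 0 earlier sightings
print(server.query("45a76b12"))  # second client to ask: 1 earlier sighting
```

Replying before storing means a client never matches against its own query, so a non-zero reply always reflects an earlier request.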

A SameMail client may be a mailer, a POP proxy, an MTA, etc. When a client finds that the same e-mail has been received elsewhere, the next action depends on the client. It may mark the e-mail, erase it, or return it to the sender. Our client implementation is a POP3 proxy written in Java, and it changes the 'From' header.
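As an illustration of the "mark" action, the sketch below rewrites the 'From' header of a mail when the server reports earlier sightings. It is not the Java client: the function name `mark_from_header` and the `[SameMail xN]` tag format are assumptions made for this example.

```python
from email import message_from_string


def mark_from_header(raw_mail: str, same_count: int) -> str:
    """Return the mail with its From header tagged when same_count > 0."""
    msg = message_from_string(raw_mail)
    if same_count > 0 and "From" in msg:
        original = msg["From"]
        # Hypothetical tag format; the real client's format may differ.
        msg.replace_header("From", f"[SameMail x{same_count}] {original}")
    return msg.as_string()


raw = "From: alice@example.com\nSubject: hi\n\nbody\n"
print(mark_from_header(raw, 8))
```

Tagging a header instead of deleting the mail leaves the final decision to the user's own mailer, which can filter on the tag.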


How to make hash (outline)

Spam is NOT exactly the same as what other people received. It may contain a user name or a random string. So that the server can perform a text diff, we make a hash per line. We can't trust mail headers: From, Subject, and To are often forged.

First, remove the mail header. Then remove empty lines. Compute an MD5 hash for each remaining line. Take the 3rd and 6th bytes of each MD5 hash and join them.

MD5                               original body
e7a8454eeba72d27616de4ad59979d4b  Important: Must Read for ALL.

6b2d6ba175122e506d24b3de372f1517  Interest Rates have dropped basis points once again to their lowest in years. We are now
0f0071ff3f942dfc080907ace298e199  offering the lowest debt consolidation interest rates in history. Even
e05ba1e18506890e2c17d7478bf51011  if you just consolidated, we can save you more MONEY, faster! We can:

92e1c64e607ef5d692bd7c189fb82b32  * Consolidate All Loans Effectively & Efficiently
bdb775a40155a0f31fbeb3c8dae09869  * Give Loan Advice on the Best courses of Action
9f3bee92b3c1d3a5892214f3956508b1  * Allow for one New Low monthly payment (saving you even more!)
7ba08a27518b1a06971f7387ebc45b53  * 99.9% of all Loans qualify & we do NO CREDIT CHECKS! All are approved
cef0e4b5ee772745f20ac3a26506ff1e  in our program!

6b15ef79954297112d78683e157f7be0  TODAY'S LOW RATE IS 1.9%

771b93b3381347ba7583ce505711e8b4  http://hogehoge/d1b2t3/?RefID=422904



03c18f72cf5215be855ba0cb53564794  To be removed from all our future corporate mailings please click
ad49cbf8fcb4b7a810727d1023934d6f  below.
97284a20de7c7c8b8298260802ccad99  http://hogehgoe/auto/index.htm

hash = 45a76b127194a106c67e7555eec18a8be477ef4293138f52cbb44a7c
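The outline above can be sketched in Python as follows. This is a minimal sketch, not the actual SameMail code: the function names are hypothetical, and the text encoding and line-ending handling are assumptions, so `samemail_hash` is not guaranteed to reproduce the MD5 values listed in the example. The byte selection in `pick_bytes`, however, can be checked directly against the listed digests.

```python
import hashlib


def pick_bytes(md5_hex: str) -> str:
    """Return the 3rd and 6th bytes (1-based) of an MD5 hex digest, joined."""
    return md5_hex[4:6] + md5_hex[10:12]


def body_lines(raw_mail: str) -> list:
    """Drop the header (everything up to the first blank line) and empty lines."""
    normalized = raw_mail.replace("\r\n", "\n")
    _, _, body = normalized.partition("\n\n")
    return [line for line in body.split("\n") if line]


def samemail_hash(raw_mail: str) -> str:
    """Join the selected bytes of the per-line MD5 digests of the body."""
    pieces = []
    for line in body_lines(raw_mail):
        # Assumption: each line is hashed as UTF-8 text without its line ending.
        digest = hashlib.md5(line.encode("utf-8")).hexdigest()
        pieces.append(pick_bytes(digest))
    return "".join(pieces)


# The first digest in the example yields the first four hex digits of the hash.
print(pick_bytes("e7a8454eeba72d27616de4ad59979d4b"))  # 45a7
```

Each non-empty body line contributes four hex digits, so the 14 lines of the example give the 56-digit hash shown above.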

We show only the basic idea here; the real hash is more complex. The hash should have a check digit. It should be long enough to avoid collisions, and short enough to handle easily.
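Because each body line contributes a fixed-width piece to the outline hash, two hashes can be compared piece by piece, much like a line-based text diff. The sketch below shows one possible way to score such a comparison using Python's `difflib`; the function names and the use of `SequenceMatcher` are assumptions made for illustration, not the method used by the real server.

```python
from difflib import SequenceMatcher


def split_pieces(mail_hash: str, width: int = 4) -> list:
    """Split a hash into per-line pieces (4 hex digits each in the outline)."""
    return [mail_hash[i:i + width] for i in range(0, len(mail_hash), width)]


def similarity(hash_a: str, hash_b: str) -> float:
    """Return a score between 0.0 and 1.0 for how many pieces line up."""
    matcher = SequenceMatcher(
        None, split_pieces(hash_a), split_pieces(hash_b), autojunk=False
    )
    return matcher.ratio()


original = "45a76b127194a106c67e7555eec18a8be477ef4293138f52cbb44a7c"
# Same mail with one line changed (its piece replaced by a made-up value).
variant = original.replace("9313", "0000")
print(similarity(original, original))
print(similarity(original, variant))
```

Identical hashes score 1.0, and a mail that differs in a single line, for example a line carrying a per-recipient string, still scores close to 1.0. Where to place the threshold for "almost the same" is left open here.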


download

Our client program is written in Java as a POP3 proxy. Our server program is written in C and Perl as an HTTP CGI, and its service is available on this site. Anyone can access the following address.
http://www.misojiro.t.u-tokyo.ac.jp/~tutimura/SameMail/decide.cgi

The client and server programs are included in the following archive. For ordinary use, you only need to compile and run the client program (you need the Java 2 JRE).

A C language client is also available. For now it acts as a filter.

screen shot

Headers colored orange are rewritten by the SameMail client. In this case, I received almost the same mail 8 times.


link

collaborative spam filter black list

Nobuyuki Tsuchimura(tutimura(a)mist.i.u-tokyo.ac.jp) Replace '(a)' with '@'
modified on 1/ 5 13:18, 2004