SNARE: Spatio-temporal Network-level Automatic Reputation Engine
Gray, Alexander G.
Syed, Nadeem Ahmed
Current spam filtering techniques classify email based on content and on IP reputation blacklists or whitelists. Unfortunately, spammers can alter spam content to evade content-based filters, and they continually change the IP addresses from which they send spam. Previous work has suggested that filters based on network-level behavior might be more efficient and robust, because they make decisions based on how messages are sent, as opposed to what is being sent or who is sending them. This paper presents a technique that identifies spammers using features of the network-level spatio-temporal behavior of email senders to differentiate spamming IPs from legitimate senders. Our behavioral classifier has two benefits: (1) it is early (i.e., it can automatically detect spam without seeing a large amount of email from a sending IP address, sometimes even upon seeing only a single packet); (2) it is evasion-resistant (i.e., it is based on spatial and temporal features that are difficult for a sender to change). We build classifiers based on these features using two different machine learning methods, support vector machines and decision trees, and we study the efficacy of these classifiers using labeled data from a deployed commercial spam-filtering system. Surprisingly, using only features from a single IP packet header (i.e., without looking at packet contents), our classifier can identify spammers with about 93% accuracy and a reasonably low false-positive rate (about 7%). After looking at a single message, spammer identification accuracy improves to more than 94%, with a false-positive rate of just over 5%. These results suggest an effective sender reputation mechanism.
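To make the evaluation terms above concrete, the following is a minimal sketch, not the paper's implementation. It trains a one-level decision tree (a decision stump) on a single numeric feature and then reports accuracy and false-positive rate. The feature values, the labels, and the helper names (`train_stump`, `predict`, `accuracy_and_fpr`) are all hypothetical and chosen only for illustration; the actual SNARE features and data are not reproduced here.

```python
from typing import List, Tuple


def train_stump(values: List[float], labels: List[int]) -> float:
    """Return the threshold that maximizes training accuracy for the
    rule 'predict spam (1) when value > threshold'."""
    candidates = sorted(set(values))
    best_threshold, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        if correct > best_correct:  # ties keep the lowest threshold
            best_threshold, best_correct = t, correct
    return best_threshold


def predict(values: List[float], threshold: float) -> List[int]:
    """Apply the stump: 1 = spam, 0 = legitimate."""
    return [1 if v > threshold else 0 for v in values]


def accuracy_and_fpr(predicted: List[int], labels: List[int]) -> Tuple[float, float]:
    """Accuracy = correct / total; false-positive rate = FP / (FP + TN)."""
    pairs = list(zip(predicted, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    accuracy = (tp + tn) / len(labels)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Hypothetical, made-up data: one network-level feature per sender,
# with label 1 for spammers and 0 for legitimate senders.
feature_values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
sender_labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold = train_stump(feature_values, sender_labels)
accuracy, fpr = accuracy_and_fpr(predict(feature_values, threshold), sender_labels)
print(f"threshold={threshold} accuracy={accuracy:.3f} false_positive_rate={fpr:.3f}")
```

Note that the false-positive rate is computed over legitimate senders only, so a classifier can have high overall accuracy and still misclassify a noticeable share of legitimate mail. This is why the abstract reports both numbers.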