Early detection of spam-related activity
Spam, the distribution of unsolicited bulk email, is a major security threat on the Internet. Recent studies show that approximately 70-90% of worldwide email traffic (about 70 billion messages a day) is spam. Spam consumes resources on the network and at mail servers, and it is also used to launch other attacks on users, such as distributing malware or phishing. Spammers have increased their virulence and resilience by sending spam from large collections of compromised machines (“botnets”). Spammers also make heavy use of URLs and domains to direct victims to point-of-sale Web sites, and miscreants register large numbers of domains to evade blacklisting efforts. To mitigate the threat of spam, users and network administrators need proactive techniques to distinguish spammers from legitimate senders and to take down online spam-advertised sites. In this dissertation, we focus on characterizing spam-related activities and developing systems to detect them early. Our work builds on the observation that spammers need to acquire attack agility to be profitable, which creates differences in how spammers and legitimate users interact with Internet services and exposes detectable characteristics during the early period of an attack. We examine several important components across the spam life cycle, including spam dissemination that aims to reach users' inboxes, the hosting process during which spammers set up DNS servers and Web servers, and the naming process to acquire domain names via registration services. We first develop a new spam-detection system based on network-level features of spamming bots. These lightweight features allow the system to scale better and to be more robust. Next, we analyze DNS resource records and lookups from top-level domain servers during the initial stage after domain registrations, which provides a global view across the Internet to characterize spam hosting infrastructure.
We further examine the domain registration process and present the unique registration behavior of spammers. Finally, we build an early-warning system to identify spammer domains at time-of-registration rather than later at time-of-use. Using real-world datasets, we have demonstrated that our detection systems are effective. Our work has also had practical impact. Some of the network-level features that we identified have since been incorporated into spam filtering products at Yahoo! and McAfee, and our work on detecting spammer domains at time-of-registration has directly influenced new projects at Verisign to investigate domain registrations.