Understanding the Network-Level Behavior of Spammers
By Anirudh Ramachandran and Nick Feamster
Defense Team: Mike Delahunty, Bryan Lutz, Kimberly Peng, Kevin Kazmierski, John Thykattil
Agenda
- Introduction
- Background and Related Work
- Data Collection
- Network-level Characteristics of Spammers
- Spam from Botnets
- Spam from Transient BGP Announcements
- Lessons for Better Spam Mitigation
- Conclusion
Introduction
- Spam
  - Multiple emails sent to many recipients
  - Unsolicited commercial messages
- Study based on network-level behavior of spammers
  - IP address ranges
  - Spamming modes (route hijacking, bots, etc.)
  - Temporal persistence of spamming hosts
  - Characteristics of spamming botnets
- Much attention has been paid to studying the content of spam
Introduction Cont.
- Study posits that network-level properties need to be investigated in order to find creative ways to mitigate spam
- Paper analyzes network properties of spam observed at a large spam “sinkhole”, together with:
  - BGP route advertisements
  - Traces of command and control messages of a Bobax botnet
  - Legitimate emails
- Surprising conclusions
  - Most spam comes from a small IP address space (but so does legitimate email)
  - Most spam comes from Microsoft Windows hosts (bots)
  - A small set of spammers use short-lived route announcements to remain untraceable
Background: Methods and Mitigation
- Spamming Methods
  - Direct spamming: via spam-friendly ISPs or dial-up IPs
  - Open relays and proxies: mail servers that allow unauthenticated hosts to relay email
  - Botnets: hijacked machines acting under the control of a centralized ‘botmaster’
  - BGP spectrum agility: short-lived route announcements for the IP addresses from which spammers send spam; hampers traceability
- Mitigation Techniques
  - Filtering: content-based filters and IP blacklists
Related Work: Previous Studies
- Packet traces to determine bandwidth bottlenecks from spam sources
- Project Honeypot
  - Sink for email traffic that hands out trap email addresses to determine harvesting behavior and identity of spammers
  - Time monitoring from harvesting to receipt of first spam message
  - Countries where harvesting infrastructure is located
  - Persistence of spam harvesters
Related Work Cont.
- Mitigation
  - SpamAssassin Project: reverse engineering via mail content analysis
  - DNS blacklist: 80% of IPs sending spam were in the blacklist
- Unusual Route Announcements
  - Bogus well-known addresses
  - Suggestions of short-lived route announcements
Data Collection
- Reserve a “sinkhole”
  - Registered domain with no legitimate email addresses
  - Establish a DNS Mail Exchange (MX) record for it
  - All emails received by the server are spam
- Run metrics on incoming emails
  - IP address of the relay; also run a traceroute
  - TCP fingerprint to get the source OS
  - Results of DNS blacklist lookups from 8 different blacklist servers
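The DNS blacklist lookup on this slide can be illustrated with a small sketch. It assumes the common DNSBL convention of reversing the IPv4 octets and appending the blacklist's zone. The function name `dnsbl_query_name` and the zone `dnsbl.example.org` are placeholders for illustration, not the eight servers used in the study. The sketch only builds the query name and performs no network lookup.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name that would be queried to check an IPv4 address
    against a DNS blacklist zone (hypothetical helper, no lookup is done)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` yields `99.2.0.192.dnsbl.example.org`. Under the usual convention, a resolver returning an address record for that name would indicate the relay is listed.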
Data Collection Cont. Spam received per day at sinkhole (Aug – Dec. 2005)
Data Collection Cont.
- “Hijack” the DNS server for the domain running a botnet
  - Have botnet commands go to a known machine instead
- Monitor the BGP updates from the networks where the spam is received
- Collect logs from a large email provider (40 million mailboxes)
  - Allows analysis of network characteristics for spam and non-spam
Data Analysis
- Study focuses on network-level characteristics
- Distribution of spam across IP address space is similar to that of legitimate emails (although not identical)
- Spam over the IP address range is not uniform
- 12% of all received spam comes from two Autonomous Systems (ASes)
- 37% comes from the top 20 ASes
- Offers insight into spam prevention
- Classifying spam by country: China, Korea, and the US dominate
- Defense suggestion
  - Correlate originating country with IP range to estimate probability of spam
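The "top ASes" figures above are shares of total spam volume. A minimal sketch of that arithmetic follows, assuming a per-AS message count is already available. The function name `top_k_share` and any AS labels passed to it are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Mapping


def top_k_share(spam_by_as: Mapping[str, int], k: int) -> float:
    """Fraction of all spam messages that came from the k highest-volume ASes."""
    total = sum(spam_by_as.values())
    if total == 0:
        return 0.0
    top = Counter(spam_by_as).most_common(k)  # k largest (AS, count) pairs
    return sum(count for _, count in top) / total
```

With made-up counts such as `{"AS-A": 50, "AS-B": 30, "AS-C": 15, "AS-D": 5}`, the top two ASes account for `top_k_share(counts, 2) == 0.8` of the volume.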
Cumulative Distribution Function (CDF) of Spam and Legitimate Email
[Figure: CDF of spam and legitimate email across IP address space. Annotations: greater probability of legitimate emails; big increase in probability of received spam.]
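A CDF over IP address space like the one on this slide can be sketched as follows, assuming each message is represented by its sender's IPv4 address. The helper name `build_ip_cdf` is hypothetical. The approach (sort addresses as integers, then count those at or below a point) is a generic empirical CDF, not necessarily the authors' exact procedure.

```python
import bisect
import ipaddress
from typing import Callable, Iterable


def build_ip_cdf(sender_ips: Iterable[str]) -> Callable[[str], float]:
    """Return F(ip): the fraction of messages sent from addresses <= ip."""
    values = sorted(int(ipaddress.IPv4Address(ip)) for ip in sender_ips)

    def cdf(ip: str) -> float:
        if not values:
            return 0.0
        x = int(ipaddress.IPv4Address(ip))
        # bisect_right counts how many sorted values are <= x
        return bisect.bisect_right(values, x) / len(values)

    return cdf
```

Evaluating the returned function at the ends of an address block shows how much of the mail falls inside that block. A steep jump between two nearby addresses corresponds to a concentrated range in the plot.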
Spam Persistence
- 85% of unique spammers send 10 emails or fewer
- If this is true for all, what’s the value in filtering by a specific IP address?
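The persistence statistic is a count per sending IP address. A rough sketch follows, assuming the sinkhole log has been reduced to one sender IP per received message. The function name `fraction_low_volume` is a hypothetical choice, and the default threshold of 10 simply mirrors the slide.

```python
from collections import Counter
from typing import Iterable


def fraction_low_volume(sender_ips: Iterable[str], threshold: int = 10) -> float:
    """Fraction of distinct sender IPs that sent at most `threshold` messages."""
    per_ip = Counter(sender_ips)  # messages seen from each IP
    if not per_ip:
        return 0.0
    low = sum(1 for count in per_ip.values() if count <= threshold)
    return low / len(per_ip)
```

A high value here means per-IP filters see most senders only a handful of times, which is the concern the slide raises.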
Effectiveness of Blacklists
- About 80% of spam was sent from hosts listed in at least one major blacklist
Effectiveness of Blacklists Cont.
- Most spam bots are detected by at least one DNSRBL
- Only 50% of spammers using transient BGP announcements are detected by at least one DNSRBL
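Blacklist coverage of this kind can be computed once each sender has been checked against every list. The sketch below assumes a mapping from sender IP to the set of blacklists that returned a hit. The function name `fraction_listed` and the blacklist labels are hypothetical. It counts senders rather than messages, which is a simplifying assumption.

```python
from typing import Mapping, Set


def fraction_listed(listings: Mapping[str, Set[str]], min_lists: int = 1) -> float:
    """Fraction of sender IPs that appear in at least `min_lists` blacklists."""
    if not listings:
        return 0.0
    hits = sum(1 for lists in listings.values() if len(lists) >= min_lists)
    return hits / len(listings)
```

Running it separately on the bot population and on the senders seen during transient BGP announcements would allow the kind of comparison shown on this slide.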
Spam from Botnets
- Circumstantial evidence suggests that most spam originates from bots
- Spamming hosts and Bobax drones have very similar distributions across IP address space
  - Suggests that much of the spam received may be due to botnets such as Bobax
More on Bots
- Most individual bots send a low volume of spam
Operating Systems Used by Spammers
- Used OS fingerprinting tool “p0f” in Mail Avenger
- Able to identify the OS of 75% of hosts that sent spam
  - Of this 75% identifiable segment, 95% run Windows
  - Consistent with the percentage of hosts on the Internet that run Windows
- Only about 4% run other operating systems, but they are responsible for 8% of received spam
  - This goes against the common perception that most spam originates from Windows botnet drones
Spam from Transient BGP Announcements
- Some spammers briefly hijack large portions of IP address space (that do not belong to them), send spam, and withdraw the routes immediately after spamming
- Not much is known about this technique, and it is not well defended against
- Very difficult to trace
  - Allows spammer to evade DNSRBLs
- Used 10% or less of the time, as a complementary spamming tactic
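Short-lived announcements of the kind described above can be picked out of a BGP update log by pairing each announcement with the next withdrawal of the same prefix. The sketch below assumes a simplified event format of `(timestamp_seconds, "A" or "W", prefix)`. That format, the function name `short_lived_announcements`, and the duration threshold are illustrative assumptions rather than the study's method.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, "A" or "W", prefix)


def short_lived_announcements(
    events: Iterable[Event], max_seconds: float
) -> List[Tuple[str, float]]:
    """Return (prefix, duration) for announcements withdrawn within max_seconds."""
    announced_at: Dict[str, float] = {}
    found: List[Tuple[str, float]] = []
    for ts, kind, prefix in sorted(events):  # process in time order
        if kind == "A":
            # Keep the earliest announcement time until a withdrawal arrives.
            announced_at.setdefault(prefix, ts)
        elif kind == "W" and prefix in announced_at:
            duration = ts - announced_at.pop(prefix)
            if duration <= max_seconds:
                found.append((prefix, duration))
    return found
```

Correlating the flagged intervals with the arrival times of spam from the same prefixes would be the next step. Real BGP feeds also carry AS paths and multiple vantage points, which this sketch ignores.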
Lessons on Spam Mitigation
- Why should we use network-level information?
- Information is less malleable
  - More constant than spam contents, which content-based filters monitor
- Information is observable in the middle of the network
  - Closer to the source of the spam than other techniques
- Will result in more effective spam filters
  - When combined with other techniques
- Has potential to stop spam that other techniques miss
More Lessons
- Improves knowledge of host identity
- Bases detection techniques on aggregate behavior
- Protects against route hijacking
  - “BGP spectrum agility”
  - Other techniques do not
- Uses network-level properties to detect and filter
Conclusion
- Studying the network-level behavior of spammers
- Designing better spam filters with network-level filters
- Network-level behavior filters vs. content-based filters
  - Should not replace content-based filters, but complement them
Questions?