Spam and Botnets: Characterization and Mitigation Nick Feamster Anirudh Ramachandran David Dagon Georgia Tech.

Spam and Botnets: Characterization and Mitigation Nick Feamster Anirudh Ramachandran David Dagon Georgia Tech

2 Talk Overview Network-level behavior of spammers –Ultimate goal: Construct spam filters based on network-level properties, rather than content –Content-based properties are malleable Low cost to evasion: Spammers can easily alter content High admin cost: Filters must be continually updated –Content-based filters are applied at the destination Too little, too late: Wasted network bandwidth, storage, etc. Study of DNS-based blacklists Discovery: One of the most telling network-level properties is botnet membership –DNSBL Counter-Intelligence –Network monitoring

3 Network-Level Behavior of Spammers: Major Findings Where does spam come from? –Most received from few regions of IP address space –Insight about spammier prefixes could improve filters Do spammers hijack routes? –A small set of spammers continually advertise short-lived routes –Traceability is not guaranteed How is spam sent? –Most coming from Windows hosts (likely, bots) –Identification of spamming groups (e.g., botnets) could help

4 Data Collection Two domains instrumented with MailAvenger (both on same network) –Sinkhole domain #1 Continuous spam collection since Aug 2004 No real email addresses---sink everything 10 million+ pieces of spam Legitimate mail corpus from a large email provider (40 million inboxes) Monitoring BGP route advertisements from same network

5 Mail Collection: MailAvenger Highly configurable SMTP server that collects many useful statistics

6 BGP Data Collection MX 1 MX 2

7 Spam Study: Major Findings Where does spam come from? –Most received from few regions of IP address space –Insight about spammier prefixes could improve filters Do spammers hijack routes? –A small set of spammers continually advertise short-lived routes –Traceability is not guaranteed How is spam sent? –Most coming from Windows hosts (likely, bots) –identification of spamming groups (e.g., botnets) could help

8 What IP ranges does spam come from? /24 prefix Fraction Spam comes from a few concentrated regions of IP address space

9 Distribution across ASes Top two spamming ASes: 10% of received spam ASes in the US: most spam Top ASes for legitimate email are different Top 10 ASes by Spam CountTop 10 ASes by Legit Email Count Points to note

10 Spam Study: Major Findings Where does spam come from? –Most received from few regions of IP address space –Insight about spammier prefixes could improve filters Do spammers hijack routes? –A small set of spammers continually advertise short-lived routes –Traceability is not guaranteed How is spam sent? –Most coming from Windows hosts (likely, bots) –Indentification of spamming groups (e.g., botnets) could help

11 BGP Spectrum Agility Log IP addresses of SMTP relays Join with BGP route advertisements seen at network where spam trap is co-located. A small club of persistent players appears to be using this technique. Common short-lived prefixes and ASes 61.0.0.0/8 4678 66.0.0.0/8 21562 82.0.0.0/8 8717 ~ 10 minutes Somewhere between 1-10% of all spam (some clearly intentional, others might be flapping)

12 Why Such Big Prefixes? Flexibility: Client IPs can be scattered throughout dark space within a large /8 –Same sender usually returns with different IP addresses Visibility: Route typically wont be filtered (nice and short)

13 Spam Study: Major Findings Where does spam come from? –Most received from few regions of IP address space –Insight about spammier prefixes could improve filters Do spammers hijack routes? –A small set of spammers continually advertise short-lived routes –Traceability is not guaranteed How is spam sent? –Most coming from Windows hosts (likely, bots) –Identification of spamming groups (e.g., botnets) could help

14 Characteristics of spamming bots Distribution across IP space for bots –Similar to IP space distribution for all spam –Lower bot activity in ranges where spam also comes from hijacked routes Operating Systems of Spamming Hosts –~ 95% run Windows –The 4% Unix-based hosts send up to 8% spam

15 Most Bots Send Low Volumes of Spam Lifetime (seconds) Amount of Spam Most bot IP addresses send very little spam, regardless of how long they have been spamming… 99% of bots

16 Most Bot IP addresses are quiet 65% of bots only send mail to a domain once over 18 months Blacklists may want to target IP ranges, rather than individual IPs Lifetime (seconds) Percentage of bots

17 Take-Away Lessons Network-level properties are less malleable, and are observable closer to the source of spam Aggregate properties (e.g., IP prefix, ASN, route used etc.) may be more effective Some network-level properties can be incorporated into spam filters –could be used as a first-pass filter Spam filtering requires a better notion of end-host identity Securing the Internet routing infrastructure is key to traceabilty Network-Level Spam Filtering Redefining End-Host Identifiers

DNS-based Blacklists (DNSBLs) –The most prevalent network-level spam filtering mechanism today –Various criteria: open relays/proxies, virus senders, bad/unused address spaces etc. –Hundreds of DNSBLs of all sizes How to measure the effectiveness of DNSBLs? –Completeness –Responsiveness What about DNS-Based Blacklists?

What is the completeness of the DNSBL? What is the responsiveness of the DNSBL? –How many distinct domains are targeted by a spamming host before it is blacklisted? Does frequency of spam from a host change after it is blacklisted? Questions about DNSBLs

20 Blacklisting: Completeness ~80% listed on average ~95% of bots listed in one or more blacklists Number of DNSBLs listing this spammer Only about half of the IPs spamming from short-lived BGP are listed in any blacklist Fraction of all spam received Spam from IP-agile senders tend to be listed in fewer blacklists

21 Are IP-Based Blacklists Enough? Mail Avenger is very aggressive –Eight different blacklists Cloaking techniques complicate detection –For example, what if a bot could change IP addresses and remain reachable? LAN agility BGP agility

Response Time –Difficult to calculate without ground truth –Can still estimate lower bound Infection S-Day Possible Detection Opportunity RBL Listing Time Response Time Lifecycle of a spamming host A Model of Responsiveness

Data –1.5 days worth of packet captures of DNSBL queries from a mirror of Spamhaus –46 days of pcaps from a hijacked C&C for a Bobax botnet; overlaps with DNSBL queries Method –Monitor DNSBL for lookups for known Bobax hosts Look for first query Look for the first time a query response had a listed status Measuring Responsiveness

Observed 81,950 DNSBL queries for 4,295 (out of over 2 million) Bobax IPs Only 255 (6%) Bobax IPs were blacklisted through the end of the Bobax trace (46 days) –88 IPs became listed during the 1.5 day DNSBL trace –34 of these were listed after a single detection opportunity Both responsiveness and completeness appear to be low. Much room for improvement. Responsiveness: Preliminary Results

25 Over 60% are queried by just one IP/AS –Hypothesis: Decreased chances of being reported Domains Performing Lookups

26 So…What can be Done? Network-level behavior of spammers –Ultimate goal: Construct spam filters based on network-level properties, rather than content –Content-based properties are malleable Low cost to evasion: Spammers can easily alter content High admin cost: Filters must be continually updated –Content-based filters are applied at the destination Too little, too late: Wasted network bandwidth, storage, etc. Study of DNS-based blacklists Discovery: One of the most telling network-level properties is botnet membership –DNSBL Counter-Intelligence –Network monitoring

27 Mitigation #1: Counter-Intelligence Botmasters advertise spamming bots for which bots are not listed in any blacklist. Insight: Someone must be looking up the bots! Can we fish out these DNSBL reconnaissance queries and identify subjects/targets as suspect?

28 Legit Queries vs. Reconnaissance Legitimate queriers are also the targets of queries Reconnaissance queriers are ususally not queried themselves email to mx.a.com DNS- Based Blacklist Legit Mail Server A mx.a.com Legit Mail Server B mx.b.com email to mx.b.com lookup mx.a.com lookup mx.b.com DNS- Based Blacklist Reconnaissance host

29 Measurement Approach Log Spamhaus queries Construct querier/queried graph Prune graph: only nodes in the Bobax trace Examine nodes with high out-degree –Hypothesis: targets of nodes with high out-degree likely bots

30 Whos Doing the Lookups? The botmaster, on behalf of the bots The bots, on behalf of themselves The bots, on behalf of each other Spam Sinkhole Implication: Use a seed set to bootstrap? Known bobax drone!

31 Some Problems with Counter-Intel Constructing the query graph is intensive –Computationally –Storage-wise Initially pruning the graph with IP addresses of known suspects (e.g., spammers) could help

32 Mitigation: Network Monitoring In-network filtering –Requires the ability to detect botnets Question: Can we detect botnets by observing communication structure among hosts? Example: Migration between command and control hosts New type of problem: essentially coupon collection How good are current traffic sampling techniques at exposing these patterns?

33 Experimental Setup

34 (Preliminary) Results Feasible sampling rates Conventional sampling techniques are not well- suited to collecting conversations

35 Summary Lessons Network-level spam filtering holds promise –Potentially a useful complement to content-based filters –Todays DNSBLs arent doing the tricks Two critical pieces –Monitoring techniques (which might be used together) In-network (e.g., with better traffic monitoring techniques) At the edge (e.g., DNSBL reconnaissance) –Routing security Clean-slate wish list –Better notions of identity –More agile monitoring/sampling techniques

Spam and Botnets: Characterization and Mitigation Nick Feamster Anirudh Ramachandran David Dagon Georgia Tech.

Similar presentations

Presentation on theme: "Spam and Botnets: Characterization and Mitigation Nick Feamster Anirudh Ramachandran David Dagon Georgia Tech."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Spam and Botnets: Characterization and Mitigation Nick Feamster Anirudh Ramachandran David Dagon Georgia Tech.

Similar presentations

Presentation on theme: "Spam and Botnets: Characterization and Mitigation Nick Feamster Anirudh Ramachandran David Dagon Georgia Tech."— Presentation transcript:

Similar presentations

About project

Feedback