Network-Based Spam Filtering Anirudh Ramachandran Nick Feamster Georgia Tech.

2 Spam 75-90% of all email traffic –PDF spam: ~11% and growing –Content filters cannot catch it! Late 2006: a significant rise in spammers' use of botnets, armies of PCs taken over by malware and turned into spam servers without their owners realizing it. August 2007: Botnet-based spam caused volumes to increase 53% from the previous day. Source: NetworkWorld, August 2007

3 More Than Just a Nuisance As of August 2007, one in every 87 emails constituted a phishing attack. Targeted attacks on the rise –20k-30k unique phishing attacks per month –Spam targeted at CEOs, social networks on the rise

4 One Approach: Filtering Prevent traffic from reaching users' inboxes by distinguishing spam from ham. Key question: What features best differentiate spam from legitimate mail? –Content –IP address of sender –Behavioral features

5 Content-Based Filtering is Malleable Low cost to evasion: Spammers can easily alter features of an email's content; content can be easily adjusted and changed. Customized emails are easy to generate: Content-based filters need fuzzy hashes over content, etc. High cost to filter maintainers: Filters must be continually updated as content-changing techniques become more sophisticated

6 This Talk: Network-Based Filtering Filter email based on how it is sent, in addition to simply what is sent. Network-level properties are more fixed –Hosting or upstream ISP (AS number) –Botnet membership –Location in the network –IP address block Challenge: Which properties are most useful for distinguishing spam traffic from legitimate email? Very little (if anything) is known about these characteristics

7 Talk Outline Study current sending and mitigation techniques –Network-level behavior of spammers –The effectiveness of IP-based blacklists Design behavior-based filtering techniques –Behavioral blacklisting: general idea, first trial of system on basic set of features –Joint work with Santosh Vempala Deploy distributed monitoring system to learn distinguishing features on the fly

8 Studying Sending Patterns Where is the spam coming from? –What IP address space? –ASes? –What are the OSes of the senders? What techniques? –Botnets –Short-lived route announcements –Shady ISPs Capabilities and limitations? –Bandwidth –Size of botnet army

9 BGP Spectrum Agility Log IP addresses of SMTP relays Join with BGP route advertisements seen at network where spam trap is co-located. A small club of persistent players appears to be using this technique. Common short-lived prefixes and ASes / / / ~ 10 minutes Somewhere between 1-10% of all spam (some clearly intentional, others might be flapping)
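
The join described on this slide can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name `short_lived_matches`, the record layouts, and the 600-second cutoff (taken from the "~10 minutes" figure on the slide) are assumptions made for the example.

```python
import ipaddress

def short_lived_matches(relay_log, announcements, max_lifetime=600):
    """Pair each spam relay with any short-lived route that covered it.

    relay_log: iterable of (timestamp, ip_string) from the spam trap.
    announcements: iterable of (prefix_string, announce_ts, withdraw_ts).
    max_lifetime: routes announced for longer than this many seconds are ignored.
    """
    # Keep only routes whose announce-to-withdraw interval is short.
    routes = [
        (ipaddress.ip_network(prefix), start, end)
        for prefix, start, end in announcements
        if end - start <= max_lifetime
    ]
    matches = []
    for ts, ip in relay_log:
        addr = ipaddress.ip_address(ip)
        for net, start, end in routes:
            # The relay must fall inside the prefix while the route was live.
            if start <= ts <= end and addr in net:
                matches.append((ip, str(net)))
    return matches
```

The nested loop is quadratic; a real trace would likely need an interval index or a prefix trie, which is omitted here for brevity.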

10 Why Such Big Prefixes? Flexibility: Client IPs can be scattered throughout dark space within a large /8 –Same sender usually returns with different IP addresses Visibility: Route typically won't be filtered (nice and short)

11 Characteristics of IP-Agile Senders IP addresses are widely distributed across the /8 space. IP addresses typically appear only once at our sinkhole. Depending on which /8, 60-80% of these IP addresses were not reachable by traceroute when we spot-checked. Some IP addresses were in allocated, albeit unannounced space. Some AS paths associated with the routes contained reserved AS numbers

12 Lessons for Improving Spam Filters IP-Based Blacklists are Becoming Less Effective Effective spam filtering requires –A better notion of end-host identity –Filtering based on features that are more persistent Some features may require network-wide monitoring capabilities

13 Two Parts Study the network-level behavior of spammers –Majority of spam comes from a very small portion of the Internet address space –Most comes from Windows hosts –Most senders send low volume to our domain –Conventional blacklists somewhat ineffective Develop behavior-based filtering techniques –Behavioral blacklisting

14 Two Metrics for Evaluating Blacklists Completeness: The fraction of spamming IP addresses that are listed in the blacklist Responsiveness: The time for the blacklist to list the IP address after the first occurrence of spam
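
Both metrics are simple to compute once the spam trap log and the blacklist history are available. The sketch below assumes hypothetical inputs (a set of spamming IPs, a set of listed IPs, and per-IP timestamps in seconds); the function names are illustrative only.

```python
def completeness(spam_ips, blacklist):
    """Fraction of spamming IP addresses that appear in the blacklist."""
    spam_ips = set(spam_ips)
    if not spam_ips:
        return 0.0
    return len(spam_ips & set(blacklist)) / len(spam_ips)

def responsiveness(first_spam_ts, listed_ts):
    """Seconds between an IP's first observed spam and its listing.

    first_spam_ts, listed_ts: dicts mapping IP -> timestamp.
    IPs that were never listed are omitted; IPs listed before their
    first spam get a delay of 0.
    """
    return {
        ip: max(0, listed_ts[ip] - seen)
        for ip, seen in first_spam_ts.items()
        if ip in listed_ts
    }
```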

15 Completeness of IP Blacklists ~80% listed on average. ~95% of bots listed in one or more blacklists. Only about half of the IPs spamming from short-lived BGP routes are listed in any blacklist. Spam from IP-agile senders tends to be listed in fewer blacklists. [Figure: fraction of all spam received vs. number of DNSBLs listing this spammer]

16 Completeness and Responsiveness 10-35% of spam is unlisted at the time of receipt % of these IP addresses remain unlisted even after one month

17 Problems with Existing Blacklists Based on ephemeral identifier (IP address) –More than 10% of all spam comes from IP addresses not seen within the past two months: dynamic renumbering of IP addresses, stealing of IP addresses and IP address space, compromised machines. IP addresses of senders have considerable churn. Requires a human to first notice the behavior –Spamming is compartmentalized by domain and not analyzed across domains

18 Problem: Changing IP Addresses [Figure: fraction of IP addresses]

19 Problem: Low Volume to Each Domain Most bot IP addresses send very little spam, regardless of how long they have been spamming. Single-domain observation cannot detect them. [Figure: amount of spam vs. lifetime (seconds)]

20 SpamTracker: Main Idea and Intuition Idea: Blacklist sending behavior (Behavioral Blacklisting) –Identify sending patterns that are commonly used by spammers Intuition: Much more difficult for a spammer to change the technique by which mail is sent than it is to change the content

21 SpamTracker Design For each sender, construct a behavioral fingerprint. Cluster senders with similar fingerprints. Filter new senders that map to existing clusters. [Diagram: Approach. Cluster: IP x domain x time, Collapse. Classify: Lookup, Score]

22 Building the Classifier: Clustering Feature: Distribution of sending volumes across recipient domains Clustering Approach –Build initial seed list of bad IP addresses –For each IP address, compute feature vector: volume per domain per time interval –Collapse into a single IP x domain matrix: –Compute clusters
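
The feature construction on this slide can be sketched with plain counters. The formula for the collapse step did not survive in the transcript, so the version below assumes it is a sum over time intervals; the record layout, the one-hour interval, and the function names are likewise assumptions for illustration.

```python
from collections import Counter

def build_tensor(records, interval=3600):
    """Count messages per (ip, domain, time bucket).

    records: iterable of (timestamp_seconds, sender_ip, recipient_domain).
    """
    tensor = Counter()
    for ts, ip, domain in records:
        tensor[(ip, domain, ts // interval)] += 1
    return tensor

def collapse_time(tensor):
    """Collapse the IP x domain x time counts into an IP x domain matrix.

    Assumption: the collapse sums volumes across all time buckets.
    """
    matrix = Counter()
    for (ip, domain, _bucket), count in tensor.items():
        matrix[(ip, domain)] += count
    return matrix
```

Each row of the resulting matrix (all entries sharing one IP) is that sender's volume distribution across recipient domains, which is the input to the clustering step.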

23 Clustering: Output and Fingerprint For each cluster, compute fingerprint vector: New IPs will be compared to this fingerprint IP x IP Matrix: Intensity indicates pairwise similarity
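
The fingerprint formula is missing from the transcript. One plausible reading, assumed here and not confirmed by the slides, is that a cluster's fingerprint is the average of its members' per-domain volume vectors. The function name and the dict-based vector layout are hypothetical.

```python
def cluster_fingerprint(member_vectors):
    """Average the per-domain volume vectors of a cluster's members.

    member_vectors: non-empty list of dicts mapping domain -> volume.
    Assumption: the fingerprint is the element-wise mean of the members.
    """
    if not member_vectors:
        raise ValueError("cluster has no members")
    # Union of every domain any member has sent to.
    domains = set().union(*member_vectors)
    n = len(member_vectors)
    return {d: sum(v.get(d, 0) for v in member_vectors) / n for d in domains}
```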

24 Classifying IP Addresses Given new IP address, build a feature vector based on its sending pattern across domains Compute the similarity of this sending pattern to that of each known spam cluster –Normalized dot product of the two feature vectors –Spam score is maximum similarity to any cluster
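
A minimal sketch of this scoring step follows. The slide does not spell out the normalization, so cosine similarity (dot product divided by the product of the vector norms) is assumed; under that assumption scores fall in [0, 1] and are not on the same scale as the scores reported later in the talk. Function names and the dict-based vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Normalized dot product of two sparse vectors (dicts: domain -> volume)."""
    dot = sum(value * v.get(domain, 0.0) for domain, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def spam_score(vector, fingerprints):
    """Maximum similarity between a sender's vector and any cluster fingerprint."""
    return max((cosine(vector, f) for f in fingerprints), default=0.0)
```

Taking the maximum means a sender only needs to resemble one known spam cluster to receive a high score.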

25 Evaluation Emulate the performance of a system that could observe sending patterns across many domains –Data: Postfix logs –Build clusters / train on given time interval –Evaluate classification with subsequent data in trace Evaluate classification –Relative to labeled logs –Relative to IP addresses that were eventually listed

26 Dataset: Summary and Issues 30 days of Postfix logs from large provider –Time, remote IP, receiving domain, accept/reject –Allows us to observe sending behavior over a large number of domains –Problem: About 15% of accepted mail is also spam Creates problems with validating SpamTracker 30 days of SpamHaus database in the month following the Postfix logs –Allows us to determine whether SpamTracker detects some sending IPs earlier than SpamHaus

27 Initial Results Problems: many single-domain senders; large volumes to just a few domains. [Figure: SpamTracker score for ham and spam]

28 Rejected Mails Have Higher Scores [Figure: SpamTracker score for ham and spam]

29 SpamTracker and Early Detection Compare SpamTracker scores on accepted mail to the SpamHaus database –About 15% of accepted mail was later determined to be spam –Can SpamTracker catch this? Of 620 emails that were accepted, but sent from IPs that were blacklisted within one month –65 emails had a score larger than 5 (85th percentile)

30 Deployment Integration with existing infrastructure –Deploy SpamTracker as yet another DNSBL –Existing spam filters use SpamTracker score as an additional feature –Advantage: easy deployment On the wire –Infer connections/email from traffic flow records in individual domains –Advantage: Stop mail closer to the source
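
Deploying as a DNSBL means mail servers would query it the way they query any other DNS blacklist: the IPv4 octets are reversed and prepended to the list's zone name. The sketch below only builds that query name and performs no lookup; the zone `spamtracker.example` is a made-up placeholder, not a real service.

```python
import ipaddress

def dnsbl_query_name(ip, zone="spamtracker.example"):
    """Build the DNS name a mail server would look up for an IPv4 sender.

    The zone is a hypothetical placeholder used only for this example.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"
```

For example, `dnsbl_query_name("192.0.2.99")` yields `99.2.0.192.spamtracker.example`.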

31 Improving Classification Use additional features, and combining for more robust classification –Temporal: interarrival times, diurnal patterns, etc. –Spatial: sending patterns of groups of senders Improved similarity computation –Better similarity metrics –Better metrics for detecting early onset
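
As a rough illustration of the temporal features mentioned here, the sketch below derives inter-arrival times and an hour-of-day histogram from one sender's message timestamps. It assumes timestamps are seconds since the epoch in UTC; the function names are hypothetical, and the slides do not say how such features would be combined with the existing ones.

```python
def interarrival_times(timestamps):
    """Gaps in seconds between consecutive messages from one sender."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def diurnal_histogram(timestamps):
    """Message counts per hour of day (index 0-23), assuming UTC epoch seconds."""
    hours = [0] * 24
    for ts in timestamps:
        hours[(int(ts) // 3600) % 24] += 1
    return hours
```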

32 Evasion Problem: Malicious senders could add noise to a large feature vector –Possibility: Use smaller number of trusted domains Problem: Malicious senders could change sending behavior to emulate normal senders –In doing so, they may limit their own effectiveness

33 Other Questions and Challenges Reactivity: Can the features be observed quickly enough to construct the fingerprints? Scalability: How can the data be aggregated and collected without imposing too much overhead? Reliability: How can SpamTracker be replicated to better defend against attack or failure? Sensor placement: From where should we watch spam to ensure that the clusters can be distinguished? Symbiosis between botnet detection and spam filtering

34 Summary Spam is on the rise and becoming more clever –12% of spam now PDF spam. Content filters are falling behind –Also becoming more targeted IP-Based blacklists are evadable –Up to 30% of spam not listed in common blacklists at receipt. ~20% remains unlisted after a month –Spammers commonly steal IP addresses New approach: Behavioral blacklisting –Blacklist how the mail was sent, not what was sent