1 Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine Speaker: Jun-Yi Zheng 2010/01/18

2 Reference
- S. Hao, N. A. Syed, N. Feamster, A. G. Gray, and S. Krasser, "Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine," in Proceedings of the 18th USENIX Security Symposium, Montreal, Canada, August 2009.

3 Outline
- Motivation
- Data From McAfee
- Network-level Features
- Building a Classifier
- Evaluation
- Future Work
- Conclusion

4 Motivation
- Spam: more than just a nuisance
  - Spam: unsolicited bulk emails
  - Ham: legitimate emails from desired contacts
- 95% of all email traffic is spam
  - In 2009, lost productivity costs were estimated at $130 billion worldwide
- Spam is a carrier of other attacks
  - Phishing
  - Viruses, Trojan horses, ...

5 Current Anti-spam Methods
- Content-based filtering: what is in the mail?
  - More spam now arrives in formats other than plain text (PDF spam ~12%)
  - Customized emails are easy to generate
  - High cost to filter maintainers
- IP blacklists: who is the sender? (e.g., DNSBL)
  - ~10% of spam senders come from previously unseen IP addresses (due to dynamic addressing and new infections)
  - ~20% of spam received at a spam trap is not listed in any blacklist

6 SNARE
- Spatio-temporal Network-level Automatic Reputation Engine
  - Network-based filtering: how is the email sent?
- Fact: more than 75% of spam can be attributed to botnets
- Intuition: spammers' sending patterns should look different from those of legitimate mail
  - Example features: geographic distance, neighborhood density in IP space, hosting ISP (AS number), etc.
- Automatically determines an email sender's reputation
  - 70% detection rate at a 0.2% false positive rate

7 Why Network-Level Features?
- Lightweight
  - Do not require content parsing; sometimes a single packet is enough
  - Need little collaboration across a large number of domains
  - Can be applied in high-speed networks
  - Can be applied anywhere in the middle of the network, before mail reaches the mail servers
- More robust
  - More difficult to change than content
  - More stable than IP assignment

8 Data Source
- McAfee's TrustedSource email sender reputation system
  - Time period: 14 days, October 22 to November 4, 2007
  - Message volume: about 25 million messages from 1.3 million IPs each day
  - Reporting appliances: 2,500 distinct appliances (≈ recipient domains)
  - Reputation scores: certain ham, likely ham, certain spam, likely spam, uncertain

9 Finding the Right Features
- Question: can sender reputation be established from just a single packet, plus auxiliary information?
  - Low overhead
  - Fast classification
  - In-network
  - Perhaps more evasion resistant
- Key challenge: what features satisfy these properties and can distinguish spammers from legitimate senders?

10 Network-level Features
- Feature categories
  - Single-packet features
  - Single-header and single-message features
  - Aggregate features
- A combination of features is used to build a classifier
  - No single feature needs to be perfectly discriminative between spam and ham
- Measurement study: McAfee's data, October 22-28, 2007 (7 days)

11 Summary of SNARE Features
- A total of 13 features in use

12 What Is in a Packet?
- Packet format (incoming SMTP example)
- Helpful auxiliary knowledge:
  - Timestamp: the time at which the email was received
  - Routing information
  - Sending history from IPs neighboring the sender

13 Sender-Receiver Geodesic Distance
- Intuition: social structure limits the geographic region of a user's contacts
  - The geographic distance travelled by spam from bots is close to random

14 Distribution of Geodesic Distance
- Find the physical latitude and longitude of IPs using MaxMind's GeoIP database
- Calculate the distance along the surface of the earth
- Observation: spam travels farther
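For concreteness, a minimal sketch of this distance computation using the haversine great-circle formula; the example coordinates are illustrative, and the GeoIP lookup that produces latitude/longitude is assumed to have been done already.

```python
import math

def geodesic_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance along the earth's surface (haversine formula)."""
    r = 6371.0  # mean earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example: sender near Atlanta, receiver near Montreal (coordinates are illustrative).
print(geodesic_distance_km(33.75, -84.39, 45.50, -73.57))
```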

15 Sender IP Neighborhood Density
- Intuition: the infected IP addresses in a botnet are close to one another in numerical space
  - Often even within the same subnet

16 Distribution of Distance in IP Space
- Treat IPs as a one-dimensional space (0 to 2^32 - 1 for IPv4)
- Measure of sender density: the average distance to its k nearest neighbors (in past history)
- Observation: spammers are surrounded by other spammers
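A minimal sketch of this density measure, assuming a hypothetical list of previously seen sender IPs as the history; the value of k and the example addresses are illustrative.

```python
import ipaddress

def avg_knn_ip_distance(sender_ip, history_ips, k=20):
    """Average numerical distance from sender_ip to its k nearest past senders."""
    if not history_ips:
        return float("inf")
    x = int(ipaddress.IPv4Address(sender_ip))
    distances = sorted(abs(int(ipaddress.IPv4Address(ip)) - x) for ip in history_ips)
    return sum(distances[:k]) / min(k, len(distances))

history = ["203.0.113.5", "203.0.113.17", "198.51.100.23", "192.0.2.200"]
print(avg_knn_ip_distance("203.0.113.9", history, k=3))
```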

17 Diurnal Sending Patterns
- Intuition: different senders show different diurnal (time-of-day) sending patterns
  - Legitimate sending patterns may more closely track workday cycles

18 Differences in Diurnal Sending Patterns
- Use the local time at the sender's physical location
- Compare the relative percentages of messages sent at different times of the day (hourly)
- Observation: spammers send messages according to machine power cycles
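A minimal sketch of the hourly profile described above, assuming UTC timestamps and a known UTC offset for the sender's location (both inputs are assumptions made for illustration).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def hourly_profile(utc_timestamps, utc_offset_hours):
    """Fraction of a sender's messages observed in each local hour (0-23)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter(datetime.fromtimestamp(ts, tz).hour for ts in utc_timestamps)
    total = sum(counts.values())
    return [counts[h] / total for h in range(24)]

# Example: three messages from a sender whose local time zone is UTC-5.
print(hourly_profile([1193059200, 1193062800, 1193144400], -5))
```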

19 Status of Service Ports
- Ports supported by the service provider
- Intuition:
  - Legitimate email is sent from other domains' MSAs (Mail Submission Agents)
  - Bots send spam directly to victim domains

20 Distribution of the Number of Open Ports
- Actively probe senders' IPs to check which service ports are open
- Sampled IPs for testing in October 2008 and January 2009
- Observation: legitimate mail tends to originate from machines with open ports
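A minimal sketch of such a probe: attempt TCP connections to a handful of common service ports. The port list and timeout are illustrative rather than the paper's exact setup, and probing should only be run against hosts you are authorized to scan.

```python
import socket

def probe_open_ports(ip, ports=(21, 22, 25, 80, 110, 143, 443, 587), timeout=2.0):
    """Return the subset of ports that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass  # closed, filtered, or unreachable
    return open_ports

# Example (documentation-range address; replace with a host you may scan):
# print(probe_open_ports("192.0.2.10"))
```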

21 AS of the Sender's IP
- Intuition: some ISPs may host more spammers than others
- Observation: a significant portion of spammers come from a relatively small collection of ASes
  - More than 10% of unique spamming IPs originate from only 3 ASes
  - The top 20 ASes host ~42% of spamming IPs
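A minimal sketch of how such a concentration figure could be computed, assuming a hypothetical ip_to_asn lookup (in practice this mapping would come from BGP routing data).

```python
from collections import Counter

def top_as_concentration(spam_ips, ip_to_asn, top_n=20):
    """Fraction of unique spamming IPs that originate from the top_n ASes."""
    per_as = Counter(ip_to_asn[ip] for ip in set(spam_ips) if ip in ip_to_asn)
    total = sum(per_as.values())
    top = sum(count for _, count in per_as.most_common(top_n))
    return top / total if total else 0.0

# Toy example with a made-up IP-to-ASN mapping:
ip_to_asn = {"198.51.100.1": 65001, "198.51.100.2": 65001, "203.0.113.9": 65002}
print(top_as_concentration(["198.51.100.1", "198.51.100.2", "203.0.113.9"], ip_to_asn, top_n=1))
```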

22 Number of Recipients
- Intuition: spam is sent to multiple addresses
- Observation: spam tends to have more recipients
  - Over 94% of spam and 96% of ham is sent to a single recipient

23 Message Size
- Intuition: a spammer mostly sends similar content in all of its messages, so message sizes vary little
- Observation: legitimate mail has more variable message sizes

24 Summary of SNARE Features
- A total of 13 features in use

25 SNARE: Building a Classifier
- RuleFit (ensemble learning): F(x) = a_0 + sum_k a_k * f_k(x)
  - F(x) is the prediction result (label score)
  - f_k(x) are the base learners (usually simple rules)
  - a_k are the linear coefficients
- Example: see the sketch below
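A minimal sketch of this kind of scorer under the formula above: a few hand-written threshold rules combined linearly. The rules, coefficients, intercept, and feature names are purely illustrative and are not the ones RuleFit learns for SNARE.

```python
def rule_far_sender(x):          # geodesic distance above some cutoff
    return 1.0 if x["geo_distance_km"] > 4000 else 0.0

def rule_dense_neighborhood(x):  # many nearby senders in IP space
    return 1.0 if x["avg_knn_ip_distance"] < 1000 else 0.0

def rule_no_open_ports(x):       # no service ports answered the probe
    return 1.0 if x["num_open_ports"] == 0 else 0.0

BASE_LEARNERS = [rule_far_sender, rule_dense_neighborhood, rule_no_open_ports]
COEFFS = [0.8, 1.2, 0.9]   # a_1 .. a_K (illustrative)
INTERCEPT = -1.5           # a_0 (illustrative)

def snare_score(x):
    """F(x) = a_0 + sum_k a_k * f_k(x); positive scores lean toward spam."""
    return INTERCEPT + sum(a * f(x) for a, f in zip(COEFFS, BASE_LEARNERS))

example = {"geo_distance_km": 5200, "avg_knn_ip_distance": 300, "num_open_ports": 0}
print(snare_score(example))  # 1.4 > 0, so this sender would be flagged as spam-like
```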

26 Evaluation Setup
- Data
  - 14-day data, October 22 to November 4, 2007
  - Messages sampled each day (only certain spam and certain ham are considered)
- Training
  - Train the SNARE classifier with equal amounts of spam and ham (30,000 in each category per day)
- Temporal cross-validation
  - Slide a 3-day training window through the trace and test on the following 1 day
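A minimal sketch of this sliding-window split, assuming the trace has already been grouped into an ordered list of per-day datasets (the day labels here are placeholders).

```python
def temporal_splits(days, train_window=3):
    """Yield (train_days, test_day) pairs over an ordered list of per-day data."""
    for i in range(train_window, len(days)):
        yield days[i - train_window:i], days[i]

days = [f"2007-10-{d:02d}" for d in range(22, 32)]  # placeholder day labels
for train, test in temporal_splits(days):
    print("train on", train, "-> test on", test)
```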

27 Receiver Operating Characteristic (ROC)
- False positive rate = misclassified ham / actual ham
- Detection rate (true positive rate) = detected spam / actual spam
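A minimal sketch of these two quantities computed from boolean labels and predictions (True = spam); the toy inputs are illustrative.

```python
def roc_point(labels, predictions):
    """Return (false_positive_rate, detection_rate) for one classifier setting."""
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    actual_ham = sum(1 for y in labels if not y)
    actual_spam = sum(1 for y in labels if y)
    false_positive_rate = fp / actual_ham   # misclassified ham / actual ham
    detection_rate = tp / actual_spam       # detected spam / actual spam
    return false_positive_rate, detection_rate

print(roc_point([True, True, False, False], [True, False, False, True]))  # (0.5, 0.5)
```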

28 Detection of "Fresh" Spammers
- "Fresh" senders: IP addresses not appearing in the previous training windows
- Accuracy: fixing the detection rate at 70%, the false positive rate is 5.2%

29 A Spam-Filtering System
- Email arrival: a single packet or a full message
- Whitelisting: reduces the false positive rate
- Greylisting and content-based detection: messages in the ham-like queue get higher priority
- Retraining: with newly labeled data
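A minimal sketch of how these stages could fit together; the routing labels, threshold, and the idea of passing the score function as a parameter are my own illustration, not the paper's implementation.

```python
def route_message(sender_ip, features, score_fn, whitelist, threshold=0.0):
    """Decide how to handle an arriving message based on its SNARE-style score."""
    if sender_ip in whitelist:
        return "deliver"                      # whitelisting reduces false positives
    if score_fn(features) > threshold:
        return "greylist_and_content_scan"    # spam-like: lower priority, deeper checks
    return "ham_queue"                        # ham-like: higher-priority delivery
```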

30 Future Work
- Combine SNARE with other anti-spam techniques for better performance
  - Can SNARE capture spam that other methods (e.g., content-based filters) miss?
- Make SNARE more evasion-resistant
  - Can SNARE still work well under intentional evasion by spammers?

31 Conclusion
- Network-level features are effective at distinguishing spammers from legitimate senders
  - Lightweight: sometimes a single packet is enough
  - More robust: spammers would find it hard to change all of these patterns, particularly without somewhat reducing the effectiveness of their spamming botnets
- SNARE is designed to automatically detect spammers
  - A good first line of defense