
Improving Spam Detection Based on Structural Similarity
By Luiz H. Gomes, Fernando D. O. Castro, Rodrigo B. Almeida, Luis M. A. Bettencourt, Virgílio A. F. Almeida, Jussara M. Almeida
Presented at the Steps to Reducing Unwanted Traffic on the Internet Workshop, 2005
Presented by Jared Bott

2 Outline
- Overview
- Concepts
- Detecting Spam
- Experimental Results
- Analysis of Paper

3 Overview
- New algorithm to detect spam messages
- Uses information that is harder for spammers to change
- Works in conjunction with another spam classifier
  - e.g., SpamAssassin
- Fewer false positives than the compared methods

4 Spam Detection Problem
- Spam detection algorithms use some part of e-mails to determine whether a message is spam
  - Spammers change their messages so that they no longer meet the detection criteria for spam
  - It is very easy to change spam messages, usernames, domains, subjects, etc.

5 Key Idea
- The lists of addresses that spammers and legitimate users send messages to and receive messages from can be used as identifiers of classes of traffic
  - The lists of addresses spammers send to are unlikely to be similar to those of legitimate users
  - These lists don't change very often

6 Using Lists
- A user is not just an e-mail address; it can also be a domain, etc.
- Represent each user as a vector in a multi-dimensional conceptual space built from all possible contacts
  - Each sender and each recipient has its own vector
- Model the relationship between senders and recipients
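The slide notes that a "user" need not be a single address; it can also be a domain. A minimal sketch of that choice, assuming a hypothetical `user_id` helper that maps each address to whatever unit is used as a vector dimension (the slides do not say how this mapping is done):

```python
def user_id(address: str, level: str = "address") -> str:
    """Map an e-mail address to the 'user' unit (hypothetical helper).

    level="address" keeps the full (lower-cased) address;
    level="domain" keeps only the part after the last '@'.
    """
    address = address.strip().lower()
    if level == "address":
        return address
    if level == "domain":
        return address.rsplit("@", 1)[-1]
    raise ValueError(f"unknown level: {level!r}")


print(user_id("Alice@Example.org"))            # alice@example.org
print(user_id("Alice@Example.org", "domain"))  # example.org
```

Choosing the domain level merges all addresses of one organization into a single dimension, which shrinks the space at the cost of coarser vectors.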

7 Constructing Vectors
- If at least one e-mail was sent from sender $s_i$ to recipient $r_n$, the $n$th dimension of $s_i$'s vector is 1; otherwise it is 0.
- If at least one e-mail was received by recipient $r_i$ from sender $s_n$, the $n$th dimension of $r_i$'s vector is 1; otherwise it is 0.
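A short sketch of the construction rule, assuming the log is available as hypothetical `(sender, recipient)` pairs. Because the vectors are binary and sparse, each one is stored as the set of dimensions whose value is 1:

```python
from collections import defaultdict


def build_vectors(log):
    """Build sparse binary sender and recipient vectors from (sender, recipient) pairs.

    senders[s] is the set of recipients s mailed at least once;
    recipients[r] is the set of senders r received from at least once.
    """
    senders = defaultdict(set)
    recipients = defaultdict(set)
    for sender, recipient in log:
        senders[sender].add(recipient)     # s_i's dimension for r_n becomes 1
        recipients[recipient].add(sender)  # r_n's dimension for s_i becomes 1
    return dict(senders), dict(recipients)


log = [
    ("alice@a.org", "bob@b.org"),
    ("alice@a.org", "carol@c.org"),
    ("alice@a.org", "bob@b.org"),  # a repeat does not change a binary vector
    ("spam@x.biz", "bob@b.org"),
]
senders, recipients = build_vectors(log)
print(sorted(senders["alice@a.org"]))   # ['bob@b.org', 'carol@c.org']
print(sorted(recipients["bob@b.org"]))  # ['alice@a.org', 'spam@x.biz']
```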

8 Example Vectors

9 Similarity Between Senders
- The similarity between senders $s_i$ and $s_k$ is the cosine of the angle between their vectors, $\cos(s_i, s_k)$, which lies between 0 and 1 because the vectors are non-negative
  - 0 means no shared contacts
  - 1 means identical contact lists
- In legitimate e-mail, a similarity of 1 means that the senders operate in the same social group.
- Among spammers, a similarity of 1 means that the senders use the same list or are the same person.
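For binary vectors the cosine reduces to counting: the dot product is the number of shared contacts and each norm is the square root of the number of contacts. A minimal sketch, assuming vectors are stored as sets of contacts and treating an empty vector as similarity 0 (a convention chosen here, not stated on the slides):

```python
import math


def cosine_binary(a: set, b: set) -> float:
    """Cosine between two binary vectors stored as sets of their 1-dimensions."""
    if not a or not b:
        return 0.0  # assumption: a sender with no contacts is similar to no one
    # dot product = shared contacts; |v| = sqrt(number of contacts)
    return len(a & b) / math.sqrt(len(a) * len(b))


s1 = {"bob", "carol"}
s2 = {"bob", "carol"}
s3 = {"dave"}
s4 = {"bob", "dave"}
print(cosine_binary(s1, s2))  # 1.0  (identical contact lists)
print(cosine_binary(s1, s3))  # 0.0  (no shared contact)
print(cosine_binary(s1, s4))  # 0.5
```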

10 Grouping Users Into Clusters
- Group users with similar vectors
  - Users with similar vectors are likely to have related roles, i.e., spammer or legitimate user
- Each cluster is represented by a vector
  - This vector is the sum of the vectors of all its member users
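The slides do not say which clustering procedure groups the users. The sketch below swaps in a simple single-pass greedy grouping with a hypothetical `threshold` parameter, purely to show the one property the slide does state: each cluster vector is the sum of its members' vectors.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse, non-negative vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_sq = sum(w * w for w in u.values()) * sum(w * w for w in v.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def greedy_clusters(users: dict, threshold: float) -> list:
    """Hypothetical greedy grouping (not the paper's procedure).

    Each user joins the most similar existing cluster if that similarity is
    at least `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # list of (member names, summed cluster vector)
    for name, contacts in users.items():
        vec = Counter({c: 1 for c in contacts})
        best, best_sim = None, 0.0
        for idx, (_, cvec) in enumerate(clusters):
            sim = cosine(cvec, vec)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= threshold:
            members, cvec = clusters[best]
            members.append(name)
            cvec.update(vec)  # cluster vector += member vector
        else:
            clusters.append(([name], Counter(vec)))
    return clusters


users = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"p", "q"}, "d": {"x", "y", "z"}}
clusters = greedy_clusters(users, threshold=0.5)
print([members for members, _ in clusters])  # [['a', 'b', 'd'], ['c']]
```

A single pass is order-dependent; it is used here only because it is short, and any method that groups users by vector similarity would fit the slide.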

11 Similarity Between a User and a Cluster
- Derived from the user-to-user similarity equation
  - If sender $s_i$ is a member of cluster $sc_k$, the similarity is $\cos(sc_k - s_i, s_i)$.
  - If sender $s_i$ is not a member of cluster $sc_k$, the similarity is $\cos(sc_k, s_i)$.
- The similarity between a user and a cluster changes over time
  - When computing similarity and reclassifying a user, remove the user's vector from the cluster's vector, so that the user's own contribution does not inflate its similarity to its current cluster
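A sketch of the two cases, assuming sparse vectors stored as `collections.Counter` objects. To show why the member case subtracts $s_i$, the example also applies the non-member formula to a member and gets a higher, inflated value:

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse, non-negative vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_sq = sum(w * w for w in u.values()) * sum(w * w for w in v.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def user_cluster_similarity(user: Counter, cluster: Counter, is_member: bool) -> float:
    """cos(sc_k - s_i, s_i) for a member, cos(sc_k, s_i) for a non-member."""
    if is_member:
        rest = cluster.copy()
        rest.subtract(user)  # take the user's own vector out of the cluster sum
        rest = +rest         # unary + keeps only positive counts
        return cosine(rest, user)
    return cosine(cluster, user)


s_i = Counter({"x": 1, "y": 1})
other = Counter({"x": 1, "z": 1})
sc_k = s_i + other  # cluster vector = sum of its two members
with_removal = user_cluster_similarity(s_i, sc_k, is_member=True)
without_removal = user_cluster_similarity(s_i, sc_k, is_member=False)
print(round(with_removal, 3), round(without_removal, 3))  # 0.5 0.866
```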

12 Detecting Spam
- Two probabilities to compute:
  - $P_s(m)$: probability that e-mail $m$ was sent by a spammer
  - $P_r(m)$: probability that e-mail $m$ is addressed to users who receive spam

13 Detecting Spam
- When an e-mail arrives, classify it using some other (auxiliary) method
- Find the cluster ($sc$) to which the e-mail's sender belongs
  - If many users in the cluster send messages that the auxiliary method classifies as spam, the probability that each user in that cluster sends spam is high
- Update $sc$'s spam probability
- $P_s(m) \leftarrow$ spam probability of $sc$
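The slides do not give the formula for a cluster's spam probability. A minimal sketch, assuming it is estimated as the fraction of the cluster's messages that the auxiliary classifier labeled spam (with 0.5 before any message is seen); `ClusterStats` and `sender_probability` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class ClusterStats:
    """Hypothetical running counts for one cluster."""
    spam: int = 0
    total: int = 0

    def update(self, aux_says_spam: bool) -> None:
        self.total += 1
        self.spam += int(aux_says_spam)

    @property
    def spam_probability(self) -> float:
        # Assumption: fraction of the cluster's messages that the auxiliary
        # classifier labeled spam; 0.5 (undecided) before any message is seen.
        return self.spam / self.total if self.total else 0.5


def sender_probability(cluster_id: str, stats: dict, aux_says_spam: bool) -> float:
    """Fold the auxiliary verdict into the sender cluster, then return P_s(m)."""
    cs = stats.setdefault(cluster_id, ClusterStats())
    cs.update(aux_says_spam)
    return cs.spam_probability


stats = {}
probs = [sender_probability("sc1", stats, v) for v in (True, True, False)]
print([round(p, 3) for p in probs])  # [1.0, 1.0, 0.667]
```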

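14 Detecting Spam
- For each recipient of the e-mail, find the cluster ($rc$) it belongs to
- Update the spam probability of each such cluster
- $P_r(m) \leftarrow P_r(m) +$ spam probability of each $rc$
- $P_r(m) \leftarrow P_r(m) /$ number of recipients
  - i.e., $P_r(m)$ is the average spam probability of the recipients' clusters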
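The two update lines amount to averaging. A sketch, assuming hypothetical lookup tables `cluster_of` (recipient to cluster id) and `cluster_spam_prob` (cluster id to its already-updated spam probability):

```python
def recipient_probability(recipients, cluster_of, cluster_spam_prob):
    """P_r(m): average spam probability over the clusters of m's recipients."""
    if not recipients:
        raise ValueError("an e-mail needs at least one recipient")
    p_r = 0.0
    for r in recipients:
        p_r += cluster_spam_prob[cluster_of[r]]  # P_r(m) <- P_r(m) + P(rc)
    return p_r / len(recipients)                 # P_r(m) <- P_r(m) / #recipients


cluster_of = {"bob": "rc1", "carol": "rc2"}
cluster_spam_prob = {"rc1": 0.9, "rc2": 0.3}
p = recipient_probability(["bob", "carol"], cluster_of, cluster_spam_prob)
print(round(p, 2))  # 0.6
```

Recipients that share a cluster are each counted once per recipient, so a message sent to many members of a spam-receiving cluster pulls $P_r(m)$ toward that cluster's probability.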

15 Detecting Spam
- Compute a spam rank for the e-mail based on $P_r(m)$ and $P_s(m)$
- If the spam rank is above a threshold $\omega$, label the e-mail as spam
- If the spam rank is below $1 - \omega$, label it as legitimate
- Otherwise, label the e-mail with the auxiliary method's classification
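The slides say only that the spam rank is "based upon" $P_r(m)$ and $P_s(m)$. The sketch below uses their plain average as a placeholder (an assumption, not the paper's formula) and requires $0.5 < \omega \le 1$ so that the two thresholds cannot overlap; the default `omega=0.8` is likewise hypothetical.

```python
def classify(p_s: float, p_r: float, aux_label: str, omega: float = 0.8) -> str:
    """Three-way decision from the slides with a placeholder spam rank.

    Assumption: spam_rank = (p_s + p_r) / 2.
    """
    if not 0.5 < omega <= 1.0:
        raise ValueError("omega must be in (0.5, 1]")
    spam_rank = (p_s + p_r) / 2
    if spam_rank > omega:
        return "spam"
    if spam_rank < 1 - omega:
        return "legitimate"
    return aux_label  # uncertain region: keep the auxiliary classifier's verdict


print(classify(0.95, 0.90, aux_label="legitimate"))  # spam
print(classify(0.05, 0.10, aux_label="spam"))        # legitimate
print(classify(0.60, 0.40, aux_label="spam"))        # spam (auxiliary verdict kept)
```

Only messages whose rank falls outside the band $[1 - \omega, \omega]$ override the auxiliary classifier, so raising $\omega$ makes the method defer to the auxiliary classifier more often.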


17 Experimental Results
- Tested on an eight-day e-mail log from a large Brazilian university
- Tested on a 2.8 GHz Pentium 4 with 512 MB of RAM
  - Able to classify 20 messages per second
  - Faster than the average peak message arrival rate

18 Results
Measure                   Non-Spam   Spam      Aggregate
# of e-mails              191,417    173,584   365,001
Size of e-mails           11.3 GB    1.2 GB    12.5 GB
# of distinct senders     12,338     19,567    27,734
# of distinct recipients  22,762     27,926    38,875
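A quick arithmetic check on the table, combined with the eight-day log length from the previous slide. The shares and the mean arrival rate below are derived here, not reported on the slides:

```python
# Counts and sizes copied from the results table (non-spam vs. spam).
emails = {"non_spam": 191_417, "spam": 173_584}
size_gb = {"non_spam": 11.3, "spam": 1.2}

total_emails = sum(emails.values())
total_gb = sum(size_gb.values())
spam_msg_share = emails["spam"] / total_emails
spam_byte_share = size_gb["spam"] / total_gb

# Mean arrival rate over the eight-day log, in messages per second.
avg_rate = total_emails / (8 * 24 * 60 * 60)

print(total_emails)                                      # 365001
print(f"spam share of messages: {spam_msg_share:.1%}")   # 47.6%
print(f"spam share of bytes:    {spam_byte_share:.1%}")  # 9.6%
print(f"mean arrival rate: {avg_rate:.2f} msg/s")        # 0.53
```

Spam is close to half of the messages but under a tenth of the bytes. A mean of about 0.53 messages per second sits well below the reported 20 messages per second, although the slides compare against the peak rate, which is not given.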

19 Results
- Manually checked the false positives to see whether they were actually spam
  - The auxiliary algorithm had more false positives
Algorithm                 % of Misclassifications
Original classification   60.33%
Their approach            39.67%

20 Strengths
- Fewer false positives than SpamAssassin
- Low cost
- Works with message information that doesn't change much

21 Weaknesses
- Needs an additional message classifier, e.g., SpamAssassin
- Requires manual tuning of the algorithm

22 Improvements
- Time correlation of similar addresses
- Collaborative filtering based on user feedback