Towards Online Spam Filtering in Social Networks Hongyu Gao, Yan Chen, Kathy Lee, Diana Palsetia and Alok Choudhary Lab for Internet and Security Technology.

Slides:



Advertisements
Similar presentations
Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Advertisements

Network Security Highlights Nick Feamster Georgia Tech.
Detecting Spam Zombies by Monitoring Outgoing Messages Zhenhai Duan Department of Computer Science Florida State University.
LEARNING INFLUENCE PROBABILITIES IN SOCIAL NETWORKS Amit Goyal Francesco Bonchi Laks V. S. Lakshmanan University of British Columbia Yahoo! Research University.
Influence and Passivity in Social Media Daniel M. Romero, Wojciech Galuba, Sitaram Asur, and Bernardo A. Huberman Social Computing Lab, HP Labs.
Detecting and Characterizing Social Spam Campaigns Hongyu Gao, Jun Hu, Christo Wilson, Zhichun Li, Yan Chen and Ben Y. Zhao Northwestern University, US.
Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks Yehonatan Cohen Daniel Gordon Danny Hendler Ben-Gurion University Yehonatan.
Design and Evaluation of a Real-Time URL Spam Filtering Service
Detecting Phantom Nodes in Wireless Sensor Networks Joengmin Hwang Tian He Yongdae Kim Department of Computer Science, University of Minnesota, Minneapolis.
Design and Evaluation of a Real- Time URL Spam Filtering Service Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, Dawn Song University of California,
Hongyu Gao, Tuo Huang, Jun Hu, Jingnan Wang.  Boyd et al. Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication,
UNDERSTANDING VISIBLE AND LATENT INTERACTIONS IN ONLINE SOCIAL NETWORK Presented by: Nisha Ranga Under guidance of : Prof. Augustin Chaintreau.
BotMiner Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee College of Computing, Georgia Institute of Technology.
Statistical Analysis of the Social Network and Discussion Threads in Slashdot Vicenç Gómez, Andreas Kaltenbrunner, Vicente López Defended by: Alok Rakkhit.
1 BotGraph: Large Scale Spamming Botnet Detection Yao Zhao EECS Department Northwestern University.
Mining Behavior Models Wenke Lee College of Computing Georgia Institute of Technology.
PageRank Identifying key users in social networks Student : Ivan Todorović, 3231/2014 Mentor : Prof. Dr Veljko Milutinović.
On Distinguishing between Internet Power Law B Bu and Towsley Infocom 2002 Presented by.
BotGraph: Large Scale Spamming Botnet Detection Yao Zhao Yinglian Xie *, Fang Yu *, Qifa Ke *, Yuan Yu *, Yan Chen and Eliot Gillum ‡ EECS Department,
Spam Detection Jingrui He 10/08/2007. Spam Types  Spam Unsolicited commercial  Blog Spam Unwanted comments in blogs  Splogs Fake blogs.
GAYATRI SWAMYNATHAN, CHRISTO WILSON, BRYCE BOE, KEVIN ALMEROTH AND BEN Y. ZHAO UC SANTA BARBARA Do Social Networks Improve e-Commerce? A Study on Social.
An Effective Defense Against Spam Laundering Paper by: Mengjun Xie, Heng Yin, Haining Wang Presented at:CCS'06 Presentation by: Devendra Salvi.
SocialFilter: Introducing Social Trust to Collaborative Spam Mitigation Michael Sirivianos Telefonica Research Telefonica Research Joint work with Kyungbaek.
1. Introduction The underground Internet economy Web-based malware The system analyzing the post-infection network behavior of web-based malware How do.
Large-Scale Cost-sensitive Online Social Network Profile Linkage.
Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine Shuang Hao, Nadeem Ahmed Syed, Nick Feamster, Alexander G. Gray,
WARNINGBIRD: A Near Real-time Detection System for Suspicious URLs in Twitter Stream.
BotNet Detection Techniques By Shreyas Sali
Network and Systems Security By, Vigya Sharma (2011MCS2564) FaisalAlam(2011MCS2608) DETECTING SPAMMERS ON SOCIAL NETWORKS.
SURF:SURF: Detecting and Measuring Search Poisoning Long Lu, Roberto Perdisci, and Wenke Lee Georgia Tech and University of Georgia.
Suspended Accounts in Retrospect: An Analysis of Twitter Spam Kurt Thomas, Chris Grier, Vern Paxson, Dawn Song University of California, Berkeley International.
1 Measurement and Classification of Humans and Bots in Internet Chat By Steven Gianvecchio, Mengjun Xie, Zhenyu Wu, and Haining Wang College of William.
Understanding Cross-site Linking in Online Social Networks Yang Chen 1, Chenfan Zhuang 2, Qiang Cao 1, Pan Hui 3 1 Duke University 2 Tsinghua University.
Department of Information Engineering The Chinese University of Hong Kong A Framework for Monitoring and Measuring a Large-Scale Distributed System in.
Jhih-sin Jheng 2009/09/01 Machine Learning and Bioinformatics Laboratory.
Uncovering Social Network Sybils in the Wild Zhi YangChristo WilsonXiao Wang Peking UniversityUC Santa BarbaraPeking University Tingting GaoBen Y. ZhaoYafei.
SOCIAL NETWORKS ANALYSIS SEMINAR INTRODUCTORY LECTURE #2 Danny Hendler and Yehonatan Cohen Advanced Topics in on-line Social Networks Analysis.
Technology Considerations for Spam Control 3 rd AP Net Abuse Workshop Busan Dave Crocker Brandenburg InternetWorking
May 30, 2016Department of Computer Sciences, UT Austin1 Using Bloom Filters to Refine Web Search Results Navendu Jain Mike Dahlin University of Texas at.
BotGraph: Large Scale Spamming Botnet Detection Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum Speaker: 林佳宜.
Zibin Zheng DR 2 : Dynamic Request Routing for Tolerating Latency Variability in Cloud Applications CLOUD 2013 Jieming Zhu, Zibin.
By Gianluca Stringhini, Christopher Kruegel and Giovanni Vigna Presented By Awrad Mohammed Ali 1.
Prediction of Influencers from Word Use Chan Shing Hei.
Spam ain’t as Diverse as It Seems: Throttling OSN Spam with Templates Underneath Hongyu Gao, Yi Yang, Kai Bu, Yan Chen, Doug Downey, Kathy Lee, Alok Choudhary.
Twitter Games: How Successful Spammers Pick Targets Vasumathi Sridharan, Vaibhav Shankar, Minaxi Gupta School of Informatics and Computing, Indiana University.
Date: 2015/11/19 Author: Reza Zafarani, Huan Liu Source: CIKM '15
The Koobface Botnet and the Rise of Social Malware Kurt Thomas David M. Nicol
Bradley Cowie Supervised by Barry Irwin Security and Networks Research Group Department of Computer Science Rhodes University DATA CLASSIFICATION FOR CLASSIFIER.
Socialbots and its implication On ONLINE SOCIAL Networks Md Abdul Alim, Xiang Li and Tianyi Pan Group 18.
Click to Add Title A Systematic Framework for Sentiment Identification by Modeling User Social Effects Kunpeng Zhang Assistant Professor Department of.
ICDCS 2014 Madrid, Spain 30 June-3 July 2014
We.b : The web of short URLs Demetris Antoniades, lasonas Polakis, Gerogios Kontaxis, Elias Athansapoulos, Sotiris loannidis, Evangelos P.Markatos, Thomas.
Detecting and Characterizing Social Spam Campaigns Yan Chen Lab for Internet and Security Technology (LIST) Northwestern Univ.
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
Social Turing Tests: Crowdsourcing Sybil Detection Gang Wang, Manish Mohanlal, Christo Wilson, Xiao Wang Miriam Metzger, Haitao Zheng and Ben Y. Zhao Computer.
Feature Selection Poonam Buch. 2 The Problem  The success of machine learning algorithms is usually dependent on the quality of data they operate on.
Content I. Introduction II. Unique features III. Pros & Cons IV. Extra benefits V. Tips & Conclusion.
Anomaly Detection. Network Intrusion Detection Techniques. Ştefan-Iulian Handra Dept. of Computer Science Polytechnic University of Timișoara June 2010.
Don’t Follow me : Spam Detection in Twitter January 12, 2011 In-seok An SNU Internet Database Lab. Alex Hai Wang The Pensylvania State University International.
Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida Universidade Federal de Minas Gerais Belo Horizonte, Brazil ACSAC 2010 Fabricio.
Distributed Network Monitoring in the Wisconsin Advanced Internet Lab Paul Barford Computer Science Department University of Wisconsin – Madison Spring,
An Effective Defense Against Spam Laundering Author: Mengjun Xie, Heng Yin, Haining Wang Presented At: CCS’ 06 Prepared By: Amit Shrivastava.
CrowdTarget: Target-based Detection of Crowdturfing in Online Social Networks Jenny (Bom Yi) Lee.
Uncovering Social Spammers: Social Honeypots + Machine Learning
Hummingbird: Privacy at the time of Twitter
Written by Qiang Cao, Xiaowei Yang, Jieqi Yu and Christopher Palow
Lab for Internet and Security Technology Yan Chen
Dieudo Mulamba November 2017
Binghui Wang, Le Zhang, Neil Zhenqiang Gong
GANG: Detecting Fraudulent Users in OSNs
Presentation transcript:

Towards Online Spam Filtering in Social Networks Hongyu Gao, Yan Chen, Kathy Lee, Diana Palsetia and Alok Choudhary Lab for Internet and Security Technology (LIST) Department of EECS Northwestern University

Background 2

3

4 Another Study in Spam Detection?? Unique characteristics of OSNs –Are existing features still effective? –Number of words –Average word length –Sender IP neighborhood density –Sender AS number –Status of sender’s service ports –…–… –Any new features? Not effective!

5 Goals and Existing Work An effort towards a system ready to deploy Existing studies in OSN spam: –[Gao IMC10, Grier CCS10] offline analysis –[Thomas Oakland11] landing page vs. message content –Numerous work in spammer-faked account detection  Online detection  High accuracy  Low latency  Detection of campaigns absent from training set  No need for frequent re-training

66 Detection System Design Evaluation Conclusions & Future Work Roadmap

77 We Do NOT: Inspect each message individually … Key Intuition msg_1msg_2msg_3msg_n Spam??

88 We Do: Inspect correlated message clusters Key Intuition msg_k msg_j msg_k msg_i Correlated messages?? Spam campaign??

9 System Overview Detect coordinated spam campaigns.

10 Incremental Clustering Requirement: –Given (k+1) th message and result of the first k messages –Efficiently compute the result of the (k+1) messages Adopt text shingling technique –Pros: High efficiency –Cons: Syntactic method

11 Feature Selection Feature selection criteria: –Cannot be easily maneuvered. –Grasp the commonality among campaigns. 6 identified features:  Sender social degree  Interaction history  Cluster size  Average time interval  Average URL #  Unique URL #

12 Detection System Design Evaluation Conclusions & Future Work Roadmap

13 All experiments obey the time order –First 25% as training set, last 75% as testing set. Evaluated metrics: Dataset and Method  Overall accuracy  Accuracy of feature subset  Accuracy over time  Accuracy under attack  Latency  Throughput SiteSizeSpam #Time Facebook187M217KJan ~ Jun Twitter17 M467KJun ~ Jul. 2011

14 Best result –FB: 80.9% TP 0.19%FP –TW: 69.8%TP 0.70%FP Overall Accuracy

15 No significant drop of TP or increase of FP Accuracy over Time

16 Latency Latency (ms)FacebookTwitter Mean Median3.17.0

17 Detection System Design Evaluation Conclusions & Future Work Roadmap

18 Conclusions We design an online spam filtering system based on spam campaigns. –Syntactical incremental clustering to identify message clusters –Supervised machine learning to classify message clusters We evaluate the system on both Facebook and Twitter data –187M wall posts, 17M tweets –80.9% TPR, 0.19% FPR, 21.5ms mean latency Prototype release:

19 Cool, I by no means noticed anyone do that prior to. {URL} Wow, I in no way noticed anyone just before. {URL} Amazing, I by no means found people do that just before. {URL} Future Work Cool, Iby no meansnoticedanyonedo thatprior to. {URL} Wow, Iin no waynoticedanyonejust before. {URL} Amazing, Iby no meansfoundpeopledo thatjust before. {URL} {Cool | Wow | Amazing} +, I + {by no means | in no way} + {noticed | found} + {anyone | people} + {do that | ε} + {prior to | just before} +. {URL} Template generation? Call for semantic clustering approaches

20 Thank you!

21 Contributions Design an online spam filtering system to deploy as a component of the OSN platform. –High accuracy –Low latency –Tolerance for incomplete training data –No need for frequent re-training Release the system –

22 Incremental Clustering shingle_1 shingle_2 shingle_3 … msg_11msg_13 msg_21msg_22msg_23 msg_31msg_33msg_32 msg_12 … … … msg_new shingle_i shingle_k shingle_j … Compare and Insert

23 Sender Social Degree Compromised accounts: –The more edges, with a higher probability the node will be infected quickly by an epidemic. Spammer accounts: –Social degree limits communication channels. Hypothesis: –Senders of spam clusters have higher average social degree than those of legitimate message clusters.

24 Sender Social Degree Average social degree of spam and legitimate clusters, respectively.

25 Interaction History Legitimate accounts: –Normally only interact with a small subset of its friends. Spamming accounts: –Desire to push spam messages to as many recipients as possible. Hypothesis: –Spam messages are more likely to be interactions between friends that rarely interact with before.

26 Interaction History Interaction history score of spam and legitimate clusters, respectively.

27 Scalability –300M tweets/day –Map-reduce style and cloud computing? Other Thoughts