Detecting and Characterizing Social Spam Campaigns Yan Chen Lab for Internet and Security Technology (LIST) Northwestern Univ.

2 Detecting and Characterizing Social Spam Campaigns: Roadmap
Motivation & Goal
Detection System Design
Experimental Validation
Malicious Activity Analysis
Conclusions

4 Motivation
Online social networks (OSNs) are exceptionally useful collaboration and communication tools for millions of Internet users.
–400M active users for Facebook alone
–Facebook surpassed Google as the most visited website

5 Motivation
Unfortunately, the trusted communities in OSNs can become highly effective channels for spreading miscreant activities.
–Popular OSNs have recently become the target of phishing attacks
–Account credentials are already being sold online in underground forums

6 Goal
In this study, our goal is to:
–Design a systematic approach that can effectively detect miscreant activities in the wild in popular OSNs.
–Quantitatively analyze and characterize the verified detection result to provide a better understanding of these attacks.

8 Detection System Design
Overview of the system design, which starts from raw data collection and ends with accurate classification of malicious wall posts and the corresponding users.

9 Data Collection
Based on “wall” messages crawled from Facebook (crawling periods: Apr. 2009 to Jun. 2009 and Sept. 2009).
Leveraging unauthenticated regional networks, we recorded the crawled users’ profiles, friend lists, and interaction records going back to January 1, [year missing].
187M wall posts with 3.5M recipients are used in this study.

10 Filter posts without URLs
Assumption: All spam posts should contain some form of URL, since the attacker wants the recipient to go to some destination on the web.
Example (without URL), filtered out: Kevin! Lol u look so good tonight!!!

11 Filter posts without URLs
Assumption: All spam posts should contain some form of URL, since the attacker wants the recipient to go to some destination on the web.
Example (with URL), kept for further processing: Um maybe also this: Guess who your secret admirer is?? Go here nevasubevd . blogs pot . co m (take out spaces)
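
The URL filter above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `contains_url`, the regular expressions, and the short list of top-level domains are all hypothetical. The second pass strips whitespace so that spaced-out URLs like the example above still count as containing a URL.

```python
import re

# Hypothetical patterns (assumptions, not from the slides):
# explicit links, and bare "name.tld" tokens for a few common TLDs.
_LINK = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
_BARE_DOMAIN = re.compile(
    r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|us|info|biz)\b",
    re.IGNORECASE,
)


def contains_url(post: str) -> bool:
    """Return True if the post appears to contain a URL, plain or spaced out."""
    if _LINK.search(post) or _BARE_DOMAIN.search(post):
        return True
    # Second pass: remove all whitespace so "blogs pot . co m" becomes
    # "blogspot.com". This can over-match ordinary sentences; it is a sketch.
    squeezed = re.sub(r"\s+", "", post)
    return bool(_BARE_DOMAIN.search(squeezed))


posts = [
    "Kevin! Lol u look so good tonight!!!",
    "Guess who your secret admirer is?? Go here "
    "nevasubevd . blogs pot . co m (take out spaces)",
]
# Keep only posts that carry some form of URL for further processing.
remaining = [p for p in posts if contains_url(p)]
```

A real filter would need a much broader TLD list and a more careful treatment of false positives than this two-pass check.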

12 Build Post Similarity Graph
After filtering out wall posts without URLs, we build the post similarity graph on the remaining ones.
–A node: a remaining wall post
–An edge: connects two wall posts that are “similar” and are thus likely to be generated from the same spam campaign

13 Wall Post Similarity Metric
Two wall posts are “similar” if:
–They share similar descriptions, or
–They share the same URL.
Example (similar descriptions):
Guess who your secret admirer is?? Go here nevasubevd . blogs pot . co m (take out spaces)
Guess who your secret admirer is?? Visit: yes-crush . com (remove spaces)
Establish an edge!

14 Wall Post Similarity Metric
Two wall posts are “similar” if:
–They share similar descriptions, or
–They share the same URL.
Example (same URL):
secret admirer revealed. goto yourlovecalc . com (remove the spaces)
hey see your love compatibility ! go here yourlovecalc . com (remove spaces)
Establish an edge!
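
The similarity test can be sketched as below. The slides do not say how description similarity is measured, so this sketch substitutes a Jaccard score over word sets with a hypothetical threshold of 0.5. The `Post` type and the function names are illustrative assumptions, and the URL is assumed to be already extracted and de-obfuscated.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    description: str  # wall post text with the URL removed (assumed input)
    url: str          # URL after de-obfuscation (assumed input)


def _words(text: str) -> set:
    """Lowercased word set of a description."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def description_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets (a stand-in metric)."""
    wa, wb = _words(a), _words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def are_similar(p: Post, q: Post, threshold: float = 0.5) -> bool:
    """Two posts are similar if they share the same URL or similar descriptions."""
    if p.url and p.url == q.url:
        return True
    return description_similarity(p.description, q.description) >= threshold


a = Post("Guess who your secret admirer is?? Go here", "nevasubevd.blogspot.com")
b = Post("Guess who your secret admirer is?? Visit:", "yes-crush.com")
c = Post("secret admirer revealed. goto", "yourlovecalc.com")
d = Post("hey see your love compatibility ! go here", "yourlovecalc.com")
```

Here `a` and `b` would be linked by their descriptions, while `c` and `d` would be linked only because they share a URL.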

15 Extract Wall Post Clusters
Intuition:
–If A and B are generated from the same spam campaign, and B and C are generated from the same spam campaign, then A, B and C are all generated from the same spam campaign.
We reduce the problem of extracting wall post clusters to identifying connected subgraphs inside the post similarity graph.
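
Finding connected subgraphs can be done with a breadth-first search from every unvisited node. The following is a minimal sketch, assuming posts are numbered 0 to `num_nodes - 1` and the similarity edges are already computed; the function name and input layout are assumptions made for illustration.

```python
from collections import deque


def connected_components(num_nodes: int, edges: list) -> list:
    """Group node ids into clusters, one per connected subgraph."""
    adjacency = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    seen = [False] * num_nodes
    clusters = []
    for start in range(num_nodes):
        if seen[start]:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen[start] = True
        queue = deque([start])
        cluster = []
        while queue:
            node = queue.popleft()
            cluster.append(node)
            for neighbor in adjacency[node]:
                if not seen[neighbor]:
                    seen[neighbor] = True
                    queue.append(neighbor)
        clusters.append(sorted(cluster))
    return clusters


# Posts 0-1 and 1-2 are similar, so 0, 1 and 2 form one cluster.
clusters = connected_components(5, [(0, 1), (1, 2)])
```

Each node and edge is visited once, so the cost is linear in the size of the graph once the edges exist.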

16 Extract Wall Post Clusters
A sample wall post similarity graph and the corresponding clustering process (for illustrative purposes only).

17 Identify Malicious Clusters
The following heuristics are used to distinguish malicious clusters (spam campaigns) from benign ones:
–Distributed property: the cluster is posted by at least n distinct users.
–Bursty property: the median interval between consecutive wall posts is less than t.

18 Identify Malicious Clusters
A sample process of distinguishing malicious clusters from benign ones (for illustrative purposes only). Each cluster is tested against the condition from_user >= n && interval <= t: clusters that satisfy it are labeled malicious, and all others are labeled benign.

19 Identify Malicious Clusters
(6, 3 hr) is found to be a good (n, t) value by testing TP:FP rates on the borderline.
Slightly modifying the value has only a minor impact on the detection result.
Sensitivity test: varying the threshold from (6, 3 hr) to (4, 6 hr) results in only a 4% increase in the clusters classified as malicious.
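
The two heuristics combine into one check per cluster, sketched below with the (6, 3 hr) threshold as the default. The `WallPost` type and field names are assumptions. The sketch uses `<=` for the interval test, following the condition shown in the illustrative figure, whereas the heuristic text says "less than t".

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class WallPost:
    sender: str       # id of the posting user (assumed field)
    timestamp: float  # posting time in seconds (assumed field)


def is_malicious_cluster(posts: list, n: int = 6, t_seconds: float = 3 * 3600) -> bool:
    """Apply the distributed and bursty heuristics to one cluster."""
    # Distributed property: at least n distinct senders.
    if len({p.sender for p in posts}) < n:
        return False
    # Bursty property: median gap between consecutive posts is at most t.
    times = sorted(p.timestamp for p in posts)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    if not intervals:
        return False
    return median(intervals) <= t_seconds


burst = [WallPost(f"user{i}", 60.0 * i) for i in range(6)]
few_senders = [WallPost(f"user{i % 2}", 60.0 * i) for i in range(6)]
slow = [WallPost(f"user{i}", 86400.0 * i) for i in range(6)]
```

With these inputs, only `burst` satisfies both properties: `few_senders` has two distinct users, and `slow` has a median gap of one day.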

21 Experimental Validation
The validation focuses on the detected URLs.
A rigorous set of approaches is adopted to confirm that the detected URLs are malicious.
Any URL that cannot be confirmed by any approach is assumed to be “benign” (a false positive).

22 Experimental Validation
Step 1: Obfuscated URL
–URLs embedded with obfuscation are considered malicious, since benign users have no incentive to obfuscate them.
–Detecting obfuscated URLs, e.g., replacing ‘.’ with “dot” (1lovecrush dot com) or inserting white spaces (abbykywyty . blogs pot . co m).
–A complete list of such obfuscation techniques is taken from anti-spam research.
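
The two obfuscation patterns named in Step 1 can be detected by normalizing the text and checking whether a domain appears only after normalization. This is a sketch under assumptions: the function names, the regular expressions, and the short TLD list are hypothetical, and it covers only these two patterns, not the complete list the slide refers to.

```python
import re

# Hypothetical patterns (assumptions, not from the slides).
_DOT_WORD = re.compile(r"\s+dot\s+", re.IGNORECASE)
_DOMAIN = re.compile(
    r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|us|info|biz)\b",
    re.IGNORECASE,
)


def deobfuscate(candidate: str) -> str:
    """Undo the two obfuscations: the word "dot" and inserted white space."""
    without_dot_word = _DOT_WORD.sub(".", candidate)
    return re.sub(r"\s+", "", without_dot_word)


def is_obfuscated_url(candidate: str) -> bool:
    """True if a domain is visible only after de-obfuscation."""
    if _DOMAIN.search(candidate):
        return False  # already a plain URL
    return bool(_DOMAIN.search(deobfuscate(candidate)))


samples = [
    "1lovecrush dot com",
    "abbykywyty . blogs pot . co m",
    "mynewcrsh.com",
]
flags = [is_obfuscated_url(s) for s in samples]
```

The function expects a candidate URL fragment rather than a whole wall post, because stripping white space from full sentences would produce spurious matches.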

23 Experimental Validation
Step 2: Third-party tools
–Multiple tools are used, including McAfee SiteAdvisor; Google’s Safe Browsing API; URL blacklists (SURBL, URIBL, Spamhaus, SquidGuard); and Wepawet for drive-by-download checking.
–Any URL that is classified as “malicious” by at least one of these tools is confirmed as malicious.

24 Experimental Validation
Step 3: Redirection analysis
–Any URL that redirects to a confirmed malicious URL is also considered “malicious”.
Step 4: Wall post keyword search
–If the wall post contains typical spam keywords, such as “viagra”, “enlarger pill”, or “legal bud”, the contained URL is considered “malicious”.
–Human assistance is involved in acquiring such keywords.
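
The redirection step amounts to following each redirect chain and marking every URL on it once the chain reaches a confirmed malicious URL. The sketch below assumes the redirects have already been resolved into a dictionary, so no network access is involved; the function name and data layout are hypothetical.

```python
def propagate_malicious(redirects: dict, confirmed: set) -> set:
    """Return confirmed URLs plus every URL whose redirect chain reaches one.

    `redirects` maps a URL to the URL it redirects to (assumed pre-resolved).
    """
    malicious = set(confirmed)
    for start in redirects:
        chain = []
        visited = set()
        url = start
        # Follow the chain until it ends, loops, or hits a malicious URL.
        while url is not None and url not in visited:
            visited.add(url)
            chain.append(url)
            if url in malicious:
                malicious.update(chain)
                break
            url = redirects.get(url)
    return malicious


redirects = {
    "short.example/a": "hop.example/b",
    "hop.example/b": "bad.example/c",
    "short.example/x": "fine.example/y",
    "loop.example/p": "loop.example/q",
    "loop.example/q": "loop.example/p",
}
result = propagate_malicious(redirects, {"bad.example/c"})
```

The `visited` set keeps the walk from spinning forever on redirect loops such as the last two entries.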

25 Experimental Validation
Step 5: URL grouping
–Groups of URLs exhibit highly uniform features. If some URLs in a group have already been confirmed as “malicious”, the rest of the group is also considered “malicious”.
–Human assistance is involved in identifying such groups.
Step 6: Manual analysis
–We use the Google search engine to confirm that URLs appearing many times in our trace are malicious.

26 Experimental Validation
The validation result. Each row gives the number of confirmed URLs and wall posts in a given step. The total number of wall posts after filtering is ~2M out of 187M.

28 Usage summary of 3 URL Formats
3 different URL formats (with examples):
–Link:
–Plain text: mynewcrsh.com
–Obfuscated: nevasubevu . blogs pot . co m

29 Usage summary of 4 Domain Types
4 different domain types (with examples):
–Content sharing service: imageshack.us
–URL shortening service: tinyurl.org
–Blog service: blogspot.com
–Other: yes-crush.com

30 Spam Campaign Identification

31 Spam Campaign Temporal Correlation

32 Attack Categorization
The attacks categorized by purpose. “Narcotics”, “pharma” and “luxury” denote attacks that sell the corresponding products.

33 User Interaction Degree
Malicious accounts exhibit a higher interaction degree than benign ones.

34 User Active Time
Active time is measured as the time between the first and last observed wall post made by the user.
Malicious accounts exhibit much shorter active times than benign ones.
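
The active-time measure defined above is the difference between a user's latest and earliest post timestamps. A minimal sketch follows, assuming the posts arrive as `(user, timestamp)` pairs; that layout and the function name are illustrative assumptions.

```python
def active_times(posts: list) -> dict:
    """Map each user to the time between their first and last observed post."""
    first = {}
    last = {}
    for user, timestamp in posts:
        first[user] = min(timestamp, first.get(user, timestamp))
        last[user] = max(timestamp, last.get(user, timestamp))
    return {user: last[user] - first[user] for user in first}


observed = [("alice", 10), ("alice", 50), ("bob", 7), ("alice", 30)]
durations = active_times(observed)
```

A user with a single observed post gets an active time of zero under this definition.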

35 Wall Post Hourly Distribution
The hourly distribution of benign posts is consistent with the human diurnal pattern, while that of malicious posts is not.
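
An hourly distribution like the one compared here can be computed by bucketing post timestamps by hour of day and normalizing. The sketch below assumes Unix timestamps and buckets them in UTC, which is a simplification: a diurnal analysis would more plausibly use each user's local time. The function name is hypothetical.

```python
from datetime import datetime, timezone


def hourly_distribution(timestamps: list) -> list:
    """Fraction of posts falling in each hour of the day (UTC, 24 buckets)."""
    counts = [0] * 24
    for ts in timestamps:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        counts[hour] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 24
    return [count / total for count in counts]


# One post in hour 0 and two posts in hour 1 (UTC).
distribution = hourly_distribution([0, 3600, 3700])
```

Running it separately on benign and malicious posts gives the two curves to compare.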

37 Conclusions
We design automated techniques to detect coordinated spam campaigns on Facebook.
Based on the detection result, we conduct an in-depth analysis of the malicious activities and make interesting discoveries, including:
–Over 70% of attacks are phishing attacks.
–Malicious posts do not exhibit human diurnal patterns.
–etc.

38 Thank you!

39 Extract Wall Post Clusters
The algorithm for wall post clustering. The details of breadth-first search (BFS) are omitted.