You Are How You Click Clickstream Analysis for Sybil Detection Gang Wang, Tristan Konolige, Christo Wilson †, Xiao Wang ‡ Haitao Zheng and Ben Y. Zhao.

Slides:



Advertisements
Similar presentations
Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Advertisements

Defending against large-scale crawls in online social networks Mainack Mondal Bimal Viswanath Allen Clement Peter Druschel Krishna Gummadi Alan Mislove.
Mining User Similarity Based on Location History Yu Zheng, Quannan Li, Xing Xie Microsoft Research Asia.
Clickstream Models & Sybil Detection Gang Wang ( 王刚 ) UC Santa Barbara
RB-Seeker: Auto-detection of Redirection Botnet Presenter: Yi-Ren Yeh Authors: Xin Hu, Matthew Knysz, Kang G. Shin NDSS 2009 The slides is modified from.
TransAD: A Content Based Anomaly Detector Sharath Hiremagalore Advisor: Dr. Angelos Stavrou October 23, 2013.
Qiang Cao Duke University
ABUSING BROWSER ADDRESS BAR FOR FUN AND PROFIT - AN EMPIRICAL INVESTIGATION OF ADD-ON CROSS SITE SCRIPTING ATTACKS Presenter: Jialong Zhang.
Fighting Fire With Fire: Crowdsourcing Security Solutions on the Social Web Christo Wilson Northeastern University
Worm Origin Identification Using Random Moonwalks Yinglian Xie, V. Sekar, D. A. Maltz, M. K. Reiter, Hui Zhang 2005 IEEE Symposium on Security and Privacy.
Design and Evaluation of a Real- Time URL Spam Filtering Service Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, Dawn Song University of California,
Context-aware Query Suggestion by Mining Click-through and Session Data Authors: H. Cao et.al KDD 08 Presented by Shize Su 1.
Enabling the Social Web Krishna P. Gummadi Networked Systems Group Max Planck Institute for Software Systems.
Asking Questions on the Internet
Topology Generation Suat Mercan. 2 Outline Motivation Topology Characterization Levels of Topology Modeling Techniques Types of Topology Generators.
Hongyu Gao, Tuo Huang, Jun Hu, Jingnan Wang.  Boyd et al. Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication,
UNDERSTANDING VISIBLE AND LATENT INTERACTIONS IN ONLINE SOCIAL NETWORK Presented by: Nisha Ranga Under guidance of : Prof. Augustin Chaintreau.
1 BotGraph: Large Scale Spamming Botnet Detection Yao Zhao EECS Department Northwestern University.
Unsupervised Intrusion Detection Using Clustering Approach Muhammet Kabukçu Sefa Kılıç Ferhat Kutlu Teoman Toraman 1/29.
Scalable Network Distance Browsing in Spatial Database Samet, H., Sankaranarayanan, J., and Alborzi H. Proceedings of the 2008 ACM SIGMOD international.
BotGraph: Large Scale Spamming Botnet Detection Yao Zhao Yinglian Xie *, Fang Yu *, Qifa Ke *, Yuan Yu *, Yan Chen and Eliot Gillum ‡ EECS Department,
GAYATRI SWAMYNATHAN, CHRISTO WILSON, BRYCE BOE, KEVIN ALMEROTH AND BEN Y. ZHAO UC SANTA BARBARA Do Social Networks Improve e-Commerce? A Study on Social.
Fighting Fire With Fire: Crowdsourcing Security Threats and Solutions on the Social Web Gang Wang, Christo Wilson, Manish Mohanlal, Ben Y. Zhao Computer.
SocialFilter: Introducing Social Trust to Collaborative Spam Mitigation Michael Sirivianos Telefonica Research Telefonica Research Joint work with Kyungbaek.
Zifei Shan, Haowen Cao, Jason Lv, Cong Yan, Annie Liu Peking University, China 1.
Automated malware classification based on network behavior
資安新聞簡報 報告者:劉旭哲、曾家雄. Spam down, but malware up 報告者:劉旭哲.
WARNINGBIRD: A Near Real-time Detection System for Suspicious URLs in Twitter Stream.
Social Media Attacks By Laura Jung. How the Attacks Start Popularity of these sites with millions of users makes them perfect places for cyber attacks.
Authors: Gianluca Stringhini Christopher Kruegel Giovanni Vigna University of California, Santa Barbara Presenter: Justin Rhodes.
University of California at Santa Barbara Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy, and Ben Zhao.
GEOMETRIC VIEW OF DATA David Kauchak CS 451 – Fall 2013.
FaceBook and Your Business Women in Technology in Nigeria Presented by Mrs M.O Alade Women in Technology in Nigeria
EASEAndroid: Automatic Analysis and Refinement for SEAndroid Policy via Large-scale Audit Log Analytics Presenter: Hongyang Zhao Ruowen Wang, Xinwen Zhang,
Using Identity Credential Usage Logs to Detect Anomalous Service Accesses Daisuke Mashima Dr. Mustaque Ahamad College of Computing Georgia Institute of.
Man vs. Machine: Adversarial Detection of Malicious Crowdsourcing Workers Gang Wang, Tianyi Wang, Haitao Zheng, Ben Y. Zhao, UC Santa Barbara, Usenix Security.
Collusion-Resistance Misbehaving User Detection Schemes Speaker: Jing-Kai Lou 2015/10/131.
Jhih-sin Jheng 2009/09/01 Machine Learning and Bioinformatics Laboratory.
Uncovering Social Network Sybils in the Wild Zhi YangChristo WilsonXiao Wang Peking UniversityUC Santa BarbaraPeking University Tingting GaoBen Y. ZhaoYafei.
Leveraging Asset Reputation Systems to Detect and Prevent Fraud and Abuse at LinkedIn Jenelle Bray Staff Data Scientist Strata + Hadoop World New York,
BotGraph: Large Scale Spamming Botnet Detection Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum Speaker: 林佳宜.
By Gianluca Stringhini, Christopher Kruegel and Giovanni Vigna Presented By Awrad Mohammed Ali 1.
Computer Science Department, Peking University
Click to edit Master subtitle style 2/23/10 Time and Space Optimization of Document Content Classifiers Dawei Yin, Henry S. Baird, and Chang An Computer.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Security Analytics Thrust Anthony D. Joseph (UCB) Rachel Greenstadt (Drexel), Ling Huang (Intel), Dawn Song (UCB), Doug Tygar (UCB)
Group 4 1.Maithili Gokhale 2.Swati Sisodia 3.Aman Chanana 4.Piyush Agade “Uncovering Social Network Sybils in the Wild” - Zhi Yang, Christo Wilson, Xio.
Date: 2015/11/19 Author: Reza Zafarani, Huan Liu Source: CIKM '15
Socialbots and its implication On ONLINE SOCIAL Networks Md Abdul Alim, Xiang Li and Tianyi Pan Group 18.
Cryptography and Network Security Sixth Edition by William Stallings.
Authors: Yazan Boshmaf, Lldar Muslukhov, Konstantin Beznosov, Matei Ripeanu University of British Columbia Annual Computer Security Applications Conference.
Crowd Fraud Detection in Internet Advertising Tian Tian 1 Jun Zhu 1 Fen Xia 2 Xin Zhuang 2 Tong Zhang 2 Tsinghua University 1 Baidu Inc. 2 1.
Social Turing Tests: Crowdsourcing Sybil Detection Gang Wang, Manish Mohanlal, Christo Wilson, Xiao Wang Miriam Metzger, Haitao Zheng and Ben Y. Zhao Computer.
Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms Zhichun Li 1, Lanjia Wang 2, Yan Chen 1 and Judy Fu 3 1 Lab.
Anomaly Detection. Network Intrusion Detection Techniques. Ştefan-Iulian Handra Dept. of Computer Science Polytechnic University of Timișoara June 2010.
Don’t Follow me : Spam Detection in Twitter January 12, 2011 In-seok An SNU Internet Database Lab. Alex Hai Wang The Pensylvania State University International.
CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks (WWW2013) BEUTEL, ALEX, WANHONG XU, VENKATESAN GURUSWAMI, CHRISTOPHER.
Gang Wang, Sarita Y. Schoenebeck †, Haitao Zheng, Ben Y. Zhao UC Santa Barbara, † University of Michigan Understanding Bias and Misbehavior on Location-based.
Uncovering Social Network Sybils in the Wild Zhi YangChristo WilsonXiao Wang Peking UniversityUC Santa BarbaraPeking University Tingting GaoBen Y. ZhaoYafei.
GRAPH AND LINK MINING 1. Graphs - Basics 2 Undirected Graphs Undirected Graph: The edges are undirected pairs – they can be traversed in any direction.
Social Media Attacks.
BotCatch: A Behavior and Signature Correlated Bot Detection Approach
Lab for Internet and Security Technology Yan Chen
Dieudo Mulamba November 2017
Bolun Wang*, Yuanshun Yao, Bimal Viswanath§ Haitao Zheng, Ben Y. Zhao
Binghui Wang, Le Zhang, Neil Zhenqiang Gong
Graph-based Security and Privacy Analytics via Collective Classification with Joint Weight Learning and Propagation Binghui Wang, Jinyuan Jia, and Neil.
GANG: Detecting Fraudulent Users in OSNs
Introduction to Internet Worm
When Machine Learning Meets Security – Secure ML or Use ML to Secure sth.? ECE 693.
Presentation transcript:

You Are How You Click Clickstream Analysis for Sybil Detection Gang Wang, Tristan Konolige, Christo Wilson †, Xiao Wang ‡ Haitao Zheng and Ben Y. Zhao UC Santa Barbara † Northeastern University ‡ Renren Inc.

Sybils in Online Social Networks Sybil (s ɪ bəl): fake identities controlled by attackers –Friendship is a pre-cursor to other malicious activities –Does not include benign fakes (secondary accounts) Large Sybil populations * Million Sybils (August, 2012) 20 Million Sybils (April, 2013) 20 Mutual Friends * Numbers from CNN 2012, NYT 2013

Sybil Attack: a Serious Threat Social spam –Advertisement, malware, phishing Steal user information Sybil-based political lobbying efforts 3 Malicious URL Taliban uses sexy Facebook profiles to lure troops into giving away military secrets

Sybil Defense: Cat-and-Mouse Game 4 Social Networks Attackers Stop automated account creation CAPTCHA Detect suspicious profiles Spam features, URL blacklists User report Detect Sybil communities [SIGCOMM’06], [Oakland’08], [NDSS’09], [NSDI’12] Crowdsourcing CAPTCHA solving [USENIX’10] Realistic profile generation Complete bio info, profile pic [WWW’12]

Graph-based Sybil Detectors A key assumption –Sybils have difficulty “friending” normal users –Sybils form tight-knit communities Measuring Sybils in Renren social network [IMC’11] –Ground-truth 560K Sybils collected over 3 years –Most Sybils befriend real users, integrate into real-user communities –Most Sybils don’t befriend other Sybils 5 Is This True? Sybils don’t need to form communities! SybilReal

Sybil detection with static profiles analysis [NDSS’13] –Leverage human intuition to detect fake profiles (crowdsourcing) –Successful user-study shows it scales well with high accuracy Profile-based detection has limitations –Some profiles are easy to mimic (e.g. CEO profile ) –Information can be found online A new direction: look at what users do! –How users browse/click social network pages –Build user behavior models using clickstreams Sybil Detection Without Graphs 6

Clickstreams and User Behaviors Clickstream: a list of server-side user-generated events –E.g. profile load, link follow, photo browse, friend invite Intuition: Sybil users act differently from normal users –Goal-oriented: concentrate on specific actions –Time-limited: fast event generation (small inter-arrival time) UserIDEvent GeneratedTimestamp Send Friend Request_ Visit Profile_ ……… 7 Analyze ground-truth clickstreams for Sybil detectio n

Outline Motivation Clickstream Similarity Graph –Ground-truth Dataset –Modeling User Clickstreams –Generating Behavioral Clusters Real-time Sybil Detection 8

Ground-truth Dataset Renren Social Network –A large online social network in China (280M+ users) –Chinese Facebook Ground-truth –Ground-truth provided by Renren’s security team –16K users, clickstreams over two months in 2011, 6.8M clicks 9 * Our study is IRB approved. DatasetUsersSessionsClicksDate (2011) Sybil9,994113,5951,008,031Feb.28-Apr.30 Normal5,998467,1795,856,941Mar.31-Apr.30

Normal users use many social network features Sybils focus on a few actions (e.g. friend invite, browse profiles) Basic Analysis: Click Transitions Sybil Clickstream Friend Invite Photo Browse Profiles InitialFinal 89% 91% 57% 38%7% 34% 44% 6%4% 5% Spammers Crawlers 10 Normal Clickstream Photo InitialFinal 39% 4% Share Blog Notification Browse Profiles 7% 14% 25% 31% 19% 13% 31% 46% 47% 31% 42% 21% 16% 17% 93% 33% 11% Sybils and normal users have very different click patterns!

Identifying Sybils From Normal Users Goal: quantify the differences in user behaviors –Measure the similarity between user clickstreams Approach: map user’s clickstreams to a similarity graph –Clickstreams are nodes –Edge-weights indicate the similarity of two clickstreams Clusters in the similarity graph capture user behaviors –Each cluster represents certain type of click/behavior pattern –Hypothesis: Sybils and normal users fall into different clusters 11

Legit Sybils ①Clickstream Log ③Behavior Clusters ? Unknown User Clickstream ②Similarity Graph ④Labeled Clusters Good Clusters Sybil Cluster Model TrainingDetection 12

Capturing User Clickstreams 1.Click Sequence Model: order of click events –e.g. ABCDA … 2.Time-based Model: sequence of inter-arrival time –e.g. {t 1, t 2, t 3, …} 3.Complete Model: sequence of click events with time –e.g. A(t 1 )B(t 2 )C(t 3 )D(t 4 )A … 13 User1: User2: XXXXXX Time ABCDAA XX DB XX AA XX EA XXXXXX BBCDEC XX BA XX BD XX DE

Clickstream Similarity Functions Similarity of sequences –Common subsequence –Common subsequence with counts Adding “time” to the sequence –Bucketize inter-arrival time, encode time into the sequence –Apply the same sequence similarity function ngram 1 = {A, B, AA, AB, AAB} ngram 2 = {A, C, AA, AC, AAC} S 1 = AAB S 2 = AAC ngram1= { A(2), B(1), AA(1), AB(1), AAB(1) } ngram2= { A(2), C(1), AA(1), AC(1), AAC(1) } S 1 = AAB S 2 = AAC Euclidean Distance V 1 =(2,1,0,1,0,1,1,0) V 2 =(2,0,1,1,1,0,0,1) 14

Clickstream Clustering Similarity graph (fully-connected) –Nodes: user’s clickstreams –Edges: weighted by the similarity score of two users’ clickstreams Clustering similar clickstreams together –Minimum edge weight cut –Graph partitioning using METIS Perform clustering on ground-truth data –Complete model produces very accurate behavior clusters –3% false negatives and 1% false positives 15 Sybils in normal clusters Normal users in Sybil clusters

Outline Motivation Clickstream Similarity Graph Real-time Sybil Detection –Sybil Detection Using Similarity Graph –Unsupervised Approach 16

Detection in a Nutshell ? Normal Sybil 17 Fastest, scalable Sybil detection methodology –Assign the unclassified clickstream to the “nearest” cluster –If the nearest cluster is a Sybil cluster, then the user is a Sybil Assigning clickstreams to clusters –K nearest neighbor (KNN) –Nearest cluster (NC) –Nearest cluster with center (NCC) New ClickstreamsClustered Similarity Graph

Detection Evaluation Split 12K clickstreams into training and testing datasets –Train initial clusters with 3K Sybil + 3K normal users –Classify remaining 6K testing clickstreams 18 NCC (fastest) is as good as the others < 0.7% false positive rate K-nearest neighbor Nearest Cluster Nearest Cluster (center)

400 random good users are enough to color all behavior clusters For unknown dataset, add good users until diminishing returns Still achieve high detection accuracy (1% fp, 4% fn) (Semi) unsupervised Approach What if we don’t have a big ground-truth dataset? –Need a method to label clusters Use a (small) set of known-good users to color clusters –Adding known users to existing clusters –Clusters that contain good users are “good” clusters 19 Good Clusters Sybil Cluster Known Good Users Details here

Real-world Experiments Deploy system prototypes onto social networks –Shipped our prototype code to Renren and LinkedIn –All user data remained on-site 20 Scanned 40K ground-truth user’s clickstreams Flagged 200 previous unknown Sybils Scanned 1M user’s clickstreams Flagged 22K suspicious users Identified a new attack “Image” Spammers  Embed spam content in images  Easy to evade text/URL based detectors

Evasion and Challenges In order to evade our system, Sybils may … –Slow down their click speed –Generate “normal” actions as cover traffic Practical challenges –How to update behavior clusters over time (incrementally)? –How to integrate with other existing detection techniques? (e.g. profile, content based detectors ) 21 Force Sybils to mimic normal users = Win

Thank You! Questions? 22