Group 4 1. Maithili Gokhale 2. Swati Sisodia 3. Aman Chanana 4. Piyush Agade “Uncovering Social Network Sybils in the Wild” - Zhi Yang, Christo Wilson, Xiao Wang, Tingting Gao, Ben Y. Zhao, Yafei Dai

The Renren Network o Renren is one of the most popular OSNs in China, with 220 million users. o Functions: users maintain personal profiles, upload photos, write diary entries (blogs), and establish bidirectional social links with friends. o The most popular type of user activity is sharing blog entries, which can be forwarded across social hops like “retweets” on Twitter.

What are Sybils? o Sybils are fake identities created to unfairly increase the power or resources of a single malicious user. o Sybil accounts on Renren blend in extremely well with normal users so that they can attract friends and disseminate advertisements. o They have completely filled-in user profiles with realistic background information, coupled with attractive profile photos. o As its user population has grown, Renren has become an attractive venue for companies to disseminate information about their products. o This has created opportunities for Sybil accounts to spam advertisements for companies.

Previous detectors on Renren o Renren had already deployed a few techniques to detect Sybil accounts: using thresholds to detect spamming; scanning content for suspect keywords and blacklisted URLs; and giving Renren users the ability to flag accounts and content as abusive. o Disadvantages of these techniques: they are generally ad hoc, require significant human effort, and are effective only after spam content has been posted.

Identifying Malicious Activities o Definition: Malicious activities are actions taken by an attacker that directly or indirectly support a monetization strategy. o Example: targeting users with spam and phishing attacks. o The definition does not cover legitimate monetization strategies, such as keyword, banner, or news-feed advertising. o In order for attackers to reach a user on OSNs, the attacker must first be friends with that user.

Profiles that are NOT considered o Benign fake accounts: Although an attacker could create benign Sybils that behave identically to normal users and appear on the surface to be real, we are only interested in detecting Sybil accounts that perform attacks. o Inactive accounts: Determining whether an inactive account is a malicious Sybil is challenging because there is no behavioral data (e.g., friend requests, status updates). The goal of the detector is to catch these accounts as quickly as possible once they become active, to minimize the damage they can do to normal users.

Characterizing Sybil Accounts The features that would help distinguish Sybil accounts from normal users are: o Invitation Frequency o Outgoing Requests Accepted o Incoming Requests Accepted o Clustering Coefficient

Characterizing Sybil Accounts 1) Invitation Frequency o The number of friend requests that a user has sent within a fixed time period. o The figure shows the friend invitation frequency of our dataset, averaged over long-term (400-hour) and short-term (1-hour) time scales. o Sybil accounts are much more aggressive in sending requests than normal users. There is a clear separation: accounts sending more than 20 invites per time interval are Sybils.

Characterizing Sybil Accounts 2) Outgoing Requests Accepted o The fraction of outgoing friend requests confirmed by the recipient. o The figure shows a distinct difference between Sybils and normal users. o Non-Sybil users have high accepted percentages, with an average of 79%. o On average, only 26% of all friend requests sent by Sybil accounts are accepted.

Characterizing Sybil Accounts 3) Incoming Requests Accepted o The fraction of incoming friend requests that a user accepts. o Sybil accounts are nearly uniform: 80% of Sybils accepted all incoming friend requests. o The incoming requests accepted by non-Sybil users are spread across the board. o Because Sybil accounts receive few friend requests, a detection mechanism based on this feature can incur significant delay.

Characterizing Sybil Accounts 4) Clustering Coefficient o A graph metric that measures the mutual connectivity of a user’s friends. o Sybil accounts are likely to befriend users who have no mutual friendships. o The figure plots the CDF of cc values for each user’s first 50 friends (sorted by time). o Non-Sybil users have cc values orders of magnitude larger than Sybil users.
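As a rough illustration of the metric on this slide, the following minimal sketch computes a local clustering coefficient: the fraction of pairs of a user's friends that are themselves friends. The adjacency dictionary, the account names, and the function name are hypothetical; the slides do not say how Renren computes the value.

```python
from itertools import combinations


def clustering_coefficient(user, friends_of):
    """Fraction of pairs of `user`'s friends that are friends with each other.

    `friends_of` maps each account to the set of its friends (undirected graph).
    Returns 0.0 when the user has fewer than two friends.
    """
    friends = friends_of.get(user, set())
    k = len(friends)
    if k < 2:
        return 0.0
    linked_pairs = sum(
        1 for a, b in combinations(friends, 2) if b in friends_of.get(a, set())
    )
    return linked_pairs / (k * (k - 1) / 2)


# Hypothetical toy graph: "alice" has friends who know each other,
# while "sybil" has befriended three unrelated strangers.
graph = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
    "sybil": {"u1", "u2", "u3"},
    "u1": {"sybil"},
    "u2": {"sybil"},
    "u3": {"sybil"},
}

alice_cc = clustering_coefficient("alice", graph)
sybil_cc = clustering_coefficient("sybil", graph)
```

In this toy graph one of the three pairs of alice's friends is linked, while none of the sybil account's friends know each other, which mirrors the separation the slide describes.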

Building and Running a Sybil Detector o A Support Vector Machine (SVM) classifier is applied to a dataset of 1,000 normal users and 1,000 Sybils. o Partition: the data is split into five subsamples, four for training the classifier and one for testing it. o The results show that the classifier is very accurate, correctly identifying 99% of both Sybil and non-Sybil accounts. o Threshold values: outgoing requests accepted % < 0.5 ∧ frequency > 20 ∧ cc < 0.01 o A properly tuned threshold-based detector can achieve performance similar to the computationally expensive SVM.
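The threshold-based alternative to the SVM can be sketched as a simple conjunction of per-account checks. This is only an illustration: the cutoffs 20 and 0.01 appear on the slide, while the 0.5 acceptance cutoff, the `AccountStats` fields, and the function name are assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    """Per-account features (hypothetical field names)."""
    invites_per_interval: float      # friend requests sent per time interval
    outgoing_accept_ratio: float     # fraction of sent requests that were accepted
    clustering_coefficient: float    # cc over the account's friends


def is_suspected_sybil(stats, max_accept=0.5, min_invites=20, max_cc=0.01):
    """Flag an account only if all three conditions hold (logical AND)."""
    return (
        stats.outgoing_accept_ratio < max_accept
        and stats.invites_per_interval > min_invites
        and stats.clustering_coefficient < max_cc
    )


# Hypothetical accounts: an aggressive inviter with few acceptances,
# and an ordinary user whose friends mostly know each other.
aggressive = AccountStats(invites_per_interval=45, outgoing_accept_ratio=0.26,
                          clustering_coefficient=0.001)
ordinary = AccountStats(invites_per_interval=2, outgoing_accept_ratio=0.79,
                        clustering_coefficient=0.12)

flags = [is_suspected_sybil(aggressive), is_suspected_sybil(ordinary)]
```

Keeping the cutoffs as parameters reflects the slide's point that the thresholds need tuning; a deployed detector would adjust them rather than hard-code them.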

Real time Sybil Detection o Uses the ground-truth dataset to build an adaptive, threshold-based Sybil detector. o Monitors the characteristics of Sybil accounts. o After the detector has been bootstrapped, it uses an adaptive feedback scheme (drawn from the customer complaint rate) to dynamically tune the threshold parameters on the fly. o Tuning the thresholds minimizes the likelihood of false-positive classifications of normal accounts as Sybils. o It is unlikely to detect Sybils that behave like normal users. o Drawback: it will not catch benign or inactive Sybils. Inactive Sybils will not be detected until after they begin friending normal users.

Real time Sybil Detection o The detector incorporates real-time changes in friendship links when calculating acceptance percentages. o In some cases, normal users accept friend requests from Sybils only to later revoke the friendship. This causes the accept percentage for the Sybil to drop. o When Renren bans Sybils, all of their edges are destroyed. o This causes the acceptance percentages for other Sybils with which they are linked to drop. o In both cases, the decrease in acceptance percentage helps the detector to more accurately detect Sybils.

False Positives

Analysis of Structural and Behavioral Attributes of Sybils 1.Topological Analysis 2. Clickstream Analysis

Topological Analysis [Figure: graph of honest nodes and Sybil nodes, with normal edges, Sybil edges, and attack edges]

Topological Analysis: Community Detection o Given the network structure, is it possible to detect Sybil nodes? o Community detection algorithms work under the assumption that Sybils form tight-knit communities.

Topological Analysis o Normal users follow the same general trend as Sybil users. o Only 20% of Sybils have one or more Sybil edges.

Topological Analysis o Community detection is not a viable option for the majority of Sybils. o Is it still possible that the connected minority is vulnerable to community detection? o Is this edge creation intentional?

Topological Analysis o Most Sybil edge creation is interspersed randomly with edges created to normal users. o For each Sybil, sequence of edges is plotted, with the edges sorted chronologically by creation time.

Topological Analysis o Majority of Sybils do not form communities. o Even the Sybil Edges that are formed are unintentional.

Clickstream Analysis o Each click is characterized by USER ID : TIMESTAMP : URL. o Clicks were grouped into five categories: Photo, Message, Share, Friending, Profile. Various aspects of the clickstream were analyzed: o Number of clicks in each category. o Sequence of clicks within a session. o Session duration: time between the first and last click. o Session frequency: how often a user logs in.

Clickstream Analysis: Session Frequency o Sixty-four percent of normal users access Renren no more than once per day. o Only 8% of Sybils fall in this low-frequency range. o Sybils averaged 3.9 sessions per day versus 1.5 for normal users.

Clickstream Analysis: Session Duration o The median session duration for normal users is 6 minutes, whereas the median for Sybils is 48 seconds. o Fewer than 25% of normal-user sessions are 48 seconds or shorter. o A very small percentage of Sybils exhibit sessions that are hours long.

Clickstream Analysis Click Activity

Clickstream Analysis: Clickstream Modelling To analyze the sequence of clicks from normal and Sybil nodes, a Markov model was created. o Each state represents a click category. o Initial and final states are added to mark the beginning and end of each click sequence. o Each edge represents the probability of transition from one state to the next.

Clickstream Analysis

There is a stark difference in the click activity, click sequences, and sessions of normal and Sybil users. Can this difference be leveraged?

Clickstream Analysis: SVM (Support Vector Machine) Train an SVM on the following clickstream features: o Session-level features: average session length and average sessions per day. o Features from click activities: percentage of clicks in each category and transition probabilities between categories.

Clickstream Analysis: MLE (Maximum Likelihood Estimation) MLE classifies a user from its clickstream by examining which clickstream model better explains the user’s click sequence. For a click sequence $\{s_1, s_2, \ldots, s_n\}$, the individual likelihood $P_M(s_i, s_{i+1})$ is the probability that the user transits from category $s_i$ to category $s_{i+1}$ according to model $M$. The likelihood that model $M$ reproduces the given clickstream is the product of the individual likelihoods, $\prod_{i=1}^{n-1} P_M(s_i, s_{i+1})$.
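The Markov-model and MLE slides can be illustrated with a small sketch: estimate one transition table per class from labeled sessions, then label a new session by whichever model gives it the higher likelihood. The add-one smoothing, the use of log-likelihoods, the toy sessions, and all names below are assumptions for illustration, not details taken from the slides.

```python
import math
from collections import defaultdict

START, END = "<start>", "<end>"
CATEGORIES = ["photo", "message", "share", "friending", "profile"]


def train_markov(sessions, categories=CATEGORIES, smoothing=1.0):
    """Estimate transition probabilities, with start and end states added."""
    states = [START] + list(categories) + [END]
    counts = {s: defaultdict(float) for s in states}
    for session in sessions:
        path = [START] + list(session) + [END]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    model = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        # Smoothing (an assumption) keeps unseen transitions from having probability 0.
        model[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return model


def log_likelihood(model, session):
    """Sum of log transition probabilities, i.e. the log of the product."""
    path = [START] + list(session) + [END]
    return sum(math.log(model[a][b]) for a, b in zip(path, path[1:]))


def classify(session, normal_model, sybil_model):
    """Pick the model that better explains the click sequence."""
    if log_likelihood(sybil_model, session) > log_likelihood(normal_model, session):
        return "sybil"
    return "normal"


# Hypothetical training sessions for each class.
normal_model = train_markov([["photo", "photo", "profile", "photo"],
                             ["photo", "message", "photo"]])
sybil_model = train_markov([["friending", "friending", "friending"],
                            ["friending", "share", "friending"]])

label_a = classify(["friending", "friending"], normal_model, sybil_model)
label_b = classify(["photo", "photo", "photo"], normal_model, sybil_model)
```

Working in log space avoids numerical underflow on long click sequences while leaving the comparison between the two models unchanged.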

Spam Strategies and Collusion o Share Spam on Renren o Case Study: Spam Blogs o Content-Based Sybil Components o Temporal Correlation Between Sybils

Share Spam on Renren o Sybils predominantly disseminate spam by sharing links to spam content. o The number of shares per Sybil is much greater than the number of status updates or wall posts.

Share Spam on Renren o 25% of the 237K Sybils share only once before they are caught and banned. o Fewer than 1% of Sybils go uncaught long enough to share 100 or more links.

Share Spam on Renren o The shares of a random sample of 1,000 Sybils were manually examined. o Sybils on Renren share two types of links: o Blogs (62.5% of shares link to spam blog posts) o Videos (37.5% of shares link to bogus online videos)

Case Study: Spam Blogs o Classifying Spam Blogs o Identifying Collusion o Information Dissemination

Classifying Spam Blogs The subset of blogs shared by Sybils was manually verified to be spam. These blogs: o Include links to phishing sites. o Include links to websites selling contraband goods. o Were, in the majority of cases, banned by Renren’s security system.

Identifying Collusion o Fundamental question: are Sybils colluding to promote spam blogs, or is each Sybil operating independently? o Approach: calculate the amount of duplication among the spam blogs. o Only 302,333 unique spam blogs were promoted among the 3 million individual spam shares in the dataset.

Identifying Collusion o Top 30 spam blogs were shared more than 10,000 times. o 25% of spam blogs received 2 or more shares from Sybils.

Information Dissemination o Sybils collude so that the spam blogs get featured on the trending content section on Renren. o Sybils can inflate the popularity of spam blogs by making them artificially trend. o Currently, Renren relies on manual inspection by humans to filter spam out of the trending section.

Content-Based Sybil Components o Question: can content similarity be used to group Sybils into connected components? o Intuitively, a single attacker is likely to control strongly connected components. o Understanding these components makes it possible to estimate the number of attackers threatening Renren. o Collusion between Sybils is modeled as a content similarity graph.

Content-Based Sybil Components o In a content similarity graph, Sybils are nodes, and two Sybils are connected if they share similar content. o The content similarity s_ij between two Sybils is computed from the overlap between s_i and s_j, the sets of content shared by the two Sybils. o It ranges from 0 to 1, where o 0 means no duplication o 1 means the two Sybils share exactly the same content

Content-Based Sybil Components o Two Sybils i and j share similar content if s_ij is larger than some threshold T_s (or equal to T_s in the special case of T_s = 1). o T_s = 0 is the most lax threshold. o T_s = 1 is the strictest threshold. o For T_s = 1, more than 50% of Sybils have at least one Sybil partner forwarding exactly the same content.

Content-Based Sybil Components o The figure shows the quantity and sizes of connected components for different thresholds, ordered from largest to smallest.

T_s   Connected components   Giant component
0     4.9K                   219K (90%) Sybils
0.5   76K                    84K (35%) Sybils
1     114K                   3,700 Sybils
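The content similarity graph and its connected components can be sketched as follows. The slides do not show the similarity formula in text, so this sketch assumes a Jaccard index (intersection size over union size), which matches the stated range of 0 (no duplication) to 1 (identical content). The account names, the pairwise loop, and the union-find helper are illustrative choices, and an all-pairs comparison like this would not scale to hundreds of thousands of accounts.

```python
def jaccard(a, b):
    """Assumed similarity measure: |a ∩ b| / |a ∪ b|, or 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_similar(a, b, threshold):
    """Strictly above the threshold, or exactly 1 in the special case threshold = 1."""
    score = jaccard(a, b)
    return score == 1.0 if threshold == 1 else score > threshold


def connected_components(shared, threshold):
    """Group accounts whose shared-content sets are similar; largest group first."""
    parent = {u: u for u in shared}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    users = list(shared)
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if is_similar(shared[u], shared[v], threshold):
                parent[find(u)] = find(v)

    groups = {}
    for u in users:
        groups.setdefault(find(u), set()).add(u)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical accounts and the blog links each one shared.
shared = {
    "s1": {"blogA", "blogB"},
    "s2": {"blogA", "blogB"},
    "s3": {"blogB", "blogC"},
    "s4": {"blogD"},
}

sizes_lax = [len(c) for c in connected_components(shared, 0)]
sizes_mid = [len(c) for c in connected_components(shared, 0.5)]
sizes_strict = [len(c) for c in connected_components(shared, 1)]
```

As in the table, raising the threshold splits the large component into more, smaller ones: the lax threshold merges s1, s2, and s3, while the stricter thresholds keep only the identical pair together.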

Temporal Correlation Between Sybils o Are there temporal correlations between Sybils that exhibit content similarity? o We suspect that Sybils under the control of a single attacker will be active at similar times. o If t_i and t_j are the sets of links that two Sybils i and j share during time interval S, the temporal similarity between them is computed from the overlap between t_i and t_j.

Temporal Correlation Between Sybils o Temporal similarity ranges from 0 to 1, with 0 meaning no overlap and 1 meaning exact overlap. o The size of the time interval S can be varied to control the granularity of comparisons. o We evaluate time similarity over two time intervals: 1 hour and 1 day.

Temporal Correlation Between Sybils o Each line plots the average time similarity for discrete sets of Sybil pairs with close content similarity. o For example, the first point of the hour-scale line represents the average time similarity for all pairs of Sybils with content similarity in the range of 0 to 0.1.

Temporal Correlation Between Sybils o The figure reveals that time similarity is roughly proportional to content similarity. o Sybils that share similar content tend to do so at similar times. o At the 1-day interval, Sybils that share near-identical content also exhibit a time similarity of nearly 0.92.
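One way to make the temporal comparison concrete is to bucket each Sybil's share timestamps into slots of length S and measure the overlap of the occupied slots. The slides do not give the formula in text, so the Jaccard-style overlap, the slot bucketing, and the sample timestamps below are all assumptions chosen to match the stated 0-to-1 range.

```python
def occupied_slots(timestamps, interval_seconds):
    """Map share timestamps (seconds) to the set of time-slot indices they fall in."""
    return {int(t // interval_seconds) for t in timestamps}


def temporal_similarity(times_i, times_j, interval_seconds):
    """Assumed measure: overlap of occupied slots, |A ∩ B| / |A ∪ B|."""
    a = occupied_slots(times_i, interval_seconds)
    b = occupied_slots(times_j, interval_seconds)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


HOUR, DAY = 3600, 86400

# Hypothetical share times, in seconds since the start of the trace.
sybil_a = [0, 1800, 7200]
sybil_b = [600, 7500, 90000]

hour_score = temporal_similarity(sybil_a, sybil_b, HOUR)
day_score = temporal_similarity(sybil_a, sybil_b, DAY)
```

Changing `interval_seconds` plays the role of varying S: a coarser interval groups more shares into the same slot, so the score reflects a looser notion of being active at similar times.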

Making Sybil Defense Future-proof o We discussed a scalable and accurate system that has been effective in detecting Sybils on the Renren OSN. o Can attackers adapt and circumvent the defense strategy discussed earlier? o If yes, what options does an attacker have? What can an attacker control and manipulate? o Invitation frequency? o Incoming requests acceptance rate? o Outgoing requests acceptance rate? o Clustering coefficient?

Making Sybil Defense Future-proof o Outgoing requests acceptance rate? o Clustering coefficient? o The only way a Sybil can influence these two features is by forming tight-knit communities with other Sybils. o What will sending friend requests to other Sybils accomplish? o Other Sybils will accept the requests, so the sender’s outgoing acceptance rate will be inflated. o A tight community of Sybils implies a high clustering coefficient.

Have the Sybils won? There should be something more that can be done. Fortunately, there is!

Making Sybil Defense Future-proof

o A study in which the new attack model was simulated (on a regional Renren network with 170k nodes) suggests that the Sybil graph structure changes according to the input parameters. o In the simulations, two models for directing the creation of Sybil edges were used: o Erdős-Rényi: the attacker links randomly chosen Sybils. o Preferential attachment: the destination of each Sybil edge is chosen proportionally to the destination Sybil’s degree. o In the simulations, α = 0.26, β = 0.5, and p = 0.33, and Blondel’s algorithm was used to detect communities in the regional graph.

Making Sybil Defense Future-proof o The following table was obtained for various values of n and N, where N is the number of Sybil nodes and n is the number of friend requests sent per Sybil node. o The results are mixed. For n ≤ 300, the community detector is able to identify Sybils with high accuracy. However, as n grows, so does the false-positive rate. * Uncovering Social Network Sybils in the Wild, Zhi Yang, et al.

So community detection algorithms alone are not as precise as we want them to be: as n increases, the number of false positives increases.

Making Sybil Defense Future-proof o For Sybil community detectors to be accurate (i.e., not generate false positives), they must leverage additional features beyond the graph topology used to detect communities. o External acceptance rate: the fraction of friend requests sent by members of a community to users outside the community that are accepted. This should work. Why? Because for Sybils, the vast majority of accepted friend requests come from other Sybils inside the local community. Conversely, rejections come from normal users outside the local community.
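The external acceptance rate defined on this slide can be sketched directly from a log of friend requests. The request tuple layout, the account names, and the choice to return `None` for a community that sent no external requests are assumptions made for this example.

```python
def external_acceptance_rate(community, requests):
    """Fraction of requests sent from inside `community` to outsiders that were accepted.

    `requests` is an iterable of (sender, recipient, accepted) tuples.
    Returns None if the community sent no requests to outside users.
    """
    sent = 0
    accepted = 0
    for sender, recipient, was_accepted in requests:
        if sender in community and recipient not in community:
            sent += 1
            accepted += 1 if was_accepted else 0
    return accepted / sent if sent else None


# Hypothetical log: Sybils s1 and s2 accept each other, but most requests
# they send to normal users (u1, u2, u3) are rejected.
requests = [
    ("s1", "s2", True),
    ("s2", "s1", True),
    ("s1", "u1", False),
    ("s1", "u2", False),
    ("s2", "u3", True),
    ("u1", "s1", True),   # incoming request, not counted
]

sybil_rate = external_acceptance_rate({"s1", "s2"}, requests)
empty_rate = external_acceptance_rate({"u9"}, requests)
```

Requests exchanged inside the community are ignored, so Sybils cannot inflate this value by accepting one another's invitations, which is the property the slide relies on.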

Conclusion o We discussed the behaviour of Sybils in order to create a feature-based Sybil detector that catches 99% of Sybils, with low false-positive and false-negative rates. o Next, we characterized the Sybil graph topology on a major OSN (Renren) and found that Sybils on Renren do not obey the behavioural assumptions that underlie previous work on decentralized Sybil detectors: 80% of Sybils do not connect to other Sybils but instead focus on connecting with normal users. o We also analyzed Sybil clickstreams and learnt that Sybils do not waste time browsing photos; they concentrate on actions such as friending and visiting profiles. o Finally, we learnt that social links between Sybils are inadequate for identifying colluding behaviour. Sybils with no social connections still act in concert to spread spam.

Questions?

Thank you!