Written by Qiang Cao, Xiaowei Yang, Jieqi Yu and Christopher Palow

Presentation transcript:

Uncovering Large Groups of Active Malicious Accounts in Online Social Networks. Written by Qiang Cao, Xiaowei Yang, Jieqi Yu and Christopher Palow. Presented by Rama Krishna Chaitanya Somavajhala

Overview: Introduction, Examples, System Overview, System Design, Parallelising User-Pair Comparison, Implementation, Security Analysis, Evaluation, Conclusion

Introduction Online social networks (OSNs) are popular targets for attack and exploitation. To defend against these attacks, this paper introduces a malicious account detection system called SynchroTrap. SynchroTrap has been deployed at Facebook and Instagram, where it has achieved precision higher than 99%. The authors analysed the behavioural patterns of social network accounts to differentiate malicious accounts from legitimate ones.

Introduction SynchroTrap is an incremental processing system, which makes it practical to deploy at large OSNs. It addresses design challenges such as detecting a weak signal in a large amount of noisy data and processing a few terabytes of data every day. Previous work used a social network's connectivity to infer whether an account is fake or real. Another approach built machine learning classifiers to infer malicious accounts.

Example(1) The graph compares the photo-uploading activities of malicious users with those of normal users at Facebook. Graph (a) plots the timestamped photo uploads of a group of 450 malicious accounts over a week. Graph (b) shows the photo uploads of 450 randomly chosen accounts that have never been flagged as malicious.

Example(2) The figure compares the user-following activities of 1,000 malicious users and 1,000 normal users. Malicious users on Instagram follow target users to inflate those users' follower counts.

Economic constraints of attackers Attackers pay for computing and operating resources, and they earn revenue from missions with strict requirements. As a result, malicious accounts often perform loosely synchronized actions. The missions of attack campaigns constitute attackers' mission constraints, and the limited infrastructure available to launch attack campaigns constitutes their resource constraints.

System Overview High-level system architecture: the main idea of SynchroTrap is clustering analysis. It measures pairwise similarity of user behaviour and then uses a hierarchical clustering algorithm to group together users who behave similarly over a period of time.

Challenges Scalability: The large volume of user activity data leads to a low signal-to-noise ratio, making it hard to achieve high detection accuracy. The sheer volume of activity data prohibits a practical implementation that can cope with generic actions. To handle massive user activity at Facebook-scale OSNs, we apply divide-and-conquer. We slice the computation of user comparison into smaller jobs along the time dimension and use parallelism to scale.

Challenges Accuracy: The diversity of normal user behavior and the stealthiness of malicious activity hinder highly accurate detection. To achieve high accuracy, we design SynchroTrap based on our understanding of an attacker's economic constraints. Adaptability to new applications: It is challenging to develop a generic solution that can adapt to new applications.

System Design Partitioning activity data by applications: SynchroTrap categorizes a user's actions into subsets according to the applications they belong to; the authors call these subsets application contexts. Comparing user actions: user actions are represented as tuples, each with an explicit constraint field that expresses both resource and mission constraints. The tuple abstraction is denoted ‹U, T, C›, where U, T and C represent the user ID, the action timestamp and the constraint object.
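The ‹U, T, C› abstraction and the split into application contexts can be sketched as a small data structure. This is a minimal sketch, not SynchroTrap's code: the `Action` and `partition_by_app` names, the input record layout, and the example constraint object `(target_id, source_ip)` are hypothetical choices made for illustration.

```python
from collections import defaultdict
from collections.abc import Hashable
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    user: str               # U: user ID
    timestamp: int          # T: action time in seconds
    constraint: Hashable    # C: constraint object, e.g. (target_id, source_ip) (hypothetical)


def partition_by_app(records):
    """Split raw (app, user, timestamp, constraint) records into application contexts.

    Each context is compared independently, so actions from different
    applications are never matched against each other.
    """
    contexts = defaultdict(list)
    for app, user, ts, constraint in records:
        contexts[app].append(Action(user, ts, constraint))
    return dict(contexts)
```

Keeping the constraint as an opaque hashable value lets each application define its own mix of resource constraint (such as an IP address) and mission constraint (such as a target object) without changing the comparison code.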

System Design Pairwise user similarity metrics: the system introduces a per-constraint similarity that measures the fraction of matched actions on a single constraint object. It uses Jaccard similarity, a widely used metric of the similarity between two sets, whose value ranges from 0 to 1. Scalable user clustering: users are clustered with an algorithm chosen for both its effectiveness and its scalability.
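A Jaccard-style similarity over timestamped actions needs a notion of two actions "matching". The sketch below assumes that two actions match when they are on the same constraint object and their timestamps differ by at most a tolerance `tol`; the tolerance, the one-to-one matching rule and all function names are illustrative assumptions, not details taken from the slides. Each user's actions are given as `(timestamp, constraint)` pairs.

```python
from collections import defaultdict


def matched_count(ts_a, ts_b, tol):
    """Count one-to-one matches between timestamps that lie within `tol` of each other.

    Greedy two-pointer scan over sorted lists; each action is matched at most once,
    so the count never exceeds min(len(ts_a), len(ts_b)).
    """
    a, b = sorted(ts_a), sorted(ts_b)
    i = j = m = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            m += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too early to match b[j] or any later b
        else:
            j += 1  # b[j] is too early to match a[i] or any later a
    return m


def group_by_constraint(actions):
    groups = defaultdict(list)
    for ts, constraint in actions:
        groups[constraint].append(ts)
    return groups


def per_constraint_similarity(acts_i, acts_j, constraint, tol):
    """Jaccard similarity of two users' actions on a single constraint object."""
    a = group_by_constraint(acts_i).get(constraint, [])
    b = group_by_constraint(acts_j).get(constraint, [])
    m = matched_count(a, b, tol)
    union = len(a) + len(b) - m
    return m / union if union else 0.0


def overall_similarity(acts_i, acts_j, tol):
    """Jaccard similarity over all constraint objects: matched / (|A_i| + |A_j| - matched)."""
    gi, gj = group_by_constraint(acts_i), group_by_constraint(acts_j)
    m = sum(matched_count(gi[c], gj[c], tol) for c in gi.keys() & gj.keys())
    union = len(acts_i) + len(acts_j) - m
    return m / union if union else 0.0
```

For example, with `tol=60`, users with actions `[(0, "p1"), (100, "p1"), (200, "p2")]` and `[(30, "p1"), (500, "p1"), (210, "p2")]` share one match on `p1` (similarity 1/3) and two matches overall (similarity 2/4 = 0.5).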

System Design Making the algorithm suitable for parallel implementation: single-linkage clustering defines the similarity between two clusters as the maximum similarity over all pairs of users drawn from the two clusters, which makes it suitable for a parallel implementation. User pair filter function: filtering functions select the user pairs whose action similarity is high enough. The first filtering criterion uncovers malicious user pairs that show loosely synchronised behaviour on individual constraint objects.
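Cutting a single-linkage hierarchy at a similarity threshold yields exactly the connected components of the graph whose edges are the user pairs that pass the filter. A minimal sketch of that equivalence using union-find follows; the function name and the `(user_a, user_b, similarity)` input format are hypothetical, and this is a single-machine illustration rather than the deployed implementation.

```python
from collections import defaultdict


def single_linkage_clusters(scored_pairs, threshold):
    """Cut a single-linkage hierarchy at `threshold`.

    scored_pairs: iterable of (user_a, user_b, similarity).
    Returns a list of sets; users that appear in no retained pair are omitted.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, sim in scored_pairs:
        if sim >= threshold:  # user-pair filter: keep only sufficiently similar pairs
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two clusters

    clusters = defaultdict(set)
    for user in parent:
        clusters[find(user)].add(user)
    return list(clusters.values())
```

Because two clusters merge as soon as any single cross pair passes the threshold, users `a`, `b`, `c` end up in one cluster when only `(a, b)` and `(b, c)` are similar, even if `a` and `c` were never directly compared.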

System Design Parallelizing user-pair comparison: the large computation of user-pair comparison over bulk data is divided into smaller jobs along the time dimension.

System Design Daily comparison and Hourly comparison with sliding windows
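Slicing along the time dimension can be sketched as daily buckets, with each day further split into overlapping hourly windows. The overlap matters: if windows did not overlap, two loosely synchronized actions on either side of a window boundary would never be compared. This sketch assumes `(user, timestamp, constraint)` tuples and a matching tolerance; the window sizes and function names are illustrative assumptions, not the values SynchroTrap uses.

```python
from collections import defaultdict

DAY = 86_400  # seconds


def daily_buckets(actions):
    """Group (user, timestamp, constraint) tuples into one job per day."""
    buckets = defaultdict(list)
    for action in actions:
        buckets[action[1] // DAY].append(action)
    return dict(buckets)


def sliding_windows(actions, window, step):
    """Split actions into overlapping windows [start, start + window).

    Consecutive windows start `step` seconds apart. If window - step is at least
    the matching tolerance, any two actions within that tolerance of each other
    share at least one window, so no match is lost at a slice boundary.
    """
    if window < step:
        raise ValueError("window must be >= step, otherwise actions are skipped")
    if not actions:
        return []
    times = [a[1] for a in actions]
    start = min(times) - min(times) % step  # align the first window to a step boundary
    last = max(times)
    windows = []
    while start <= last:
        windows.append([a for a in actions if start <= a[1] < start + window])
        start += step
    return windows
```

For hourly comparison with a hypothetical 5-minute tolerance, one would call `sliding_windows(day_actions, window=3600 + 300, step=3600)`; each window is an independent job that can run in parallel.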

System Design Improving Accuracy: the volumes and synchronization levels of malicious attacks vary across OSN applications. SynchroTrap lets OSN operators tune a set of parameters to achieve the desired trade-offs between false positives and false negatives. Computational Cost: cost can be reduced by comparing only user actions that pertain to the same target object.
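Comparing only actions on the same target object amounts to generating candidate pairs from an inverted index instead of enumerating all n(n-1)/2 user pairs. A minimal sketch, assuming `(user, timestamp, target)` tuples; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(actions):
    """Return user pairs that acted on at least one common target object.

    Users who never touched the same target can never have a matched action,
    so they are never compared.
    """
    users_by_target = defaultdict(set)
    for user, _ts, target in actions:
        users_by_target[target].add(user)

    pairs = set()
    for users in users_by_target.values():
        for u, v in combinations(sorted(users), 2):  # sorted -> each pair stored once
            pairs.add((u, v))
    return pairs
```

The saving is largest when most targets are touched by few users; a very popular target still produces a quadratic number of pairs among its own users.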

Implementation SynchroTrap is built on top of the Hadoop MapReduce stack at Facebook. The clustering module runs on Giraph, a large-scale graph processing platform based on the Bulk Synchronous Parallel (BSP) model.
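In the BSP model, computation proceeds in supersteps separated by a barrier: each vertex processes the messages it received in the previous superstep and sends new ones. Connected components (the clusters produced by single-linkage at a fixed threshold) can be computed this way by propagating the smallest vertex ID. The sketch below simulates that vertex-centric pattern in plain Python; it is an illustration of the BSP idea, not Giraph's Java API and not SynchroTrap's actual clustering job.

```python
from collections import defaultdict


def bsp_connected_components(edges):
    """Label each vertex with the smallest vertex ID in its connected component.

    Simulates vertex-centric supersteps: in each superstep a vertex reads the
    messages sent to it in the previous superstep, and only vertices whose label
    shrank send new messages. The loop stops when no messages are in flight.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    label = {v: v for v in neighbors}

    # Superstep 0: every vertex announces its own ID to its neighbors.
    inbox = defaultdict(list)
    for v, nbrs in neighbors.items():
        for n in nbrs:
            inbox[n].append(label[v])

    while inbox:
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            smallest = min(msgs)
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbors[v]:
                    outbox[n].append(smallest)
        inbox = outbox  # barrier: messages become visible only in the next superstep
    return label
```

Only vertices whose label changed send messages, so the work per superstep shrinks as components converge.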

Security Analysis Spread spectrum attacks: attackers could attempt to hide the synchronization signal that SynchroTrap detects. Because SynchroTrap uses Jaccard similarity to compare the action sets of two users, an attacker could try to evade detection by keeping the fraction of matched actions between its accounts below the similarity threshold. However, SynchroTrap limits the total number of abusive actions on a constraint object, regardless of how many malicious accounts the attacker controls.
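A short derivation shows how a similarity threshold caps synchronized actions per pair of accounts. Assume (for illustration only) two accounts that each perform n actions in an application context, m of which are matched, and a similarity threshold θ; these symbols are introduced here and are not from the slides.

```latex
J(U_i, U_j) = \frac{m}{|A_i| + |A_j| - m} = \frac{m}{2n - m}

J(U_i, U_j) < \theta
  \iff m < \theta\,(2n - m)
  \iff m < \frac{2n\theta}{1 + \theta}
```

For example, with θ = 0.5 and n = 10, the pair stays undetected only if m < 20(0.5)/1.5 ≈ 6.67, that is, at most 6 matched actions (J = 6/14 ≈ 0.43); a seventh gives J = 7/13 ≈ 0.54. To stay below the threshold, an attacker must dilute each account's activity with unmatched actions, which reduces the synchronized work each account can do.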

Security Analysis Aggressive attacks: these are launched by directing controlled accounts to perform bulk actions within a short time period. SynchroTrap works together with existing anomaly detection schemes and complements them by targeting stealthier attacks. SynchroTrap limits the total number of abusive actions on a constraint object.

Evaluation: Validation of identified accounts SynchroTrap uncovers millions of accounts, so cross-validating every detected account is a major task. The authors therefore study the network-level characteristics of the detected attacks, including the email domains and IP addresses used by malicious accounts. Precision: SynchroTrap allows Facebook and Instagram to identify and invalidate millions of malicious user actions in each application.

Evaluation: Validation of identified accounts Post-processing to deal with false positives: small user clusters are discarded, and only large clusters, which are more likely to result from large attacks, are screened. Scale of campaigns:

Evaluation: Validation of identified accounts How do attackers gain control of the malicious accounts? The Facebook security team classifies the reviewed accounts into categories based on their campaigns.

Evaluation: New findings on malicious accounts The malicious accounts detected by SynchroTrap are compared with those detected by existing approaches inside Facebook. SynchroTrap identifies a large number of previously unknown malicious accounts (almost 70% of them were not identified by existing approaches). Fully deploying SynchroTrap in every application and on more OSNs could yield further new findings and higher detection rates of malicious accounts.

Evaluation: Social Connectivity of malicious accounts Attackers control accounts with varying degrees of social connectivity to legitimate users. For example, accounts caught in photo uploading rank high in connectivity because attackers tend to use well-connected accounts to spread spam photos to their friends.

Evaluation: Operational Experience A longitudinal study tracked the number of detected users from the first weeks of deployment; in the Facebook like and Instagram user-following applications, the number of detected users decreases after the first month.

Evaluation: System Performance Daily jobs; aggregation jobs; single-linkage hierarchical clustering.

Related Work Clickstream and CopyCatch pioneered the detection of malicious OSN users, but they have drawbacks that SynchroTrap addresses. Clickstream compares pairwise user similarity and clusters users; if the number of fake accounts in a cluster is larger than a certain threshold, the cluster is classified as fake. CopyCatch assumes that a user can perform a malicious action only once. SynchroTrap additionally uses source IP addresses and further reduces its computational complexity, making it deployable on large-scale networks.

Conclusion SynchroTrap detects large groups of malicious users through clustering analysis, using a clustering algorithm whose computational complexity grows linearly with the number of actions an account performs. It is an incremental processing system, and it has unveiled more than two million malicious accounts. This approach of detecting loosely synchronized actions can also uncover large attacks in other online services. By processing activity data in independent time slices, it can analyze large volumes of data while reducing the requirements on the computing infrastructure.

QUESTIONS? Thank you