CrowdTarget: Target-based Detection of Crowdturfing in Online Social Networks Jenny (Bom Yi) Lee.

CrowdTarget: Target-based Detection of Crowdturfing in Online Social Networks. Authors: Jonghyuk Song, Sangho Lee, Jong Kim, Dept. of CSE, POSTECH.
Presentation transcript:

CrowdTarget: Target-based Detection of Crowdturfing in Online Social Networks Jenny (Bom Yi) Lee

Introduction: What is Crowdturfing?

Introduction
▹ Crowdturfing
  ▸ Crowdsourcing + astroturfing
  ▸ Malicious crowdsourcing
▹ Crowdsourcing
  ▸ Process of outsourcing tasks to a crowd of human workers
▹ Astroturfing
  ▸ False impression of widespread support

Crowdturfing

Twitter
▹ Tweets and retweets
▹ Manipulation of account popularity using artificial retweets
  ▸ Unjust gain of money through sponsored tweets

Black-market vs Crowdturfing Sites for OSN
▹ Black-market sites
  ▸ Operate using a large number of bots
  ▸ Synchronised group activities
▹ Crowdturfing sites
  ▸ Human workers
  ▸ No synchronised group activities

Existing Detection Methods
▹ Legitimate user?
  ▸ Account-based features
  ▸ Synchronised group activities

Analysing Accounts
▹ Account popularity
  ▸ Follower to following ratio
  ▸ Number of received retweets per tweet
  ▸ Klout score
▹ Synchronised group activity
  ▸ Following similarity
  ▸ Retweet similarity

Account Popularity
▹ Percentage of accounts with a larger number of followers than following: 20%, 37%, 70%
▹ Percentage of tweets that are retweeted more than once: 4%, 5%, 43%
▹ Median Klout scores: 20, 33, 41

Synchronised Group Activity
▹ Following similarity: similarity of followers between two accounts
  ▸ Black-market: HIGH
  ▸ Normal: LOW
  ▸ Crowdturfing: LOW
▹ Retweet similarity: similarity of retweets between two accounts
  ▸ Black-market: HIGH
  ▸ Normal: LOW
  ▸ Crowdturfing: LOW
▹ Crowdturfing workers perform malicious activities alongside normal behaviour
▹ Human workers work independently of each other
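The slide does not say how similarity between two accounts is measured. As a minimal sketch, assuming Jaccard similarity over sets of follower IDs (or retweeted tweet IDs), the comparison could look like this. The function name and the sample IDs are hypothetical.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two collections, treated as sets.

    Assumption: the slides do not name a similarity measure; Jaccard is
    used here only as a common choice for comparing two ID sets.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sets: define similarity as 0
    return len(set_a & set_b) / len(union)


# Hypothetical follower IDs for two bot-driven accounts and two human workers.
bot_1 = {1, 2, 3, 4}
bot_2 = {2, 3, 4, 5}
worker_1 = {10, 11, 12}
worker_2 = {12, 40, 41}

print(jaccard_similarity(bot_1, bot_2))        # large overlap
print(jaccard_similarity(worker_1, worker_2))  # small overlap
```

The same function applies to retweet similarity by passing sets of retweeted tweet IDs instead of follower IDs.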

Solution: CrowdTarget

Solution
CrowdTarget:
▹ Focus on the targets of crowdturfing accounts
▹ Discover manipulation patterns of target objects
  ▸ Analyse retweets generated by:
    - Normal
    - Crowdturfing
    - Black-market

Analysing Crowdturfing Targets
▹ Tweets receiving artificial retweets generated by crowdturfing workers
▹ Characteristics:
  ▸ Retweet time distribution
  ▸ Twitter application
  ▸ Unreachable retweeters
  ▸ Click information

Data Collection
▹ Normal tweets
  ▸ 1044 Twitter accounts with ≥ 100,000 followers
▹ Crowdturfing tweets
  ▸ Registered to 9 crowdturfing sites, retrieved tasks requesting retweets
▹ Black-market tweets
  ▸ Wrote 282 tweets and registered at black-market sites to purchase retweets

Retweet Time Distribution
▹ Count the number of retweets generated every hour since a tweet is created
▹ Normal, crowdturfing and black-market tweets: significant differences in mean, standard deviation, skewness and kurtosis values
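The four statistics named on this slide can be sketched as follows. This is one possible reading, not necessarily the authors' exact computation: retweets are binned by whole hours since the tweet was created, and the moments are taken over the hour offsets weighted by the retweet counts. Timestamps are assumed to be in seconds, and the function names are hypothetical.

```python
import math
from collections import Counter


def hourly_counts(created_at, retweet_times):
    """Count retweets per whole hour elapsed since the tweet was created.

    Assumption: all times are in seconds; retweets timestamped before the
    tweet itself are ignored.
    """
    return Counter(
        int((t - created_at) // 3600) for t in retweet_times if t >= created_at
    )


def distribution_moments(counts):
    """Mean, standard deviation, skewness and excess kurtosis of the hour
    offsets, weighted by the number of retweets in each hour."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no retweets to describe")
    mean = sum(h * c for h, c in counts.items()) / total
    var = sum(c * (h - mean) ** 2 for h, c in counts.items()) / total
    std = math.sqrt(var)
    if std == 0:
        return mean, 0.0, 0.0, 0.0  # all retweets fall in one hour
    skew = sum(c * (h - mean) ** 3 for h, c in counts.items()) / total / std ** 3
    kurt = sum(c * (h - mean) ** 4 for h, c in counts.items()) / total / std ** 4 - 3.0
    return mean, std, skew, kurt


# Hypothetical tweet created at t=0 with four retweets.
counts = hourly_counts(0, [0, 1800, 3600, 7200])
print(dict(counts))
print(distribution_moments(counts))
```

The resulting four numbers would form the retweet time distribution part of a tweet's feature vector.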

Twitter Application, Unreachable Retweeters, Click Information
▹ Ratio of retweets generated by the dominant application: 99%, 40%, 90%
▹ Ratio of “non-followers”
  ▸ 80% of tweets have 80% unreachable retweeters
  ▸ Normal: < 10%
▹ Number of clicks per retweet
  ▸ > 80% receive more clicks than retweets
  ▸ Most tweets never clicked
  ▸ > 90% receive fewer clicks than retweets
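A minimal sketch of the three per-tweet features on this slide is given below. It follows the slide's simplified wording, treating an unreachable retweeter as one who does not follow the posting user; the function names and sample data are hypothetical.

```python
from collections import Counter


def dominant_application_ratio(retweet_apps):
    """Share of retweets issued through the single most-used application."""
    if not retweet_apps:
        return 0.0
    _, top_count = Counter(retweet_apps).most_common(1)[0]
    return top_count / len(retweet_apps)


def unreachable_retweeter_ratio(retweeter_ids, follower_ids):
    """Share of retweeters who do not follow the posting user.

    Assumption: "unreachable" is simplified to "non-follower", as worded
    on the slide.
    """
    if not retweeter_ids:
        return 0.0
    followers = set(follower_ids)
    unreachable = sum(1 for r in retweeter_ids if r not in followers)
    return unreachable / len(retweeter_ids)


def clicks_per_retweet(clicks, retweets):
    """Number of link clicks divided by number of retweets."""
    return clicks / retweets if retweets else 0.0


# Hypothetical tweet: four retweets, three from the same application,
# five retweeters of whom only one follows the poster, 3 clicks for 10 retweets.
print(dominant_application_ratio(["web", "web", "app", "web"]))
print(unreachable_retweeter_ratio([1, 2, 3, 4, 5], [1]))
print(clicks_per_retweet(3, 10))
```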

CrowdTarget
▹ Prepare training & testing data
  ▸ Set the ratio of malicious tweets to 1% of total tweets
▹ Build classifiers
  ▸ Use the retweet features explained previously
▹ Test classifiers
  ▸ Select the classifier with the highest accuracy
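The pipeline on this slide can be illustrated with a small pure-Python sketch. It is not the authors' implementation: it assumes feature vectors are tuples of numbers, mixes malicious examples in at roughly 1% of the total, and uses a plain k-nearest-neighbours vote as the example classifier. All names and the toy data are hypothetical.

```python
import math
import random


def mix_at_ratio(normal, malicious, ratio=0.01, seed=0):
    """Build a labelled data set in which malicious vectors make up about
    `ratio` of the total (label 0 = normal, label 1 = malicious)."""
    wanted = max(1, round(ratio * len(normal) / (1.0 - ratio)))
    rng = random.Random(seed)
    chosen = rng.sample(malicious, min(wanted, len(malicious)))
    data = [(x, 0) for x in normal] + [(x, 1) for x in chosen]
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=3):
    """Majority vote among the k training points nearest to x
    (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


# Toy two-feature data: normal tweets cluster near (0, 0),
# malicious tweets near (1, 1).
normal = [(0.0, (i % 5) * 0.01) for i in range(990)]
malicious = [(1.0, 1.0 + 0.01 * i) for i in range(50)]
data = mix_at_ratio(normal, malicious)

print(len(data), sum(label for _, label in data))
print(knn_predict(data, (1.0, 1.02)), knn_predict(data, (0.0, 0.0)))
```

In practice the data would be split into separate training and testing portions, and several classifiers would be compared before picking one.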

CrowdTarget
▹ Classifier: retweet time distribution, Twitter application, unreachable retweeters
  ▸ Ada Boost: TPR 0.95
  ▸ Gaussian Bayes: TPR 0.87
  ▸ K-nearest neighbours: TPR 0.96
▹ Click information classifier
  ▸ K-nearest neighbours: TPR 0.98

Results
▹ False negatives
  ▸ Misjudgement of tweets that receive a small number of retweets
  ▸ 50% of undetected crowdturfing tweets mostly retweeted by reachable accounts
    - Buy followers from the same crowdturfing service
▹ False positives
  ▸ Verified accounts received retweets from automated applications

Feature Robustness
▹ Artificially manipulate retweet time distribution
  ▸ Cooperation (Independent)
  ▸ Bot accounts to manipulate retweet time distribution (costly)
▹ Eliminate dominant applications
▹ Reduce number of unreachable retweeters
  ▸ Follow posting user (decrease popularity)
▹ Manipulate click information (spam?)

Summary
▹ Novel crowdturfing detection method
▹ CrowdTarget can detect crowdturfing retweets on Twitter with a TPR of 0.98 at an FPR of 0.01
▹ Manipulation patterns of the target objects are maintained regardless of the evasion techniques crowdturfing accounts use
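For reference, the true-positive rate (TPR) and false-positive rate (FPR) quoted in the summary follow the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch, with a hypothetical function name and made-up labels (1 = crowdturfing tweet, 0 = normal tweet):

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 marks the malicious class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up example: 4 malicious and 6 normal tweets.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(tpr_fpr(y_true, y_pred))
```

Because malicious tweets are rare, even a small FPR on the normal class can mean many flagged normal tweets in absolute terms, which is why both rates are reported together.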

Criticism
▹ Identification of crowdturfing targets
  ▸ No identification of crowdturfing accounts
▹ Data collection
  ▸ Same set of tweets used for training AND testing: biased results
  ▸ Data set not representative of black-market tweets
▹ Unaccounted cases:
  ▸ Indirect retweets via a “popular” user?
    - Ratio of unreachable retweeters ↑

THANKS! Any questions?