CrowdTarget: Target-based Detection of Crowdturfing in Online Social Networks Jenny (Bom Yi) Lee.


1 CrowdTarget: Target-based Detection of Crowdturfing in Online Social Networks Jenny (Bom Yi) Lee

2 Introduction: What is Crowdturfing?

3 Introduction
▹ Crowdturfing
  ▸ Crowdsourcing + astroturfing
  ▸ Malicious crowdsourcing
    ▪ Process of outsourcing tasks to a crowd of human workers
  ▸ Astroturfing
    ▪ False impression of widespread support

4 Crowdturfing

5 Twitter
▹ Tweets and retweets
▹ Manipulation of account popularity using artificial retweets
  ▸ Unjust monetary gain through sponsored tweets

6 Black-market vs Crowdturfing Sites for OSN
▹ Black-market sites
  ▸ Operate by utilising a large number of bots
  ▸ Synchronised group activities
▹ Crowdturfing sites
  ▸ Human workers
  ▸ No synchronised group activities

7 Existing Detection Methods
▹ Legitimate user?
  ▸ Account-based features
  ▸ Synchronised group activities

8 Analysing Accounts
▹ Account popularity
  ▸ Follower to following ratio
  ▸ Number of received retweets per tweet
  ▸ Klout score
▹ Synchronised group activity
  ▸ Following similarity
  ▸ Retweet similarity

9 Account Popularity
▹ Percentage of accounts with a larger number of followers than following: 20%, 37%, 70%
▹ Percentage of tweets that are retweeted more than once: 4%, 5%, 43%
▹ Median Klout scores: 20, 33, 41

10 Synchronised Group Activity
▹ Following similarity: similarity of followers between two accounts
  ▸ Black-market: HIGH
  ▸ Normal: LOW
  ▸ Crowdturfing: LOW
▹ Retweet similarity: similarity of retweets between two accounts
  ▸ Black-market: HIGH
  ▸ Normal: LOW
  ▸ Crowdturfing: LOW
▹ Crowdturfing workers perform malicious activities alongside normal behaviour
▹ Human workers work independently of each other
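The two similarity measures on this slide can be illustrated with a short sketch. The slides do not say which set-similarity measure is used, so Jaccard similarity is an assumption here, and the account names and follower sets are made up for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here: two empty sets share nothing
    return len(a & b) / len(a | b)


# Hypothetical follower sets; none of these values come from the slides.
bot_1 = {"u1", "u2", "u3", "u4"}
bot_2 = {"u1", "u2", "u3", "u5"}
worker_1 = {"u1", "u7", "u8"}
worker_2 = {"u9", "u10", "u11"}

# Accounts driven by the same bot operator tend to overlap heavily,
# while independent human workers need not overlap at all.
bot_similarity = jaccard(bot_1, bot_2)
worker_similarity = jaccard(worker_1, worker_2)
```

The same function applies to retweet similarity if the sets hold retweeted tweet IDs instead of follower IDs.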

11 Solution: CrowdTarget

12 Solution
CrowdTarget:
▹ Focus on the targets of crowdturfing accounts
▹ Discover manipulation patterns of target objects
  ▸ Analyse retweets generated by:
    ▪ Normal
    ▪ Crowdturfing
    ▪ Black-market

13 Analysing Crowdturfing Targets
▹ Tweets receiving artificial retweets generated by crowdturfing workers
▹ Characteristics:
  ▸ Retweet time distribution
  ▸ Twitter application
  ▸ Unreachable retweeters
  ▸ Click information

14 Data Collection
▹ Normal tweets: 1044 Twitter accounts with ≥ 100,000 followers
▹ Crowdturfing tweets: registered at 9 crowdturfing sites and retrieved tasks requesting retweets
▹ Black-market tweets: wrote 282 tweets and registered at black-market sites to purchase retweets

15 Retweet Time Distribution
▹ Count the number of retweets generated in each hour after a tweet is created
▹ Normal, crowdturfing and black-market tweets: significant differences in the mean, standard deviation, skewness and kurtosis values
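A minimal sketch of how such a feature vector could be computed is given below. It assumes timestamps in seconds, treats the hourly counts as a histogram over the hour index, and uses population moments with non-excess kurtosis; the slides do not specify these details, and the function names are hypothetical.

```python
import math


def hourly_counts(created_at, retweet_times, horizon_hours):
    """Count retweets in each hour after the tweet was created (times in seconds)."""
    counts = [0] * horizon_hours
    for t in retweet_times:
        hour = int((t - created_at) // 3600)
        if 0 <= hour < horizon_hours:  # ignore retweets outside the window
            counts[hour] += 1
    return counts


def moments(counts):
    """Mean, std, skewness and kurtosis of the hour index, weighted by counts."""
    n = sum(counts)
    if n == 0:
        return None  # no retweets, so no distribution to describe
    mean = sum(h * c for h, c in enumerate(counts)) / n
    var = sum(c * (h - mean) ** 2 for h, c in enumerate(counts)) / n
    std = math.sqrt(var)
    if std == 0:
        return mean, 0.0, 0.0, 0.0  # all retweets fell into a single hour
    skew = sum(c * (h - mean) ** 3 for h, c in enumerate(counts)) / n / std ** 3
    kurt = sum(c * (h - mean) ** 4 for h, c in enumerate(counts)) / n / std ** 4
    return mean, std, skew, kurt


# Hypothetical tweet created at t=0 with four retweets over three hours.
example_counts = hourly_counts(0, [10, 3700, 3800, 7300], 3)
example_moments = moments(example_counts)
```

The resulting four numbers form one part of a tweet's feature vector; a tweet whose retweets arrive in an unusually flat or unusually bursty pattern would show up as different moment values.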

16 Twitter Application, Unreachable Retweeters, Click Information
▹ Ratio of retweets generated by the dominant application: 99%, 40%, 90%
▹ Ratio of "non followers":
  ▸ 80% of tweets have 80% unreachable followers
  ▸ Normal: < 10%
▹ Number of clicks per retweet:
  ▸ > 80% receive more clicks than number of retweets
  ▸ Most tweets never clicked
  ▸ > 90% receive a smaller number of clicks
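The three per-tweet ratios on this slide can be sketched as follows. This is an illustration under assumptions: an unreachable retweeter is taken to be a retweeter who does not follow the posting account (following the slide's "non followers" wording), and all function names and sample data are hypothetical.

```python
from collections import Counter


def dominant_app_ratio(apps):
    """Share of retweets made with the most frequently used application."""
    if not apps:
        return 0.0
    return Counter(apps).most_common(1)[0][1] / len(apps)


def unreachable_ratio(retweeters, poster_followers):
    """Share of retweeters who do not follow the posting account (assumed definition)."""
    if not retweeters:
        return 0.0
    unreachable = sum(1 for r in retweeters if r not in poster_followers)
    return unreachable / len(retweeters)


def clicks_per_retweet(clicks, retweets):
    """Number of link clicks divided by the number of retweets."""
    return clicks / retweets if retweets else 0.0


# Made-up data for one tweet; not taken from the slides.
apps = ["web", "web", "web", "android", "web"]
retweeters = ["a", "b", "c", "d", "e"]
poster_followers = {"a"}

app_ratio = dominant_app_ratio(apps)
non_follower_ratio = unreachable_ratio(retweeters, poster_followers)
click_ratio = clicks_per_retweet(2, 5)
```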

17 CrowdTarget
▹ Prepare training & testing data: set the ratio of malicious tweets to 1% of total tweets
▹ Build classifiers: using the retweet features explained previously
▹ Test classifiers: select the classifier with the highest accuracy

18 CrowdTarget
▹ Classifier using retweet time distribution, Twitter application and unreachable retweeters
  ▸ Ada Boost: TPR 0.95
  ▸ Gaussian Bayes: TPR 0.87
  ▸ K-nearest neighbours: TPR 0.96
▹ Classifier with click information added
  ▸ K-nearest neighbours: TPR 0.98
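For readers unfamiliar with the best-performing model on this slide, a minimal k-nearest-neighbours sketch follows. It is not the presenter's implementation: the value of k, the Euclidean distance, and the two-dimensional feature vectors are all assumptions, and the training points are invented rather than drawn from the data set.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query vector by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented feature vectors for illustration only.
train = [
    ((0.10, 0.20), "normal"),
    ((0.15, 0.25), "normal"),
    ((0.20, 0.10), "normal"),
    ((0.90, 0.80), "malicious"),
    ((0.85, 0.90), "malicious"),
    ((0.95, 0.85), "malicious"),
]

label_a = knn_predict(train, (0.88, 0.82))
label_b = knn_predict(train, (0.12, 0.18))
```

In practice the feature vector would hold the retweet features from the earlier slides, and features on different scales would usually be normalised before computing distances.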

19 Results
▹ False negatives
  ▸ Misjudgement of tweets that receive a small number of retweets
  ▸ 50% of undetected crowdturfing tweets were mostly retweeted by reachable accounts
    ▪ Buy followers from the same crowdturfing service
▹ False positives
  ▸ Verified accounts received retweets from automated applications

20 Feature Robustness
▹ Artificially manipulate retweet time distribution
  ▸ Cooperation (Independent)
  ▸ Bot accounts to manipulate retweet time distribution (costly)
▹ Eliminate dominant applications
▹ Reduce number of unreachable retweeters
  ▸ Follow posting user (decrease popularity)
▹ Manipulate click information (spam?)

21 Summary
▹ Novel crowdturfing detection method
▹ CrowdTarget can detect crowdturfing retweets on Twitter with a TPR of 0.98 at an FPR of 0.01
▹ Manipulation patterns of the target objects are maintained regardless of the evasion techniques crowdturfing accounts use
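The TPR and FPR figures follow the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below shows the computation on a small set of made-up labels; the helper name and the label strings are hypothetical.

```python
def rates(y_true, y_pred, positive="malicious"):
    """Return (TPR, FPR) for predictions against ground-truth labels."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # malicious tweet correctly flagged
            else:
                fn += 1  # malicious tweet missed
        else:
            if pred == positive:
                fp += 1  # normal tweet wrongly flagged
            else:
                tn += 1  # normal tweet correctly passed
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Made-up labels: 4 malicious and 6 normal tweets.
y_true = ["malicious"] * 4 + ["normal"] * 6
y_pred = ["malicious", "malicious", "malicious", "normal",
          "malicious", "normal", "normal", "normal", "normal", "normal"]

tpr, fpr = rates(y_true, y_pred)
```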

22 Criticism
▹ Identification of crowdturfing targets
  ▸ No identification of crowdturfing accounts
▹ Data collection
  ▸ Same set of tweets used for training AND testing: biased results
  ▸ Data set not representative of black-market tweets
▹ Unaccounted cases:
  ▸ Indirect retweets via a "popular" user?
    ▪ Ratio of unreachable retweeters ↑

23 THANKS! Any questions?

