1
Crowdturfers, campaigns and social media: Tracking and revealing crowdsourced manipulation of social media
By: Namesh Kher, Big Data Insights – INFM 750
2
Crowdsourcing & Crowdturfing
Crowdsourcing: "process of obtaining needed services, ideas or content by soliciting contributions from a large group of people, especially an online community" (Wikipedia)
Crowdturfing: the use of crowdsourcing platforms to spread malicious content on the web, for example malicious URLs, astroturf campaigns and search engine manipulation.
The main objective of such systems is to degrade the quality of online information.
This paper looks for insights into crowdturfing ecosystems and attempts to answer questions such as:
- Who are the participants of such systems?
- What are their roles?
- What campaigns are carried out by such systems or organizations?
- Can we distinguish between the behaviors of crowdturfers and regular social media users?
3
Terminology & Methodology
Requestors: people who post a task or a job online (start a thread)
Workers: people who do the online tasks posted by requestors
Methods:
- Analyze the types of malicious tasks and the properties of workers and requestors
- Propose a framework for linking these tasks to their workers on social media sites, and so track the activities of crowdturfers
- Identify the hidden structure of the workers (three classes of crowdturfers identified: professional, casual and middle men)
- Propose and develop statistical models to differentiate between regular online users and workers
Web sites used: Microworkers.com, ShortTask.com, Rapidworkers.com
4
Crowdturfing tasks and participants
Collected 505 campaigns over a span of 2 months.
Tasks:
- Social media manipulation (56%)
- Sign up (26%)
- Search engine spamming (7%)
- Vote stuffing (4%)
- Miscellany (7%)
Got a total of 144 requestor profiles and 4,012 worker profiles from Microworkers.com.
5
90% of workers from USA, India – AMT
Microworkers (MW): workers come from 75 different countries, with 38% (1,539) from Bangladesh. Amazon Mechanical Turk (AMT): 90% of workers come from the USA and India. Almost 3 million tasks completed, with total earnings of $500K.
6
Requestors from 31 different countries
55% of them are from the US, and the majority (70%) are from English-speaking countries. The average money earned per task is $0.51.
7
Linking crowdsourcing workers to social media
Following workers on Twitter reveals two main types of tasks:
- Tweeting a link: aimed at increasing the PageRank of a page
- Following a Twitter user: aimed at increasing the visibility of that user
Collected 10,000 random samples to distinguish between workers and non-workers. To ensure these samples are non-workers:
- Monitor their accounts for a month to check whether they are active
- Check them manually
Found a total of 9,878 non-workers.
8
Analysis of workers by Profile
Observations: The average numbers of followers and followings are higher for workers than for regular users, but workers tweet less. Workers are well connected with other users.
9
Analysis of workers by Activity
Cumulative distribution functions for three distinct activity-based characteristics.
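An empirical cumulative distribution function of the kind compared on this slide can be computed directly from per-account values. The following is a minimal sketch; the characteristic name and the sample values are hypothetical and are not data from the study.

```python
from bisect import bisect_right

def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F

# Hypothetical per-account retweet ratios (illustrative only).
worker_ratios = [0.6, 0.7, 0.8, 0.9]
non_worker_ratios = [0.1, 0.2, 0.3, 0.8]

worker_cdf = ecdf(worker_ratios)
non_worker_cdf = ecdf(non_worker_ratios)
```

Evaluating both functions at the same threshold shows how far apart the two groups sit for that characteristic.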
10
Analysis of workers by Activity
Observations:
- Workers rarely communicate with each other
- Workers retweet more often than non-workers
- Workers tend to send more URL links
11
Analysis of workers by studying linguistic characteristics of the tweet
Compared each tweet to the LIWC (Linguistic Inquiry and Word Count) dictionary, which contains 68 categories, to get a score for each category.
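Dictionary-based scoring of this kind can be sketched as follows. The real LIWC dictionary is not reproduced here: `TOY_DICTIONARY` is a two-category stand-in, and scoring each category as the fraction of the tweet's words that fall in it is an assumption about how the scores were computed.

```python
import re
from collections import Counter

# Toy stand-in for the LIWC dictionary (hypothetical): category -> words.
TOY_DICTIONARY = {
    "first_person_singular": {"i", "me", "my", "mine"},
    "swear": {"damn", "hell"},
}

def category_scores(tweet, dictionary=TOY_DICTIONARY):
    """Return, per category, the fraction of the tweet's words in it."""
    words = re.findall(r"[a-z']+", tweet.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in dictionary.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words) or 1  # avoid dividing by zero on an empty tweet
    return {category: counts[category] / total for category in dictionary}

scores = category_scores("I think my new phone is damn good")
```

Averaging these per-tweet scores over each account would give one feature per category for comparing workers with non-workers.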
12
Analysis of workers by studying linguistic characteristics of the tweet
Observations:
- Workers tend to swear less than non-workers
- Workers use the first person singular less
13
Network structure of twitter users
Step 1 - Check worker closeness
Workers are very close to each other, forming a close-knit network.
Average graph density of workers is [value missing]. In a previous study, the average graph density was measured as [value missing] (Yang et al. 2012).
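Graph density is the number of edges present divided by the number of edges possible. The sketch below assumes a directed follow graph without self-loops, so the maximum is n(n-1) edges; whether the study used exactly this definition is an assumption, and the accounts and edges are hypothetical.

```python
def graph_density(nodes, edges):
    """Density of a directed graph: |E| / (n * (n - 1))."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Ignore duplicate edges and self-loops.
    unique_edges = {(u, v) for u, v in edges if u != v}
    return len(unique_edges) / (n * (n - 1))

# Hypothetical accounts; an edge (u, v) means "u follows v".
workers = ["w1", "w2", "w3", "w4"]
follows = [("w1", "w2"), ("w2", "w1"), ("w3", "w1")]

density = graph_density(workers, follows)
```

A density close to 1 means nearly every account follows every other one; sparse social graphs typically sit far below that.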
14
Network structure of twitter users
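Step 2 - Hubs and Authorities (HITS)
- Hubs: workers who follow many other workers
- Authorities: workers who are followed by many other workers
Formulae: $a = A^{T} h$ and $h = A a$, where $a$ and $h$ are the authority and hub score vectors and $A$ is the adjacency matrix of the worker follow graph ($A_{ij} = 1$ if worker $i$ follows worker $j$).
Observations:
- Many of the top 10 hubs are also among the top 10 authorities, which means they are well connected
- The top hub and authority is NannyDotNet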
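The hub and authority updates can be sketched as a simple power iteration over the follow edges. This is an illustrative implementation of standard HITS, not the code used in the study; the edge list is hypothetical, and normalizing by the L2 norm each round is one common choice.

```python
import math

def hits(follows, iterations=50):
    """follows: list of (follower, followed). Returns (hub, authority) dicts."""
    nodes = sorted({node for edge in follows for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # a = A^T h: authority is the sum of hub scores of followers.
        auth = {node: 0.0 for node in nodes}
        for follower, followed in follows:
            auth[followed] += hub[follower]
        # h = A a: hub is the sum of authority scores of followed accounts.
        new_hub = {node: 0.0 for node in nodes}
        for follower, followed in follows:
            new_hub[follower] += auth[followed]
        # Normalize so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {node: v / a_norm for node, v in auth.items()}
        hub = {node: v / h_norm for node, v in new_hub.items()}
    return hub, auth

# Hypothetical follow edges: (follower, followed).
edges = [("a", "c"), ("b", "c"), ("a", "b")]
hub_scores, auth_scores = hits(edges)
top_hub = max(hub_scores, key=hub_scores.get)
top_authority = max(auth_scores, key=auth_scores.get)
```

Sorting both score dictionaries and intersecting the top 10 of each would reproduce the kind of hub/authority overlap check described on this slide.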
15
Network structure of twitter users
Step 3 - Professional Workers
Break down the 2,864 workers to examine them in more depth. Two types of workers:
- Professional workers (at least 3 campaigns): 187 in number
- Casual workers (1 or 2 campaigns)
Observations: The graph density for professional workers is 0.028.
Insights:
- Middle men: some professional workers commonly retweeted messages generated by 2 users, "Alexambroz" and "Oboy"
- Professional workers follow these middle men and retweet their messages, which increases the middle men's rank
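The professional/casual split is a threshold on the number of distinct campaigns per worker. A minimal sketch, assuming participation is available as hypothetical `(worker, campaign_id)` records:

```python
from collections import Counter

def split_workers(participation, threshold=3):
    """Split workers into professional (>= threshold campaigns) and casual."""
    campaigns_per_worker = Counter()
    # set() removes repeated records of the same worker in the same campaign.
    for worker, _campaign in set(participation):
        campaigns_per_worker[worker] += 1
    professional = {w for w, c in campaigns_per_worker.items() if c >= threshold}
    casual = set(campaigns_per_worker) - professional
    return professional, casual

# Hypothetical participation records.
records = [
    ("u1", "c1"), ("u1", "c2"), ("u1", "c3"),
    ("u2", "c1"), ("u2", "c1"),
    ("u3", "c2"), ("u3", "c4"),
]
professional, casual = split_workers(records)
```

Here `u1` counts as professional, while `u2` (one distinct campaign) and `u3` (two) count as casual.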
16
Detecting middle men: How to find middle men?
Step 1: Investigate the messages of the 187 professional workers and get their tweets containing URLs
Step 2: Count how many professional workers retweeted each of the extracted messages containing URLs
Step 3: Sort the extracted messages in descending order of frequency, then get the origin user each message came from
Found 575 potential middle men. The top 10 middle men:
- Had a large number of followers
- Were in many cases interested in social media strategy, social marketing and SEO
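The three steps can be sketched as a counting pass over retweet records. The record layout `(worker, origin_user, message)`, the URL pattern and the sample data are all assumptions made for illustration; the study's actual data format is not given here.

```python
import re
from collections import defaultdict

URL_PATTERN = re.compile(r"https?://\S+")

def rank_middle_men(retweets):
    """retweets: iterable of (worker, origin_user, message).

    Returns origin users ordered by how many distinct workers
    retweeted their most widely retweeted URL-bearing message.
    """
    # Steps 1-2: keep messages with URLs, count distinct retweeting workers.
    workers_per_message = defaultdict(set)
    for worker, origin_user, message in retweets:
        if URL_PATTERN.search(message):
            workers_per_message[(origin_user, message)].add(worker)
    # Step 3: sort messages by descending number of distinct workers.
    ranked = sorted(
        workers_per_message.items(),
        key=lambda item: len(item[1]),
        reverse=True,
    )
    # Recover the origin users in rank order, without duplicates.
    origins = []
    for (origin_user, _message), _workers in ranked:
        if origin_user not in origins:
            origins.append(origin_user)
    return origins

# Hypothetical retweet records from professional workers.
sample = [
    ("w1", "hub_account", "Great SEO tips http://example.com/a"),
    ("w2", "hub_account", "Great SEO tips http://example.com/a"),
    ("w3", "hub_account", "Great SEO tips http://example.com/a"),
    ("w1", "other_account", "Read this http://example.com/b"),
    ("w2", "friend", "good morning"),
]
candidates = rank_middle_men(sample)
```

The message without a URL is ignored, so `friend` never appears among the candidates.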
17
Detecting crowd workers
Training set of 2,864 workers and 9,878 non-workers
- 10-fold cross validation
- 30 classification algorithms
- WEKA machine learning toolkit
Feature groups (a total of 192 features in these groups):
- UD: User demographics
- UFN: User friendship networks
- UA: User activity
- UC: User content
Observations:
- Positive chi-square value for all of the features
- Min accuracy: 86%
- Max accuracy: 91%
- Random forests produced the highest accuracy: 93.26%
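The evaluation protocol, 10-fold cross validation, can be sketched without any toolkit. The classifier below is a nearest-centroid rule used purely as a stand-in; it is not one of the WEKA algorithms from the study, and the two features and all data values are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def centroid(rows):
    return [sum(column) / len(rows) for column in zip(*rows)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_validate(features, labels, k=10):
    """Return accuracy of a nearest-centroid classifier under k-fold CV."""
    correct = 0
    for fold in k_fold_indices(len(features), k):
        held_out = set(fold)
        train = [i for i in range(len(features)) if i not in held_out]
        # Fit: one centroid per class, from the training folds only.
        centroids = {}
        for label in set(labels):
            rows = [features[i] for i in train if labels[i] == label]
            if rows:
                centroids[label] = centroid(rows)
        # Evaluate on the held-out fold.
        for i in fold:
            predicted = min(
                centroids,
                key=lambda c: squared_distance(features[i], centroids[c]),
            )
            correct += predicted == labels[i]
    return correct / len(features)

# Hypothetical, well-separated toy data: [followers (scaled), URL ratio].
features = [[10 + i % 3, 0.8] for i in range(20)] + [[1 + i % 3, 0.1] for i in range(20)]
labels = ["worker"] * 20 + ["non_worker"] * 20
accuracy = cross_validate(features, labels, k=10)
```

Every account is held out exactly once, so the returned accuracy is an estimate over the whole labeled set rather than a single train/test split.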
18
Related Work
(Kittur, Chi and Suh 2008) showed how a large number of workers can be hired within a short time frame at low cost. They used Amazon Mechanical Turk.
(Venetis and Garcia-Molina 2012) proposed 2 quality control mechanisms to address the uneven quality of outputs that results from the openness of these web sites:
- Repeat each task multiple times and combine the results from multiple users
- Define a score for each worker and eliminate the work of users with low scores
Recent research has begun augmenting traditional information retrieval systems and database systems using crowdsourcing (Alonso, Rose and Stewart 2008).
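The two quality control mechanisms can be illustrated together: combine repeated answers by majority vote, then score each worker and drop low scorers. Scoring a worker by agreement with the majority answer is an assumption made for this sketch, not the scoring rule from the cited work, and the responses are hypothetical.

```python
from collections import Counter, defaultdict

def majority_answers(responses):
    """responses: list of (task_id, worker, answer) -> {task_id: top answer}."""
    by_task = defaultdict(list)
    for task_id, _worker, answer in responses:
        by_task[task_id].append(answer)
    return {
        task: Counter(answers).most_common(1)[0][0]
        for task, answers in by_task.items()
    }

def worker_scores(responses, consensus):
    """Fraction of each worker's answers that match the consensus."""
    agree, total = Counter(), Counter()
    for task_id, worker, answer in responses:
        total[worker] += 1
        agree[worker] += answer == consensus[task_id]
    return {worker: agree[worker] / total[worker] for worker in total}

def filter_low_scoring(responses, min_score=0.5):
    """Drop work from low-scoring workers, then recombine the answers."""
    consensus = majority_answers(responses)
    scores = worker_scores(responses, consensus)
    kept = [r for r in responses if scores[r[1]] >= min_score]
    return majority_answers(kept), scores

# Hypothetical repeated answers for two tasks.
responses = [
    ("t1", "a", "cat"), ("t1", "b", "cat"), ("t1", "c", "dog"),
    ("t2", "a", "yes"), ("t2", "b", "yes"), ("t2", "c", "no"),
]
final_answers, scores = filter_low_scoring(responses)
```

Worker `c` disagrees with the majority on both tasks, so their answers are excluded before the final combination.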
19
Q and A