Crowdsourcing: Opportunity or New Threat?
Major Area Exam: June 12th
Gang Wang
Committee: Prof. Ben Y. Zhao (Co-chair), Prof. Heather Zheng (Co-chair), Prof. Christopher Kruegel
Why Crowdsourcing
Software automation has replaced humans in many areas:
Storing and retrieving large volumes of information
Performing calculations
Yet humans still outperform computers in many ways.
Searching for Jim Gray (2007)
Jim Gray, Turing Award winner, went missing with his sailboat outside San Francisco Bay in January 2007.
Searches by the Coast Guard and private planes found nothing.
Idea: use satellite imagery to search for Jim Gray's sailboat.
Problem: the search cannot be automated by computer.
Solution: split the satellite imagery into many small images and have volunteers look for his boat in each image (see the sketch below).
100,000 tasks were completed in 2 days.
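A minimal sketch of the decomposition step, assuming hypothetical image dimensions and tile size (the function and field names are my own illustration, not from the original search effort):

```python
# A minimal sketch of splitting a large satellite image into small,
# independently reviewable tiles, one micro-task per tile.
# Tile size and image dimensions are illustrative assumptions.

def make_tile_tasks(image_width, image_height, tile_size=500):
    """Split an image into tile bounding boxes, one micro-task per tile."""
    tasks = []
    for top in range(0, image_height, tile_size):
        for left in range(0, image_width, tile_size):
            tasks.append({
                "left": left,
                "top": top,
                "right": min(left + tile_size, image_width),
                "bottom": min(top + tile_size, image_height),
                "question": "Do you see a sailboat in this tile? (yes/no)",
            })
    return tasks

# Example: a 50,000 x 50,000 pixel mosaic yields 10,000 tiles of 500x500 pixels.
print(len(make_tile_tasks(50_000, 50_000)))
```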
Traffic Monitoring (2012)
Google Maps traffic monitoring covers a large area with real-time updates.
Previous approach: deploy sensors on the road, which requires expensive equipment.
User-driven traffic monitoring
200 million Google Maps users (mobile) report their location while driving.
The reports are integrated into the traffic map in real time (see the sketch below).
Newsflash: Apple is also building a crowdsourced map system into iOS 6 this fall.
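A minimal sketch of this "humans as sensors" aggregation, assuming hypothetical road-segment IDs and a made-up congestion threshold (this is my own illustration, not Google's pipeline):

```python
# Average speed reports from phones per road segment to estimate congestion.
from collections import defaultdict

def congestion_by_segment(reports, jam_threshold_kmh=20):
    """reports: iterable of (segment_id, speed_kmh) tuples from driving users."""
    speeds = defaultdict(list)
    for segment_id, speed_kmh in reports:
        speeds[segment_id].append(speed_kmh)
    return {
        seg: ("jam" if sum(v) / len(v) < jam_threshold_kmh else "free flow")
        for seg, v in speeds.items()
    }

print(congestion_by_segment([("US-101-N:12", 15), ("US-101-N:12", 10),
                             ("CA-1-S:4", 80)]))
```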
Crowdsourcing: a process that enlists a crowd to do micro-work to solve problems that software cannot.
It is effective when a big problem can be decomposed into small tasks that are easy for individuals to solve.
Crowdsourcing Workflow
Requester: submits tasks and integrates the results.
Platform: manages tasks and workers, e.g., Amazon Mechanical Turk.
Workers: a large number of human users who work on tasks and return results.
Flow: the requester submits tasks to the platform; the platform distributes tasks to workers and returns their results; the requester collects the results (see the sketch below).
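A minimal sketch of this loop as code; the Platform class and its methods are my own illustration, not the API of Amazon Mechanical Turk or any real platform:

```python
# Requester submits tasks, platform distributes them to workers and returns
# results, requester collects the results.

class Platform:
    def __init__(self):
        self.tasks, self.results = [], {}

    def submit(self, task_id, payload):           # requester side
        self.tasks.append((task_id, payload))

    def distribute(self, worker):                  # platform side
        for task_id, payload in self.tasks:
            self.results.setdefault(task_id, []).append(worker(payload))

    def collect(self):                             # requester side
        return self.results

platform = Platform()
platform.submit("t1", "Is there a sailboat in this tile?")
platform.distribute(lambda payload: "no")          # a worker answering a task
print(platform.collect())
```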
State of the Art
Popular crowdsourcing services: Amazon Mechanical Turk, Freelancer.
Tasks: translation, transcription, product surveys, etc.
Other success stories: Wikipedia, protein folding.
However, there are problems.
Misuse of Crowdsourcing
Difficult to detect: high-quality spam, deceptive product reviews, realistic fake accounts.
An emerging threat to online communities. Headlines:
“Over 40% of New Mechanical Turk Jobs Involve Spam”
“In a Race to Out-Rave, 5-Star Web Reviews Go for $5”
“Dairy Giant Mengniu in Smear Scandal”
“Hacked E-mails Reveal Russian Astroturfing Program”
Crowdsourcing Research
Economics: business models, market surveys, labor economics.
Social science: social, cultural, and ethical issues.
Computer science: systems/applications (HCI, NLP, IR, DB) and security (fake reviews, Sybils, social spam, SEO).
Outline
Introduction
Overview of Crowdsourcing Applications
Research Challenges
Security and Crowdsourcing
Crowdsourcing Applications
Categorize applications by the kind of human intelligence required:
Natural language processing (NLP): data labeling [Snow2008] [Callison-Burch2009]; search result validation [Alonso2008]; database queries: CrowdDB [Franklin2011], Qurk [Marcus2011]
Image processing: image annotation [Ahn2004] [Chen2009]; image search [Yan2010]
Content generation / knowledge sharing: Wikipedia, Quora, Yahoo! Answers, StackOverflow; real-time Q&A: VizWiz [Bigham2010], Mimir [Hsieh2009]
Humans as sensors: Google Maps traffic monitoring; Twitter earthquake reports [Sakaki2010]
Data Annotation
Natural language processing (NLP) problems:
Evaluating machine translation quality [Callison-Burch2009]
Labeling text content (e.g., emotions) [Snow2008]
Challenges: difficult to automate in software; experts are expensive and slow.
Benefits of crowdsourcing: non-experts are cheap and fast, and their (post-processed) results are as good as experts'.
Automated Image Search
CrowdSearch [Yan2010]: accurate image search for mobile devices, combining automated image search with human validation of the search results via crowdsourcing.
Pipeline: query image → automated image search → candidate images → crowd validation → result.
Automated search alone achieves only 25% accuracy; with crowd validation, accuracy exceeds 95%.
Lesson: use human intelligence to improve automation (see the sketch below).
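A minimal sketch of this two-stage pipeline, with the automated ranker and the crowd workers stubbed out; the function names are my own and not taken from the CrowdSearch paper:

```python
# Stage 1: an automated ranker proposes candidate images.
# Stage 2: crowd workers vote on each candidate; keep majority-approved ones.

def crowd_validated_search(query_image, automated_search, ask_worker,
                           num_workers=3):
    results = []
    for candidate in automated_search(query_image):       # stage 1: automation
        votes = [ask_worker(query_image, candidate)        # stage 2: humans
                 for _ in range(num_workers)]
        if sum(votes) > num_workers // 2:                  # majority says "match"
            results.append(candidate)
    return results

# Toy usage with stubbed components.
fake_ranker = lambda q: ["img_a.jpg", "img_b.jpg"]
fake_worker = lambda q, c: c == "img_a.jpg"                # workers approve img_a only
print(crowd_validated_search("query.jpg", fake_ranker, fake_worker))
```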
Question and Answer
VizWiz [Bigham2010]: helps blind people get their questions answered in near real time.
Example shopping scenario: the user takes a photo, records the question ("Which item is corn?"), and the server forwards both to the crowd, which replies (e.g., "Right side").
Uses the crowd to help people in need, replacing an "expensive" personal assistant (see the sketch below).
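A minimal sketch of such a near-real-time Q&A request, assuming a simple first-answer-wins policy; the data structure and names are my own illustration, not VizWiz's actual protocol:

```python
# A photo plus a recorded question is posted, and the first worker answer
# is returned to the user.
from dataclasses import dataclass

@dataclass
class Request:
    photo_path: str        # e.g. a photo of a shelf of cans
    question: str          # e.g. "Which item is corn?"

def answer(request, workers):
    """Return the first non-empty answer from a pool of worker callbacks."""
    for worker in workers:
        reply = worker(request)
        if reply:
            return reply
    return "No answer yet"

req = Request("shelf.jpg", "Which item is corn?")
print(answer(req, [lambda r: "", lambda r: "The can on the right side"]))
```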
Outline
Introduction
Overview of Crowdsourcing Applications
Research Challenges
Security and Crowdsourcing
Challenges in Crowdsourcing
Quality control: high diversity in worker background and expertise.
Incentives: encourage participation and improve work quality.
Task management: perform complex/real-time tasks; coordinate workers and requesters.
Security: spammy/cheating workers and fraudulent requesters; crowdsourcing systems used for malicious attacks.
Quality Control
Fundamental problem: the crowd is not reliable [Oleson2011] [Snow2008] [Callison-Burch2009] [Yan2010] [Franklin2011].
Workers make mistakes; workers spam the system.
Existing strategies (see the sketch below):
Majority voting
Pre-screening workers with ground-truth test questions
Statistical models to correct data bias
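A minimal sketch combining two of these strategies, pre-screening on ground-truth questions and majority voting; the data layout and the 80% accuracy threshold are illustrative assumptions, not values from the cited papers:

```python
# Screen out workers who fail known ground-truth questions, then take a
# majority vote over the remaining workers' answers.
from collections import Counter

def passes_screening(worker_answers, gold, min_accuracy=0.8):
    """worker_answers/gold: dicts of task_id -> answer for screening tasks."""
    correct = sum(worker_answers.get(t) == a for t, a in gold.items())
    return correct / len(gold) >= min_accuracy

def majority_vote(answers):
    """answers: list of labels for one task from different workers."""
    return Counter(answers).most_common(1)[0][0]

gold = {"g1": "yes", "g2": "no"}
workers = {
    "w1": {"g1": "yes", "g2": "no", "task": "spam"},
    "w2": {"g1": "yes", "g2": "no", "task": "spam"},
    "w3": {"g1": "no",  "g2": "no", "task": "ham"},   # fails screening
}
trusted = [w for w in workers.values() if passes_screening(w, gold)]
print(majority_vote([w["task"] for w in trusted]))     # -> "spam"
```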
Incentives
Basic question: how to set the right price for tasks? This is a common problem for all applications.
Can you improve work quality by raising payment? Can you attract more workers by raising payment?
Empirical studies of worker incentives [Mason2010] [Hsieh2010] find that higher payment helps recruit workers faster and increases participation, but money does not improve quality.
Punishment/bonus-based quality control: pay the minimum of $0.01 to all workers, plus a $0.01 bonus (see the sketch below).
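A minimal sketch of such a bonus rule; tying the bonus to agreement with the majority answer is my own assumption for illustration:

```python
# Every worker earns the base pay; a small bonus is added only when the
# worker's answer agrees with the majority answer.

BASE_PAY = 0.01    # dollars per task
BONUS = 0.01       # dollars, paid only for agreeing answers

def payout(worker_answer, majority_answer):
    return BASE_PAY + (BONUS if worker_answer == majority_answer else 0.0)

print(payout("spam", "spam"))   # 0.02 -> base plus bonus
print(payout("ham", "spam"))    # 0.01 -> base only
```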
Task Management
Example: writing a travel book for New York City. Partition the work (outline chapters such as attractions and a brief history), map (gather facts for each task), and reduce (collect the text into paragraphs).
Crowdsourcing complex tasks [Kittur2011]: partition the complex task, execute each workflow in parallel, and integrate the results.
Implementing algorithms on the crowd [Little2010]: treat the crowd as a computation unit and design/organize tasks so that they execute an algorithm. Example: use bubble sort to order pictures, where each comparison is a "which one is better?" task (see the sketch below).
Open problems: real-time crowdsourcing, parallel task execution, synchronization.
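A minimal sketch of the bubble-sort-on-the-crowd example; the crowd comparison is stubbed by a callback, and the photo names and hidden quality scores are made up for illustration:

```python
# Bubble sort over pictures where the comparison step is a human micro-task
# asking "which one is better?". In a real system the callback would post a
# task to a platform and wait for the answer.

def crowd_bubble_sort(items, crowd_prefers):
    """Sort items so the most preferred comes first.

    crowd_prefers(a, b) -> True if the crowd says a is better than b.
    """
    items = list(items)
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            # Each comparison is one crowdsourced "which is better?" task.
            if crowd_prefers(items[j + 1], items[j]):
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

# Toy usage: pretend the crowd prefers photos with higher (hidden) quality.
quality = {"photo_a": 2, "photo_b": 9, "photo_c": 5}
print(crowd_bubble_sort(quality, lambda a, b: quality[a] > quality[b]))
# -> ['photo_b', 'photo_c', 'photo_a']
```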
Security Challenges
Attacks inside crowdsourcing systems: spammy workers give random/bad answers; dishonest requesters.
Using crowdsourcing systems to carry out malicious campaigns: real users can perform all kinds of malicious tasks, and crowdsourcing makes it possible to scale them up.
Examples: writing fake reviews, creating fake accounts (Sybils), generating social network spam, solving CAPTCHAs, casting biased votes, building backlinks (SEO).
Outline
Introduction
Overview of Crowdsourcing Applications
Research Challenges
Security and Crowdsourcing
  Malicious Crowdsourcing Systems
  Fake Review Generation by the Crowd
  Detecting Sybils in Online Social Networks
Threat Map
(Figure.) Sybils and spam target online social networks (OSNs); fake reviews target online review sites; greyhat SEO targets search engines.
Dark Side of Crowdsourcing
A large number of workers: easy, cheap, and fast to hire.
Real users can do bad jobs: post 5-star ratings and positive reviews, use different IP addresses, and bypass existing spam filters.
Measuring Malicious Crowdsourcing
Crowdsourcing malicious tasks: generating spam, solving CAPTCHAs, creating fake accounts, greyhat SEO.
Scale and economics:
Zhubajie (China): ~10K malicious jobs per month, involving ~$1M per month [Wang2012b]
Freelancer (US): ~140K malicious jobs over 7 years [Motoyama2011]
An emerging threat: an international workforce, growing exponentially (figure of buyers and workers from [Motoyama2011]).
Outline
Introduction
Overview of Crowdsourcing Applications
Research Challenges
Security and Crowdsourcing
  Malicious Crowdsourcing Systems
  Fake Review Generation by the Crowd
  Detecting Sybils in Online Social Networks
Online Reviews: Why Important?
80% of people check online reviews before purchasing products or booking travel online.[1]
Independent restaurants: a one-star increase in Yelp rating leads to a 9% increase in revenue.[2]
[2] Michael Luca. Reviews, Reputation, and Revenue: The Case of Yelp.com. Harvard Business School Working Paper, 2011.
Detecting Fake Reviews
Detecting review spam [Jindal2008]: duplicated/near-duplicated reviews (see the sketch below).
Detecting review spammers: classifying rating/review behaviors [Lim2010]; detecting synchronized reviews posted in groups [Mukherjee2012].
Deception models [Ott2011]: content classification using trained data and psycholinguistic deception detection; human accuracy is 60%, classifier accuracy 90%.
Lower bound on spam reviews (Amazon dataset): 2,146,048 reviewers, 1,195,133 products, 5,838,032 reviews, 55,319 spam reviews.
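A minimal sketch of the near-duplicate signal, comparing reviews by the Jaccard similarity of their word shingles; the shingle size, threshold, and toy reviews are my own choices, not the settings from [Jindal2008]:

```python
# Flag pairs of reviews whose word-shingle sets are highly similar.

def shingles(text, k=3):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicates(reviews, threshold=0.6):
    flagged = []
    for i in range(len(reviews)):
        for j in range(i + 1, len(reviews)):
            if jaccard(shingles(reviews[i]), shingles(reviews[j])) >= threshold:
                flagged.append((i, j))
    return flagged

reviews = [
    "this camera is amazing best purchase I have ever made",
    "this camera is amazing best purchase I have ever seen",
    "battery life is disappointing and the lens feels cheap",
]
print(near_duplicates(reviews))   # -> [(0, 1)]
```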
Challenges in Detecting Fake Reviews
Current review-spam detection solutions are limited: they assume a few attackers control many accounts, and crowdsourcing breaks that assumption.
Deception models (the NLP approach) have limitations: they are domain specific, and content analysis still has a high false-positive rate (10%).
Detecting fake reviews is an open problem: solutions need low false positives and real-time operation.
Outline
Introduction
Overview of Crowdsourcing Applications
Research Challenges
Security and Crowdsourcing
  Malicious Crowdsourcing Systems
  Fake Review Generation by the Crowd
  Detecting Sybils in Online Social Networks
Social Network Sybils
Sybils in online social networks (OSNs): cheating in social games, spreading spam/malware [Thomas2011] [Gao2010] [Nazir2010].
Challenges to detecting Sybils in the wild: varied and adaptive Sybil behavior patterns and attack strategies; increasingly sophisticated and realistic Sybil account profiles; automated mechanisms are losing effectiveness.
Idea: use crowdsourcing for Sybil detection.
Crowdsourced Sybil Detection
Basic idea: build a crowdsourced Sybil detector that is resilient to changing attacker strategies.
Question: can humans identify Sybil profiles? (Answered with a user study.)
Ground-truth datasets of full user profiles: 200 real fake accounts (Renren, Facebook, Facebook-India).
Segmented user groups: Renren users (Chinese), Facebook (US), Facebook (India); experts (conscientious, motivated) vs. turkers (paid per profile, money-driven).
High-level results: experts are accurate; both experts and turkers have near-zero false positives; quality control can raise turker accuracy to roughly expert level (see the sketch below).
Accurate, scalable, and cost-effective.
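A minimal sketch of one plausible quality-control step for such a detector: keep only votes from turkers who are accurate on ground-truth profiles, then take a majority vote per profile. The thresholds and the exact aggregation rule are my own assumptions, not the study's method:

```python
# Label a profile a Sybil only if most votes from reliable turkers say so.
from collections import Counter

def classify_profile(votes, turker_accuracy, min_accuracy=0.7, min_votes=3):
    """votes: list of (turker_id, "sybil" or "real") for one profile."""
    trusted = [label for turker, label in votes
               if turker_accuracy.get(turker, 0.0) >= min_accuracy]
    if len(trusted) < min_votes:
        return "undecided"                      # not enough reliable votes
    return Counter(trusted).most_common(1)[0][0]

accuracy = {"t1": 0.9, "t2": 0.8, "t3": 0.4, "t4": 0.85}
votes = [("t1", "sybil"), ("t2", "sybil"), ("t3", "real"), ("t4", "real")]
print(classify_profile(votes, accuracy))        # -> "sybil"
```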
Conclusion
Crowdsourcing is an alternative solution for problems that are difficult to automate in software but can be decomposed into small tasks.
There are many challenges in crowdsourcing systems: quality control against unreliable workers, task management for complex tasks, and incentive models to reduce cost and optimize performance.
Malicious crowdsourcing and related attacks pose a serious threat to existing security mechanisms; measurement studies help us understand the problem, but defense is still an open problem.
Possible Research Areas
Defending against malicious crowdsourcing systems: attacking the malicious crowdsourcing systems themselves; detecting crowdsourcing campaigns in real time; spotting fake reviews and reviewers. Requirements: resilience to changing behaviors, real-time operation, scalability.
Using crowdsourcing to solve security problems, e.g., crowdsourcing to detect social Sybils.
Thank you! Questions?