Slide 1
CrowdFlow: Integrating Machine Learning with Mechanical Turk for Speed-Cost-Quality Flexibility
Alex Quinn, Ben Bederson, Tom Yeh, Jimmy Lin
Slide 2
Human Computation
[Two-column slide, "Things HUMANS can do" vs. "Things COMPUTERS can do", listing: Translation, Photo tagging, Face recognition, Human detection, Speech recognition, Text analysis, Planning]
Slide 3
Human Computation
[Build of the previous slide: "Human detection" shifts between the two columns]
Slide 4
Example: Human detection
Slide 5
Trade-off space
[Chart: Quality vs. Speed/Affordability, positioning Computers, Human Workers (traditional), and Human Computation]
Slide 6
Trade-off space
[Build of the previous chart, repositioning Human Computation relative to Computers and traditional Human Workers]
Slide 7
Man-Computer Symbiosis
[Diagram of two human-computer pipelines, each annotated with speed, cost, and quality: "Automation with human post-correction" and "Supervised machine learning"]
Slide 8
Man-Computer Symbiosis
[Build of the previous diagram, adding CrowdFlow as a third human-computer pipeline with its own speed, cost, and quality profile]
Slide 9
Mechanical Turk
Slide 10
Human Detection – Starting point
Slide 11
Human Detection – Task
Slide 12
Human Detection – Results
[Chart: Quality (60%–90%) vs. Speed/Affordability]
119 images took 3 hrs 50 min and cost $2.38.
Slide 13
Human Detection – Scenarios
[Chart: Quality (60%–90%) vs. Speed/Affordability]
119 images took 3 hrs 50 min and cost $2.38. Projected: 1,000 photos at 72% accuracy would take 12 hrs 20 min and cost $8.00.
Slide 14
Vision: Richer model
[Flowchart, sketched in code below: input with computer results goes to a Validator (correct/incorrect); incorrect items go to an Appraiser (fix/start over); fixable items go to a Fixer, the rest to a Worker answering from scratch; all paths lead to the output]
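Read as a pipeline, the flowchart routes each machine-annotated item through up to three human roles. A minimal sketch of that routing, assuming the wiring described above (itself reconstructed from the slide); every name here is hypothetical:

```python
# Sketch of the richer worker-role pipeline from this slide. The routing
# (validate -> appraise -> fix or start over) follows the flowchart as
# reconstructed above; all function and attribute names are hypothetical.
def process(item, validator, appraiser, fixer, worker):
    """item arrives with a machine-generated answer attached."""
    if validator(item):        # Validator: is the computer's answer correct?
        return item.answer     # Correct -> pass straight to output
    if appraiser(item):        # Appraiser: is the wrong answer worth fixing?
        return fixer(item)     # Fix -> a Fixer repairs the machine's answer
    return worker(item)        # Start over -> a Worker answers from scratch
```

Each role is a cheaper, more constrained HIT than answering from scratch, which is what lets the richer model trade quality against speed and cost.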
Slide 15
Lessons Learned
Design for overall needs/constraints.
Practical advice:
- Pay consistently and reasonably
- Reject only work that is definitely cheating
- Build in fair cheating deterrence from the start
- Keep instructions short, but always clear
Contact: Alex Quinn, aq@cs.umd.edu
Slide 16
Cheating
Earlier naïve experiment: 2,000 reviews classified by 3 Turkers each; 91% of the work was cheating, produced by just 9 bad Turkers.
Slide 17
Cheating Deterrence
- Mix in task instances with known answers
- Keep track of each worker's accuracy (sketched below)
- Warning after 10 HITs of <70% accuracy
- Block after 20 HITs of <70% accuracy
- Thresholds are problem-specific
Other mechanisms:
- Approve payment only after inspection
- Filter workers based on approval record
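The deterrence scheme on this slide is essentially a per-worker accuracy tracker over gold-standard ("known answer") items. A minimal sketch using the slide's example thresholds (70% accuracy, warn at 10 HITs, block at 20); the class and method names are hypothetical:

```python
from collections import defaultdict

# Slide's example thresholds; as noted above, these are problem-specific.
ACCURACY_FLOOR = 0.70
WARN_AFTER = 10    # HITs completed before a low-accuracy warning
BLOCK_AFTER = 20   # HITs completed before a low-accuracy block

class WorkerTracker:
    """Tracks each worker's accuracy on gold-standard items mixed into HITs."""

    def __init__(self):
        self.hits = defaultdict(int)     # HITs completed per worker
        self.correct = defaultdict(int)  # known-answer items answered correctly

    def record(self, worker_id, gold_answer, given_answer):
        """Record one known-answer item from a completed HIT."""
        self.hits[worker_id] += 1
        if given_answer == gold_answer:
            self.correct[worker_id] += 1

    def status(self, worker_id):
        """Return 'ok', 'warn', or 'block' per the slide's thresholds."""
        n = self.hits[worker_id]
        if n == 0 or self.correct[worker_id] / n >= ACCURACY_FLOOR:
            return "ok"
        if n >= BLOCK_AFTER:
            return "block"
        if n >= WARN_AFTER:
            return "warn"
        return "ok"
```

In deployment, a "block" status would be acted on through Mechanical Turk's worker-blocking facility, and payment approval would wait on this check, matching the "other mechanisms" above.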
Slide 18
Ideal Pricing
- Pay proportional to Turker effort
- Choose a reasonable hourly rate
- Example (worked below): confirming a correct answer, 10 seconds; fixing an incorrect answer, 60 seconds; answering from scratch, 50 seconds
- If machine accuracy < 80%, bypass machine results
- Need to adjust for human accuracy!
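A worked version of this pricing model: expected effort per item follows the slide's timings, and payment is effort times an hourly rate. A minimal sketch, assuming a hypothetical rate; the 80% bypass rule is taken as given from the slide, and the human-accuracy adjustment it calls for is left out:

```python
# Sketch of the effort-proportional pricing model on this slide.
# The 10s/60s/50s timings and the 80% bypass threshold are the slide's
# figures; HOURLY_RATE and all names here are hypothetical.
HOURLY_RATE = 6.00    # dollars per hour; "a reasonable hourly rate"
CONFIRM_SECS = 10     # confirming a correct machine answer
FIX_SECS = 60         # fixing an incorrect machine answer
SCRATCH_SECS = 50     # answering with no machine result shown

BYPASS_BELOW = 0.80   # slide: bypass machine results if accuracy < 80%

def expected_seconds(machine_accuracy):
    """Expected Turker effort per item, in seconds."""
    if machine_accuracy < BYPASS_BELOW:
        # Don't show machine results at all; the worker answers from scratch.
        return SCRATCH_SECS
    return (machine_accuracy * CONFIRM_SECS
            + (1 - machine_accuracy) * FIX_SECS)

def payment_per_item(machine_accuracy):
    """Effort-proportional payment in dollars (before the human-accuracy
    adjustment the slide says is still needed)."""
    return expected_seconds(machine_accuracy) * HOURLY_RATE / 3600

# Example: at 90% machine accuracy, expected effort is 15 s, about 2.5 cents.
```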
Slide 19
Sentiment Polarity – Example 1
"Skim each movie review and decide whether it is positive or negative...."
○ positive ○ negative
Slide 20
Sentiment Polarity – Results
- 1,083 movie reviews grouped into 361 HITs
- Cost: $54.35 (5¢ per HIT; 1.7¢ per movie review; checked below)
- Time: 8 hours 7 minutes (27 seconds per movie review)
- Human accuracy: 90%
- Machine accuracy: 83.5%
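The per-review figures follow from the totals. A quick arithmetic check; note that the three-assignments-per-HIT redundancy is inferred from the numbers ($54.35 ≈ 361 HITs × 3 × 5¢), not stated on the slide:

```python
# Arithmetic check of the per-review figures on this slide.
reviews, hits = 1083, 361                          # 3 reviews per HIT
pay_per_hit = 0.05                                 # 5 cents per HIT
assignments = 3                                    # inferred: 54.35 / (361 * 0.05) ~= 3
total_cost = hits * assignments * pay_per_hit      # $54.15 (vs. $54.35 reported)
cents_per_review = 100 * pay_per_hit * hits / reviews   # ~1.7 cents per review
total_seconds = 8 * 3600 + 7 * 60                  # 8 hours 7 minutes
secs_per_review = round(total_seconds / reviews)   # 27 seconds per review
```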
Slide 21
Sentiment Polarity – Scenarios
- Given: 100,000 movie reviews; cost constraint: $1,000
- Expect: humans do 66,714, machines do the rest (a naive planning sketch follows)
- 78% combined accuracy
- 18 days, 17 hours, 40 minutes
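The scenario is a budget-constrained split between human and machine labeling. A naive planning sketch under that framing; note that the slide's 78% figure comes from CrowdFlow's own quality model, so the simple weighted average here is illustrative only, and the per-review cost is a hypothetical input:

```python
# Naive sketch of budget-constrained scenario planning: spend the budget
# on human labels, let the machine handle the remainder. CrowdFlow's
# actual planner models quality differently than this weighted average.
def plan(total_items, budget, cost_per_item, human_acc, machine_acc):
    """Split a labeling job between humans and machines under a budget."""
    human_items = min(total_items, int(budget // cost_per_item))
    machine_items = total_items - human_items
    combined_acc = (human_items * human_acc +
                    machine_items * machine_acc) / total_items
    return human_items, machine_items, combined_acc

# e.g., plan(100_000, 1000.00, 0.015, 0.90, 0.835)
# -> humans ~66,666 items; naive combined accuracy ~87.8% (the slide's
#    78% reflects CrowdFlow's richer quality model, not this average).
```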
Slide 22
Review: MonoTrans
[Chart: Quality vs. Affordability, positioning Machine Translation, Professional Bilingual Human Participation, Amateur Bilingual Human Participation, and Monolingual Human Participation]