Published by Ashlynn Caldwell
© 2009 Amazon.com, Inc. or its Affiliates.
Amazon Mechanical Turk New York City Meetup
September 1, 2009
WELCOME!
AGENDA
Welcoming Statements
Introductions
Dolores Labs – Video Directory Use Case
Knewton – Adaptive Learning Use Case
Freedom OSS – Enterprise Integration
New York University – Worker Quality Solution
Panel
Questions and Answers
Amazon Mechanical Turk Requester Meetup
Howie Liu, Dolores Labs
Dolores Labs Introduction
Founded in 2008 by Lukas Biewald: Senior Scientist at Powerset (MSFT), with experience at Yahoo! Search and the Stanford AI Lab
– Recognized the enormous potential of the AMT platform
Dolores Labs develops quality control technology (CrowdControl™) to make AMT more accessible and reliable
Case Study
A large video directory needed to select relevant thumbnails for 200k+ videos
Why Mechanical Turk?
The size of the project and the required turnaround speed made MTurk the obvious solution
– Given the needs of the client, traditional outsourcing or hiring employees was not an option
– However, the client was concerned about the quality of results
Inherent variability of Mechanical Turk workers
– Unlike other Amazon marketplaces, workers are not a perfect commodity
– Significant variations in quality (accuracy)
– Need to ensure workers diligently completed the work
– Intelligently aggregate multiple responses to find the single best thumbnail for a video
3-Step Process for Optimizing the Task
1. Baseline performance: create a custom interactive UI (74% result accuracy)
2. CrowdControl™: apply statistical quality control (90% result accuracy)
3. CrowdControl™ + 2nd pass: Turkers verify the results in a second pass (98% result accuracy)
High Quality on Mechanical Turk: Best Practices
Statistical inference algorithms to dynamically assess quality
– ...of each worker and of each result
– ...while the task is live
Smart allocation of worker resources
– Blindly increasing redundancy is expensive
Aggregating all responses from workers of varying quality into a single “best” answer
White paper with Stanford AI Lab about quality on AMT: http://bit.ly/DLpaper
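The slides do not describe how CrowdControl™ weighs workers. As a minimal sketch of the general idea of aggregating answers from workers of varying quality, the Python below weights each vote by the log-odds of the worker's estimated accuracy. The "one-coin" worker model, the function name and the thumbnail labels are assumptions for illustration, not Dolores Labs' algorithm.

```python
import math
from collections import defaultdict


def weighted_vote(votes, num_classes):
    """Return the label with the highest quality-weighted score.

    votes: list of (label, worker_accuracy) pairs for one item.
    Assumes a hypothetical "one-coin" model: each worker is right with
    probability q and otherwise picks one of the other num_classes - 1
    labels uniformly at random, independently of other workers, with a
    uniform prior over labels and every worker better than random
    (q > 1 / num_classes). Under that model a vote adds
    log(q * (num_classes - 1) / (1 - q)) to its label's relative
    log-posterior score.
    """
    scores = defaultdict(float)
    for label, q in votes:
        q = min(max(q, 1e-6), 1 - 1e-6)  # keep the logarithm finite
        scores[label] += math.log(q * (num_classes - 1) / (1 - q))
    return max(scores, key=scores.get)


# One accurate worker outweighs two mediocre ones (log 18 > 2 * log 3).
print(weighted_vote([("thumb_1", 0.9), ("thumb_2", 0.6), ("thumb_2", 0.6)],
                    num_classes=3))  # thumb_1
```

Adding a third 0.6-accuracy vote for thumb_2 flips the result, since 3 × log 3 ≈ 3.30 exceeds log 18 ≈ 2.89: redundancy still matters, but each vote counts in proportion to the worker's estimated quality.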
Other Insights
Clear task instructions are crucial for good results
– Garbage in, garbage out
An intuitive and efficient task interface makes the task faster (read: cheaper) and more fun!
Mechanical Turk is an unprecedented, hyper-efficient labor marketplace
– You need to understand its dynamics through experience in order to harness its power
Amazon Mechanical Turk Requester Meetup
Dahn Tamir, Knewton Inc.
Knewton – Introduction
Live online GMAT and LSAT prep courses customized for each student, powered by the world’s most advanced adaptive learning engine
Selected for the 2009 AlwaysOn Global 250 list; named Category Winner in the Digital Education field
How We Use MTurk
Quality assurance
Focus groups and surveys
Database building
Marketing
Calibration for computer-adaptive testing
Why MTurk?
Cost
Appropriate worker population for each task
Quality
Speed
What We Learned
Use qualification tests
Invest in building good HITs
Hesitate to reject work (but do not hesitate to reject cheaters)
Turkers are a diverse and capable population
Meet Turker Nation
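Knewton's first lesson is to use qualification tests. A minimal sketch of grading one locally, assuming the test is a fixed set of multiple-choice questions with a known answer key; the function name, question IDs and 80% cut-off are hypothetical, and posting the test through MTurk itself is not shown.

```python
def grade_qualification(responses, answer_key, pass_threshold=0.8):
    """Score a qualification test and decide whether to grant access.

    responses / answer_key: dicts mapping question IDs to answers.
    Unanswered questions count as wrong. pass_threshold is a
    hypothetical cut-off, not a value from the slides.
    """
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    correct = sum(1 for qid, expected in answer_key.items()
                  if responses.get(qid) == expected)
    score = correct / len(answer_key)
    return score, score >= pass_threshold


key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D", "q5": "A"}
print(grade_qualification({"q1": "A", "q2": "C", "q3": "D"}, key))  # (0.4, False)
```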
Thank you!
Questions? dahn@knewton.com, 978-KNEWTON
Amazon Mechanical Turk Requester Meetup
Max Yankelevich, Chief Architect, Freedom OSS
Freedom OSS – Introduction
Freedom OSS is a professional services organization focused on practical implementations using Cloud Computing & Open Source technologies
International firm
– US offices: PA, NYC, GA, KC, NV, WA, NC
– 4 large solution centers in Eastern Europe (Russia, Belarus, Ukraine and Lithuania)
Practical approach to Cloud Computing: most successfully completed Enterprise Cloud Computing projects in the industry
Key Cloud Computing partnerships
– Top Amazon AWS Enterprise System Integrator
– Top Eucalyptus Enterprise Partner
Key Open Source partnerships
– Top Red Hat Advanced Business Partner
– #1 JBoss Advanced Business Partner in US
2008 “JBoss SOA Innovation” Award Winner
2007-08 “Practical SOA” Award Winner
2008 “Red Hat Extensive Ecosystem” Award Winner
Leading technology partner for many Fortune 2000 companies
Freedom is a privately held corporation
MTurk and Enterprise Integration
Most legacy systems are not architected to include human intervention
Providing a technological interface that maintains the workflow while inserting human intelligence and building self-adjudicating business flows
Leveraging Mechanical Turk programmatically in your everyday systems
Freedom OSS has leveraged the power of an Enterprise Service Bus (ESB) and Practical Service Oriented Architecture (SOA) to make on-boarding and managing MTurk workers rapid and cost-effective
Using its professional open source ESB, freeESB, Freedom has developed many powerful connectors for some of the most widely used enterprise systems and technologies, such as SAP, Mainframe, Siebel, Java/J2EE, Oracle and IBM MQ
Master Data Cleansing & Validation Use Case
Keeping a Master Customer Data File (Master Data Management, MDM)
– Record de-duping
– Contact information validation
Traditional MDM tactics
– Expensive software
– Big Bang approach
– Invasive code changes to legacy applications
Clean and consistent customer data
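Record de-duping is one place where a cheap automated pass can decide what humans see. A minimal sketch, assuming customer records with hypothetical "id", "name" and "email" fields: it only groups records that match exactly after normalization and hands each group to a human to confirm; catching typo-level near-duplicates would need fuzzy matching, which this sketch omits.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def duplicate_candidates(records):
    """Group records whose normalized name and email match exactly.

    records: iterable of dicts with hypothetical "id", "name" and "email"
    keys. Returns one list of IDs per group with more than one record;
    each group is a candidate duplicate set for a human to confirm.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["email"]))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


customers = [
    {"id": 1, "name": "Jane Doe", "email": "jane@example.com"},
    {"id": 2, "name": "jane  doe", "email": "JANE@example.com"},
    {"id": 3, "name": "John Roe", "email": "john@example.com"},
]
print(duplicate_candidates(customers))  # [[1, 2]]
```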
[Architecture diagram. Components: AWS Cloud; freeESB (routing, transformation, connectivity, QoS); business applications (real-time events, real-time access); legacy applications (Mainframe, Client-Server, Oracle, .NET, SAP, Siebel, etc.); API; First Turk Task – Simple Data Checking; Second Turk Task – Deeper Data Checking; Third Turk Task – Data Edit/Trusted Task; Master Data; Business Process Orchestration & Workflow; Business Rules Engine.]
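The diagram names three Turk tasks of increasing depth. One plausible reading, assumed here rather than stated on the slide, is an escalation chain in which a record only moves on to the next, more expensive task when the previous one flags it. A minimal sketch with hypothetical names; the task functions are injected by the caller, so the workflow does not depend on how HITs are posted.

```python
from typing import Callable, Dict

Record = Dict[str, str]
Check = Callable[[Record], bool]     # True means "record looks correct"
Edit = Callable[[Record], Record]    # returns a corrected record


def cleanse(record: Record, simple_check: Check, deep_check: Check,
            trusted_edit: Edit) -> Record:
    """Escalate a record through progressively more expensive tasks.

    The three callables stand in for the three Turk tasks. They are
    passed in by the caller (inversion of control), so this workflow
    does not depend on how HITs are posted or how answers are collected.
    """
    if simple_check(record):      # first Turk task: simple data checking
        return record
    if deep_check(record):        # second Turk task: deeper data checking
        return record
    return trusted_edit(record)   # third Turk task: data edit by trusted workers


# Stub tasks for illustration; real ones would post HITs and wait for answers.
def has_at(r: Record) -> bool:
    return "@" in r["email"]


fixed = cleanse({"email": "jane.example.com"}, has_at, lambda r: False,
                lambda r: {**r, "email": "jane@example.com"})
print(fixed)  # {'email': 'jane@example.com'}
```

Most records stop at the cheap first task, so the trusted (and more expensive) editors only see what the earlier passes could not clear.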
Outcome
Low operational costs
Non-invasive data integration
High degree of accuracy due to multi-task distribution
Some best practices when integrating MTurk within an enterprise
– Deliver value incrementally
– Inversion of Control
Thank you!
Questions?
Amazon Mechanical Turk Requester Meetup
Panos Ipeirotis, New York University
Panos Ipeirotis – Introduction
New York University, Stern School of Business
“A Computer Scientist in a Business School”
http://behind-the-enemy-lines.blogspot.com/
Email: panos@nyu.edu
Example: Build an Adult Web Site Classifier
Need a large number of hand-labeled sites
Get people to look at sites and classify them as: G (general audience), PG (parental guidance), R (restricted), X (porn)
Cost/Speed Statistics
– Undergrad intern: 200 websites/hr, cost: $15/hr
– MTurk: 2500 websites/hr, cost: $12/hr
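Dividing each hourly cost by the hourly throughput gives the cost per labeled site for a single label, which makes the gap on this slide explicit:

```latex
\frac{\$15/\text{hr}}{200\ \text{sites/hr}} = \$0.075 \text{ per site (intern)},
\qquad
\frac{\$12/\text{hr}}{2500\ \text{sites/hr}} = \$0.0048 \text{ per site (MTurk)},
\qquad
\frac{0.075}{0.0048} \approx 15.6
```

So a single MTurk label costs roughly one-sixteenth as much as an intern's label, before any redundancy is added.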
Bad news: Spammers!
Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience)
Improve Data Quality through Repeated Labeling
Get multiple, redundant labels using multiple workers
Pick the correct label based on majority vote
Probability of correctness increases with the number of workers
Probability of correctness increases with the quality of workers
– 1 worker: 70% correct
– 11 workers: 93% correct
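The slide's figures can be approximated with a binomial model. A minimal sketch, assuming a binary label, an odd number of workers, and independent workers of equal accuracy p (the real G/PG/R/X task is multi-class, so this is only an approximation; the function name is hypothetical). With p = 0.7 it gives about 92% for 11 workers, close to the slide's 93%.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n workers is correct.

    Simplifying assumptions: a binary task, n odd, and workers who are
    independent and each correct with the same probability p.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 5, 11):
    print(n, round(majority_accuracy(0.7, n), 3))
```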
But Majority Voting is Expensive
Single Vote Statistics
– MTurk: 2500 websites/hr, cost: $12/hr
– Undergrad: 200 websites/hr, cost: $15/hr
11-vote Statistics
– MTurk: 227 websites/hr, cost: $12/hr
– Undergrad: 200 websites/hr, cost: $15/hr
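The 227 websites/hr figure is simply 2500 / 11: every site now needs eleven labels. A small helper (hypothetical name) makes the trade-off explicit, assuming the hourly cost stays fixed while redundancy grows.

```python
def redundancy_stats(sites_per_hour, cost_per_hour, votes_per_site):
    """Effective throughput and cost per labeled site when each site
    needs votes_per_site independent labels."""
    effective_rate = sites_per_hour / votes_per_site
    return effective_rate, cost_per_hour / effective_rate


for name, rate, cost, votes in [("MTurk, 1 vote", 2500, 12, 1),
                                ("MTurk, 11 votes", 2500, 12, 11),
                                ("Undergrad", 200, 15, 1)]:
    sites, per_site = redundancy_stats(rate, cost, votes)
    print(f"{name}: {sites:.0f} sites/hr, ${per_site:.4f} per site")
```

With 11 votes, MTurk still costs about $0.053 per site versus the intern's $0.075, but its throughput advantage nearly disappears (227 vs. 200 sites/hr).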
Using redundant votes, we can infer worker quality
Look at our spammer friend ATAMRO447HWJQ together with 9 other workers
Our “friend” ATAMRO447HWJQ mainly marked sites as G. Obviously a spammer...
We can compute error rates for each worker
Error rates for ATAMRO447HWJQ:
P[X → X] = 9.847%   P[X → G] = 90.153%
P[G → X] = 0.053%   P[G → G] = 99.947%
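The slides omit how the error rates are estimated. A minimal sketch under a simplifying assumption: treat the majority vote on each item as the true label and tabulate how often each worker's label matches it. With several workers per item this already exposes a worker who answers G for everything; a more careful method would re-estimate the consensus and the error rates together, iteratively. Names and data below are hypothetical.

```python
from collections import Counter, defaultdict


def worker_confusion(labels):
    """Estimate per-worker error rates against majority-vote consensus.

    labels: list of (worker_id, item_id, label) tuples.
    Returns {worker_id: {(consensus_label, given_label): rate}}, where rate
    estimates P[consensus_label -> given_label] for that worker.
    Ties in the majority vote go to the label seen first for that item.
    """
    by_item = defaultdict(list)
    for _, item, label in labels:
        by_item[item].append(label)
    consensus = {item: Counter(votes).most_common(1)[0][0]
                 for item, votes in by_item.items()}

    pair_counts = defaultdict(Counter)   # worker -> (truth, given) counts
    truth_counts = defaultdict(Counter)  # worker -> truth counts
    for worker, item, label in labels:
        truth = consensus[item]
        pair_counts[worker][(truth, label)] += 1
        truth_counts[worker][truth] += 1
    return {worker: {pair: n / truth_counts[worker][pair[0]]
                     for pair, n in pairs.items()}
            for worker, pairs in pair_counts.items()}


votes = [("w1", "s1", "X"), ("w2", "s1", "X"), ("spam", "s1", "G"),
         ("w1", "s2", "G"), ("w2", "s2", "G"), ("spam", "s2", "G")]
print(worker_confusion(votes)["spam"])  # {('X', 'G'): 1.0, ('G', 'G'): 1.0}
```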
Rejecting Spammers and Benefits
Random answers error rate = 50%
Average error rate for ATAMRO447HWJQ: 45.2%
P[X → X] = 9.847%   P[X → G] = 90.153%
P[G → X] = 0.053%   P[G → G] = 99.947%
Action: REJECT and BLOCK
Results:
– Over time you block all spammers
– Spammers learn to avoid your HITs
– You can decrease redundancy, as the quality of workers is higher
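A minimal sketch of the average error rate, assuming it is the unweighted mean of the per-class error rates. That assumption gives about 45.1% for ATAMRO447HWJQ rather than the slide's 45.2%, so the slide may weight classes differently (for example by how often each class occurs). The function names and the 0.4 rejection threshold are hypothetical.

```python
def average_error_rate(confusion):
    """Unweighted mean over true classes of the probability of a wrong label.

    confusion: {true_label: {given_label: probability}}, each row summing to 1.
    """
    errors = [1.0 - row.get(true, 0.0) for true, row in confusion.items()]
    return sum(errors) / len(errors)


def looks_like_spammer(confusion, threshold=0.4):
    """Flag workers whose average error approaches random guessing (50% here).
    The 0.4 threshold is a hypothetical choice, not from the slides."""
    return average_error_rate(confusion) >= threshold


ATAMRO447HWJQ = {"X": {"X": 0.09847, "G": 0.90153},
                 "G": {"X": 0.00053, "G": 0.99947}}
print(round(average_error_rate(ATAMRO447HWJQ), 5))  # 0.45103
print(looks_like_spammer(ATAMRO447HWJQ))            # True
```

Applied to the error-rate table for ATLJIK76YH1TF on the next slide, the same function returns 0.45 (the slide's 45.0%), so a plain error-rate threshold would also flag that careful but biased worker.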
After rejecting spammers, quality goes up
Spam keeps quality down
Without spam, workers are of higher quality
Need less redundancy for the same quality
Same quality of results for lower cost
With spam: 1 worker 70% correct, 11 workers 93% correct
Without spam: 1 worker 80% correct, 5 workers 94% correct
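The claim that fewer workers are needed once spam is removed can be checked with a simple binomial model. A minimal sketch, assuming a binary task and independent workers of equal accuracy p (function names hypothetical): with p = 0.8, five workers reach about 94.2%, while with p = 0.7 eleven workers reach only about 92.2%, in line with the figures on this slide.

```python
from math import comb


def majority_accuracy(p, n):
    """P(a strict majority of n independent workers, each correct with
    probability p, gives the right answer) for a binary task."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))


def workers_needed(p, target, max_workers=101):
    """Smallest odd number of workers whose majority vote reaches target.
    Returns None if max_workers is not enough (e.g. when p <= 0.5)."""
    for n in range(1, max_workers + 1, 2):
        if majority_accuracy(p, n) >= target:
            return n
    return None


print(workers_needed(0.8, 0.94))  # 5
print(workers_needed(0.7, 0.92))  # 11
```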
Correcting Biases
Classifying sites as G, PG, R, X
Sometimes workers are careful but biased
– Classifies G → P and P → R
Average error rate for ATLJIK76YH1TF: 45.0%
Error rates for worker ATLJIK76YH1TF (rows: true class; columns: assigned class; P = PG):
          G       P       R       X
  G   20.0%   80.0%    0.0%    0.0%
  P    0.0%    0.0%  100.0%    0.0%
  R    0.0%    0.0%  100.0%    0.0%
  X    0.0%    0.0%    0.0%  100.0%
Is ATLJIK76YH1TF a spammer?
Correcting Biases
For ATLJIK76YH1TF, we simply need to compute the “non-recoverable” error rate (technical details omitted)
Non-recoverable error rate for ATLJIK76YH1TF: 9%
(Error rates for ATLJIK76YH1TF as on the previous slide.)
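The slide omits how the non-recoverable error rate is computed. A minimal sketch of the underlying idea, under assumptions made here: invert the worker's error rates with Bayes' rule and uniform class priors, map each assigned label back to its most likely true class, and measure the 0/1 error that remains. Because ATLJIK76YH1TF's P labels always come from G sites, they can be corrected; R labels come from both P and R sites, and that ambiguity is what cannot be fixed. This sketch gives 25% under uniform priors and does not reproduce the slide's 9%, which depends on class priors and misclassification costs not shown here. All names are hypothetical.

```python
def posterior_given_label(confusion, priors, given):
    """P[true class | worker assigned `given`] by Bayes' rule.

    confusion: {true: {given: P[true -> given]}}; priors: {true: P[true]}.
    """
    joint = {t: priors[t] * confusion[t].get(given, 0.0) for t in priors}
    total = sum(joint.values())
    if total == 0:
        raise ValueError(f"label {given!r} is never assigned")
    return {t: v / total for t, v in joint.items()}


def residual_error(confusion, priors):
    """Expected 0/1 error left after mapping every assigned label to its
    most probable true class, i.e. the error that relabeling cannot fix."""
    assigned = {g for row in confusion.values() for g in row}
    error = 0.0
    for g in assigned:
        p_g = sum(priors[t] * confusion[t].get(g, 0.0) for t in priors)
        if p_g > 0:
            post = posterior_given_label(confusion, priors, g)
            error += p_g * (1.0 - max(post.values()))
    return error


BIASED_WORKER = {  # ATLJIK76YH1TF's error rates from the slide (P = PG)
    "G": {"G": 0.2, "P": 0.8, "R": 0.0, "X": 0.0},
    "P": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "R": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "X": {"G": 0.0, "P": 0.0, "R": 0.0, "X": 1.0},
}
UNIFORM_PRIOR = {c: 0.25 for c in "GPRX"}

print(posterior_given_label(BIASED_WORKER, UNIFORM_PRIOR, "P"))  # every "P" was really G
print(residual_error(BIASED_WORKER, UNIFORM_PRIOR))              # 0.25
```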
Too much theory?
Open source implementation available at: http://code.google.com/p/get-another-label/
Input:
– Labels from Mechanical Turk
– Cost of incorrect labelings (e.g., X → G costlier than G → X)
Output:
– Corrected labels
– Worker error rates
– Ranking of workers according to their quality
Alpha version, more improvements to come!
Suggestions and collaborations welcome!
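Misclassification costs are an input because a probability distribution over labels does not by itself say which label to report. A minimal sketch of the standard expected-cost decision rule, with hypothetical names and costs; it is not code from get-another-label. When labeling an X site as G costs 10 times more than other errors, an item that is 70% likely to be G is still reported as X.

```python
def min_cost_label(posterior, cost):
    """Pick the label with the lowest expected misclassification cost.

    posterior: {true_label: probability} for one item.
    cost: {(true_label, assigned_label): cost}; pairs not listed cost 0
    when the labels match and 1 otherwise (a hypothetical default).
    """
    def expected_cost(assigned):
        return sum(p * cost.get((true, assigned),
                                0.0 if true == assigned else 1.0)
                   for true, p in posterior.items())
    return min(posterior, key=expected_cost)


posterior = {"G": 0.7, "X": 0.3}
print(min_cost_label(posterior, {}))                  # G: plain most likely label
print(min_cost_label(posterior, {("X", "G"): 10.0}))  # X: labeling an X site as G is costly
```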
Thank you! Questions?
“A Computer Scientist in a Business School”
http://behind-the-enemy-lines.blogspot.com/
Email: panos@nyu.edu