Crowdsourcing = Crowd + Outsourcing: "soliciting solutions via open calls to large-scale communities"

Some Examples
- Calls for professional help: awards of 50,000 to 1,000,000 per task
- An office-work platform
- A microtask platform: over 30,000 tasks posted at the same time

What Tasks Are Crowdsourceable?

Software Development
- Reward: 25,000 USD

Data Entry
- Reward: 4.4 USD/hour

Image Tagging
- Reward: 0.04 USD

Trip Advice
- Reward: points on Yahoo! Answers

What is the impact of crowdsourcing on scientific research?

Amazon Mechanical Turk
- A micro-task marketplace
- Task prices are usually between 0.01 and 1 USD
- Easy-to-use interface

Amazon Mechanical Turk
- Human Intelligence Task (HIT): a task that is hard for computers
- Developer:
  - Prepays the money
  - Publishes HITs
  - Collects the results
- Worker:
  - Completes HITs
  - Gets paid
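
To make the requester side of this workflow concrete, here is a minimal sketch using the boto3 MTurk client; it is an illustration rather than code from the talk, and the sandbox endpoint, question text, reward, and other values are assumptions.

```python
# Minimal requester-side sketch: publish a HIT, then collect and approve answers.
# Assumes boto3 is installed and AWS credentials are configured; all values are illustrative.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",  # sandbox for testing
)

question_xml = """
<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>tag</QuestionIdentifier>
    <QuestionContent><Text>Type one keyword that describes the linked image.</Text></QuestionContent>
    <AnswerSpecification><FreeTextAnswer/></AnswerSpecification>
  </Question>
</QuestionForm>
"""

# Publish the HIT; the prepaid account balance is charged as assignments are approved.
hit = mturk.create_hit(
    Title="Tag an image",
    Description="Type one keyword that describes the image.",
    Keywords="image, tagging",
    Reward="0.04",                     # USD per assignment
    MaxAssignments=3,                  # ask three workers (useful for repeated labeling)
    LifetimeInSeconds=24 * 3600,
    AssignmentDurationInSeconds=300,
    Question=question_xml,
)
hit_id = hit["HIT"]["HITId"]

# Later: fetch submitted work and pay the workers.
result = mturk.list_assignments_for_hit(HITId=hit_id, AssignmentStatuses=["Submitted"])
for assignment in result["Assignments"]:
    print(assignment["WorkerId"], assignment["Answer"])   # the answer is raw XML
    mturk.approve_assignment(AssignmentId=assignment["AssignmentId"])
```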

Who are the workers?

A Survey of Mechanical Turk
- Survey of 1,000 Turkers (Mechanical Turk workers)
- Two identical surveys (Oct. and Dec. 2008) with consistent results
- Blog post: "A Computer Scientist in a Business School"

(Charts: education, age, gender, and annual income distributions of the surveyed Turkers)

Compare with Internet Demographics
- Uses data from comScore
- In summary, Turkers are:
  - Younger: 51% vs. 22% of Internet users in the corresponding age range
  - Mainly female: 70% female vs. 50% female
  - Lower income: 65% of Turkers earn less than 60k/year vs. 45% of Internet users
  - Smaller families: 55% of Turkers have no children vs. 40% of Internet users

How Much Do Turkers Earn?

Why Do Turkers Turk?

Research Applications

Dataset Collection
- Datasets are important in computer science!
- In multimedia analysis:
  - Is there an X in the image?
  - Where is Y in the image?
- In natural language processing:
  - What is the emotion of this sentence?
- And in many other applications

Dataset Collection
- Utility annotation by Sorokin and Forsyth at UIUC
- Image analysis tasks:
  - Type a keyword
  - Select examples
  - Click on landmarks
  - Outline figures

0.01 USD/task

0.02 USD/task

0.01 USD/task

Dataset Collection
- Linguistic annotations (Snow et al., 2008)
- Word similarity: USD 0.2 to label 30 word pairs

Dataset Collection
- Linguistic annotations (Snow et al., 2008)
- Affect recognition: USD 0.4 to label 20 headlines (140 labels)

Dataset Collection
- Linguistic annotations (Snow et al., 2008)
- Textual entailment
  - Given "Microsoft was established in Italy in 1985", does "Microsoft was established in 1985" follow?
- Word sense disambiguation
  - "a bass on the line" vs. "a funky bass line"
- Temporal annotation
  - Does "ran" happen before "fell" in "The horse ran past the barn fell"?

Dataset Collection
- Document relevance evaluation (Alonso et al., 2008)
- User rating collection (Kittur et al., 2008)
- Noun compound paraphrasing (Nakov, 2008)
- Name resolution (Su et al., 2007)

Data Characteristics: Cost? Efficiency? Quality?

Cost and Efficiency
- In image annotation (Sorokin and Forsyth, 2008)

Cost and Efficiency
- In linguistic annotation (Snow et al., 2008)

Cheap and fast! Is it good?

Quality
- Multiple non-experts can beat experts: "three cobblers with their wits combined surpass a Zhuge Liang" (Chinese proverb: many ordinary minds can beat a single mastermind)
- (Plot legend) Black line: agreement among Turkers; green line: single expert; gold standard: agreement among multiple experts
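
As a hedged illustration of how such agreement numbers are typically produced (not the exact procedure behind the plot), the snippet below majority-votes several Turker labels per item and measures agreement with an expert gold label; the data are made up.

```python
# Toy sketch: aggregate noisy Turker labels by majority vote and compare to expert labels.
# The labels below are made-up examples, not data from the study.
from collections import Counter

turker_labels = {                      # item id -> labels from several workers
    "item1": ["pos", "pos", "neg"],
    "item2": ["neg", "neg", "neg"],
    "item3": ["pos", "neg", "neg"],
}
expert_labels = {"item1": "pos", "item2": "neg", "item3": "pos"}

def majority_vote(labels):
    """Return the most frequent label (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

aggregated = {item: majority_vote(labels) for item, labels in turker_labels.items()}
agreement = sum(aggregated[i] == expert_labels[i] for i in expert_labels) / len(expert_labels)
print(f"Agreement with the expert: {agreement:.2f}")   # 0.67 on this toy data
```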

In addition to Dataset Collection

QoE Measurement
- QoE (Quality of Experience): a subjective measure of user perception
- Traditional approach: user studies with MOS ratings (Bad to Excellent)
- Crowdsourcing with paired comparisons:
  - Diverse user input
  - Easy to understand
  - Interval-scale scores can be calculated (see the sketch below)
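
One standard way to turn paired-comparison votes into interval-scale scores is a Bradley-Terry model; the sketch below fits it with Hunter's MM updates on made-up win counts. The slide does not say which scaling model the study used, so treat the model choice and the data as assumptions.

```python
# Sketch: interval-scale quality scores from paired-comparison counts,
# using a Bradley-Terry model fitted with Hunter's MM algorithm.
# The win-count matrix is illustrative, not data from the study.
import math

items = ["condition A", "condition B", "condition C"]   # e.g., three MP3 compression rates
# wins[i][j] = number of times item i was preferred over item j
wins = [
    [0, 2, 1],
    [8, 0, 3],
    [9, 7, 0],
]
n = len(items)
p = [1.0] * n                                            # Bradley-Terry strengths

for _ in range(200):                                     # MM iterations
    new_p = []
    for i in range(n):
        total_wins = sum(wins[i])
        denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                    for j in range(n) if j != i)
        new_p.append(total_wins / denom if denom > 0 else p[i])
    scale = sum(new_p)
    p = [x / scale for x in new_p]                       # normalize for identifiability

# Log-strengths lie on an interval scale: differences between items are meaningful.
for name, strength in zip(items, p):
    print(f"{name}: {math.log(strength):+.2f}")
```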

Acoustic QoE Evaluation

- Which one is better?
- Simple paired comparison

Optical QoE Evaluation

Interactive QoE Evaluation

Acoustic QoE
- MP3 compression rate
- VoIP loss rate

Optical QoE
- Video codec
- Packet loss rate

Iterative Tasks

Iterative Tasks
- TurKit: tools for iterative tasks on MTurk
- Imperative programming paradigm
- Basic elements:
  - Variables (a = b)
  - Control (if-else statements)
  - Loops (for, while statements)
- Turns MTurk into a programming platform that integrates human brainpower (sketched below)
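
The control flow on this slide can be sketched as an ordinary imperative loop in which some "function calls" are answered by humans. The Python sketch below mirrors the pattern rather than TurKit's actual (JavaScript) API; `post_improve_hit` and `post_vote_hit` are hypothetical helpers that would publish a HIT and block until workers respond.

```python
# Sketch of the iterative text-improvement pattern (TurKit-style control flow).
# post_improve_hit / post_vote_hit are hypothetical wrappers around HIT creation
# and result collection; they are placeholders here.

def post_improve_hit(image_url: str, current_text: str) -> str:
    """Ask one worker to improve the description (placeholder)."""
    raise NotImplementedError("wrap HIT creation and result polling here")

def post_vote_hit(image_url: str, old_text: str, new_text: str, voters: int = 5) -> bool:
    """Ask several workers which description is better; True if the new one wins the vote."""
    raise NotImplementedError("wrap HIT creation and majority voting here")

def iterative_description(image_url: str, iterations: int = 6) -> str:
    text = ""                                            # start from an empty description
    for _ in range(iterations):
        candidate = post_improve_hit(image_url, text)    # one Turker improves the text
        if post_vote_hit(image_url, text, candidate):    # other Turkers vote on the change
            text = candidate                             # keep it only if approved
    return text
```

TurKit itself adds crash-and-rerun persistence so that already-completed HITs are not re-posted when the script restarts; that detail is omitted from this sketch.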

Iterative Text Improvement
- A Wikipedia-like scenario
- One Turker improves the text
- Other Turkers vote on whether the improvement is valid

Iterative Text Improvement
- Image description
- Instructions for the improve-HIT:
  - Please improve the description for this image
  - People will vote on whether to approve your changes
  - Use no more than 500 characters
- Instructions for the vote-HIT:
  - Please select the better description for this image
  - Your vote must agree with the majority to be approved

Iterative Text Improvement
- Image description, successive versions:
  - "A partial view of a pocket calculator together with some coins and a pen."
  - "A view of personal items a calculator, and some gold and copper coins, and a round tip pen, these are all pocket and wallet sized item used for business, writing, calculating prices or solving math problems and purchasing items."
  - "A close-up photograph of the following items: * A CASIO multi-function calculator * A ball point pen, uncapped * Various coins, apparently European, both copper and gold"
  - "…Various British coins; two of £1 value, three of 20p value and one of 1p value. …"

Iterative Text Improvement
- Image description, after several iterations:
  - "A close-up photograph of the following items: A CASIO multi-function, solar powered scientific calculator. A blue ball point pen with a blue rubber grip and the tip extended. Six British coins; two of £1 value, three of 20p value and one of 1p value. Seems to be a theme illustration for a brochure or document cover treating finance - probably personal finance."

Iterative Text Improvement
- Handwriting recognition
- Version 1:
  - "You (?) (?) (?) (work). (?) (?) (?) work (not) (time). I (?) (?) a few grammatical mistakes. Overall your writing style is a bit too (phoney). You do (?) have good (points), but they got lost amidst the (writing). (signature)"

Iterative Text Improvement
- Handwriting recognition
- Version 6:
  - "You (misspelled) (several) (words). Please spell-check your work next time. I also notice a few grammatical mistakes. Overall your writing style is a bit too phoney. You do make some good (points), but they got lost amidst the (writing). (signature)"

Cost and Efficiency

More on Methodology

Repeated Labeling
- Crowdsourcing means multiple imperfect labelers:
  - Each worker is a labeler
  - Labels are not always correct
- Repeated labeling can:
  - Improve supervised induction
  - Increase single-label accuracy
  - Decrease the cost of acquiring training data

Repeated Labeling
- Repeated labeling improves overall quality when the accuracy of a single labeler is low (illustrated below).
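
This claim can be made concrete with the textbook majority-vote calculation: if each of N independent labelers is correct with probability p, the chance that the majority label is correct is a binomial tail sum. The sketch below is a generic illustration, not a reproduction of the figure behind this slide.

```python
# Integrated quality of a majority vote over n independent labelers,
# each correct with probability p (odd n, so ties cannot occur).
from math import comb

def majority_quality(p: float, n: int) -> float:
    assert n % 2 == 1, "use an odd number of labelers to avoid ties"
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

for p in (0.6, 0.7, 0.9):
    print(p, [round(majority_quality(p, n), 3) for n in (1, 3, 5, 11)])
# Quality climbs quickly with n when individual accuracy is modest (0.6, 0.7),
# while extra labels add little when a single labeler is already very accurate.
```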

Selective Repeated Labeling
- Repeatedly label the most uncertain points
- Label uncertainty (LU):
  - Is the observed label distribution stable?
  - Calculated from a Beta distribution (see the sketch below)
- Model uncertainty (MU):
  - Does the model have high confidence in its prediction for the example?
  - Calculated from model predictions
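
A hedged sketch of the label-uncertainty score: with `pos` positive and `neg` negative labels observed for an example, model the underlying positive-label rate as Beta(pos + 1, neg + 1) and score uncertainty by the posterior mass on the losing side of 0.5. This is one common formulation; the exact scoring in the cited work may differ in its details.

```python
# Label uncertainty (LU) sketch: posterior mass on the minority side of 0.5
# under a Beta(pos + 1, neg + 1) posterior over the example's positive-label rate.
# Requires scipy; one common formulation, not necessarily the exact published score.
from scipy.stats import beta

def label_uncertainty(pos: int, neg: int) -> float:
    tail = beta.cdf(0.5, pos + 1, neg + 1)   # P(true positive rate < 0.5)
    return min(tail, 1.0 - tail)             # mass on the side the majority label loses

# The more one-sided the observed labels, the lower the uncertainty:
print(label_uncertainty(3, 3))   # 0.5: maximally uncertain, a prime target for relabeling
print(label_uncertainty(5, 1))   # fairly confident the example is positive
print(label_uncertainty(9, 0))   # very confident; little value in another label
# Selective repeated labeling requests new labels for the highest-uncertainty examples.
```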

Selective Repeated Labeling
- Selective repeated labeling improves the overall quality of the crowdsourcing approach.
- (Plot legend) GRR: no selective repeated labeling; MU: model uncertainty; LU: label uncertainty; LMU: label and model uncertainty combined

Incentive vs. Performance
- Does a high financial incentive lead to high performance?
- User studies (Mason and Watts, 2009):
  - Ordering images (e.g., choose the busiest image)
  - Solving word puzzles

Incentive vs. Performance
- Higher incentives yield higher quantity, not higher quality

Incentive vs. Performance
- Workers always want more
- How much workers think they deserve is influenced by how much they are paid
- Pay little at first, and incrementally increase the payment

Conclusion
- Crowdsourcing provides a new paradigm and a new platform for computer science research.
- New applications, new methodologies, and new businesses are developing quickly with the aid of crowdsourcing.