Crowdsourcing High Quality Labels with a Tight Budget Qi Li 1, Fenglong Ma 1, Jing Gao 1, Lu Su 1, Christopher J. Quinn 2 1 SUNY Buffalo; 2 Purdue University.

Slides:



Advertisements
Similar presentations
Donald T. Simeon Caribbean Health Research Council
Advertisements

Incentivize Crowd Labeling under Budget Constraint
OECD/INFE High-level Principles for the evaluation of financial education programmes Adele Atkinson, PhD OECD With the support of the Russian/World Bank/OECD.
Correlation Search in Graph Databases Yiping Ke James Cheng Wilfred Ng Presented By Phani Yarlagadda.
Michael Gove’s mission to de-regulate your pay and conditions Spring 2013.
Demonstrating the value of Research during a restructure Natalie Westfall - NSPCC.
Presenter: Chien-Ju Ho  Introduction to Amazon Mechanical Turk  Applications  Demographics and statistics  The value of using MTurk Repeated.
Yi Wang, Wenjie Hu, Yibo Wu and Guohong Cao
Evaluation of representations in AI problem solving Eugene Fink.
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Online Algorithms Motivation and Definitions Paging Problem Competitive Analysis Online Load Balancing.
Active Learning with Support Vector Machines
Abstract Extracting a matte by previous approaches require the input image to be pre-segmented into three regions (trimap). This pre-segmentation based.
Chapter 28 Design of Experiments (DOE). Objectives Define basic design of experiments (DOE) terminology. Apply DOE principles. Plan, organize, and evaluate.
1 Efficient planning of informative paths for multiple robots Amarjeet Singh *, Andreas Krause +, Carlos Guestrin +, William J. Kaiser *, Maxim Batalin.
CrowdFlow Integrating Machine Learning with Mechanical Turk for Speed-Cost-Quality Flexibility Alex Quinn, Ben Bederson, Tom Yeh, Jimmy Lin.
On Exploiting Dynamic Execution Patterns for Workload Offloading in Mobile Cloud Applications Wei Gao, Yong Li, and Haoyang Lu The University of Tennessee,
1 TRAINING SESSION ACCEPTANCE SAMPLING CONFIDENCE INTERVALS.
Selective Sampling on Probabilistic Labels Peng Peng, Raymond Chi-Wing Wong CSE, HKUST 1.
Scheduling of Flexible Resources in Professional Service Firms Arun Singh CS 537- Dr. G.S. Young Dept. of Computer Science Cal Poly Pomona.
Slide content created by Joseph B. Mosca, Monmouth University. Copyright © Houghton Mifflin Company. All rights reserved. 16 Managing Employee Motivation.
T IFFANY & C O. Transportation Analysis Presented by: Ping-Chun Chang Hsi-Chuan Chen Satish Mallya Chris Offensend Antonio Rodriguez.
1 Efficiently Learning the Accuracy of Labeling Sources for Selective Sampling by Pinar Donmez, Jaime Carbonell, Jeff Schneider School of Computer Science,
Fenglong Ma1, Yaliang Li1, Qi Li1, Minghui Qiu2,
Load Balancing Tasks with Overlapping Requirements Milan Vojnovic Microsoft Research Joint work with Dan Alistarh, Christos Gkantsidis, Jennifer Iglesias,
Dongyeop Kang1, Youngja Park2, Suresh Chari2
Myopic Policies for Budgeted Optimization with Constrained Experiments Javad Azimi, Xiaoli Fern, Alan Fern Oregon State University AAAI, July
Optimizing Plurality for Human Intelligence Tasks Luyi Mo University of Hong Kong Joint work with Reynold Cheng, Ben Kao, Xuan Yang, Chenghui Ren, Siyu.
SLA-based Resource Allocation for Software as a Service Provider (SaaS) in Cloud Computing Environments Author Linlin Wu, Saurabh Kumar Garg and Rajkumar.
An Online Auction Framework for Dynamic Resource Provisioning in Cloud Computing Weijie Shi*, Linquan Zhang +, Chuan Wu*, Zongpeng Li +, Francis C.M. Lau*
Trust-Aware Optimal Crowdsourcing With Budget Constraint Xiangyang Liu 1, He He 2, and John S. Baras 1 1 Institute for Systems Research and Department.
A Confidence-Aware Approach for Truth Discovery on Long-Tail Data
Page 1 Ming Ji Department of Computer Science University of Illinois at Urbana-Champaign.
Warm-up: 1)Get the “Propaganda Matching” sheet. Glue it below the Warm-up essays from last class and complete the matching. 2)Have out your homework on.
The Challenges of Service Contracting Vernon J. Edwards Presented to the Cape Canaveral Chapter of the NCMA December 13, 2006.
Introduction A GENERAL MODEL OF SYSTEM OPTIMIZATION.
Welcome! Happy New Year!!! This is a time of new beginnings with so many exciting things to do and learn. So Welcome to Economics class! I am looking.
Audit Authority and Auditors Group of “European Territorial Cooperation” Programmes for Lessons to be learnt from the INTERREG IIIB NWE Auditors.
ENERGY-EFFICIENT FORWARDING STRATEGIES FOR GEOGRAPHIC ROUTING in LOSSY WIRELESS SENSOR NETWORKS Presented by Prasad D. Karnik.
Incentive-Oriented Downlink Scheduling for Wireless Networks with Real-Time and Non-Real-Time Flows I-Hong Hou, Jing Zhu, and Rath Vannithamby.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
《 Hierarchical Caching Management for Software Defined Content Network based on Node Value 》 Reporter : Jing Liu , China Affiliation : University of Science.
Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm Chen, Yi-wen( 陳憶文 ) Graduate Institute of Computer Science & Information Engineering.
Implementation and follow up Critically important but relatively neglected stages of EIA process Surveillance, monitoring, auditing, evaluation and other.
Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,
Scientific Method. Scientific Method continued Scientific Method= allows scientists to draw logical and reliable conclusions about phenomena. Observations=
Equity Theory HRMOB 570. Basic Tenets Comparisons between self and others People balance what they put into a job with what they get out of it Am I receiving.
Managing Employee Motivation and Performance
11 A Classification-based Approach to Question Routing in Community Question Answering Tom Chao Zhou 1, Michael R. Lyu 1, Irwin King 1,2 1 The Chinese.
Staff Training Day 4 th January All policies reviewed every year Each policy allocated to a governor committee and with an SLT lead to review Changes.
Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks EMNLP 2008 Rion Snow CS Stanford Brendan O’Connor Dolores.
Predicting Consensus Ranking in Crowdsourced Setting Xi Chen Mentors: Paul Bennett and Eric Horvitz Collaborator: Kevyn Collins-Thompson Machine Learning.
MEASURING AND USING NUMBERS You are measuring and using numbers when you are….
Recall: Consumer behavior Why are we interested? –New good in the market. What price should be charged? How much more for a premium brand? –Subsidy program:
The Accuracy of Raster Data Tree Height Study in Prince George, Virginia Aerial Images Inaccurate Raster Data Created Using the LAS Dataset to Raster Tool.
Djohan Wahyudi Supervised by: Prof. Dr. Pericles A. Mitkas Vivia Nikolaidou 1.
Towards Robust Revenue Management: Capacity Control Using Limited Demand Information Michael Ball, Huina Gao, Yingjie Lan & Itir Karaesmen Robert H Smith.
Distributed Learning for Multi-Channel Selection in Wireless Network Monitoring — Yuan Xue, Pan Zhou, Tao Jiang, Shiwen Mao and Xiaolei Huang.
Essentials of Planning © 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website,
Jian Li Institute for Interdisciplinary Information Sciences Tsinghua University Multi-armed Bandit Problems WAIM 2014.
Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation Qi Li 1, Yaliang Li 1, Jing Gao 1, Bo Zhao 2, Wei Fan 3,
Sparse Coding: A Deep Learning using Unlabeled Data for High - Level Representation Dr.G.M.Nasira R. Vidya R. P. Jaia Priyankka.
Urban Sensing Based on Human Mobility
Xi Chen Mentor: Denny Zhou In collaboration with: Qihang Lin
Qi Han, Matthew Ba Nguyen Sandy Irani, Nalini Venkatasubramanian
Optimal Electricity Supply Bidding by Markov Decision Process
Web Science Course Crowdsourcing Report based on paper:
Actively Learning Ontology Matching via User Interaction
MAS 622J Course Project Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN Hyungil Ahn
Adaptive Dynamic Bipartite Graph Matching: A Reinforcement Learning Approach Yansheng Wang 1, Yongxin Tong 1, Cheng Long 2, Pan Xu 3, Ke Xu 1, Weifeng.
Presentation transcript:

Crowdsourcing High Quality Labels with a Tight Budget Qi Li 1, Fenglong Ma 1, Jing Gao 1, Lu Su 1, Christopher J. Quinn 2 1 SUNY Buffalo; 2 Purdue University 1

What is Crowdsourcing? Terminology Requester Worker HITs Instance Basic procedure Requester posts HITs Worker chooses HITs to work on Requester gets labels and pay 2 Same …… requester Are the two images of the same person? Same … different

Budget Allocation Since crowdsourcing costs money, we need to use the budget wisely. 3

Budget Allocation Since crowdsourcing costs money, we need to use the budget wisely. Budget allocation: Which instance should we query for labels and how many? Which worker should we choose? Impossible on most current crowdsourcing platforms. 4

Challenges Under a Tight Budget Quantity and Quality Trade-off Different Requirements of Quality 5 Q1 Q2 Q3 or Q1 Q2 Q3 I want my results are not randomly guessed. I will approve a result if more than 75% of the workers agree on that label. Existing work would behave.

Inputs and Goal Inputs Requester's requirement The budget T: the maximum amount of labels can be afforded Goal Label as many instances as possible which achieve the requirement under the budget 6

Problem Settings 7

Notations Definition Maximum number of labels given the budget 8

Examples of Requirement 9

Completeness Ratio between the observed total vote counts and the minimum count of labels it needs to achieve the requirement. 10

Completeness 11 Observed total vote counts Minimum count to achieve the requirement

Completeness 12

Maximize Completeness The goal is to label instances as many as possible that achieve the requirement of quality. 13

Maximize Completeness The goal is to label instances as many as possible that achieve the requirement of quality. Maximize the overall completeness Formally: 14

Maximize Completeness 15

Expected Completeness 16

Expected Completeness 17

Markov Decision Process 18

Requallo Framework 19 Q1 Requirement: Minimum Ratio of 3 Q2 Q3

Requallo Framework 20 Q1 Requirement: Minimum Ratio of 3 Completeness 100% Q2 Completeness Q3 Completeness 72% 50%

Requallo Framework 21 Q1 Requirement: Minimum Ratio of 3 Completeness 100% Q2 Completeness Q3 Completeness Reward 72% 50%

Requallo Framework 22 Q1 Requirement: Minimum Ratio of 3 Completeness Selected Unselected 100% Q2 Completeness Q3 Completeness Reward 72% 50%

Extension: Workers’ Reliability 23

Experiments on Real-World Crowdsourcing Tasks Dataset RTE dataset: conducted on mTurk for recognizing textual entailment Game Dataset: conducted using an Android app based on a TV game show “Who Wants to Be a Millionaire“ Performance Measures Quantity Quality 24

Experiments on Real-World Crowdsourcing Tasks RTE DatasetGame Dataset 25 Quantity

Experiments on Real-World Crowdsourcing Tasks RTE DatasetGame Dataset 26 Absolute count

Experiments on real-world crowdsourcing tasks RTE DatasetGame Dataset 27 Accuracy rate

Comparison of Different Requallo Policies (on Game dataset) MethodCost#Instances#CorrectAccuracy Requallo-p Requallo-p Requallo-p Requallo-c Requallo-c Requallo-m This result confirms our intuition. If a requester wants high quality results, he can set a strict requirement, but should expect a lower quantity of labeled instances or a higher cost.

Conclusions In this paper, we study how to allocate a tight budget for crowdsourcing tasks The requesters can specify their needs on label quality The goal is to maximize quantity under the budget while guarantee the quality The proposed Requallo framework uses greedy strategy to sequentially label instances. Extension to incorporate workers’ reliabilities. 29

30