Outline: Introduction – Client Motivations – Task Categories – Crowd Motivation – Pros & Cons – Quality Management – Scale Up with Machine Learning – Workflows for Complex Tasks – Market Evolution – Reputation Systems
ECCO, March 20, 2011

Introduction
June 2006: Jeff Howe coined the term in his Wired magazine article "The Rise of Crowdsourcing".
Elements:
- At least 2 actors: a client/requester and a crowd or community (an online audience)
- A challenge: what has to be done (a need, task, etc.)
- A reward: money, a prize, or other motivators

Example: "Adult Websites" Classification
Large number of sites to label. Get people to look at sites and classify them as:
- G (general audience)
- PG (parental guidance)
- R (restricted)
- X (porn)
[Panos Ipeirotis, WWW2011 tutorial]

Example: "Adult Websites" Classification
Large number of hand-labeled sites. Get people to look at sites and classify them as: G (general audience), PG (parental guidance), R (restricted), X (porn).
Cost/speed statistics:
- Undergrad intern: 200 websites/hr, cost: $15/hr
- MTurk: 2,500 websites/hr, cost: $12/hr
[Panos Ipeirotis, WWW2011 tutorial]

Client Motivation
Need suppliers for:
- Mass work, distributed work, or just tedious work
- Creative work
- Specific talent
- Testing
- Support
- Offloading peak demands
- Tackling problems that need specific communities or human variety
- Any work that can be done cheaper this way

Client Motivation
- Need customers!
- Need funding
- Need backing
- Crowdsourcing is your business!

Crowd Motivation
- Money €€€
- Self-serving purposes (learning new skills, getting recognition, avoiding boredom, enjoyment, building a network with other professionals)
- Socializing, a feeling of belonging to a community, friendship
- Altruism (public good, helping others)

Crowd Demography (background defines motivation)
- The 2008 survey at iStockphoto indicates that the crowd is quite homogeneous and elite.
- Amazon Mechanical Turk workers come mainly from 2 countries: a) USA, b) India.

Crowd Demography

Client Task Parameters
3 main goals for a task to be done:
1. Minimize cost (cheap)
2. Minimize completion time (fast)
3. Maximize quality (good)
The client has other goals when the crowd is not just a supplier.

Pros
- Quicker: parallelism reduces time
- Cheap, even free
- Creativity, innovation
- Quality (depends)
- Availability of scarce resources: taps into the "long tail"
- Multiple feedback
- Allows building a community (followers)
- Business agility
- Scales up!

Cons
- Lack of professionalism: unverified quality; too many answers; no standards; no organisation of answers
- Not always cheap: added costs to bring a project to conclusion
- Too few participants if the task or pay is not attractive
- If workers are not motivated, lower quality of work

Cons
- Global language barriers
- Different laws in each country add complexity
- No written contracts, so no possibility of non-disclosure agreements
- Hard to maintain a long-term working relationship with workers
- Difficulty managing a large-scale crowdsourced project
- Can be targeted by malicious work efforts
- Lack of guaranteed investment, thus hard to convince stakeholders

Quality Management
Example: "Adult Website" Classification. Bad news: spammers!
Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience).
[Panos Ipeirotis, WWW2011 tutorial]

Quality Management: Majority Voting and Label Quality
- Ask multiple labelers and keep the majority label as the "true" label
- Quality is the probability of the resulting label being correct
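To make "quality is the probability of being correct" concrete, here is a minimal sketch (my own illustration, not from the slides) of the label quality obtained by majority voting over n independent workers who are each correct with probability p on a binary task:

```python
from math import comb

def majority_vote_quality(p: float, n: int) -> float:
    """Probability that the majority label of n independent workers,
    each correct with probability p, equals the true label (binary task, odd n)."""
    need = n // 2 + 1  # number of correct votes required for a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

print(majority_vote_quality(0.7, 5))  # ~0.84: five 70%-accurate workers beat one
```

The same calculation shows why redundancy adds little once individual accuracy is already high, which is the rule of thumb revisited later in the deck.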

Dealing with Quality
- Majority vote works best when workers have similar quality
- Otherwise it is better to just pick the vote of the best worker
- Or model worker qualities and combine their votes. Vote-combination studies [Clemen and Winkler, 1999; Ariely et al., 2000] show that complex models work slightly better than a simple average, but are less robust.
- Spammers try to go undetected
- Well-meaning workers may have a bias, which is difficult to tell apart from spam
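As a sketch of "model worker qualities and combine" (assuming the per-worker accuracies are already known; the slides do not prescribe a specific combination rule), one simple option is to weight each vote by the log-odds of its worker being correct:

```python
from collections import defaultdict
from math import log

def weighted_vote(votes, accuracy):
    """Combine categorical votes, weighting each worker by the log-odds of their
    (assumed known) accuracy. `votes` and `accuracy` both map worker -> value."""
    scores = defaultdict(float)
    for worker, label in votes.items():
        p = min(max(accuracy[worker], 1e-6), 1 - 1e-6)  # keep the log-odds finite
        scores[label] += log(p / (1 - p))
    return max(scores, key=scores.get)

votes = {"w1": "G", "w2": "X", "w3": "G"}
accuracy = {"w1": 0.60, "w2": 0.95, "w3": 0.55}
print(weighted_vote(votes, accuracy))  # "X": one highly accurate worker outweighs two weak ones
```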

Human Computation Biases
- Anchoring effect: "Humans start with a first approximation (anchor) and then make adjustments to that number based on additional information." [Tversky & Kahneman, 1974]
- Priming: exposure to one stimulus (e.g., stereotypes) influences responses to another [Shih et al., 1999]
- Exposure effect: familiarity leads to liking... [Stone and Alonso, 2010]
- Framing effect: presenting the same option in different formats leads to different answers [Tversky and Kahneman, 1981]
→ Need to remove sequential effects from human computation data.

Dealing with Quality
Use this process to improve quality:
1. Initialize by aggregating labels (using majority vote)
2. Estimate error rates for workers (using the aggregated labels)
3. Change the aggregate labels (using the error rates, weight worker votes according to quality). Note: keep labels for the "example data" unchanged.
4. Iterate from step 2 until convergence
Or use an exploration-exploitation scheme:
- Explore: learn about the quality of the workers
- Exploit: label new examples using that quality
→ In both cases, a significant advantage under bad conditions such as imbalanced datasets and bad workers.
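Below is a minimal sketch of the iterative loop just described, using a single accuracy number per worker and plain accuracy-weighted voting in step 3; the published approach (Ipeirotis et al., 2010) estimates full confusion matrices in an EM-style procedure and keeps the gold "example data" fixed, so treat this as a simplification:

```python
from collections import defaultdict, Counter

def iterative_label_quality(labels, n_iter=20):
    """labels: list of (worker, item, label) triples.
    Step 1: initialize aggregate labels by majority vote.
    Steps 2-4: alternate between estimating each worker's accuracy against the
    aggregates and re-aggregating with accuracy-weighted votes, until stable."""
    votes_by_item = defaultdict(list)
    for worker, item, label in labels:
        votes_by_item[item].append((worker, label))

    # Step 1: plain majority vote
    current = {item: Counter(lab for _, lab in votes).most_common(1)[0][0]
               for item, votes in votes_by_item.items()}

    accuracy = {}
    for _ in range(n_iter):
        # Step 2: estimate worker accuracies against the current aggregate labels
        correct, total = defaultdict(int), defaultdict(int)
        for worker, item, label in labels:
            total[worker] += 1
            correct[worker] += int(label == current[item])
        accuracy = {w: correct[w] / total[w] for w in total}

        # Step 3: re-aggregate, weighting each vote by its worker's accuracy
        updated = {}
        for item, votes in votes_by_item.items():
            scores = defaultdict(float)
            for worker, label in votes:
                scores[label] += accuracy[worker]
            updated[item] = max(scores, key=scores.get)

        # Step 4: iterate until convergence
        if updated == current:
            break
        current = updated
    return current, accuracy
```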

Effect of Payment: Quality
Cost does not affect quality [Mason and Watts, 2009; AdSafe]. Similar results for bigger tasks [Ariely et al., 2009].
[Panos Ipeirotis, WWW2011 tutorial]

Effect of Payment: Number of Tasks
Payment incentives do increase speed, though.
[Panos Ipeirotis, WWW2011 tutorial]

Optimizing Quality
Quality tends to remain the same, independent of completion time. [Huang et al., HCOMP 2010]

Scale Up with Machine Learning
Build an "Adult Website" classifier:
- Crowdsourcing is cheap but not free; it cannot scale to the web without help
→ Build automatic classification models using examples from crowdsourced data

Integration with Machine Learning
- Humans label training data
- Use the training data to build a model

Dealing with Quality in Machine Learning
- Noisy labels lead to degraded task performance
- As labeling quality increases, classification quality increases
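The effect is easy to reproduce; a minimal sketch (my own illustration, assuming NumPy and scikit-learn are available) that flips a growing fraction of training labels and measures held-out accuracy:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.1, 0.2, 0.3, 0.4):
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise      # simulate imperfect labelers
    y_noisy[flip] = 1 - y_noisy[flip]            # flip that fraction of labels
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy).score(X_te, y_te)
    print(f"label noise {noise:.0%}: test accuracy {acc:.3f}")
```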

Tradeoffs for Machine Learning Models
- Get more data → improve model accuracy
- Improve data quality → improve classification

Tradeoffs for Machine Learning Models
- Get more data: active learning, select which unlabeled example to label next [Settles]
- Improve data quality: repeated labeling, label again an already-labeled example [Sheng et al., 2008; Ipeirotis et al., 2010]

Model Uncertainty (MU)
Model uncertainty: get more labels for the instances that cause model uncertainty.
- For modeling: why improve training data quality where the model is already certain? (a "self-healing" process: [Brodley et al., JAIR 1999], [Ipeirotis et al., NYU 2010])
- For data quality: low-certainty "regions" may be due to incorrect labeling of the corresponding instances
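A minimal sketch of uncertainty-based selection (the helper name and the sklearn-style `predict_proba` interface are assumptions, not from the slides): rank pooled instances by the entropy of the model's predicted class distribution and send the most uncertain ones back for additional labels.

```python
import numpy as np

def most_uncertain(model, X_pool, k=10):
    """Return the indices of the k instances the model is least sure about
    (highest predictive entropy) -- candidates for extra labels."""
    proba = model.predict_proba(X_pool)                  # any fitted probabilistic classifier
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-k:]
```

As the slide cautions, low-certainty regions may reflect mislabeled training instances rather than genuinely hard examples, so the same ranking serves both active learning and data-quality auditing.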

Quality Rule of Thumb
- With high-quality labelers (80% and above) → one worker per case (more data is better)
- With low-quality labelers (~60%) → multiple workers per case (to improve quality)
[Sheng et al., KDD 2008; Kumar and Lease, CSDM 2011]

Complex Tasks: Handle Answers Through a Workflow
Q: "My task does not have discrete answers..."
A: Break it into two Human Intelligence Tasks (HITs):
- a "Create" HIT
- a "Vote" HIT
The vote controls the quality of the creation HIT; redundancy controls the quality of the voting HIT.
Catch: if the "creation" is very good, voting workers just vote "yes". Solution: add some random noise (e.g., add typos).
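A sketch of that two-HIT loop; `post_hit` is a hypothetical, caller-supplied function (not a real MTurk or TurKit API) that publishes one HIT and returns a single worker response:

```python
def iterative_create_and_vote(post_hit, task, rounds=3, voters=5):
    """Sketch of the create/vote workflow: each round, one worker improves the
    current best answer, then a redundant set of workers votes on keeping it."""
    best = post_hit("create", {"task": task, "current": None})
    for _ in range(rounds):
        candidate = post_hit("create", {"task": task, "current": best})
        votes = [post_hit("vote", {"old": best, "new": candidate})
                 for _ in range(voters)]          # redundancy controls the quality of the vote HIT
        if votes.count("new") > voters // 2:      # majority accepts the improved version
            best = candidate
    return best
```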

Photo Description
But the free-form answer can be more complex, not just right or wrong... TurKit toolkit [Little et al., UIST 2010]:

Description Versions
1. A partial view of a pocket calculator together with some coins and a pen.
A close-up photograph of the following items: a CASIO multi-function calculator; a ball point pen, uncapped; various coins, apparently European, both copper and gold. Seems to be a theme illustration for a brochure or document cover treating finance, probably personal finance.
4. …
8. A close-up photograph of the following items: a CASIO multi-function, solar-powered scientific calculator; a blue ball point pen with a blue rubber grip and the tip extended; six British coins, two of £1 value, three of 20p value and one of 1p value. Seems to be a theme illustration for a brochure or document cover treating finance, probably personal finance.

Collective Problem Solving
Exploration/exploitation tradeoff (independence or not):
- Sharing good solutions can accelerate learning
- But it can lead to premature convergence on a suboptimal solution
[Mason and Watts, submitted to Science, 2011]

Independence or Not?
Building iteratively (lack of independence) allows better outcomes for the image description task... In the Foldit game, workers built on each other's results.
But lack of independence may cause high dependence on starting conditions and create groupthink. [Little et al., HCOMP 2010]

Exploration/Exploitation?

Exploration/Exploitation?

Group Effect
Individual search strategies affect group success: players copying each other explore less → lower probability of finding the peak in a given round.

Workflow Patterns
- Generate / Create, Find, Improve / Edit / Fix → Creation
- Vote to accept/reject, Vote up / vote down to generate a ranking, Vote for best / select top-k → Quality Control
- Split task, Aggregate → Flow Control
- Iterate → Flow Control

AdSafe Crowdsourcing Experience

Detect pages that discuss swine flu:
- A pharmaceutical firm had a drug "treating" (off-label) swine flu
- The FDA prohibited pharmaceutical companies from displaying drug ads on pages about swine flu → two days to comply!
A big fast-food chain does not want its ad to appear:
- On pages that discuss the brand (99% negative sentiment)
- On pages discussing obesity

AdSafe Crowdsourcing Experience
Workflow to classify URLs:
1. Find URLs for a given topic (hate speech, gambling, alcohol abuse, guns, bombs, celebrity gossip, etc.) - collector.appspot.com/allTopics.jsp
2. Classify URLs into appropriate categories - annotator.appspot.com/AdminFiles/Categories.jsp
3. Measure the quality of the labelers and remove spammers
4. Get humans to "beat" the classifier by providing cases where the classifier fails - beatthemachine.appspot.com/

Market Design of Crowdsourcing
Aggregators:
- Create a crowd or community
- Create a portal to connect a client to the crowd
- Deal with the workflow of complex tasks, such as decomposition into simpler tasks and recomposition of the answers
- Allow anonymity
→ Consumers can benefit from a crowd without the need to create it.

Market Design: Crude vs. Intelligent Crowdsourcing
Intelligent crowdsourcing uses an organized workflow to tackle the cons of crude crowdsourcing:
- The complex task is divided by experts
- It is given to relevant crowds, not to everyone
- Individual answers are recomposed by experts into a general answer
- Usually covert

Lack of Reputation and the Market for Lemons
"When the quality of a sold good is uncertain and hidden before the transaction, the price goes to the value of the lowest-valued good." [Akerlof, 1970; Nobel prize winner]
Market evolution steps:
1. Employer pays $10 to a good worker, $0.1 to a bad worker
2. 50% good workers, 50% bad; indistinguishable from each other
3. Employer offers a price in the middle: $5
4. Some good workers leave the market (pay too low)
5. Employer revises prices downwards as the % of bad workers increases
6. More good workers leave the market... death spiral
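A minimal simulation of the death spiral (my own illustration, using a continuum of worker values rather than the two types on the slide): each round the employer offers the average value of the remaining pool, and every worker worth more than the offer leaves.

```python
import numpy as np

rng = np.random.default_rng(0)
remaining = rng.uniform(0, 10, size=10_000)    # each worker's true value to the employer

prices = []
for _ in range(10):
    if remaining.size == 0:
        break
    offer = remaining.mean()                   # employer offers the pool's average value
    prices.append(offer)
    remaining = remaining[remaining <= offer]  # workers worth more than the offer leave

print([f"${p:.2f}" for p in prices])           # the offer keeps falling toward the bottom
```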

Reputation Systems
A great number of reputation mechanisms exist. Challenges in the design of reputation systems:
- Insufficient participation
- Overwhelmingly positive feedback
- Dishonest reports
- Identity changes
- Value imbalance exploitation ("milking the reputation")

Reputation Systems
[Panos Ipeirotis, WWW2011 tutorial]

Reputation Systems: Dishonest Reports
1. eBay: "Riddle for a PENNY! No shipping - Positive Feedback". The seller sets up an agreement in order to be given unfairly high ratings by the buyers.
2. "Bad-mouthing": the same scheme, but used to bad-mouth other sellers that they want to drive out of the market.
Design incentive-compatible mechanisms to elicit honest feedback [Jurca and Faltings, 2003: pay the rater if the report matches the next one; Miller et al., 2005: use a proper scoring rule to value the report; Papaioannou and Stamoulis, 2005: delay the next transaction over time].
[Panos Ipeirotis, WWW2011 tutorial]

Reputation Systems: Identity Changes
"Cheap pseudonyms": it is easy to disappear and re-register under a new identity at almost zero cost [Friedman and Resnick, 2001]. This introduces opportunities to misbehave without paying reputational consequences.
→ Increase the difficulty of online identity changes
→ Impose upfront costs on new entrants: allow new identities (forget the past) but make it costly to create them

Challenges for Crowdsourcing Markets
Two-sided opportunistic behavior:
1. In e-commerce markets, only sellers are likely to behave opportunistically.
2. In crowdsourcing markets, both sides can be fraudulent.
Imperfect monitoring and heavy-tailed participation:
- Verifying the answers is sometimes as costly as providing them.
- Sampling often does not work, due to the heavy-tailed participation distribution (lognormal, according to self-reported surveys).
[Panos Ipeirotis, WWW2011 tutorial]

Challenges for the Crowdsourcing Market
Constrained capacity of workers:
- Workers have constrained capacity (they cannot do more than xx hours per day) → machine learning techniques
No "price premium" for high-quality workers:
- It is the requester who sets the prices, which are generally the same for all workers, regardless of their reputation or quality.

The Market is Organizing the Crowd
Reputation mechanisms:
- Crowd: ensure worker quality
- Employer: ensure employer trustworthiness
Task organization for task discovery (the worker finds the employer/task)
Worker expertise recording for task assignment (the employer/task finds the worker)

Crowdsourcing Market: Possible Evolutions
- Optimize the allocation of tasks to workers based on completion time and expected quality
- A recommender system for crowds ("workers like you performed well in...")
- Create a market with dynamic pricing for tasks, following the pricing model of the stock market (prices for a task increase when work supply is low, and vice versa)
[P. Ipeirotis, 2011]

References
- Wikipedia, 2011.
- Dion Hinchcliffe. "Crowdsourcing: 5 Reasons It's Not Just For Start Ups Anymore", 2009.
- Tomoko A. Hosaka, MSNBC. "Facebook asks users to translate for free", 2008.
- Daren C. Brabham. "Moving the Crowd at iStockphoto: The Composition of the Crowd and Motivations for Participation in a Crowdsourcing Application", First Monday, 13(6), 2008.
- Karim R. Lakhani, Lars Bo Jeppesen, Peter A. Lohse & Jill A. Panetta. "The Value of Openness in Scientific Problem Solving", Harvard Business School Working Paper, 2007.
- Klaus-Peter Speidel. "How to Do Intelligent Crowdsourcing", 2011.
- Panos Ipeirotis. "Managing Crowdsourced Human Computation", WWW2011 tutorial, 2011.
- Omar Alonso & Matthew Lease. "Crowdsourcing 101: Putting the WSDM of Crowds to Work for You", WSDM, Hong Kong.
- Sanjoy Dasgupta,