Get Another Label? Using Multiple, Noisy Labelers. Panos Ipeirotis, Stern School of Business, New York University. Joint work with Victor Sheng and Foster Provost.

2 Motivation Many tasks rely on high-quality labels for objects: – relevance judgments – duplicate database records – image recognition – song categorization – videos. Labeling can be relatively inexpensive, using Mechanical Turk, the ESP game, …

ESP Game (by Luis von Ahn) 3

Mechanical Turk Example “Are these two documents about the same topic?” 4

Mechanical Turk Example 5

6 Motivation Labels can be used to train predictive models – Duplicate detection systems – Image recognition – Web search. But: labels obtained from the above sources are noisy, and this directly affects the quality of the learned models. – How can we know the quality of the annotators? – How can we know the correct answer? – How can we make the best use of noisy annotators?

7 Quality and Classification Performance: as labeling quality increases, classification quality increases (learning curves shown for labeling quality Q = 0.5, 0.6, 0.8, 1.0).

8 How to Improve Labeling Quality Find better labelers – Often expensive, or beyond our control Use multiple, noisy labelers: repeated-labeling – Our focus

9 Our Focus: Labeling Using Multiple Noisy Labelers – Multiple labelers and resulting label quality – Multiple labelers and classification quality – Selective label acquisition

10 Majority Voting and Label Quality: ask multiple labelers and keep the majority label as the “true” label. Quality is the probability that the majority label is correct; P is the probability that an individual labeler is correct (curves shown for P = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0).
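A minimal sketch of the quantity plotted on this slide: the probability that the majority vote of n independent labelers, each correct with probability p, is itself correct (n odd, so there are no ties). The function name is just illustrative.

    from math import comb

    def majority_quality(p, n):
        # Probability that more than half of n labelers are correct,
        # assuming each labeler is independently correct with probability p.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(n // 2 + 1, n + 1))

    for p in (0.4, 0.6, 0.8, 1.0):
        print(p, [round(majority_quality(p, n), 3) for n in (1, 3, 5, 11)])
    # For p > 0.5 quality rises as more labelers are added;
    # for p < 0.5 the majority vote actually gets worse.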

11 Multiple noisy labelers improve quality. So… the quality from multiple noisy labelers is (sometimes) better than the quality of the best labeler in the set. So, should we always get multiple labels?

12 Tradeoffs for Classification: get more labels → improve label quality → improve classification; get more examples → improve classification. (Learning curves shown for labeling quality Q = 0.5, 0.6, 0.8, 1.0.)

13 Basic Labeling Strategies Get as many data points as possible, one label each Repeatedly-label everything, same number of times

14 Repeat-Labeling vs. Single Labeling (labeling quality P = 0.6, K = 5 labels per example): with high noise, repeated labeling is better than single labeling.

15 Repeat-Labeling vs. Single Labeling (labeling quality P = 0.8, K = 5 labels per example): with low noise, more (single-labeled) examples is better.
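A rough sketch of the tradeoff behind these two slides, under assumed settings (the budget of B = 1000 labels is illustrative; K = 5 is from the slides): single labeling buys more examples at quality p, while repeat-labeling buys fewer examples at the majority-vote quality.

    from math import comb

    def majority_quality(p, n):
        # Probability that the majority of n labelers (each correct with
        # probability p) agrees with the true label.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(n // 2 + 1, n + 1))

    B, K = 1000, 5
    for p in (0.6, 0.8):
        single = (B, p)                        # (number of examples, label quality)
        repeated = (B // K, round(majority_quality(p, K), 3))
        print(f"p={p}: single -> {single}, repeated (K={K}) -> {repeated}")
    # At p = 0.6 majority voting lifts quality to ~0.68; at p = 0.8 to ~0.94.
    # Whether that gain beats having 5x more training examples depends on
    # the classifier's learning curve, which is what the plots compare.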

16 Estimating Labeler Quality (Dawid & Skene, 1979): “multiple diagnoses” – Start by assuming equal qualities – Estimate the “true” labels for the examples – Estimate the quality of each labeler given the “true” labels – Repeat until convergence
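A minimal sketch of this estimate-and-repeat loop, simplified to binary labels and a single accuracy parameter per labeler (Dawid & Skene use full confusion matrices); the data layout and the initialization below are illustrative assumptions.

    import numpy as np

    def estimate_labeler_quality(votes, n_iter=50):
        # votes[i][j] = worker j's label (0 or 1) for example i.
        votes = np.asarray(votes, dtype=float)
        # Initialize the probability that each example's true label is 1
        # with the fraction of positive votes it received.
        p_true = votes.mean(axis=1)
        for _ in range(n_iter):
            # Estimate each worker's accuracy: expected agreement with the
            # current soft "true" labels.
            agree = p_true[:, None] * votes + (1 - p_true[:, None]) * (1 - votes)
            acc = np.clip(agree.mean(axis=0), 1e-3, 1 - 1e-3)
            prior = p_true.mean()
            # Re-estimate the "true" labels given the worker accuracies.
            like1 = np.prod(np.where(votes == 1, acc, 1 - acc), axis=1)
            like0 = np.prod(np.where(votes == 0, acc, 1 - acc), axis=1)
            p_true = prior * like1 / (prior * like1 + (1 - prior) * like0)
        return p_true, acc

    # Toy example: workers 1 and 2 usually agree, worker 3 is noisier.
    posteriors, accuracies = estimate_labeler_quality(
        [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]])
    print(posteriors, accuracies)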

17 Selective Repeated-Labeling. We have seen: with noise and enough (noisy) examples, getting multiple labels is better than single-labeling. Can we do better? Select which data points receive the extra labels based on an uncertainty score, e.g. {+,-,+,+,-,+,+} vs. {+,+,+,+}

18 Natural Candidate: Entropy Entropy is a natural measure of label uncertainty: E({+,+,+,+,+,+})=0 E({+,-, +,-, +,- })=1 Strategy: Get more labels for high-entropy examples
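A small sketch of the entropy measure used on this slide, computed in bits over the empirical distribution of the observed labels:

    from collections import Counter
    from math import log2

    def label_entropy(labels):
        # Entropy (in bits) of the empirical +/- distribution.
        counts = Counter(labels)
        n = len(labels)
        return 0.0 - sum((c / n) * log2(c / n) for c in counts.values())

    print(label_entropy("++++++"))      # 0.0: all labelers agree
    print(label_entropy("+-+-+-"))      # 1.0: maximum disagreement
    print(label_entropy("+++--"), label_entropy("+" * 600 + "-" * 400))
    # The last two values are equal (~0.97): entropy only looks at the
    # ratio of labels, not at how many labels have been collected.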

19 What Not to Do: Use Entropy. Entropy-based selection improves quality at first, but hurts in the long run (plot compares entropy-based selection against round-robin allocation).

20 Why Not Entropy: in the presence of noise, entropy stays high even with many labels. Entropy is scale invariant – (3+, 2-) has the same entropy as (600+, 400-).

21 Estimating Label Uncertainty (LU): observe the +’s and –’s and compute Pr{+|obs} and Pr{–|obs}. Label uncertainty S_LU = tail of the Beta distribution (the shaded tail area of the Beta probability density function in the figure).
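A hedged sketch of one way to read this score, assuming SciPy and a uniform Beta(1, 1) prior (the paper's exact prior and scoring details may differ): score an example by the posterior Beta mass on the minority side of 0.5.

    from scipy.stats import beta

    def label_uncertainty(pos, neg, prior_a=1.0, prior_b=1.0):
        # Posterior over the probability that the true label is +,
        # given pos positive and neg negative votes.
        cdf_at_half = beta.cdf(0.5, pos + prior_a, neg + prior_b)
        # Mass on whichever side of 0.5 disagrees with the majority label.
        return min(cdf_at_half, 1.0 - cdf_at_half)

    for pos, neg in [(3, 2), (7, 3), (14, 6), (600, 400)]:
        print(pos, neg, round(label_uncertainty(pos, neg), 4))
    # Unlike entropy, this score shrinks as more labels accumulate at roughly
    # the same ratio, and is essentially zero for (600+, 400-).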

Label Uncertainty example: p = 0.7, 5 labelers (3+, 2-), entropy ≈ 0.97

Label Uncertainty example: p = 0.7, 10 labelers (7+, 3-), entropy ≈ 0.88

Label Uncertainty example: p = 0.7, 20 labelers (14+, 6-), entropy ≈ 0.88

25 Comparison: label uncertainty (LU) selection vs. uniform, round-robin allocation.

26 Model Uncertainty (MU). However, we do not have only labelers – a classifier trained on the data can also give us labels! Model uncertainty: get more labels for examples that are ambiguous/difficult for the model. Intuitively: make sure that the difficult cases are labeled correctly.

27 Label + Model Uncertainty Label and model uncertainty (LMU): avoid examples where either strategy is certain
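A small sketch of one way to realize this idea (the combination rule and the model-uncertainty definition below are illustrative assumptions, not taken from the slides): combine the two scores so that an example is skipped only when both the collected labels and the model are confident.

    from math import sqrt

    def lmu_score(label_unc, model_unc):
        # Geometric mean: high only when neither score is near zero,
        # i.e. neither the collected labels nor the model is confident yet.
        return sqrt(label_unc * model_unc)

    def model_uncertainty(p_pos):
        # One simple choice: how close the classifier's predicted
        # positive-class probability is to 0.5.
        return 1.0 - 2.0 * abs(p_pos - 0.5)

    print(round(lmu_score(0.30, model_uncertainty(0.55)), 3))  # uncertain on both counts
    print(round(lmu_score(0.02, model_uncertainty(0.55)), 3))  # labels already settled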

28 Comparison: label uncertainty, model uncertainty, label + model uncertainty, and uniform round-robin. Model uncertainty alone also improves quality.

29 Classification Improvement

30 Conclusions Gathering multiple labels from noisy users is a useful strategy Under high noise, almost always better than single-labeling Selectively labeling using label and model uncertainty is more effective

31 More Work to Do Estimating the labeling quality of each labeler Increased compensation vs. labeler quality Example-conditional quality issues (some examples more difficult than others) Multiple “real” labels Hybrid labeling strategies using “learning-curve gradient”

Other Projects. SQoUT project: Structured Querying over Unstructured Text, Faceted Interfaces. EconoMining project: The Economic Value of User-Generated Content.

33 SQoUT: Structured Querying over Unstructured Text. Information extraction applications extract structured relations from unstructured text. Example: “May , Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…” An information extraction system (e.g., NYU’s Proteus) turns such text into a table of Disease Outbreaks in The New York Times:
Date      | Disease Name    | Location
Jan. 1995 | Malaria         | Ethiopia
July 1995 | Mad Cow Disease | U.K.
Feb. 1995 | Pneumonia       | U.S.
May 1995  | Ebola           | Zaire

34 SQoUT: The Questions. Pipeline (text databases → extraction system(s) → output tuples): 1. Retrieve documents from the database/web/archive; 2. Process the documents; 3. Extract the output tuples. Questions: 1. How do we retrieve the documents? 2. How do we configure the extraction systems? 3. What is the execution time? 4. What is the output quality? (SIGMOD’06, TODS’07, + work in progress)

EconoMining Project: Show Me the Money! Basic idea: opinion mining is an important application of information extraction, and the opinions of users are reflected in some economic variable (price, sales). Applications (in increasing order of difficulty): buyer feedback and seller pricing power in online marketplaces (ACL 2007); product reviews and product sales (KDD 2007); importance of reviewers based on economic impact (ICEC 2007); hotel ranking based on “bang for the buck” (WebDB 2008); political news (MSM, blogs), prediction markets, and news importance.

Some Indicative Dollar Values (positive and negative). A natural method for extracting sentiment strength and polarity: e.g., “good packaging” is associated with –$0.56 – positive-sounding, but negative in impact? The approach naturally captures the pragmatic meaning within the given context, and captures misspellings as well.

Thanks! Q & A?