Get Another Label? Using Multiple, Noisy Labelers
Joint work with Victor Sheng and Foster Provost
Panos Ipeirotis
Stern School of Business, New York University


Motivation
Many tasks rely on high-quality labels for objects:
– relevance judgments
– duplicate database records
– image recognition
– song categorization
– videos
Labeling can be relatively inexpensive, using Mechanical Turk, the ESP game, …

ESP Game (by Luis von Ahn)

Mechanical Turk Example: “Are these two documents about the same topic?”

Mechanical Turk Example

Motivation
Labels can be used to train predictive models:
– Duplicate detection systems
– Image recognition
– Web search
But: labels obtained from the above sources are noisy, and this directly affects the quality of the learned models.
– How can we know the quality of annotators?
– How can we know the correct answer?
– How can we best use noisy annotators?

Quality and Classification Performance
As labeling quality increases, classification quality increases.
[Figure: classification performance for labeling quality Q = 0.5, 0.6, 0.8, 1.0]

How to Improve Labeling Quality
– Find better labelers: often expensive, or beyond our control
– Use multiple, noisy labelers: repeated-labeling (our focus)

Our Focus: Labeling Using Multiple Noisy Labelers
– Multiple labelers and resulting label quality
– Multiple labelers and classification quality
– Selective label acquisition

Majority Voting and Label Quality
Ask multiple labelers and keep the majority label as the “true” label.
Quality is the probability that the majority label is correct; P is the probability that an individual labeler is correct.
[Figure: majority-vote quality for P = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
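Under the usual simplifying assumption that labelers are independent and each is correct with the same probability P (an assumption; the slide only shows the resulting curves), the quality of the majority vote over an odd number N of labelers is a binomial tail: Q = Σ_{i=(N+1)/2}^{N} C(N, i) P^i (1 − P)^(N − i). A minimal sketch, with a hypothetical function name:

```python
from math import comb


def majority_quality(p: float, n: int) -> float:
    """Probability that the majority of n independent labelers is correct.

    Assumes every labeler is correct independently with the same probability p,
    and that n is odd so a binary vote cannot tie.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of labelers to avoid ties")
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(need, n + 1))


# One row per individual labeler quality, one column per number of labelers.
for p in (0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(p, [round(majority_quality(p, n), 3) for n in (1, 3, 5, 11)])
```

With P above 0.5 the majority gets better as labelers are added, at P = 0.5 it stays at 0.5, and below 0.5 adding labelers makes the majority worse.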

Multiple Noisy Labelers Improve Quality
So… (sometimes) the quality of multiple noisy labelers is better than the quality of the best labeler in the set.
So, should we always get multiple labels?
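To see why it is only “sometimes”, drop the equal-quality assumption and enumerate the outcomes for a few labelers of different quality. This sketch assumes the labelers are independent and their qualities are known; the function name is hypothetical.

```python
from itertools import product
from math import prod


def majority_quality_mixed(qualities: list[float]) -> float:
    """Probability that a majority vote of independent labelers is correct.

    qualities[j] is the probability that labeler j is correct. Every
    correct/incorrect pattern is enumerated, so this is only meant for a handful
    of labelers; an odd count avoids ties in a binary vote.
    """
    n = len(qualities)
    total = 0.0
    for pattern in product((True, False), repeat=n):
        if sum(pattern) > n / 2:  # a strict majority of labelers is correct
            total += prod(q if ok else 1 - q for q, ok in zip(qualities, pattern))
    return total


print(majority_quality_mixed([0.8, 0.7, 0.7]))  # beats the best single labeler (0.8)
print(majority_quality_mixed([0.9, 0.6, 0.6]))  # worse than the best single labeler (0.9)
```

With qualities 0.8, 0.7, 0.7 the majority is right about 82.6% of the time, better than the best labeler alone; with 0.9, 0.6, 0.6 it drops to about 79.2%, worse than simply trusting the 0.9 labeler.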

Tradeoffs for Classification
– Get more labels → improve label quality → improve classification
– Get more examples → improve classification
[Figure: classification performance for labeling quality Q = 0.5, 0.6, 0.8, 1.0]

Basic Labeling Strategies
– Get as many data points as possible, one label each
– Repeatedly label everything, the same number of times

Repeat-Labeling vs. Single Labeling
P = 0.6 (labeling quality), K = 5 (labels per example)
[Figure: repeated vs. single labeling]
With high noise, repeated labeling is better than single labeling.

Repeat-Labeling vs. Single Labeling
P = 0.8 (labeling quality), K = 5 (labels per example)
[Figure: repeated vs. single labeling]
With low noise, more (single-labeled) examples are better.

Estimating Labeler Quality
(Dawid, Skene 1979): “Multiple diagnoses”
– Assume equal qualities
– Estimate “true” labels for examples
– Estimate qualities of labelers given the “true” labels
– Repeat until convergence
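The loop above can be sketched as expectation-maximization. This is a simplified variant, not the full Dawid and Skene model: it assumes binary labels and gives each labeler a single accuracy parameter (a “one-coin” model), whereas the original estimates per-labeler confusion matrices. All names, the starting quality of 0.7, and the clipping bounds are illustrative assumptions.

```python
import math
from collections import defaultdict


def _sigmoid(d: float) -> float:
    """Numerically stable 1 / (1 + exp(-d))."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def estimate_qualities(labels, init_quality=0.7, tol=1e-6, max_iter=100):
    """Simplified, one-coin EM in the spirit of Dawid & Skene (1979).

    labels: list of (example_id, labeler_id, label) triples, label in {0, 1}.
    Returns (posterior P(true label = 1) per example, quality per labeler).
    """
    by_example = defaultdict(list)
    for ex, w, y in labels:
        by_example[ex].append((w, y))
    quality = {w: init_quality for _, w, _ in labels}  # assume equal qualities
    prior = 0.5
    for _ in range(max_iter):
        # E-step: estimate the "true" label of each example from current qualities.
        posterior = {}
        for ex, votes in by_example.items():
            d = math.log(prior) - math.log(1 - prior)  # log-odds of label 1
            for w, y in votes:
                q = quality[w]
                d += math.log(q / (1 - q)) if y == 1 else math.log((1 - q) / q)
            posterior[ex] = _sigmoid(d)
        # M-step: labeler quality = expected fraction of its labels that are correct.
        hits, counts = defaultdict(float), defaultdict(int)
        for ex, w, y in labels:
            hits[w] += posterior[ex] if y == 1 else 1 - posterior[ex]
            counts[w] += 1
        # Clip away from 0 and 1 so the log-odds stay finite.
        new_quality = {w: min(max(hits[w] / counts[w], 0.01), 0.99) for w in quality}
        prior = min(max(sum(posterior.values()) / len(posterior), 0.01), 0.99)
        change = max(abs(new_quality[w] - quality[w]) for w in quality)
        quality = new_quality
        if change < tol:  # repeat until convergence
            break
    return posterior, quality


# Hypothetical data: labelers A, B, C are always right, D is always wrong.
truth = {1: 1, 2: 0, 3: 1, 4: 0, 5: 1, 6: 0}
labels = [(ex, w, t) for ex, t in truth.items() for w in "ABC"]
labels += [(ex, "D", 1 - t) for ex, t in truth.items()]
posterior, quality = estimate_qualities(labels)
print({w: round(q, 2) for w, q in sorted(quality.items())})
```

Starting from equal qualities, the first E-step is just a majority vote; the following rounds learn that D is unreliable and stop trusting it.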

Selective Repeated-Labeling
We have seen that, with noise and enough (noisy) examples, getting multiple labels is better than single labeling.
Can we do better? Select which data points receive additional labels according to an uncertainty score, e.g., {+,-,+,+,-,+,+} (uncertain) vs. {+,+,+,+} (certain).

Natural Candidate: Entropy
Entropy is a natural measure of label uncertainty:
– E({+,+,+,+,+,+}) = 0
– E({+,-,+,-,+,-}) = 1
Strategy: get more labels for high-entropy examples.
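A minimal sketch of the entropy strategy: compute the Shannon entropy (in bits) of the labels each example has received, then request more labels for the examples with the highest entropy. The function names and the dictionary-of-labels pool are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels: list[str]) -> float:
    """Shannon entropy, in bits, of the multiset of labels an example received."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def next_to_label(pool: dict[str, list[str]], k: int) -> list[str]:
    """Entropy strategy: the k example ids whose labels disagree the most."""
    return sorted(pool, key=lambda ex: label_entropy(pool[ex]), reverse=True)[:k]


print(label_entropy(list("++++++")))  # all labels agree
print(label_entropy(list("+-+-+-")))  # an even split
print(next_to_label({"x": list("++++"), "y": list("+-+-+-+")}, 1))
```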

What Not to Do: Use Entropy
[Figure: entropy-based selection vs. round robin]
Improves at first, hurts in the long run.

Why Not Entropy
– In the presence of noise, entropy will be high even with many labels
– Entropy is scale invariant: (3+, 2-) has the same entropy as (600+, 400-)
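Writing the entropy of an example's labels in terms of its label counts n_+ and n_- shows why it is scale invariant: it depends on the counts only through the fraction p of positive labels, so collecting more labels in the same proportion never lowers it.

```latex
E(n_+, n_-) = -p \log_2 p - (1 - p) \log_2 (1 - p), \qquad p = \frac{n_+}{n_+ + n_-}

(3+, 2-):\; p = \tfrac{3}{5} = 0.6 \qquad (600+, 400-):\; p = \tfrac{600}{1000} = 0.6

E = -0.6 \log_2 0.6 - 0.4 \log_2 0.4 \approx 0.971 \quad \text{in both cases}
```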

Estimating Label Uncertainty (LU)
– Observe +’s and –’s and compute Pr{+|obs} and Pr{-|obs}
– Label uncertainty S_LU = tail of the beta distribution
[Figure: beta probability density function on 0.0 to 1.0, with threshold 0.5 and tail area S_LU]
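The slide gives no formula, so this sketch makes two labeled assumptions: a uniform Beta(1, 1) prior, so that after n_pos positive and n_neg negative labels the posterior over Pr{+} is Beta(n_pos + 1, n_neg + 1); and S_LU is the posterior mass on the minority side of the 0.5 threshold. For integer parameters the beta CDF reduces to a binomial tail, I_x(a, b) = Pr[Binomial(a + b − 1, x) ≥ a], so only the standard library is needed. Function names are hypothetical.

```python
from math import comb


def beta_cdf_half(a: int, b: int) -> float:
    """I_{0.5}(a, b): mass of Beta(a, b) below 0.5, for positive integers a, b.

    Uses the identity I_x(a, b) = P[Binomial(a + b - 1, x) >= a] at x = 0.5.
    """
    n = a + b - 1
    return sum(comb(n, j) for j in range(a, n + 1)) / 2**n


def label_uncertainty(n_pos: int, n_neg: int) -> float:
    """Posterior mass on the minority side of 0.5 (uniform prior assumed).

    Close to 0.5 when the labels are inconclusive, close to 0 when they
    point firmly one way.
    """
    below = beta_cdf_half(n_pos + 1, n_neg + 1)
    return min(below, 1 - below)


print(label_uncertainty(3, 2))      # few labels: still quite uncertain
print(label_uncertainty(600, 400))  # same 60/40 split, but many more labels
```

Unlike entropy, this score separates (3+, 2-), which is still fairly uncertain, from (600+, 400-), whose tail is vanishingly small.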

Label Uncertainty
Example: p = 0.7, 5 labelers, (3+, 2-): entropy ≈ 0.97
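A worked version of the (3+, 2-) example. It assumes a uniform Beta(1, 1) prior and that S_LU is the posterior mass on the minority side of 0.5 (both assumptions; the slide shows only the density plot), and uses the identity I_x(a, b) = Pr[Binomial(a + b − 1, x) ≥ a] for integer a, b.

```latex
\text{Prior } \mathrm{Beta}(1,1) \;\Rightarrow\; \text{posterior after } (3+, 2-) \text{ is } \mathrm{Beta}(4, 3)

I_{0.5}(4, 3) = \Pr[\mathrm{Binomial}(6, 0.5) \ge 4]
  = \frac{\binom{6}{4} + \binom{6}{5} + \binom{6}{6}}{2^6}
  = \frac{15 + 6 + 1}{64} = \frac{22}{64} \approx 0.344

S_{LU} = \min\{\, I_{0.5}(4,3),\; 1 - I_{0.5}(4,3) \,\} = \frac{22}{64} \approx 0.344
```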

Comparison
[Figure: label uncertainty vs. uniform (round robin) repeated labeling]

Model Uncertainty (MU)
However, labelers are not our only source of labels: a classifier can also give us labels!
Model uncertainty: get more labels for ambiguous/difficult examples.
Intuitively: make sure that the difficult cases are labeled correctly.
[Figure: positive (+) and negative (-) examples, plus uncertain examples (?)]
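The slide gives no formula for model uncertainty. As an assumption, this sketch averages P(+ | x) over one or more classifiers (for instance, an ensemble trained on bootstrap samples) and scores an example by how close that average is to the 0.5 decision boundary. The function name and the predicted probabilities are hypothetical.

```python
from statistics import fmean


def model_uncertainty(predicted_pos: list[float]) -> float:
    """0.5 when the averaged P(+ | x) sits on the 0.5 boundary, 0 when certain."""
    p = fmean(predicted_pos)
    return 0.5 - abs(p - 0.5)


# Hypothetical ensemble outputs P(+ | x) for three unlabeled examples.
pool = {"x1": [0.95, 0.90, 0.97], "x2": [0.55, 0.40, 0.52], "x3": [0.05, 0.10, 0.02]}
ranked = sorted(pool, key=lambda ex: model_uncertainty(pool[ex]), reverse=True)
print(ranked)  # the ambiguous example x2 comes first
```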

Label + Model Uncertainty
Label and model uncertainty (LMU): avoid examples where either strategy is certain.
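The slide does not say how the two scores are combined. This sketch assumes a geometric mean: it is zero whenever either score is zero, so an example about which either the labels or the model are certain ranks at the bottom. The function names and score values are hypothetical.

```python
import math


def lmu_score(lu: float, mu: float) -> float:
    """Combine label uncertainty and model uncertainty by their geometric mean."""
    return math.sqrt(lu * mu)


def select_for_relabeling(scores: dict[str, tuple[float, float]], k: int) -> list[str]:
    """Return the k example ids with the highest combined (LU, MU) score."""
    return sorted(scores, key=lambda ex: lmu_score(*scores[ex]), reverse=True)[:k]


# Hypothetical (label uncertainty, model uncertainty) pairs per example.
scores = {"a": (0.40, 0.45), "b": (0.00, 0.50), "c": (0.30, 0.01)}
print(select_for_relabeling(scores, 3))
```

Example "b" is maximally uncertain for the model, but its labels are unanimous, so it is not worth another label under this scheme.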

Comparison
[Figure: label uncertainty, label + model uncertainty, and uniform (round robin)]
Model uncertainty alone also improves quality.

Classification Improvement

Conclusions
– Gathering multiple labels from noisy users is a useful strategy
– Under high noise, it is almost always better than single labeling
– Selective labeling using label and model uncertainty is more effective

More Work to Do
– Estimating the labeling quality of each labeler
– Increased compensation vs. labeler quality
– Example-conditional quality issues (some examples are more difficult than others)
– Multiple “real” labels
– Hybrid labeling strategies using the “learning-curve gradient”

Other Projects
– SQoUT project: Structured Querying over Unstructured Text, http://sqout.stern.nyu.edu
– Faceted Interfaces
– EconoMining project: The Economic Value of User Generated Content, http://economining.stern.nyu.edu

SQoUT: Structured Querying over Unstructured Text
Information extraction applications extract structured relations from unstructured text:
“May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…”
An Information Extraction System (e.g., NYU’s Proteus) turns such text into the table “Disease Outbreaks in The New York Times”:
Date | Disease Name | Location
Jan. 1995 | Malaria | Ethiopia
July 1995 | Mad Cow Disease | U.K.
Feb. 1995 | Pneumonia | U.S.
May 1995 | Ebola | Zaire

SQoUT: The Questions
[Figure: Text Databases → Extraction System(s) → output tuples: 1. Retrieve documents from database/web/archive; 2. Process documents; 3. Extract output tuples]
Questions:
1. How do we retrieve the documents?
2. How do we configure the extraction systems?
3. What is the execution time?
4. What is the output quality?
SIGMOD’06, TODS’07, + in progress

EconoMining Project: Show me the Money!
Applications (in increasing order of difficulty):
– Buyer feedback and seller pricing power in online marketplaces (ACL 2007)
– Product reviews and product sales (KDD 2007)
– Importance of reviewers based on economic impact (ICEC 2007)
– Hotel ranking based on “bang for the buck” (WebDB 2008)
– Political news (MSM, blogs), prediction markets, and news importance
Basic Idea:
– Opinion mining is an important application of information extraction
– Opinions of users are reflected in some economic variable (price, sales)

Some Indicative Dollar Values
[Figure: dollar values of positive and negative phrases]
– Natural method for extracting sentiment strength and polarity
– Naturally captures the pragmatic meaning within the given context
– Captures misspellings as well
Example: “good packaging”: -$0.56. Positive? Negative?

Thanks! Q & A?
