Modeling Annotator Accuracies for Supervised Learning

Presentation transcript:

Modeling Annotator Accuracies for Supervised Learning

Abhimanu Kumar
Department of Computer Science, University of Texas at Austin
abhimanu@cs.utexas.edu | http://abhimanukumar.com

Matthew Lease
School of Information, University of Texas at Austin
ml@ischool.utexas.edu

Supervised learning from noisy labels
- Labeling is inherently uncertain: even experts disagree and make mistakes
- The crowd tends to be noisier, with higher variance
- Use the wisdom of crowds to reduce uncertainty: multi-label + aggregation = consensus labels
- How to maximize the learning rate (per unit of labeling effort)?
  - Label a new example?
  - Get another label for an already-labeled example?
- See: Sheng, Provost & Ipeirotis, KDD'08

Task Setup
- Task: binary classification
- Learner: C4.5 decision tree
- Given:
  - An initial seed set of single-labeled examples (64)
  - An unlimited pool of unlabeled examples
- Cost model:
  - Fixed unit cost for labeling any example
  - Unlabeled examples are freely obtained
- Goal: maximize the learning rate (for labeling effort)

Compare 3 methods: SL, MV, and NB
- Single Labeling (SL): label a new example
- Multi-Labeling: get another label for an already-labeled example
  - Majority Vote (MV): consensus by simple vote
  - Naïve Bayes (NB): weight each vote by annotator accuracy
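A minimal sketch of the two consensus rules (illustrative Python, not the paper's code; the example labels and accuracies below are made up). MV just counts votes; NB weights each vote by the annotator's accuracy under a uniform class prior, so one highly accurate annotator can outvote several weak ones:

```python
from collections import Counter
import math

def majority_vote(labels):
    """MV: the consensus label is simply the most frequent vote."""
    return Counter(labels).most_common(1)[0][0]

def naive_bayes_vote(labels, accuracies):
    """NB: weight each vote by annotator accuracy, uniform class prior.
    P(y = c | votes) is proportional to prod_i [p_i if vote_i == c else 1 - p_i]."""
    log_post = {0: 0.0, 1: 0.0}                      # uniform prior cancels out
    for vote, p in zip(labels, accuracies):
        for c in (0, 1):
            log_post[c] += math.log(p if vote == c else 1.0 - p)
    return max(log_post, key=log_post.get)

# Two weak annotators are outvoted by one strong annotator under NB but not MV:
print(majority_vote([0, 0, 1]))                          # -> 0
print(naive_bayes_vote([0, 0, 1], [0.55, 0.55, 0.99]))   # -> 1
```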

Assumptions
- Example selection: random
  - From the pool for SL, from the seed set for multi-labeling
  - No selection based on active learning
- Fixed commitment to a single method a priori; no switching between methods at run-time
- Balanced classes: model and measure simple accuracy (not P/R, ROC); assume a uniform class prior for NB
- Annotator accuracies are known to the system
  - In practice, these must be estimated: from gold data (Snow et al. '08) or via EM (Dawid & Skene '79)
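For the gold-data route, a minimal sketch of how per-annotator accuracies might be estimated (an illustrative helper with assumed data structures, not code from Snow et al. or from this paper; the Dawid & Skene EM alternative is more involved and omitted here):

```python
def estimate_accuracies(gold, annotations):
    """gold: {item_id: true_label}; annotations: iterable of (annotator_id, item_id, label)."""
    correct, total = {}, {}
    for annotator, item, label in annotations:
        if item not in gold:
            continue                                  # only score items with a gold label
        total[annotator] = total.get(annotator, 0) + 1
        correct[annotator] = correct.get(annotator, 0) + (label == gold[item])
    # Laplace smoothing so one lucky or unlucky answer is not scored as 1.0 or 0.0
    return {a: (correct[a] + 1) / (total[a] + 2) for a in total}
```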

Simulation
- Each annotator:
  - Has a parameter p (probability of producing the correct label)
  - Generates exactly one label
- Uniform distribution of accuracies, U(min, max)
- Generative model for simulation:
  1. Pick an example x (with true label y*) at random
  2. Draw annotator accuracy p ~ U(min, max)
  3. Generate label y ~ P(y | p, y*)
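The generative model above translates almost directly into code; here is a minimal sketch (assumed function and variable names, binary labels in {0, 1}):

```python
import random

def simulate_label(y_true, p_min, p_max, rng=random):
    """Simulate one annotator labeling one example whose true label is y_true."""
    p = rng.uniform(p_min, p_max)                     # accuracy p ~ U(min, max)
    y = y_true if rng.random() < p else 1 - y_true    # correct with probability p
    return y, p

# e.g. five labels for a positive example in the very noisy U(0.4, 0.6) regime:
votes = [simulate_label(1, 0.4, 0.6)[0] for _ in range(5)]
```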

Evaluation
- Data: 4 datasets from the UCI ML Repository (http://archive.ics.uci.edu/ml/datasets.html)
  - Mushroom
  - Spambase
  - Tic-Tac-Toe
  - Chess: King-Rook vs. King-Pawn
- Same trends across all 4, so we report the first 2
- Random 70/30 split of the data into seed+pool / test
- Each run is repeated 10 times and results are averaged
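A rough sketch of one learning-curve run under this setup, covering only the SL condition (scikit-learn's DecisionTreeClassifier, i.e. CART, stands in for C4.5 here, which is an assumption; MV and NB would instead spend each unit of budget on an extra label for an already-labeled seed example and aggregate it with the consensus rules sketched earlier):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def sl_learning_curve(X, y, get_label, budget, seed_size=64, rng_state=0):
    """get_label(true_y) returns one (possibly noisy) label, e.g. via simulate_label."""
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.3, random_state=rng_state)  # 70/30 seed+pool / test split
    bought = [get_label(t) for t in y_pool[:seed_size + budget]]  # one label per example
    scores = []
    for n in range(seed_size, seed_size + budget + 1):
        clf = DecisionTreeClassifier(random_state=rng_state)
        clf.fit(X_pool[:n], bought[:n])               # train on the labels bought so far
        scores.append(clf.score(X_test, y_test))      # test accuracy vs. labeling effort
    return scores
```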

p ~ U(0.6, 1.0)
- Fairly accurate annotators (mean = 0.8)
- Little uncertainty -> little gain from multi-labeling

p ~ U(0.4, 0.6)
- Very noisy (mean = 0.5, a random coin flip)
- SL and MV learning rates are flat
- NB wins by weighting the more accurate workers

p ~ U(0.3, 0.7)
- Same noisy mean (0.5), but a wider range
- SL and MV stay flat
- NB pulls further ahead

p ~ U(0.1, 0.7)
- Worsen accuracies further (mean = 0.4)
- NB is virtually unchanged
- SL and MV predictions become anti-correlated with the true labels
- We should actually flip their predictions...

p ~ U(0.2, 0.6)
- Keep the noisy mean of 0.4, tighten the range
- NB is the best of the worst, but only 50%
- Again, it seems we should be flipping labels...

Label flipping
- Is NB doing better due to how it uses accuracy, or simply because it uses more information?
- If a worker's average accuracy is below 50%, we know he tends to be wrong (we've ignored this so far): whatever he says, we should guess the opposite
- Flipping puts all methods on an even footing:
  - Assume a given annotator with p < 0.5 produces label y: use label 1-y instead; for NB, use accuracy 1-p
  - Same as changing the distribution so that p is always > 0.5
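A minimal sketch of the flipping transform (assumed names, binary labels in {0, 1}):

```python
def flip_if_needed(label, p):
    """If an annotator's accuracy p is below 0.5, trust the opposite of what they said."""
    if p < 0.5:
        return 1 - label, 1 - p      # flipped label, effective accuracy 1 - p > 0.5
    return label, p

# e.g. an annotator with p = 0.3 who says 0 is treated as saying 1 with accuracy 0.7:
print(flip_if_needed(0, 0.3))        # -> (1, 0.7)
```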

p ~ U(0.1, 0.7)
[Plots: learning curves without flipping vs. with flipping]

p ~ U(0.2, 0.6)
[Plots: learning curves without flipping vs. with flipping]

Conclusion
- Moral: it is important to model annotator accuracies, even for simple SL and MV
- But what about...
  - When accuracies are estimated (noisy)?
  - With real annotation errors (real distribution)?
  - With different learners or tasks (e.g. ranking)?
  - With a dynamic choice between labeling a new example and re-labeling?
  - With active-learning example selection?
  - With imbalanced classes?
  - ...

Thanks! Special thanks to all our diligent crowd annotators and their dedication to science…

Kumar and Lease. Modeling Annotator Accuracies for Supervised Learning. CSDM 2011. February 9, 2011.