Towards Minimizing the Annotation Cost of Certified Text Classification
Mossaab Bagdouri¹, David D. Lewis², William Webber¹, Douglas W. Oard¹
¹University of Maryland, College Park, MD, USA
²David D. Lewis Consulting, Chicago, IL, USA

Outline
Introduction
Economical assured effectiveness
Solution framework
Baseline solutions
Conclusion

Goal: Economical assured effectiveness
1. Build a good classifier
2. Certify that this classifier is good
3. Use nearly minimal total annotations

Notation
[Plot: effectiveness vs. annotations, with the annotation axis split into training and test. Curves: F₁ = true effectiveness; F̂₁ = estimate of F₁ from the test set; θ = lower confidence bound on F₁ at α = 0.05; τ = target effectiveness threshold.]
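
To make the slide's notation concrete, here is a minimal sketch (this is an illustration, not the authors' code) of F̂₁ and a lower confidence bound θ, here obtained with a percentile bootstrap at α = 0.05; the paper's exact interval construction may differ.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) on binary {0, 1} labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def f1_lower_bound(y_true, y_pred, alpha=0.05, n_boot=10000, seed=0):
    """theta: lower bound such that P(true F1 >= theta) is about 1 - alpha."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    # Percentile bootstrap: resample the test set with replacement.
    stats = [f1_score(y_true[idx], y_pred[idx])
             for idx in rng.integers(0, n, size=(n_boot, n))]
    return np.quantile(stats, alpha)

# Toy usage: certify the classifier only if theta >= tau.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1] * 20)
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 1] * 20)
tau = 0.7
print(f"F1_hat = {f1_score(y_true, y_pred):.3f}")
theta = f1_lower_bound(y_true, y_pred)
print(f"theta  = {theta:.3f}, certified: {theta >= tau}")
```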

Fixed test set, growing training set
[Plot: true F₁, F̂₁, and θ vs. annotations, with a fixed test set and a growing training set; target τ marked.]

Fixed test set, growing training set (Collection = RCV1, Topic = M132, Freq = 3.33%)

Stop criterion   Success
Desired          95.00%
F̂₁ ≥ τ           46.42%
θ ≥ τ            91.87%
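
A hedged simulation of why the point-estimate criterion fares so poorly. The confusion-cell probabilities below are hypothetical, chosen so that the true F₁ sits exactly at τ; the noisy estimate F̂₁ then clears τ on roughly a coin flip, consistent in spirit with the 46.42% success rate in the table above.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed cell probabilities (TP, FP, FN, TN): true F1 = 0.4 / 0.5 = 0.8.
p = np.array([0.20, 0.05, 0.05, 0.70])
true_f1 = 2 * p[0] / (2 * p[0] + p[1] + p[2])
tau, n_test, trials = 0.8, 500, 5000

hits = 0
for _ in range(trials):
    tp, fp, fn, tn = rng.multinomial(n_test, p)   # one simulated test set
    f1_hat = 2 * tp / (2 * tp + fp + fn)
    hits += f1_hat >= tau

print(f"true F1 = {true_f1:.2f}, P(F1_hat >= tau) = {hits / trials:.2%}")
# Roughly half: "stop when F1_hat >= tau" succeeds about as often as chance.
```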

Fixed training set, growing test set
[Plot: true F₁, F̂₁, and θ vs. annotations, with a fixed training set and a growing test set; target τ marked.]

Problem 1: Sequential testing → bias
[Plot: θ and F̂₁ vs. annotations, target τ marked. Checking θ ≥ τ after every batch makes the procedure stop at the first chance crossing ("Stop here") rather than where one would want to stop ("Want to stop here"); elsewhere it does not stop.]
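
A hedged illustration of the multiple-looks bias. To keep the sketch simple it uses accuracy (a binomial proportion) in place of F₁; the inflation phenomenon is the same. The true accuracy is set just below the target τ, so every certification is a false positive: a single look keeps the false-certification rate under the nominal α, while re-testing after every batch inflates it badly.

```python
import numpy as np

rng = np.random.default_rng(2)
true_acc, tau, alpha = 0.79, 0.80, 0.05
z = 1.645                      # one-sided normal critical value for alpha = 0.05
batch, max_n, trials = 20, 2000, 2000

false_once, false_seq = 0, 0
for _ in range(trials):
    outcomes = rng.random(max_n) < true_acc
    # One look at the full test set:
    p_hat = outcomes.mean()
    theta = p_hat - z * np.sqrt(p_hat * (1 - p_hat) / max_n)
    false_once += theta >= tau
    # Sequential looks after every batch of 20 annotations:
    for n in range(batch, max_n + 1, batch):
        p_hat = outcomes[:n].mean()
        theta = p_hat - z * np.sqrt(p_hat * (1 - p_hat) / n)
        if theta >= tau:       # stop at the first apparent success
            false_seq += 1
            break

print(f"false certification, one look:   {false_once / trials:.1%}")  # below alpha
print(f"false certification, sequential: {false_seq / trials:.1%}")   # inflated
```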

Solution: Train sequentially, test once
[Plot: F̂₁ and θ vs. training annotations, target τ marked. Grow the training set without testing along the way, then test only once at the end.]

Problem 2: What is the size of the test set?

Solution: Power analysis (power = 1 − β, with β = 0.07)
Observation 1, from power analysis:
◦ If the true effectiveness greatly exceeds the target, a small test set suffices.
Observation 2, from the shape of learning curves:
◦ New training examples provide less and less of an increase in effectiveness.
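
A simulation sketch of the power analysis, under a hypothetical generative model (the specific probabilities and grid are illustrative, not the paper's): test sets are multinomial draws over confusion-matrix cells, and power is the fraction of simulated test sets whose bootstrap lower bound θ clears τ.

```python
import numpy as np

rng = np.random.default_rng(3)

def f1_from_counts(tp, fp, fn):
    d = 2 * tp + fp + fn
    return np.where(d > 0, 2 * tp / np.maximum(d, 1), 0.0)

def power(n, p, tau, alpha=0.05, n_sets=400, n_boot=300):
    """P(theta >= tau) when a size-n test set is drawn from cell probs p."""
    wins = 0
    for _ in range(n_sets):
        tp, fp, fn, tn = rng.multinomial(n, p)
        # Multinomial bootstrap of the observed confusion counts:
        boots = rng.multinomial(n, np.array([tp, fp, fn, tn]) / n, size=n_boot)
        f1s = f1_from_counts(boots[:, 0], boots[:, 1], boots[:, 2])
        wins += np.quantile(f1s, alpha) >= tau
    return wins / n_sets

# Assumed truth: F1 = 0.48 / 0.56, about 0.857, comfortably above tau = 0.8.
p = np.array([0.24, 0.04, 0.04, 0.68])
tau, beta = 0.80, 0.07
for n in (100, 200, 400, 800):
    pw = power(n, p, tau)
    print(f"n = {n:4d}  power = {pw:.2f}  {'enough' if pw >= 1 - beta else ''}")
# Observation 1 in action: the wider the gap between true F1 and tau,
# the smaller the n at which power first reaches 1 - beta.
```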

Designing annotation minimization policies
[Plot: true F₁ vs. total annotation cost (training + test, $$$); training and test sizes range up to +∞; target τ marked.]

Allocation policies in practice
No closed-form solution maps an effect size on F₁ to a test set size
◦ → Simulation methods
True effectiveness is invisible
◦ → Cross-validation to estimate it
No access to the entire learning curve; estimates are scattered and noisy
◦ → Need to decide online
[Plot: true F₁ vs. training + test cost ($$$); Topic = C18, Frequency = 6.57%]

Estimating the true F₁ (cross-validation)
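
A minimal sketch of the cross-validation step, assuming scikit-learn, synthetic data, and a linear SVM as a stand-in for the SVMperf classifier the paper uses. The k-fold F₁ scores serve as the estimate of the invisible true effectiveness.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic, class-imbalanced stand-in for a text collection (~10% positives).
X, y = make_classification(n_samples=1000, n_features=50,
                           weights=[0.9, 0.1], random_state=0)
clf = LinearSVC(max_iter=5000)   # stand-in; SVMperf optimizes training for F1
scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
print(f"CV estimate of F1: mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```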

Estimating the true F₁ (simulations)
[Diagram: training data → posterior distribution over effectiveness.]
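
A hedged sketch of the simulation step. It assumes a Dirichlet(1,1,1,1) prior over the confusion-matrix cells, so that after observing counts (tp, fp, fn, tn) from cross-validation the posterior is Dirichlet(tp+1, fp+1, fn+1, tn+1), and draws from it induce a posterior distribution over F₁; the paper's exact prior and model may differ, and the counts below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
tp, fp, fn, tn = 48, 10, 12, 130          # hypothetical CV confusion counts
draws = rng.dirichlet([tp + 1, fp + 1, fn + 1, tn + 1], size=20000)
f1_post = 2 * draws[:, 0] / (2 * draws[:, 0] + draws[:, 1] + draws[:, 2])

tau = 0.7
print(f"posterior mean F1 = {f1_post.mean():.3f}")
print(f"P(F1 >= tau)      = {(f1_post >= tau).mean():.3f}")
```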

Minimizing the annotations
[Diagram: inputs are the confidence level α, the target τ, the power parameter β, the measure (F₁), and the learning algorithm (SVM). As the training set grows, infer the test set size required to certify θ ≥ τ; train without testing, then test once.]
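
Putting the pieces together, a toy end-to-end sketch of the loop this slide describes, on synthetic data. The helper choices here (scikit-learn's LinearSVC standing in for SVMperf, Dirichlet-posterior test sizing, a bootstrap lower bound) are this sketch's assumptions, not the paper's exact algorithm, and the grid and constants are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.svm import LinearSVC

rng = np.random.default_rng(5)
X, y = make_classification(n_samples=20_000, n_features=40,
                           weights=[0.9, 0.1], random_state=0)
tau, alpha, beta, bucket, budget = 0.60, 0.05, 0.07, 20, 10_000

def confusion(y_true, y_pred):
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn

def test_size_by_power(tp, fp, fn, tn, n_grid=(100, 200, 400, 800, 1600)):
    """Smallest n whose simulated power reaches 1 - beta, using a Dirichlet
    posterior over the CV confusion cells; None if no n on the grid works."""
    post = rng.dirichlet([tp + 1, fp + 1, fn + 1, tn + 1], size=200)
    for n in n_grid:
        wins = 0
        for p in post:
            c = rng.multinomial(n, p)                 # one simulated test set
            boots = rng.multinomial(n, c / n, size=200)
            f1s = 2 * boots[:, 0] / np.maximum(
                2 * boots[:, 0] + boots[:, 1] + boots[:, 2], 1)
            wins += np.quantile(f1s, alpha) >= tau
        if wins / len(post) >= 1 - beta:
            return n
    return None

n_train = 200
while n_train + bucket <= budget:
    n_train += bucket                                 # buy a bucket of labels
    Xt, yt = X[:n_train], y[:n_train]
    clf = LinearSVC(max_iter=5000).fit(Xt, yt)
    pred_cv = cross_val_predict(LinearSVC(max_iter=5000), Xt, yt, cv=5)
    n_test = test_size_by_power(*confusion(yt, pred_cv))
    if n_test is not None and n_train + n_test <= budget:
        Xe, ye = X[-n_test:], y[-n_test:]             # test ONCE, fresh labels
        te = confusion(ye, clf.predict(Xe))
        boots = rng.multinomial(n_test, np.array(te) / n_test, size=2000)
        f1s = 2 * boots[:, 0] / np.maximum(
            2 * boots[:, 0] + boots[:, 1] + boots[:, 2], 1)
        theta = np.quantile(f1s, alpha)
        print(f"train={n_train}, test={n_test}, theta={theta:.3f}, "
              f"certified={theta >= tau}")
        break
```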

Experiments
Test collection: RCV1-v2
◦ 29 topics with a prevalence ≥ 3%
◦ 20 randomized runs per topic
Classifier: SVMperf
◦ Off-the-shelf classifier
◦ Optimizes training for F₁
Settings
◦ Budget: 10,000 documents
◦ Power 1 − β = 0.93
◦ Confidence level 1 − α = 0.95
◦ Documents added in buckets of 20

Policies
[Plot: training + test cost ($$$) vs. training documents; Topic = C18, Frequency = 6.57%]

Stop as early as possible
Budget achieved in 70.52% of runs
Failure rate of 20.54% > β (7%)
Sequential testing bias is pushed into process management
[Plot: training + test cost ($$$) vs. training documents; Topic = C18, Frequency = 6.57%]

Oracle policies
Minimum cost policy
◦ Savings: 43.21% of the total annotations
◦ Failure rate of 27.14% > β (7%)
Minimum cost for success policy
◦ Savings: 38.08%
[Plot: training + test cost ($$$) vs. training documents; Topic = C18, Frequency = 6.57%]

Wait-a-while policies
[Plot and table: savings (%), success (%), and "cannot open" (%) for waiting parameters w = 0, 1, 2, 3 and a "last chance" variant; Topic = C18, Frequency = 6.57%]

Conclusion
Re-testing introduces statistical bias
An algorithm to indicate:
◦ If and when a classifier can achieve a threshold
◦ How many documents are required to certify a trained model
A subroutine for policies minimizing the cost
Possibility to save 38% of the cost

Towards Minimizing the Annotation Cost of Certified Text Classification Thank you!