Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles. Pinar Donmez and Jaime Carbonell, Language Technologies Institute, Carnegie Mellon University.


Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles
Pinar Donmez and Jaime Carbonell
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
CIKM '08, Napa Valley, October 2008

Active Learning Assumptions vs. the Real World
Active learning assumes:
► a unique oracle
► a perfect oracle: always right, never tired
► an oracle that works for free or charges uniformly
The real world offers:
► multiple sources of information
► imperfect oracles: unreliable, reluctant
► oracles that are expensive or charge non-uniformly

Solution: Proactive Learning
► Proactive learning generalizes active learning to relax these assumptions
► A decision-theoretic framework to jointly optimize the instance-oracle pair (see the sketch below)
► A utility optimization problem under a fixed budget constraint
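
A minimal Python sketch of that joint optimization loop. The names value_of_info, answer_prob, cost, and the oracle query interface are illustrative placeholders, not the paper's actual implementation:

    # Hedged sketch: each step picks the instance-oracle pair with the
    # highest expected value of information per unit cost, queries it,
    # and charges the oracle's fee against the fixed budget.
    def proactive_learning(unlabeled, oracles, budget,
                           value_of_info, answer_prob, cost):
        labeled = []
        while budget > 0 and unlabeled:
            x, k = max(((x, k) for x in unlabeled for k in oracles),
                       key=lambda pair: answer_prob(*pair) *
                                        value_of_info(pair[0]) / cost(pair[1]))
            budget -= cost(k)
            label = k.query(x)          # a reluctant oracle may return None
            if label is not None:
                labeled.append((x, label))
                unlabeled.remove(x)
        return labeled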

Outline
► Methodology: 3 scenarios
  • Reluctance
  • Fallibility
  • Variable and Fixed Cost
► Evaluation
  • Problem Setup
  • Datasets
  • Results
► Conclusion

Scenario 1: Reluctance
► 2 oracles:
  • reliable oracle: expensive but always answers with a correct label
  • reluctant oracle: cheap but may not respond to some queries
► Define a utility score as the expected value of information at unit cost
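
One plausible form for that utility score (the notation here is a reconstruction, not necessarily the paper's exact formula): the value of information of instance x under oracle k, discounted by the probability of receiving an answer and normalized by the oracle's fee:

    U(x, k) = \frac{P(\mathrm{ans} \mid x, k) \, V(x)}{C_k}

where V(x) is the information value of x, P(ans | x, k) the probability that oracle k answers the query, and C_k the oracle's fee.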

How to simulate oracle unreliability?
► Unreliability can depend on factors such as query difficulty (hard-to-classify instances), complexity of the data (requiring long and time-consuming analysis), etc. In this work, we model it based on query difficulty
► Assumptions:
  • perfect oracle ~ a classifier with zero training error on the entire data
  • imperfect oracle ~ a weak classifier trained on a subset of the entire data
► Train a logistic regression classifier on the subset to obtain posterior estimates
► Identify the instances whose maximum posterior falls below a threshold
► These are the unreliable instances
► Challenge: tradeoff between the information value of an instance and the reliability of the oracle
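
A minimal sketch of this simulation, assuming scikit-learn is available; the subset fraction and threshold values are illustrative choices:

    # Sketch: simulate a reluctant oracle as a weak logistic regression
    # classifier trained on a small subset; instances whose maximum
    # posterior falls below a threshold are the ones it declines to label.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def make_reluctant_oracle(X, y, subset_frac=0.1, threshold=0.6, seed=0):
        rng = np.random.default_rng(seed)
        n = max(2, int(subset_frac * len(X)))
        idx = rng.choice(len(X), size=n, replace=False)
        # Assumes the sampled subset contains examples of both classes
        weak = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])

        def query(x):
            posterior = weak.predict_proba(x.reshape(1, -1)).max()
            if posterior < threshold:
                return None               # unreliable instance: no answer
            return weak.predict(x.reshape(1, -1))[0]

        return query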

How to estimate the oracle's answer probability?
► Cluster the unlabeled data using k-means
► Ask the reluctant oracle for the label of each cluster centroid:
  • label received: increase the estimated answer probability of nearby points
  • no label: decrease the estimated answer probability of nearby points
  • (the update sign equals 1 when a label is received, -1 otherwise)
► The number of clusters depends on the clustering budget and the oracle fee
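
A hedged sketch of this estimation step; the prior, step size, and distance-based weighting are illustrative choices rather than the paper's exact update rule:

    # Sketch: estimate each point's answer probability by clustering,
    # querying a representative of each cluster, and propagating the
    # outcome (+1 answered / -1 declined) to nearby points.
    import numpy as np
    from sklearn.cluster import KMeans

    def estimate_answer_prob(X, oracle, n_clusters=10, prior=0.5, step=0.2):
        p_ans = np.full(len(X), prior)        # start from a uniform prior
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
        for c in range(n_clusters):
            members = np.where(km.labels_ == c)[0]
            center = km.cluster_centers_[c]
            # Query the member closest to the cluster centroid
            rep = members[np.argmin(np.linalg.norm(X[members] - center,
                                                   axis=1))]
            sign = 1.0 if oracle(X[rep]) is not None else -1.0
            # Nearby points move up or down with the query outcome
            dist = np.linalg.norm(X[members] - X[rep], axis=1)
            p_ans[members] = np.clip(p_ans[members] +
                                     sign * step * np.exp(-dist), 0.0, 1.0)
        return p_ans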

Penalization of the Reluctant Oracle
► The algorithm works in rounds until the budget is exhausted
► At each round, sampling continues until a label is obtained
► Be careful: you may spend the entire budget on a single attempt
► If no label is received, decrease the utility of the remaining instances
► This penalty is adaptive
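
One plausible multiplicative form for that adaptive penalty (an assumption for illustration, not necessarily the slide's exact update): after an unanswered query, every remaining instance's utility is decayed by a factor, so repeatedly unresponsive choices lose priority:

    U_{t+1}(x) = \alpha \, U_t(x), \quad 0 < \alpha < 1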

Algorithm for Scenario 1
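
The algorithm itself appeared as a figure on this slide. A hedged Python sketch of the round structure described above, assuming each oracle object exposes answer_prob, cost, and query (all illustrative names):

    # Sketch of one Scenario 1 round: keep querying the best remaining
    # instance-oracle pair until some label arrives, decaying the
    # utilities of the candidates after every unanswered attempt.
    def scenario1_round(candidates, oracles, budget, utility, decay=0.9):
        u = {x: utility(x) for x in candidates}
        while budget > 0 and candidates:
            x, k = max(((x, k) for x in candidates for k in oracles),
                       key=lambda p: p[1].answer_prob(p[0]) *
                                     u[p[0]] / p[1].cost)
            budget -= k.cost
            label = k.query(x)
            if label is not None:
                return x, label, budget   # round ends once a label arrives
            u = {z: decay * v for z, v in u.items()}   # adaptive penalty
        return None, None, budget         # budget exhausted with no label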

Scenario 2: Fallibility
► 2 oracles:
  • one perfect but expensive oracle
  • one fallible but cheap oracle that always answers
► The algorithm is similar to Scenario 1 with slight modifications
► During exploration:
  • the fallible oracle provides the label together with its confidence
  • confidence = the fallible oracle's posterior on the returned label
  • if the confidence falls below a threshold, we don't use the label, but we still update the oracle's estimated reliability
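
A minimal sketch of that modification, with an illustrative threshold and a simple running-average reliability update (neither is necessarily the paper's exact rule):

    # Sketch: the fallible oracle always returns (label, confidence).
    # Low-confidence labels are discarded for training, but every answer
    # still updates the running estimate of the oracle's reliability.
    def handle_fallible_answer(label, confidence, reliability,
                               threshold=0.5, step=0.1):
        reliability += step * (confidence - reliability)
        if confidence < threshold:
            return None, reliability      # don't train on this label
        return label, reliability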

Outline of Scenario 2

Scenario 3: Non-uniform Cost
► Uniform labeling cost: fraud detection, face recognition, etc.
► Non-uniform labeling cost: text categorization, medical diagnosis, protein structure prediction, etc.
► 2 oracles:
  • fixed-cost oracle
  • variable-cost oracle
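
With a variable-cost oracle the fee depends on the instance, so the Scenario 1 utility plausibly becomes (again a reconstruction, not the slide's exact notation):

    U(x, k) = \frac{P(\mathrm{ans} \mid x, k) \, V(x)}{C_k(x)}

where the fee C_k(x) might grow with, e.g., document length in text categorization.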

Outline of Scenario 3

Evaluation
► Datasets: Face detection, UCI Letter (V-vs-Y), Spambase, and UCI Adult

Oracle Properties and Costs
► The cost is inversely proportional to reliability
► Higher costs for the fallible oracle, since a noisy label should be penalized more than no label at all
► The cost ratio creates an incentive to choose between oracles

Underlying Sampling Strategy
► Conditional-entropy-based sampling, weighted by a density measure
► Captures the information content of a close neighborhood of x
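
One hedged reconstruction of such a density-weighted conditional-entropy score, where N(x) denotes the close neighbors of x and sim is a similarity measure (this exact form is an assumption for illustration):

    V(x) = H(\hat{y} \mid x) \cdot \frac{1}{|N(x)|} \sum_{x' \in N(x)} \mathrm{sim}(x, x')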

Results: Overall and Reluctance on Spambase Data

Results: Reluctance

Results: Cost Varies Non-uniformly
► Statistically significant results (p < 0.01)

More light on the clustering step
► Run each baseline without the clustering step
► The entire budget is spent in rounds for data elicitation
► No separate clustering budget
► Results on Spambase under Scenario 1, cost ratio 1:3

Conclusion
► Addresses issues with the standard assumptions of active learning
► Introduces the proactive learning framework
► Analyzes imperfect oracles with differing properties and costs
► Expected utility maximization across oracle-instance pairs
► More effective than exploiting any single oracle alone