KDD-09, Paris, France
Quantification and Semi-Supervised Classification Methods for Handling Changes in Class Distribution
Jack Chongjie Xue† and Gary M. Weiss
Department of Computer and Information Science, Fordham University, USA
† Also with the Office of Institutional Research, Fordham University
Important Research Problem
- Distributions may change after the model is induced
- Our research problem/scenario:
  - Class distribution changes but the "concept" does not
  - Let x represent an example and y its label. We assume:
    - P(y|x) is constant (i.e., the concept does not change)
    - P(y) changes (which means that P(x) must change)
  - Assume unlabeled data are available from the new class distribution (training and a separate test set)
Research Questions and Goals
- Two research questions:
  - How can we maximize classifier performance when the class distribution changes but is unknown?
  - How can we utilize unlabeled data from the changed class distribution to accomplish this?
- Our goals:
  - Outperform naïve methods that ignore these changes
  - Approach the performance of an "oracle" method that trains on labeled data from the new distribution
When Class Distribution Changes
Technical Approaches
- Quantification [Forman, KDD-06 & DMKD-08]
  - The task of estimating a class distribution (CD); much easier than classification
  - Adjust the model to compensate for the CD change [Elkan 01; Weiss & Provost 03]
  - New examples are not used directly in training
  - We call these class distribution estimation (CDE) methods
- Semi-Supervised Learning (SSL)
  - Exploits unlabeled data, which are used for training
- Other approaches are discussed later
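As a concrete illustration of "adjusting the model to compensate for a CD change," a standard prior-shift correction in the spirit of Elkan (2001) rescales each posterior by the ratio of new to old priors and renormalizes. This is a minimal sketch of that general idea, not necessarily the exact adjustment used in the talk:

```python
def adjust_posterior(p_pos, old_prior, new_prior):
    """Correct a classifier's positive-class posterior P(+|x) for a changed
    class prior: multiply each class posterior by (new prior / old prior)
    and renormalize (prior-shift correction in the style of Elkan 2001)."""
    num = p_pos * new_prior / old_prior
    den = num + (1 - p_pos) * (1 - new_prior) / (1 - old_prior)
    return num / den

# A model trained on a 50/50 distribution outputs P(+|x) = 0.6; if the new
# distribution is only 10% positive, the adjusted posterior is much lower.
adjusted = adjust_posterior(0.6, old_prior=0.5, new_prior=0.1)
```

Note that when the prior is unchanged the adjustment is a no-op, which is why an accurate estimate of the new CD (the quantification step) is the crux of the approach.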
CDE Methods
- CDE-Oracle (upper bound)
  - Determines the new CD by peeking at the class labels, then adjusts the model; the CDE upper bound
- CDE-Iterate-n
  - An iterative algorithm, because changes to the class distribution will be underestimated:
    1. Build model M on the original training data (using the last NEW CD estimate)
    2. Label the new-distribution data to estimate NEW CD
    3. Adjust M using the NEW CD estimate; output M
    4. Increment n; loop to step 1
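The CDE-Iterate loop above can be sketched on a toy problem. This is a simplified, self-contained illustration (a fixed set of posterior scores stands in for the J48 model, and the prior-shift correction stands in for the model adjustment), not the authors' actual implementation:

```python
def cde_iterate(posteriors, train_prior, n_iters=3):
    """Sketch of CDE-Iterate: repeatedly (1) adjust the model for the current
    estimate of the new class distribution, (2) re-label the new-distribution
    data, and (3) re-estimate the positive rate from those labels.

    `posteriors` are the unadjusted P(+|x) scores the original model assigns
    to the unlabeled new-distribution examples (hypothetical toy input)."""
    est_prior = train_prior  # start from the training-set class distribution
    for _ in range(n_iters):
        # keep the estimate strictly inside (0, 1) so the adjustment is defined
        est_prior = min(0.99, max(0.01, est_prior))
        # adjust each posterior for the current prior estimate (prior shift)
        adjusted = [
            p * est_prior / train_prior /
            (p * est_prior / train_prior +
             (1 - p) * (1 - est_prior) / (1 - train_prior))
            for p in posteriors
        ]
        # the predicted positive rate under the adjusted model is the new estimate
        est_prior = sum(p >= 0.5 for p in adjusted) / len(adjusted)
    return est_prior
```

On the first pass the estimate equals the training prior, so the adjustment is a no-op and the estimate is simply the raw predicted positive rate; later passes refine it, which is why iterating helps when the change is underestimated.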
CDE Methods
- CDE-AC
  - Based on Adjusted Count quantification; see [Forman, KDD-06 and DMKD-08] for details
  - Adjusted positive rate: pr* = (pr − fpr) / (tpr − fpr)
    - pr is calculated from the predicted class labels
    - fpr and tpr are obtained via cross-validation on the labeled training set
  - Essentially compensates for the fact that pr will underestimate changes to the class distribution
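The Adjusted Count formula is a one-liner; the only subtlety is that the corrected estimate can fall outside [0, 1] and is conventionally clipped. A minimal sketch:

```python
def adjusted_count(pr, tpr, fpr):
    """Adjusted Count quantification (Forman): correct the raw predicted
    positive rate `pr` using the classifier's tpr/fpr from cross-validation,
    then clip to the valid probability range."""
    pr_star = (pr - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, pr_star))  # raw estimate may fall outside [0, 1]

# Classifier predicts 30% positives; with tpr = 0.8 and fpr = 0.1,
# the corrected positive-rate estimate is (0.3 - 0.1) / (0.8 - 0.1).
estimate = adjusted_count(pr=0.3, tpr=0.8, fpr=0.1)
```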
SSL Methods
- SSL-Naïve
  1. Build a model from the labeled training data
  2. Label the unlabeled data from the new distribution
  3. Build a new model from the predicted labels of the new-distribution data
  - Note: does not directly use the original training data
- SSL-Self-Train
  - Similar to SSL-Naïve, but the original training data are used, merged with the new-distribution examples that have the most confident predictions (above the median)
  - Iterates until all examples are merged or the maximum number of iterations (4) is reached
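The SSL-Self-Train loop can be sketched on a toy 1-D problem. A trivial midpoint-threshold classifier stands in for J48 here; this mirrors the structure of the loop (merge the above-median-confidence predictions each round), not the authors' actual implementation:

```python
def self_train(labeled, unlabeled, max_iters=4):
    """Sketch of SSL-Self-Train: `labeled` is a list of (x, y) pairs and
    `unlabeled` is a list of x values from the new distribution. Each round,
    a midpoint-threshold "model" is trained on the merged data, and the half
    of the pool with the most confident predictions (largest distance from
    the decision threshold) is labeled and merged into the training set."""
    data = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iters):
        if not pool:
            break  # all examples merged
        # "train": threshold at the midpoint between the two class means
        pos = [x for x, y in data if y == 1]
        neg = [x for x, y in data if y == 0]
        thresh = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        # confidence = distance from the decision threshold
        scored = sorted(pool, key=lambda x: abs(x - thresh), reverse=True)
        cutoff = max(1, len(scored) // 2)  # keep the above-median half
        confident, pool = scored[:cutoff], scored[cutoff:]
        data += [(x, 1 if x > thresh else 0) for x in confident]
    return data, pool
```

Merging only the most confident predictions each round is what distinguishes self-training from SSL-Naïve, which trusts every predicted label at once.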
Hybrid Method
- A combination of SSL-Self-Train and CDE-Iterate
- Can be viewed as SSL-Self-Train where, at each iteration, the model is adjusted to compensate for the difference between the CD of the merged training data and the CD of the new data the model is applied to
Experimental Methodology
- Use 5 relatively large UCI data sets
- Partition the data to form "original" and "new" distributions
  - Original distribution made to be 50% positive
  - New distribution varied from 1% to 99% positive
  - Results averaged over 10 random runs
- Use WEKA's J48 (a C4.5-style decision tree learner) for the experiments
- Track accuracy and F-measure
  - F-measure places more emphasis on the minority class
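For reference, the F-measure tracked here is the standard F1 score, the harmonic mean of precision and recall, which is why it emphasizes minority-class performance where accuracy does not:

```python
def f_measure(tp, fp, fn):
    """Standard F1 score from positive-class counts. Unlike accuracy, it is
    driven entirely by performance on the (typically minority) positive
    class, since true negatives do not appear in the formula."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 8 true positives, 2 false positives, 2 false negatives:
# precision = recall = 0.8, so F1 = 0.8
score = f_measure(tp=8, fp=2, fn=2)
```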
Results: Accuracy (Adult Data Set)
Results: Accuracy (SSL-Naïve)
Results: Accuracy (SSL-Self-Train)
Results: Accuracy (CDE-Iterate-1)
Results: Accuracy (CDE-Iterate-2)
Results: Accuracy (Hybrid)
Results: Accuracy (CDE-AC)
Results: Average Accuracy (over 99 positive rates)
Results: F-Measure (Adult Data Set)
Results: F-Measure (over 99 positive rates)
Why Do Oracle Methods Perform Poorly?
- Oracle method:
  - The oracle trains only on the new distribution
  - The new distribution is often very unbalanced
  - F-measure should do best with balanced data; Weiss and Provost (2003) show that a balanced distribution is best for AUC
- CDE-Oracle method:
  - CDE-Iterate underestimates the change in class distribution
  - That underestimation may actually help F-measure, since it better balances the importance of the minority class
Conclusion
- Performance can be substantially improved by not ignoring changes to the class distribution
  - Unlabeled data from the new distribution can be exploited, even if only to estimate the new CD
  - Quantification methods can be very helpful, and much better than semi-supervised learning alone
Future Work
- The problem is reduced with well-calibrated probability models (Zadrozny & Elkan '01)
  - Decision trees do not produce these
  - Evaluate methods that produce good probability estimates
- In our problem setting, p(x) changes
  - Try methods that measure this change and compensate for it (e.g., by weighting the x's)
- Experiment with an initial distribution that is not 1:1
  - Especially highly skewed distributions (e.g., diseases)
- Other issues: data streams / real-time updates
References
[Forman 06] G. Forman, Quantifying trends accurately despite classifier error and class imbalance, KDD-06.
[Forman 08] G. Forman, Quantifying counts and costs via classification, Data Mining and Knowledge Discovery, 17(2).
[Weiss & Provost 03] G. Weiss & F. Provost, Learning when training data are costly: the effect of class distribution on tree induction, Journal of Artificial Intelligence Research, 19.
[Zadrozny & Elkan 01] B. Zadrozny & C. Elkan, Obtaining calibrated probability estimates from decision trees and naïve Bayesian classifiers, ICML-01.