IDSL, Intelligent Database System Lab

Learning and Making Decisions When Costs and Probabilities are Both Unknown
Authors: Bianca Zadrozny, Charles Elkan
Advisor: Dr. Hsu
Graduate: Yu-Wei Su
2019/2/25

Outline
Motivation
Objective
Introduction
MetaCost vs. direct cost-sensitive decision-making
A testbed: the KDD’98 charitable donations dataset
Probability estimation methods
Estimating donation amounts
Experimental results
Conclusion
Opinion

Motivation
Misclassification costs differ from one example to another, just as class probabilities do.
Real-world datasets suffer from class imbalance.

Objective
To make optimal decisions given estimated costs and probabilities.
To solve the problem of sample selection bias with a method based on the work of Nobel Prize-winning economist James Heckman.

Introduction
Most supervised learning algorithms assume that all errors (incorrect predictions) are equally costly, which is not true in practice.
Cost-sensitive learning aims for the lowest expected cost, whereas non-cost-sensitive learning aims for the highest accuracy.
This paper presents an alternative method, called direct cost-sensitive decision-making.

MetaCost vs. direct cost-sensitive decision-making
Each example x is associated with a cost C(i,j,x) of predicting class i for x when the true class of x is j.
The optimal decision for x is the class i that leads to the lowest expected cost, $\sum_j P(j \mid x)\, C(i,j,x)$.
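The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function names and cost values are hypothetical. The caller is assumed to pass the cost matrix C(i, j, x) already evaluated for the example x at hand, with cost[i][j] being the cost of predicting i when the true class is j.

```python
from typing import Sequence

def expected_cost(i: int, probs: Sequence[float], cost: Sequence[Sequence[float]]) -> float:
    """Expected cost of predicting class i: sum over j of P(j|x) * C(i, j, x)."""
    return sum(p_j * cost[i][j] for j, p_j in enumerate(probs))

def optimal_decision(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> int:
    """Return the class i whose expected cost is lowest for this example."""
    return min(range(len(cost)), key=lambda i: expected_cost(i, probs, cost))

# Hypothetical example x with P(j=0|x) = 0.9 and P(j=1|x) = 0.1.
probs = [0.9, 0.1]
cost = [[0.0, 10.0],  # predict 0: free if correct, 10 if the true class is 1
        [1.0, 0.0]]   # predict 1: costs 1 if the true class is 0
print(optimal_decision(probs, cost))  # 1
```

The rarer class 1 is still the better prediction here. Predicting it has an expected cost of 0.9, while predicting class 0 has an expected cost of 1.0. A purely accuracy-driven classifier would choose class 0.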

MetaCost vs. direct cost-sensitive decision-making (cont.)
Direct cost-sensitive decision-making shares the same central idea as MetaCost, with two differences:
MetaCost assumes that costs are known in advance and are the same for all examples.
Direct cost-sensitive decision-making does not estimate probabilities with bagging; it uses simpler methods based on a single decision tree.

A testbed: the KDD’98 charitable donations dataset
The training set consists of 95,412 records with known classes; the test set consists of 96,367 records without known classes.
The overall percentage of donors in the population is about 5%.
The donation amount for persons who respond varies from $1 to $200.

A testbed: the KDD’98 charitable donations dataset (cont.)
In the donation domain it is easier to talk consistently about benefit than about cost.
The optimal predicted label for example x is the class i that maximizes $\sum_j P(j \mid x)\, B(i,j,x)$, where B(i,j,x) is the benefit of predicting class i when the true class is j (j = 1 means the person donates; j = 0 means the person does not).

A testbed: the KDD’98 charitable donations dataset (cont.)
The optimal policy
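The benefit formula and the optimal policy on these slides did not survive extraction. The following is a sketch only. It assumes the KDD’98 contest setting in which each solicitation costs $0.68; this figure and the benefit matrix are assumptions and are not taken from the slides. Under these assumptions, soliciting a donor yields y(x) - 0.68, soliciting a non-donor yields -0.68, and not soliciting yields 0, where y(x) is the estimated donation amount. The expected benefit of soliciting is then P(j=1|x)·y(x) - 0.68. One therefore solicits exactly when P(j=1|x)·y(x) > 0.68.

```python
# Assumed KDD'98 mailing cost per solicitation, in dollars (not stated on the slides).
MAILING_COST = 0.68

def expected_benefit(solicit: bool, p_donate: float, amount: float,
                     cost: float = MAILING_COST) -> float:
    """Expected benefit sum_j P(j|x) * B(i, j, x) under the assumed benefit matrix."""
    if not solicit:
        return 0.0  # assumed B(not solicit, j, x) = 0 for both j
    # Assumed B(solicit, 1, x) = amount - cost and B(solicit, 0, x) = -cost.
    return p_donate * (amount - cost) + (1.0 - p_donate) * (-cost)

def should_solicit(p_donate: float, amount: float, cost: float = MAILING_COST) -> bool:
    """Choose the action with the larger expected benefit (same as p_donate * amount > cost)."""
    return expected_benefit(True, p_donate, amount, cost) > expected_benefit(False, p_donate, amount, cost)

# Hypothetical people, each with an estimated donation amount of $15.
print(should_solicit(0.05, 15.0))  # True:  0.05 * 15 = 0.75 > 0.68
print(should_solicit(0.02, 15.0))  # False: 0.02 * 15 = 0.30 < 0.68
```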

Probability estimation methods
Deficiencies of decision tree methods
Smoothing
Curtailment
Calibrating naive Bayes classifier scores
Averaging probability estimates

Deficiencies of decision tree methods
By default, standard decision tree methods assign each example the raw training frequency p = k/n of the leaf it reaches, where k of the n training examples at that leaf are positive.
These are not accurate conditional probability estimates, for at least two reasons: high bias and high variance.
Pruning can alleviate these problems, but it is not suitable for unbalanced datasets.

Deficiencies of decision tree methods (cont.)
The solution is to use C4.5 without pruning and without collapsing, which gives raw scores that can then be transformed into accurate class membership probabilities.

Smoothing
Use the Laplace correction. For a two-class problem, it replaces the conditional probability estimate p = k/n by p' = (k+1)/(n+2), which moves probability estimates closer to 1/2.
For the donation data, p = k/n is instead replaced by p' = (k + bm)/(n + m), where b is the base rate of the positive class and m is a parameter that controls how strongly estimates are shifted toward b.

Smoothing (cont.)
For example, suppose a leaf contains four examples, one of which is positive. The raw C4.5 score of this leaf is 1/4 = 0.25. The smoothed score with m = 200 and b = 0.05 is p' = (1 + 0.05 × 200)/(4 + 200) = 11/204 ≈ 0.054.

Smoothing (cont.)
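Both smoothing formulas fit in two one-line functions. The function names below are illustrative. The usage lines reproduce the worked example of a leaf with four examples, one of them positive.

```python
def laplace(k: int, n: int) -> float:
    """Laplace correction for two classes: (k + 1) / (n + 2), pulls estimates toward 1/2."""
    return (k + 1) / (n + 2)

def m_estimate(k: int, n: int, b: float, m: float) -> float:
    """Smooth k/n toward the base rate b; larger m means stronger smoothing."""
    return (k + b * m) / (n + m)

print(1 / 4)                                      # raw C4.5 score: 0.25
print(laplace(1, 4))                              # 2/6, about 0.333
print(round(m_estimate(1, 4, b=0.05, m=200), 4))  # 11/204 -> 0.0539
```

The two corrections move the same raw score in opposite directions. Laplace pulls 0.25 up toward 1/2. The m-estimate pulls it down toward the 5% base rate, which better matches an unbalanced dataset like KDD’98.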

Curtailment
Curtailment is used to overcome the problem of overfitting.
Curtailment is not equivalent to any type of pruning.

Curtailment (cont.)
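The curtailment slides that would explain the procedure did not survive extraction. The following is a minimal sketch of one reading of the idea, and the exact rule is an assumption. When scoring an example, descend the unpruned tree only while the next node still holds at least v training examples. Then use the raw frequency k/n of the last node reached. The `Node` class, its fields, and the threshold `v` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    n: int                          # training examples reaching this node
    k: int                          # positive training examples among them
    feature: Optional[int] = None   # split feature index (None for a leaf)
    threshold: float = 0.0          # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None or self.right is None

def curtailed_score(root: Node, x: Sequence[float], v: int) -> float:
    """Follow x down the tree, stopping before any node with fewer than v training examples."""
    node = root
    while not node.is_leaf():
        child = node.left if x[node.feature] <= node.threshold else node.right
        if child.n < v:  # child too small for a reliable k/n estimate
            break
        node = child
    return node.k / node.n

# Hypothetical tree: the right leaf holds only 10 training examples.
tree = Node(n=1000, k=50, feature=0, threshold=10.0,
            left=Node(n=990, k=45),
            right=Node(n=10, k=5))
print(curtailed_score(tree, [20.0], v=100))  # stops at the root: 50/1000 = 0.05
print(curtailed_score(tree, [20.0], v=5))    # reaches the right leaf: 5/10 = 0.5
```

The tree itself is left unchanged in this sketch. Only the depth at which scoring stops varies from one example to another.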

Calibrating naive Bayes classifier scores
Use a histogram method (binning) to obtain calibrated probability estimates from a naive Bayesian classifier.
Sort the training examples according to their scores and divide the sorted set into b equal-size bins.
Given a test example x, place it in a bin according to its score n(x), then estimate the corrected probability as the fraction of positive training examples in that bin.
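The binning procedure above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation. The function names and the scores and labels are hypothetical, and the last bin is assumed to absorb any remainder when the number of examples is not divisible by b. Here `b` is the number of bins, as on this slide, not the base rate used in smoothing.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

def fit_bins(scores: Sequence[float], labels: Sequence[int], b: int) -> Tuple[List[float], List[float]]:
    """Sort training examples by score and split them into b equal-size bins.

    Returns the upper score boundary of every bin except the last, and the
    fraction of positive examples in each bin. Assumes len(scores) >= b.
    """
    pairs = sorted(zip(scores, labels))
    size = len(pairs) // b
    bins = [pairs[i * size:(i + 1) * size] for i in range(b - 1)] + [pairs[(b - 1) * size:]]
    boundaries = [bin_[-1][0] for bin_ in bins[:-1]]
    rates = [sum(label for _, label in bin_) / len(bin_) for bin_ in bins]
    return boundaries, rates

def calibrated(score: float, boundaries: List[float], rates: List[float]) -> float:
    """Place a test score n(x) in its bin and return that bin's positive fraction."""
    return rates[bisect_left(boundaries, score)]

# Hypothetical naive Bayes scores on ten training examples.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
boundaries, rates = fit_bins(scores, labels, b=2)
print(boundaries, rates)                      # [0.5] [0.2, 0.8]
print(calibrated(0.35, boundaries, rates))    # 0.2
```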

Averaging probability estimates
Combining the probability estimates given by different classifiers through averaging can reduce the variance of the probability estimates [Tumer and Ghosh, 1995]:
$\sigma^2_{\text{avg}} = \frac{\sigma^2}{N}\left(1 + \delta (N-1)\right)$
where $\sigma^2$ is the variance of each original classifier, N is the number of classifiers, and $\delta$ is the correlation factor among all classifiers.
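The following sketch averages per-example estimates from several classifiers and evaluates the variance formula above for a few cases. The function names and the sample values are hypothetical.

```python
from typing import List, Sequence

def average_estimates(estimates: Sequence[Sequence[float]]) -> List[float]:
    """Average P(j=1|x) estimates from several classifiers, example by example."""
    n_models = len(estimates)
    return [sum(column) / n_models for column in zip(*estimates)]

def averaged_variance(sigma2: float, n: int, delta: float) -> float:
    """Variance of the average of n estimates with variance sigma2 and correlation delta."""
    return sigma2 / n * (1 + delta * (n - 1))

# Hypothetical estimates for two test examples from two classifiers.
tree_probs = [0.10, 0.40]
nb_probs = [0.20, 0.30]
print(average_estimates([tree_probs, nb_probs]))  # about [0.15, 0.35]

print(averaged_variance(0.04, 2, 0.0))  # uncorrelated: variance halves
print(averaged_variance(0.04, 2, 1.0))  # perfectly correlated: no reduction
```

The formula shows why averaging helps most when the classifiers disagree. With $\delta = 0$ the variance shrinks by a factor of N. With $\delta = 1$ averaging gains nothing.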

Estimating donation amounts
It is wrong to impute a donation amount of zero for non-donors in the training set, by analogy with the donation probability, just because their actual donation amount is zero. The amount to estimate is what a person would give if they donated.
It is also wrong to use the same donation estimate for all test examples, because then the decision whether to solicit would be based on the probability of donation alone.

Estimating donation amounts (cont.)
These costs or benefits must be estimated for each example.
Donation amounts are estimated with least-squares multiple linear regression (MLR) on two attributes:
Lastgift: dollar amount of the most recent gift
Ampergift: average gift amount in responses to the last 22 promotions
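A least-squares fit on these two attributes can be written in pure Python with the normal equations. This is a hedged sketch, not the paper's code. The helper names and the donor records are hypothetical. The data were generated from known coefficients so the fit can be checked. A production fit would normally use a numerical library instead.

```python
from typing import List, Sequence

def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small square system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares via the normal equations (X^T X) beta = X^T y, with an intercept column."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)

# Hypothetical donors: (lastgift, ampergift) and the amount each gave,
# generated from amount = 2 + 0.5 * lastgift + 0.3 * ampergift.
rows = [(10, 10), (20, 10), (10, 20), (30, 25), (5, 8)]
amounts = [10.0, 15.0, 13.0, 24.5, 6.9]
beta = fit_ols(rows, amounts)
print([round(v, 3) for v in beta])  # approximately [2.0, 0.5, 0.3]
```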

Estimating donation amounts (cont.)
The problem of sample selection bias: donation amounts estimated by the regression equation tend to be too low for test examples that have a low probability of donation.

Estimating donation amounts (cont.)
Heckman correction:
Step 1: learn a probit linear model to estimate the conditional probabilities P(j=1|x).
Step 2: estimate y(x) by linear regression using only the training examples x for which j(x) = 1, but including the value of P(j=1|x) as an additional input.
In this paper, the value of P(j=1|x) used in the second step is obtained from a decision tree or a naive Bayes classifier instead of a probit model.
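The following sketches the two-step idea as this slide describes it. It assumes the P(j=1|x) values already come from a classifier, such as a smoothed decision tree or a calibrated naive Bayes model. The record layout `(lastgift, p_donate, donated, amount)`, the single `lastgift` regressor, the helper names, and the data are all hypothetical. The sketch shows two things. Non-donors are excluded from the regression. P(j=1|x) enters as an extra input, so the fitted amount can depend on how likely the person is to donate.

```python
from typing import List, Sequence, Tuple

def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)

def fit_donation_model(examples: Sequence[Tuple[float, float, int, float]]) -> List[float]:
    """Step 2: regress amount on lastgift and P(j=1|x), using donors (j = 1) only."""
    donors = [(g, p, y) for g, p, j, y in examples if j == 1]
    return fit_ols([(g, p) for g, p, _ in donors], [y for _, _, y in donors])

def predict_amount(beta: List[float], lastgift: float, p_donate: float) -> float:
    return beta[0] + beta[1] * lastgift + beta[2] * p_donate

# Hypothetical training records; donor amounts follow 1 + 0.6 * lastgift + 10 * p.
examples = [(10, 0.10, 1, 8.0), (20, 0.05, 1, 13.5), (15, 0.20, 1, 12.0),
            (30, 0.10, 1, 20.0), (5, 0.02, 0, 0.0), (8, 0.03, 0, 0.0)]
beta = fit_donation_model(examples)
print([round(v, 3) for v in beta])  # approximately [1.0, 0.6, 10.0]
```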

Experimental results

Conclusion
The proposed cost-sensitive learning method performs systematically better than MetaCost in the experiments.
It provides a solution to the fundamental problem of costs being different for different examples.
It identifies and solves the problem of sample selection bias.

Opinion
Frequency is not the only metric.
Question: the positive and negative classes are not just 1 and 0.