1
Learning and Making Decisions When Costs and Probabilities are Both Unknown by B. Zadrozny and C. Elkan
2
Contents
Introduction to the problem
Previous work
Direct cost-sensitive decision making
The dataset
Estimating class membership probabilities
Estimating costs
Results and conclusions
3
Introduction
Costs/benefits are the values assigned to classification decisions.
Costs are often different for different examples.
In cost-sensitive learning we are often interested in the rare class, hence the problem of unbalanced data.
4
Cost-Sensitive Decisions
Each training and test example $x$ has an associated cost $C(i, j, x)$: the cost of predicting class $i$ when the true class is $j$.
The general optimal prediction is the class that minimizes expected cost:
$\arg\min_i \sum_j P(j \mid x)\, C(i, j, x)$
Methods differ with respect to how $P(j \mid x)$ and $C(i, j, x)$ are estimated.
Previous literature has assumed costs are known in advance and independent of examples.
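A quick illustrative sketch (not code from the paper) of this argmin rule for a single example; the probability vector, cost matrix, and function name are all hypothetical:

```python
import numpy as np

def optimal_prediction(p, C):
    """Return the class i minimizing the expected cost sum_j p[j] * C[i, j].

    p : shape (n_classes,), estimated P(j|x) for one example x
    C : shape (n_classes, n_classes), C[i, j] = cost of predicting i
        when the true class is j (may depend on x)
    """
    expected_costs = C @ p            # expected cost of each prediction i
    return int(np.argmin(expected_costs))

# Misclassifying the rare class 1 is expensive, so we predict it
# even though P(j=1|x) is small.
p = np.array([0.9, 0.1])              # P(j=0|x), P(j=1|x)
C = np.array([[0.0, 20.0],            # costs when predicting class 0
              [1.0,  0.0]])           # costs when predicting class 1
print(optimal_prediction(p, C))       # -> 1
```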
5
MetaCost
$P(j \mid x)$ is estimated on the training set only.
Costs are example independent.
Training relabels each example with its optimal class.
MetaCost then learns a classifier to predict this relabeling for test examples.
6
Direct Cost-Sensitive Decision Making (DCSDM)
Estimation of $P(j \mid x)$: the average of naïve Bayes and decision tree estimates, computed on both the training and test sets.
Estimation of costs: multiple linear regression, with an unbiased estimate obtained via the Heckman procedure; costs are example dependent.
Evaluation: against MetaCost and the KDD competition results, on a large and difficult dataset (KDD '98).
7
MetaCost Implementation
For evaluation, MetaCost is adapted:
Class probability estimates are found by simple methods using decision trees.
Costs are made example dependent during training.
Adapted MetaCost vs. DCSDM: DCSDM uses two models on each test example, MetaCost uses one, and DCSDM performs estimation on both training and test examples.
8
The Data Mining Task
Data on persons who have donated in the past to a certain charity (KDD '98 competition).
Donor/non-donor labels are based on the last campaign.
The task is to choose which people to ask for new donations.
Training set: 95,412 records, labeled donor or non-donor, with donation amounts.
Test set: 96,367 unlabeled records from the same donation campaign.
9
The Data Mining Task (cont.)
Cost of soliciting: $0.68 per person.
Donations range from $1 to $200; 5% of people are donors and 95% are non-donors.
The very low response rate and the varying donation amounts make it hard to beat the baseline of soliciting everyone.
The dataset is hard: it has already been filtered down to a reasonable set of prospects, and the task is to improve upon the unknown method that produced it.
10
Applying DCSDM to KDD '98
For this task we replace the costs $C(i, j, x)$ with benefits $B(i, j, x)$.
$B(1, 1, x)$ is example dependent; previous literature replaced it with a constant.

                     Actual non-donor    Actual donor
Predict non-donor    0                   0
Predict donor        -0.68               y(x) - 0.68
11
Optimal Policy
The expected benefit of not soliciting ($i = 0$):
$E[B \mid i = 0, x] = 0$
The expected benefit of soliciting ($i = 1$):
$E[B \mid i = 1, x] = P(j{=}1 \mid x)\,(y(x) - 0.68) + (1 - P(j{=}1 \mid x))\,(-0.68) = P(j{=}1 \mid x)\, y(x) - 0.68$
Optimal policy: solicit exactly when $P(j{=}1 \mid x)\, y(x) > 0.68$.
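A minimal sketch of this decision rule (function name and numbers are illustrative, not from the paper):

```python
def solicit(p_donate, donation_estimate, cost=0.68):
    """Solicit (i = 1) exactly when the expected benefit
    p(j=1|x) * y(x) - cost exceeds the zero benefit of not soliciting."""
    return p_donate * donation_estimate > cost

# A 2% donation probability suffices if the expected gift is large enough.
print(solicit(0.02, 40.0))   # True:  0.02 * 40 = 0.80 > 0.68
print(solicit(0.02, 30.0))   # False: 0.02 * 30 = 0.60 < 0.68
```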
12
Optimal decisions require accurate estimates of $P(j{=}1 \mid x)$.
Class sizes may be highly unbalanced.
Proposed methods:
Decision trees: smoothing and curtailment.
Naïve Bayes: binning.
13
Problems with Decision Trees
Decision trees assign as the score of each leaf the raw training frequency $k/n$ at that leaf.
High bias: tree-growing methods try to make leaves homogeneous, so the estimates tend to be over- or under-estimates.
High variance: when $n$ is small, the estimate is not to be trusted.
What about classical pruning methods?
14
Smoothing
Pruning is no good for our purpose.
To make the estimates less extreme, replace the raw score with
$p' = \dfrac{k + b \cdot m}{n + m}$
where $b$ is the base rate and $m$ is a heuristic value (the smoothing strength).
Effect: where $k$ and $n$ are small, $p'$ is essentially just the base rate; when $k$ and $n$ are larger, $p'$ is a combination of the base rate and the original score.
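A one-line implementation of this smoothing rule; the default m = 200 is an assumption here (with a base rate b of about 0.05 it amounts to adding roughly 10 virtual positive examples to every leaf):

```python
def smoothed_score(k, n, b, m=200):
    """m-estimate smoothing: pull the raw leaf frequency k/n toward the
    base rate b, with m controlling the smoothing strength.
    m = 200 is an assumed default, not a value taken from the paper."""
    return (k + b * m) / (n + m)

b = 0.05                                    # ~5% donors in the training set
print(smoothed_score(k=2, n=10, b=b))       # ~0.057: small leaf, pulled to b
print(smoothed_score(k=200, n=1000, b=b))   # 0.175: large leaf, near k/n = 0.2
```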
15
Smoothed Scores
16
Curtailment
What if a leaf has enough training examples to be statistically reliable? Then smoothing is unnecessary.
Curtailment searches through the tree and removes nodes where $n < v$, so each example is scored by its deepest ancestor with at least $v$ training examples.
$v$ is chosen either through cross-validation or by a heuristic such as $b \cdot v = 10$.
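A sketch of curtailed scoring under an assumed tree representation (the Node structure and its field names are hypothetical, not from the paper):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    k: int                              # positive training examples here
    n: int                              # total training examples here
    feature: Optional[int] = None       # split feature (None at a leaf)
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def curtailed_score(node, x, v):
    """Descend the tree, but stop (curtail) as soon as the child that x
    would reach has fewer than v training examples; score with the last
    statistically reliable node."""
    while node.left is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        if child.n < v:
            break
        node = child
    return node.k / node.n
```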
17
Curtailed Tree
18
Curtailed Scores
19
Naïve Bayes Classifiers
Assume that within any class the attribute values are independent.
This assumption gives inaccurate probability estimates.
Attributes tend to be positively correlated, so naïve Bayes estimates tend to be too extreme, i.e. close to zero or one.
Nevertheless, the scores do rank examples well.
20
Calibrating Naïve Bayes Scores
The histogram method:
Sort the training examples by naïve Bayes score.
Divide the sorted set into $b$ subsets of equal size, called bins.
For each bin, compute the lower and upper boundary scores.
Given a new example $x$: compute its naïve Bayes score, find the associated bin, and let $\hat{p}(x)$ be the fraction of positive training examples in that bin.
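A sketch of the histogram method, assuming NumPy arrays of scores and 0/1 labels (function and variable names are illustrative):

```python
import numpy as np

def fit_bins(scores, labels, n_bins=10):
    """Sort training examples by score, split them into n_bins equal-size
    bins, and record each bin's upper score boundary and positive fraction."""
    order = np.argsort(scores)
    score_bins = np.array_split(scores[order], n_bins)
    label_bins = np.array_split(labels[order], n_bins)
    upper = np.array([sb[-1] for sb in score_bins])
    frac_pos = np.array([lb.mean() for lb in label_bins])
    return upper, frac_pos

def calibrated_probability(score, upper, frac_pos):
    """Map a new naive Bayes score to the positive fraction of its bin."""
    i = np.searchsorted(upper, score)
    return frac_pos[min(i, len(frac_pos) - 1)]
```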
21
Averaging Probability Estimates
If probability estimates are partially uncorrelated, averaging them reduces their variance.
Assuming all $N$ estimates have the same variance $\sigma^2$ and pairwise correlation $\rho$, the averaged estimate has variance
$\mathrm{Var}(\bar{p}) = \rho\,\sigma^2 + \dfrac{1 - \rho}{N}\,\sigma^2$
where $\sigma^2$ is the individual classifier variance, $N$ the number of classifiers, and $\rho$ the correlation factor among the classifiers.
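A worked instance with illustrative numbers (not from the paper): for $\sigma^2 = 0.01$, $N = 2$ classifiers, and $\rho = 0.5$,

```latex
\mathrm{Var}(\bar{p}) = \rho\,\sigma^2 + \frac{1-\rho}{N}\,\sigma^2
                      = 0.5 \cdot 0.01 + \frac{0.5}{2} \cdot 0.01 = 0.0075,
```

a 25% reduction in variance relative to a single estimate.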
22
Estimating Donation Amount
Solicit a person according to the policy $\hat{p}(x)\,\hat{y}(x) > 0.68$, where $\hat{y}(x)$ is the estimated donation amount.
23
Cost and Probability
Estimating costs well is more important than estimating probabilities. Why? The relative variation of costs across different examples is much greater than the relative variation of probabilities.
Estimating the donation probability is difficult; estimating donation amounts is easier, because past amounts are excellent predictors of future amounts.
24
Training and Test Data
Two random processes: whether a person donates, and how much they donate.
The method used to estimate the donation amount is multiple linear regression (MLR).
Donation amount:
                 Donor      Non-donor
Training data    known      –
Test data        unknown    unknown
25
Multiple Linear Regression
Two attributes are used:
lastgift: dollar amount of the most recent gift.
ampergift: average gift amount in response to the last 22 promotions.
A linear regression equation is used to estimate the donation amount.
Only 46 of the 4,843 recorded donations exceed $50, so donors who donated at most $50 are used as input for the regression.
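A sketch of this regression, assuming NumPy arrays for the two attributes and the donation amounts (the paper gives no code; function names are illustrative):

```python
import numpy as np

def fit_donation_model(lastgift, ampergift, amount):
    """Ordinary least squares of donation amount on lastgift and ampergift,
    trained only on donors who gave at most $50 (excluding the 46 outliers)."""
    keep = amount <= 50.0
    X = np.column_stack([np.ones(keep.sum()),        # intercept
                         lastgift[keep], ampergift[keep]])
    coef, *_ = np.linalg.lstsq(X, amount[keep], rcond=None)
    return coef

def predict_donation(coef, lastgift, ampergift):
    return coef[0] + coef[1] * lastgift + coef[2] * ampergift
```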
26
The Problem of Sample Selection Bias
Reasoning outside your learning space: any donation-amount estimator is learned from the people who actually donated, yet it is applied to a different population consisting of both donors and non-donors.
27
Donation Amount and Probability Estimates are Negatively Correlated
28
Solution to Sample Selection Bias
Heckman's procedure:
Estimate the conditional probability $p(j{=}1 \mid x)$ using a linear probit model.
Estimate $y(x)$ on the training examples with $j = 1$, including for each $x$ a transformation of the estimated conditional probability as an extra regressor.
The authors' own procedure: the conditional probability is learned with a decision tree or naïve Bayes classifier, and these probability estimates are added as an additional attribute when estimating $y(x)$.
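A minimal two-step Heckman sketch, assuming statsmodels and SciPy are available (variable and function names are illustrative, not the authors' code):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def heckman_two_step(X, donated, amount):
    """Step 1: probit model of who donates, fit on ALL training examples.
    Step 2: OLS of donation amount on donors only, with the inverse Mills
    ratio from step 1 added as an extra regressor to correct the bias."""
    X1 = sm.add_constant(X)
    probit = sm.Probit(donated, X1).fit(disp=0)
    xb = X1 @ probit.params                  # linear index x'beta
    mills = norm.pdf(xb) / norm.cdf(xb)      # inverse Mills ratio
    donors = donated == 1
    X2 = sm.add_constant(np.column_stack([X[donors], mills[donors]]))
    ols = sm.OLS(amount[donors], X2).fit()
    return probit, ols
```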
29
Experimental Results
[Table comparing test-set profits of direct cost-sensitive decision making and MetaCost across probability estimation methods.]
30
Experimental Results: Interpretation
With Heckman's procedure, profit on the test set increases by $484 under every probability estimation method. This systematic improvement indicates that Heckman's procedure solves the sample selection bias problem.
MetaCost: the best MetaCost result is $14,113; the best direct cost-sensitive result is $15,329. On average, MetaCost's test-set profit is $1,751 lower than that of direct cost-sensitive decision making.
31
Statistical Significance of the Results
The fixed test set contains 4,872 donors, with an average donation of $15.62.
A different test set drawn randomly from the same distribution would see the number of donors fluctuate with standard deviation $\sqrt{4872} \approx 70$, causing the profit to fluctuate by about $\sqrt{4872} \times \$15.62 \approx \$1090$.
A profit difference between two methods of less than $1,090 is therefore not significant.
32
Conclusions
Direct cost-sensitive decision making outperforms MetaCost.
It provides a solution to the fundamental problem of costs being example dependent.
It identifies and solves the sample selection bias problem for the KDD '98 dataset.
33
Questions?