1 Estimating Accuracy

Holdout method
– Randomly partition the data into a training set and a test set
– accuracy = |correctly classified test points| / |test data points|

Stratification
– Ensure each class has approximately equal proportions in both partitions

Random subsampling
– Repeat the holdout method k times and output the average accuracy

k-fold cross-validation
– Randomly partition the data into S_1, S_2, …, S_k
– First, keep S_1 as the test set and the rest as the training set
– Next, keep S_2 as the test set and the rest as the training set, and so on
– accuracy = |total correctly classified points| / |total data points|

Recommendation
– Stratified 10-fold cross-validation. If possible, repeat 10 times and average the results (this reduces variance). A sketch is given below.
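The recommended procedure can be carried out with a few lines of scikit-learn. This is a minimal sketch, not part of the original slides; the breast-cancer dataset and the decision-tree learner are placeholder choices for illustration.

# Repeated, stratified 10-fold cross-validation (assumes scikit-learn is installed).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# 10 folds repeated 10 times; stratification keeps the class proportions
# roughly equal in every training/test partition.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print("mean accuracy over %d runs = %.3f (std %.3f)"
      % (len(scores), scores.mean(), scores.std()))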

2 Is Accuracy Enough?

If only 1% of the population has cancer, then a test for cancer that classifies all people as non-cancer will have 99% accuracy. Instead, output a confusion matrix:

Actual \ Estimated   Class 1   Class 2   Class 3
Class 1              90%       5%        5%
Class 2              2%        91%       7%
Class 3              8%        3%        89%
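A tiny illustration (not from the slides) of the point above: on a 1%-prevalence dataset, a classifier that always predicts the majority class reaches 99% accuracy, yet its confusion matrix shows it misses every positive case.

from collections import Counter

y_true = ["cancer"] * 1 + ["no-cancer"] * 99   # 1% prevalence
y_pred = ["no-cancer"] * 100                   # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy =", accuracy)                  # 0.99

# Confusion matrix cells: (actual, predicted) -> count.
for (actual, predicted), count in sorted(Counter(zip(y_true, y_pred)).items()):
    print(f"actual={actual:9s} predicted={predicted:9s} count={count}")
# The (cancer, no-cancer) cell shows the one cancer case is missed.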

3 Receiver Operating Characteristic

An ROC curve plots the true positive rate against the false positive rate as the classifier's decision threshold varies. It is useful for visually comparing classifiers: the top-left corner is best, the bottom-right corner is worst, and the area under the curve is a measure of accuracy.
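A hedged sketch of computing and plotting an ROC curve with scikit-learn and matplotlib; the dataset and the logistic-regression scorer are placeholder assumptions, not part of the slides.

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Scores (class-1 probabilities) are needed to sweep the decision threshold.
scores = LogisticRegression(max_iter=5000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, _ = roc_curve(y_te, scores)
print("area under ROC curve =", roc_auc_score(y_te, scores))

plt.plot(fpr, tpr)               # closer to the top-left corner is better
plt.plot([0, 1], [0, 1], "--")   # diagonal = random guessing
plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.show()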

4 Combining Classifiers

Draw k random samples with replacement from the training data (as in random subsampling) and train a classifier on each sample, giving k classifiers.
– Bagging: take a majority vote among the k classifiers for the best class for each new record (see the sketch below).
– Boosting: each classifier's vote has a weight proportional to its accuracy on the training data.
Analogy: a patient taking multiple opinions from several doctors.
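A minimal sketch of bagging as described above: draw k bootstrap samples, fit one classifier per sample, and combine them by majority vote. The dataset and the decision-tree base learner are illustrative assumptions, not from the slides.

import numpy as np
from collections import Counter
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
k = 25
classifiers = []
for _ in range(k):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))   # sample with replacement
    classifiers.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

# Majority vote over the k classifiers for each test record.
votes = np.array([c.predict(X_te) for c in classifiers])
majority = np.array([Counter(column).most_common(1)[0][0] for column in votes.T])
print("bagged accuracy =", (majority == y_te).mean())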

5 Performance Evaluation

Performance of ACME compared against other classifiers (NB, C4.5, CBA, TAN) on the Austra, Breast, Cleve, Diabetes, German, Heart, Lymph, Pima, and Waveform datasets.

6 Example

Imagine e-mails represented as sets of keywords and classified as spam/non-spam. Let us focus on 3 keywords A, B, C. From the training data, say we get 4 "constraints" for the spam class:
– P(A) = 0.2 = d_1
– P(B) = 0.3 = d_2
– P(C) = 0.1 = d_3
– P(AB) = 0.1 = d_4
Now, our task is to use the GIS (Generalized Iterative Scaling) algorithm to determine P(ABC).

7 Define T_i as a bit-vector representing the presence/absence of the keywords. E.g., T_3 = 011 represents the presence of B, C and the absence of A. Note that P(T_3) is not the same as P(BC), since the latter does not care about the presence/absence of A.

      A B C
T_0 = 0 0 0
T_1 = 0 0 1
T_2 = 0 1 0
T_3 = 0 1 1
T_4 = 1 0 0
T_5 = 1 0 1
T_6 = 1 1 0
T_7 = 1 1 1

The previous 4 constraints can be rewritten in terms of the T_i's:
P(A)  = P(T_4) + P(T_5) + P(T_6) + P(T_7) = 0.2
P(B)  = P(T_2) + P(T_3) + P(T_6) + P(T_7) = 0.3
P(C)  = P(T_1) + P(T_3) + P(T_5) + P(T_7) = 0.1
P(AB) = P(T_6) + P(T_7) = 0.1    // AB is satisfied by T_6 and T_7

8 Define:
S_1 = P(T_4) + P(T_5) + P(T_6) + P(T_7)   ---- (1)
S_2 = P(T_2) + P(T_3) + P(T_6) + P(T_7)   ---- (2)
S_3 = P(T_1) + P(T_3) + P(T_5) + P(T_7)   ---- (3)
S_4 = P(T_6) + P(T_7)                     ---- (4)

Start GIS with P(T_i) = 1/8 = 0.125 for all i. Also, set mu_1 = mu_2 = mu_3 = mu_4 = 1 and mu_0 = 1. (There are 4 mu's because there are 4 constraints.)

Next, we calculate S_i for each constraint i, as per (1)-(4) above:
S_1 = 1/8 + 1/8 + 1/8 + 1/8 = 0.5
Similarly, S_2 = S_3 = 0.5 and S_4 = 0.25.

9 S_1 is 0.5, but we want it to be 0.2 (since P(A) = 0.2). Since S_1 is the sum of some of the P(T_i)'s in equation (1), we want to reduce these P(T_i)'s, scaling them down by 0.2 / 0.5. So we set mu_i *= d_i / S_i. Thus, we get:

mu_1 = 1 * 0.2 / 0.5  = 0.4
mu_2 = 1 * 0.3 / 0.5  = 0.6
mu_3 = 1 * 0.1 / 0.5  = 0.2
mu_4 = 1 * 0.1 / 0.25 = 0.4

Using these mu's, we recalculate the P(T_i)'s as:
P(T_i) = product of those mu_j's whose constraint is satisfied by T_i

Thus:
P(T_0) = 1                           // T_0 (000) satisfies no constraint
P(T_1) = mu_3                        // T_1 (001) satisfies constraint 3 only
P(T_2) = mu_2
P(T_3) = mu_2 * mu_3                 // T_3 (011) satisfies constraints 2 & 3
P(T_4) = mu_1
P(T_5) = mu_1 * mu_3
P(T_6) = mu_1 * mu_2 * mu_4          // T_6 (110) satisfies constraints 1, 2 & 4
P(T_7) = mu_1 * mu_2 * mu_3 * mu_4   // T_7 (111) satisfies all 4 constraints

10 But the sum of the above P(T_i)'s might not turn out to be 1. In fact, it turns out to be 2.5152 for this example. So we scale them all down by 2.5152 to make the sum equal 1. Then we get:

P(T_0) = 1 / 2.5152      = 0.40 (approx.)
P(T_1) = 0.2 / 2.5152    = 0.08
P(T_2) = 0.6 / 2.5152    = 0.24
P(T_3) = 0.12 / 2.5152   = 0.048
P(T_4) = 0.4 / 2.5152    = 0.16
P(T_5) = 0.08 / 2.5152   = 0.032
P(T_6) = 0.096 / 2.5152  = 0.04
P(T_7) = 0.0192 / 2.5152 = 0.0076

That was the first iteration of GIS, and these numbers are already closer to the actual P(T_i) values. For example, we know that P(A) = 0.2, P(B) = 0.3, P(C) = 0.1. If A, B, C were mutually exclusive, then P(A or B or C) = 0.6 (the sum), and notice that P(T_0) above is 0.4, which is 1 - 0.6. If we run the algorithm for more iterations, we get better results. Our task was to determine P(ABC), so we just output the value of P(T_7) above.
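The iteration on slides 8-10 can be written out in a few lines of Python to check the numbers. This is a sketch of the procedure as described above, not code from the original presentation; the satisfies helper and the fixed iteration count are my own choices.

import itertools
from math import prod

# Constraint targets d_1..d_4 for P(A), P(B), P(C), P(AB), as given on slide 6.
d = [0.2, 0.3, 0.1, 0.1]

def satisfies(t, j):
    # Does bit-vector t = (A, B, C) satisfy constraint j (order: A, B, C, AB)?
    a, b, c = t
    return (a, b, c, a * b)[j] == 1

T = list(itertools.product((0, 1), repeat=3))   # T_0 = 000, ..., T_7 = 111
p = {t: 1 / 8 for t in T}                       # uniform starting distribution
mu = [1.0, 1.0, 1.0, 1.0]

for iteration in range(25):
    # Compute S_j under the current distribution and rescale each mu_j.
    for j in range(4):
        S_j = sum(p[t] for t in T if satisfies(t, j))
        mu[j] *= d[j] / S_j
    # P(T_i) = product of the mu_j whose constraints T_i satisfies, then normalise.
    raw = {t: prod(mu[j] for j in range(4) if satisfies(t, j)) for t in T}
    Z = sum(raw.values())                       # 2.5152 after the first pass
    p = {t: v / Z for t, v in raw.items()}

print("P(ABC) = P(T_7) =", round(p[(1, 1, 1)], 4))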

11 Conclusion

Frequent Itemsets + Maximum Entropy: an accurate, theoretically appealing classifier.