Download presentation
Presentation is loading. Please wait.
Published byPrecious Devon Modified over 9 years ago
1
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Other Classification Techniques 1.Nearest Neighbor Classifiers 2.Support Vector Machines 3.Ensemble Methods 4.ROC Curves
2
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2 General Idea
3
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 3 Why does it work? l Suppose there are 25 base classifiers –Each classifier has error rate, = 0.35 –Assume classifiers are independent –Probability that the ensemble classifier makes a wrong prediction:
4
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 4 Examples of Ensemble Methods l How to generate an ensemble of classifiers? –Bagging –Boosting
5
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 5 Bagging l Sampling with replacement l Build classifier on each bootstrap sample l Each sample has probability (1 – 1/n) n of being selected
6
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 6 Boosting l An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records –Initially, all N records are assigned equal weights –Unlike bagging, weights may change at the end of boosting round
7
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 7 Boosting l Records that are wrongly classified will have their weights increased l Records that are classified correctly will have their weights decreased Example 4 is hard to classify Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds
8
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 8 Example: AdaBoost l Base classifiers: C 1, C 2, …, C T l Error rate: l Importance of a classifier:
9
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 9 Example: AdaBoost l Weight update: l If any intermediate rounds produce error rate higher than 50%, the weights are reverted back to 1/n and the re-sampling procedure is repeated l Classification: Increase weight if misclassification; Increase is proportional to classifiers Importance.
10
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 10 Basic AdaBoost Loop (Alg. 5.7 textbook) D 1 = initial dataset with equal weights FOR i=1 to k DO Learn new classifier C i; Computer i ; Update example weights; Create new training set D i+1 (using weighted sampling) END FOR Return Ensemble classifier that uses C i weighted by i (i=1,k)
11
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 11 Illustrating AdaBoost Data points for training Initial weights for each data point
12
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 12 Illustrating AdaBoost
13
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 13 Methods for Performance Evaluation l How to obtain a reliable estimate of performance? l Performance of a model may depend on other factors besides the learning algorithm: –Class distribution –Cost of misclassification –Size of training and test sets
14
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 14 Learning Curve l Learning curve shows how accuracy changes with varying sample size l Requires a sampling schedule for creating learning curve: l Arithmetic sampling (Langley, et al) l Geometric sampling (Provost et al) Effect of small sample size: - Bias in the estimate - Variance of estimate
15
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 15 ROC (Receiver Operating Characteristic) l Developed in 1950s for signal detection theory to analyze noisy signals –Characterize the trade-off between positive hits and false alarms l ROC curve plots TP (on the y-axis) against FP (on the x-axis) l Performance of each classifier represented as a point on the ROC curve –changing the threshold of algorithm, sample distribution or cost matrix changes the location of the point
16
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 16 ROC Curve At threshold t: TP=0.5, FN=0.5, FP=0.12, FN=0.88 - 1-dimensional data set containing 2 classes (positive and negative) - any points located at x > t is classified as positive
17
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 17 ROC Curve (TP,FP): l (0,0): declare everything to be negative class l (1,1): declare everything to be positive class l (1,0): ideal l Diagonal line: –Random guessing –Below diagonal line: prediction is opposite of the true class
18
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 18 Using ROC for Model Comparison l No model consistently outperform the other l M 1 is better for small FPR l M 2 is better for large FPR l Area Under the ROC curve l Ideal: Area = 1 l Random guess: Area = 0.5
19
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 19 How to Construct an ROC curve InstanceP(+|A)True Class 10.95+ 20.93+ 30.87- 40.85- 5 - 6 + 70.76- 80.53+ 90.43- 100.25+ Use classifier that produces posterior probability for each test instance P(+|A) Sort the instances according to P(+|A) in decreasing order Apply threshold at each unique value of P(+|A) Count the number of TP, FP, TN, FN at each threshold TP rate, TPR = TP/(TP+FN) FP rate, FPR = FP/(FP + TN)
20
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 20 How to construct an ROC curve Threshold >= ROC Curve: + + - - - + - + - + - + -
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.