CSCI 347, Data Mining
Evaluation: Cross-Validation, Holdout, Leave-One-Out Cross-Validation and Bootstrapping
Sections 5.3 & 5.4, pages 152-156
Training & Testing Dilemma
- Want a large training dataset
- Want a large testing dataset
- Often don't have enough good data
Training & Testing
- Resubstitution error rate – the error rate obtained by testing on the training data
- This error rate will be highly optimistic
- It is not a good indicator of performance on an independent test dataset
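A quick sketch of this optimism, using scikit-learn rather than the Weka tooling shown in the course (the dataset and classifier here are arbitrary choices):

```python
# Sketch: resubstitution error vs. error on an independent test set.
# scikit-learn stand-in for Weka; dataset and classifier are arbitrary.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=768, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("resubstitution error:", 1 - tree.score(X_tr, y_tr))    # near 0, optimistic
print("independent test error:", 1 - tree.score(X_te, y_te))  # noticeably higher
```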
Evaluation in Weka
Overfitting - Negotiations
Overfitting - Diabetes
1R with the default bucket size of 6 splits plas into many alternating intervals:
plas: tested_negative | tested_positive | tested_negative | tested_positive |
      tested_negative | tested_positive | tested_negative | tested_positive |
      tested_negative | >= 154.5 -> tested_positive
(587/768 instances correct) 71.5% correct
Overfitting - Diabetes
1R with a bucket size of 20:
plas:
  < 139.5  -> tested_negative
  >= 139.5 -> tested_positive
(573/768 instances correct) 72.9% correct
Overfitting - Diabetes
1R with a bucket size of 50:
plas:
  < 143.5  -> tested_negative
  >= 143.5 -> tested_positive
(576/768 instances correct) 74.2% correct
Overfitting - Diabetes
1R with a bucket size of 200:
preg:
  < 6.5  -> tested_negative
  >= 6.5 -> tested_positive
(521/768 instances correct) 66.7% correct
With buckets this large, 1R abandons plas for preg altogether and underfits.
Holdout
- Holdout procedure – hold out some of the data for testing
- Recommendation – when there is enough data, hold out one-third of the data for testing and use the remaining two-thirds for training
Stratified Holdout
- Stratified holdout – sample so that each class is represented in approximately the same proportions in the testing dataset as in the overall dataset
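A minimal sketch of both the plain and the stratified holdout, using scikit-learn's train_test_split as an assumed substitute for Weka's percentage split:

```python
# Sketch: one-third holdout, plain and stratified, on an arbitrary dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Plain holdout: two-thirds train, one-third test.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

# Stratified holdout: class proportions in the test set mirror the full data.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=1/3, random_state=0, stratify=y)
```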
Evaluation Techniques When There Isn't Enough Data
- Techniques: cross-validation, stratified cross-validation, leave-one-out cross-validation, and bootstrapping
Repeated Holdout Method
- Repeated holdout method – use multiple iterations; in each iteration a certain proportion of the dataset is randomly selected for training (possibly with stratification)
- The error rates from the different iterations are averaged to yield an overall error rate
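A sketch of repeated stratified holdout with scikit-learn (the ten iterations and the classifier are arbitrary choices):

```python
# Sketch: repeated stratified holdout, averaging the test error.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
errors = []
for seed in range(10):                       # 10 holdout iterations
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1/3, random_state=seed, stratify=y)
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    errors.append(1 - model.score(X_te, y_te))
print("mean holdout error:", np.mean(errors))
```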
Possible Problem
- This is still not optimal: because the proportion held out for testing is selected at random in each iteration, the test sets may overlap across iterations
Cross-Validation
- Cross-validation – decide on a fixed number of "folds", or partitions, of the dataset
- For each of the n folds, train on (n-1)/n of the dataset and test on the remaining 1/n to estimate the error
- Typical stages:
  1. Split the data into n subsets of equal size
  2. Use each subset in turn for testing, with the remainder for training
  3. Average the results
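A sketch of the stages above with scikit-learn's KFold (n = 10 here; dataset and classifier are arbitrary choices):

```python
# Sketch: n-fold cross-validation; cross_val_score trains and tests
# once per fold and returns the per-fold accuracies.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
folds = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=folds)
print("cross-validated error:", 1 - scores.mean())
```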
Stratified Cross-Validation
- In stratified n-fold cross-validation, each fold is constructed so that the class values are represented in approximately the same proportions as in the full dataset
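The stratified variant is the same sketch with the splitter swapped for StratifiedKFold:

```python
# Sketch: stratified 10-fold cross-validation; every fold preserves
# the class proportions of the full dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=folds)
print("stratified CV error:", 1 - scores.mean())
```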
Recommendation When Data Is Insufficient
- 10-fold cross-validation with stratification has become the standard
- The book states: extensive experiments have shown that this is the best choice for getting an accurate estimate
- There is some theoretical evidence that this is the best choice
- Controversy still rages in the machine learning community
Leave-One-Out Cross-Validation
- Leave-one-out cross-validation – the number of folds equals the number of training instances
- Pros:
  - Makes the best use of the data, since the greatest possible amount of data is used for training
  - Involves no random sampling
- Cons:
  - Computationally expensive (the number of training runs grows directly with the number of instances)
  - The folds cannot be stratified – each test set contains only a single instance
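A sketch with scikit-learn's LeaveOneOut, which builds one fold per instance:

```python
# Sketch: leave-one-out cross-validation (one training run per instance).
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=LeaveOneOut())   # 150 folds for iris
print("LOOCV error:", 1 - scores.mean())
```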
Bootstrap Methods
- Bootstrap – uses sampling with replacement to form the training set
  1. Sample a dataset of n instances n times with replacement to form a new dataset of n instances
  2. Use this data as the training set
  3. Use the instances from the original dataset that don't occur in the new training set for testing
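A NumPy sketch of one bootstrap iteration (variable names are illustrative):

```python
# Sketch: one bootstrap training sample and its out-of-sample test set.
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
n = len(X)
rng = np.random.default_rng(0)

train_idx = rng.integers(0, n, size=n)            # n draws with replacement
test_idx = np.setdiff1d(np.arange(n), train_idx)  # instances never drawn

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print("unique training instances:", len(np.unique(train_idx)), "of", n)
print("test instances:", len(test_idx))           # roughly 0.368 * n
```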
0.632 Bootstrap
- Likelihood of an element not being chosen in a single draw: 1 - 1/n
- Repeat the draw n times – likelihood of never being chosen: (1 - 1/n)^n

  (1 - 1/2)^2   = 0.250
  (1 - 1/3)^3   ≈ 0.296
  (1 - 1/4)^4   ≈ 0.316
  (1 - 1/5)^5   ≈ 0.328
  (1 - 1/6)^6   ≈ 0.335
  (1 - 1/7)^7   ≈ 0.340
  (1 - 1/8)^8   ≈ 0.344
  (1 - 1/9)^9   ≈ 0.346
  (1 - 1/10)^10 ≈ 0.349
  ...
  (1 - 1/500)^500 ≈ 0.368

- (1 - 1/n)^n converges to 1/e ≈ 0.368
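A quick numeric check of the convergence, using only the Python standard library:

```python
# Sketch: (1 - 1/n)**n approaches 1/e ≈ 0.368 as n grows.
import math

for n in (2, 3, 5, 10, 100, 500):
    print(n, round((1 - 1/n) ** n, 3))
print("1/e =", round(math.exp(-1), 3))
```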
0.632 Bootstrap
- So for largish n (e.g., n = 500), an instance has a 0.368 likelihood of not being chosen
- Equivalently, each instance has a 1 - 0.368 = 0.632 chance of being selected for the training set
- Hence the name: the 0.632 bootstrap method
0.632 Bootstrap
- For bootstrapping, the error estimate on the test data will be very pessimistic, since training occurred on only ~63% of the distinct instances
- Therefore, combine it with the weighted resubstitution error:
    error = 0.632 * error_test_instances + 0.368 * error_training_instances
- Repeat the process several times with different replacement samples and average the results
- Bootstrapping is probably the best way of estimating performance for small datasets
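Putting the pieces together, a sketch of the repeated 0.632 estimate (the 50 rounds, dataset, and classifier are arbitrary choices):

```python
# Sketch: the 0.632 bootstrap error estimate, averaged over repetitions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
n = len(X)
rng = np.random.default_rng(0)

estimates = []
for _ in range(50):                                   # 50 bootstrap rounds
    train = rng.integers(0, n, size=n)                # sample with replacement
    test = np.setdiff1d(np.arange(n), train)          # out-of-sample instances
    model = DecisionTreeClassifier(random_state=0).fit(X[train], y[train])
    err_test = 1 - model.score(X[test], y[test])
    err_train = 1 - model.score(X[train], y[train])   # resubstitution error
    estimates.append(0.632 * err_test + 0.368 * err_train)
print("0.632 bootstrap error:", np.mean(estimates))
```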