2 CSCI 347, Data Mining Evaluation: Cross Validation, Holdout, Leave-One-Out Cross Validation and Bootstrapping, Sections 5.3 & 5.4, pages 152-156

3 Training & Testing Dilemma
- Want a large training dataset
- Want a large testing dataset
- Often don't have enough good data

4 Training & Testing
Resubstitution error rate – the error rate obtained by testing on the training data
- This error rate will be highly optimistic
- Not a good indicator of what the performance will be on an independent test dataset
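The gap can be seen with a short scikit-learn sketch; the dataset, classifier, and split below are illustrative assumptions, not taken from the book.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
resub_error = 1 - tree.score(X_train, y_train)  # error when testing on the training data
test_error = 1 - tree.score(X_test, y_test)     # error on an independent test set
print(resub_error, test_error)                  # the resubstitution error is far lower
```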

5 Evaluation in Weka

6 Overfitting - Negotiations

7 Overfitting - Diabetes
1R with the default bucket value of 6 builds a rule on plas with many alternating tested_negative / tested_positive intervals, ending with >= 154.5 -> tested_positive
(587/768 instances correct)
71.5% correct

8 Overfitting - Diabetes
1R with bucket value of 20:
plas: < 139.5 -> tested_negative, >= 139.5 -> tested_positive
(573/768 instances correct)
72.9% correct

9 Overfitting - Diabetes
1R with bucket value of 50:
plas: < 143.5 -> tested_negative, >= 143.5 -> tested_positive
(576/768 instances correct)
74.2% correct

10 Overfitting - Diabetes
1R with bucket value of 200:
preg: < 6.5 -> tested_negative, >= 6.5 -> tested_positive
(521/768 instances correct)
66.7% correct

11 Holdout
Holdout procedure – hold out some data for testing
Recommendation – when there is enough data, hold out 1/3 of the data for testing (use the remaining 2/3 for training)
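A minimal sketch of the holdout procedure, assuming only that instances are indexed 0..n-1; the function name and the 768-instance example are illustrative.

```python
import numpy as np

def holdout_split(n_instances, test_fraction=1/3, seed=0):
    """Return (train_indices, test_indices) for a simple holdout split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_instances)            # shuffle all instance indices
    n_test = int(round(n_instances * test_fraction))
    return order[n_test:], order[:n_test]           # 2/3 for training, 1/3 for testing

train_idx, test_idx = holdout_split(768)            # e.g. the 768 diabetes instances
```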

12 Stratified Holdout
Stratified holdout – ensure that each class is represented in the testing dataset in approximately the same proportion as in the overall dataset
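A minimal sketch of a stratified holdout using scikit-learn's train_test_split; the toy X and y below are assumed stand-ins for a real feature matrix and class vector.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumed toy data: 768 instances, 8 numeric attributes, a binary class.
rng = np.random.default_rng(0)
X = rng.normal(size=(768, 8))
y = rng.integers(0, 2, size=768)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=1/3,    # hold out one third for testing
    stratify=y,       # keep the class proportions of the overall dataset
    random_state=0)
```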

13 Evaluation Techniques When There Is Not Enough Data
Techniques:
- Cross-validation
- Stratified cross-validation
- Leave-one-out cross-validation
- Bootstrapping

14 Repeated Holdout Method
Repeated holdout method – use multiple iterations; in each iteration a certain proportion of the dataset is randomly selected for training (possibly with stratification). The error rates from the different iterations are averaged to yield an overall error rate.
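A minimal sketch of the repeated holdout method, assuming clf is any scikit-learn-style classifier and X, y are NumPy feature and class arrays; the function name and defaults are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def repeated_holdout_error(clf, X, y, iterations=10, test_fraction=1/3, seed=0):
    errors = []
    for i in range(iterations):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_fraction, stratify=y, random_state=seed + i)
        clf.fit(X_tr, y_tr)
        errors.append(1 - clf.score(X_te, y_te))   # error rate of this iteration
    return np.mean(errors)                         # overall (averaged) error rate
```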

15 Possible Problem
This is still not optimal: because the instances held out for testing are randomly selected in each iteration, the testing sets of different iterations may overlap.

16 Cross-Validation
Cross-validation – decide on a fixed number n of "folds", or partitions, of the dataset. For each of the n folds, train with (n-1)/n of the dataset and test with the remaining 1/n to estimate the error.
Typical stages:
- Split the data into n subsets of equal size
- Use each subset in turn for testing, and the remaining subsets for training
- Average the results
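A minimal sketch of n-fold cross-validation using scikit-learn's KFold; clf, X, y are assumed as before, with X and y as NumPy arrays so they can be indexed by fold indices.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validation_error(clf, X, y, n_folds=10, seed=0):
    errors = []
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        clf.fit(X[train_idx], y[train_idx])                      # train on (n-1)/n of the data
        errors.append(1 - clf.score(X[test_idx], y[test_idx]))   # test on the remaining 1/n
    return np.mean(errors)                                       # average over the n folds
```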

17 Stratified Cross-Validation
Stratified n-fold cross-validation – each fold is constructed so that the class values are represented in approximately the same proportions as in the full dataset
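A minimal sketch of stratified cross-validation with scikit-learn, where StratifiedKFold builds class-proportional folds; clf, X, y are assumed as before.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # class-proportional folds
scores = cross_val_score(clf, X, y, cv=skf)                       # one accuracy per fold
error_estimate = 1 - scores.mean()
```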

18 Recommendation When Data Is Insufficient
10-fold cross-validation with stratification has become the standard. The book states:
- Extensive experiments have shown that this is the best choice for getting an accurate estimate
- There is some theoretical evidence that this is the best choice
Controversy still rages in the machine learning community.

19 Leave-One-Out Cross-Validation
Leave-one-out cross-validation – the number of folds is the same as the number of training instances
Pros:
- Makes the best use of the data, since the greatest possible amount of data is used for training
- Involves no random sampling
Cons:
- Computationally expensive (the cost grows directly with the number of instances)
- None of the folds can be stratified (each test set contains a single instance)
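A minimal sketch of leave-one-out cross-validation with scikit-learn's LeaveOneOut; clf, X, y are assumed as before. It fits one model per instance, which is what makes it expensive.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    clf.fit(X[train_idx], y[train_idx])                      # train on all but one instance
    errors.append(1 - clf.score(X[test_idx], y[test_idx]))   # test on the left-out instance
loo_error = np.mean(errors)                                  # one model per instance
```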

20 Bootstrap Methods
Bootstrap – uses sampling with replacement to form the training set
- Sample a dataset of n instances n times with replacement to form a new dataset of n instances
- Use this new dataset as the training set
- Use the instances from the original dataset that do not occur in the new training set for testing
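A minimal sketch of forming one bootstrap training set and its test set; X and y are assumed to be NumPy arrays holding n instances.

```python
import numpy as np

rng = np.random.default_rng(0)
n = len(X)
boot_idx = rng.integers(0, n, size=n)              # sample n instance indices with replacement
oob_idx = np.setdiff1d(np.arange(n), boot_idx)     # instances never drawn ("out of bag")

X_train, y_train = X[boot_idx], y[boot_idx]        # bootstrap training set (size n)
X_test, y_test = X[oob_idx], y[oob_idx]            # test set: the instances not sampled
```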

21 0.632 Bootstrap
Likelihood of a particular instance not being chosen on a single draw for the training set: (1 - 1/n)
Repeat the draw n times – likelihood of never being chosen: (1 - 1/n)^n
- (1 - 1/2)^2 = 0.25
- (1 - 1/3)^3 = 0.296
- (1 - 1/4)^4 = 0.316
- (1 - 1/5)^5 = 0.328
- (1 - 1/6)^6 = 0.335
- (1 - 1/7)^7 = 0.340
- (1 - 1/8)^8 = 0.344
- (1 - 1/9)^9 = 0.346
- (1 - 1/10)^10 = 0.349
- ...
- (1 - 1/500)^500 = 0.368
(1 - 1/n)^n converges to 1/e ≈ 0.368 as n grows.
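A quick numeric check of the values on this slide (purely illustrative):

```python
import math

# (1 - 1/n)**n for growing n approaches 1/e ≈ 0.368.
for n in (2, 3, 5, 10, 100, 500):
    print(n, round((1 - 1/n) ** n, 3))
print("limit:", round(math.exp(-1), 3))   # 0.368
```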

22 0.632 Bootstrap
So for largish n (e.g. n = 500), an instance has a 0.368 likelihood of not being chosen for the training set.
Equivalently, the instance has a 1 - 0.368 = 0.632 chance of being selected – hence the name "0.632 bootstrap method".

23 0.632 Bootstrap
For bootstrapping, the error estimate on the test data will be very pessimistic, since training used only ~63% of the distinct instances. Therefore, combine it with the weighted resubstitution error:
Error = 0.632 × error_test_instances + 0.368 × error_training_instances
Repeat the process several times with different replacement samples and average the results.
Bootstrapping is probably the best way of estimating performance for small datasets.
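A minimal sketch of the 0.632 bootstrap estimate described on this slide, assuming clf is a scikit-learn-style classifier and X, y are NumPy arrays; the function name and number of repetitions are illustrative.

```python
import numpy as np

def bootstrap_632_error(clf, X, y, repetitions=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    estimates = []
    for _ in range(repetitions):
        boot_idx = rng.integers(0, n, size=n)                 # training sample, with replacement
        oob_idx = np.setdiff1d(np.arange(n), boot_idx)        # out-of-bag test instances
        clf.fit(X[boot_idx], y[boot_idx])
        err_test = 1 - clf.score(X[oob_idx], y[oob_idx])      # error on the test instances
        err_train = 1 - clf.score(X[boot_idx], y[boot_idx])   # resubstitution error
        estimates.append(0.632 * err_test + 0.368 * err_train)
    return np.mean(estimates)                                 # average over the repetitions
```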

