Performance Evaluation: Accuracy Estimate for Classification and Regression
J.-S. Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan University
Introduction to Performance Evaluation
Performance evaluation: an objective procedure to derive the performance index (or figure of merit) of a given model.
Typical performance indices:
- Accuracy
  - Recognition rate: higher is better (↗)
  - Error rate: lower is better (↘)
  - RMSE (root mean squared error): lower is better (↘)
- Computation load: lower is better (↘)
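As a concrete illustration (not part of the original slides), here is a minimal sketch of the three accuracy-related indices; the function names are illustrative and NumPy arrays are assumed as input.

```python
import numpy as np

def recognition_rate(y_true, y_pred):
    """Fraction of correctly classified samples (higher is better)."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def error_rate(y_true, y_pred):
    """Fraction of misclassified samples (lower is better)."""
    return 1.0 - recognition_rate(y_true, y_pred)

def rmse(y_true, y_pred):
    """Root mean squared error for regression (lower is better)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Toy usage
print(recognition_rate([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
print(error_rate([0, 1, 1, 0], [0, 1, 0, 0]))        # 0.25
print(rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.3]))        # ~0.19
```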
Synonyms
Sets of synonyms to be used interchangeably (since we are focusing on classification):
- Classifier, model
- Recognition rate, accuracy
- Training, design-time
- Test, run-time
Performance Indices for Classifiers
Performance indices of a classifier:
- Recognition rate: how to derive it with an objective method or procedure
- Computation load
  - Design-time computation (training)
  - Run-time computation (test)
Our focus: the recognition rate and the procedures to derive it.
The estimated accuracy depends on:
- Dataset partitioning
- Model (type and complexity)
Methods for Performance Evaluation
Methods to derive the recognition rate:
- Inside test (resubstitution accuracy)
- One-side holdout test
- Two-side holdout test
- M-fold cross validation
- Leave-one-out cross validation
Canonical Dataset Partition (1/3)
Data partitioning: make the best use of the dataset for both model construction and evaluation.
- Training set: for model construction
- Validation set: for model evaluation & selection
- Test set: for evaluating the final model (the model chosen by the selection process)
(Diagram: the whole dataset is split into disjoint training, validation, and test sets.)
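A minimal sketch of such a disjoint three-way split (not from the slides); the function name `three_way_split` and the 60/20/20 ratios are illustrative assumptions.

```python
import numpy as np

def three_way_split(X, y, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split (X, y) into disjoint training, validation, and test sets."""
    n = len(X)
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    train, val = idx[:n_train], idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Usage: construct candidate models on the training set, pick the one with the best
# validation accuracy, and report that model's accuracy on the untouched test set.
```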
Canonical Dataset Partition (2/3)
Usage scenarios:
- Your instructor wants you to develop a classifier on the training set that performs well on the validation set. Most likely, your instructor will finally check your classifier's results on a test set that was not shared with you.
- You can use historical data to build a stock predictor, partitioning the data into a training set for model construction and a validation set for model selection (to prevent overfitting). Future data then serves as the test set (unavailable until the date arrives) for evaluating the predictor's performance.
Canonical Dataset Partition (3/3)
Characteristics:
- The validation set is used for model selection, particularly for preventing overfitting.
- Performance on the validation set should be close to (or a little higher than) performance on the test set.
To make better use of the dataset when it is not too big:
- Use cross validation on the training & validation sets to obtain a more reliable estimate of the accuracy.
- Do not allocate a test set; estimate the classifier's performance from the validation set instead.
Simplified Dataset Partition (1/3)
Data partitioning:
- Training set: for model construction
- Test set: for model evaluation
Drawback:
- No validation set for model selection. The model is selected by its training error, which runs the risk of overfitting.
(Diagram: the whole dataset is split into disjoint training and test sets.)
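A minimal sketch of this two-way partition, assuming scikit-learn is available; the iris data and the 70/30 split are placeholder choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Disjoint training/test split (70/30 here, an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
# Note the drawback: if we selected the model by its training error alone,
# we might pick an overfit one; only the test score below is an unbiased estimate.
print("Test recognition rate:", model.score(X_test, y_test))
```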
Simplified Dataset Partition (2/3)
Data partitioning:
- Training set: for model construction
- Validation set: for model evaluation & selection
Drawback:
- No test set for final model evaluation; performance on the validation set may also be subject to overfitting.
(Diagram: the whole dataset is split into disjoint training and validation sets.)
Simplified Dataset Partition (3/3)
Data partitioning:
- Training set: for model construction
- Validation set: for model evaluation & selection
Characteristics:
- A more robust estimate of the model's performance can be obtained from the average performance over the validation sets.
- Could still lead to overfitting.
(Diagram: the whole dataset serves as training & validation sets; cross validation rotates which part is used for training and which for validation.)
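A minimal sketch of rotating training and validation sets, assuming scikit-learn; `cross_val_score` performs the rotation and returns one validation accuracy per fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# 5-fold rotation: each fold serves once as the validation set.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print("Per-fold validation accuracy:", scores)
print("Average validation accuracy:", scores.mean())
```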
Why Is It So Complicated?
To avoid overfitting!
(Figures: examples of overfitting in regression and in classification.)
Inside Test (1/2)
Dataset partitioning:
- Use the whole dataset for both training & evaluation.
Recognition rate (RR):
- The resulting RR is called the inside-test or resubstitution recognition rate.
Inside Test (2/2)
Characteristics:
- Too optimistic, since the RR tends to be higher than the true RR. For instance, 1-NNC always has an inside-test RR of 100%!
- Can be used as an upper bound of the true RR.
- Potential reasons for a low inside-test RR:
  - Bad features in the dataset
  - Bad method for model construction, such as
    - Bad results from neural network training
    - Bad results from k-means clustering
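To illustrate the 1-NNC claim, a minimal sketch assuming scikit-learn: training and evaluating on the same data yields a resubstitution accuracy of 100% (as long as no duplicate points carry conflicting labels).

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Inside test: the whole dataset is used for both training and evaluation.
knn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print("Inside-test RR of 1-NNC:", knn1.score(X, y))  # 1.0 -- each point is its own nearest neighbor
```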
One-side Holdout Test (1/2)
Dataset partitioning:
- Training set for model construction
- Validation set for performance evaluation
Recognition rate:
- Inside-test RR: evaluated on the training set
- Outside-test RR: evaluated on the validation set
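A minimal sketch of the one-side holdout test, assuming scikit-learn; it reports both the inside-test RR (on the training set) and the outside-test RR (on the held-out validation set).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("Inside-test RR :", model.score(X_train, y_train))  # usually optimistic
print("Outside-test RR:", model.score(X_val, y_val))      # the holdout estimate
```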
One-side Holdout Test (2/2)
Characteristics:
- Highly affected by how the data is partitioned.
- Usually adopted when the training (design-time) computation load is high (for instance, deep neural networks).
Two-side Holdout Test (1/3)
Dataset partitioning:
- Training set for model construction
- Validation set for performance evaluation
- Role reversal: the two sets then swap roles and the procedure is repeated.
Two-side Holdout Test (2/3)
Two-side holdout test (two-fold cross validation):
(Diagram: the dataset is split into halves A and B. One model is constructed on A and evaluated on B, the other is constructed on B and evaluated on A, giving two outside-test recognition rates, RR_AB and RR_BA.)
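A minimal sketch of the two-side holdout test (two-fold cross validation), assuming scikit-learn; the two halves swap roles and the two outside-test RRs are averaged.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
idx = np.random.default_rng(0).permutation(len(X))
A, B = idx[: len(X) // 2], idx[len(X) // 2 :]

def holdout_rr(train_idx, test_idx):
    """Construct the model on one half and evaluate it on the other (outside test)."""
    model = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    return model.score(X[test_idx], y[test_idx])

rr_1, rr_2 = holdout_rr(A, B), holdout_rr(B, A)
print("Outside-test RRs:", rr_1, rr_2, "average:", (rr_1 + rr_2) / 2)
```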
Two-side Holdout Test (3/3)
Characteristics:
- Makes better use of the dataset.
- Still highly affected by the partitioning.
- Suitable for models with a high training (design-time) computation load.
M-fold Cross Validation (1/3)
Data partitioning:
- Partition the dataset into m disjoint folds.
- Use one fold for test and the other folds for training.
- Repeat m times, so that each fold serves as the test fold once (see the sketch below).
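A minimal m-fold cross validation sketch, assuming scikit-learn's `KFold`; each round trains on m-1 folds and evaluates on the remaining fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
m = 5  # number of folds (an arbitrary choice)

rrs = []
for train_idx, test_idx in KFold(n_splits=m, shuffle=True, random_state=0).split(X):
    model = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    rrs.append(model.score(X[test_idx], y[test_idx]))  # outside-test RR on the left-out fold

print("Per-fold RRs:", np.round(rrs, 3))
print("Cross-validation RR:", np.mean(rrs))
```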
M-fold Cross Validation (2/3)
(Diagram: the dataset is divided into m disjoint sets; for each k, model k is constructed on the other m-1 sets and evaluated on set k, an outside test.)
M-fold Cross Validation (3/3)
Characteristics:
- When m = 2: two-side holdout test.
- When m = n: leave-one-out cross validation.
- The value of m depends on the computation load imposed by the selected model.
Leave-one-out Cross Validation (1/3)
Data partitioning:
- The special case of m-fold cross validation with m = n, where each fold Di contains a single input/output pair (xi, yi).
Leave-one-out Cross Validation (2/3)
(Diagram: the dataset of n input/output pairs is split so that model k is constructed on the other n-1 pairs and evaluated on the single left-out pair, an outside test; each per-pair RR is either 0% or 100%.)
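A minimal LOOCV sketch, assuming scikit-learn's `LeaveOneOut`; each left-out sample yields a 0% or 100% result, and the average over all n rounds is the LOOCV accuracy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

hits = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    hits.append(model.score(X[test_idx], y[test_idx]))  # 0.0 or 1.0 for the single left-out pair

print("LOOCV accuracy:", np.mean(hits))  # model construction is repeated n times, hence slow
```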
Leave-one-out Cross Validation (3/3)
LOOCV (leave-one-out cross validation):
- Strength: makes the best use of the dataset to derive a reliable accuracy estimate.
- Drawback: model construction is performed n times, which is slow!
To speed up LOOCV:
- Precompute a common part that is reused in every round, such as the global mean and covariance for QC or NBC.
How to construct the final model after CV?
- Apply the selected model structure to the whole dataset (training set + validation set) to construct the final model (see the sketch below).
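A minimal sketch of constructing the final model after CV, assuming scikit-learn; the hyperparameter swept here (the number of neighbors k) is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stands in for the training + validation data

# Select the model structure (k) by its cross-validation accuracy.
best_k = max(range(1, 16),
             key=lambda k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean())

# Construct the final model with the selected structure on the whole dataset.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X, y)
print("Selected k:", best_k)
```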
Applications of Cross Validation
Applications of CV:
- Input (feature) selection
- Model complexity determination, for instance:
  - Polynomial order in polynomial fitting (see the sketch below)
  - Number of prototypes for VQ-based 1-NNC
- Performance comparison among different models
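A minimal sketch of using CV to determine model complexity, here the polynomial order in polynomial fitting; the synthetic data and the NumPy-only implementation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)  # noisy synthetic data

def cv_rmse(order, m=5):
    """Average validation RMSE of a polynomial fit of the given order under m-fold CV."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, m)
    errs = []
    for k in range(m):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(m) if j != k])
        coeffs = np.polyfit(x[train], y[train], order)   # construct on m-1 folds
        pred = np.polyval(coeffs, x[val])                 # evaluate on the left-out fold
        errs.append(np.sqrt(np.mean((y[val] - pred) ** 2)))
    return np.mean(errs)

orders = range(1, 10)
best = min(orders, key=cv_rmse)
print("Polynomial order selected by CV:", best)
```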
Misuse of Cross Validation (CV)
Caveat of CV:
- Do not try to boost the validation RR too much, or you run the risk of indirectly training on the left-out data!
- This is likely to happen when:
  - The dataset is small.
  - The dimension is high.