Cross-validation Brenda Thomson / Peter Fox, Data Analytics, ITWS-4600/ITWS-6600/MATP-4450/CSCI-4960, Group 2 Module 7, October 16, 2018

Contents

Numeric v. non-numeric

Cross-validation Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice, i.e., predictive and prescriptive analytics…

Cross-validation In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset), and a dataset of unknown (first-seen) data against which the model is tested (the testing dataset). Sound familiar?

Cross-validation The goal of cross-validation is to define a dataset to "test" the model in the training phase (i.e., the validation dataset) in order to limit problems like overfitting, and to give insight into how the model will generalize to an independent data set (i.e., an unknown dataset, for instance from a real problem).

Common types of cross-validation: K-fold, 2-fold (do you know this one?), repeated random subsample, leave-out subsample. Lab in a few weeks … to try these out.

K-fold The original sample is randomly partitioned into k equal-size subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (usually) to produce a single estimate.
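
As a minimal sketch (not from the slides), here is 10-fold cross-validation hand-rolled in base R; the built-in mtcars data and the lm() model of mpg on wt and hp are illustrative choices:

```r
# 10-fold cross-validation by hand in base R
set.seed(42)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold labels
mse <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]                # k - 1 folds for training
  test  <- mtcars[folds == i, ]                # held-out fold for validation
  fit   <- lm(mpg ~ wt + hp, data = train)     # illustrative model
  pred  <- predict(fit, newdata = test)
  mse[i] <- mean((test$mpg - pred)^2)          # fold-level test error
}
mean(mse)  # average the k fold errors into a single estimate
```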

Leave-out subsample As the name suggests, leave-one-out cross-validation (LOOCV) involves using a single observation from the original sample as the validation data, and the remaining observations as the training data, i.e., K = n-fold cross-validation. Leaving out more than one observation at a time leads to bootstrapping and jackknifing.
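
For LOOCV you can run the loop above with k = n, or use cv.glm() from R's boot package, whose K argument defaults to n; the model below is again an illustrative choice:

```r
# Leave-one-out CV via boot::cv.glm (K defaults to n, i.e. LOOCV)
library(boot)
fit <- glm(mpg ~ wt + hp, data = mtcars)  # gaussian glm, equivalent to lm
loo <- cv.glm(mtcars, fit)                # K = nrow(mtcars) by default
loo$delta[1]                              # raw LOOCV estimate of prediction MSE
```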

boot(strapping) Generate replicates of a statistic applied to data (parametric and nonparametric). For the nonparametric bootstrap, possible methods are: the ordinary bootstrap, the balanced bootstrap, antithetic resampling, and permutation. For nonparametric multi-sample problems, stratified resampling is used; this is specified by including a vector of strata in the call to boot. Importance resampling weights may also be specified.
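
This slide describes R's boot package; its sim argument selects among the ordinary, balanced, antithetic, and permutation methods named above. A small sketch, bootstrapping the median of mtcars$mpg (an illustrative statistic) with R = 1000 replicates:

```r
# Nonparametric bootstrap of the median with the boot package
library(boot)
med <- function(x, idx) median(x[idx])  # statistic gets data + resample indices
b <- boot(mtcars$mpg, statistic = med, R = 1000, sim = "ordinary")
b                          # prints bootstrap estimates of bias and std. error
boot.ci(b, type = "perc")  # percentile confidence interval
```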

Jackknifing Systematically recompute the statistic estimate, leaving out one or more observations at a time from the sample set. From this new set of replicates of the statistic, an estimate of the bias and an estimate of the variance of the statistic can be calculated. Often log(variance) is used [instead of variance], especially for non-normal distributions.
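
A hand-rolled jackknife sketch in base R, using the mean of mtcars$mpg as an illustrative statistic, with the usual leave-one-out bias and standard-error formulas:

```r
# Jackknife estimates of bias and standard error for the sample mean
x <- mtcars$mpg
n <- length(x)
theta_hat <- mean(x)                             # statistic on the full sample
theta_i <- sapply(1:n, function(i) mean(x[-i]))  # leave-one-out replicates
bias <- (n - 1) * (mean(theta_i) - theta_hat)    # jackknife bias estimate
se   <- sqrt((n - 1) / n * sum((theta_i - mean(theta_i))^2))  # jackknife std. error
c(estimate = theta_hat, jackknife_bias = bias, jackknife_se = se)
```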

Repeated random subsample Randomly split the dataset into training and validation data. For each such split, the model is fit to the training data, and predictive accuracy is assessed using the validation data. Results are then averaged over the splits. Note: for this method the results will vary if the analysis is repeated with different random splits.
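
A minimal sketch of repeated random sub-sampling in base R; the 70/30 split ratio, the 100 repeats, and the lm() model are all illustrative choices:

```r
# Repeated random sub-sampling (Monte Carlo CV): 100 random 70/30 splits
set.seed(1)
reps <- 100
mse <- replicate(reps, {
  idx  <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))  # random split
  fit  <- lm(mpg ~ wt + hp, data = mtcars[idx, ])   # fit on the training split
  pred <- predict(fit, newdata = mtcars[-idx, ])    # predict the validation split
  mean((mtcars$mpg[-idx] - pred)^2)
})
mean(mse)  # average predictive error over the random splits
```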

Advantage? The advantage of k-fold over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once; 10-fold cross-validation is commonly used. The advantage of repeated random sub-sampling over k-fold cross-validation is that the proportion of the training/validation split does not depend on the number of iterations (folds).

Disadvantage The disadvantage of repeated random sub-sampling is that some observations may never be selected in the validation subsample, whereas others may be selected more than once; i.e., validation subsets may overlap.

Assignment 6 Your term projects should fall within the scope of a data analytics problem of the type you have worked with in class/labs, or know of yourself; the bigger the data the better. This means that the work must go beyond just making lots of figures. You should develop the project to show you are thinking about and exploring the relationships and distributions within your data. Start with a hypothesis, think of a way to model and test the hypothesis, find or collect the necessary data, and do preliminary analysis, detailed modeling, and a summary (interpretation). 6000-level students must develop at least two types of models. Note: you do not have to come up with a positive result; i.e., disproving the hypothesis is just as good. Grading (percentages may change): Introduction (2%), Data Description (3%), Analysis (5%), Model Development (12%), Conclusions and Discussion (3%), Oral presentation (5%) (~5 mins).