Cross-validation Brenda Thomson / Peter Fox, Data Analytics, ITWS-4600/ITWS-6600/MATP-4450/CSCI-4960, Group 2 Module 7, October 16, 2018

Contents

Numeric v. non-numeric

Cross-validation Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice, i.e., predictive and prescriptive analytics…

Cross-validation In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset), and a dataset of unknown (first-seen) data against which the model is tested (the testing dataset). Sound familiar?

Cross-validation The goal of cross-validation is to define a dataset to "test" the model in the training phase (i.e., the validation dataset) in order to limit problems like overfitting, and to give insight into how the model will generalize to an independent data set (i.e., an unknown dataset, for instance from a real problem).

Common types of cross-validation: K-fold, 2-fold (do you know this one?), repeated random subsample, leave-out subsample. Lab in a few weeks … to try these out.

K-fold The original sample is randomly partitioned into k equal-size subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (usually) to produce a single estimate.
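
As a minimal sketch (not from the slides), here is 10-fold cross-validation hand-rolled in base R; the built-in mtcars data and the lm() model of mpg on wt and hp are illustrative choices:

```r
# 10-fold cross-validation by hand in base R
set.seed(42)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold labels
mse <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]                # k - 1 folds for training
  test  <- mtcars[folds == i, ]                # held-out fold for validation
  fit   <- lm(mpg ~ wt + hp, data = train)     # illustrative model
  pred  <- predict(fit, newdata = test)
  mse[i] <- mean((test$mpg - pred)^2)          # fold-level test error
}
mean(mse)  # average the k fold errors into a single estimate
```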

Leave-out subsample As the name suggests, leave-one-out cross-validation (LOOCV) involves using a single observation from the original sample as the validation data, and the remaining observations as the training data, i.e., K = n-fold cross-validation. Leaving out more than one observation at a time leads to bootstrapping and jackknifing.
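
For LOOCV you can run the loop above with k = n, or use cv.glm() from R's boot package, whose K argument defaults to n; the model below is again an illustrative choice:

```r
# Leave-one-out CV via boot::cv.glm (K defaults to n, i.e. LOOCV)
library(boot)
fit <- glm(mpg ~ wt + hp, data = mtcars)  # gaussian glm, equivalent to lm
loo <- cv.glm(mtcars, fit)                # K = nrow(mtcars) by default
loo$delta[1]                              # raw LOOCV estimate of prediction MSE
```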

boot(strapping) Generate replicates of a statistic applied to data (parametric and nonparametric). For the nonparametric bootstrap, possible methods are: the ordinary bootstrap, the balanced bootstrap, antithetic resampling, and permutation. For nonparametric multi-sample problems, stratified resampling is used; this is specified by including a vector of strata in the call to boot. Importance resampling weights may also be specified.
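
This slide describes R's boot package; its sim argument selects among the ordinary, balanced, antithetic, and permutation methods named above. A small sketch, bootstrapping the median of mtcars$mpg (an illustrative statistic) with R = 1000 replicates:

```r
# Nonparametric bootstrap of the median with the boot package
library(boot)
med <- function(x, idx) median(x[idx])  # statistic gets data + resample indices
b <- boot(mtcars$mpg, statistic = med, R = 1000, sim = "ordinary")
b                          # prints bootstrap estimates of bias and std. error
boot.ci(b, type = "perc")  # percentile confidence interval
```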

Jackknifing Systematically recompute the statistic estimate, leaving out one or more observations at a time from the sample set. From this new set of replicates of the statistic, an estimate of the bias and an estimate of the variance of the statistic can be calculated. Often log(variance) is used [instead of variance], especially for non-normal distributions.
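
A hand-rolled jackknife sketch in base R, using the mean of mtcars$mpg as an illustrative statistic, with the usual leave-one-out bias and standard-error formulas:

```r
# Jackknife estimates of bias and standard error for the sample mean
x <- mtcars$mpg
n <- length(x)
theta_hat <- mean(x)                             # statistic on the full sample
theta_i <- sapply(1:n, function(i) mean(x[-i]))  # leave-one-out replicates
bias <- (n - 1) * (mean(theta_i) - theta_hat)    # jackknife bias estimate
se   <- sqrt((n - 1) / n * sum((theta_i - mean(theta_i))^2))  # jackknife std. error
c(estimate = theta_hat, jackknife_bias = bias, jackknife_se = se)
```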

Repeated random subsample Randomly split the dataset into training and validation data. For each such split, the model is fit to the training data, and predictive accuracy is assessed using the validation data. Results are then averaged over the splits. Note: for this method the results will vary if the analysis is repeated with different random splits.
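
A minimal sketch of repeated random sub-sampling in base R; the 70/30 split ratio, the 100 repeats, and the lm() model are all illustrative choices:

```r
# Repeated random sub-sampling (Monte Carlo CV): 100 random 70/30 splits
set.seed(1)
reps <- 100
mse <- replicate(reps, {
  idx  <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))  # random split
  fit  <- lm(mpg ~ wt + hp, data = mtcars[idx, ])   # fit on the training split
  pred <- predict(fit, newdata = mtcars[-idx, ])    # predict the validation split
  mean((mtcars$mpg[-idx] - pred)^2)
})
mean(mse)  # average predictive error over the random splits
```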

Advantage? The advantage of k-fold over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once; 10-fold cross-validation is commonly used. The advantage of repeated random sub-sampling over k-fold cross-validation is that the proportion of the training/validation split does not depend on the number of iterations (folds).

Disadvantage The disadvantage of repeated random sub-sampling is that some observations may never be selected in the validation subsample, whereas others may be selected more than once; i.e., validation subsets may overlap.

Assignment 6 Your term projects should fall within the scope of a data analytics problem of the type you have worked with in class/labs, or know of yourself; the bigger the data the better. This means that the work must go beyond just making lots of figures. You should develop the project to show you are thinking about and exploring the relationships and distributions within your data. Start with a hypothesis, think of a way to model and test the hypothesis, find or collect the necessary data, and do preliminary analysis, detailed modeling, and a summary (interpretation). 6000-level students must develop at least two types of models. Note: you do not have to come up with a positive result; i.e., disproving the hypothesis is just as good. Grading (percentages may change): Introduction (2%), Data Description (3%), Analysis (5%), Model Development (12%), Conclusions and Discussion (3%), Oral presentation (5%) (~5 mins).