Validation
CS685: Special Topics in Data Mining, The University of Kentucky

Introduction
Validation addresses the problem of overfitting.
Internal validation: validate your model on your current data set (cross-validation).
External validation: validate your model on a completely new dataset.

Holdout validation
One way to validate your model is to fit it on half of your dataset (the "training set") and test it on the remaining half (the "test set"). If overfitting is present, the model will perform well on the training set but poorly on the test set. Of course, you "waste" half your data this way, and often you don't have enough data to spare.
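
A minimal sketch of this holdout scheme in Python with scikit-learn. The breast-cancer dataset, the logistic-regression model, and the 50/50 split are illustrative assumptions, not part of the original slides:

```python
# Holdout validation sketch. Assumes scikit-learn is installed; the dataset
# and model are illustrative stand-ins. test_size=0.5 mirrors the
# half-and-half split described above.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out half of the data as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))
print("test accuracy:    ", model.score(X_test, y_test))
```

A large gap between the training and test accuracy is the symptom of overfitting that the slide describes.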

Alternative strategies
Leave-one-out validation: leave one observation out at a time, fit the model on the remaining training data, and test on the held-out data point.
K-fold cross-validation: what we will discuss today.
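
A sketch of both strategies, using the same illustrative dataset and model as in the holdout example (again an assumption, not course material):

```python
# Leave-one-out and K-fold cross-validation sketch (assumes scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Leave-one-out: one observation is held out per fit, so n models are trained.
loo_acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
print("leave-one-out accuracy:", loo_acc)

# K-fold (here K = 5): each fold is held out exactly once.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
kfold_acc = cross_val_score(model, X, y, cv=kfold).mean()
print("5-fold accuracy:", kfold_acc)
```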

When is cross-validation used?
Any time you want to show that your model is not overfit and that it will predict well on new datasets.

[Figure: Division of data for cross-validation with disjoint test subsets.]

10-fold cross-validation (one example of K-fold cross-validation); see the sketch after this list.
1. Randomly divide your data into 10 pieces, 1 through 10.
2. Treat the 1st tenth of the data as the test dataset. Fit the model to the other nine tenths of the data (which are now the training data).
3. Apply the model to the test data.
4. Repeat this procedure for all 10 tenths of the data.
5. Calculate statistics of model accuracy and fit (e.g., ROC curves) from the test data only.
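
A sketch of these five steps, with the same illustrative dataset and model as before (assumptions, not from the slides). One design choice worth noting: StratifiedKFold is used here so that class proportions stay roughly constant across folds, a common refinement of the plain random division in step 1.

```python
# 10-fold cross-validation sketch following the five steps above (assumes
# scikit-learn). Out-of-fold predictions are pooled so that the accuracy and
# ROC statistics come from test data only, as step 5 prescribes.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Step 1: randomly divide the data into 10 folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Steps 2-4: for each fold, fit on the other nine tenths and predict the
# held-out tenth; cross_val_predict collects the out-of-fold predictions.
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

# Step 5: evaluate on the pooled test-fold predictions only.
print("10-fold accuracy:", accuracy_score(y, (proba >= 0.5).astype(int)))
print("10-fold ROC AUC: ", roc_auc_score(y, proba))
```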