Introduction to Machine Learning


Introduction to Machine Learning. Prof. Eduardo Bezerra (CEFET/RJ), ebezerra@cefet-rj.br

Model Evaluation and Selection

Overview: Model Evaluation, Model Selection

Model Evaluation

Generalization error. Just because a hypothesis fits the training set well does not mean it is a good hypothesis: the training error (empirical error) will most likely be smaller than the generalization error. The generalization error (out-of-sample error) is the error we would get if we could evaluate the model over the entire population, i.e., over the whole underlying joint distribution.

Model Evaluation. A hypothesis may have a low training error but still perform poorly, due to overfitting. Therefore, an algorithm should be evaluated on unseen data. Model evaluation is the process of estimating the generalization error of an ML model.

Evaluation metrics: accuracy; precision, recall, and F1 measure; squared error; likelihood; posterior probability; cost/utility; margin; KL divergence; ...
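To make a few of these metrics concrete, here is a minimal Python sketch, assuming scikit-learn is available; the label and prediction arrays are made up purely for illustration:

```python
# Minimal sketch: computing a few of the metrics listed above with scikit-learn.
# y_true and y_pred are made-up labels, for illustration only.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error)

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))

# Squared error is the usual choice for regression targets.
y_true_reg = [2.5, 0.0, 2.1]
y_pred_reg = [3.0, -0.1, 2.0]
print("MSE      :", mean_squared_error(y_true_reg, y_pred_reg))
```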

Evaluation techniques. Common techniques for estimating the generalization performance of an ML model: the holdout method and k-fold cross-validation.

Holdout method (training/test). Randomly split the original dataset into separate training and test datasets: the training dataset is used to train a model M; the test dataset is used to estimate the generalization error of M. This can lead to a misleading estimate of the generalization error if the test data is also used for model selection.
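A minimal sketch of the holdout method with scikit-learn; the Iris dataset, the 80/20 split ratio, and the logistic regression model are illustrative choices:

```python
# Minimal sketch of the holdout (training/test) method.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Randomly split into training and test sets (80%/20% here, for illustration).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy on the held-out test set as an estimate of generalization performance.
print("test accuracy:", model.score(X_test, y_test))
```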

Holdout method (training/validation/test). A better way: separate the data into three parts. The training set is used to fit several models; performance on the validation set is used for model selection; the test set is used only to estimate the generalization error. Typical proportions are 70%/15%/15% and 60%/20%/20%.
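A minimal sketch of a 60%/20%/20% split, obtained with two successive calls to scikit-learn's train_test_split; the dataset and ratios are illustrative:

```python
# Minimal sketch of a training/validation/test split (60%/20%/20%).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off 60% for training, then split the remainder in half.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Fit candidate models on (X_train, y_train), compare them on (X_val, y_val),
# and use (X_test, y_test) only once, to report the final generalization estimate.
```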

Holdout method (training/validation/test). Figure source: Python Machine Learning, 2nd ed., p. 191.

k-fold cross-validation. Randomly split the training dataset into k folds without replacement; k-1 folds are used for model training and one fold is used for performance evaluation. The procedure is repeated k times, so that we obtain k models and k performance estimates. Reference: A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, Ron Kohavi, IJCAI, 14(12): 1137-1143, 1995.
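A minimal sketch of 10-fold cross-validation with scikit-learn; the Iris dataset and the logistic regression estimator are illustrative choices:

```python
# Minimal sketch of k-fold cross-validation (k = 10).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Random split into k folds, without replacement.
cv = KFold(n_splits=10, shuffle=True, random_state=1)

# k models are trained; each fold serves once as the evaluation set.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("per-fold accuracy:", scores)
print("mean / std       :", scores.mean(), scores.std())
```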

k-fold cross-validation. Figure source: Python Machine Learning, 2nd ed., p. 191.

k-fold cross-validation. Keep in mind that, as k increases, the bias in the estimate of the generalization error decreases, but the computational cost increases. Empirical evidence shows that k=10 is a good value for datasets of moderate size. For large datasets, k can be safely decreased.

Other techniques: stratified k-fold cross-validation, leave-one-out cross-validation, bootstrap validation. References: Improvements on Cross-Validation: The .632+ Bootstrap Method, B. Efron and R. Tibshirani, Journal of the American Statistical Association, 92(438): 548-560, 1997; A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, Ron Kohavi, IJCAI, 14(12): 1137-1143, 1995.
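A minimal sketch of these variants; the stratified and leave-one-out splitters come from scikit-learn, while the bootstrap resample is drawn by hand with NumPy. The dataset and classifier are illustrative choices:

```python
# Minimal sketch: stratified k-fold, leave-one-out, and a single bootstrap resample.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Stratified k-fold: each fold preserves the class proportions of the full dataset.
skf_scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=10))

# Leave-one-out: k equals the number of examples (expensive for large datasets).
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())

# One bootstrap resample: draw n indices with replacement;
# the out-of-bag examples serve as the evaluation set.
rng = np.random.default_rng(0)
n = len(X)
boot_idx = rng.integers(0, n, size=n)
oob_idx = np.setdiff1d(np.arange(n), boot_idx)
clf.fit(X[boot_idx], y[boot_idx])

print("stratified 10-fold:", skf_scores.mean())
print("leave-one-out     :", loo_scores.mean())
print("out-of-bag        :", clf.score(X[oob_idx], y[oob_idx]))
```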

Notation: m: training set size; {(x^(i), y^(i))}, i = 1..m: training dataset; m_test: test set size; {(x_test^(i), y_test^(i))}, i = 1..m_test: test dataset; J_test: error calculated on the test set (test error).

Model Selection

Model Selection. Hyperparameters are parameters that are not directly learned by the model. Model selection is the process of selecting optimal hyperparameter values for an ML algorithm. Examples: the degree of the polynomial in polynomial regression; the regularization term; the learning rate; many more...

Model Selection - example Given many models, we apply a systematic approach to identify the "best" model. The “best” model is chosen by using some quality measure (e.g., MSE in linear regression). Let us see an example in the context of polynomials with different degrees…

Model Selection - example (cont.) Suppose we want to choose (from a set of 10 possible options) the right degree to fit a polynomial regression.

Model Selection - example (cont.) To choose one of these models, we select the one with the smallest validation error.

Model Selection - example (cont.) Finally, we compute the test error of the polynomial that produced the smallest validation error.

Model selection - general procedure. For each candidate hyperparameter setting (e.g., each polynomial degree), optimize the model parameters Θ on the training set. Find Θ*, the setting with the smallest error on the validation set. Estimate the generalization error by computing the error of the final model on the test set.
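A minimal sketch of this procedure for the polynomial-regression example, on synthetic data; the sine-shaped target, the noise level, and the three-way split are illustrative assumptions:

```python
# Minimal sketch: pick the polynomial degree with the smallest validation error,
# then report the test error of that model only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)   # synthetic data

# 60%/20%/20% training/validation/test split.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_degree, best_val_mse, best_model = None, np.inf, None
for degree in range(1, 11):                               # the 10 candidate degrees
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)                           # optimize parameters on the training set
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    if val_mse < best_val_mse:                            # keep the smallest validation error
        best_degree, best_val_mse, best_model = degree, val_mse, model

# Generalization estimate: test error of the selected model only.
test_mse = mean_squared_error(y_test, best_model.predict(X_test))
print(f"chosen degree = {best_degree}, validation MSE = {best_val_mse:.3f}, test MSE = {test_mse:.3f}")
```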

Hyperparameter search. How do we search? There are two main approaches to searching the space of hyperparameters: grid search exhaustively considers all hyperparameter combinations for a given set of values; randomized search samples a given number of candidates from a hyperparameter space with a specified distribution.
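A minimal sketch of both strategies using scikit-learn's GridSearchCV and RandomizedSearchCV; the SVC estimator and its parameter ranges are illustrative choices:

```python
# Minimal sketch: grid search vs. randomized search over SVC hyperparameters.
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search: every combination of the listed values is tried.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
grid.fit(X, y)
print("grid search best params      :", grid.best_params_)

# Randomized search: a fixed number of candidates sampled from distributions.
rand = RandomizedSearchCV(SVC(),
                          {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)},
                          n_iter=20, cv=5, random_state=0)
rand.fit(X, y)
print("randomized search best params:", rand.best_params_)
```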
