Overfitting and Underfitting

Presentation transcript:

Overfitting and Underfitting Geoff Hulten

Don't expect your favorite learner to always be best. Try different approaches and compare.

Bias and Variance
Bias – error caused because the model cannot represent the concept.
Variance – error caused because the learning algorithm overreacts to small changes (noise) in the training data.
TotalLoss = Bias + Variance (+ noise)
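To make these definitions concrete, here is a minimal Python sketch (the quadratic concept, noise level, and sample sizes are my own choices, not from the slides) that estimates the bias and variance of a linear model by repeatedly resampling training data from a known concept:

import numpy as np

rng = np.random.default_rng(0)

def true_concept(x):
    # The concept we want the model to match (quadratic, so a line cannot represent it).
    return 1.5 * x ** 2

def sample_training_data(n=20, noise=0.3):
    x = rng.uniform(-1, 1, n)
    y = true_concept(x) + rng.normal(0, noise, n)
    return x, y

x_test = np.linspace(-1, 1, 200)
predictions = []
for _ in range(500):                        # many independent training sets
    x, y = sample_training_data()
    w1, w0 = np.polyfit(x, y, deg=1)        # fit a linear model
    predictions.append(w1 * x_test + w0)
predictions = np.array(predictions)

mean_prediction = predictions.mean(axis=0)
bias_sq = np.mean((mean_prediction - true_concept(x_test)) ** 2)  # error from limited representation
variance = np.mean(predictions.var(axis=0))                       # error from sensitivity to the sample
print(f"bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}")

The bias term comes from the line's inability to represent the curve; refitting on each new sample also moves the line around, which shows up as the variance term.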

Visualizing Bias. Goal: produce a model that matches the true concept.

Visualizing Bias. Goal: produce a model that matches this concept, given training data sampled from it.

Visualizing Bias. Fit a linear model to the training data. The linear model can't represent the concept, so the regions where it predicts + and - don't match the true concept: these are bias mistakes.

Visualizing Variance. Draw new training data and fit a linear model again: the new model makes a different set of bias mistakes.

Visualizing Variance. New data, new model; new data, new model… the mistakes vary from fit to fit. Variance is this sensitivity to changes and noise in the training data.

Another way to think about Bias & Variance

Bias and Variance: More Powerful Model. A powerful model can represent complex concepts; it fits this training data with no mistakes.

Bias and Variance: More Powerful Model. But get more data and the complex fit no longer holds up. Not good!

Overfitting vs Underfitting
Overfitting: fitting the data too well.
  Features are noisy / uncorrelated to the concept
  Modeling process very sensitive (powerful)
  Too much search
Underfitting: learning too little of the true concept.
  Features don't capture the concept
  Too much bias in the model
  Too little search to fit the model

The Effect of Noise

The Effect of Features
With a feature like $x_2$ that carries little information, the model won't learn the concept well, and a powerful model gives high variance.
Throw out $x_2$ and add a new feature $x_3$ that captures the concept: a simple model gets low bias, and even a powerful model has low variance.

The Power of a Model Building Process
Weaker modeling process (higher bias):
  Simple model (e.g. linear)
  Fixed-size model (e.g. fixed # of weights)
  Small feature set (e.g. top 10 tokens)
  Constrained search (e.g. few iterations of gradient descent)
More powerful modeling process (higher variance):
  Complex model (e.g. high-order polynomial)
  Scalable model (e.g. decision tree)
  Large feature set (e.g. every token in the data)
  Unconstrained search (e.g. exhaustive search)

Example of Under/Over-fitting
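The slide's example appears to be a figure that doesn't survive transcription; as a stand-in, here is a small Python sketch (the sine concept, noise level, and polynomial degrees are arbitrary choices of mine) showing underfitting at low degree and overfitting at high degree:

import numpy as np

rng = np.random.default_rng(1)

# A noisy sample from a smooth concept.
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 15)

x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)         # noise-free targets for evaluation

for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Degree 1 underfits (both errors high); degree 12 overfits (tiny train error, much larger test error).
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")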

Ways to Control Decision Tree Learning
Increase minToSplit
Increase minGainToSplit
Limit the total number of nodes
Penalize complexity: $Loss(S) = \sum_{i}^{n} Loss(\hat{y}_i, y_i) + \alpha \log_2(\#Nodes)$
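If you work in scikit-learn rather than the course's own tree implementation, roughly analogous knobs exist; the mapping below is my approximation (min_samples_split ≈ minToSplit, min_impurity_decrease ≈ minGainToSplit, max_leaf_nodes caps tree size, and ccp_alpha penalizes complexity, though via cost-complexity pruning rather than the α·log₂(#Nodes) term above):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each setting restrains the tree in a different way; stricter values mean a simpler tree
# (more bias, less variance).
tree = DecisionTreeClassifier(
    min_samples_split=10,        # don't split nodes with fewer than 10 examples
    min_impurity_decrease=0.01,  # require a minimum gain before splitting
    max_leaf_nodes=32,           # cap the total tree size
    ccp_alpha=0.001,             # cost-complexity pruning penalty
    random_state=0,
)
print(cross_val_score(tree, X, y, cv=5).mean())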

Ways to Control Logistic Regression
Adjust the step size
Adjust the iterations / stopping criteria of gradient descent
Regularization: $Loss(S) = \sum_{i}^{n} Loss(\hat{y}_i, y_i) + \alpha \sum_{j}^{\#Weights} |w_j|$
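A minimal numpy sketch of those three knobs together (the synthetic data, step size, iteration count, and α are arbitrary values for illustration, and the |w_j| penalty is handled with a plain subgradient):

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = (1 / (1 + np.exp(-(X @ true_w))) > rng.uniform(size=n)).astype(float)

step_size = 0.1      # knob 1: step size
iterations = 500     # knob 2: iterations / stopping criterion of gradient descent
alpha = 0.01         # knob 3: strength of the L1 regularization term

w = np.zeros(d)
for _ in range(iterations):
    p = 1 / (1 + np.exp(-(X @ w)))      # predicted probabilities
    grad = X.T @ (p - y) / n            # gradient of the average log loss
    grad += alpha * np.sign(w)          # subgradient of alpha * sum_j |w_j|
    w -= step_size * grad

print(np.round(w, 3))  # weights on the uninformative features are pushed toward zero

A smaller α lets the weights fit the data more closely (lower bias, higher variance); a larger α shrinks them toward zero (higher bias, lower variance).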

Modeling to Balance Under & Overfitting
Data
Learning algorithms
Feature sets
Complexity of the concept
Search and computation
Parameter sweeps!

Parameter Sweep
# optimize the first parameter
for p in [setting_certain_to_underfit, …, setting_certain_to_overfit]:
    # do cross validation to estimate accuracy
# find the setting that balances overfitting & underfitting
# optimize the second parameter
# etc…
# examine the parameters that seem best and adjust whatever you can…
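A concrete version of that outline, assuming scikit-learn is available and using a decision tree's min_samples_split as the (hypothetical) parameter being swept; the range of settings is mine, not from the slides:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# From a setting likely to overfit (split on as few as 2 examples)
# to one likely to underfit (require 200 examples to split).
settings = [2, 5, 10, 20, 50, 100, 200]
scores = {}
for p in settings:
    model = DecisionTreeClassifier(min_samples_split=p, random_state=0)
    scores[p] = cross_val_score(model, X, y, cv=5).mean()  # cross validation estimates accuracy

best = max(scores, key=scores.get)  # the setting that balances overfitting & underfitting
print(scores, "best:", best)
# Fix this parameter at its best value, then sweep the next one, and so on.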

Types of Parameter Sweeps
Optimize one parameter at a time: optimize one, update it, move on; iterate a few times.
Gradient descent on meta-parameters: start somewhere 'reasonable', then computationally calculate the gradient with respect to changes in the parameters.
Grid: try every combination of every parameter.
Quick vs full runs: an expensive parameter sweep (e.g. a grid) on a 'small' sample of the data, then a bit of iteration on the full data to refine.
Intuition & experience: learn your tools, learn your problem.
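If scikit-learn is available, the grid variant can be written directly with GridSearchCV (the parameter grid below is an arbitrary example, not one from the lecture):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Try every combination of every parameter in the grid, scoring each with cross validation.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"min_samples_split": [2, 10, 50], "max_leaf_nodes": [8, 32, 128]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)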

Summary of Overfitting and Underfitting
The bias/variance tradeoff is a primary challenge in machine learning.
Internalize: more powerful modeling is not always better.
Learn to identify overfitting and underfitting.
Tuning parameters & interpreting output correctly is key.