Polynomial Curve Fitting BITS C464/BITS F464 Navneet Goyal Department of Computer Science, BITS-Pilani, Pilani Campus, India.


Polynomial Curve Fitting
- Seems a very trivial concept!!
- Why are we discussing it in a Machine Learning course?
- A simple regression problem!!
- It motivates a number of key concepts of ML!!
- Let's discover…

Polynomial Curve Fitting
- Observe a real-valued input variable x
- Use x to predict the value of a target variable t
- Synthetic data generated from sin(2πx)
- Random noise added to the target values
[Figure: training data, input variable x vs. target variable t]
Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006

Polynomial Curve Fitting
- N observations of x: x = (x_1, ..., x_N)^T with corresponding targets t = (t_1, ..., t_N)^T
- Goal is to exploit the training set to predict the value of t for a new value of x
- Inherently a difficult problem
- Data generation: N = 10 points, spaced uniformly in the range [0, 1], generated from sin(2πx) by adding small Gaussian noise (sketched in code below)
- Such noise is typical of real data, arising from unobserved variables
[Figure: the N = 10 training points, input variable x vs. target variable t]
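
A minimal sketch of this data-generation step in Python/NumPy. The function name make_data and the noise standard deviation of 0.3 are assumptions for illustration; the slides only say "small Gaussian noise".

```python
import numpy as np

def make_data(n, noise_std=0.3, seed=0):
    """n points: x spaced uniformly in [0, 1], t = sin(2*pi*x) plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2.0 * np.pi * x) + rng.normal(0.0, noise_std, size=n)
    return x, t

x_train, t_train = make_data(10)          # the N = 10 training points
x_test, t_test = make_data(100, seed=1)   # a separate test set, used later for model selection
```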

Polynomial Curve Fitting
- Fit a polynomial: y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M = Σ_{j=0}^{M} w_j x^j
- where M is the order of the polynomial
- Is a higher value of M better? We'll see shortly!
- The coefficients w_0, ..., w_M are denoted by the vector w
- y(x, w) is a nonlinear function of x, but a linear function of the coefficients w
- Such models are called linear models
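
To see that the model is linear in w, note that stacking the powers of x into a design matrix turns prediction into a matrix-vector product. A small sketch; design_matrix is a hypothetical helper name used in the later sketches as well.

```python
def design_matrix(x, M):
    """Phi[n, j] = x_n ** j for j = 0, ..., M, so that y(x, w) = Phi @ w is linear in w."""
    return np.vander(x, M + 1, increasing=True)

# Example: predictions of an order-3 polynomial with coefficients w are design_matrix(x, 3) @ w
```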

Sum-of-Squares Error Function
- Fit by minimizing the error E(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) - t_n }^2
- E(w) measures the misfit between the function y(x, w) and the training data; it is nonnegative and zero only if the curve passes exactly through every training point (a least-squares sketch follows below)
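
A sketch of fitting by minimizing this error in closed form with ordinary least squares, reusing the design_matrix helper above (the function names are illustrative, not from the slides).

```python
def sum_of_squares_error(w, x, t, M):
    """E(w) = 0.5 * sum_n (y(x_n, w) - t_n)**2."""
    y = design_matrix(x, M) @ w
    return 0.5 * np.sum((y - t) ** 2)

def fit_polynomial(x, t, M):
    """w* = argmin_w E(w), obtained via linear least squares on the design matrix."""
    Phi = design_matrix(x, M)
    w_star, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w_star
```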

Polynomial Curve Fitting
[Figure]

Polynomial Curve Fitting
- How do we choose M??
- This is called model selection or model comparison

0th Order Polynomial
- A poor representation of sin(2πx)

1st Order Polynomial
- A poor representation of sin(2πx)

3rd Order Polynomial
- The best fit to sin(2πx)

9th Order Polynomial
- Over-fit: a poor representation of sin(2πx)

Polynomial Curve Fitting
- Good generalization is the objective
- How does generalization performance depend on M?
- Consider a separate test set of 100 points
- Calculate E(w*) for both the training data and the test data for each choice of M
- Choose the M which minimizes the error on the test data
- Root-Mean-Square (RMS) error: E_RMS = sqrt( 2 E(w*) / N )
- Sometimes more convenient to use than E(w*): the division by N allows us to compare data sets of different sizes on an equal footing
- The square root ensures that E_RMS is measured on the same scale (and in the same units) as the target variable t
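
A sketch of this model-selection loop, reusing the helpers and the training/test data from the earlier sketches (rms_error is an illustrative name): for each M, fit on the 10 training points and report E_RMS on both sets.

```python
def rms_error(w, x, t, M):
    """E_RMS = sqrt(2 * E(w) / N): comparable across data-set sizes, same units as t."""
    return np.sqrt(2.0 * sum_of_squares_error(w, x, t, M) / len(x))

for M in range(10):
    w_star = fit_polynomial(x_train, t_train, M)
    print(f"M = {M}: train E_RMS = {rms_error(w_star, x_train, t_train, M):.3f}, "
          f"test E_RMS = {rms_error(w_star, x_test, t_test, M):.3f}")
```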

Flexibility & Model Complexity
- M = 0: very rigid!! Only 1 parameter to play with!

Flexibility & Model Complexity
- M = 1: not so rigid!! 2 parameters to play with!

Flexibility & Model Complexity
- So what value of M is most suitable?
- Any answers???

Over-fitting
- For small M (0, 1, 2): too inflexible to capture the oscillations of sin(2πx)
- For M = 3-8: flexible enough to capture the oscillations of sin(2πx)
- For M = 9: too flexible!! Training error (TE) = 0, but generalization error (GE) is high
- Why is this happening?

Polynomial Coefficients
[Table: the fitted coefficients w* for polynomials of increasing order M; the coefficient magnitudes grow rapidly as M increases]

Data Set Size
- M = 9
- The larger the data set, the more complex (flexible) the model we can afford to fit to the data
- Rough heuristic: the number of data points should be no less than some multiple (say 5 to 10) of the number of adaptive parameters in the model
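
To see the effect of data-set size, one can refit the M = 9 polynomial on progressively larger training sets generated in the same way and watch the test error fall. A sketch reusing the earlier helpers; the sizes 15 and 100 are illustrative choices.

```python
for n in (15, 100):
    x_n, t_n = make_data(n, seed=2)
    w_star = fit_polynomial(x_n, t_n, M=9)
    print(f"N = {n}: test E_RMS = {rms_error(w_star, x_test, t_test, 9):.3f}")
```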

Over-fitting Problem
- Should we have to limit the number of parameters according to the size of the available training set?
- The complexity of the model should depend only on the complexity of the problem!
- Least-squares estimation (LSE) is a specific case of maximum likelihood
- Over-fitting is a general property of maximum likelihood
- The over-fitting problem can be avoided using the Bayesian approach!

Over-fitting Problem
- In the Bayesian approach, the effective number of parameters adapts automatically to the size of the data set
- In the Bayesian approach, models can have more parameters than the number of data points

Regularization
- Penalize large coefficient values by adding a penalty term to the error function:
  Ẽ(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) - t_n }^2 + (λ/2) ||w||^2
- where ||w||^2 = w^T w = w_0^2 + w_1^2 + ... + w_M^2, and λ governs the relative importance of the regularization term (see the sketch below)
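
A sketch of the regularized (ridge) fit in closed form, w* = (λI + Φ^T Φ)^{-1} Φ^T t, reusing the design_matrix and rms_error helpers from above; the function name and the value of λ are illustrative.

```python
def fit_polynomial_regularized(x, t, M, lam):
    """Minimize 0.5 * sum_n (y(x_n, w) - t_n)**2 + 0.5 * lam * (w @ w) in closed form."""
    Phi = design_matrix(x, M)
    A = lam * np.eye(M + 1) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

w_reg = fit_polynomial_regularized(x_train, t_train, M=9, lam=1e-3)
print("test E_RMS =", rms_error(w_reg, x_test, t_test, 9))
```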

Regularization
[Figure: the regularized fit for one value of λ]

Regularization
[Figure: the regularized fit for another value of λ]

Regularization: E_RMS vs. λ
[Figure: RMS error on the training and test sets as the regularization parameter λ is varied]

Polynomial Coefficients
[Table: the fitted coefficients w* for the M = 9 polynomial with increasing regularization; regularization shrinks the coefficient magnitudes]

Take Away from Polynomial Curve Fitting
- The concept of over-fitting
- Model complexity & flexibility
- We will keep revisiting these ideas from time to time…