Probabilistic Models for Linear Regression
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
Regression Problem
- $N$ i.i.d. training samples $\{x_n, y_n\}$
- Response / output / target: $y_n \in \mathbb{R}$
- Input / feature vector: $x_n \in \mathbb{R}^d$
- Linear regression: $y_n = w^T x_n + \epsilon_n$
- Polynomial regression: $y_n = w^T \phi(x_n) + \epsilon_n$, with $\phi_j(x) = x^j$
- Still a linear function of $w$
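As a minimal sketch (NumPy, with made-up weights and a made-up input, purely for illustration), the polynomial case simply replaces the raw input with a feature vector $\phi(x)$; the prediction is still linear in $w$:

```python
import numpy as np

def polynomial_features(x, degree):
    """Feature map phi(x) = (1, x, x^2, ..., x^degree) for a scalar input x."""
    return np.array([x ** j for j in range(degree + 1)])

# Made-up weights and input, purely for illustration
w = np.array([0.5, -1.0, 2.0])           # weights of a degree-2 polynomial model
x = 1.5
y_hat = w @ polynomial_features(x, 2)     # prediction is still linear in w
print(y_hat)
```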
Least Squares Formulation
- Deterministic error term $\epsilon_n = y_n - w^T x_n$
- Minimize total error $E(w) = \sum_n \epsilon_n^2$
- $w^* = \arg\min_w E(w)$
- Take the gradient with respect to $w$ and set it to zero: $w^* = (X^T X)^{-1} X^T y$
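A small sketch of the closed-form solution, assuming a dense $(N, d)$ design matrix $X$ and response vector $y$ (the toy data below is made up for illustration):

```python
import numpy as np

def least_squares(X, y):
    """Closed-form least-squares estimate w* = (X^T X)^{-1} X^T y.

    X is the (N, d) design matrix and y the (N,) response vector.
    np.linalg.lstsq is used instead of an explicit inverse for numerical stability.
    """
    w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w_star

# Toy data, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)
print(least_squares(X, y))   # should be close to true_w
```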
Regularization for Regression
- How does regression overfit?
- Add a regularization term to the regression objective: minimize $E_1(w, D) + \lambda E_2(w)$, where $E_1$ measures fit to the data $D$ and $E_2$ penalizes model complexity
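A sketch of this objective with one concrete choice of terms (squared-error data fit and an $\ell_2$ penalty, i.e. the ridge case from the next slide); the slide itself leaves $E_1$ and $E_2$ generic:

```python
import numpy as np

def regularized_objective(w, X, y, lam):
    """E_1(w, D) + lambda * E_2(w), with squared-error data fit and an l2 penalty.

    One concrete instance; the slide leaves E_1 and E_2 generic.
    """
    data_fit = np.sum((y - X @ w) ** 2)   # E_1(w, D): fit to the data
    penalty = w @ w                        # E_2(w) = w^T w: complexity penalty
    return data_fit + lam * penalty
```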
Regularization for Regression
- Possibilities for regularizers:
  - $\ell_2$ norm $w^T w$ (ridge regression): quadratic, continuous, convex; closed-form solution $w^* = (\lambda I + X^T X)^{-1} X^T y$
  - $\ell_1$ norm (lasso)
- Choosing $\lambda$: cross-validation (wastes training data), …
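A minimal sketch of the ridge closed form; solving the linear system is used here instead of forming the inverse explicitly:

```python
import numpy as np

def ridge_solution(X, y, lam):
    """Ridge closed form: w* = (lambda * I + X^T X)^{-1} X^T y."""
    d = X.shape[1]
    # Solve the linear system rather than forming the inverse explicitly
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)
```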
Probabilistic Formulation
- Model $X$ and $Y$ as random variables; directly model the conditional distribution of $Y$
- IID: $Y_n \mid X_n = x \sim_{iid} p(y \mid x)$
- Linear: $Y_n = w^T X_n + \epsilon_n$, with $\epsilon_n \sim_{iid} p(\epsilon)$
- Gaussian noise: $p(\epsilon) = N(0, \sigma^2)$, so $p(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(y - w^T x)^2}{2\sigma^2}\right\}$
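A short sketch of this conditional density in log form (function name and arguments are illustrative, not from the slide):

```python
import numpy as np

def gaussian_log_density(y, x, w, sigma):
    """log p(y | x) under the model y = w^T x + eps, eps ~ N(0, sigma^2)."""
    mean = w @ x
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (y - mean) ** 2 / (2 * sigma ** 2)
```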
Probabilistic Formulation
[Figure: image from Michael Jordan's book]
Maximum Likelihood Estimation
- Formulate the likelihood: $L(w) = \prod_n p(y_n \mid x_n; w) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_n (y_n - w^T x_n)^2\right\}$
- Up to constants, maximizing the log-likelihood amounts to minimizing $\ell(w) = \sum_n (y_n - w^T x_n)^2$
- Recovers the LMS formulation!
- Maximize to get the MLE: $w_{ML} = (X^T X)^{-1} X^T y$, $\quad \sigma^2_{ML} = \frac{1}{N}\sum_n (y_n - w_{ML}^T x_n)^2$
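A sketch of both maximum-likelihood estimates, reusing the least-squares solver for $w_{ML}$ (the function name is illustrative):

```python
import numpy as np

def fit_mle(X, y):
    """Maximum-likelihood estimates under the Gaussian noise model.

    w_ML      = (X^T X)^{-1} X^T y       (identical to least squares)
    sigma2_ML = (1/N) sum_n (y_n - w_ML^T x_n)^2
    """
    w_ml, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ w_ml
    return w_ml, np.mean(residuals ** 2)
```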
Bayesian Linear Regression
- Model $W$ as a random variable with prior distribution $p(w) = N(w \mid m_0, S_0)$, where $w, m_0$ are $M \times 1$ and $S_0$ is $M \times M$
- Derive the posterior distribution $p(w \mid y) = N(w \mid m_N, S_N)$ (for some $m_N$, $S_N$)
- Derive the mean of the posterior distribution: $w_B = E[W \mid y] = m_N$
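The slide does not write out $m_N$ and $S_N$; the sketch below uses the standard Gaussian-conjugacy expressions for them, assuming a known noise variance $\sigma^2$:

```python
import numpy as np

def posterior(X, y, m0, S0, sigma2):
    """Posterior p(w | y) = N(m_N, S_N) for the Gaussian prior N(m_0, S_0)
    and Gaussian noise with variance sigma^2 (standard conjugacy result):

      S_N^{-1} = S_0^{-1} + (1 / sigma^2) X^T X
      m_N      = S_N (S_0^{-1} m_0 + (1 / sigma^2) X^T y)
    """
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + (X.T @ X) / sigma2)
    mN = SN @ (S0_inv @ m0 + X.T @ y / sigma2)
    return mN, SN   # the point estimate w_B = E[W | y] is m_N
```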
Iterative Solutions for Normal Equations
- Direct solutions have limitations
- Iterative solutions
- First-order method: gradient descent
  $w^{(t+1)} \leftarrow w^{(t)} + \rho \sum_n \left(y_n - w^{(t)T} x_n\right) x_n$
- Convergence guarantees:
  - Convergence in probability to the correct solution for an appropriate fixed step size
  - Sure convergence with decreasing step sizes
- Stochastic gradient descent: update based on a single data point at each step; often converges faster
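A sketch of both variants; the step size $\rho$ and iteration counts are arbitrary illustrative choices, and a fixed step size is used even though the slide notes that decreasing step sizes give the stronger guarantee:

```python
import numpy as np

def gradient_descent(X, y, rho=0.01, steps=1000):
    """Batch update: w <- w + rho * sum_n (y_n - w^T x_n) x_n."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w + rho * X.T @ (y - X @ w)
    return w

def sgd(X, y, rho=0.01, epochs=10, seed=0):
    """Stochastic variant: update from a single data point at each step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for n in rng.permutation(len(y)):
            w = w + rho * (y[n] - w @ X[n]) * X[n]
    return w
```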
Advantages of Probabilistic Modeling
- Makes assumptions explicit
- Modularity: conceptually simple to change a model by replacing its components with appropriate distributions
Summary
- Probabilistic formulation of linear regression
- Recovers the least squares formulation
- Iterative algorithms for training
- Forms of regularization