Probabilistic Models for Linear Regression

Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya

Regression Problem
- $N$ i.i.d. training samples $\{(x_n, y_n)\}_{n=1}^N$
- Response / output / target: $y_n \in \mathbb{R}$
- Input / feature vector: $x_n \in \mathbb{R}^d$
- Linear regression: $y_n = w^T x_n + \epsilon_n$
- Polynomial regression: $y_n = w^T \phi(x_n) + \epsilon_n$, with basis functions $\phi_j(x) = x^j$
- Still a linear function of $w$
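As a concrete illustration of the basis-function view above, here is a minimal numpy sketch (not from the slides) that builds the polynomial design matrix with $\phi_j(x) = x^j$; the function name and example values are assumptions for illustration.

```python
import numpy as np

def polynomial_features(x, degree):
    """Map scalar inputs x to the polynomial basis phi_j(x) = x**j, j = 0..degree."""
    # Column j of the design matrix is x raised to the j-th power (column 0 is the bias).
    return np.vander(np.asarray(x), N=degree + 1, increasing=True)

# Example: 5 scalar inputs expanded to a cubic basis.
x = np.linspace(0.0, 1.0, 5)
Phi = polynomial_features(x, degree=3)   # shape (5, 4)
print(Phi.shape)
```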

Least Squares Formulation
- Treat the error term $\epsilon_n$ as deterministic
- Minimize the total squared error $E(w) = \sum_n \epsilon_n^2 = \sum_n (y_n - w^T x_n)^2$
- $w^* = \arg\min_w E(w)$
- Set the gradient with respect to $w$ to zero: $w^* = (X^T X)^{-1} X^T y$
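A minimal numpy sketch of the closed-form solution above; the synthetic data, noise level, and variable names are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N samples, d features, known weights plus small noise (illustrative only).
N, d = 100, 3
X = rng.normal(size=(N, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=N)

# Normal equation: w* = (X^T X)^{-1} X^T y, solved as a linear system
# rather than by forming the inverse explicitly.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)
```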

Regularization for Regression
- How does regression overfit?
- Add a regularizer to the data-fit term: minimize $E_1(w, D) + \lambda E_2(w)$

Regularization for Regression
- Possible regularizers $E_2(w)$:
  - $\ell_2$ norm $w^T w$ (ridge regression): quadratic, continuous, convex
    - Closed form: $w^* = (\lambda I + X^T X)^{-1} X^T y$
  - $\ell_1$ norm (lasso)
- Choosing $\lambda$: cross-validation, but it wastes training data ...
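A small sketch of ridge regression using the closed form $(\lambda I + X^T X)^{-1} X^T y$ from the slide; the synthetic data and the choice $\lambda = 0.1$ are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (lam * I + X^T X)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

# Illustrative usage on random data.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X @ np.array([0.5, 0.0, -1.0, 2.0]) + 0.1 * rng.normal(size=50)
print(ridge_fit(X, y, lam=0.1))
```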

Probabilistic Formulation
- Model $X$ and $Y$ as random variables; directly model the conditional distribution of $Y$ given $X$
- i.i.d.: $Y_n \mid X_n = x_n \sim_{iid} p(y \mid x)$
- Linear: $Y_n = w^T X_n + \epsilon_n$, with $\epsilon_n \sim_{iid} p(\epsilon)$
- Gaussian noise: $p(\epsilon) = N(0, \sigma^2)$, so
  $p(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(y - w^T x)^2}{2\sigma^2} \right\}$
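For concreteness, a small sketch (not from the slides) that evaluates the Gaussian conditional density $p(y \mid x)$ above for given $w$ and $\sigma$; all numeric values are made up.

```python
import numpy as np

def gaussian_cond_density(y, x, w, sigma):
    """p(y | x) = N(y; w^T x, sigma^2) under the linear-Gaussian noise model."""
    mean = w @ x
    return np.exp(-(y - mean) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

# Illustrative call with made-up values.
print(gaussian_cond_density(y=1.2, x=np.array([1.0, 0.5]), w=np.array([0.8, 0.4]), sigma=0.3))
```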

Probabilistic Formulation
[Figure: illustration of the linear-Gaussian model; image from Michael Jordan's book]

Maximum Likelihood Estimation
- Formulate the likelihood:
  $L(w) = \prod_n p(y_n \mid x_n; w) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_n (y_n - w^T x_n)^2 \right\}$
- Maximizing $L(w)$ is equivalent to minimizing $\ell(w) = \sum_n (y_n - w^T x_n)^2$: recovers the least-squares (LMS) formulation!
- Maximize to get the MLE:
  $w_{ML} = (X^T X)^{-1} X^T y$
  $\sigma^2_{ML} = \frac{1}{N} \sum_n (y_n - w_{ML}^T x_n)^2$
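A minimal sketch of the two maximum-likelihood estimates above, assuming numpy and synthetic data: $w_{ML}$ is just the least-squares solution and $\sigma^2_{ML}$ the mean squared residual.

```python
import numpy as np

def fit_mle(X, y):
    """Maximum-likelihood estimates for the linear-Gaussian regression model."""
    # w_ML solves the least-squares problem (X^T X)^{-1} X^T y.
    w_ml, *_ = np.linalg.lstsq(X, y, rcond=None)
    # sigma^2_ML is the average squared residual under w_ML.
    sigma2_ml = np.mean((y - X @ w_ml) ** 2)
    return w_ml, sigma2_ml

# Illustrative usage on synthetic data.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + 0.2 * rng.normal(size=200)
w_ml, sigma2_ml = fit_mle(X, y)
print(w_ml, sigma2_ml)
```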

Bayesian Linear Regression
- Model $w$ as a random variable with a Gaussian prior $p(w) = N(w; m_0, S_0)$, where $m_0$ is $M \times 1$ and $S_0$ is $M \times M$
- Derive the posterior distribution $p(w \mid y) = N(w; m_N, S_N)$ (for some $m_N$, $S_N$)
- Derive the mean of the posterior distribution: $w_B = E[W \mid y] = m_N$
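The slide leaves $m_N$ and $S_N$ unspecified; the sketch below uses the standard closed form for a Gaussian prior with Gaussian noise (as in Bishop's PRML), which is an assumption about what the slide intends rather than something stated on it.

```python
import numpy as np

def bayesian_posterior(X, y, m0, S0, sigma2):
    """Posterior N(m_N, S_N) over w for the linear-Gaussian model with prior N(m0, S0).
    Standard closed form:
      S_N^{-1} = S_0^{-1} + (1/sigma^2) X^T X
      m_N      = S_N (S_0^{-1} m_0 + (1/sigma^2) X^T y)
    """
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + (X.T @ X) / sigma2)
    mN = SN @ (S0_inv @ m0 + (X.T @ y) / sigma2)
    return mN, SN

# Illustrative usage with a zero-mean isotropic prior.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
y = X @ np.array([0.3, -0.7]) + 0.1 * rng.normal(size=50)
mN, SN = bayesian_posterior(X, y, m0=np.zeros(2), S0=np.eye(2), sigma2=0.01)
print(mN)
```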

Iterative Solutions for the Normal Equations
- Direct (closed-form) solutions have limitations
- First-order iterative method: gradient descent
  $w^{(t+1)} \leftarrow w^{(t)} + \rho \sum_n \left(y_n - w^{(t)T} x_n\right) x_n$
- Convergence guarantees:
  - Convergence in probability to the correct solution for an appropriately chosen fixed step size
  - Sure convergence with decreasing step sizes
- Stochastic gradient descent: update based on a single data point at each step; often converges faster (see the sketch below)
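A minimal sketch of both update rules above, assuming numpy; the step sizes, iteration counts, and synthetic data are illustrative choices, not values from the slides.

```python
import numpy as np

def gradient_descent(X, y, rho=0.005, n_iters=500):
    """Batch gradient descent for least squares: w <- w + rho * sum_n (y_n - w^T x_n) x_n."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w + rho * X.T @ (y - X @ w)
    return w

def sgd(X, y, rho=0.01, n_epochs=20, seed=0):
    """Stochastic gradient descent: update on one randomly chosen data point per step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for n in rng.permutation(X.shape[0]):
            w = w + rho * (y[n] - w @ X[n]) * X[n]
    return w

# Illustrative usage; both should approach the least-squares solution.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -0.5]) + 0.1 * rng.normal(size=100)
print(gradient_descent(X, y))
print(sgd(X, y))
```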

Advantages of Probabilistic Modeling
- Makes assumptions explicit
- Modularity: conceptually simple to change a model by swapping in appropriate distributions

Summary
- Probabilistic formulation of linear regression
- Recovers the least-squares formulation
- Iterative algorithms for training
- Forms of regularization