Lecture 02: Linear Regression

CS489/698: Intro to ML. Lecture 02: Linear Regression. Yao-Liang Yu, 9/14/17.

"I'd rather die than tell you my password!" "Transfer success!"

Outline: Announcements; Linear Regression; Regularization; Cross-validation

Announcements: Assignment 1 is out, due in two weeks. TA office hour? Enrollment: CS698 permission numbers have been sent; CS489 has ~10 seats available on Quest, ask the CS advisors!

Outline: Announcements; Linear Regression; Regularization; Cross-validation

How much should I bid? Interpolation vs. extrapolation. Linear vs. nonlinear.

Regression: given a pair (X, Y), find a function f such that Y ≈ f(X). X: feature vector, a d-dimensional real vector. Y: response, an m-dimensional real vector (say m = 1). Two problems: (X, Y) is uncertain, i.e., samples come from an unknown distribution; and we need a loss function to measure the error.

Risk minimization: minimize the expected loss, aka the risk, E[l(f(X), Y)]. Which loss to use? Not always clear; convenience dominates for now. Least squares: l(f(X), Y) = ||f(X) - Y||^2.

The regression function: under the squared loss, the risk minimizer is the conditional mean m(x) = E[Y | X = x]; the leftover risk E[Var(Y | X)] is the inherent noise variance. There are many ways to estimate m(X). Simplest: assume it is linear (affine)!
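A one-step derivation, not on the slide, of why the conditional mean m(x) = E[Y | X = x] minimizes the least-squares risk:

```latex
% For any predictor f, add and subtract m(X), then expand the square;
% the cross term vanishes because E[m(X) - Y | X] = 0.
\[
\begin{aligned}
\mathbb{E}\big[(f(X) - Y)^2\big]
  &= \mathbb{E}\big[(f(X) - m(X) + m(X) - Y)^2\big] \\
  &= \underbrace{\mathbb{E}\big[(f(X) - m(X))^2\big]}_{\text{choice of } f}
   + \underbrace{\mathbb{E}\big[\operatorname{Var}(Y \mid X)\big]}_{\text{inherent noise}}.
\end{aligned}
\]
% The first term is minimized (to zero) by f = m; the second is
% irreducible: no predictor can remove it.
```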

Linear regression. Assumption: f(x) = Wx + b, affine in x. Dream: minimize the true risk E||WX + b - Y||^2. Law of Large Numbers: the sample average (1/n) Σ_i ||W x_i + b - y_i||^2 converges to the risk as n grows. Reality: the distribution is unknown, so we minimize this empirical risk instead.

Simplification, again: append a constant 1 to every x so the offset b is absorbed into W, i.e., x ← (x, 1).

Finally: minimize over W the sum of squared residuals ||XW - Y||_F^2, where the rows of X collect the data points, Y collects the true responses, and the fitted hyperplane (again!) is parameterized by W.
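As a concrete sketch of this objective, here is a minimal NumPy example on synthetic data (the data and variable names are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n points in d dimensions with noisy linear responses.
n, d = 100, 3
X = rng.standard_normal((n, d))
X = np.hstack([X, np.ones((n, 1))])      # append the constant 1 (see above)
w_true = rng.standard_normal(d + 1)
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Minimize the sum of squared residuals ||Xw - y||^2; lstsq returns the
# minimum-norm minimizer even when X^T X is singular.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("empirical risk:", np.mean((X @ w - y) ** 2))
```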

Why least squares? Theorem (Sondermann'86; Friedland and Torokhti'07; Yu and Schuurmans'11): among all minimizers of min_W ||AWB - C||_F, W = A^+ C B^+ is the one with minimal F-norm. The pseudo-inverse A^+ is the unique matrix G such that AGA = A, GAG = G, (AG)^T = AG, (GA)^T = GA. Via the singular value decomposition A = USV^T, we get A^+ = V S^+ U^T, where S^+ inverts the nonzero singular values and leaves the zeros as zeros.
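A small sketch of the SVD route to the pseudo-inverse; the matrix A here is a made-up rank-deficient example, and NumPy's np.linalg.pinv computes the same quantity:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])               # rank 1: not (pseudo-)trivially invertible

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U S V^T

# S^+: invert the nonzero singular values, keep the zeros as zeros.
tol = 1e-10
s_pinv = np.where(s > tol, 1.0 / np.maximum(s, tol), 0.0)
A_pinv = Vt.T @ np.diag(s_pinv) @ U.T    # A^+ = V S^+ U^T

print(np.allclose(A_pinv, np.linalg.pinv(A)))      # True
```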

Optimization detour. Fermat's theorem: at a minimizer x, necessarily Df(x) = 0, where Df(x) is the (Fréchet) derivative at x. Example: f(x) = x^T A x + x^T b + c has Df(x) = (A + A^T)x + b.

Solving least squares. Setting the derivative 2 X^T (XW - Y) to zero gives the normal equation X^T X W = X^T Y. X^T X may not be invertible, but this system always has a solution. Even when it is invertible, never ever compute W = (X^T X)^{-1} X^T Y explicitly! Instead, solve the linear system.
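A sketch contrasting the two options on synthetic data (assuming X^T X is invertible; otherwise fall back to lstsq):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(50)

# Good: solve the normal equation X^T X w = X^T y as a linear system
# (an LU factorization under the hood; assumes X^T X is invertible here,
# otherwise use np.linalg.lstsq as before).
w = np.linalg.solve(X.T @ X, X.T @ y)

# Bad: forming the explicit inverse is slower and numerically unstable.
# w_bad = np.linalg.inv(X.T @ X) @ X.T @ y
```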

Prediction: once we have W, we can predict y_hat = W^T x on new inputs. How do we evaluate? Sometimes we evaluate using a different loss than the one we trained with, which leads to a beautiful theory of calibration.
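To make the "evaluate with a different loss" point concrete, a small sketch that trains with the squared loss but reports both squared and absolute test error (the split sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3 * rng.standard_normal(200)

# Illustrative 75/25 split into training and test sets.
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)   # train with squared loss
y_hat = X_te @ w

print("test MSE:", np.mean((y_hat - y_te) ** 2))  # same loss as training
print("test MAE:", np.mean(np.abs(y_hat - y_te))) # a different loss
```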

Robustness: the squared loss penalizes large residuals heavily, so a single outlier can drag the least-squares fit far away.

Gauss vs. Laplace: least squares is maximum-likelihood estimation under Gaussian noise; the more outlier-robust absolute loss corresponds to Laplace noise.
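A rough demonstration of the contrast, fitting the Laplace (absolute-loss) model with a basic iteratively reweighted least squares loop; the data, outlier, and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 30)
X = np.column_stack([x, np.ones_like(x)])   # slope and intercept
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(30)
y[-1] += 10.0                               # one gross outlier

# Gaussian noise model -> least squares (dragged by the outlier).
w_l2, *_ = np.linalg.lstsq(X, y, rcond=None)

# Laplace noise model -> least absolute deviations via IRLS:
# repeatedly solve a weighted least-squares problem with weights 1/|residual|.
w_l1 = w_l2.copy()
for _ in range(50):
    r = np.abs(X @ w_l1 - y)
    wts = 1.0 / np.maximum(r, 1e-8)
    Xw = X * wts[:, None]
    w_l1 = np.linalg.solve(X.T @ Xw, Xw.T @ y)

print("L2 slope:", w_l2[0])   # noticeably off from 2
print("L1 slope:", w_l1[0])   # close to 2
```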

Multi-task learning: everything we've shown still holds if Y is m-dimensional, but then each column of Y can be solved independently. Things become more interesting once we add regularization.
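A quick check of the column-independence claim on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 5))
Y = rng.standard_normal((60, 3))            # m = 3 response columns

W_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)      # all tasks at once

W_cols = np.column_stack(                            # one column at a time
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(3)])

print(np.allclose(W_joint, W_cols))                  # True
```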

Outline: Announcements; Linear Regression; Regularization; Cross-validation

Ill-posedness: let x1 = 0, x2 = ε, y1 = 1, y2 = -1. With the constant feature appended, X = [0 1; ε 1], y = (1, -1)^T, and w = X^{-1} y = (-2/ε, 1)^T. As ε → 0, a slight perturbation of the input leads to chaotic behaviour in the solution.
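The blow-up is easy to reproduce (this snippet assumes the matrix form reconstructed above, with the constant feature appended):

```python
import numpy as np

def fit(eps):
    # Two points (x=0, y=1) and (x=eps, y=-1), constant feature appended.
    X = np.array([[0.0, 1.0],
                  [eps, 1.0]])
    y = np.array([1.0, -1.0])
    return np.linalg.solve(X, y)            # w = X^{-1} y

print(fit(1e-2))   # [-2.e+02  1.e+00]
print(fit(1e-6))   # [-2.e+06  1.e+00]  -- tiny input change, huge output change
```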

Tikhonov regularization (Hoerl and Kennard'70), aka ridge regression: add a penalty λ||W||_F^2 to the least-squares objective, with regularization constant (hyperparameter) λ > 0. With positive λ, a slight perturbation of the input leads to a proportional (w.r.t. 1/λ) perturbation of the output.
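A minimal ridge solver following the modified normal equation (X^T X + λI)W = X^T Y; reusing the ill-posed example above shows the weights stay bounded:

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The ill-posed example from the previous slide, now well behaved:
X = np.array([[0.0, 1.0], [1e-6, 1.0]])
y = np.array([1.0, -1.0])
print(ridge(X, y, 0.1))    # bounded weights, no blow-up
```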

Data augmentation: ridge regression is ordinary least squares on an augmented data set, obtained by appending √λ I as extra rows of X and zeros as extra rows of Y.
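A numerical check of this equivalence on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 4))
y = rng.standard_normal(40)
lam = 0.5

# Ridge via the modified normal equation ...
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# ... equals ordinary least squares on the augmented data.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(4)])
y_aug = np.concatenate([y, np.zeros(4)])
w_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.allclose(w_ridge, w_aug))          # True
```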

Sparsity: the ridge-regression weight is always dense, which is computationally heavy and cumbersome to interpret. The lasso (Tibshirani'96) instead penalizes λ||w||_1, which drives many weights exactly to zero.
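The lasso has no closed form; one standard solver is proximal gradient descent (ISTA). A minimal sketch for the objective (1/2)||Xw - y||^2 + λ||w||_1, with illustrative data and hyperparameters:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via iterative soft-thresholding (proximal gradient / ISTA)."""
    L = np.linalg.norm(X, 2) ** 2           # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = X.T @ (X @ w - y)               # gradient of the smooth part
        z = w - g / L                       # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -3.0, 1.5]               # sparse ground truth
y = X @ w_true + 0.1 * rng.standard_normal(100)

w = lasso_ista(X, y, lam=5.0)
print("nonzero indices:", np.flatnonzero(w))   # (mostly) the first three
```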

Regularization vs. constraint: min f(w) + λ g(w) is computationally appealing; min f(w) subject to g(w) ≤ s is theoretically appealing. A regularized solution always solves some constrained problem; under mild conditions the converse holds as well.

Outline: Announcements; Linear Regression; Regularization; Cross-validation

Cross-validation: split the training set into k folds 1, 2, …, k, keeping the test set aside; each fold takes a turn as the validation set.

Cross-validation: hold out fold 1, train on the rest. For each λ, record perf_1.

Cross-validation: rotate, holding out fold 2. For each λ, accumulate perf_1 + perf_2.

Cross-validation: continue through fold k. For each λ, accumulate perf_1 + perf_2 + … + perf_k.

Cross-validation: for each λ, perf(λ) = perf_1 + perf_2 + … + perf_k; pick λ* = argmax_λ perf(λ), then retrain on the full training set to obtain W_{λ*}.
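Putting the whole loop together, a sketch of k-fold cross-validation for the ridge parameter (the fold handling and negative-MSE score here are one reasonable choice, not the lecture's prescription):

```python
import numpy as np

def cv_select_lambda(X, y, lambdas, k=5, seed=7):
    """k-fold cross-validation for the ridge parameter lambda."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    best_lam, best_perf = None, -np.inf
    for lam in lambdas:
        perf = 0.0
        for i in range(k):
            val = folds[i]                                # validation fold
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(d),
                                X[tr].T @ y[tr])          # ridge on k-1 folds
            perf += -np.mean((X[val] @ w - y[val]) ** 2)  # higher is better
        if perf > best_perf:
            best_lam, best_perf = lam, perf
    return best_lam   # lambda* = argmax_lambda perf(lambda)

rng = np.random.default_rng(8)
X = rng.standard_normal((120, 6))
y = X @ rng.standard_normal(6) + 0.2 * rng.standard_normal(120)
print(cv_select_lambda(X, y, [0.01, 0.1, 1.0, 10.0]))
```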

Questions?