Lecture 02: Linear Regression

CS489/698: Intro to ML. Lecture 02: Linear Regression. Yao-Liang Yu, 9/14/17.

"I'd rather die than tell you my password!" "Transfer success!"

Outline: Announcements; Linear Regression; Regularization; Cross-validation

Announcements: Assignment 1 is out, due in two weeks. TA office hour? Enrollment: CS698 permission numbers have been sent; CS489 has ~10 seats available on Quest, ask the CS advisors!

Outline: Announcements; Linear Regression; Regularization; Cross-validation

How much should I bid? Interpolation vs. extrapolation. Linear vs. nonlinear.

Regression: given a pair (X, Y), find a function f such that Y ≈ f(X). X: feature vector, a d-dimensional real vector. Y: response, an m-dimensional real vector (say m = 1). Two problems: (X, Y) is uncertain, i.e., samples come from an unknown distribution; and we need a loss function to measure the error.

Risk minimization: minimize the expected loss, aka the risk, E[l(f(X), Y)]. Which loss to use? Not always clear; convenience dominates for now. Least squares: l(f(X), Y) = ||f(X) - Y||^2.

The regression function: under the squared loss, the risk minimizer is the conditional mean m(x) = E[Y | X = x]; the leftover risk E[Var(Y | X)] is the inherent noise variance. There are many ways to estimate m(X). Simplest: assume it is linear (affine)!
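A one-step derivation, not on the slide, of why the conditional mean m(x) = E[Y | X = x] minimizes the least-squares risk:

```latex
% For any predictor f, add and subtract m(X), then expand the square;
% the cross term vanishes because E[m(X) - Y | X] = 0.
\[
\begin{aligned}
\mathbb{E}\big[(f(X) - Y)^2\big]
  &= \mathbb{E}\big[(f(X) - m(X) + m(X) - Y)^2\big] \\
  &= \underbrace{\mathbb{E}\big[(f(X) - m(X))^2\big]}_{\text{choice of } f}
   + \underbrace{\mathbb{E}\big[\operatorname{Var}(Y \mid X)\big]}_{\text{inherent noise}}.
\end{aligned}
\]
% The first term is minimized (to zero) by f = m; the second is
% irreducible: no predictor can remove it.
```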

Linear regression. Assumption: f(x) = Wx + b, affine in x. Dream: minimize the true risk E||WX + b - Y||^2. Law of Large Numbers: the sample average (1/n) Σ_i ||W x_i + b - y_i||^2 converges to the risk as n grows. Reality: the distribution is unknown, so we minimize this empirical risk instead.

Simplification, again: append a constant 1 to every x so the offset b is absorbed into W, i.e., x ← (x, 1).

Finally: minimize over W the sum of squared residuals ||XW - Y||_F^2, where the rows of X collect the data points, Y collects the true responses, and the fitted hyperplane (again!) is parameterized by W.
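As a concrete sketch of this objective, here is a minimal NumPy example on synthetic data (the data and variable names are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n points in d dimensions with noisy linear responses.
n, d = 100, 3
X = rng.standard_normal((n, d))
X = np.hstack([X, np.ones((n, 1))])      # append the constant 1 (see above)
w_true = rng.standard_normal(d + 1)
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Minimize the sum of squared residuals ||Xw - y||^2; lstsq returns the
# minimum-norm minimizer even when X^T X is singular.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("empirical risk:", np.mean((X @ w - y) ** 2))
```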

Why least squares? Theorem (Sondermann'86; Friedland and Torokhti'07; Yu and Schuurmans'11): among all minimizers of min_W ||AWB - C||_F, W = A^+ C B^+ is the one with minimal F-norm. The pseudo-inverse A^+ is the unique matrix G such that AGA = A, GAG = G, (AG)^T = AG, (GA)^T = GA. Via the singular value decomposition A = USV^T, we get A^+ = V S^+ U^T, where S^+ inverts the nonzero singular values and leaves the zeros as zeros.
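A small sketch of the SVD route to the pseudo-inverse; the matrix A here is a made-up rank-deficient example, and NumPy's np.linalg.pinv computes the same quantity:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])               # rank 1: not (pseudo-)trivially invertible

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U S V^T

# S^+: invert the nonzero singular values, keep the zeros as zeros.
tol = 1e-10
s_pinv = np.where(s > tol, 1.0 / np.maximum(s, tol), 0.0)
A_pinv = Vt.T @ np.diag(s_pinv) @ U.T    # A^+ = V S^+ U^T

print(np.allclose(A_pinv, np.linalg.pinv(A)))      # True
```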

Optimization detour. Fermat's theorem: at a minimizer x, necessarily Df(x) = 0, where Df(x) is the (Fréchet) derivative at x. Example: f(x) = x^T A x + x^T b + c has Df(x) = (A + A^T)x + b.

Solving least squares. Setting the derivative 2 X^T (XW - Y) to zero gives the normal equation X^T X W = X^T Y. X^T X may not be invertible, but this system always has a solution. Even when it is invertible, never ever compute W = (X^T X)^{-1} X^T Y explicitly! Instead, solve the linear system.
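A sketch contrasting the two options on synthetic data (assuming X^T X is invertible; otherwise fall back to lstsq):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(50)

# Good: solve the normal equation X^T X w = X^T y as a linear system
# (an LU factorization under the hood; assumes X^T X is invertible here,
# otherwise use np.linalg.lstsq as before).
w = np.linalg.solve(X.T @ X, X.T @ y)

# Bad: forming the explicit inverse is slower and numerically unstable.
# w_bad = np.linalg.inv(X.T @ X) @ X.T @ y
```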

Prediction: once we have W, we can predict y_hat = W^T x on new inputs. How do we evaluate? Sometimes we evaluate using a different loss than the one we trained with, which leads to a beautiful theory of calibration.
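To make the "evaluate with a different loss" point concrete, a small sketch that trains with the squared loss but reports both squared and absolute test error (the split sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3 * rng.standard_normal(200)

# Illustrative 75/25 split into training and test sets.
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)   # train with squared loss
y_hat = X_te @ w

print("test MSE:", np.mean((y_hat - y_te) ** 2))  # same loss as training
print("test MAE:", np.mean(np.abs(y_hat - y_te))) # a different loss
```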

Robustness: the squared loss penalizes large residuals heavily, so a single outlier can drag the least-squares fit far away.

Gauss vs. Laplace: least squares is maximum-likelihood estimation under Gaussian noise; the more outlier-robust absolute loss corresponds to Laplace noise.
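A rough demonstration of the contrast, fitting the Laplace (absolute-loss) model with a basic iteratively reweighted least squares loop; the data, outlier, and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 30)
X = np.column_stack([x, np.ones_like(x)])   # slope and intercept
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(30)
y[-1] += 10.0                               # one gross outlier

# Gaussian noise model -> least squares (dragged by the outlier).
w_l2, *_ = np.linalg.lstsq(X, y, rcond=None)

# Laplace noise model -> least absolute deviations via IRLS:
# repeatedly solve a weighted least-squares problem with weights 1/|residual|.
w_l1 = w_l2.copy()
for _ in range(50):
    r = np.abs(X @ w_l1 - y)
    wts = 1.0 / np.maximum(r, 1e-8)
    Xw = X * wts[:, None]
    w_l1 = np.linalg.solve(X.T @ Xw, Xw.T @ y)

print("L2 slope:", w_l2[0])   # noticeably off from 2
print("L1 slope:", w_l1[0])   # close to 2
```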

Multi-task learning: everything we've shown still holds if Y is m-dimensional, but then each column of Y can be solved independently. Things become more interesting once we add regularization.
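A quick check of the column-independence claim on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 5))
Y = rng.standard_normal((60, 3))            # m = 3 response columns

W_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)      # all tasks at once

W_cols = np.column_stack(                            # one column at a time
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(3)])

print(np.allclose(W_joint, W_cols))                  # True
```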

Outline: Announcements; Linear Regression; Regularization; Cross-validation

Ill-posedness: let x1 = 0, x2 = ε, y1 = 1, y2 = -1. With the constant feature appended, X = [0 1; ε 1], y = (1, -1)^T, and w = X^{-1} y = (-2/ε, 1)^T. As ε → 0, a slight perturbation of the input leads to chaotic behaviour in the solution.
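The blow-up is easy to reproduce (this snippet assumes the matrix form reconstructed above, with the constant feature appended):

```python
import numpy as np

def fit(eps):
    # Two points (x=0, y=1) and (x=eps, y=-1), constant feature appended.
    X = np.array([[0.0, 1.0],
                  [eps, 1.0]])
    y = np.array([1.0, -1.0])
    return np.linalg.solve(X, y)            # w = X^{-1} y

print(fit(1e-2))   # [-2.e+02  1.e+00]
print(fit(1e-6))   # [-2.e+06  1.e+00]  -- tiny input change, huge output change
```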

Tikhonov regularization (Hoerl and Kennard'70), aka ridge regression: add a penalty λ||W||_F^2 to the least-squares objective, with regularization constant (hyperparameter) λ > 0. With positive λ, a slight perturbation of the input leads to a proportional (w.r.t. 1/λ) perturbation of the output.
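A minimal ridge solver following the modified normal equation (X^T X + λI)W = X^T Y; reusing the ill-posed example above shows the weights stay bounded:

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The ill-posed example from the previous slide, now well behaved:
X = np.array([[0.0, 1.0], [1e-6, 1.0]])
y = np.array([1.0, -1.0])
print(ridge(X, y, 0.1))    # bounded weights, no blow-up
```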

Data augmentation: ridge regression is ordinary least squares on an augmented data set, obtained by appending √λ I as extra rows of X and zeros as extra rows of Y.
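A numerical check of this equivalence on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 4))
y = rng.standard_normal(40)
lam = 0.5

# Ridge via the modified normal equation ...
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# ... equals ordinary least squares on the augmented data.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(4)])
y_aug = np.concatenate([y, np.zeros(4)])
w_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.allclose(w_ridge, w_aug))          # True
```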

Sparsity: the ridge-regression weight is always dense, which is computationally heavy and cumbersome to interpret. The lasso (Tibshirani'96) instead penalizes λ||w||_1, which drives many weights exactly to zero.
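The lasso has no closed form; one standard solver is proximal gradient descent (ISTA). A minimal sketch for the objective (1/2)||Xw - y||^2 + λ||w||_1, with illustrative data and hyperparameters:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via iterative soft-thresholding (proximal gradient / ISTA)."""
    L = np.linalg.norm(X, 2) ** 2           # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = X.T @ (X @ w - y)               # gradient of the smooth part
        z = w - g / L                       # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -3.0, 1.5]               # sparse ground truth
y = X @ w_true + 0.1 * rng.standard_normal(100)

w = lasso_ista(X, y, lam=5.0)
print("nonzero indices:", np.flatnonzero(w))   # (mostly) the first three
```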

Regularization vs. constraint: min f(w) + λ g(w) is computationally appealing; min f(w) subject to g(w) ≤ s is theoretically appealing. A regularized solution always solves some constrained problem; under mild conditions the converse holds as well.

Outline: Announcements; Linear Regression; Regularization; Cross-validation

Cross-validation: split the training set into k folds 1, 2, …, k, keeping the test set aside; each fold takes a turn as the validation set.

Cross-validation: hold out fold 1, train on the rest. For each λ, record perf_1.

Cross-validation: rotate, holding out fold 2. For each λ, accumulate perf_1 + perf_2.

Cross-validation: continue through fold k. For each λ, accumulate perf_1 + perf_2 + … + perf_k.

Cross-validation: for each λ, perf(λ) = perf_1 + perf_2 + … + perf_k; pick λ* = argmax_λ perf(λ), then retrain on the full training set to obtain W_{λ*}.
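Putting the whole loop together, a sketch of k-fold cross-validation for the ridge parameter (the fold handling and negative-MSE score here are one reasonable choice, not the lecture's prescription):

```python
import numpy as np

def cv_select_lambda(X, y, lambdas, k=5, seed=7):
    """k-fold cross-validation for the ridge parameter lambda."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    best_lam, best_perf = None, -np.inf
    for lam in lambdas:
        perf = 0.0
        for i in range(k):
            val = folds[i]                                # validation fold
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(d),
                                X[tr].T @ y[tr])          # ridge on k-1 folds
            perf += -np.mean((X[val] @ w - y[val]) ** 2)  # higher is better
        if perf > best_perf:
            best_lam, best_perf = lam, perf
    return best_lam   # lambda* = argmax_lambda perf(lambda)

rng = np.random.default_rng(8)
X = rng.standard_normal((120, 6))
y = X @ rng.standard_normal(6) + 0.2 * rng.standard_normal(120)
print(cv_select_lambda(X, y, [0.01, 0.1, 1.0, 10.0]))
```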

Questions?