Lecture 02: Linear Regression

1 Lecture 02: Linear Regression
CS489/698: Intro to ML Lecture 02: Linear Regression 9/14/17 Yao-Liang Yu

2 I’d rather die than tell you my password!
Transfer success!

3 Outline
Announcements, Linear Regression, Regularization, Cross-validation

4 Outline
Announcements, Linear Regression, Regularization, Cross-validation

5 Announcements
Assignment 1 is out; it is due in two weeks. TA office hour?
Enrollment: CS698 permission numbers have been sent. CS489 has ~10 seats available on Quest; ask the CS advisors!

6 Outline
Announcements, Linear Regression, Regularization, Cross-validation

7 How much should I bid?
Interpolation vs. extrapolation; linear vs. nonlinear

8 Regression
Given a pair (X, Y), find a function f such that $f(X) \approx Y$.
X: feature vector, a d-dimensional real vector. Y: response, an m-dimensional real vector (say m = 1).
Two problems: (X, Y) is uncertain, with samples drawn from an unknown distribution; and measuring the error requires a loss function.

9 Risk minimization
Minimize the expected loss, aka the risk.
Which loss to use? Not always clear; convenience dominates for now.
Least squares: $\ell(f(X), Y) = \|f(X) - Y\|^2$

10 The regression function
The regression function is $m(X) = \mathbb{E}[Y \mid X]$; what remains of the risk at f = m is the inherent noise variance.
There are many ways to estimate m(X). Simplest: let’s assume it is linear (affine)!
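The slide's own formula did not survive in this transcript. As a reconstruction, assuming the standard definition $m(X) = \mathbb{E}[Y \mid X]$, the least-squares risk splits into a part that depends on f and the inherent noise variance:

```latex
\begin{aligned}
\mathbb{E}\|f(X) - Y\|^2
  &= \mathbb{E}\|f(X) - m(X) + m(X) - Y\|^2 \\
  &= \mathbb{E}\|f(X) - m(X)\|^2 + \mathbb{E}\|m(X) - Y\|^2
     + 2\,\mathbb{E}\langle f(X) - m(X),\; m(X) - Y \rangle \\
  &= \mathbb{E}\|f(X) - m(X)\|^2
     + \underbrace{\mathbb{E}\|m(X) - Y\|^2}_{\text{inherent noise variance}}
\end{aligned}
```

The cross term vanishes because $\mathbb{E}[m(X) - Y \mid X] = 0$, so the second term is independent of f and the risk is minimized by f = m.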

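11 Linear regression
Assumption: the regression function is linear (affine) in X.
Dream: minimize the risk, i.e. the expected squared loss, but the distribution is unknown…
Law of Large Numbers: sample averages converge to expectations.
Reality: minimize the empirical risk, the average squared loss over the training samples.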
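A minimal sketch of the "dream vs. reality" gap, assuming a synthetic linear model whose true risk is known. The names `empirical_risk`, `sample`, `W_true`, and `noise_std` are hypothetical and not from the lecture. At the true parameter the risk equals the noise variance, and the empirical risk approaches it as n grows.

```python
import numpy as np

def empirical_risk(W, X, Y):
    """Average squared residual (1/n) * sum_i ||x_i W - y_i||^2.

    X has shape (n, d) with one sample per row, Y has shape (n, m), W has shape (d, m).
    """
    residuals = X @ W - Y
    return float(np.sum(residuals ** 2) / X.shape[0])

rng = np.random.default_rng(0)
d = 3
W_true = rng.normal(size=(d, 1))
noise_std = 0.5  # hypothetical noise level; the true risk at W_true is noise_std**2

def sample(n):
    """Draw n samples from the assumed model Y = X W_true + noise."""
    X = rng.normal(size=(n, d))
    Y = X @ W_true + noise_std * rng.normal(size=(n, 1))
    return X, Y

# Empirical risk of W_true for growing sample sizes (Law of Large Numbers).
risks = {n: empirical_risk(W_true, *sample(n)) for n in (10, 1000, 100000)}
```

With only 10 samples the estimate can be far from 0.25; with 100000 it is typically within a few thousandths.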

12 Simplification, again

13 Finally
Sum of squared residuals. True responses. Hyperplane (again!) parameterized by W.
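The objective on this slide can be sketched as follows. One assumption here, since the "Simplification" slide body is missing: the intercept is absorbed by appending a constant 1 to each feature vector, so the affine map becomes a hyperplane parameterized by a single W. The helper names are hypothetical.

```python
import numpy as np

def pad_ones(X):
    """Append a constant-1 column so an affine map x -> x w + b becomes linear in (w, b)."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def sum_squared_residuals(W, X, Y):
    """||X W - Y||_F^2: squared gaps between predictions X W and true responses Y."""
    return float(np.sum((X @ W - Y) ** 2))

X = np.array([[0.0], [1.0], [2.0]])
Y = np.array([[1.0], [3.0], [5.0]])   # generated exactly by y = 2x + 1
W = np.array([[2.0], [1.0]])          # slope and intercept stacked into one parameter
loss = sum_squared_residuals(W, pad_ones(X), Y)
```

Because the toy responses lie exactly on the line, the residuals are all zero for this W.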

14 Why least squares?
Theorem (Sondermann’86; Friedland and Torokhti’07; Yu and Schuurmans’11). Among all minimizers of $\min_W \|AWB - C\|_F$, $W = A^+ C B^+$ is the one that has minimal F-norm.
Pseudo-inverse: $A^+$ is the unique matrix $G$ such that $AGA = A$, $GAG = G$, $(AG)^\top = AG$, $(GA)^\top = GA$.
Singular Value Decomposition: $A = U S V^\top$, $A^+ = V S^{-1} U^\top$.
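The pseudo-inverse properties can be checked numerically. This is a sketch on random matrices, not part of the lecture; the shapes and the `1e-10` rank tolerance are arbitrary choices. It verifies the four conditions, rebuilds the pseudo-inverse from the SVD by inverting only the nonzero singular values, and forms $W = A^+ C B^+$.

```python
import numpy as np

rng = np.random.default_rng(0)
# A is 5 x 4 but has rank 3, so neither A^T A nor A A^T is invertible.
A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 4))
G = np.linalg.pinv(A, rcond=1e-10)

# The four defining conditions of the pseudo-inverse.
penrose = [
    np.allclose(A @ G @ A, A),
    np.allclose(G @ A @ G, G),
    np.allclose((A @ G).T, A @ G),
    np.allclose((G @ A).T, G @ A),
]

# Same matrix from the SVD A = U S V^T, keeping only nonzero singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
keep = s > 1e-10 * s.max()
G_svd = Vt[keep].T @ np.diag(1.0 / s[keep]) @ U[:, keep].T

# Candidate minimizer of ||A W B - C||_F from the theorem.
B = rng.normal(size=(2, 6))
C = rng.normal(size=(5, 6))
W = G @ C @ np.linalg.pinv(B)

# At any minimizer the gradient A^T (A W B - C) B^T must vanish.
gradient = A.T @ (A @ W @ B - C) @ B.T
```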

15 Optimization detour
Fermat’s Theorem. At a local minimizer x of a differentiable f, necessarily $Df(x) = 0$, where $Df(x)$ is the (Fréchet) derivative at x.
Example. $f(x) = x^\top A x + x^\top b + c$, $Df(x) = (A + A^\top)x + b$.
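The derivative in the example can be sanity-checked against central finite differences. This is a sketch with random A, b and an arbitrary constant c; `numerical_gradient` is a hypothetical helper. A is deliberately not symmetric, which is why the formula needs $A + A^\top$ rather than $2A$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))   # not symmetric on purpose
b = rng.normal(size=n)
c = 0.7

def f(x):
    """Quadratic from the slide: x^T A x + x^T b + c."""
    return x @ A @ x + x @ b + c

def Df(x):
    """Closed-form derivative from the slide: (A + A^T) x + b."""
    return (A + A.T) @ x + b

def numerical_gradient(func, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad

x0 = rng.normal(size=n)
max_error = float(np.max(np.abs(Df(x0) - numerical_gradient(f, x0))))
```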

16 Solving least squares
Normal equation: $X^\top X W = X^\top Y$.
$X^\top X$ may not be invertible, but there is always a solution.
Even if it is invertible, never ever compute $W = (X^\top X)^{-1} X^\top Y$! Instead, solve the linear system.
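A minimal sketch of "solve the linear system" in NumPy, on synthetic data (the sizes, `W_true`, and the noise level are assumptions for illustration). `np.linalg.solve` handles the invertible case without forming an inverse, and `np.linalg.lstsq` also covers the case where $X^\top X$ is singular, here provoked by duplicating a column.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
W_true = np.array([[1.0], [-2.0], [0.5]])
Y = X @ W_true + 0.01 * rng.normal(size=(n, 1))

# Solve (X^T X) W = X^T Y as a linear system; no explicit inverse is formed.
W_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# lstsq works on X directly and returns a minimum-norm least-squares solution.
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Duplicating a column makes X^T X singular, yet the normal equation is still solvable.
X_dup = np.hstack([X, X[:, :1]])
W_dup, *_ = np.linalg.lstsq(X_dup, Y, rcond=None)
normal_eq_holds = np.allclose(X_dup.T @ X_dup @ W_dup, X_dup.T @ Y)
```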

17 Prediction
Once we have W, we can predict.
How to evaluate? Sometimes we evaluate using a different loss, which leads to a beautiful theory of calibration.
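As an illustration of predicting and then evaluating with a loss other than the training loss, the sketch below fits by least squares and reports both mean squared error and mean absolute error on held-out data. The data, the split sizes, and the choice of absolute error as the alternative are assumptions, not taken from the lecture.

```python
import numpy as np

def predict(W, X_new):
    """Predicted responses for new feature rows."""
    return X_new @ W

def mean_squared_error(Y_hat, Y):
    return float(np.mean((Y_hat - Y) ** 2))

def mean_absolute_error(Y_hat, Y):
    return float(np.mean(np.abs(Y_hat - Y)))

rng = np.random.default_rng(1)
W_true = np.array([[2.0], [-1.0]])
X_train = rng.normal(size=(200, 2))
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(200, 1))
X_test = rng.normal(size=(100, 2))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(100, 1))

# Train with the squared loss ...
W_hat, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)
Y_hat = predict(W_hat, X_test)

# ... and evaluate on held-out data with the same loss and with a different one.
test_mse = mean_squared_error(Y_hat, Y_test)
test_mae = mean_absolute_error(Y_hat, Y_test)
```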

18 Robustness

19 Gauss vs. Laplace

20 Multi-task learning
Everything we’ve shown still holds if Y is m-dimensional, but then each column of Y can be solved independently.
Things are more interesting if we had regularization.
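The column-wise claim is easy to confirm numerically. In this sketch, with random data and arbitrary sizes, solving the normal equation once for the whole response matrix gives the same W as solving it separately for each column of Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 40, 5, 3
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, m))   # m responses ("tasks") per sample

# Joint solve for all m columns at once.
W_joint = np.linalg.solve(X.T @ X, X.T @ Y)

# One independent least-squares problem per column of Y.
W_columns = np.column_stack([
    np.linalg.solve(X.T @ X, X.T @ Y[:, j]) for j in range(m)
])
```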

21 Outline
Announcements, Linear Regression, Regularization, Cross-validation

22 Ill-posedness
Let $x_1 = 0$, $x_2 = \epsilon$, $y_1 = 1$, $y_2 = -1$, and solve $w = X^{-1} y$.
Slight perturbation leads to chaotic behaviour.
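The matrix entries are missing from this transcript, so the sketch below assumes each row of X is $(x_i, 1)$, i.e. a line with slope and intercept fitted through the two points; `fit_line` is a hypothetical helper. Under that assumption the slope is $-2/\epsilon$, which blows up as the two inputs get closer.

```python
import numpy as np

def fit_line(eps):
    """Fit slope and intercept through (0, 1) and (eps, -1) by solving X w = y."""
    X = np.array([[0.0, 1.0],
                  [eps, 1.0]])   # assumed layout: rows are (x_i, 1)
    y = np.array([1.0, -1.0])
    return np.linalg.solve(X, y)  # returns (slope, intercept)

# The closer the two inputs, the larger the fitted slope in magnitude.
weights = {eps: fit_line(eps) for eps in (1e-1, 1e-3, 1e-6)}
```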

23 Tikhonov regularization (Hoerl and Kennard’70)
Ridge regression, with regularization constant (hyperparameter) $\lambda$.
With positive $\lambda$, a slight perturbation in the input leads to a proportional (with respect to $1/\lambda$) perturbation in the output.
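A minimal ridge sketch, assuming the standard objective $\|Xw - y\|^2 + \lambda\|w\|^2$, whose minimizer solves $(X^\top X + \lambda I) w = X^\top y$. The two-point data and $\lambda = 0.1$ are illustrative choices. On this nearly singular problem the unregularized weights are huge, while the ridge weights stay small.

```python
import numpy as np

def ridge(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y, the stationarity condition of ridge regression."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Two almost identical inputs with opposite responses (rows are (x_i, 1)).
X = np.array([[0.0, 1.0],
              [1e-6, 1.0]])
y = np.array([1.0, -1.0])

w_unregularized = np.linalg.solve(X, y)
w_ridge = ridge(X, y, lam=0.1)

# Gradient of the ridge objective (up to a factor 2) at the ridge solution.
stationarity = X.T @ (X @ w_ridge - y) + 0.1 * w_ridge
```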

24 Data augmentation
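The slide body is missing, so the following is only one common reading of "data augmentation" in the ridge setting, stated as an assumption: ridge regression equals ordinary least squares on a data set extended with d artificial samples $\sqrt{\lambda}\, e_j$ whose responses are 0. The data and $\lambda$ below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 30, 4, 0.5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Ridge solution from the regularized normal equation.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Plain least squares on the augmented data [X; sqrt(lam) I], [y; 0].
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(d)])
y_aug = np.concatenate([y, np.zeros(d)])
w_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
```

The two agree because $\|X_{aug} w - y_{aug}\|^2 = \|Xw - y\|^2 + \lambda \|w\|^2$.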

25 Sparsity
Ridge regression weight is always dense: computationally heavy and interpretationally cumbersome.
Lasso (Tibshirani’96)
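To make the dense-versus-sparse contrast concrete, here is a sketch that solves the Lasso objective $\tfrac{1}{2}\|Xw - y\|^2 + \lambda\|w\|_1$ by proximal gradient descent (ISTA). The solver choice, the synthetic data, and $\lambda = 10$ are assumptions for illustration; the lecture does not specify an algorithm.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink towards zero, clip small entries to 0."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 by proximal gradient steps."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[[0, 3]] = [3.0, -2.0]   # only two features matter
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 10.0
w_lasso = lasso_ista(X, y, lam)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

n_zero_lasso = int(np.sum(w_lasso == 0.0))
n_zero_ridge = int(np.sum(w_ridge == 0.0))
```

Soft-thresholding produces exact zeros, which is where the sparsity comes from; the ridge solution shrinks every weight but leaves them nonzero.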

26 Regularization vs. Constraint
Regularized form: computationally appealing. Constrained form: theoretically appealing.
One direction of the equivalence is always true; the other holds under mild conditions.

27 Outline
Announcements, Linear Regression, Regularization, Cross-validation

28 Cross-validation
[Figure: folds 1, 2, 3, 4, 5, …, k-1, k; labels: Training set, Validation, Test set]

29 Cross-validation
For each $\lambda$: $\mathrm{perf}_1$

30 Cross-validation
For each $\lambda$: $\mathrm{perf}_1 + \mathrm{perf}_2$

31 Cross-validation
For each $\lambda$: $\mathrm{perf}_1 + \mathrm{perf}_2 + \dots + \mathrm{perf}_k$

32 Cross-validation
For each $\lambda$: $\mathrm{perf}(\lambda) = \mathrm{perf}_1 + \mathrm{perf}_2 + \dots + \mathrm{perf}_k$
$\lambda^* = \arg\max_\lambda \mathrm{perf}(\lambda)$, which gives $W_{\lambda^*}$
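The procedure on these slides can be sketched as k-fold cross-validation over a grid of $\lambda$ values for ridge regression. Several things here are assumptions: performance is taken to be the negative validation mean squared error so that larger is better (matching the argmax), and the data, the grid, and k = 5 are arbitrary. A separate held-out test set, as in the figure, is omitted for brevity.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge weights from (X^T X + lam * I) w = X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cross_validate(X, y, lambdas, k=5, seed=0):
    """Return (best lambda, {lambda: summed validation performance over k folds})."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    perf = {}
    for lam in lambdas:
        total = 0.0
        for i in range(k):
            val = folds[i]   # fold i validates, the other k-1 folds train
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            w = ridge(X[train], y[train], lam)
            total += -np.mean((X[val] @ w - y[val]) ** 2)   # perf_i
        perf[lam] = total    # perf(lambda) = perf_1 + ... + perf_k
    best = max(perf, key=perf.get)   # lambda* = argmax perf(lambda)
    return best, perf

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=8) + 0.5 * rng.normal(size=60)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best_lam, perf = cross_validate(X, y, lambdas)

# Refit on all training data with the selected lambda to obtain W_{lambda*}.
w_final = ridge(X, y, best_lam)
```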

33 Questions?
