Machine Learning – (Linear) Regression
Wilson McKerrow (Fenyo lab postdoc)
Contact: Wilson.McKerrow@nyumc.org
Linear Regression – one independent variable
Data: $(x_i, y_i)$ for $i = 1, \dots, n$. Want: $y_i = w_1 x_i + w_0 + \epsilon_i$, where the residual $\epsilon_i$ is "small".
$x$ is the predictor (independent variable); $y$ is the response (dependent variable).
Linear Regression – one independent variable
What is "small"? Define a loss function $L(w_1, w_0)$ to measure how far off the model is. Want to find the $w_1$, $w_0$ that minimize it.
Linear Regression – one independent variable
Standard choice: sum of squared errors (least squares): $L(w_1, w_0) = \sum_i \epsilon_i^2 = \sum_i (y_i - w_1 x_i - w_0)^2$
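A minimal sketch of this loss in NumPy; the function name sse_loss and its arguments are illustrative, not from the slides.

```python
import numpy as np

def sse_loss(w1, w0, x, y):
    """Sum of squared errors for the one-variable model y_hat = w1*x + w0."""
    residuals = y - (w1 * x + w0)
    return np.sum(residuals ** 2)
```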
Linear Regression – one independent variable
Advantages of least squares: Gauss-Markov theorem: lowest variance among unbiased linear estimators, IF:
Residuals have mean 0: $E[\epsilon_i] = 0$
Residuals have constant variance: $V[\epsilon_i] = \sigma^2$
Errors are uncorrelated: $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$
Linear Regression – one independent variable
Advantages of least squares: Gauss-Markov theorem: lowest variance among unbiased linear estimators. Equivalent to the maximum likelihood estimate (MLE) when the errors are Gaussian, $\epsilon_i \sim N(0, \sigma^2)$.
Linear Regression – one independent variable
Advantages of least squares: Gauss-Markov theorem: lowest variance among unbiased linear estimators. Equivalent to the MLE for Gaussian errors. Easy to calculate (closed-form solution).
Linear Regression – one independent variable
Disadvantages of least squares: the variance can still be large, and squaring the residuals means the fit pays a lot of attention to outliers.
Linear Regression – One Independent Variable
Minimizing the loss function, L (sum of squared errors): set the partial derivatives to zero:
$\frac{\partial L}{\partial w_0} = -2 \sum_i (y_i - w_1 x_i - w_0) = 0$
$\frac{\partial L}{\partial w_1} = -2 \sum_i x_i (y_i - w_1 x_i - w_0) = 0$
Linear Regression – One Independent Variable
Minimizing the loss function, L (sum of squared errors), gives the closed-form solution:
$\hat{w}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{w}_0 = \bar{y} - \hat{w}_1 \bar{x}$
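A minimal sketch of that closed-form fit in NumPy; the function name fit_simple_ols is illustrative.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least-squares fit for y ≈ w1*x + w0,
    obtained by setting dL/dw1 = dL/dw0 = 0."""
    x_bar, y_bar = x.mean(), y.mean()
    w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    w0 = y_bar - w1 * x_bar
    return w1, w0
```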
Linear Regression – Vector notation
Relationship: $y = w_1 x + w_0 + \epsilon$. Define $\boldsymbol{x} = (1, x_1)$ and $\boldsymbol{w} = (w_0, w_1)$; then $y = \boldsymbol{x} \cdot \boldsymbol{w} + \epsilon$.
Linear Regression - Multiple Independent Variables
$y = \boldsymbol{x} \cdot \boldsymbol{w} + \epsilon$, with $\boldsymbol{x} = (1, x_1, x_2, x_3, \dots, x_k)$ and $\boldsymbol{w} = (w_0, w_1, w_2, w_3, \dots, w_k)$
Linear Regression – Matrix notation
$\boldsymbol{y} = \boldsymbol{X}\boldsymbol{w} + \boldsymbol{\epsilon}$
Data: $(y_j, x_{1j}, x_{2j}, \dots, x_{kj})$ for $j = 1, \dots, n$
$\boldsymbol{y} = (y_1, y_2, y_3, \dots, y_n)^T$, $\boldsymbol{w} = (w_0, w_1, w_2, w_3, \dots, w_k)^T$
$\boldsymbol{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{k1} \\ 1 & x_{12} & \cdots & x_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{kn} \end{pmatrix}$
Linear Regression – Matrix notation
$\boldsymbol{y} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{k1} \\ 1 & x_{12} & \cdots & x_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{kn} \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_k \end{pmatrix} + \boldsymbol{\epsilon}$, i.e. row by column: $y_j = w_0 + \sum_i x_{ij} w_i + \epsilon_j$
Multiple Linear Regression
$\boldsymbol{y} = \boldsymbol{X}\boldsymbol{w} + \boldsymbol{\epsilon}$. Minimizing the loss function, L (sum of squared errors):
$\frac{dL}{d\boldsymbol{w}} = \frac{d}{d\boldsymbol{w}} \boldsymbol{\epsilon}^T\boldsymbol{\epsilon} = \frac{d}{d\boldsymbol{w}} (\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w})^T(\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}) = \frac{d}{d\boldsymbol{w}} \left[ \boldsymbol{y}^T\boldsymbol{y} - \boldsymbol{y}^T\boldsymbol{X}\boldsymbol{w} - (\boldsymbol{X}\boldsymbol{w})^T\boldsymbol{y} + (\boldsymbol{X}\boldsymbol{w})^T\boldsymbol{X}\boldsymbol{w} \right] = -2\boldsymbol{X}^T\boldsymbol{y} + 2\boldsymbol{X}^T\boldsymbol{X}\hat{\boldsymbol{w}} = 0$
$\Rightarrow \hat{\boldsymbol{w}} = (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{y}$
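A minimal sketch of the normal-equation solution in NumPy, assuming the predictors are supplied as an (n, k) array; the name fit_ols is illustrative.

```python
import numpy as np

def fit_ols(X_raw, y):
    """Ordinary least squares via the normal equations w = (X^T X)^(-1) X^T y.
    A column of ones is prepended so that w[0] is the intercept w_0."""
    X = np.column_stack([np.ones(len(y)), X_raw])
    # Solving the linear system is preferred over explicitly inverting X^T X.
    return np.linalg.solve(X.T @ X, X.T @ y)
```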
Non-constant variance
[Figure: simulated data with non-constant variance, $\sigma \sim x^2$, showing the true line and the least-squares fit]
Non-constant variance
[Figure: fitted lines over 100 repetitions of the simulation with non-constant variance]
Weighted least squares
$sd(\epsilon_i) \sim \sigma_i$, but $sd(\epsilon_i / \sigma_i) \sim 1$, constant. New loss function: $L(w_1, w_0) = \sum_i \left( \frac{\epsilon_i}{\sigma_i} \right)^2$
Minimized when $\hat{\boldsymbol{w}} = (\boldsymbol{X}^T\boldsymbol{S}\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{S}\boldsymbol{y}$, where $\boldsymbol{S} = \begin{pmatrix} 1/\sigma_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sigma_n^2 \end{pmatrix}$
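A minimal sketch of weighted least squares, assuming the per-observation standard deviations sigma are known; the name fit_wls is illustrative.

```python
import numpy as np

def fit_wls(X_raw, y, sigma):
    """Weighted least squares: w = (X^T S X)^(-1) X^T S y,
    with S = diag(1/sigma_i^2) down-weighting the noisier observations."""
    X = np.column_stack([np.ones(len(y)), X_raw])
    S = np.diag(1.0 / sigma ** 2)
    return np.linalg.solve(X.T @ S @ X, X.T @ S @ y)
```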
Weighted least squares
[Figure: weighted least squares fits over 100 repetitions with $\sigma \sim x^2$]
(Non)Linear Regression – Sum of Functions
$y = \boldsymbol{x} \cdot \boldsymbol{w} + \epsilon$, with $\boldsymbol{x} = (1, f_1(x), f_2(x), f_3(x), \dots, f_l(x))$ and $\boldsymbol{w} = (w_0, w_1, w_2, w_3, \dots, w_l)$, so $\boldsymbol{x} \cdot \boldsymbol{w} = w_0 + f_1(x) w_1 + \dots + f_l(x) w_l$
(Non)Linear Regression – Polynomial
$y = \boldsymbol{x} \cdot \boldsymbol{w} + \epsilon$, with $\boldsymbol{x} = (1, x, x^2, x^3, \dots, x^l)$ and $\boldsymbol{w} = (w_0, w_1, w_2, w_3, \dots, w_l)$, so $\boldsymbol{x} \cdot \boldsymbol{w} = w_0 + w_1 x + w_2 x^2 + \dots + w_l x^l$
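A minimal sketch of polynomial regression as linear regression on expanded features; np.vander builds the columns $1, x, x^2, \dots$, and the name fit_polynomial is illustrative.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares fit of a degree-l polynomial: expand x into the basis
    (1, x, x^2, ..., x^l) and solve the same linear least-squares problem."""
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    w, *_ = np.linalg.lstsq(X, y, rcond=None)      # more stable than inverting X^T X
    return w
```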
Model Capacity: Overfitting and Underfitting
[Figure: error on the training set vs. degree of polynomial]
Model Capacity: Overfitting and Underfitting
"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk." – John von Neumann
Training and Testing
Split the data set into a training set and a test set: fit the model on the training set, evaluate it on the test set.
Training and Testing – Linear relationship
[Figure: training and testing error vs. degree of polynomial]
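A minimal sketch of how such a curve can be computed, assuming the training/test split has already been made; the names and the degree range are illustrative.

```python
import numpy as np

def train_test_errors(x_train, y_train, x_test, y_test, max_degree=9):
    """Mean squared error on the training and test sets as the polynomial
    degree increases: training error keeps falling, while test error
    eventually rises once the model starts to overfit."""
    errors = []
    for d in range(1, max_degree + 1):
        X_tr = np.vander(x_train, d + 1, increasing=True)
        X_te = np.vander(x_test, d + 1, increasing=True)
        w, *_ = np.linalg.lstsq(X_tr, y_train, rcond=None)
        errors.append((d,
                       np.mean((y_train - X_tr @ w) ** 2),   # training error
                       np.mean((y_test - X_te @ w) ** 2)))   # testing error
    return errors
```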
Testing Error and Training Set Size
[Figure: testing error vs. degree of polynomial for training set sizes from 10 to 3000; low-variance and high-variance panels]
Coefficients and Training Set Size
[Figure: absolute value of the fitted coefficients for a degree-9 polynomial with training set sizes 10, 100, and 1000]
Training and Testing – Non-linear relationship
[Figure: training and testing error vs. degree of polynomial for a non-linear relationship]
Testing Error and Training Set Size
[Figure: log(testing error) vs. degree of polynomial for training set sizes from 10 to 3000; low-variance and high-variance panels]
Multiple Linear Regression
Correlation of the independent variables increases the uncertainty in parameter determination.
[Figure: standard deviation of the fitted coefficients vs. number of data points, for uncorrelated predictors and for predictors correlated with R² = 0.9]
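A minimal simulation sketch of this effect; the correlation value, true coefficients, and the name coefficient_sd are assumptions for illustration, not from the slides.

```python
import numpy as np

def coefficient_sd(n, rho, n_reps=1000, seed=0):
    """Standard deviation of OLS coefficient estimates over repeated
    simulations with two predictors whose correlation is rho: higher
    correlation inflates the variance of the estimates."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    w_hats = []
    for _ in range(n_reps):
        X_raw = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = 1.0 + X_raw @ np.array([2.0, -1.0]) + rng.normal(0.0, 1.0, n)
        X = np.column_stack([np.ones(n), X_raw])
        w_hats.append(np.linalg.solve(X.T @ X, X.T @ y))
    return np.std(np.array(w_hats), axis=0)  # SDs of the (w_0, w_1, w_2) estimates
```

Comparing, say, coefficient_sd(500, 0.0) with coefficient_sd(500, 0.95) (a correlation of about 0.95 corresponds to R² ≈ 0.9 between the predictors) shows the inflated standard deviations for the correlated case.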
Regularization
No regularization: $\min_{\boldsymbol{w}} L(\boldsymbol{w})$. Regularization: $\min_{\boldsymbol{w}} L(\boldsymbol{w}) + \lambda g(\boldsymbol{w})$. Ridge regression: $\min_{\boldsymbol{w}} \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{w}\|^2 + \lambda \|\boldsymbol{w}\|^2$
Ridge Regression
Recall: $\frac{dL}{d\boldsymbol{w}} = -2\boldsymbol{X}^T\boldsymbol{y} + 2\boldsymbol{X}^T\boldsymbol{X}\boldsymbol{w}$
So: $\frac{d}{d\boldsymbol{w}}\left( L + \lambda\,\boldsymbol{w}^T\boldsymbol{w} \right) = -2\boldsymbol{X}^T\boldsymbol{y} + 2\boldsymbol{X}^T\boldsymbol{X}\boldsymbol{w} + 2\lambda\boldsymbol{w}$
and: $-2\boldsymbol{X}^T\boldsymbol{y} + 2\boldsymbol{X}^T\boldsymbol{X}\hat{\boldsymbol{w}} + 2\lambda\hat{\boldsymbol{w}} = 0 \;\Rightarrow\; (\boldsymbol{X}^T\boldsymbol{X} + \lambda\boldsymbol{I})\,\hat{\boldsymbol{w}} = \boldsymbol{X}^T\boldsymbol{y}$
yielding: $\hat{\boldsymbol{w}} = (\boldsymbol{X}^T\boldsymbol{X} + \lambda\boldsymbol{I})^{-1}\boldsymbol{X}^T\boldsymbol{y}$
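A minimal sketch of this closed form; the name fit_ridge is illustrative, and the intercept is penalized here only to match the formula above (many implementations leave it unpenalized).

```python
import numpy as np

def fit_ridge(X_raw, y, lam):
    """Ridge regression: w = (X^T X + lambda*I)^(-1) X^T y.
    Note: this penalizes the intercept w_0 as well, matching the formula;
    in practice the intercept is often excluded from the penalty."""
    X = np.column_stack([np.ones(len(y)), X_raw])
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
```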
Regularization: Coefficients and Training Set Size
[Figure: fitted coefficients of a regularized degree-9 polynomial with training set sizes 10, 100, and 1000]
Nearest Neighbor Regression – Fixed Distance
Nearest Neighbor Regression – Fixed Number (kNN)
Nearest Neighbor Regression
Estimate by taking the average over the nearest neighbors, or use a distance-weighted average.
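A minimal sketch of kNN regression for a one-dimensional predictor; the function name and the default k=5 are illustrative.

```python
import numpy as np

def knn_regress(x_train, y_train, x_query, k=5):
    """k-nearest-neighbor regression: predict at each query point as the
    average response of the k closest training points (unweighted mean;
    a distance-weighted mean is a common variant)."""
    preds = []
    for xq in np.atleast_1d(x_query):
        idx = np.argsort(np.abs(x_train - xq))[:k]  # indices of the k nearest points
        preds.append(y_train[idx].mean())
    return np.array(preds)
```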
Nearest Neighbor Regression
[Figure: error (linear data) and log(error) (non-linear data) vs. number of neighbors, for training set sizes from 10 to 3000]