Introduction to Linear Regression

Presentation on theme: "Introduction to Linear Regression"— Presentation transcript:

1 Introduction to Linear Regression

2 Linear Regression
Prediction on continuous variables
-- Given GPA, can we predict salaries?
-- Given user data, can we predict ad clicks?
-- etc.

3 More formally
Response variable: y
Input variables: x1, x2, x3, …
y = b0 + b1*x1 + b2*x2 + …
Can we find values of b0, b1, b2, …?

4 Supervised learning
Training data: samples where y, x1, x2, x3 are given
[Table: columns mpg, hp, wt, gear; rows Mazda RX, Mazda RX4 Wag, Datsun, Hornet 4 Drive, Hornet Sportabout, Valiant]

5 Because we love matrices
Generalize our problem: Y = X * B, where
Y is a column vector of all responses (samples x 1)
X is a matrix (samples x features)
B is a column vector (features x 1)
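The matrix form can be sketched in NumPy. This is a minimal illustration, not taken from the slides: the data values and coefficients are made up, and it assumes the intercept b0 is handled by adding a column of ones to X, so that B has one more entry than there are features.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features (values are made up).
features = np.array([[1.0, 2.0],
                     [2.0, 0.0],
                     [3.0, 1.0],
                     [4.0, 3.0]])

# Assumption: prepend a column of ones so that b0 acts as the intercept.
X = np.column_stack([np.ones(len(features)), features])

# Hypothetical coefficients (b0, b1, b2) as a column vector.
B = np.array([[0.5], [2.0], [-1.0]])

# A single matrix product gives the response for every sample at once.
Y = X @ B
print(X.shape, B.shape, Y.shape)
```

Each row of Y equals b0 + b1*x1 + b2*x2 for the corresponding sample, which is the per-sample formula written once for the whole data set.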

6 Solving for a model
Loss or residual = $\|Y - XB\|^2 = (Y - XB)^T (Y - XB)$
Minimize the loss to get the optimal value of B
Differentiating w.r.t. B, setting the derivative to zero and solving: $\hat{B} = (X^T X)^{-1} X^T Y$
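The closed-form solution can be checked numerically. The sketch below uses synthetic data (the names `true_B`, `B_hat` and `B_lstsq` are illustrative, not from the slides) and solves the linear system (X^T X) B = X^T Y instead of forming the inverse explicitly, which is usually the more stable choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 samples, intercept column plus 2 features.
n = 50
features = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), features])

# Hypothetical "true" coefficients, used only to generate the responses.
true_B = np.array([1.0, 2.0, -3.0])
Y = X @ true_B + 0.01 * rng.normal(size=n)

# Normal equations: solve (X^T X) B = X^T Y.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# np.linalg.lstsq minimizes the same squared loss and should agree.
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(B_hat)
print(B_lstsq)
```

Both routes minimize the same loss, so the two coefficient vectors should match to numerical precision and lie close to `true_B`.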

7 Predicting values
Given a model (b0, b1, b2, b3) and a new data point Z = (z1, z2, z3):
ypred = b0 + b1*z1 + b2*z2 + b3*z3
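As a small sketch of the prediction step, the helper below (a hypothetical `predict` function with made-up coefficients) treats the first coefficient as the intercept and takes a dot product with the rest.

```python
import numpy as np

def predict(b, z):
    """Return b0 + b1*z1 + b2*z2 + ... for coefficients b and data point z."""
    b = np.asarray(b, dtype=float)
    z = np.asarray(z, dtype=float)
    # b[0] is the intercept; the remaining coefficients pair up with z.
    return b[0] + b[1:] @ z

# Hypothetical model (b0, b1, b2, b3) and new data point (z1, z2, z3).
b = [1.0, 2.0, 0.5, -1.0]
z = [3.0, 4.0, 2.0]

y_pred = predict(b, z)
print(y_pred)  # 1 + 2*3 + 0.5*4 - 1*2 = 7.0
```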

8 Evaluating Models
Residual = Observed - Predicted
The predicted value is also called the fitted value
$\epsilon_i = y_i - b x_i$
Residual Sum of Squares (RSS): $\sum_{i=1}^{n} \epsilon_i^2$
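Residuals and RSS are one line each in NumPy. The observed and fitted values below are made up for illustration.

```python
import numpy as np

# Hypothetical observed responses and the model's fitted values.
observed = np.array([3.0, 5.0, 7.5, 9.0])
fitted = np.array([2.5, 5.5, 7.0, 9.5])

# Residual = observed - predicted, one value per sample.
residuals = observed - fitted

# Residual Sum of Squares: sum of the squared residuals.
rss = np.sum(residuals ** 2)

print(residuals)
print(rss)
```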

9 Explaining variance
Residual variance = variance of the $\epsilon_i$ values
$R^2$ = explained variance = $1 - \frac{\text{variance of residuals}}{\text{variance of observations}}$
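A minimal sketch of this definition follows, using a hypothetical `r_squared` helper and made-up numbers. Note that the variance ratio matches the more common form 1 - RSS/TSS only when the residuals have mean zero, which holds for a least-squares fit that includes an intercept.

```python
import numpy as np

def r_squared(observed, fitted):
    """R^2 as 1 - variance of residuals / variance of observations."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    residuals = observed - fitted
    return 1.0 - np.var(residuals) / np.var(observed)

# Hypothetical observed and fitted values (the residuals average to zero).
observed = np.array([3.0, 5.0, 7.5, 9.0])
fitted = np.array([2.5, 5.5, 7.0, 9.5])

r2 = r_squared(observed, fitted)

# Cross-check against 1 - RSS/TSS.
rss = np.sum((observed - fitted) ** 2)
tss = np.sum((observed - observed.mean()) ** 2)
r2_from_sums = 1.0 - rss / tss

print(r2, r2_from_sums)
```

A perfect fit gives a value of 1, and a model no better than predicting the mean gives a value near 0.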

10 Adjusted $R^2$
$R^2$ never decreases when more features are added
Too many features!
Adjusted $R^2$ scales the variance of the residuals and of the data by their degrees of freedom:
adjusted variance = $\frac{\text{variance}}{\text{degrees of freedom}}$

11 Degrees of freedom Number of samples: n Number of features: k Degrees of freedom = n – k
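Putting the two previous slides together, adjusted R2 can be sketched as below. This is one common reading, stated as an assumption: the residual sum of squares is divided by n - k and the total sum of squares by n - 1, where k is taken to count every fitted coefficient (including the intercept, if the model has one). The function name and data are hypothetical.

```python
import numpy as np

def adjusted_r_squared(observed, fitted, k):
    """Adjusted R^2 with residual degrees of freedom n - k.

    Assumption: k counts all fitted coefficients, intercept included.
    """
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    n = len(observed)
    rss = np.sum((observed - fitted) ** 2)
    tss = np.sum((observed - observed.mean()) ** 2)
    # Scale each sum of squares by its degrees of freedom.
    return 1.0 - (rss / (n - k)) / (tss / (n - 1))

# Hypothetical observed and fitted values from a 2-coefficient model.
observed = np.array([3.0, 5.0, 7.5, 9.0])
fitted = np.array([2.5, 5.5, 7.0, 9.5])

plain_r2 = 1.0 - np.sum((observed - fitted) ** 2) / np.sum((observed - observed.mean()) ** 2)
adj_r2 = adjusted_r_squared(observed, fitted, k=2)

print(plain_r2, adj_r2)
```

Because the residual term is divided by a smaller number as k grows, adding a feature raises the adjusted value only if it reduces the residuals enough to pay for the lost degree of freedom.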

12 Residual vs. Fitted
[Figure: residual vs. fitted plots, one labeled BAD and one labeled GOOD]

13 Residuals vs. Normal (QQPlot)
[Figure: normal QQ plots of the residuals, one labeled BAD and one labeled GOOD]

14 Transforming variables
From

15 Transforming variables
Overfitting!
From "An illustration of the Bias Variance Tradeoff" by Gene Leynes
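The overfitting risk can be illustrated with a small experiment. The sketch below assumes the transformation in question is adding polynomial terms of a single input, which the slides do not state explicitly; the data are synthetic and the degrees 1, 3 and 9 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def truth(x):
    # Hypothetical underlying relationship used to generate the data.
    return np.sin(np.pi * x)

# Small noisy training set and a separate held-out set.
x_train = np.linspace(-1.0, 1.0, 15)
y_train = truth(x_train) + 0.2 * rng.normal(size=x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = truth(x_test) + 0.2 * rng.normal(size=x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

errors = {}
for degree in (1, 3, 9):
    # Least-squares fit on the transformed inputs x, x^2, ..., x^degree.
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(degree, errors[degree])
```

Training error can only go down as the degree grows, since each lower-degree model is a special case of the higher one. The held-out error is the number to watch: when it stops improving or rises while training error keeps falling, the extra terms are fitting noise.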

16 Regression vs. Classification
                Regression               Classification
Example         Stock Price Prediction   Spam Filtering
Prediction      Continuous variables     Discrete variables
Loss Function   Least Squares Loss       Logistic Loss, Hinge Loss (SVM)
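The three loss functions named in the comparison can be written down directly. The sketch below uses one common convention, which is an assumption here: classification labels are -1 or +1 and `score` is the raw model output before any thresholding.

```python
import numpy as np

def squared_loss(y, y_pred):
    """Least squares loss for regression."""
    return (y - y_pred) ** 2

def logistic_loss(label, score):
    """Logistic loss for a label in {-1, +1} and a real-valued score."""
    return np.log1p(np.exp(-label * score))

def hinge_loss(label, score):
    """Hinge loss (as used by SVMs) for a label in {-1, +1}."""
    return np.maximum(0.0, 1.0 - label * score)

# Regression: the penalty grows with the squared distance from the target.
print(squared_loss(3.0, 2.5))

# Classification: a confident correct score costs little or nothing,
# while a score on the wrong side of the boundary is penalized.
print(logistic_loss(1, 2.0), hinge_loss(1, 2.0))
print(logistic_loss(-1, 0.5), hinge_loss(-1, 0.5))
```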

