Data mining and statistical learning, lecture 3


1 Outline
- Ordinary least squares regression
- Ridge regression

2 Ordinary least squares regression (OLS)
Inputs $x_1, x_2, \dots, x_p$; response $y$.
Model: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$
Terminology:
- $\beta_0$: intercept (or bias)
- $\beta_1, \dots, \beta_p$: regression coefficients (or weights)
The response variable responds directly and linearly to changes in the inputs.

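3 Least squares regression
Assume that we have observed a training set of data $(x_1, y_1), \dots, (x_N, y_N)$, where $x_i = (x_{i1}, \dots, x_{ip})$.
Estimate the $\beta$ coefficients by minimizing the residual sum of squares
$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \bigr)^2$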

4 Matrix formulation of OLS regression
Differentiating the residual sum of squares and setting the first derivatives equal to zero, we obtain
$X^T (y - X\beta) = 0$
where $X$ is the $N \times (p+1)$ matrix whose $i$th row is $(1, x_{i1}, \dots, x_{ip})$ and $y = (y_1, \dots, y_N)^T$.
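The differentiation step can be written out as follows. This is a sketch of the standard derivation, assuming $X$ carries a leading column of ones for the intercept and that $X^T X$ is nonsingular; neither assumption is stated on the slide.

```latex
\begin{align*}
\mathrm{RSS}(\beta) &= (y - X\beta)^T (y - X\beta) \\
\frac{\partial\, \mathrm{RSS}}{\partial \beta} &= -2\, X^T (y - X\beta) \\
\frac{\partial^2 \mathrm{RSS}}{\partial \beta\, \partial \beta^T} &= 2\, X^T X \\
X^T (y - X\beta) = 0
  \;&\Longleftrightarrow\; X^T X \beta = X^T y
  \;\Longrightarrow\; \hat\beta = (X^T X)^{-1} X^T y
\end{align*}
```

Because $2 X^T X$ is positive semidefinite, any solution of the normal equations is a minimizer; it is the unique minimizer when $X$ has full column rank.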

5 Parameter estimates and predictions
Least squares estimates of the parameters: $\hat\beta = (X^T X)^{-1} X^T y$
Predicted values: $\hat y = X \hat\beta = X (X^T X)^{-1} X^T y$
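The estimates and predictions can be computed directly from the normal equations. The sketch below uses only the Python standard library; the helper names (`transpose`, `matmul`, `solve`, `ols`) and the toy data are assumptions made for illustration and do not come from the lecture.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def ols(inputs, y):
    """Return (beta_hat, fitted values) for a model with an intercept."""
    X = [[1.0] + list(row) for row in inputs]      # prepend the column of ones
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    beta = solve(XtX, Xty)                         # solves X'X beta = X'y
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return beta, fitted


# Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
inputs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * a - 3 * b for a, b in inputs]
beta_hat, y_hat = ols(inputs, y)
```

Solving the linear system is used here instead of forming $(X^T X)^{-1}$ explicitly, which gives the same estimates with less arithmetic.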

6 Different sources of inputs
- Quantitative inputs
- Transformations of quantitative inputs
- Numeric or dummy coding of the levels of qualitative inputs
- Interactions between variables (e.g. $X_3 = X_1 X_2$)
Example of dummy coding: [example not preserved in the transcript]
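Since the slide's own dummy-coding example is not available here, the following is a hypothetical illustration of the idea. A qualitative input with K levels is turned into K - 1 indicator columns, with one level serving as the reference. The function name `dummy_code`, the choice of the alphabetically first level as reference, and the sample values are all assumptions.

```python
def dummy_code(values, reference=None):
    """Code a qualitative input as 0/1 indicator columns.

    One level (the reference) gets no column of its own, so it is
    represented by a row of zeros. This avoids collinearity with the
    intercept column.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    kept = [level for level in levels if level != reference]
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


colours = ["Red", "Black", "Silver", "Black"]
columns, coded = dummy_code(colours)
# columns names the indicator variables; coded holds one row per observation.
```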

7 An example of multiple linear regression
Response variable: requested price of used Porsche cars (1000 SEK)
Inputs:
$X_1$ = Manufacturing year
$X_2$ = Mileage (km)
$X_3$ = Model (0 or 1)
$X_4$ = Equipment (1, 2, 3)
$X_5$ = Colour (Red, Black, Silver, Blue, Black, White, Green)

8 Price of used Porsche cars
Response variable: requested price of used Porsche cars (1000 SEK)
Inputs:
$X_1$ = Manufacturing year
$X_2$ = Mileage (km)

9 Interpretation of multiple regression coefficients
Assume that $y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$ and that the regression coefficients are estimated by ordinary least squares regression.
Then the multiple regression coefficient $\hat\beta_j$ represents the additional contribution of $x_j$ to $y$, after $x_j$ has been adjusted for $x_0, x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_p$.

10 Confidence intervals for regression parameters
Assume that $y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$, where the X-variables are fixed and the error terms are i.i.d. $N(0, \sigma)$.
Then [equation not preserved in the transcript], where $v_j$ is the $j$th diagonal element of $(X^T X)^{-1}$.
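For reference, the standard textbook result under these assumptions is sketched below. It is not taken from the slide, whose own formula is unavailable. Here $\sigma^2$ denotes the error variance, $N$ the number of cases, $\hat\sigma^2$ the usual unbiased variance estimate, and $t_{1-\alpha/2,\,N-p-1}$ a Student $t$ quantile.

```latex
\begin{align*}
\hat\beta &\sim N\bigl(\beta,\ \sigma^2 (X^T X)^{-1}\bigr),
  \qquad
  \hat\sigma^2 = \frac{1}{N-p-1} \sum_{i=1}^{N} (y_i - \hat y_i)^2 \\
\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{v_j}} &\sim t_{N-p-1} \\
\text{$100(1-\alpha)\%$ confidence interval for } \beta_j:&\quad
  \hat\beta_j \pm t_{1-\alpha/2,\,N-p-1}\ \hat\sigma \sqrt{v_j}
\end{align*}
```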

11 Interpretation of software outputs
Adding new independent variables to a regression model alters at least one of the old regression coefficients unless the columns of the X-matrix are orthogonal, i.e. $x_j^T x_k = 0$ for $j \neq k$.
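A small numerical check of the orthogonality statement, under assumed toy data: when the columns are mutually orthogonal, the coefficients from separate simple regressions already satisfy the normal equations $X^T(y - X\beta) = 0$ of the full model, so adding columns changes nothing. With correlated columns they do not. All names and numbers below are hypothetical.

```python
def dot(u, v):
    """Inner product: the sum of coordinate-wise products."""
    return sum(a * b for a, b in zip(u, v))


def univariate_coefs(columns, y):
    """Regress y on each column separately (no other columns present)."""
    return [dot(x, y) / dot(x, x) for x in columns]


def normal_equation_gaps(columns, y, beta):
    """Return X'(y - X beta); all zeros means beta is the OLS solution."""
    n = len(y)
    fitted = [sum(b * col[i] for b, col in zip(beta, columns)) for i in range(n)]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return [dot(col, resid) for col in columns]


ones = [1, 1, 1, 1]
x1 = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]            # orthogonal to both ones and x1
y = [3.0, 1.0, 4.0, 1.0]

orth = [ones, x1, x2]
beta_orth = univariate_coefs(orth, y)
gaps_orth = normal_equation_gaps(orth, y, beta_orth)

x2_corr = [1, 0, 1, -1]        # not orthogonal to ones or x1
corr = [ones, x1, x2_corr]
gaps_corr = normal_equation_gaps(corr, y, univariate_coefs(corr, y))
```

In the orthogonal case every entry of `gaps_orth` is zero, so the one-at-a-time coefficients are the multiple regression coefficients. In the correlated case `gaps_corr` has nonzero entries, so the coefficients must change when the variables are fitted jointly.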

12 Stepwise Regression: Price (1000SEK) versus Year, Milage (km),...
The p-value refers to a t-test of the hypothesis that the regression coefficient of the last entered x-variable is zero.
Classical statistical model selection techniques are model-based. In data mining the model selection is data-driven.

13 Stepwise Regression: Price (1000SEK) versus Year, Milage (km),... - model validation by visual inspection of residuals
Residual = Observed - Predicted

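14 The Gram-Schmidt procedure for regression by successive orthogonalization and simple linear regression
1. Initialize $z_0 = x_0 = 1$
2. For $j = 1, \dots, p$, compute $z_j = x_j - \sum_{k=0}^{j-1} \hat\gamma_{kj} z_k$ with $\hat\gamma_{kj} = \langle z_k, x_j \rangle / \langle z_k, z_k \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the inner product (the sum of coordinate-wise products)
3. Regress $y$ on $z_p$ to obtain the multiple regression coefficient $\hat\beta_p = \langle z_p, y \rangle / \langle z_p, z_p \rangle$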
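The three steps can be sketched in plain Python as follows. This is an illustrative implementation of successive orthogonalization under assumed toy data; the function name `last_coefficient` is hypothetical, and the columns are assumed to be linearly independent.

```python
def dot(u, v):
    """Inner product: the sum of coordinate-wise products."""
    return sum(a * b for a, b in zip(u, v))


def last_coefficient(columns, y):
    """Multiple regression coefficient of the last column, by
    regression on successively orthogonalized inputs."""
    n = len(y)
    z = [[1.0] * n]                          # step 1: z_0 = x_0 = 1
    for x in columns:                        # step 2: j = 1, ..., p
        residual = [float(v) for v in x]
        for zk in z:
            gamma = dot(zk, x) / dot(zk, zk)  # simple regression of x_j on z_k
            residual = [r - gamma * zi for r, zi in zip(residual, zk)]
        z.append(residual)                   # z_j is orthogonal to z_0..z_{j-1}
    zp = z[-1]
    return dot(zp, y) / dot(zp, zp)          # step 3: regress y on z_p


# Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
x1 = [0, 1, 0, 1, 2]
x2 = [0, 0, 1, 1, 1]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
beta_p = last_coefficient([x1, x2], y)       # coefficient of x2
```

Reordering `columns` so that a different input comes last gives that input's multiple regression coefficient.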

15 Prediction of a response variable using correlated explanatory variables - daily temperatures in Stockholm, Göteborg, and Malmö

16 Absorbance records for ten samples of chopped meat
- 1 response variable (protein)
- 100 predictors (absorbance at 100 wavelengths or channels)
- The predictors are strongly correlated with each other

17 Absorbance records for 240 samples of chopped meat
The target is poorly correlated with each predictor

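18 Ridge regression
The ridge regression coefficients minimize a penalized residual sum of squares:
$\hat\beta^{\mathrm{ridge}} = \arg\min_{\beta} \Bigl\{ \sum_{i=1}^{N} \bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \Bigr\}$, with $\lambda \ge 0$
or, equivalently,
$\hat\beta^{\mathrm{ridge}} = \arg\min_{\beta} \sum_{i=1}^{N} \bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \bigr)^2$ subject to $\sum_{j=1}^{p} \beta_j^2 \le s$
Normally, inputs are centred prior to the estimation of regression coefficients.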

19 Matrix formulation of ridge regression for centred inputs
$\hat\beta^{\mathrm{ridge}} = (X^T X + \lambda I)^{-1} X^T y$
If the inputs are orthonormal, the ridge estimates are just a scaled version of the least squares estimates: $\hat\beta^{\mathrm{ridge}} = \hat\beta / (1 + \lambda)$
Shrinking enables estimation of regression coefficients even if the number of parameters exceeds the number of cases
Figure 3.7
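Both claims on this slide can be checked numerically with the closed-form ridge solution $(X^T X + \lambda I)^{-1} X^T y$. The sketch below is pure Python under assumed toy data. The inputs and response are taken to be centred already, so no intercept is fitted, and the helper names `solve` and `ridge` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def ridge(X, y, lam):
    """Ridge coefficients for centred inputs X (list of rows) and centred y."""
    p = len(X[0])
    cols = list(zip(*X))
    # Build X'X + lam * I and X'y, then solve the linear system.
    A = [[sum(a * b for a, b in zip(cols[j], cols[k])) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(a * b for a, b in zip(cols[j], y)) for j in range(p)]
    return solve(A, rhs)


# Case 1: orthonormal, centred inputs. Ridge = OLS / (1 + lambda).
s = 2 ** -0.5
X_orth = [[s, 0.0], [-s, 0.0], [0.0, s], [0.0, -s]]
y_c = [1.0, -1.0, 2.0, -2.0]
beta_ols = ridge(X_orth, y_c, 0.0)       # lambda = 0 gives least squares
beta_ridge = ridge(X_orth, y_c, 1.0)     # each coefficient is halved

# Case 2: more parameters (3) than cases (2). X'X is singular, so OLS
# has no unique solution, but X'X + lambda*I is invertible for lambda > 0.
X_wide = [[1.0, -1.0, 2.0], [-1.0, 1.0, -2.0]]
beta_wide = ridge(X_wide, [1.0, -1.0], 1.0)
```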

20 Ridge regression – pros and cons
Ridge regression is particularly useful if the explanatory variables are strongly correlated with each other.
The variance of the estimated regression coefficients is reduced at the expense of (slightly) biased estimates.

21 The Gauss-Markov theorem
Consider a linear regression model in which:
– the inputs are regarded as fixed
– the error terms are i.i.d. with mean 0 and variance $\sigma^2$.
Then the least squares estimator of a parameter $a^T \beta$ has variance no larger than that of any other linear unbiased estimator of $a^T \beta$.
Biased estimators may have smaller variance and mean squared error!
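The closing remark follows from the usual decomposition of mean squared error, sketched here for a generic estimator $\tilde\theta$ of $\theta = a^T \beta$. The symbol $\tilde\theta$ is notation introduced for this illustration.

```latex
\begin{align*}
\mathrm{MSE}(\tilde\theta)
  &= E\bigl[(\tilde\theta - \theta)^2\bigr] \\
  &= \mathrm{Var}(\tilde\theta) + \bigl(E[\tilde\theta] - \theta\bigr)^2
\end{align*}
```

The theorem only compares unbiased linear estimators, for which the second term is zero. An estimator that accepts a small bias, such as ridge regression, can reduce the variance term by more than the squared bias it introduces.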

22 SAS code for an ordinary least squares regression
    /* Fit the model by OLS; OUTEST= saves the parameter estimates */
    proc reg data=mining.dailytemperature outest=dtempbeta;
      model daily_consumption = stockholm g_teborg malm_;
    run;

23 SAS code for ridge regression
    /* RIDGE= requests one fit per ridge constant 0, 1, ..., 10;
       the estimates are written to the OUTEST= data set */
    proc reg data=mining.dailytemperature outest=dtempbeta ridge=0 to 10 by 1;
      model daily_consumption = stockholm g_teborg malm_;
    run;
    /* List the saved estimates */
    proc print data=dtempbeta;
    run;

