
1 Linear Regression Models Andy Wang CIS 5930-03 Computer Systems Performance Analysis

2 Linear Regression Models
What is a (good) model?
Estimating model parameters
Allocating variation
Confidence intervals for regressions
Verifying assumptions visually

3 What Is a (Good) Model?
For correlated data, a model predicts the response given an input
The model should be an equation that fits the data
Standard definition of “fits” is least-squares
–Minimize squared error
–Keep mean error zero
–This also minimizes the variance of the errors

4 Least-Squared Error
If ŷ = b0 + b1x, then the error in the estimate for x_i is e_i = y_i − ŷ_i
Minimize the Sum of Squared Errors (SSE): SSE = Σ e_i² = Σ (y_i − b0 − b1x_i)²
Subject to the constraint Σ e_i = 0

5 Estimating Model Parameters
Best regression parameters are
b1 = (Σxy − n x̄ȳ) / (Σx² − n x̄²)
b0 = ȳ − b1x̄
where x̄ = (Σx)/n and ȳ = (Σy)/n
Note error in book!

6 Parameter Estimation Example
Execution time of a script for various loop counts (n = 5):
x̄ = 6.8, ȳ = 2.32, Σxy = 88.7, Σx² = 264
b1 = (88.7 − 5(6.8)(2.32)) / (264 − 5(6.8)²) = 0.30
b0 = 2.32 − (0.30)(6.8) = 0.28
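
As a cross-check, here is a minimal Python sketch (not part of the original slides) that reproduces b1 and b0 from the summary statistics quoted above:

```python
# Parameter estimates from the summary statistics on slide 6.
n      = 5        # number of (loop count, time) observations
x_bar  = 6.8      # mean loop count
y_bar  = 2.32     # mean execution time
sum_xy = 88.7     # sum of x_i * y_i
sum_x2 = 264.0    # sum of x_i^2

# b1 = (sum(xy) - n*x_bar*y_bar) / (sum(x^2) - n*x_bar^2), b0 = y_bar - b1*x_bar
b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.2f}")   # about 0.30
print(f"b0 = {b0:.2f}")   # about 0.28
```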

7 Graph of Parameter Estimation Example

8 Allocating Variation
If no regression, the best guess of y is ȳ
Observed values of y differ from ȳ, giving rise to errors (variance)
Regression gives a better guess, but there are still errors
We can evaluate the quality of the regression by allocating the sources of error

9 The Total Sum of Squares
Without regression, the squared error is the total sum of squares:
SST = Σ(y_i − ȳ)² = Σy² − n ȳ² = SSY − SS0
where SSY = Σy² and SS0 = n ȳ²

10 The Sum of Squares from Regression
Recall that the regression error is SSE = Σ(y_i − ŷ_i)²
Error without regression is SST
So regression explains SSR = SST − SSE
Regression quality is measured by the coefficient of determination
R² = SSR/SST = (SST − SSE)/SST

11 Evaluating Coefficient of Determination
Compute
SSE = Σy² − b0Σy − b1Σxy
SST = Σy² − n ȳ²
R² = (SST − SSE)/SST
where R = R(x,y) = correlation(x,y)

12 Example of Coefficient of Determination
For the previous regression example:
–Σy = 11.60, Σy² = 29.88, Σxy = 88.7
–SSE = 29.88 − (0.28)(11.60) − (0.30)(88.7) = 0.028
–SST = 29.88 − 26.9 = 2.97
–SSR = 2.97 − 0.03 = 2.94
–R² = (2.97 − 0.03)/2.97 = 0.99
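
A similar sketch for the allocation of variation; it recomputes b0 and b1 unrounded, which is why SSE comes out at the slide's 0.028 rather than the 0.022 you would get from the rounded 0.28 and 0.30:

```python
# Allocation of variation for the running example, from the quoted sums.
n, x_bar, y_bar = 5, 6.8, 2.32
sum_y, sum_y2, sum_xy, sum_x2 = 11.60, 29.88, 88.7, 264.0

# Unrounded parameter estimates.
b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
b0 = y_bar - b1 * x_bar

sse = sum_y2 - b0 * sum_y - b1 * sum_xy   # error left after regression
sst = sum_y2 - n * y_bar ** 2             # total error without regression
ssr = sst - sse                           # variation explained by the regression
r2 = ssr / sst                            # coefficient of determination

print(f"SSE = {sse:.3f}, SST = {sst:.2f}, SSR = {ssr:.2f}, R^2 = {r2:.2f}")
```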

13 Standard Deviation of Errors
Variance of errors is SSE divided by degrees of freedom
–DOF is n − 2 because we've calculated 2 regression parameters from the data
–So variance (mean squared error, MSE) is SSE/(n − 2)
Standard deviation of errors is the square root: s_e = √(SSE/(n − 2)) (minor error in book)

14 Checking Degrees of Freedom
Degrees of freedom always equate:
–SS0 has 1 (computed from ȳ)
–SST has n − 1 (computed from data and ȳ, which uses up 1)
–SSE has n − 2 (needs 2 regression parameters)
–So SST = SSE + SSR in degrees of freedom: n − 1 = (n − 2) + 1

15 Example of Standard Deviation of Errors
For the regression example, SSE was 0.03, so MSE is 0.03/3 = 0.01 and s_e = 0.10
Note the high quality of our regression:
–R² = 0.99
–s_e = 0.10
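
The corresponding arithmetic as a short Python sketch (using the unrounded SSE from slide 12):

```python
import math

# MSE = SSE / (n - 2) and s_e = sqrt(MSE) for the running example.
n, sse = 5, 0.028
mse = sse / (n - 2)        # n - 2 = 3 degrees of freedom
s_e = math.sqrt(mse)       # standard deviation of errors
print(f"MSE = {mse:.3f}, s_e = {s_e:.2f}")   # about 0.01 and 0.10
```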

16 Confidence Intervals for Regressions
Regression is done from a single population sample (size n)
–Different sample might give different results
–True model is y = β0 + β1x
–Parameters b0 and b1 are really means taken from a population sample

17 Calculating Intervals for Regression Parameters
Standard deviations of parameters:
s_b0 = s_e √(1/n + x̄²/(Σx² − n x̄²))
s_b1 = s_e / √(Σx² − n x̄²)
Confidence intervals are b0 ± t s_b0 and b1 ± t s_b1, where t = t[1−α/2; n−2] has n − 2 degrees of freedom
–Not divided by sqrt(n)

18 Example of Regression Confidence Intervals
Recall s_e = 0.10, n = 5, Σx² = 264, x̄ = 6.8
So
s_b0 = 0.10 √(1/5 + 6.8²/(264 − 5(6.8)²)) ≈ 0.13
s_b1 = 0.10 / √(264 − 5(6.8)²) ≈ 0.017
Using a 90% confidence level, t[0.95;3] = 2.353

19 Regression Confidence Example, cont’d
Thus, the b0 interval is roughly (−0.02, 0.58)
–Not significant at 90%
And the b1 interval is roughly (0.26, 0.34)
–Significant at 90% (and would survive even a 99.9% test)
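
A hedged sketch of the interval computation, assuming SciPy is available for the t-quantile; the standard-deviation formulas are the ones from slide 17 and the numbers are from the running example:

```python
import math
from scipy import stats

# Confidence intervals for the regression parameters b0 and b1.
n, x_bar, sum_x2 = 5, 6.8, 264.0
b0, b1, s_e = 0.28, 0.30, 0.10

sxx = sum_x2 - n * x_bar ** 2                        # = 32.8
s_b0 = s_e * math.sqrt(1.0 / n + x_bar ** 2 / sxx)   # about 0.13
s_b1 = s_e / math.sqrt(sxx)                          # about 0.017

t = stats.t.ppf(0.95, df=n - 2)                      # 2.353 for a 90% two-sided interval
print(f"b0: ({b0 - t * s_b0:.2f}, {b0 + t * s_b0:.2f})")  # spans zero: not significant
print(f"b1: ({b1 - t * s_b1:.2f}, {b1 + t * s_b1:.2f})")  # excludes zero: significant
```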

20 Confidence Intervals for Predictions
Previous confidence intervals are for parameters
–How certain can we be that the parameters are correct?
Purpose of regression is prediction
–How accurate are the predictions?
–Regression gives the mean of the predicted response, based on the sample we took

21 Predicting m Samples
Standard deviation for the mean of a future sample of m observations at x_p is
s_ŷ = s_e √(1/m + 1/n + (x_p − x̄)²/(Σx² − n x̄²))
Note the deviation drops as m → ∞
Variance is minimal at x_p = x̄
Use t-quantiles with n − 2 DOF for the interval

22 Example of Confidence of Predictions
Using the previous equation, what is the predicted time for a single run of 8 loops?
Time = 0.28 + 0.30(8) = 2.68
Standard deviation of errors s_e = 0.10, so the prediction standard deviation is
0.10 √(1 + 1/5 + (8 − 6.8)²/32.8) ≈ 0.11
90% interval is then 2.68 ± (2.353)(0.11), i.e., roughly (2.42, 2.94)
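
A sketch of the prediction-interval computation under the same assumptions (SciPy for the t-quantile, numbers from the running example):

```python
import math
from scipy import stats

# 90% prediction interval for a single future run with 8 loop iterations.
n, x_bar, sum_x2 = 5, 6.8, 264.0
b0, b1, s_e = 0.28, 0.30, 0.10

m, x_p = 1, 8                                   # one future observation at x = 8
y_hat = b0 + b1 * x_p                           # predicted time, 2.68
sxx = sum_x2 - n * x_bar ** 2
s_pred = s_e * math.sqrt(1.0 / m + 1.0 / n + (x_p - x_bar) ** 2 / sxx)

t = stats.t.ppf(0.95, df=n - 2)                 # 90% interval with n - 2 = 3 DOF
print(f"{y_hat:.2f} +/- {t * s_pred:.2f}")      # roughly 2.68 +/- 0.26
```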

23 Prediction Confidence

24 Verifying Assumptions Visually
Regressions are based on assumptions:
–Linear relationship between response y and predictor x (or nonlinear relationship used in fitting)
–Predictor x nonstochastic and error-free
–Model errors statistically independent, with distribution N(0, c) for constant c
If the assumptions are violated, the model is misleading or invalid

25 Testing Linearity
Scatter plot x vs. y to see the basic curve type
(Example plots: Linear, Piecewise Linear, Outlier, Nonlinear (Power))

26 Testing Independence of Errors
Scatter-plot errors ε_i versus predicted values ŷ_i
Should be no visible trend
Example from our curve fit:
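
A sketch of this diagnostic plot, assuming NumPy and Matplotlib are available; the x and y arrays are illustrative stand-ins, since the slides give only summary statistics rather than the raw measurements:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([3.0, 5.0, 7.0, 9.0, 10.0])        # loop counts (illustrative)
y = np.array([1.19, 1.73, 2.53, 2.89, 3.26])    # measured times (illustrative)

b1, b0 = np.polyfit(x, y, 1)                    # least-squares fit
residuals = y - (b0 + b1 * x)

# Residuals vs. predicted values: look for any visible trend.
plt.scatter(b0 + b1 * x, residuals)
plt.axhline(0.0, linestyle="--")
plt.xlabel("predicted value")
plt.ylabel("residual")
plt.title("Residuals vs. predicted values")
plt.show()
```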

27 More on Testing Independence
May be useful to plot error residuals versus experiment number
–In the previous example, this gives the same plot except for x scaling
No foolproof tests
–An “independence” test really disproves a particular dependence
–Maybe the next test will show a different dependence

28 Testing for Normal Errors
Prepare a quantile-quantile plot of the errors
Example for our regression:
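
A sketch of the quantile-quantile check, again with illustrative residuals rather than the actual ones from the fit:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

residuals = np.array([0.03, -0.11, 0.13, -0.09, 0.04])    # illustrative residuals

# Points lying near the reference line suggest normally distributed errors.
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal Q-Q plot of residuals")
plt.show()
```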

29 Testing for Constant Standard Deviation
Tongue-twister: homoscedasticity
Return to the independence plot
Look for a trend in the spread
Example:

30 Linear Regression Can Be Misleading
Regression throws away some information about the data
–To allow more compact summarization
Sometimes vital characteristics are thrown away
–Often, looking at data plots can tell you whether you will have a problem

31 Example of Misleading Regression
      I            II           III           IV
   x     y      x     y      x     y      x     y
  10   8.04    10   9.14    10   7.46     8   6.58
   8   6.95     8   8.14     8   6.77     8   5.76
  13   7.58    13   8.74    13  12.74     8   7.71
   9   8.81     9   8.77     9   7.11     8   8.84
  11   8.33    11   9.26    11   7.81     8   8.47
  14   9.96    14   8.10    14   8.84     8   7.04
   6   7.24     6   6.13     6   6.08     8   5.25
   4   4.26     4   3.10     4   5.39    19  12.50
  12  10.84    12   9.13    12   8.15     8   5.56
   7   4.82     7   7.26     7   6.42     8   7.91
   5   5.68     5   4.74     5   5.73     8   6.89

32 What Does Regression Tell Us About These Data Sets?
Exactly the same thing for each!
N = 11
Mean of y = 7.5
y = 3 + 0.5x
Standard error of the slope estimate is 0.118
All the sums of squares are the same
Correlation coefficient = 0.82
R² = 0.67
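
A short sketch that fits each of the four data sets from the table above and prints the (essentially identical) summary statistics:

```python
import numpy as np

# Fit each data set from slide 31 and compare the regression summaries.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1, b0 = np.polyfit(x, y, 1)              # least-squares slope and intercept
    r = np.corrcoef(x, y)[0, 1]               # correlation coefficient
    print(f"{name}: mean(y) = {y.mean():.2f}  y = {b0:.2f} + {b1:.2f}x  R^2 = {r**2:.2f}")
```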

33 Now Look at the Data Plots
(Scatter plots of data sets I, II, III, and IV)


