SE-280 Dr. Mark L. Hornick 1 Statistics Review Linear Regression & Correlation.


1 Statistics Review: Linear Regression & Correlation (SE-280, Dr. Mark L. Hornick)

2 In subsequent labs, we’ll be predicting actual size or time using linear regression based on historical estimated size data from previous labs. Note: this example shows historical data for 13 labs

3 Linear regression prediction for Actual LOC vs. Estimated LOC (Proxy LOC)

4 By fitting a regression line to historical data, we can compensate for estimating errors. [Figure: regression line with slope $\beta_1$ and offset $\beta_0$; the raw estimate (x) maps to the projected value (corrected estimate).]

5 To compute a new estimate, we use the regression line equation: $y_{proj} = \beta_0 + \beta_1 x_{est}$, where $x_{est}$ = raw estimate, $y_{proj}$ = projected value (corrected estimate), $\beta_0$ = offset of the regression line, and $\beta_1$ = slope of the regression line.
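As a minimal sketch of the projection step, the regression line can be wrapped in a small function. The name `project` is hypothetical, and the parameter values are the ones quoted in the worked example later in this deck.

```python
def project(x_est: float, beta0: float, beta1: float) -> float:
    """Return the corrected estimate y_proj = beta0 + beta1 * x_est."""
    return beta0 + beta1 * x_est


# Parameter values taken from the deck's worked example (slide 10).
y_proj = project(200, beta0=3.29467, beta1=0.01463)
print(round(y_proj, 2))
```

For a raw estimate of 200, this should agree with the predicted value listed for x = 200 in the table on slide 15.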

6 These formulas are used to calculate the regression parameters.
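The slide's formulas are not reproduced in this transcript. The sketch below assumes the standard least-squares formulas for a single independent variable, $\beta_1 = (\sum x_i y_i - n\,\bar{x}\,\bar{y}) / (\sum x_i^2 - n\,\bar{x}^2)$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$; the function name `regression_parameters` is hypothetical. The data are the seven points from the table on slide 15.

```python
def regression_parameters(xs, ys):
    """Return (beta0, beta1) for the least-squares line through (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_mean * y_mean
    sxx = sum(x * x for x in xs) - n * x_mean ** 2
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Data points from the slide-15 table.
xs = [350, 200, 37, 140, 243, 68, 291]
ys = [9.70, 6.50, 2.20, 5.90, 3.60, 6.70, 7.90]
beta0, beta1 = regression_parameters(xs, ys)
print(round(beta0, 5), round(beta1, 5))
```

The printed values should come out close to the beta values quoted on slide 10.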

7 Correlation (r) is a measure of the strength of the linear relationship between two variables. Its value is +1 for a perfectly increasing linear relationship, −1 for a perfectly decreasing linear relationship, and somewhere in between in all other cases. It indicates the degree of linear dependence between the variables: the closer the coefficient is to either −1 or +1, the stronger the correlation. r > 0.7 is considered “good” for PSP planning purposes.

8 After calculating the regression parameters ($\beta$ values), we can also calculate the correlation coefficient. To get the correlation coefficient (r), we first need to calculate $r^2$. With a single independent variable (x), we can get a signed correlation coefficient. In the general case, we only get the absolute value of the correlation coefficient (|r|); the "direction" of the correlation is determined by the sign of the $\beta$ "slope" value. The magnitude of the correlation coefficient [0.0 to 1.0] is a measure of how well (high) or poorly (low) the historical data points fall on or near the regression line.
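For the single-variable case, the signed coefficient can also be written directly in terms of the raw sums. The expression below is the standard Pearson correlation formula, offered as a reference point rather than as the slide's own equation; $n$ is the number of historical data points.

```latex
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[\,n\sum x_i^2 - \left(\sum x_i\right)^2\right]
                \left[\,n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}
```

The sign of $r$ from this formula matches the sign of the slope $\beta_1$, since both share the same numerator up to a positive factor.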

9 Let's look at an example of calculating the correlation. For future reference, these data points come from test case 4 of lab 2.

10 We have already discussed how to calculate the regression parameters (beta values): $\beta_0 = 3.29467$, $\beta_1 = 0.01463$.

11 If we evaluate the regression line equation at each x value, we get the predicted y values: $y_{pred} = 0.01463\,x + 3.29467$.

12 To determine the correlation, we also need to calculate the mean y value ($\bar{y}$): $\bar{y} = 6.07$ (mean of the original y values).

13 Next, we need to sum the squares of two differences: $(y - \bar{y})$ and $(y_{pred} - \bar{y})$.

14 Once we have the two sums, we can calculate the correlation coefficient: $r^2 = \sum (y_{pred} - \bar{y})^2 \,/\, \sum (y - \bar{y})^2$. Just in case you are curious, statisticians label the sum-square values like this: total sum of squares (variability), $\sum (y - \bar{y})^2$; sum of squares, predicted (explained), $\sum (y_{pred} - \bar{y})^2$; sum of squares, error (unexplained), $\sum (y_{pred} - y)^2$. One more time, where do the "$y_{pred}$" values come from?

15 Here are the actual numbers used to calculate the correlation in this example.

X values  Y values  Y mean  Y pred  Ypred-Yavg  Y-Yavg  Ypred-Y  (Ypred-Yavg)^2  (Y-Yavg)^2  (Ypred-Y)^2
     350      9.70    6.07    8.41        2.34    3.63    -1.29            5.49       13.17         1.65
     200      6.50    6.07    6.22        0.15    0.43    -0.28            0.02        0.18         0.08
      37      2.20    6.07    3.84       -2.24   -3.87     1.64            5.00       14.99         2.68
     140      5.90    6.07    5.34       -0.73   -0.17    -0.56            0.53        0.03         0.31
     243      3.60    6.07    6.85        0.78   -2.47     3.25            0.60        6.11        10.55
      68      6.70    6.07    4.29       -1.78    0.63    -2.41            3.18        0.40         5.81
     291      7.90    6.07    7.55        1.48    1.83    -0.35            2.19        3.34         0.12
Sum                                                                        17.0        38.2         21.2
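The column sums above can be reproduced with a short script. This is a sketch that assumes the relationship $r^2$ = (sum of squares, predicted) / (total sum of squares) and takes the sign of r from the slope; the variable names are illustrative, and the beta values are the rounded ones quoted on slide 10.

```python
import math

# Data points from the table above; beta values as quoted on slide 10.
xs = [350, 200, 37, 140, 243, 68, 291]
ys = [9.70, 6.50, 2.20, 5.90, 3.60, 6.70, 7.90]
beta0, beta1 = 3.29467, 0.01463

y_mean = sum(ys) / len(ys)
y_pred = [beta0 + beta1 * x for x in xs]

# The three sums of squares, matching the last three table columns.
ss_pred = sum((yp - y_mean) ** 2 for yp in y_pred)          # explained
ss_total = sum((y - y_mean) ** 2 for y in ys)               # total variability
ss_error = sum((yp - y) ** 2 for yp, y in zip(y_pred, ys))  # unexplained

r_squared = ss_pred / ss_total
# With one independent variable, r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), beta1)

print(round(ss_pred, 1), round(ss_total, 1), round(ss_error, 1))
print(round(r_squared, 3), round(r, 2))
```

The three sums should match the totals row of the table to one decimal place, and r should come out at roughly 0.67 for this data set.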

16 We said we needed historical data to make predictions based on regression analysis. How do we know when it’s OK to use regression? 1. Quantity of data is satisfactory: we must have at least three points! It’s good to have a lot more; the 10 or more most recent projects are adequate. 2. Quality of data is satisfactory: data points must correlate ($r^2 \ge 0.5$, $|r| \ge 0.707$). This means that your process must be stable (repeatable).
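The two criteria can be expressed as a simple guard before trusting a regression-based projection. This is only a sketch: the function name `regression_is_usable` and its default thresholds are illustrative, taken from the limits stated on this slide (at least three points, $r^2 \ge 0.5$).

```python
def regression_is_usable(n_points: int, r_squared: float,
                         min_points: int = 3,
                         min_r_squared: float = 0.5) -> bool:
    """Check the quantity and quality criteria for using regression."""
    enough_data = n_points >= min_points
    correlated = r_squared >= min_r_squared
    return enough_data and correlated


print(regression_is_usable(10, 0.64))  # ten points, |r| = 0.8
print(regression_is_usable(2, 0.90))   # too few points
```

A caller might fall back to the raw estimate, or to a simple average of historical data, when the check fails; that fallback policy is an assumption and is not specified on the slide.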

17 We can also use linear regression to predict actual time. Other examples on page 39. [Figure: scatter plot of Actual Time (hrs) vs. Actual Size (LOC), with data points $(x_k, y_k)$.]

