SE-280 Dr. Mark L. Hornick Multiple Regression (Cycle 4)

Review: Linear Regression (from Cycle 3)
$y_k = \beta_0 + \beta_1 x_k$
$x_k$ = estimated LOC (in this example)
$y_k$ = estimated time (in this example)
$\beta_0$ = offset
$\beta_1$ = slope
[Figure: plot of Time (hrs), 0 to 3.5, against Est Proxy Size (LOC), 0 to 200, with $x_k$ and $y_k$ marked]

The regression algorithm assumed a single independent variable.
Estimated proxy size (E) is the independent variable $x_k$ for both projections:
Added+modified size (A+M): $y_{size} = \beta_{s0} + \beta_{s1} x_k$
Development effort (time): $y_{time} = \beta_{t0} + \beta_{t1} x_k$

Can we still apply regression if our estimates involve more than one independent variable?
Component types: Web pages (JSP), database tables, Java classes.
If development of each component type is completely independent, we can make separate estimates and add them up.
But what if they are so interdependent that we can't do that?

One possible solution is to use multiple regression:
$y_k = \beta_0 + \beta_1 x_{k,1} + \beta_2 x_{k,2} + \dots + \beta_m x_{k,m}$
m = number of independent variables (j = 1..m)
$\beta_0$ = offset
$\beta_j$ = "slope" relative to each independent variable
$x_{k,j}$ = current independent value estimates (e.g., proxy size)
$y_k$ = projected value (e.g., time)
Where do the $\beta$ values come from?
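The projection formula above can be sketched in a few lines of Java. This is only an illustration: the class name `MultipleRegressionPredictor` and the array layout (offset first in `beta`, estimates in `x`) are assumptions, not part of the course code.

```java
/** Hypothetical helper for illustration; the name and layout are assumptions. */
final class MultipleRegressionPredictor {
    /**
     * Projects y_k = beta[0] + beta[1]*x[0] + ... + beta[m]*x[m-1].
     * beta holds m+1 values (offset first); x holds the m current estimates.
     */
    static double predict(double[] beta, double[] x) {
        if (beta.length != x.length + 1) {
            throw new IllegalArgumentException(
                "expected " + (x.length + 1) + " beta values, got " + beta.length);
        }
        double y = beta[0];              // offset term
        for (int j = 1; j < beta.length; j++) {
            y += beta[j] * x[j - 1];     // "slope" j times estimate j
        }
        return y;
    }
}
```

With m = 1 this reduces to the single-variable projection from the review slide.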

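The $\beta$ values are calculated by solving a system of linear equations:
$$\beta_0 n + \beta_1 \sum_{i=1}^{n} x_{i,1} + \dots + \beta_m \sum_{i=1}^{n} x_{i,m} = \sum_{i=1}^{n} y_i$$
$$\beta_0 \sum_{i=1}^{n} x_{i,1} + \beta_1 \sum_{i=1}^{n} x_{i,1}^2 + \dots + \beta_m \sum_{i=1}^{n} x_{i,1} x_{i,m} = \sum_{i=1}^{n} x_{i,1} y_i$$
$$\vdots$$
$$\beta_0 \sum_{i=1}^{n} x_{i,m} + \beta_1 \sum_{i=1}^{n} x_{i,m} x_{i,1} + \dots + \beta_m \sum_{i=1}^{n} x_{i,m}^2 = \sum_{i=1}^{n} x_{i,m} y_i$$
n = number of historical data points (i = 1..n)
$x_{i,j}$ = historical independent variable values
$y_i$ = historical dependent variable values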

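The same equation in matrix form, $A \beta = b$:
$$\begin{bmatrix} n & \sum x_{i,1} & \cdots & \sum x_{i,m} \\ \sum x_{i,1} & \sum x_{i,1}^2 & \cdots & \sum x_{i,1} x_{i,m} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_{i,m} & \sum x_{i,m} x_{i,1} & \cdots & \sum x_{i,m}^2 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{i,1} y_i \\ \vdots \\ \sum x_{i,m} y_i \end{bmatrix}$$
Use these slides as reference when you implement Cycle 3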

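When rank(A) = 2 (that is, A is 2×2, with 1 independent variable), the familiar regression equations result when the equations are rearranged:
$$\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\, x_{avg}\, y_{avg}}{\sum_{i=1}^{n} x_i^2 - n\, x_{avg}^2}$$
$$\beta_0 = y_{avg} - \beta_1 x_{avg}$$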
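For the single-variable case, the closed-form equations can be coded directly without any matrix solver. The sketch below assumes the data arrive as two parallel `double[]` arrays; the class and method names are hypothetical.

```java
/** Hypothetical single-variable fit; names are illustrative assumptions. */
final class SimpleRegression {
    /** Returns {beta0, beta1} for one independent variable. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        if (n != y.length || n < 2) {
            throw new IllegalArgumentException("need at least two matching (x, y) points");
        }
        double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumXX = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        double xAvg = sumX / n;
        double yAvg = sumY / n;
        double denominator = sumXX - n * xAvg * xAvg;
        if (denominator == 0.0) {
            // Every x is the same, so no slope can be determined.
            throw new ArithmeticException("all x values are identical");
        }
        double beta1 = (sumXY - n * xAvg * yAvg) / denominator;
        double beta0 = yAvg - beta1 * xAvg;
        return new double[] {beta0, beta1};
    }
}
```

For example, the points (1, 3), (2, 5), (3, 7), (4, 9) lie exactly on y = 1 + 2x, so `fit` returns an offset of 1 and a slope of 2.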

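To solve numerically for the $\beta$ values, we need to calculate values for the A and b matrices in $A \beta = b$, where
$$a_{j,l} = \sum_{i=1}^{n} x_{i,j}\, x_{i,l} \qquad b_j = \sum_{i=1}^{n} x_{i,j}\, y_i \qquad (j, l = 0..m)$$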

If you look closely, you will see a pattern in the equation coefficients.
What about the $x_{i,0}$ terms? They are “fictitious” values that are treated as being equal to one (1.0).
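The coefficient pattern, including the fictitious column of ones, can be sketched as two small loops. The class `NormalEquations` and its method names are hypothetical; the sketch assumes each historical data point is one row of a `double[][]` holding its m independent values.

```java
/** Hypothetical builder for A and b; names are illustrative assumptions. */
final class NormalEquations {
    /** Prepends the fictitious x_{i,0} = 1.0 value to every historical row. */
    static double[][] withOnesColumn(double[][] x) {
        if (x.length == 0) {
            throw new IllegalArgumentException("need at least one historical data point");
        }
        int m = x[0].length;
        double[][] extended = new double[x.length][m + 1];
        for (int i = 0; i < x.length; i++) {
            extended[i][0] = 1.0;
            System.arraycopy(x[i], 0, extended[i], 1, m);
        }
        return extended;
    }

    /** a[j][l] = sum over i of x[i][j] * x[i][l], with column 0 all ones. */
    static double[][] buildA(double[][] x) {
        double[][] xe = withOnesColumn(x);
        int size = xe[0].length;
        double[][] a = new double[size][size];
        for (int j = 0; j < size; j++) {
            for (int l = j; l < size; l++) {
                double sum = 0.0;
                for (double[] row : xe) {
                    sum += row[j] * row[l];
                }
                a[j][l] = sum;
                a[l][j] = sum;   // the matrix is symmetric
            }
        }
        return a;
    }

    /** b[j] = sum over i of x[i][j] * y[i], with column 0 all ones. */
    static double[] buildB(double[][] x, double[] y) {
        double[][] xe = withOnesColumn(x);
        if (y.length != xe.length) {
            throw new IllegalArgumentException("x and y must have the same number of points");
        }
        double[] b = new double[xe[0].length];
        for (int j = 0; j < b.length; j++) {
            for (int i = 0; i < xe.length; i++) {
                b[j] += xe[i][j] * y[i];
            }
        }
        return b;
    }
}
```

Because column 0 is all ones, `a[0][0]` comes out as n, row 0 of A holds the plain sums of each variable, and `b[0]` is the sum of the y values.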

In general (especially for rank(A) > 2), we have to solve the system of equations to get the $\beta_j$ values.
Mathematically, we can do this by inverting the coefficient matrix A.
However, it's more common to solve the equations using a technique like Gaussian elimination with back substitution (or Gauss-Jordan elimination) (remember that?).
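As a reminder of how elimination with back substitution works, here is a minimal sketch. It uses Gaussian elimination with partial pivoting, which is one common variant and not necessarily the one intended for the assignment. The class name `LinearSolver` and the singularity threshold `1e-12` are assumptions chosen for illustration.

```java
/** Hypothetical solver for A * beta = b; names and threshold are assumptions. */
final class LinearSolver {
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        // Work on an augmented copy [A | b] so the caller's arrays stay untouched.
        double[][] m = new double[n][n + 1];
        for (int r = 0; r < n; r++) {
            System.arraycopy(a[r], 0, m[r], 0, n);
            m[r][n] = b[r];
        }
        // Forward elimination: zero out everything below each pivot.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;   // partial pivoting: pick the largest entry
                }
            }
            if (Math.abs(m[pivot][col]) < 1e-12) {   // assumed threshold
                throw new ArithmeticException("matrix is singular or nearly singular");
            }
            double[] swap = m[col];
            m[col] = m[pivot];
            m[pivot] = swap;
            for (int r = col + 1; r < n; r++) {
                double factor = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) {
                    m[r][c] -= factor * m[col][c];
                }
            }
        }
        // Back substitution: solve from the last row upward.
        double[] beta = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double sum = m[r][n];
            for (int c = r + 1; c < n; c++) {
                sum -= m[r][c] * beta[c];
            }
            beta[r] = sum / m[r][r];
        }
        return beta;
    }
}
```

The solver never forms the inverse of A explicitly; it reduces the augmented matrix to upper-triangular form and then recovers the last unknown first.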

OK, let's try an example, using test case 4-3 from Cycle 3.
"x0"    x1    x2     y
   1    23   279   367
   1    28   421   584
   1    11   256   387
   1     2    42   131
   1     6   164   351
   1    16   265   218
   1     4   144   400
We pretend there is an x0 column, with all "1" values.
The A matrix (Row 0..2 by Col 0..2) and the b vector (Row 0..2, Col 0) start out empty.
What is the value of $a_{0,0}$?

Since $a_{0,0}$ is always "n", the number of points, let's try another matrix element.
A       Col 0   Col 1   Col 2
Row 0       7       ?       ?
Row 1       ?       ?       ?
Row 2       ?       ?       ?
What is the value of $a_{0,1}$?

Since $a_{0,1}$ is the sum of the $x_1$ values, $a_{0,2}$ must be the sum of the $x_2$ values.
A       Col 0   Col 1   Col 2
Row 0       7      90    1571
Row 1       ?       ?       ?
Row 2       ?       ?       ?

Now, what about the rest of the first column in the A matrix?
What is the value of $a_{1,0}$?

Yes, the values in column 0 are the same as those in row 0, making the matrix symmetric (at least so far).
A       Col 0   Col 1   Col 2
Row 0       7      90    1571
Row 1      90       ?       ?
Row 2    1571       ?       ?

Next, let's try an element on the diagonal of matrix A.
What is the value of $a_{1,1}$?
Sum of squares of the $x_1$ values = 1746

Since the same independent variable appears twice in the diagonal terms, they are computed by summing the squares.
A       Col 0   Col 1   Col 2
Row 0       7      90    1571
Row 1      90    1746       ?
Row 2    1571       ?  440239

Let's try one of the remaining terms.
What is the value of $a_{1,2}$?
Sum of the products $x_1 x_2$ = 26905

Since the sum of products for $x_1$ and $x_2$ is the same as for $x_2$ and $x_1$, the last remaining value is the same.
A       Col 0   Col 1   Col 2
Row 0       7      90    1571
Row 1      90    1746   26905
Row 2    1571   26905  440239

OK, now how about the b vector?
What is the value of $b_0$ ($b_{0,0}$)?
Sum of y values = 2438

$b_0$ is always the sum of the y values; let's try the next one.
b       Col 0
Row 0    2438
Row 1       ?
Row 2       ?
What is the value of $b_1$ ($b_{1,0}$)?
Sum of the products $x_1 y$ = 36506

OK, we need just one more value.
b       Col 0
Row 0    2438
Row 1   36506
Row 2       ?
What is the value of $b_2$ ($b_{2,0}$)?
Sum of the products $x_2 y$ = 625765

Finally, we have our A and b matrices, and can solve for the $\beta$ values.
        A Col 0  A Col 1  A Col 2   b Col 0   β Col 0
Row 0         7       90     1571      2438    98.472
Row 1        90     1746    26905     36506   -11.261
Row 2      1571    26905   440239    625765    1.7582
Matrix betas = a.solve(b);

To extend the correlation calculation to handle multiple independent variables, the only change is in calculating the predicted y values ($\hat{y}_i$).
One independent variable ("linear regression"): $\hat{y}_i = \beta_0 + \beta_1 x_i$
One or more independent variables ("multiple regression"): $\hat{y}_i = \beta_0 + \beta_1 x_{i,1} + \dots + \beta_m x_{i,m}$
Both are forms of "linear" regression, despite the names.

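We use the usual approach to calculate the correlation coefficient.
Just in case you are curious, the statisticians label the sum-square values like this (with $\hat{y}_i$ the predicted value for point i):
Total sum of squares (variability): $SST = \sum_{i=1}^{n} (y_i - y_{avg})^2$
Sum of squares, predicted (explained): $SSP = \sum_{i=1}^{n} (\hat{y}_i - y_{avg})^2$
Sum of squares, error (unexplained): $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$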
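The three sum-square values can be computed in one pass once the β values are known. The sketch below assumes r² is taken as 1 - SSE/SST, which for a least-squares fit with an offset term equals SSP/SST; the slides do not show which form they use, so treat that choice, and the class name `RegressionCorrelation`, as assumptions.

```java
/** Hypothetical correlation helper; names and the r^2 form are assumptions. */
final class RegressionCorrelation {
    /** Returns {SST, SSP, SSE} for historical rows x, outcomes y, and fitted beta. */
    static double[] sumsOfSquares(double[][] x, double[] y, double[] beta) {
        int n = y.length;
        double yAvg = 0.0;
        for (double value : y) {
            yAvg += value;
        }
        yAvg /= n;
        double sst = 0.0, ssp = 0.0, sse = 0.0;
        for (int i = 0; i < n; i++) {
            // Predicted y: the only part that changes with more variables.
            double predicted = beta[0];
            for (int j = 1; j < beta.length; j++) {
                predicted += beta[j] * x[i][j - 1];
            }
            sst += (y[i] - yAvg) * (y[i] - yAvg);
            ssp += (predicted - yAvg) * (predicted - yAvg);
            sse += (y[i] - predicted) * (y[i] - predicted);
        }
        return new double[] {sst, ssp, sse};
    }

    /** r^2 = 1 - SSE/SST (assumed form; see the note above the code). */
    static double rSquared(double[][] x, double[] y, double[] beta) {
        double[] s = sumsOfSquares(x, y, beta);
        if (s[0] == 0.0) {
            throw new ArithmeticException("all y values are identical");
        }
        return 1.0 - s[2] / s[0];
    }
}
```

The inner loop over `beta` is the same for one variable or many, which is why only the predicted values change when moving from linear to multiple regression.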

Another Numerical Example
y = {16, 30, 44, 58, 96}
$x_1$ = {0, 2, 4, 6, 8}
$x_2$ = {1, 3, 5, 7, 11}
$x_3$ = {1, 2, 3, 4, 10}
Determine $\beta$ values such that $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

The values in the data vectors are used in the summation terms of the matrices:
y = {16, 30, 44, 58, 96}
$x_1$ = {0, 2, 4, 6, 8}
$x_2$ = {1, 3, 5, 7, 11}
$x_3$ = {1, 2, 3, 4, 10}

Evaluating terms in the matrices:
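The evaluated matrices did not survive in this transcript. Working the sums by hand from the four data vectors above (n = 5, with the fictitious $x_0$ column of ones) gives the following, offered as a reconstruction rather than as the slide's original figure:

```latex
A =
\begin{bmatrix}
  5 &  20 &  27 &  20 \\
 20 & 120 & 156 & 120 \\
 27 & 156 & 205 & 160 \\
 20 & 120 & 160 & 130
\end{bmatrix},
\qquad
b =
\begin{bmatrix}
 244 \\ 1352 \\ 1788 \\ 1400
\end{bmatrix}
```

For instance, $a_{1,2} = 0 \cdot 1 + 2 \cdot 3 + 4 \cdot 5 + 6 \cdot 7 + 8 \cdot 11 = 156$ and $b_3 = 1 \cdot 16 + 2 \cdot 30 + 3 \cdot 44 + 4 \cdot 58 + 10 \cdot 96 = 1400$. One thing to check before solving: with the vectors exactly as listed here, $5 x_2 = 3 + 4 x_1 + 2 x_3$ holds at all five points, so this A is singular and a solver that checks its pivots would report that instead of returning unique $\beta$ values.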
