Multiple regression Partial F-tests.

Predicting a child’s weight based on his/her height and age

Which model to use?
Y = B0 + B1 Hgt + B2 Age + E
Y = B0 + B1 Hgt + B2 Age + B3 Hgt^2 + E
Y = B0 + B1 Hgt + B2 Age + B3 Age^2 + E
Y = B0 + B1 Hgt + B2 Age + B3 Hgt^2 + B4 Age^2 + B5 Hgt*Age + E

Relationships

Y = B0 + B1 Hgt + E
Y = B0 + B2 Age + E
Y = B0 + B3 Age^2 + E
Y = B0 + B1 Hgt + B2 Age + E
Y = B0 + B1 Hgt + B3 Age^2 + E
Y = B0 + B1 Hgt + B2 Age + B3 Age^2 + E

F tests
Set-up: for our model we have a set of candidate variables, say X1, X2, …, Xk.
H0: the reduced model (the model that uses some of the variables and leaves out X_i1, X_i2, …, X_ir), that is, B_i1 = 0, B_i2 = 0, …, B_ir = 0.
Ha: the full model (the model that uses all the variables), that is, at least one of B_i1, B_i2, …, B_ir is not 0.

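Test statistic
Notation:
SSE(R) = SSE for the reduced model
dfR = degrees of freedom of SSE(R)
SSE(F) = SSE for the full model
dfF = degrees of freedom of SSE(F)
Statistic:
F* = [(SSE(R) - SSE(F)) / (dfR - dfF)] / [SSE(F) / dfF]
Under H0 (and the usual normal-error assumptions), F* follows an F distribution with dfR - dfF and dfF degrees of freedom.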
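Once the two error sums of squares are known, the statistic is simple arithmetic. A minimal Python sketch follows. The function name `partial_f` is an assumption made for illustration. The example values are SSE(X1) = 299.3 (10 df) and SSE(X1,X2,X3) = 195.19 (8 df), both quoted later in these slides for the children's weight problem. They are used here to test X2 and X3 jointly, a test the slides themselves do not carry out.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Partial F statistic for a reduced model nested in a full model."""
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more error df than the full model")
    # Drop in SSE per dropped variable, relative to the full model's MSE.
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Joint test of X2 and X3 given X1 (illustrative use of the slides' SSE values).
f_star = partial_f(299.3, 10, 195.19, 8)
print(round(f_star, 3))
```

The numerator degrees of freedom, dfR - dfF, equal the number of variables left out of the reduced model (2 in this example).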

Conclusions
P-value not small -> do not reject H0: the addition of the variables X_i1, X_i2, …, X_ir was not significant.
P-value small -> go with Ha: the addition of X_i1, X_i2, …, X_ir was significant.
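Turning F* into a P-value requires the upper tail of the F distribution with dfR - dfF and dfF degrees of freedom. The sketch below uses only the Python standard library and evaluates that tail through the regularized incomplete beta function (continued-fraction method). All function names are illustrative assumptions. In practice a library routine such as `scipy.stats.f.sf` would normally be used instead.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def guard(v):
        return tiny if abs(v) < tiny else v

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_p_value(f_star, df_num, df_den):
    """Upper-tail probability P(F > f_star) for an F(df_num, df_den) variable."""
    x = df_den / (df_den + df_num * f_star)
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


# 4.9646 is roughly the 5% critical value of F(1, 10), so this is close to 0.05.
print(f_p_value(4.9646, 1, 10))
```

What counts as a "small" P-value depends on the significance level chosen beforehand; the slides do not state which level they use.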

Notation: SST can be decomposed into different but related sums of squares. For example:
SST = SSE(X1) + SSR(X1)
SST = SSE(X1,X2) + SSR(X1,X2)
SSE(X1,X2) is never larger than SSE(X1), so the error sum of squares decreases when X2 is added. Whatever it loses is gained by the regression sum of squares:
SSE(X1,X2) = SSE(X1) - A
SSR(X1,X2) = SSR(X1) + A
This reduction in the error sum of squares equals the gain in the regression sum of squares, and we denote it by
A = SSR(X2|X1) = SSE(X1) - SSE(X1,X2),
the additional contribution obtained by adding the variable X2 to a model that already contains X1.

In particular, notice that
SSR(X1,X2) = SSR(X1) + A = SSR(X1) + SSR(X2|X1)
That is, we have decomposed SSR(X1,X2) into SSR(X1) and SSR(X2|X1). With more variables to consider, we can continue this decomposition:
SSR(X1,X2,X3) = SSR(X1) + SSR(X2|X1) + SSR(X3|X1,X2)
and so on.
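The decomposition can be sketched in a few lines of Python. Each extra sum of squares is the drop in SSE when the next variable enters, starting from SST (the SSE of the model with no variables). The function name and the numbers below are illustrative assumptions; the numbers are not taken from the slides' data.

```python
def sequential_ssr(sst, sse_sequence):
    """Extra sums of squares as variables are added one at a time.

    sse_sequence[k] is the SSE once the first k + 1 variables are in the model.
    Returns [SSR(X1), SSR(X2|X1), SSR(X3|X1,X2), ...].
    """
    extras = []
    previous = sst  # SSE of the model with no variables equals SST
    for sse in sse_sequence:
        extras.append(previous - sse)
        previous = sse
    return extras


# Illustrative numbers only (hypothetical, not the children's weight data).
sst = 100.0
sse_sequence = [60.0, 45.0, 44.0]   # SSE(X1), SSE(X1,X2), SSE(X1,X2,X3)

extras = sequential_ssr(sst, sse_sequence)
ssr_full = sst - sse_sequence[-1]   # SSR(X1,X2,X3)

print(extras)                 # the three sequential pieces
print(sum(extras), ssr_full)  # the pieces add up to SSR of the full model
```

The pieces depend on the order in which the variables are entered, although their total does not.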

Following that notation, we can write the F-statistic in the F-test as follows. Let
SSR(F|R) = SSE(R) - SSE(F) and dfF|R = dfR - dfF.
Then the F-statistic can be written as
F* = [SSR(F|R) / dfF|R] / [SSE(F) / dfF]

In our problem we will add one variable at a time and do an F test at each step to see whether the added variable is significant. We will use the following table. Notice the decomposition of the regression sum of squares:

Computations for our problem:

Partial F tests: one at a time
Test 1: adding X1 to the model without any variables.
Reduced model: Y = B0 + E
Full model: Y = B0 + B1 X1 + E
F(X1) = SSR(X1)/MSE(X1) = 588.92/(299.3/10) = 19.67
R^2 = .663, adjusted R^2 (Ra^2) = .629 (for full model)
P-value = (not shown) => X1 is significant
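The figures for Test 1 can be checked with a few lines of arithmetic. The sketch below assumes SSR(X1) = 588.92, the value that reproduces the reported F = 19.67 and R^2 = .663 when combined with SSE(X1) = 299.3. It also assumes a sample size of n = 12, inferred from the 10 error degrees of freedom. Variable names are illustrative.

```python
ssr_x1 = 588.92   # assumed regression sum of squares for X1 (see lead-in)
sse_x1 = 299.3    # error sum of squares for the model with X1 only
n = 12            # inferred: error df = n - 2 = 10

df_error = n - 2
sst = ssr_x1 + sse_x1

f_x1 = ssr_x1 / (sse_x1 / df_error)                   # F(X1) = SSR(X1) / MSE(X1)
r2 = ssr_x1 / sst                                     # R^2 = SSR / SST
r2_adj = 1 - (sse_x1 / df_error) / (sst / (n - 1))    # adjusted R^2

print(round(f_x1, 2), round(r2, 3), round(r2_adj, 3))
```

Small differences in the last digit of F can arise because the slides round SSE to 299.3.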

Test 2: adding X2 to the model with X1 in it.
Reduced model: Y = B0 + B1 X1 + E
Full model: Y = B0 + B1 X1 + B2 X2 + E
F(X2|X1) = SSR(X2|X1)/(SSE(X1,X2)/9) = 4.78
R^2 = .779, adjusted R^2 (Ra^2) = .73 (for full model)
P-value = (not shown) => X2 is significant

Test 3: adding X3 to the model with X1 and X2 in it.
Reduced model: Y = B0 + B1 X1 + B2 X2 + E
Full model: Y = B0 + B1 X1 + B2 X2 + B3 X3 + E
F(X3|X1,X2) = SSR(X3|X1,X2)/(SSE(X1,X2,X3)/8) = .009
R^2 = .78, adjusted R^2 (Ra^2) = .697 (for full model)
P-value = (not shown) => adding X3 is not significant

Conclusion
Following the strategy of adding one variable at a time, we conclude that an appropriate model to explain the weight of children based on their height and age is
Y = B0 + B1 X1 + B2 X2 + E
that is,
Wgt = B0 + B1 Hgt + B2 Age + E

Now let’s do a different analysis: variables added last
We will see the significance of each variable when it is added last to a model with all the other variables already in it.

Test 1: adding X1 to the model with X2 and X3 in it.
Reduced model: Y = B0 + B2 X2 + B3 X3 + E
Full model: Y = B0 + B1 X1 + B2 X2 + B3 X3 + E
F(X1|X2,X3) = SSR(X1|X2,X3)/MSE(X1,X2,X3) = 166.58/(195.19/8) = 6.83
P-value = (not shown) => X1 is significant

Test 2: adding X2 to the model with X1 and X3 in it.
Reduced model: Y = B0 + B1 X1 + B3 X3 + E
Full model: Y = B0 + B1 X1 + B2 X2 + B3 X3 + E
F(X2|X1,X3) = SSR(X2|X1,X3)/MSE(X1,X2,X3) = 101.82/(195.19/8) = 4.17
P-value = (not shown) => X2 is significant

Test 3: adding X3 to the model with X1 and X2 in it.
Reduced model: Y = B0 + B1 X1 + B2 X2 + E
Full model: Y = B0 + B1 X1 + B2 X2 + B3 X3 + E
F(X3|X1,X2) = SSR(X3|X1,X2)/MSE(X1,X2,X3) = .009
P-value = (not shown) => X3 is not significant
In this case, this technique leads to the same conclusion.
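In the added-last tests every statistic shares the same denominator, the MSE of the full model. A short Python sketch of the two computations whose extra sums of squares are quoted above follows; the variable names are illustrative assumptions.

```python
# MSE of the full model: SSE(X1,X2,X3) divided by its 8 error degrees of freedom.
mse_full = 195.19 / 8

# Extra sums of squares for each variable when it is added last (from the tests above).
extra_ss = {"X1|X2,X3": 166.58, "X2|X1,X3": 101.82}

# Each added-last test has 1 numerator df, so F = extra SS / MSE(full).
f_stats = {name: ss / mse_full for name, ss in extra_ss.items()}

for name, f_value in f_stats.items():
    print(f"F({name}) = {f_value:.2f}")
```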

Compare it to t-test
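When a single variable is added last, the partial F statistic equals the square of the t statistic for that variable's coefficient in the full model, so the two tests give the same P-value. The sketch below checks this numerically with a small pure-Python least-squares fit. The data and all helper names are hypothetical; they are not the children's weight data.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least squares via the normal equations; returns (beta, SSE, X'X)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(bk * rk for bk, rk in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, sse, xtx


# Hypothetical data, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [3.0, 4.5, 6.1, 9.0, 10.2, 13.1, 13.9, 17.5]
n = len(y)

reduced = [[1.0, a] for a in x1]               # intercept + X1
full = [[1.0, a, b] for a, b in zip(x1, x2)]   # intercept + X1 + X2

_, sse_r, _ = ols(reduced, y)
beta, sse_f, xtx = ols(full, y)

mse_f = sse_f / (n - 3)
f_added_last = (sse_r - sse_f) / mse_f         # numerator df = 1

# Standard error of the X2 coefficient: sqrt(MSE * [(X'X)^-1]_22).
inv_col = solve(xtx, [0.0, 0.0, 1.0])          # third column of (X'X)^-1
t_x2 = beta[2] / math.sqrt(mse_f * inv_col[2])

print(f_added_last, t_x2 ** 2)                 # the two values agree up to rounding
```

The identity holds only for tests of one coefficient at a time. A test that drops several variables at once has no single t statistic to compare against.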

Labor costs vs cases, indirect costs and holidays problem
Let’s use the strategy of adding one variable at a time.
X1 = cases
X2 = indirect costs
X3 = holidays
X4 = cases * indirect costs
X5 = cases * holidays
X6 = indirect costs * holidays
In the table below, the computations of the F-statistics are in the framed boxes.

Labor vs Cases, IndCost,Holiday

Model: Y = B0 + B1 X1 + B3 X3 + B4 X4 + B5 X5 + B6 X6 + E

Coefficients of partial determination
Recall that R^2 = SSR/SST is the ratio of the variation explained by the model with all the variables in it (full) to the total variation, which is the error variation of a model with no variables in it (reduced). Similarly, we can define
R^2_{Y2|1} = SSR(X2|X1)/SSE(X1) = [SSE(X1) - SSE(X1,X2)]/SSE(X1)
This is the proportionate reduction in the variation in Y remaining after X1 is included in the model that is gained by also including X2.
In general, going from a reduced model to a full model:
R^2_{Y F|R} = SSR(F|R)/SSE(R)
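As a small worked example, the general formula can be applied to two error sums of squares quoted for the children's weight problem: SSE(X1) = 299.3 and SSE(X1,X2,X3) = 195.19. The resulting coefficient, for X2 and X3 together given X1, is not reported in the slides; it is computed here only to illustrate the formula. The function name is an assumption.

```python
def partial_r2(sse_reduced, sse_full):
    """Coefficient of partial determination: SSR(F|R) / SSE(R)."""
    return (sse_reduced - sse_full) / sse_reduced


# Share of the variation left after X1 that X2 and X3 together account for.
r2_x2x3_given_x1 = partial_r2(299.3, 195.19)
print(round(r2_x2x3_given_x1, 3))
```

Taking the reduced model to be the one with no variables, so that SSE(R) = SST, recovers the ordinary R^2.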

Children’s weight
Labor vs Cases, IndCost, Holiday
