1
Multiple regression: Partial F-tests
2
Predicting a child’s weight based on his/her height and age
3
Which model to use?
Y=Bo+B1 Hgt + B2 Age + E
Y=Bo+B1 Hgt + B2 Age + B3 Hgt^2 + E
Y=Bo+B1 Hgt + B2 Age + B3 Age^2 + E
Y=Bo+B1 Hgt + B2 Age + B3 Hgt^2 + B4 Age^2 + B5 Hgt*Age + E
4
Relationships
5
Y=Bo+B1 Hgt + E
6
Y=Bo+B2 Age + E
7
Y=Bo+B3 Age^2 + E
8
Y=Bo+B1 Hgt +B2 Age+ E
9
Y=Bo+B1 Hgt +B3 Age^2+ E
10
Y=Bo+B1 Hgt +B2 Age+B3 Age^2+E
11
F tests
Set up: for our model we have a set of candidate variables X1, X2,…,Xk.
H0: reduced model (the model that uses only some of the variables, leaving out Xi1, Xi2,…,Xir), that is, Bi1=0, Bi2=0,…,Bir=0
Ha: full model (the model using all the variables), that is, at least one of Bi1, Bi2,…,Bir is not 0
12
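Test statistic
Notation:
SSE(R) = SSE for the reduced model
dfR = degrees of freedom of SSE(R)
SSE(F) = SSE for the full model
dfF = degrees of freedom of SSE(F)
Statistic:
F* = [(SSE(R) - SSE(F)) / (dfR - dfF)] / [SSE(F) / dfF]
Under H0, F* follows an F distribution with (dfR - dfF, dfF) degrees of freedom, which is where the P-value comes from.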
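The statistic only needs the two error sums of squares and their degrees of freedom. Here is a minimal Python sketch; the function name `partial_f` is an illustrative choice, not something from the slides. The example numbers are the ones quoted in the added-last test for X1 in the children's weight example: SSE(F) = 195.19 on 8 degrees of freedom and SSR(X1|X2,X3) = 166.58, so SSE(R) = 195.19 + 166.58 = 361.77 on 9 degrees of freedom.

```python
def partial_f(sse_r, df_r, sse_f, df_f):
    """Partial F statistic for comparing a reduced model with a full model."""
    if df_r <= df_f:
        raise ValueError("the reduced model must have more error degrees of freedom")
    extra_ss = sse_r - sse_f      # SSE(R) - SSE(F), the extra sum of squares
    extra_df = df_r - df_f        # dfR - dfF, the number of variables being tested
    return (extra_ss / extra_df) / (sse_f / df_f)


# Values quoted in the slides for adding X1 last (SSE(R) derived as explained above).
f_star = partial_f(sse_r=361.77, df_r=9, sse_f=195.19, df_f=8)
print(round(f_star, 2))
```

The result agrees with the value 6.83 reported for that test. Turning F* into a P-value needs the F distribution with (dfR - dfF, dfF) degrees of freedom, which this sketch leaves to a statistics package.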
13
Conclusions
P-value not small -> go with H0: the addition of the variables Xi1, Xi2,…,Xir was not significant
P-value small -> go with Ha: the addition of the variables Xi1, Xi2,…,Xir was significant
14
Notation: SST can be decomposed into different but related sums of squares. For example:
SST=SSE(X1)+SSR(X1)
SST=SSE(X1,X2)+SSR(X1,X2)
SSE(X1,X2) is no larger than SSE(X1), so the error sum of squares decreases when X2 is added. Whatever it loses is gained by the regression sum of squares:
SSE(X1,X2)=SSE(X1)-A
SSR(X1,X2)=SSR(X1)+A
This reduction in the error sum of squares is the same as the gain in the regression sum of squares, and we denote it by
A=SSR(X2|X1)=SSE(X1)-SSE(X1,X2),
the additional contribution obtained by adding the variable X2 to a model with X1 already in it.
15
In particular notice that
SSR(X1,X2)=SSR(X1)+A=SSR(X1)+SSR(X2|X1)
That is, we have decomposed SSR(X1,X2):
SSR(X1,X2)=SSR(X1)+SSR(X2|X1)
If we had more variables to consider, we could continue this decomposition:
SSR(X1,X2,X3)=SSR(X1)+SSR(X2|X1)+SSR(X3|X1,X2)
And so on.
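The decomposition can be checked numerically by fitting the nested models and differencing their SSEs. The sketch below uses a small made-up data set (not the data behind these slides) and a plain-Python least-squares helper, `fit_sse`, which is a hypothetical name introduced only for this illustration.

```python
def fit_sse(columns, y):
    """SSE of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data for illustration only.
y = [10.0, 13.0, 15.0, 11.0, 19.0, 20.0, 16.0, 23.0]
x1 = [1.0, 2.0, 3.0, 2.0, 5.0, 6.0, 4.0, 7.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
x3 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]

sst = fit_sse([], y)                 # intercept-only model: SSE equals SST
sse_1 = fit_sse([x1], y)
sse_12 = fit_sse([x1, x2], y)
sse_123 = fit_sse([x1, x2, x3], y)

ssr_1 = sst - sse_1                  # SSR(X1)
ssr_2_given_1 = sse_1 - sse_12       # SSR(X2|X1)
ssr_3_given_12 = sse_12 - sse_123    # SSR(X3|X1,X2)
ssr_123 = sst - sse_123              # SSR(X1,X2,X3)

print(ssr_123, ssr_1 + ssr_2_given_1 + ssr_3_given_12)
```

The two printed numbers agree up to rounding, and each extra sum of squares is non-negative because adding a variable can never increase the SSE.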
16
Following that notation we can write the F-statistic in the F-test as follows.
Let SSR(F|R)=SSE(R)-SSE(F) and dfF|R=dfR-dfF.
Then the F-statistic can be written as
F* = [SSR(F|R) / dfF|R] / [SSE(F) / dfF]
17
In our problem we will add one variable at a time and do an F test at each step to see whether the added variable is significant. We will use the following table. Notice the decomposition of the regression sum of squares:
18
Computations for our problem:
19
Partial F tests: one at a time
Test 1: adding X1 to the model without any variables.
Reduced model: Y= Bo+ E
Full model: Y= Bo+ B1 X1+ E
F(X1)=SSR(X1)/MSE(X1)=588.92/(299.3/10)=19.67
R^2=.663 Ra^2=.629 (for full model)
P-value = [value missing] => X1 is significant
20
Test 2: adding X2 to the model with X1 in it.
Reduced model: Y= Bo+B1 X1+ E
Full model: Y= Bo+B1 X1+B2 X2 + E
F(X2|X1)=SSR(X2|X1)/(SSE(X1,X2)/9)=4.78
R^2=.779 Ra^2=.73 (for full model)
P-value = [value missing] => X2 is significant
Test 3: adding X3 to the model with X1 and X2 in it.
Reduced model: Y= Bo+B1 X1 +B2 X2 + E
Full model: Y= Bo+B1 X1+B2 X2 +B3 X3 + E
F(X3|X1,X2)=SSR(X3|X1,X2)/(SSE(X1,X2,X3)/8)=.009
R^2=.78 Ra^2=.697 (for full model)
P-value = [value missing] => adding X3 is not significant
21
Conclusion
Following the strategy of adding one variable at a time, we conclude that an appropriate model to explain the weight of children based on their height and age is
Y= Bo+ B1 X1 + B2 X2 + E
that is,
Wgt= Bo+ B1 Hgt + B2 Age+ E
22
Now let’s do a different analysis: variables added last
We will see the significance of each variable when it is added last to a model with all the other variables already in it.
23
Test 1: adding X1 to the model with X2 and X3 in it.
Reduced model: Y= Bo+ B2 X2 + B3 X3 + E
Full model: Y= Bo+ B1 X1 + B2 X2 + B3 X3 + E
F(X1|X2,X3)=SSR(X1|X2,X3)/MSE(X1,X2,X3)=166.58/(195.19/8)=6.83
P-value = [value missing] => X1 is significant
Test 2: adding X2 to the model with X1 and X3 in it.
Reduced model: Y= Bo+ B1 X1 + B3 X3 + E
Full model: Y= Bo+ B1 X1 + B2 X2 + B3 X3 + E
F(X2|X1,X3)=SSR(X2|X1,X3)/MSE(X1,X2,X3)=101.82/(195.19/8)=4.17
P-value = [value missing] => X2 is significant
24
Test 3: adding X3 to the model with X1 and X2 in it.
Reduced model: Y= Bo+ B1 X1 + B2 X2 + E
Full model: Y= Bo+ B1 X1 + B2 X2 + B3 X3 + E
F(X3|X1,X2)=SSR(X3|X1,X2)/MSE(X1,X2,X3)=.009
P-value = [value missing] => X3 is not significant
In this case, this technique leads to the same conclusion as adding one variable at a time.
25
Compare it to t-test
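One standard point of comparison: when a single variable is added last, its partial F statistic equals the square of the t statistic for that variable's coefficient in the full model. The sketch below checks this on a small made-up data set (not the children's weight data); the helper `ols` is a hypothetical name used only here.

```python
import math


def ols(columns, y):
    """Least-squares fit with intercept; returns (beta, sse, inverse of X'X)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | I] for Gauss-Jordan inversion.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [1.0 if c == r else 0.0 for c in range(p)] for r in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        pivot = A[k][k]
        A[k] = [a / pivot for a in A[k]]
        for r in range(p):
            if r != k:
                factor = A[r][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    inv = [row[p:] for row in A]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    beta = [sum(inv[r][c] * xty[c] for c in range(p)) for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, sse, inv


# Hypothetical data for illustration only.
y = [10.0, 13.0, 15.0, 11.0, 19.0, 20.0, 16.0, 23.0]
x1 = [1.0, 2.0, 3.0, 2.0, 5.0, 6.0, 4.0, 7.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
x3 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]

beta, sse_full, inv = ols([x1, x2, x3], y)
mse_full = sse_full / (len(y) - 4)             # error df = n - (3 variables + intercept)

# t statistic for B3 in the full model: estimate divided by its standard error.
t_x3 = beta[3] / math.sqrt(mse_full * inv[3][3])

# Partial F for adding X3 last: F(X3|X1,X2) = SSR(X3|X1,X2) / MSE(X1,X2,X3).
_, sse_reduced, _ = ols([x1, x2], y)
f_x3 = (sse_reduced - sse_full) / mse_full

print(t_x3 ** 2, f_x3)
```

The two printed values coincide up to rounding, so the added-last F test and the two-sided t test for the same coefficient give the same P-value.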
26
Labor costs vs cases, indirect costs and holidays problem
Let’s use the strategy of adding one variable at a time.
X1=cases
X2=indirect costs
X3=holidays
X4=cases * indirect costs
X5=cases * holidays
X6=indirect costs * holidays
In the table below, the computations of the F-statistics are in the framed boxes
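The interaction variables X4, X5 and X6 are just element-wise products of the original columns. A minimal sketch follows, with made-up values because the slides do not list the data; treating holidays as a 0/1 indicator is also an assumption, and `product` is a hypothetical helper name.

```python
def product(a, b):
    """Element-wise product of two equally long columns."""
    if len(a) != len(b):
        raise ValueError("columns must have the same length")
    return [u * v for u, v in zip(a, b)]


# Hypothetical values for illustration only.
cases = [305.0, 328.0, 317.0, 290.0]       # X1
indirect_costs = [7.2, 6.2, 4.6, 7.1]      # X2
holidays = [0, 0, 1, 0]                    # X3, assumed to be a 0/1 indicator

x4 = product(cases, indirect_costs)        # X4 = cases * indirect costs
x5 = product(cases, holidays)              # X5 = cases * holidays
x6 = product(indirect_costs, holidays)     # X6 = indirect costs * holidays

print(x5)
print(x6)
```

With a 0/1 holiday indicator, X5 and X6 are zero on non-holiday rows, so their coefficients describe how the effects of cases and indirect costs change on holidays.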
27
Labor vs Cases, IndCost,Holiday
28
Model: Y= Bo+ B1 X1 + B3 X3 + B4 X4 + B5 X5 + B6 X6 + E
29
Coefficients of partial determination
Recall that R^2=SSR/SST is the proportion of the total variation (the SSE of a reduced model with no variables in it) that is explained by the full model with all the variables in it. Similarly we can define:
R^2_{Y2|1}=SSR(X2|X1)/SSE(X1)=[SSE(X1)-SSE(X1,X2)]/SSE(X1)
This is the proportionate reduction of the variation in Y remaining after X1 is included in the model that is gained by also including X2 in the model.
In general, going from a reduced model to a full model:
R^2_{YF|R}=SSR(F|R)/SSE(R)
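As a worked instance of the general formula, here is a short Python sketch; `partial_r2` is an illustrative name. It uses the children's weight numbers quoted in the added-last tests: SSE(X1,X2,X3) = 195.19 and SSR(X1|X2,X3) = 166.58, so SSE(X2,X3) = 195.19 + 166.58 = 361.77.

```python
def partial_r2(sse_r, sse_f):
    """Coefficient of partial determination: SSR(F|R) / SSE(R)."""
    if sse_r <= 0:
        raise ValueError("SSE(R) must be positive")
    return (sse_r - sse_f) / sse_r


# Adding X1 (height) to a model that already contains X2 and X3.
r2_y1_given_23 = partial_r2(sse_r=361.77, sse_f=195.19)
print(round(r2_y1_given_23, 3))
```

Under these numbers, adding height removes about 46% of the variation in weight left unexplained by the other two variables.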
30
Children’s weight Labor vs Cases, IndCost,Holiday