Sequential sums of squares … or … extra sums of squares
Sequential sums of squares: what are they? The reduction in the error sum of squares when one or more predictor variables are added to the regression model. Or, the increase in the regression sum of squares when one or more predictor variables are added to the regression model.
Sequential sums of squares: why? They can be used to test whether one slope parameter is 0. They can be used to test whether a subset (more than one, but fewer than all) of the slope parameters are all 0.
Example: Are brain and body size predictive of intelligence? Sample of n = 38 college students. Response ($Y$): intelligence, based on the PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale. Predictor ($X_1$): brain size, based on MRI scans (given as count/10,000). Predictor ($X_2$): height in inches. Predictor ($X_3$): weight in pounds.
OUTPUT #1 Minitab regression of PIQ on MRI. Coefficient table (Coef, SE Coef, T, P) with rows Constant, MRI. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. (Numeric entries not shown.)
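OUTPUT #2 Minitab regression of PIQ on MRI and Height, entered in that order. Coefficient table (Coef, SE Coef, T, P) with rows Constant, MRI, Height. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Residual, Total. Sequential sums of squares table (DF, Seq SS) with rows MRI, Height. (Numeric entries not shown.)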
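OUTPUT #3 Minitab regression of PIQ on MRI, Height, and Weight, entered in that order. Coefficient table (Coef, SE Coef, T, P) with rows Constant, MRI, Height, Weight. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. Sequential sums of squares table (DF, Seq SS) with rows MRI, Height, Weight; the Weight row reads DF = 1, Seq SS = 0.0. (Other numeric entries not shown.)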
Sequential sums of squares: definition using SSE notation. $SSR(X_2 \mid X_1) = SSE(X_1) - SSE(X_1, X_2)$. In general, you subtract the error sum of squares due to all of the predictors both left and right of the bar from the error sum of squares due to the predictors to the right of the bar. For example, $SSR(X_2, X_3 \mid X_1) = SSE(X_1) - SSE(X_1, X_2, X_3)$.
Sequential sums of squares: definition using SSR notation. $SSR(X_2 \mid X_1) = SSR(X_1, X_2) - SSR(X_1)$. In general, you subtract the regression sum of squares due to the predictors to the right of the bar from the regression sum of squares due to all of the predictors both left and right of the bar. For example, $SSR(X_2, X_3 \mid X_1) = SSR(X_1, X_2, X_3) - SSR(X_1)$.
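The two definitions give the same number because SSTO = SSR + SSE for every model fitted to the same response. The following is a minimal sketch of that equivalence, assuming NumPy is available; the data are synthetic and invented for illustration (they are not the PIQ data), and the helper names `sse` and `ssr` are hypothetical.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares for an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def ssr(X, y):
    """Regression sum of squares, computed as SSTO - SSE."""
    ssto = float(np.sum((y - y.mean()) ** 2))
    return ssto - sse(X, y)

# Synthetic data, invented for illustration only (not the PIQ data set).
rng = np.random.default_rng(0)
n = 38
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

X1 = np.column_stack([x1])
X12 = np.column_stack([x1, x2])
X123 = np.column_stack([x1, x2, x3])

# SSR(X2 | X1), via the SSE definition and via the SSR definition
seq_x2_sse = sse(X1, y) - sse(X12, y)
seq_x2_ssr = ssr(X12, y) - ssr(X1, y)

# SSR(X2, X3 | X1), via the SSE definition and via the SSR definition
seq_x23_sse = sse(X1, y) - sse(X123, y)
seq_x23_ssr = ssr(X123, y) - ssr(X1, y)

print(seq_x2_sse, seq_x2_ssr)
print(seq_x23_sse, seq_x23_ssr)
```

Each printed pair should agree up to floating-point rounding.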
Decomposition of regression sum of squares In multiple regression, there is more than one way to decompose the regression sum of squares. For example:
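OUTPUT #2 Minitab regression of PIQ on MRI and Height, entered in that order. Coefficient table (Coef, SE Coef, T, P) with rows Constant, MRI, Height. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Residual, Total. Sequential sums of squares table (DF, Seq SS) with rows MRI, which is $SSR(X_1)$, and Height, which is $SSR(X_2 \mid X_1)$. (Numeric entries not shown.)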
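OUTPUT #4 Minitab regression of PIQ on Height and MRI, entered in that order. Coefficient table (Coef, SE Coef, T, P) with rows Constant, Height, MRI. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. Sequential sums of squares table (DF, Seq SS) with rows Height, which is $SSR(X_2)$, and MRI, which is $SSR(X_1 \mid X_2)$. (Numeric entries not shown.)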
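Decomposition of SSR: how? Entering MRI first: $SSR(X_1, X_2) = SSR(X_1) + SSR(X_2 \mid X_1)$. Entering Height first: $SSR(X_1, X_2) = SSR(X_2) + SSR(X_1 \mid X_2)$.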
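Even more ways to decompose SSR with 3 or more predictors. For example: $SSR(X_1, X_2, X_3) = SSR(X_1) + SSR(X_2 \mid X_1) + SSR(X_3 \mid X_1, X_2)$, or $SSR(X_1, X_2, X_3) = SSR(X_2) + SSR(X_3 \mid X_2) + SSR(X_1 \mid X_2, X_3)$, or $SSR(X_1, X_2, X_3) = SSR(X_1) + SSR(X_2, X_3 \mid X_1)$.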
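With three predictors there are 3! = 6 entry orders, and each one splits the same total SSR into different one-degree-of-freedom pieces. The sketch below enumerates them, assuming NumPy is available; the data are synthetic and invented for illustration, and `ssr` and `sequential_ss` are hypothetical helper names.

```python
import itertools
import numpy as np

def ssr(columns, y):
    """Regression sum of squares for y on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    return float(np.sum((fitted - y.mean()) ** 2))

def sequential_ss(data, order, y):
    """One-degree-of-freedom sequential SS, in the order the predictors are entered."""
    pieces, previous = [], 0.0
    for i in range(1, len(order) + 1):
        current = ssr([data[name] for name in order[:i]], y)
        pieces.append(current - previous)
        previous = current
    return pieces

# Synthetic data, invented for illustration only (not the PIQ data set).
rng = np.random.default_rng(0)
n = 38
data = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}
data["x2"] = data["x2"] + 0.6 * data["x1"]  # correlated predictors, so order matters
y = 1.0 + 2.0 * data["x1"] + 1.0 * data["x2"] + rng.normal(size=n)

total_ssr = ssr(list(data.values()), y)
decompositions = {
    order: sequential_ss(data, order, y)
    for order in itertools.permutations(data)
}
for order, pieces in decompositions.items():
    print(order, [round(p, 2) for p in pieces], round(sum(pieces), 2))
```

Every row should sum to the same total SSR, while the individual pieces change with the entry order.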
Degrees of freedom and regression mean squares. A sequential sum of squares involving one extra predictor variable has one degree of freedom associated with it: $MSR(X_2 \mid X_1) = SSR(X_2 \mid X_1) / 1$. A sequential sum of squares involving two extra predictor variables has two degrees of freedom associated with it: $MSR(X_2, X_3 \mid X_1) = SSR(X_2, X_3 \mid X_1) / 2$.
Sequential sums of squares in Minitab. The SSR is automatically decomposed into one-degree-of-freedom sequential sums of squares, in the order in which the predictor variables are entered into the model. To get a sequential sum of squares involving two or more predictor variables, sum the appropriate one-degree-of-freedom sequential sums of squares.
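Why summing works can be seen by telescoping the SSR definition. As an illustration, assume the predictors are entered in the order $X_1$, $X_2$, $X_3$:

```latex
\begin{align*}
SSR(X_2, X_3 \mid X_1)
  &= SSR(X_1, X_2, X_3) - SSR(X_1) \\
  &= \bigl[ SSR(X_1, X_2) - SSR(X_1) \bigr]
   + \bigl[ SSR(X_1, X_2, X_3) - SSR(X_1, X_2) \bigr] \\
  &= SSR(X_2 \mid X_1) + SSR(X_3 \mid X_1, X_2)
\end{align*}
```

The two terms in the last line are the second and third rows of the Seq SS table for that entry order.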
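OUTPUT #3 Minitab regression of PIQ on MRI, Height, and Weight, entered in that order. Coefficient table (Coef, SE Coef, T, P) with rows Constant, MRI, Height, Weight. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. Sequential sums of squares table (DF, Seq SS) with rows MRI, which is $SSR(X_1)$; Height, which is $SSR(X_2 \mid X_1)$; and Weight, which is $SSR(X_3 \mid X_1, X_2)$. The Weight row reads DF = 1, Seq SS = 0.0. (Other numeric entries not shown.)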
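OUTPUT #5 Minitab regression of PIQ on Height, Weight, and MRI, entered in that order. Coefficient table (Coef, SE Coef, T, P) with rows Constant, Height, Weight, MRI. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. Sequential sums of squares table (DF, Seq SS) with rows Height, which is $SSR(X_2)$; Weight, which is $SSR(X_3 \mid X_2)$; and MRI, which is $SSR(X_1 \mid X_2, X_3)$. (Numeric entries not shown.)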
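Testing whether one slope, $\beta_1 = \beta_{MRI}$, is 0. Minitab output with the predictors entered in the order Height, Weight, MRI, so that the last Seq SS row is $SSR(X_1 \mid X_2, X_3)$. Coefficient table (Coef, SE Coef, T, P) with rows Constant, Height, Weight, MRI. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. Sequential sums of squares table (DF, Seq SS) with rows Height, Weight, MRI. (Numeric entries not shown.)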
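Testing whether one slope, $\beta_2 = \beta_{HT}$, is 0. Minitab output with the predictors entered in the order MRI, Weight, Height, so that the last Seq SS row is $SSR(X_2 \mid X_1, X_3)$. Coefficient table (Coef, SE Coef, T, P) with rows Constant, MRI, Weight, Height. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. Sequential sums of squares table (DF, Seq SS) with rows MRI, Weight, Height. (Numeric entries not shown.)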
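Testing whether one slope, $\beta_3 = \beta_{WT}$, is 0. Minitab output with the predictors entered in the order MRI, Height, Weight, so that the last Seq SS row is $SSR(X_3 \mid X_1, X_2)$. Coefficient table (Coef, SE Coef, T, P) with rows Constant, MRI, Height, Weight. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. Sequential sums of squares table (DF, Seq SS) with rows MRI, Height, Weight; the Weight row reads DF = 1, Seq SS = 0.0. (Other numeric entries not shown.)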
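Testing whether one slope $\beta_k$ is 0: why does it work? Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$. Reduced model: the full model with the $\beta_k X_{ik}$ term dropped; for example, for $k = 1$, $Y_i = \beta_0 + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$.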
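Testing whether one slope $\beta_k$ is 0: why does it work? (cont’d) The general linear test statistic: $F^* = \dfrac{SSE(R) - SSE(F)}{df_R - df_F} \div \dfrac{SSE(F)}{df_F}$ becomes, for $k = 1$: $F^* = \dfrac{SSR(X_1 \mid X_2, X_3)}{1} \div \dfrac{SSE(X_1, X_2, X_3)}{n - 4} = \dfrac{MSR(X_1 \mid X_2, X_3)}{MSE(X_1, X_2, X_3)}$.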
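A one-slope general linear test is equivalent to the usual t-test for that coefficient, because the F-statistic equals the square of the t-statistic. The sketch below checks this numerically, assuming NumPy is available; the data are synthetic and invented for illustration, and the helper name `fit` is hypothetical.

```python
import numpy as np

def fit(A, y):
    """Return OLS coefficients and the error sum of squares for design matrix A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

# Synthetic data, invented for illustration only (not the PIQ data set).
rng = np.random.default_rng(1)
n = 38
X = rng.normal(size=(n, 3))
y = 1.0 + 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

k = 0  # column of X whose slope is tested (here the first predictor)
full = np.column_stack([np.ones(n), X])   # intercept + 3 predictors
reduced = np.delete(full, k + 1, axis=1)  # same model without predictor k

beta_full, sse_full = fit(full, y)
_, sse_reduced = fit(reduced, y)
df_full = n - full.shape[1]        # n - 4
df_reduced = n - reduced.shape[1]  # n - 3

# General linear test statistic: [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]
f_star = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

# t-statistic for the same slope, taken from the full model
mse_full = sse_full / df_full
cov_beta = mse_full * np.linalg.inv(full.T @ full)
t_stat = beta_full[k + 1] / np.sqrt(cov_beta[k + 1, k + 1])

print(f_star, t_stat ** 2)
```

The two printed values should agree up to rounding, which is why the P-value of the partial F-test matches the P-value in the coefficient table's T column.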
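Testing whether $\beta_2 = \beta_3 = 0$. Full model: $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$. Reduced model: $Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$.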
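Testing whether $\beta_2 = \beta_3 = 0$ (cont’d) The general linear test statistic: $F^* = \dfrac{SSE(R) - SSE(F)}{df_R - df_F} \div \dfrac{SSE(F)}{df_F}$ becomes: $F^* = \dfrac{SSR(X_2, X_3 \mid X_1)}{2} \div \dfrac{SSE(X_1, X_2, X_3)}{n - 4} = \dfrac{MSR(X_2, X_3 \mid X_1)}{MSE(X_1, X_2, X_3)}$, with 2 numerator and $n - 4 = 34$ denominator degrees of freedom.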
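The numerator of this test can be assembled from one-degree-of-freedom sequential sums of squares when $X_1$ is entered first. The following sketch illustrates this, assuming NumPy is available; the data are synthetic and invented for illustration (generated so that the null hypothesis holds), and the helper name `sse` is hypothetical.

```python
import numpy as np

def sse(A, y):
    """Error sum of squares for an OLS fit of y on design matrix A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

# Synthetic data, invented for illustration only (not the PIQ data set).
rng = np.random.default_rng(2)
n = 38
x1, x2, x3 = rng.normal(size=(3, n))
y = 3.0 + 1.2 * x1 + rng.normal(size=n)  # generated with beta2 = beta3 = 0

ones = np.ones(n)
sse_x1 = sse(np.column_stack([ones, x1]), y)
sse_x12 = sse(np.column_stack([ones, x1, x2]), y)
sse_x123 = sse(np.column_stack([ones, x1, x2, x3]), y)

seq_x2 = sse_x1 - sse_x12    # SSR(X2 | X1)
seq_x3 = sse_x12 - sse_x123  # SSR(X3 | X1, X2)
ssr_x23_given_x1 = seq_x2 + seq_x3  # SSR(X2, X3 | X1)

num_df, den_df = 2, n - 4
f_star = (ssr_x23_given_x1 / num_df) / (sse_x123 / den_df)
print(num_df, den_df, f_star)
```

In Minitab terms, `seq_x2` and `seq_x3` play the role of the Height and Weight rows of the Seq SS table when the entry order is MRI, Height, Weight.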
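OUTPUT #3 Minitab regression of PIQ on MRI, Height, and Weight, entered in that order. Coefficient table (Coef, SE Coef, T, P) with rows Constant, MRI, Height, Weight. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. Sequential sums of squares table (DF, Seq SS) with rows MRI, Height, Weight; the Height and Weight rows sum to $SSR(X_2, X_3 \mid X_1)$. The Weight row reads DF = 1, Seq SS = 0.0. (Other numeric entries not shown.)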
Cumulative Distribution Function: F distribution with 2 DF in numerator and 34 DF in denominator. Output columns: x, P( X <= x ). The P-value is 1 - P( X <= x ). (Numeric entries not shown.)
Getting P-value for F-statistic in Minitab Select Calc >> Probability Distributions >> F… Select Cumulative Probability. Use default noncentrality parameter of 0. Type in numerator DF and denominator DF. Select Input constant. Type in F-statistic. Answer appears in session window. P-value is 1 minus the number that appears.
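The "1 minus the cumulative probability" step can be reproduced outside Minitab. For the special case of 2 numerator degrees of freedom, the F cumulative distribution function has a closed form, so the sketch below needs no libraries. The function names are hypothetical and the F-statistic value is made up for illustration. For other numerator degrees of freedom a library routine such as SciPy's `scipy.stats.f.sf` would be the usual choice.

```python
def f_cdf_num_df_2(x, den_df):
    """P(F <= x) for an F distribution with 2 numerator DF and den_df denominator DF.

    Uses the closed form 1 - (1 + 2x/den_df)^(-den_df/2), valid only for 2 numerator DF.
    """
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + 2.0 * x / den_df) ** (-den_df / 2.0)

def p_value_num_df_2(f_statistic, den_df):
    """Upper-tail P-value: 1 minus the cumulative probability."""
    return 1.0 - f_cdf_num_df_2(f_statistic, den_df)

f_statistic = 3.0  # hypothetical value, not taken from the PIQ output
print(f_cdf_num_df_2(f_statistic, 34))
print(p_value_num_df_2(f_statistic, 34))
```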
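Test whether $\beta_1 = \beta_3 = 0$. Minitab output with the predictors entered in the order Height, Weight, MRI, so that the Weight and MRI rows of the Seq SS table sum to $SSR(X_1, X_3 \mid X_2)$. Analysis of Variance table (DF, SS, MS, F, P) with rows Regression, Error, Total. Sequential sums of squares table (DF, Seq SS) with rows Height, Weight, MRI. (Numeric entries not shown.)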