EXTRA SUMS OF SQUARES  Extra sums of squares  Decomposition of SSR  Uses of extra sums of squares  Coefficient of partial determination

 A different way to look at the comparison of models.
 It measures the marginal reduction in the error sum of squares when one or several independent variables are added to the regression model, given that the other independent variables are already in the model.
 Equivalently, it measures the marginal increase in the regression sum of squares when one or several independent variables are added to the model.

 Look at the difference
 in SSE
 in SSR
 Because SSR + SSE = SST, the two ways are equivalent
 The models we compare are hierarchical, in the sense that one includes all of the explanatory variables of the other

 We can compare models with different explanatory variables:
 X1, X2 vs X1
 X1, X2, X3, X4, X5 vs X1, X2, X3
 Note the first model includes all the Xs of the second
 We will get an F test that compares the two models
 We are testing the null hypothesis that the regression coefficients for the extra variables are all zero

 For X1, X2, X3, X4, X5 vs X1, X2, X3:
 H0: β4 = β5 = 0
 H1: β4 and β5 are not both 0
 The degrees of freedom for the F statistic are the number of extra variables and the error df for the model with the larger number of explanatory variables
 Suppose n = 100 and we compare the models with X1, X2, X3, X4, X5 vs X1, X2, X3
 Numerator df is 2
 Denominator df is n − 6 = 94
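To make this concrete, here is a minimal numpy/scipy sketch of the comparison (synthetic data; all names are hypothetical): it fits the reduced and full models, forms the extra sum of squares, and computes the F statistic with 2 and n − 6 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 5))                       # columns play the roles of X1..X5
y = 1.0 + X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

def sse(y, X_cols):
    """Error sum of squares of an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X_cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return resid @ resid

sse_red  = sse(y, X[:, :3])     # reduced model: X1, X2, X3
sse_full = sse(y, X)            # full model:    X1..X5
df_extra, df_err = 2, n - 6     # 2 extra coefficients; full-model error df

F = ((sse_red - sse_full) / df_extra) / (sse_full / df_err)
p = stats.f.sf(F, df_extra, df_err)
print(f"F = {F:.3f}, p = {p:.4f}")   # tests H0: beta4 = beta5 = 0
```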

 Predict bone density using age, weight and height; does diet add any useful information?
 Predict faculty salaries using highest degree (categorical), rank (categorical), time in rank, and department (categorical); does race (gender) add any useful information?

 Predict GPA using 3 HS grade variables; do SAT scores add any useful information?
 Predict yield of an industrial process using temperature and pH; does the supplier of the raw material (categorical) add any useful information?

Suppose we have the case of two X variables.
 The total sum of squares when only X1 is in the model is given by: SST = SSR(X1) + SSE(X1)
 When we add X2 to the model while X1 is already in it, SST becomes: SST = SSR(X1) + SSR(X2|X1) + SSE(X1,X2)
 Equivalently: SST = SSR(X1,X2) + SSE(X1,X2)

Hence the decomposition of the regression sum of squares SSR(X1,X2) into two marginal components is:
1. SSR(X1), measuring the contribution of including X1 alone in the model, and
2. SSR(X2|X1), measuring the additional contribution when X2 is included, given that X1 is already in the model.

Therefore, when the regression model contains three X variables, a variety of decompositions of SSR(X1,X2,X3) can be obtained, such as:
 SSR(X1,X2,X3) = SSR(X1) + SSR(X2|X1) + SSR(X3|X1,X2)
 SSR(X1,X2,X3) = SSR(X2) + SSR(X3|X2) + SSR(X1|X2,X3)
 SSR(X1,X2,X3) = SSR(X1) + SSR(X2,X3|X1)

An ANOVA table containing this decomposition of SSR is as follows:

Source of Variation   Sum of Squares    df    Mean Squares
Regression            SSR(X1,X2,X3)     3     MSR(X1,X2,X3)
  X1                  SSR(X1)           1     MSR(X1)
  X2|X1               SSR(X2|X1)        1     MSR(X2|X1)
  X3|X1,X2            SSR(X3|X1,X2)     1     MSR(X3|X1,X2)
Error                 SSE(X1,X2,X3)     n-4   MSE(X1,X2,X3)
Total                 SST               n-1
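A short numpy sketch (synthetic data, hypothetical names) that verifies the first decomposition numerically by adding predictors one at a time and accumulating SSR(X1), SSR(X2|X1), SSR(X3|X1,X2):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 3))
y = 2.0 + X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

def sse(y, X_cols=None):
    """SSE of an intercept-only or multiple regression fit."""
    Xd = np.ones((len(y), 1)) if X_cols is None else \
         np.column_stack([np.ones(len(y)), X_cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ beta
    return r @ r

sst = sse(y)                       # intercept-only SSE is SST
pieces, prev = [], sst
for k in range(1, 4):
    cur = sse(y, X[:, :k])         # model with X1..Xk
    pieces.append(prev - cur)      # SSR(Xk | X1..Xk-1)
    prev = cur

ssr_full = sst - sse(y, X)
print(pieces, sum(pieces), ssr_full)   # the pieces sum to SSR(X1,X2,X3)
```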

 Test whether a single βk = 0
Suppose we have a regression with 3 independent variables. The full model is given by:
Y = β0 + β1X1 + β2X2 + β3X3 + ε
with SSE(F) = SSE(X1,X2,X3) and df = n − 4.
To test whether β2 = 0, we have the alternative model:
Y = β0 + β1X1 + β3X3 + ε
as the reduced model, with SSE(R) = SSE(X1,X3) and df = n − 3.

The general linear test statistic is given by
F* = [SSE(R) − SSE(F)] / (dfR − dfF) ÷ [SSE(F) / dfF] = SSR(X2|X1,X3) / 1 ÷ MSE(X1,X2,X3)

 Test whether several βk = 0
Suppose we have a regression with 3 independent variables. The full model is given by:
Y = β0 + β1X1 + β2X2 + β3X3 + ε
with SSE(F) = SSE(X1,X2,X3) and df = n − 4.
To test whether β2 = β3 = 0, we have the alternative model:
Y = β0 + β1X1 + ε
as the reduced model, with SSE(R) = SSE(X1) and df = n − 2.

The general linear test statistic is
F* = [SSE(R) − SSE(F)] / (dfR − dfF) ÷ [SSE(F) / dfF] = SSR(X2,X3|X1) / 2 ÷ MSE(X1,X2,X3)
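Both tests are instances of the same reduced-versus-full comparison. A hedged numpy/scipy sketch with synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 3))
y = 1.0 + 0.8 * X[:, 0] + rng.normal(size=n)    # beta2 = beta3 = 0 in truth

def sse(y, X_cols):
    Xd = np.column_stack([np.ones(len(y)), X_cols])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b
    return r @ r

sse_full = sse(y, X)                             # SSE(X1,X2,X3), df = n - 4

# H0: beta2 = 0  -> reduced model drops X2, df difference = 1
sse_r1 = sse(y, X[:, [0, 2]])
F1 = ((sse_r1 - sse_full) / 1) / (sse_full / (n - 4))

# H0: beta2 = beta3 = 0  -> reduced model keeps only X1, df difference = 2
sse_r2 = sse(y, X[:, [0]])
F2 = ((sse_r2 - sse_full) / 2) / (sse_full / (n - 4))

print(stats.f.sf(F1, 1, n - 4), stats.f.sf(F2, 2, n - 4))
```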

 The full model with p−1 input variables is given by:
Y = Xβ + ε
 The least squares estimator is bF = (X′X)⁻¹X′Y
 and the error sum of squares is given by: SSE(F) = Y′Y − bF′X′Y
 The reduced model, with a single or several βk = 0, is the full model subject to Cβ = h, where C is an s×p matrix of rank s and h is a specified s×1 vector.

 Example 1: To test whether β2 = 0 in a regression model containing 2 independent variables, C = [0 0 1] and h = [0].
 Example 2: To test whether β1 = β2 = 0 in a regression model containing 2 independent variables, C = [0 1 0; 0 0 1] and h = [0 0]′.

 The least squares estimator for the reduced model is then:
bR = bF − (X′X)⁻¹C′(C(X′X)⁻¹C′)⁻¹(CbF − h)
 and the error sum of squares is given by:
SSE(R) = (Y − XbR)′(Y − XbR)
which has associated with it dfR = n − (p − s).

 Test statistic:
F* = [SSE(R) − SSE(F)] / s ÷ [SSE(F) / (n − p)]
where s = dfR − dfF is the number of constraints; under H0, F* follows an F distribution with s and n − p degrees of freedom.
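A minimal sketch of the constrained estimator above, under the assumptions of Example 2 (synthetic data; p = 3 including the intercept): it computes bR from bF and the resulting F statistic.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3                                     # intercept + 2 predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b_full = XtX_inv @ X.T @ y
sse_full = (y - X @ b_full) @ (y - X @ b_full)

# H0: beta1 = beta2 = 0, i.e. C beta = h with
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])                  # s x p, rank s = 2
h = np.zeros(2)

# b_R = b_F - (X'X)^{-1} C' (C (X'X)^{-1} C')^{-1} (C b_F - h)
adj = XtX_inv @ C.T @ np.linalg.solve(C @ XtX_inv @ C.T, C @ b_full - h)
b_red = b_full - adj
sse_red = (y - X @ b_red) @ (y - X @ b_red)

s = C.shape[0]
F = ((sse_red - sse_full) / s) / (sse_full / (n - p))
print(b_red, F)                                  # b_red satisfies C @ b_red = h
```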

A study of the relation of amount of body fat (Y) to several possible explanatory, independent variables, based on a sample of 20 healthy females 25–34 years old. The possible independent variables are triceps skinfold thickness (X1), thigh circumference (X2) and midarm circumference (X3). Underwater weighing is the alternative, direct way to measure body fat.

Suppose we fit the model with all X variables. The ANOVA table is as follows (numerical values omitted):

Source   SS   df   MS   F    Pr>F
Model    …    3    …    …    <.0001
Error    …    16   …
Total    …    19

And the least squares estimates of the regression coefficients:

Var        b    s(b)   t    Pr>|t|
Intercept  …    …      …    …
skinfold   …    …      …    …
thigh      …    …      …    …
midarm     …    …      …    …

 The P value for F(3, 16) is <.0001
 But none of the P values for the individual regression coefficients is anywhere near our standard of 0.05
 What is the explanation?

Var       Type I SS   Type II SS
skinfold  …           …
thigh     …           …
midarm    …           …
Total     …

 Fact: the Type I and Type II SS are very different
 If we reorder the variables in the model statement we will get
 different Type I SS
 the same Type II SS
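In practice these tables come straight from software. A hedged statsmodels sketch with synthetic stand-ins for the body fat variables (names and numbers are illustrative, not the study's data):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(4)
n = 20
df = pd.DataFrame({"skinfold": rng.normal(25, 5, n)})
df["thigh"]  = 2 * df["skinfold"] + rng.normal(0, 2, n)   # highly correlated with skinfold
df["midarm"] = rng.normal(28, 3, n)
df["fat"]    = 0.5 * df["skinfold"] + rng.normal(0, 2, n)

m = ols("fat ~ skinfold + thigh + midarm", data=df).fit()

# Type I (sequential) SS depend on the order in the formula;
# Type II (partial) SS do not.
print(sm.stats.anova_lm(m, typ=1))
print(sm.stats.anova_lm(m, typ=2))
```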

Rerun with skinfold as the only explanatory variable (numerical values omitted):

Source   SS   df   MS   F    Pr>F
Model    …    1    …    …    <.0001
Error    …    18   …
Total    …    19

And the least squares estimates of the regression coefficients:

Var        b    s(b)   t    Pr>|t|
Intercept  …    …      …    …
skinfold   …    …      …    <0.0001

Rerun with skinfold entered first; the sequential decomposition of SSR (numerical values omitted):

Source       SS   df   MS   F    Pr>F
Model        …    3    …    …    <.0001
  X1         …    1    …    …    …
  X2,X3|X1   …    2    …    …    …
Error        …    16   …
Total        …    19

 The coefficient of multiple determination, R², measures the proportionate reduction in the variation of Y achieved by the introduction of the entire set of X variables considered in the model.
 A coefficient of partial determination, r², measures the marginal contribution of one X variable, when all the others are already included in the model.

 Suppose we have a regression with 2 independent variables. The full model is given by:
Y = β0 + β1X1 + β2X2 + ε
The coefficient of partial determination between Y and X1, given that X2 is in the model, is measured by
r²Y1·2 = SSR(X1|X2) / SSE(X2)

The coefficient of partial determination between Y and X2, given that X1 is in the model, is measured by
r²Y2·1 = SSR(X2|X1) / SSE(X1)

General case: the extension of the coefficient of partial determination to 3 or more independent variables in the model is immediate. For instance:
r²Y1·23 = SSR(X1|X2,X3) / SSE(X2,X3)
r²Y4·123 = SSR(X4|X1,X2,X3) / SSE(X1,X2,X3)

The coefficient of partial correlation is the square root of a coefficient of partial determination; its sign is the same as that of the corresponding regression coefficient. The coefficients of partial determination can be expressed in terms of simple or other partial correlations. For example:
r²Y1·2 = (rY1 − rY2 r12)² / [(1 − rY2²)(1 − r12²)]
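A numpy sketch (synthetic data) computing r²Y1·2 both ways, from the extra sum of squares and from the simple correlations, to confirm that the two expressions agree:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.7 * x1 + 0.4 * x2 + rng.normal(size=n)

def sse(y, *cols):
    Xd = np.column_stack([np.ones(len(y)), *cols]) if cols else np.ones((len(y), 1))
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b
    return r @ r

# Extra-sums-of-squares form: r2_Y1.2 = SSR(X1|X2) / SSE(X2)
r2_part = (sse(y, x2) - sse(y, x1, x2)) / sse(y, x2)

# Correlation form: (r_Y1 - r_Y2 r_12)^2 / ((1 - r_Y2^2)(1 - r_12^2))
r_y1 = np.corrcoef(y, x1)[0, 1]
r_y2 = np.corrcoef(y, x2)[0, 1]
r_12 = np.corrcoef(x1, x2)[0, 1]
r2_corr = (r_y1 - r_y2 * r_12) ** 2 / ((1 - r_y2**2) * (1 - r_12**2))

print(r2_part, r2_corr)            # the two values agree
```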

Standardized regression coefficients:
 Can help reduce round-off errors in calculations
 Put the regression coefficients in common units
 Units for the usual coefficients are units for Y divided by units for X

 Standardized coefficients can be obtained from the usual ones by multiplying by the ratio of the standard deviation of X to the standard deviation of Y
 The interpretation is that a one standard deviation increase in X corresponds to a 'standardized beta' standard deviation increase in Y

 Y = … + βX + …
  = … + β(sX/sY)(sY/sX)X + …
  = … + (β(sX/sY))((sY/sX)X) + …
  = … + (β(sX/sY))(sY)(X/sX) + …

 Standardize Y and all the X's (subtract the mean and divide by the standard deviation)
 Then divide by √(n−1)
 The regression coefficients for variables transformed in this way are the standardized regression coefficients
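A short numpy check of this relationship (synthetic data): standardized coefficients obtained by refitting on standardized variables match the usual coefficients rescaled by sX/sY.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 80
X = rng.normal(size=(n, 2)) * [3.0, 0.1]       # predictors on very different scales
y = 5 + X @ np.array([0.2, 8.0]) + rng.normal(size=n)

def fit(Xd, y):
    b, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), Xd]), y, rcond=None)
    return b[1:]                                # drop the intercept

b_usual = fit(X, y)

# Standardize (subtract mean, divide by sample SD) and refit; the standardized
# variables have mean zero, so the intercept term is harmless.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
z_y = (y - y.mean()) / y.std(ddof=1)
b_std = fit(Z, z_y)

print(b_std)
print(b_usual * X.std(axis=0, ddof=1) / y.std(ddof=1))   # same numbers
```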

 Reading: NKMW 8.1 to 8.3
 Exercises: NKMW 8.3, 8.12
 Homework: NKMW 8.25, 8.31