Topic 15: General Linear Tests and Extra Sum of Squares.

Outline Extra Sums of Squares with applications Partial correlations Standardized regression coefficients

General Linear Tests Recall: A different way to look at the comparison of models Look at the difference –in SSE (reduce unexplained SS) –in SSM (increase explained SS) Because SSM+SSE=SST, these two comparisons are equivalent

General Linear Tests Models we compare are hierarchical in the sense that one (the full model) includes all of the explanatory variables of the other (the reduced model) We can compare models with different explanatory variables such as 1. X1, X2 vs X1 2. X1, X2, X3, X4, X5 vs X1, X2, X3 (Note the first model includes all the Xs of the second)

General Linear Tests We will get an F test that compares the two models We are testing a null hypothesis that the regression coefficients for the extra variables are all zero For X1, X2, X3, X4, X5 vs X1, X2, X3 –H0: β4 = β5 = 0 –H1: β4 and β5 are not both 0

General Linear Tests Degrees of freedom for the F statistic are the number of extra variables and the dfE for the larger model Suppose n=100 and we compare models with X1, X2, X3, X4, X5 vs X1, X2, X3 Numerator df is 2 Denominator df is n-6 = 94
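The df bookkeeping on this slide can be checked with a few lines of Python (an illustrative sketch, not part of the course's SAS code; n and the model sizes are the ones from the slide):

```python
# Degrees of freedom for the general linear test comparing the full model
# X1..X5 against the reduced model X1..X3, with n = 100 as on the slide.
n = 100
p_full = 6       # intercept + 5 predictors
p_reduced = 4    # intercept + 3 predictors

df_numerator = p_full - p_reduced   # number of extra coefficients tested
df_denominator = n - p_full         # error df of the full model

print(df_numerator, df_denominator)  # 2 94
```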

Notation for Extra SS SSE(X1,X2,X3,X4,X5) is the SSE for the full model SSE(X1,X2,X3) is the SSE for the reduced model SSE(X4,X5 | X1,X2,X3) is the difference in the SSEs (reduced minus full) SSE(X1,X2,X3) - SSE(X1,X2,X3,X4,X5) Recall that because SSM+SSE=SST we can equivalently use SSM: SSM(X4,X5 | X1,X2,X3) = SSM(X1,X2,X3,X4,X5) - SSM(X1,X2,X3) is the same quantity

F test Numerator: SSE(X4,X5 | X1,X2,X3)/2 Denominator: MSE(X1,X2,X3,X4,X5) F ~ F(2, n-6) Reject if the P-value ≤ 0.05 and conclude that either X4 or X5 or both contain additional information useful for predicting Y in a linear model that also includes X1, X2, and X3
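Assembling this F statistic is simple arithmetic once the two models have been fit. A hedged sketch in Python (the SSE values below are made up for illustration; they are not the KNNL data):

```python
# General linear test F statistic from the error SS of nested models.
def general_linear_F(sse_reduced, sse_full, df_reduced, df_full):
    """F = [(SSE_reduced - SSE_full) / (df_reduced - df_full)] / MSE_full."""
    extra_ss = sse_reduced - sse_full          # e.g. SSE(X4,X5 | X1,X2,X3)
    num = extra_ss / (df_reduced - df_full)    # divided by # extra terms
    den = sse_full / df_full                   # MSE of the full model
    return num / den

# n = 100: full model X1..X5 has error df 94, reduced model X1..X3 has 96.
F = general_linear_F(sse_reduced=120.0, sse_full=94.0, df_reduced=96, df_full=94)
print(round(F, 2))  # 13.0
```

The resulting F would be compared with the F(2, 94) distribution.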

Examples Predict bone density using age, weight and height; does diet add any useful information? Predict GPA using 3 HS grade variables; do SAT scores add any useful information? Predict yield of an industrial process using temperature and pH; does the supplier of the raw material (categorical) add any useful information?

Extra SS Special Cases Compare models that differ by one explanatory variable: F(1, n-p) = t²(n-p) SAS's individual parameter t-tests are equivalent to the general linear test based on SSM(Xi | X1,…, Xi-1, Xi+1,…, Xp-1)

Add one variable at a time Consider 4 explanatory variables and the extra sums of squares –SSM(X1) –SSM(X2 | X1) –SSM(X3 | X1, X2) –SSM(X4 | X1, X2, X3) These add up to the overall model SS: SSM(X1) + SSM(X2 | X1) + SSM(X3 | X1, X2) + SSM(X4 | X1, X2, X3) = SSM(X1, X2, X3, X4)
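This telescoping decomposition can be verified numerically. A hedged sketch in Python with a tiny made-up data set (not the KNNL data) and ordinary least squares solved via the normal equations:

```python
# Verify that sequential (Type I) sums of squares add up to the overall SSM.
def sse(y, xs):
    """SSE of the OLS regression of y on the predictor columns in xs."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in xs] for i in range(n)]  # intercept col
    p = len(rows[0])
    # augmented normal equations [X'X | X'y], Gauss-Jordan elimination
    A = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(rows[i][j] * y[i] for i in range(n))] for j in range(p)]
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        for r in range(p):
            if r != j:
                f = A[r][j] / A[j][j]
                A[r] = [a - f * b for a, b in zip(A[r], A[j])]
    b = [A[j][p] / A[j][j] for j in range(p)]
    fit = [sum(bj * xij for bj, xij in zip(b, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y  = [3.1, 4.0, 9.2, 10.1, 15.0, 16.2]

sst = sum((yi - sum(y) / len(y)) ** 2 for yi in y)
ssm_x1         = sst - sse(y, [x1])                 # SSM(X1)
ssm_x2_after_1 = sse(y, [x1]) - sse(y, [x1, x2])    # SSM(X2 | X1)
ssm_full       = sst - sse(y, [x1, x2])             # SSM(X1, X2)
# the sequential pieces reproduce the overall model sum of squares
assert abs(ssm_x1 + ssm_x2_after_1 - ssm_full) < 1e-9
```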

One Variable added Numerator df is 1 for each of these tests F = (SSM(Xi | X1,…,Xi-1)/1) / MSE(full) ~ F(1, n-p) This is the SAS Type I SS We typically use Type II SS

KNNL Example Healthy female subjects Y is body fat X1 is triceps skinfold thickness X2 is thigh circumference X3 is midarm circumference Underwater weighing is the “gold standard” used to obtain Y

Input and data check options nocenter; data a1; infile '../data/ch07ta01.dat'; input skinfold thigh midarm fat; proc print data=a1; run;

Proc reg proc reg data=a1; model fat=skinfold thigh midarm; run;

Output Analysis of Variance table (columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows: Model, Error, Corrected Total) The model F test has Pr > F < .0001 Group of predictors helpful in predicting percent body fat

Output Parameter Estimates table (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|; rows: Intercept, skinfold, thigh, midarm) None of the individual t-tests is significant.

Summary The P-value for the overall F test is <.0001 But none of the P-values for the individual regression coefficients falls below our standard significance level of 0.05 What is the reason?

Look at this using extra SS proc reg data=a1; model fat=skinfold thigh midarm /ss1 ss2; run;

Output Parameter Estimates table now also reports Type I SS and Type II SS for skinfold, thigh, and midarm Notice how different these SS are for skinfold and thigh

Interpretation Fact: the Type I and Type II SS are very different If we reorder the variables in the model statement we will get –Different Type I SS –The same Type II SS Could the variables be explaining the same SS and canceling each other out?

Run additional models Rerun with skinfold as the explanatory variable proc reg data=a1; model fat=skinfold; run;

Output Parameter Estimates table (rows: Intercept, skinfold); skinfold has Pr > |t| < .0001 Skinfold by itself is a highly significant linear predictor

Use general linear test to see if other predictors contribute beyond skinfold proc reg data=a1; model fat= skinfold thigh midarm; thimid: test thigh, midarm; run;

Output Test thimid Results for Dependent Variable fat (columns: Source, DF, Mean Square, F Value, Pr > F; rows: Numerator, Denominator) Yes, thigh and midarm do help after skinfold is in the model. Perhaps the best model includes only two predictors

Use general linear test to assess midarm proc reg data=a1; model fat= skinfold thigh midarm; midarm: test midarm; run;

Output Test midarm Results for Dependent Variable fat (columns: Source, DF, Mean Square, F Value, Pr > F; rows: Numerator, Denominator) With skinfold and thigh in the model, midarm is not a significant predictor. This is just the t-test for this coefficient in the full model

Other uses of general linear test The test statement can be used to perform a significance test for any hypothesis involving a linear combination of the regression coefficients Examples –H0: β4 = β5 –H0: β4 - 3β5 = 12

Partial correlations A partial correlation measures the strength of the linear relation between two variables taking into account other variables Procedure to find the partial correlation of Xi and Y –Predict Y using the other X's –Predict Xi using the other X's –Find the correlation between the two sets of residuals KNNL use the term coefficient of partial determination for the squared partial correlation
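The residual-based recipe above can be sketched directly in Python (an illustrative sketch with made-up numbers, not the KNNL body-fat data; y is built to depend exactly on X1 and X2 so the partial correlation is known):

```python
# Partial correlation of Y and X1, given X2, via the two-residual recipe.
from statistics import mean

def residuals(y, x):
    """Residuals from the simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
          / sum((xi - mx) ** 2 for xi in x)
    return [yi - (my + slope * (xi - mx)) for xi, yi in zip(x, y)]

def corr(u, v):
    """Pearson correlation of two lists."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u)
           * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

x1 = [1.0, 2.0, 4.0, 3.0, 7.0]
x2 = [2.0, 1.0, 5.0, 3.0, 4.0]
y  = [2 * a + 3 * b for a, b in zip(x1, x2)]   # y is exactly 2*X1 + 3*X2

# regress Y on the other X, regress X1 on the other X, correlate residuals
r_partial = corr(residuals(y, x2), residuals(x1, x2))
print(round(r_partial, 6))  # 1.0: X1 explains all variation X2 leaves over
```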

Pcorr2 option proc reg data=a1; model fat=skinfold thigh midarm / pcorr2; run;

Output Parameter Estimates table now also reports the Squared Partial Corr Type II for skinfold, thigh, and midarm Skinfold and midarm explain the most remaining variation when added last

Standardized Regression Model Can help reduce round-off errors in calculations Puts regression coefficients in common units Units for the usual coefficients are units for Y divided by units for X

Standardized Regression Model Standardized coefficients can be obtained from the usual ones by multiplying by the ratio of the standard deviation of X to the standard deviation of Y The interpretation is that a one-sd increase in X corresponds to a change of 'standardized beta' standard deviations in Y

Standardized Regression Model Y = … + βX + … = … + β(sX/sY)(sY/sX)X + … = … + β(sX/sY)(sY)(X/sX) + … so the standardized coefficient β(sX/sY) multiplies the standardized predictor X/sX
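For a single predictor, the relation b* = b(sX/sY) means the standardized coefficient equals the correlation between X and Y. A hedged numerical check in Python (made-up data, not the KNNL example):

```python
# Standardized coefficient b*(sX/sY) equals r in simple linear regression.
from statistics import mean, stdev

x = [1.0, 2.0, 3.0, 5.0, 8.0]
y = [2.1, 3.9, 6.3, 9.8, 16.4]

mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) \
      / sum((a - mx) ** 2 for a in x)

b_std = slope * stdev(x) / stdev(y)     # standardized regression coefficient

r = sum((a - mx) * (b - my) for a, b in zip(x, y)) \
  / ((len(x) - 1) * stdev(x) * stdev(y))
assert abs(b_std - r) < 1e-9            # they coincide in simple regression
```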

Standardized Regression Model Standardize Y and all X's (subtract the mean and divide by the standard deviation) Then divide by √(n-1) (the correlation transformation) The regression coefficients for variables transformed in this way are the standardized regression coefficients

STB option proc reg data=a1; model fat=skinfold thigh midarm / stb; run;

Output Parameter Estimates table now also reports the Standardized Estimate for skinfold, thigh, and midarm Skinfold and thigh suggest the largest standardized change

Reading We went over Sections 7.1 – 7.5 We used program topic15.sas to generate the output