Spring 2007 Lecture 9, Slide #1: More on Multivariate Regression Analysis
Multivariate F-Tests
Multicollinearity
The EVILS of Stepwise Regression
Intercept Dummies
Interaction Effects
–Interaction Dummies
–Slope Dummies

Spring 2007 Lecture 9, Slide #2: F-Tests
F-tests are used to test the statistical significance of the overall model fit. Normally that's redundant, given that you already have the t-stats for the individual b's. The F statistic amounts to the ratio of the explained variance to the residual variance, adjusting for the number of observations and parameters. The F value is compared to the F-distribution, just as a t value is compared to the t-distribution, to obtain a p-value. F-tests become useful when we need to assess nested models.
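The overall-fit F statistic described above can be sketched from R² alone. This is a minimal Python stand-in for the slide's Stata workflow (the standard formula, with k counting the slope parameters, not the intercept):

```python
def overall_f(r2, n, k):
    """Overall-fit F statistic: the ratio of explained to residual
    variance, adjusted for n observations and k slope parameters."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# e.g. a model with R^2 = 0.40, n = 100 observations, k = 2 predictors
f_stat = overall_f(0.40, 100, 2)   # compare to F(k, n - k - 1)
```

The numbers here are illustrative only; in practice the regression package reports this statistic directly.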

Spring 2007 Lecture 9, Slide #3: F-Tests, Continued
The question is whether a more complex model adds explanatory power over a simpler model. E.g., using the Guns data, does including a set of gender/age (pm1024) and race (pb1064) variables improve the prediction of the violent crime rate (vio) after we've included average income (avginc) and population density (density)? To find out, we calculate an F-statistic for the model improvement:

F = [ (RSS{K-H} - RSS{K}) / H ] / [ RSS{K} / (n - K) ]

where the complex model has K parameters and the simpler model has K-H parameters. RSS{K-H} is the residual sum of squares for the simpler model; RSS{K} is the residual sum of squares for the more complex model.

Spring 2007 Lecture 9, Slide #4: F-Testing a Nested Model
Simpler model: vio = b0 + b1(avginc) + b2(density)
More complex model: vio = b0 + b1(avginc) + b2(density) + b3(pm1024) + b4(pb1064)
Given the models, K = 5, K - H = 3, and n = 1,173. Calculating the RSS's involves running the two models and obtaining the RSS from each. For these models, RSS{K} = 60,820,479.3 and RSS{K-H} = 68,295,… Given that df1 = H (2) and df2 = n - K (1168), the p-value of the model improvement shown in Appendix Table 5C (p. 762) is < 0.01.
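The nested-model calculation above can be sketched as a small function. Because the simpler model's RSS is truncated in the transcript, the example below uses made-up round numbers rather than the Guns-data values:

```python
def nested_f(rss_reduced, rss_full, n, k, h):
    """Nested-model F statistic:
    ((RSS_{K-H} - RSS_K) / H) / (RSS_K / (n - K)),
    where the full model has k parameters and adds h of them."""
    return ((rss_reduced - rss_full) / h) / (rss_full / (n - k))

# hypothetical example: adding 2 parameters drops RSS from 120 to 100
# with n = 50 observations and k = 5 parameters in the full model
f_stat = nested_f(120.0, 100.0, 50, 5, 2)   # compare to F(h, n - k)
```

A large F here means the extra parameters explain enough variance to justify the added complexity.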

Spring 2007 Lecture 9, Slide #5: In-Class Homework, Set 1
Test for the additional explanatory power of the age/gender and race variables (pm1024 and pb1064) when modeling the murder and robbery rates (mur and rob). Use the simple model from the prior lecture exercise (avginc and density) as your base of comparison. Discuss the theoretical meaning of your results.

Spring 2007 Lecture 9, Slide #6: Multicollinearity
If any Xi is a linear combination of the other X's in the model, bi cannot be estimated.
–Remember that the partial regression coefficient strips both the X's and Y of their overlapping covariation; its sampling variance is

  Var(bi) = s² / [ Σ(Xi - X̄i)² (1 - Ri²) ]

  where Ri² is the R² from regressing Xi on the other X's.
–If an X is perfectly predicted by the other X's, then Ri² = 1, the denominator is zero, and bi (and its standard error) are undefined.
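One quick way to see why bi cannot be estimated under perfect collinearity is to check the rank of the design matrix: a perfectly collinear column makes it rank-deficient, so (X'X) has no inverse. A minimal numpy sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + x2                         # perfect linear combination of x1, x2
X = np.column_stack([np.ones(100), x1, x2, x3])

# X has 4 columns but only rank 3, so (X'X)^{-1} does not exist
# and the individual b's are not uniquely determined
print(np.linalg.matrix_rank(X))      # 3, not 4
```

Dropping the redundant column restores full rank, which is why the usual remedy is to remove one of the collinear variables.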

Spring 2007 Lecture 9, Slide #7: Multicollinearity, Continued
We rarely find perfect multicollinearity in practice, but high multicollinearity results in a loss of statistical resolution:
–Large standard errors
–Low t-stats and high p-values, which erode hypothesis tests
–Enormous sensitivity to small changes in the data or the model specification

Spring 2007 Lecture 9, Slide #8: Detecting Multicollinearity
Use VIF measures:
–VIF (Variance Inflation Factor): the degree to which the variance of a coefficient is inflated by that variable's correlation with the other explanatory variables
–1/VIFi = 1 - Ri² (AKA "Tolerance"), where Ri² comes from regressing Xi on the other X's
Acceptable tolerance? Partly a function of n-size:
–If n < 50, tolerance should exceed 0.7
–If n < 300, tolerance should exceed 0.5
–If n < 600, tolerance should exceed 0.3
–If n < 1000, tolerance should exceed 0.1
Tolerance can be calculated in Stata using the vif command.
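The slide computes tolerance in Stata via vif; as a stand-in, the auxiliary regression behind it can be done by hand in numpy. A sketch on simulated data (one independent column, one nearly collinear pair):

```python
import numpy as np

def tolerance(X, j):
    """Tolerance of column j: 1 - R^2 from regressing X_j on the
    other columns (with an intercept). VIF_j = 1 / tolerance."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    return 1 - r2

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.1 * rng.normal(size=500)   # nearly a copy of x1
X = np.column_stack([x1, x2, x3])

tol_x2 = tolerance(X, 1)   # near 1: x2 is independent of the others
tol_x3 = tolerance(X, 2)   # near 0: x3 is almost perfectly predicted
```

Comparing tol_x3 against the slide's n-based cutoffs would flag x3 immediately.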

Spring 2007 Lecture 9, Slide #9: Addressing Multicollinearity
Drop one or the other of the collinear variables
–Caution: may result in model mis-specification
–Use theory as a guide
Add new data
–Special samples may maximize independent variation (e.g., elite samples may disentangle income and education)
Data scaling
–Highly correlated variables may represent multiple indicators of an underlying scale
–Data reduction through factor analysis (more on this later)
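As a rough stand-in for the factor-analysis data reduction mentioned above, a first principal component can collapse several collinear indicators into one scale. This is PCA, not full factor analysis, and the three indicators below are simulated from a single hypothetical latent variable:

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=200)        # the unobserved underlying scale
# three noisy, highly correlated indicators of the same latent variable
inds = np.column_stack([latent + 0.3 * rng.normal(size=200)
                        for _ in range(3)])

# standardize, then take the first principal component as the scale
Z = (inds - inds.mean(axis=0)) / inds.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scale = Z @ Vt[0]                    # sign of the component is arbitrary
```

Entering this one scale in the regression, instead of the three collinear indicators, sidesteps the multicollinearity while keeping the shared information.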

Spring 2007 Lecture 9, Slide #10: In-Class Homework, Set 2
Run a VIF on the "more complex" model from the nested F-test exercise (from lecture slide #5). Now add the variable pw1064 (percent of the population white, ages 10-64) and run VIF again.
Using the pm1024 and pb1064 variables:
–The F-test will show significant improvement (over the reduced model from earlier)
–But not all individual coefficients were statistically significant
What gives?
–Correlate pb1064, pw1064, and pm1024
–Strategy? Drop either pw1064 or pb1064 (use theory to decide which), then re-run the model

Spring 2007 Lecture 9, Slide #11: Evils of Mindless Regression
Stata permits a number of mechanical "search strategies":
–Stepwise regression (forward, backward, upside down)
–How they work: sequential F-tests
These methods pose serious problems:
–If X's are strongly related, one or the other will tend to be excluded
–They are susceptible to including spuriously related variables
–They obliterate the meaning of classical statistical tests, which depend on identifying hypotheses in advance
Example: the infamous Swedish "EMF study"
–Hundreds of diseases were regressed on proximity to EMF sources; about 5% were found to be "statistically significant"
–Enormous policy implications, but the method was buried in the report
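The EMF-study pathology is easy to reproduce: test enough pure-noise outcomes and roughly 5% come up "significant" at the 0.05 level by chance alone. A small simulation with entirely hypothetical data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, m = 100, 200                      # 100 cases, 200 unrelated "diseases"
exposure = rng.normal(size=n)        # e.g. proximity to an EMF source

false_pos = 0
for _ in range(m):
    outcome = rng.normal(size=n)     # pure noise: no true effect exists
    r, p = stats.pearsonr(exposure, outcome)
    false_pos += p < 0.05

print(false_pos / m)                 # hovers around 0.05 by chance alone
```

Without a prior hypothesis, any one of those "hits" looks exactly like a real finding, which is the slide's point.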

Spring 2007 Lecture 9, Slide #12: Dummy Intercept Variables
Dummy variables allow tests of differences in the overall level of Y across nominal groups in the data (akin to a difference of means).
–Coding: 0 and 1 values (e.g., men versus women)
[Figure: two parallel regression lines of Y on X1, one for the X2 = 0 group and one for the X2 = 1 group]
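A minimal sketch of the intercept-dummy idea on simulated data (the true group shift is built in as -3, and the fitted dummy coefficient recovers it):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
d = (rng.random(n) < 0.5).astype(float)   # 0/1 group dummy
# group d = 1 sits 3 units lower at every value of x
y = 1.0 + 2.0 * x - 3.0 * d + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, d])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
# b2 estimates the intercept shift for the d = 1 group (close to -3);
# the slope b1 is shared, so the fitted lines are parallel
```

The two fitted lines have the same slope b1 and intercepts b0 and b0 + b2, which is exactly the parallel-lines picture on the slide.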

Spring 2007 Lecture 9, Slide #13: Modeling Robbery as a Function of Population Density and "Shall Issue" Laws
[Stata regression output for rob on density and shall, F(2, 1170): the coefficient, standard error, and fit statistics did not survive transcription]
Robbery rates are systematically lower in "Shall Issue" states (is this necessarily causal?)

Spring 2007 Lecture 9, Slide #14: Dummy Variable Applications
A dummy implies a comparison against the omitted group:
–Be clear about the "comparison category"
Multinomial dummies (when categories exceed 2):
–Importance of specifying the base category
Examples of category variables:
–Experimental treatment groups
–Race and ethnicity
–Region of residence
–Type of education
–Religious affiliation
–"Seasonality"
Dummies add to modeling flexibility.

Spring 2007 Lecture 9, Slide #15: Interaction Effects
Interactions occur when the effect of one X depends on the value of another. Modeling interactions:
–Use dummy variables (requires categories)
–Use a multiplicative interaction effect: multiply an interval-scale variable by a dummy (also known as a "slope dummy")
Example: the effect of population density (density) on the robbery rate (rob) may depend on whether the state is a "Shall Issue" state or not.
–Re-code the interaction; run it.
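A sketch of the slope-dummy setup using the slide's variable names (density, shall, rob) on simulated data, in which the true density slope is 5 for shall = 0 states and 2 for shall = 1 states:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
density = rng.random(n) * 10
shall = (rng.random(n) < 0.5).astype(float)
i = density * shall                 # the slope dummy (interaction term)

# simulated robbery rate: the -3 on i makes the density slope
# 5 when shall = 0 but 5 - 3 = 2 when shall = 1
rob = 20.0 + 5.0 * density - 3.0 * shall - 3.0 * i + rng.normal(size=n)

X = np.column_stack([np.ones(n), density, shall, i])
b = np.linalg.lstsq(X, rob, rcond=None)[0]
# b[3] estimates the change in the density slope for shall = 1 states
```

The coefficient on the interaction term is the *difference* in slopes between the two groups, which is what the next slide's regression tests.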

Spring 2007 Lecture 9, Slide #16: Modeling Robbery with a Dummy Slope Variable: Density * Shall
[Stata regression output for rob on density, shall, and the interaction term i (density × shall), F(3, 1169): the coefficient, standard error, and fit statistics did not survive transcription]

Spring 2007 Lecture 9, Slide #17: Illustration of Slope Interaction

Spring 2007 Lecture 9, Slide #18: Coming Up...
–Assumptions of OLS Revisited
–Visual Diagnostic Techniques
–Non-linearity
–Non-normality and Heteroscedasticity
–Outliers and Case Statistics
Aren't we having fun now?