Quantitative Methods – Week 8: Multiple Linear Regression


Quantitative Methods – Week 8: Multiple Linear Regression Roman Studer Nuffield College roman.studer@nuffield.ox.ac.uk

Introduction
After the interlude on inductive statistics, we are back to regression analysis.
So far, we have looked at bivariate regressions with one dependent and one explanatory variable: Y = a + bX
Now we want to extend the regression analysis to include several explanatory variables; this is called multiple regression: Y = a + b1X1 + b2X2 + b3X3 + b4X4
This enables us to investigate the influence of each explanatory variable in turn while controlling for the influence of the others.
The fundamental theoretical principles and the statistical procedures required for estimating and evaluating the coefficients are still the same as in bivariate regressions.
However, the formulae and the calculations get more difficult, so the computations can be left to STATA.
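As a rough illustration of what the software does when it estimates Y = a + b1X1 + b2X2, the sketch below fits a multiple regression by ordinary least squares. The course itself uses STATA; the Python/NumPy code, the synthetic data and the "true" coefficient values are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 200

# Synthetic explanatory variables (hypothetical data, not from the course)
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)

# Assumed "true" model used to generate Y: a = 1.0, b1 = 2.0, b2 = -0.5
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones for the intercept a, then X1 and X2
X = np.column_stack([np.ones(n), X1, X2])

# Ordinary least squares: choose a, b1, b2 to minimise the sum of squared residuals
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coef

print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The estimates should lie close to the values used to generate the data, with the remaining gap reflecting sampling error.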

Interpretation: Partial Regression Coefficients
The partial regression coefficients (b1, b2, b3, …) allow us to examine the influence of each of the explanatory variables while controlling for the influence of the others.
The interpretation of the partial regression coefficients is the same as in the bivariate regressions.
The partial regression coefficients can change when we include or exclude other relevant explanatory variables.
This again points to the problem of omitted variables!
The interdependence of explanatory variables is another important issue that comes up in this respect. Use correlation analysis to look at the relation between explanatory variables!
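The point about omitted variables can be made concrete with a small simulation. In this sketch (Python/NumPy with synthetic data; all names and numbers are illustrative assumptions) X2 is correlated with X1, so leaving X2 out changes the estimated coefficient on X1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(scale=0.6, size=n)   # X2 is correlated with X1
Y = 1.0 + 1.0 * X1 + 2.0 * X2 + rng.normal(size=n)

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

full = ols(Y, X1, X2)    # both explanatory variables included
short = ols(Y, X1)       # X2 omitted
corr = np.corrcoef(X1, X2)[0, 1]

print(f"corr(X1, X2)          = {corr:.2f}")
print(f"b1 with X2 included   = {full[1]:.2f}")
print(f"b1 with X2 omitted    = {short[1]:.2f}")
```

With X2 omitted, the coefficient on X1 also picks up the part of X2's effect that moves together with X1, which is why checking the correlations between explanatory variables is worthwhile.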

Interpretation: Standardised Beta Coefficients
The regression coefficients measure the effects in the original units of the variables.
However, the explanatory variable with the largest coefficient is not necessarily the most important one.
Therefore, if we are interested in the relative importance of each explanatory variable for the dependent variable, Y, we have to adjust for the different units of the explanatory variables.
This is done by converting the partial regression coefficients into standardised beta coefficients. Each variable in the regression is replaced by its z-score: z = (X - mean of X) / (standard deviation of X)
The standardisation makes the scale of the regressors irrelevant.
Interpretation: “How many standard deviations of movement in Y are caused by a change of 1 standard deviation in an explanatory variable Xi?”
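A minimal sketch of the standardisation, assuming Python/NumPy and synthetic data in deliberately different units (all names and values are hypothetical). It computes the beta coefficients in two ways: by regressing z-scores on z-scores, and by rescaling the ordinary coefficients with sd(Xi) / sd(Y).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

X1 = rng.normal(loc=50, scale=10, size=n)     # variable measured in large units
X2 = rng.normal(loc=0.5, scale=0.1, size=n)   # variable measured as a share
Y = 3.0 + 0.2 * X1 + 15.0 * X2 + rng.normal(size=n)

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def zscore(v):
    """Replace a variable by its z-score (sample standard deviation)."""
    return (v - v.mean()) / v.std(ddof=1)

b = ols(Y, X1, X2)                                    # coefficients in original units
beta_direct = ols(zscore(Y), zscore(X1), zscore(X2))[1:]
beta_rescaled = b[1:] * np.array([X1.std(ddof=1), X2.std(ddof=1)]) / Y.std(ddof=1)

print("unstandardised b:", np.round(b[1:], 3))
print("standardised beta:", np.round(beta_direct, 3))
```

In this constructed example X2 has by far the larger unstandardised coefficient, yet X1 has the larger beta coefficient, because X1 varies much more in its own units.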

Interpretation: The Intercept
The intercept, a, shows the value of the dependent variable when all explanatory variables are set equal to zero.
Great care has to be taken when interpreting the intercept in multiple linear regressions.
It has the character of a “residual”: the “impact of all the variables excluded in the model”.
At times, an interpretation of the intercept makes little sense.

Interpretation: The Coefficient of Multiple Determination, R²
R² is a measure of the proportion of the variation in the dependent variable explained by the several explanatory variables in a multiple regression.
R² = ESS / TSS
ESS = Explained Sum of Squares
TSS = Total Sum of Squares
The value of R² always lies between 0 and 1; the higher it is, the more of the variation in Y has been explained.
It is a measure of the “goodness of fit” or the explanatory power of the regression.
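The decomposition behind R² can be checked numerically. The following sketch (Python/NumPy, synthetic data, illustrative names only) computes ESS, TSS and the residual sum of squares (RSS, a label not used on the slide) for a regression that includes an intercept.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 0.5 + 1.2 * X1 - 0.7 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ coef                          # fitted values

TSS = np.sum((Y - Y.mean()) ** 2)         # total variation in Y
ESS = np.sum((Y_hat - Y.mean()) ** 2)     # variation explained by the regression
RSS = np.sum((Y - Y_hat) ** 2)            # unexplained (residual) variation

R2 = ESS / TSS
print(f"R2 = {R2:.3f}")
```

Because the model contains an intercept, TSS = ESS + RSS, so R² can equivalently be written as 1 - RSS / TSS.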

Interpretation: Adjusted R²
The explained sum of squares (ESS) increases with the number of explanatory variables, while the total variation in Y (TSS) is unaffected.
Adding explanatory variables therefore never lowers R² and in practice almost always raises it.
Extreme case: there are as many explanatory variables as observations → R² = 1
Adjusted R² corrects R² for the number of explanatory variables k (with n observations): adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1)
Adjusted R² imposes a penalty for adding further independent variables to a model.
Explanatory models with a higher adjusted R² should be preferred, even if R² is smaller.
If the sample size is large, the correction from R² to adjusted R² will be small.
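To see the penalty at work, the sketch below (Python/NumPy, synthetic data; the helper name `r2_and_adjusted` is hypothetical) adds five regressors that are unrelated to Y and compares R² with adjusted R², using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1) with k explanatory variables.

```python
import numpy as np

def r2_and_adjusted(y, regressors):
    """Return (R2, adjusted R2) for an OLS fit of y on the regressors plus a constant."""
    n, k = regressors.shape               # k = number of explanatory variables
    X = np.column_stack([np.ones(n), regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(4)
n = 30
X1 = rng.normal(size=n)
Y = 2.0 + 1.5 * X1 + rng.normal(size=n)
noise = rng.normal(size=(n, 5))           # five regressors unrelated to Y

r2_small, adj_small = r2_and_adjusted(Y, X1.reshape(-1, 1))
r2_big, adj_big = r2_and_adjusted(Y, np.column_stack([X1, noise]))

print(f"X1 only:        R2 = {r2_small:.3f}, adjusted R2 = {adj_small:.3f}")
print(f"X1 + 5 noise:   R2 = {r2_big:.3f}, adjusted R2 = {adj_big:.3f}")
```

R² cannot fall when the noise variables are added, whereas adjusted R² is free to fall, which is the behaviour that makes it the better guide when comparing models of different size.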

Presentation of Regression Results

Further Issues: Model Specification
Specifying a model includes three basic steps:
1. Choosing the dependent variable to be explained
2. Determining the explanatory variables to be included
3. Determining the mathematical form of the relationships (linear vs. non-linear)
Traditional approach: testing models from economic theory, estimating the unknown parameters.
“Modern” approach: data play a key role in the formulation of the estimation model.
Explorative data analysis: a “trial and error” process with ad hoc modifications, adding additional variables, changing the functional form, etc.
Follow a strategy of “general to specific modelling” to avoid biased regression coefficients due to omitted variables:
1. Estimate a complete model that includes all possibly relevant variables (including those that represent competing explanations).
2. Then exclude variables that are not statistically significant (starting with the lowest t-values / highest p-values).
3. Reduce the regression model until only significant variables are left in the model.
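The "general to specific" strategy can be sketched as a simple loop. This is only one possible reading of the steps above, written in Python/NumPy with synthetic data; the function names, the variable names and the cut-off |t| = 1.96 (a rough large-sample 5% value) are all assumptions.

```python
import numpy as np

def t_values(y, X):
    """OLS t-statistics for each column of X (X already contains the constant)."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - p)                  # estimated error variance
    cov = s2 * np.linalg.inv(X.T @ X)             # covariance matrix of the coefficients
    return coef / np.sqrt(np.diag(cov))

def general_to_specific(y, data, t_crit=1.96):
    """Drop the least significant variable one at a time until all |t| >= t_crit."""
    kept = list(data)                             # start from the complete model
    while kept:
        X = np.column_stack([np.ones(len(y))] + [data[name] for name in kept])
        t = np.abs(t_values(y, X))[1:]            # ignore the intercept
        weakest = int(np.argmin(t))
        if t[weakest] >= t_crit:
            break                                 # only significant variables are left
        kept.pop(weakest)
    return kept

rng = np.random.default_rng(6)
n = 300
data = {name: rng.normal(size=n) for name in ["X1", "X2", "X3", "X4"]}
# Assumed data-generating process: only X1 and X2 matter
Y = 1.0 + 2.0 * data["X1"] - 1.5 * data["X2"] + rng.normal(size=n)

kept = general_to_specific(Y, data)
print("variables kept:", kept)
```

Purely mechanical elimination of this kind is a starting point at best; variables that represent competing explanations may deserve to stay in the model on substantive grounds.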

Further Issues: F-Test
We use t-tests to test for the significance of the single regression coefficients. Null hypothesis H0: bi = 0.
But what about the overall significance of the estimated regression line?
To test for the joint significance of the regression coefficients, we use F-tests. Null hypothesis H0: b1 = b2 = b3 = 0. In words: the explanatory variables X1, X2 and X3 do not jointly influence Y.
Again, if the calculated F-value exceeds the tabulated value of F, the null hypothesis is rejected, i.e. the variables do significantly explain the variation in the dependent variable and should therefore not be excluded from the regression model.
Fcrit depends on the number of observations (n), the number of estimated coefficients in the unrestricted model (k), and the number of restrictions (m). The exact value can be found in F-distribution tables (in the appendix of almost any statistics textbook).
However, STATA makes it even easier for us, as it reports the p-value for the joint significance of all regression coefficients!
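The calculated F-value can be obtained by comparing a restricted and an unrestricted model. The sketch below (Python/NumPy, synthetic data, illustrative names) uses the standard formula F = [(RSS_r - RSS_u) / m] / [RSS_u / (n - k)], with n, k and m defined as above; the formula itself is not shown on the slide and is supplied here as an assumption about what is meant.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

rng = np.random.default_rng(5)
n = 100
X1, X2, X3 = rng.normal(size=(3, n))
Y = 1.0 + 1.0 * X1 + 1.0 * X2 + 0.0 * X3 + rng.normal(size=n)

const = np.ones(n)
X_unrestricted = np.column_stack([const, X1, X2, X3])
X_restricted = const.reshape(-1, 1)       # H0: b1 = b2 = b3 = 0 leaves only the intercept

k = X_unrestricted.shape[1]               # estimated coefficients in the unrestricted model
m = 3                                     # number of restrictions under H0
rss_u = rss(Y, X_unrestricted)
rss_r = rss(Y, X_restricted)              # equals TSS for the intercept-only model

F = ((rss_r - rss_u) / m) / (rss_u / (n - k))

# For the overall test the same statistic follows directly from R2
R2 = 1.0 - rss_u / rss_r
F_from_R2 = (R2 / m) / ((1.0 - R2) / (n - k))

print(f"F = {F:.2f} with ({m}, {n - k}) degrees of freedom")
```

The resulting value is then compared with the tabulated F for (m, n - k) degrees of freedom, or one simply reads off the p-value that the statistical package reports.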

Review & Further Reading
The main topics covered in this course were:
Descriptive statistics
Correlation analysis
Inductive statistics
Regression analysis (simple and multiple regression)
You now know the fundamentals of quantitative methods → MAKE USE OF THIS!
However, additional issues are likely to come up once you’re dealing with quantitative research. These may include:
Tests (associations between variables, testing for different means, etc.) for nominal and ordinal data → see F&T, chapter 7
Dummy variables → F&T, chapter 10
Lagged variables → F&T, chapter 10
Violating the assumptions of the classical linear regression model (multicollinearity, autocorrelation, heteroscedasticity, outliers, specification, etc.) → F&T, chapter 11
Non-linear models → F&T, chapter 12
Use your textbook as a first guide when you encounter one of these issues!

Computer Class: Multiple Linear Regression

Exercises
Data set: “Weimar elections” at http://www.nuff.ox.ac.uk/users/studer/teaching.htm
Use the whole sample including all 52 observations from the last 4 Weimar elections.
Run a regression of the Nazis’ percentage of votes on the unemployment rate and the shares of workers, Catholics, and farmers. Interpret the results.
Which variables can be excluded from the model?
Explain the success of the Communist party. Could the Communists benefit from the rising unemployment after controlling for other factors such as the shares of workers, Catholics, and farmers, and voter participation?
Test whether the Communists benefited less from unemployment than the Nazis.
Exclude voter participation from the regression model. Explain the change in the effect of unemployment.
Explore the correlations between the explanatory variables.

Homework & Take-Home Exam
Problem Set:
Finish the exercises from today’s computer class if you haven’t done so already. Include all the results and answers in the file you send me.
Do exercises 1 and 2 from chapter 8.4 in Feinstein & Thomas, pp. 255-56.
Download the data set needed for question 2 from http://www.nuff.ox.ac.uk/users/studer/teaching.htm (“Irish families”). This data set is described in Appendix A.2 in Feinstein & Thomas, pp. 497-501.
Present your regression results from question 2 in a nice table (check today’s lecture notes for an example!). Report the intercept a, the regression coefficients, R², adjusted R², N, t-statistics, F-statistic and significance levels.
Send it to me by Monday of week 9.
Take-Home Exam:
You’ll receive the take-home exam by week 10.
You’ll have to submit it by Friday of week 1 of Trinity Term (27 April).
We will meet later in Trinity Term to discuss the exams (we’ll arrange an exact date once I have corrected the exams).