Linear Regression: Making Sense of Regression Results

Linear Regression: Making Sense of Regression Results
• Interpreting Stata regression output
• Coefficients for independent variables
• Fit of the regression: R Square
• Statistical significance
• How to reject the null hypothesis
• Multivariate regressions
• College graduation rates
• Ethnicity and voting

SPSS Output – We'll Use Stata – Benefit in Knowing Two Packages [Annotated screenshot of SPSS output, labeling the slope or "coefficient," how tight the fit is, and the y-intercept or "constant."]

Interpreting regression output Regression output typically includes two key tables for interpreting your results: A “Coefficients” table that contains the y-intercept (or “constant”) of the regression, a coefficient for every independent variable, and the standard error of that coefficient. A “Model Summary” table that gives you information on the fit of your regression.

Interpreting SPSS (another statistical package) regression: Coefficients – 1 • The y-intercept is 4.2% with a standard error of 7.0%. • The coefficient for SAT Scores is 0.059%, with a standard error of 0.007%. Standardized coefficients are discussed later.

Interpreting regression output: Coefficients - 2 The y-intercept or constant is the predicted value of the dependent variable when the independent variable takes on the value of zero. This basic model predicts that when a college admits a class of students who averaged zero on their SAT, 4.2% of them will graduate. The constant is not the most helpful statistic.

Interpreting regression output: Coefficients - 3 The coefficient of an independent variable is the predicted change in the dependent variable that results from a one-unit increase in the independent variable. A college whose students' SAT scores are one point higher on average is predicted to have a graduation rate that is 0.059% higher. Increasing SAT scores by 200 points leads to a predicted (200)(0.059%) = 11.8% rise in graduation rates.

Interpreting regression output: Fit of the Regression The R Square measures how closely a regression line fits the data in a scatterplot. • It can range from zero (no explanatory power) to one (perfect prediction). • An R Square of 0.345 means that differences in SAT scores can explain 35% of the variation in college graduation rates. Key sentence for quizzes!
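
A minimal Stata sketch of how a bivariate regression like this might be run (the variable names gradrate and sat are hypothetical, and the relevant dataset is assumed to be in memory):

  * regress graduation rate on average SAT score
  regress gradrate sat
  * R-squared appears in the header of the output; the coefficient on sat
  * and its standard error appear in the coefficient table beneath it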

Statistical Significance - 1 What would the null hypothesis look like in a scatterplot? If the independent variable has no effect on the dependent variable, the scatterplot should look random, the regression line should be flat, and its slope should be zero. Null hypothesis: The regression coefficient for an independent variable equals zero.

Statistical Significance - 2 Our formal test of statistical significance asks whether we can be confident that a regression coefficient DIFFERS from zero. The "standard error" is the standard deviation of the coefficient's sampling distribution. If a coefficient is more than two standard errors away from zero, we can reject the null hypothesis (that it equals zero).

Statistical Significance - 3 So, if a coefficient is more than TWICE the size of its standard error, we REJECT the NULL hypothesis with 95% confidence. This works whether the coefficient is negative or positive. The coefficient/standard error ratio is called the "test statistic" or "t-stat." A t-stat bigger than 2 or less than -2 indicates a statistically significant effect.
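
As a quick check using the figures from the earlier SAT example (coefficient 0.059, standard error 0.007), the t-stat can be computed by hand or in Stata:

  * t-statistic = coefficient / standard error
  display 0.059 / 0.007
  * returns about 8.43, far beyond 2, so we reject the null hypothesis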

Statistical Significance - 4

Regression of Tax on Cons, Party and Stinc in Stata

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) =   65.44
       Model |  54886.5757     3  18295.5252           Prob > F      =  0.0000
    Residual |  26840.2643    96  279.586087           R-squared     =  0.6716
-------------+------------------------------           Adj R-squared =  0.6613
       Total |    81726.84    99  825.523636           Root MSE      =  16.721

------------------------------------------------------------------------------
         tax |      Coef.   Std. Err.      t    P>|t|       Beta
-------------+----------------------------------------------------------------
        cons |    -.64472     .07560    -8.53   0.000   -.7010575
       party |   11.20792    4.67533     2.40   0.018    .1902963
       stinc |    -.56008    1.28316    -0.44   0.663   -.0297112
       _cons |   67.38277   15.11393     4.46   0.000           .
------------------------------------------------------------------------------

For which independent variables would we reject the null hypothesis? Why?
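
Output in this form would come from a command along these lines; the beta option (an assumption here, since the slide shows only the output) replaces the confidence intervals with the standardized Beta column:

  regress tax cons party stinc, beta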

Visualizing a t ratio - 1 Which of the next two slides depicts a higher t ratio?

Visualizing a t ratio - 2

Visualizing a t ratio - 3

Multivariate Regression - 1 A “multivariate regression” uses more than one independent variable (or confound) to explain variation in a dependent variable. The coefficient for each independent variable reports its effect on the DV, holding constant all of the other IVs in the regression.

Multivariate Regression - 2 [Diagram: Graduation Rates as the dependent variable, with Year of Founding, SAT Scores, Tuition, and Student/Faculty Ratio as independent variables.]

Multivariate Regression - 3

Multivariate Regression - 4 Holding all other factors constant, a 200 point increase in SAT scores leads to a predicted (200)(0.042) = 8.4% increase in the graduation rate, and this effect is statistically significant. Controlling for other factors, a college that is 100 years younger should have a graduation rate that is (100)(-0.021) = 2.1% lower, but this effect is NOT significantly different from zero.
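
A sketch of the kind of Stata command behind a multivariate model like this one; gradrate, sat, founded, tuition, and sfratio are hypothetical names standing in for the variables in the diagram:

  * each coefficient holds the other predictors constant
  regress gradrate sat founded tuition sfratio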

Multiple Regression: Comparative Politics – Stata - 1 Let’s examine the impact of government ideology on economic growth in 18 wealthy democracies (Western Europe, the United States, Canada, Japan, Australia and New Zealand) annually over the 1961-1994 period.

Comparative Politics - 2 Variable List:
• growthpc – annual growth of per capita (i.e., per person) gross domestic product
• govcons – strength of the conservative party in the national government
• left – strength of the left party in the national government

Comparative Politics - 3
• gdppc – per capita gross domestic product
• unem – unemployment rate
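
The output on the next slide is the kind produced by a command along these lines (assuming the comparative politics dataset is loaded):

  regress growthpc govcons left gdppc unem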

Comparative Politics - 4

      Source |       SS       df       MS              Number of obs =     453
-------------+------------------------------           F(  4,   448) =   16.56
       Model |  272.295407     4  68.0738517           Prob > F      =  0.0000
    Residual |  1841.26412   448  4.10996456           R-squared     =  0.1288
-------------+------------------------------           Adj R-squared =  0.1211
       Total |  2113.55953   452  4.67601666           Root MSE      =  2.0273

------------------------------------------------------------------------------
    growthpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     govcons |   -.168093   .0380607    -4.42   0.000    -.2428933   -.0932942
        left |    .001841   .0034541     0.53   0.594    -.0049468    .0086298
       gdppc |   -.000157   .0000585    -2.70   0.007    -.0002725   -.0000428
        unem |   -.086520   .0458576    -1.89   0.060     -.176643    .0036023
       _cons |   7.501013   .7285216    10.30   0.000     6.069269    8.932757
------------------------------------------------------------------------------

What do these results indicate?

Multicollinearity Check (Stata command: vif)

    Variable |      VIF      1/VIF
-------------+----------------------
     govcons |     1.37    0.730762
        unem |     1.31    0.763241
       gdppc |     1.29    0.776446
        left |     1.20    0.834291
-------------+----------------------
    Mean VIF |     1.29

Low multicollinearity – the highest VIF belongs to govcons, for which only 27% of the variance is explained by the other independent variables (1 - .73 = .27) – thus "low."
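
As a sketch of where these numbers come from: for each predictor, VIF = 1/(1 - R²), where R² is from regressing that predictor on all the other predictors, so 1/VIF is the share of its variance not explained by the others. In current Stata the command is estat vif, run after the regression:

  regress growthpc govcons left gdppc unem
  estat vif
  * 1/VIF for govcons is about 0.73, so roughly 27% of its variance
  * is shared with the other independent variables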

Nonlinear Models - 1 While many/most variable relationships in political science are reasonably well approximated by the linear relationships shown on the next slide, some are not.

Nonlinear Models - 2 The next slide shows a negative nonlinear relationship between OSHA expenditures and the workplace injury rate. What theory would lead us to think that: (1) the relationship between OSHA expenditures and the workplace injury rate would be negative; (2) that the relationship would be nonlinear? What form should the nonlinearity take?

Nonlinear Models - 3

Nonlinear Models - 4 DON'T WORRY ABOUT THE MATH! Since the rate of change decreases (i.e., the injury rate decreases, but at a slower rate for each additional dollar spent on OSHA inspections), we can estimate a linear relationship by converting the OSHA budget to logarithms. Thus, an OSHA budget of 10 (i.e., $10,000,000) enters the model as 2.3 (i.e., base "e" ≈ 2.71828 and 2.71828^2.3 ≈ 10).
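
A minimal sketch of that transformation in Stata, with oshabudget and injuryrate as hypothetical variable names:

  * take the natural log of the OSHA budget, then fit a linear model
  generate ln_budget = ln(oshabudget)
  regress injuryrate ln_budget
  * confirm the worked example: ln(10) is about 2.30
  display ln(10)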

Nonlinear Models - 5 The next slide shows the relationship between economic development and political violence. What form should such a relationship take? Should we expect the relationship to change direction (i.e., from negative to positive or vice versa)? Why? How would you measure the variables?

Nonlinear Models - 6

Nonlinear Models - 7 The next several slides examine nonlinear models from the comparative politics literature on political violence. The dependent variable is the death rate in a nation from political violence or violent acts (e.g., riots).

Nonlinear Models - 8

Nonlinear Models - 9

Nonlinear Models - 10

Nonlinear Models - 11 The next slide shows a graph in which the dependent variable (Y axis) is the percentage of elected county officials who are African-American and the independent variable (X axis) is the percentage of the county voters who are African-American. What would you expect the graph to look like? How many “changes of direction” (positive to negative or vice versa) in the relationship would you expect?

Nonlinear Models - 12

North Carolina

      Source |       SS       df       MS              Number of obs =     300
       Model |  8422.69127     4  2105.67282           Prob > F      =  0.0000
    Residual |   7404.1454   295   25.098798           R-squared     =  0.5322
-------------+------------------------------           Adj R-squared =  0.5258
       Total |  15826.8367   299  52.9325641           Root MSE      =  5.0099

------------------------------------------------------------------------------
      blktot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      blkreg |   .9915165   .1630062     6.08   0.000      .670714    1.312319
    blkregsq |   -.037464   .0071142    -5.27   0.000     -.051465    -.023463
   blkregcub |   .0005588     .00009     6.21   0.000     .0003817    .0007359
        wall |  -.1548252   .0395056    -3.92   0.000    -.2325737   -.0770767
       _cons |      1.051   .9752407     1.08   0.282     -.868311    2.970311
------------------------------------------------------------------------------
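
The squared and cubed terms in this output are ordinarily constructed before estimation; a sketch using the variable names shown above:

  generate blkregsq = blkreg^2
  generate blkregcub = blkreg^3
  regress blktot blkreg blkregsq blkregcub wall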

Interaction Terms - 1 If our theory indicates that the impact of one independent variable on the dependent variable changes as the level of ANOTHER independent variable changes, we need an interaction term. We simply multiply the scores on the two independent variables and create a new independent variable.
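
A sketch in Stata with hypothetical variables y, x1, and x2:

  * the interaction term is simply the product of the two predictors
  generate x1x2 = x1 * x2
  regress y x1 x2 x1x2
  * the coefficient on x1x2 shows how the effect of x1 changes with the level of x2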

Interaction Terms - 2

Interaction Terms - 3

The Impact of Outliers The next two slides show the impact of outlier (i.e., extreme) data. The argument that a lower corporate tax rate will actually raise more revenue is based on this conundrum. Spotting outliers is one of the reasons graphical analysis is useful. We sometimes re-run analyses removing an extreme score to see how fragile the initial results are.
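
One common way to check how fragile the results are, assuming a hypothetical identifier variable id and an outlying case numbered 37:

  * re-run the regression with the extreme observation excluded
  regress y x1 x2 if id != 37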

Outlier Omitted

Causal Models – Presidents and the Economy - 1
20th Percentile (Dep. Variable: Growth Rate)
  Democratic President          2.32   (.80)
  Oil Prices (% lagged)         -.032  (.016)
  Labor Force Participation      4.66  (1.44)
  Lagged Growth                 -.191  (.084)
  Linear Trend                -12.84   (5.88)
  Quadratic Trend                9.68  (5.75)
  Intercept                      2.68  (1.26)
  R-Squared                       .41

Causal Models – Presidents and the Economy - 2
Impact of Democratic President across Income Groups:
  20th Percentile: 2.32 (.80)
  40th Percentile: 1.60 (.56)
  60th Percentile: 1.53 (.52)
  80th Percentile: 1.23 (.51)
  95th Percentile:  .50 (.64)

Causal Models – Presidents and the Economy - 3
20th Percentile (Dep. Variable: Growth Rate)
  Democratic President     .51   (.64)
  Unemployment (%)        -.849  (.307)
  Inflation (%)           -.134  (.127)
  GNP Growth (%)           .798  (.144)
  Oil Prices (% lagged)   -.005  (.013)
Why are the results different? Does the partisanship of the President matter? (YES!)

Regression – Presidents and the Economy - 4 [Causal path diagram: Democratic Presidential Administration → unemployment, inflation, GNP growth → growth rate of income at the 20th percentile.]