Sociology 601 Class 21: November 10, 2009

Presentation transcript:

Sociology 601 Class 21: November 10, 2009
Review
– formulas for b and se(b)
– Stata regression commands & output
Violations of Model Assumptions, and their effects (9.6)
Causality (10)

Formulas for b, a, r, and se(b)
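The formulas on this slide are not in the transcript. As a hedged reconstruction, the standard least squares expressions for the quantities named in the title are given below. The notation (s_x and s_y for the sample standard deviations, s for the residual standard deviation) is an assumption and may differ from the original slide.

```latex
\begin{aligned}
b &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \\
a &= \bar{y} - b\,\bar{x}, \\
r &= b\,\frac{s_x}{s_y}, \\
se(b) &= \frac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}},
\qquad s = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}}.
\end{aligned}
```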

Stata Example of Inference about a Slope
. summarize murder poverty
[output table (Obs, Mean, Std. Dev., Min, Max for murder and poverty): values not preserved]
. regress murder poverty
[output table (ANOVA block with F(1, 49); Coef., Std. Err., t, P>|t|, and 95% Conf. Interval for poverty and _cons): values not preserved]

Stata Example of Inference about a Slope
. correlate murder poverty
(obs=51)
[correlation matrix for murder and poverty: values not preserved]
. correlate murder poverty, covariance
(obs=51)
[covariance matrix for murder and poverty: values not preserved]
sqrt([value lost]) = [value lost] = sd(y); sqrt([value lost]) = 8.73 = sd(x)

Alternative Formula for b
b = [value lost] / [value lost] = 1.323
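Because this slide follows the covariance matrix, the alternative formula is presumably b = cov(x, y) / var(x), though the transcript does not confirm it. The sketch below checks on hypothetical toy data (not the murder and poverty data) that this ratio, and also r · sd(y) / sd(x), reproduces the least squares slope. The helper name `ols` is illustrative only.

```python
import math

def ols(x, y):
    """Least squares fit of y = a + b*x; returns (b, a, r, se_b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b, a, r, se_b

# Hypothetical toy data, NOT the murder/poverty data from the slides.
x = [8.0, 10.0, 12.0, 15.0, 20.0, 25.0]
y = [3.0, 5.0, 4.0, 8.0, 12.0, 20.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance and variances (n - 1 denominators; the ratio is the same with n).
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)

b, a, r, se_b = ols(x, y)
b_from_cov = cov_xy / var_x                          # cov(x, y) / var(x)
b_from_r = r * math.sqrt(var_y) / math.sqrt(var_x)   # r * sd(y) / sd(x)

print(f"b = {b:.4f}, cov/var = {b_from_cov:.4f}, r*sd(y)/sd(x) = {b_from_r:.4f}")
```

All three printed values agree, because each expression is an algebraic rearrangement of the same sums of squares and cross-products.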

Stata Example of Inference about a Slope
scatter murder poverty || lfit murder poverty

Stata Example of Inference about a Slope
. regress murder poverty if state!="DC"
[output table (ANOVA block with F(1, 48); Coef., Std. Err., t, P>|t|, and 95% Conf. Interval for poverty and _cons): values not preserved]

Assumptions Needed to Make Population Inferences for Slopes
The sample is selected randomly.
X and Y are interval scale variables.
The mean of Y is related to X by the linear equation E{Y} = α + βX.
The conditional standard deviation of Y is identical at each X value (no heteroscedasticity).
The conditional distribution of Y at each value of X is normal.
There is no error in the measurement of X.

Common Ways to Violate These Assumptions
The sample is selected randomly.
o Cluster sampling (e.g., census tracts / neighborhoods) makes observations within a cluster more similar to each other than to observations outside the cluster.
o Autocorrelation (spatial and temporal)
o Two or more siblings in the same family
o Sample = population (e.g., states in the U.S.)
X and Y are interval scale variables.
o Ordinal scale attitude measures
o Nominal scale categories (e.g., race/ethnicity, religion)

Common Ways to Violate These Assumptions (2)
The mean of Y is related to X by the linear equation E{Y} = α + βX.
o U-shape: e.g., Kuznets inverted-U curve (inequality <- GDP/capita)
o Thresholds:
o Logarithmic (e.g., earnings <- education)
The conditional standard deviation of Y is identical at each X value (no heteroscedasticity).
o earnings <- education
o hours worked <- years
o adult child occupational status <- parental occupational status

Common Ways to Violate These Assumptions (3)
The conditional distribution of Y at each value of X is normal.
o earnings (skewed) <- education
o Y is binary
o Y is a percentage
There is no error in the measurement of X.
o almost everything
o what is the effect of measurement error in x on b?
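One way to explore the closing question is a small simulation. The sketch below assumes classical measurement error (random noise added to x, unrelated to x and to y). All numbers are hypothetical. Under that assumption the estimated slope is pulled toward zero by the factor var(x) / (var(x) + var(error)).

```python
import random

def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(601)   # fixed seed so the run is reproducible
n = 20000
true_b = 2.0               # hypothetical population slope

x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + true_b * xi + rng.gauss(0, 1) for xi in x_true]

# Observed x = true x + noise; noise variance equals var(x), so reliability is 0.5.
x_obs = [xi + rng.gauss(0, 1) for xi in x_true]

b_clean = slope(x_true, y)   # close to the true slope
b_noisy = slope(x_obs, y)    # attenuated toward zero, roughly half the true slope

print(f"slope with x measured exactly:    {b_clean:.3f}")
print(f"slope with x measured with error: {b_noisy:.3f}")
```

In this setup the slope is biased toward zero. Other error structures (for example, error correlated with y) can behave differently.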

Things to watch out for: extrapolation
Extrapolation beyond observed values of X is dangerous.
The pattern may be nonlinear.
Even if the pattern is linear, the standard errors of predictions grow as X moves further from the observed data.
Be especially careful interpreting the Y-intercept: it may lie outside the observed data.
o e.g., year zero
o e.g., zero education in the U.S.
o e.g., zero parity
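The point about standard errors can be illustrated with the usual textbook formula for the standard error of the estimated mean of Y at a chosen value x0: s · sqrt(1/n + (x0 − x̄)² / Σ(x − x̄)²). The sketch below uses hypothetical toy data and assumes the linear model is correct. It does not address the separate danger that the pattern turns nonlinear outside the observed range.

```python
import math

# Hypothetical toy data with X observed only between 1 and 8.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

# Residual standard deviation s, with n - 2 degrees of freedom.
s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

def se_fitted_mean(x0):
    """Standard error of the estimated mean of Y at X = x0."""
    return s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)

# Compare the mean of X, the edge of the data, and a point far outside it.
for x0 in (mx, 8.0, 20.0):
    print(f"x0 = {x0:5.1f}  se = {se_fitted_mean(x0):.3f}")
```

The standard error is smallest at the mean of X and grows with the squared distance from it. This is why the intercept (x0 = 0) can be very imprecise when zero lies far from the data.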

Things to watch out for: outliers
Influential observations and outliers may unduly influence the fit of the model.
The slope and standard error of the slope may be affected by influential observations.
This is an inherent weakness of least squares regression.
You may wish to evaluate two models: one with and one without the influential observations.
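The two-model comparison suggested here is what the earlier Stata example does by refitting the regression with `if state!="DC"`. The sketch below does the same thing on hypothetical toy data, where the last case is extreme on both x and y. The numbers are invented for illustration and are not the murder and poverty data.

```python
def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical data: eight ordinary cases plus one influential case at the end.
x = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 20.0]
y = [4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 8.0, 8.0, 40.0]

b_all = slope(x, y)                  # model with every observation
b_without = slope(x[:-1], y[:-1])    # model excluding the influential case

print(f"slope with the influential case:    {b_all:.3f}")
print(f"slope without the influential case: {b_without:.3f}")
```

A single case that is far from the rest on x and off the trend on y changes the slope several-fold here. Reporting both fits lets the reader judge how much the conclusion depends on that case.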

Things to watch out for: truncated samples
Truncated samples cause the opposite problem from influential observations and outliers.
Truncation on the X axis reduces the correlation coefficient for the remaining data.
Truncation on the Y axis is a worse problem, because it violates the assumption of normally distributed errors.
Examples: top-coded income data, health as measured by number of days spent in a hospital in a year.
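The effect of truncating on X can be seen in a simulation. The sketch below uses hypothetical simulated data, and the cutoffs of -0.5 and 0.5 are arbitrary. It computes the correlation once on the full sample and once after keeping only the cases in a narrow band of x.

```python
import math
import random

def corr(x, y):
    """Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(21)   # fixed seed so the run is reproducible
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]   # same linear relation for every case

# Truncate on X: keep only cases with x in a narrow band (arbitrary cutoffs).
kept = [(xi, yi) for xi, yi in zip(x, y) if -0.5 < xi < 0.5]
x_t = [p[0] for p in kept]
y_t = [p[1] for p in kept]

r_full = corr(x, y)
r_trunc = corr(x_t, y_t)

print(f"r in the full sample:      {r_full:.3f}")
print(f"r in the truncated sample: {r_trunc:.3f}  (n = {len(x_t)})")
```

The underlying relationship is identical in both samples. The correlation falls because truncation shrinks the variation in x while the error variation stays the same.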

Causality
We never prove that x causes y.
Research and theory make it increasingly likely.
Criteria:
o association
o time order
o no alternative explanations
   – is the relationship spurious?

Alternative Explanations
Example: Neighborhood poverty -> Low Test Scores

Alternative Explanations
Example: Neighborhood poverty -> Low Test Scores
Possible solutions:
multivariate models
o e.g., control for parents’ education, income
o controls for other measurable differences
fixed effects models
o e.g., changes in poverty -> changes in test scores
o controls for constant, unmeasured differences
instrumental variables
o find an instrument that affects x1 but not y (except through x1)
experiments
o e.g., Moving to Opportunity
o randomize increases in $

Alternative Explanations
Example: Fertility -> Lower Mothers’ LFP
Possible solutions:

Alternative Explanations
Example: Fertility -> Lower Mothers’ LFP
Possible solutions:
multivariate models
o e.g., control for gender attitudes
o controls for other measurable differences
fixed effects models
o e.g., changes in # children -> dropping out
o controls for constant, unmeasured differences
instrumental variables
o find an instrument that affects x1 but not y (except through x1)
o e.g., mothers of two same-sex children
experiments
o not feasible (or ethical)

Types of 3-variable Causal Models
Spurious
o x2 causes both x1 and y
o e.g., religion causes fertility and women’s LFP
Intervening
o x1 causes x2, which causes y
o e.g., fertility raises time spent on children, which lowers time in the labor force
What is the statistical difference between these?
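A simulation offers one way to think about the closing question. The sketch below builds hypothetical data under each causal structure and compares the slope of y on x1 with and without x2 as a control. The function names and numbers are illustrative assumptions. The two-predictor coefficient comes from solving the normal equations directly.

```python
import random

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def bivariate_slope(x1, y):
    """Slope of y on x1 alone."""
    return cross(x1, y) / cross(x1, x1)

def partial_slope(x1, x2, y):
    """Coefficient on x1 in the regression of y on x1 and x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

rng = random.Random(10)   # fixed seed so the run is reproducible
n = 20000

# Spurious: x2 causes both x1 and y; x1 has no effect on y.
x2_s = [rng.gauss(0, 1) for _ in range(n)]
x1_s = [v + rng.gauss(0, 1) for v in x2_s]
y_s = [v + rng.gauss(0, 1) for v in x2_s]

# Intervening: x1 causes x2, which causes y.
x1_i = [rng.gauss(0, 1) for _ in range(n)]
x2_i = [v + rng.gauss(0, 1) for v in x1_i]
y_i = [v + rng.gauss(0, 1) for v in x2_i]

spurious_bivariate = bivariate_slope(x1_s, y_s)
spurious_partial = partial_slope(x1_s, x2_s, y_s)
intervening_bivariate = bivariate_slope(x1_i, y_i)
intervening_partial = partial_slope(x1_i, x2_i, y_i)

print(f"spurious:    b(x1) = {spurious_bivariate:.3f}, controlling x2 = {spurious_partial:.3f}")
print(f"intervening: b(x1) = {intervening_bivariate:.3f}, controlling x2 = {intervening_partial:.3f}")
```

In both simulated structures the x1 coefficient is clearly positive on its own and falls to about zero once x2 is controlled. The regression results look the same, so what separates a spurious from an intervening relationship is the assumed causal order of x1 and x2, not the output.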

Another type of 3-variable relationship: Statistical Interaction Effects
Example: Fertility -> Lower Mothers’ LFP
The relationship between x1 and y depends on the value of another variable, x2.
o e.g., marital status -> earnings depends on gender
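A minimal sketch of an interaction, assuming a binary x2 and hypothetical simulated data: the slope of y on x1 is estimated separately within each x2 group. In a model with x1, x2, and their product x1·x2, the coefficient on the product term equals the difference between these two group-specific slopes.

```python
import random

def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(10)   # fixed seed so the run is reproducible
n = 10000                 # cases per group

def simulate(true_slope):
    """Hypothetical group in which y = 1 + true_slope * x1 + noise."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + true_slope * xi + rng.gauss(0, 1) for xi in x1]
    return x1, y

x1_g0, y_g0 = simulate(0.5)   # group with x2 = 0: weak effect of x1
x1_g1, y_g1 = simulate(2.0)   # group with x2 = 1: strong effect of x1

b_g0 = slope(x1_g0, y_g0)
b_g1 = slope(x1_g1, y_g1)
interaction = b_g1 - b_g0     # what the x1*x2 coefficient would estimate

print(f"slope of y on x1 when x2 = 0: {b_g0:.3f}")
print(f"slope of y on x1 when x2 = 1: {b_g1:.3f}")
print(f"difference (interaction):     {interaction:.3f}")
```

Unlike the spurious and intervening cases, x2 here does not explain away the effect of x1. It changes how large that effect is.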