Download presentation
Presentation is loading. Please wait.
1
Sociology 601 Class 21: November 10, 2009 Review –formulas for b and se(b) –stata regression commands & output Violations of Model Assumptions, and their effects (9.6) Causality (10) 1
2
Formulas for b, a, r, and se(b) 2
3
Stata Example of Inference about a Slope. summarize murder poverty Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- murder | 51 8.727451 10.71758 1.6 78.5 poverty | 51 14.25882 4.584242 8 26.4. regress murder poverty Source | SS df MS Number of obs = 51 -------------+------------------------------ F( 1, 49) = 23.08 Model | 1839.06931 1 1839.06931 Prob > F = 0.0000 Residual | 3904.25223 49 79.6786169 R-squared = 0.3202 -------------+------------------------------ Adj R-squared = 0.3063 Total | 5743.32154 50 114.866431 Root MSE = 8.9263 ------------------------------------------------------------------------------ murder | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- poverty | 1.32296.2753711 4.80 0.000.7695805 1.876339 _cons | -10.1364 4.120616 -2.46 0.017 -18.41708 -1.855707 ----------------------------------------------------------------------------- 3
4
Stata Example of Inference about a Slope 4. correlate murder poverty (obs=51) | murder poverty -------------+------------------ murder | 1.0000 poverty | 0.5659 1.0000. correlate murder poverty, covariance (obs=51) | murder poverty -------------+------------------ murder | 114.866 poverty | 27.8024 21.0153 sqrt(114.866) = 14.26 = sd(y); sqrt (21.0153) = 8.73 = sd(x)
5
Alternative Formula for b 5 b = 27.8024 / 21.0153 = 1.323
6
Stata Example of Inference about a Slope 6 scatter murder poverty || lfit murder poverty
7
Stata Example of Inference about a Slope 7. regress murder poverty if state!="DC" Source | SS df MS Number of obs = 50 -------------+------------------------------ F( 1, 48) = 31.36 Model | 307.342297 1 307.342297 Prob > F = 0.0000 Residual | 470.406476 48 9.80013492 R-squared = 0.3952 -------------+------------------------------ Adj R-squared = 0.3826 Total | 777.748773 49 15.8724239 Root MSE = 3.1305 ------------------------------------------------------------------------------ murder | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- poverty |.5842405.104327 5.60 0.000.3744771.7940039 _cons | -.8567153 1.527798 -0.56 0.578 -3.92856 2.215129 ------------------------------------------------------------------------------
8
Assumptions Needed to make Population Inferences for slopes. The sample is selected randomly. X and Y are interval scale variables. The mean of Y is related to X by the linear equation E{Y} = + X. The conditional standard deviation of Y is identical at each X value. (no heteroscedasticity) The conditional distribution of Y at each value of X is normal. There is no error in the measurement of X. 8
9
Common Ways to Violate These Assumptions The sample is selected randomly. o Cluster sampling (e.g., census tracts / neighborhoods) causes observations in any cluster to be more similar than to observations outside the cluster. o Autocorrelation (spatial and temporal) o Two or more siblings in the same family. o Sample = populations (e.g., states in the U.S.) X and Y are interval scale variables. o Ordinal scale attitude measures o Nominal scale categories (e.g., race/ethnicity, religion) 9
10
Common Ways to Violate These Assumptions (2) The mean of Y is related to X by the linear equation E{Y} = + X. o U-shape: e.g., Kuznets inverted-U curve (inequality <- GDP/capita) o Thresholds: o Logarithmic (e.g., earnings <- education) The conditional standard deviation of Y is identical at each X value. (no heteroscedasticity) o earnings <- education o hours worked <- years o adult child occupational status <- parental occupational status 10
11
Common Ways to Violate These Assumptions (3) The conditional distribution of Y at each value of X is normal. o earnings (skewed) <- education o Y is binary o Y is a % There is no error in the measurement of X. o almost everything o what is the effect of measurement error in x on b? 11
12
Things to watch out for: extrapolation. Extrapolation beyond observed values of X is dangerous. The pattern may be nonlinear. Even if the pattern is linear, the standard errors become increasingly wide. Be especially careful interpreting the Y-intercept: it may lie outside the observed data. o e.g., year zero o e.g., zero education in the U.S. o e.g., zero parity 12
13
Things to watch out for: outliers Influential observations and outliers may unduly influence the fit of the model. The slope and standard error of the slope may be affected by influential observations. This is an inherent weakness of least squares regression. You may wish to evaluate two models; one with and one without the influential observations. 13
14
Things to watch out for: truncated samples Truncated samples cause the opposite problems of influential observations and outliers. Truncation on the X axis reduces the correlation coefficient for the remaining data. Truncation on the Y axis is a worse problem, because it violates the assumption of normally distributed errors. Examples: Topcoded income data, health as measured by number of days spent in a hospital in a year. 14
15
Causality We never prove that x causes y Research and theory make it increasingly likely Criteria: association time order no alternative explanations is the relationship spurious? 15
16
Alternative Explanations Example: Neighborhood poverty -> Low Test Scores 16
17
Alternative Explanations Example: Neighborhood poverty -> Low Test Scores Possible solutions: multivariate models e.g., control for parents’ education, income controls for other measureable differences fixed effects models e.g., changes in poverty -> changes in test scores controls for constant, unmeasured differences instrumental variables find an instrument that affects x 1 but not y experiments e.g., Moving to Opportunity randomize increases in $ 17
18
Alternative Explanations Example: Fertility -> Lower Mothers’ LFP Possible solutions: 18
19
Alternative Explanations Example: Fertility -> Lower Mothers’ LFP Possible solutions: multivariate models e.g., control for gender attitudes controls for other measureable differences fixed effects models e.g., changes in # children -> dropping out controls for constant, unmeasured differences instrumental variables find an instrument that affects x 1 but not y e.g., mothers of two same sex children experiments not feasible (or ethical) 19
20
Types of 3-variable Causal Models Spurious x 2 causes both x 1 and y e.g., religion causes fertility and women’s lfp Intervening x 1 causes x 2 which causes y e.g., fertility raises time spent on children which lowers time in the labor force What is the statistical difference between these? 20
21
Another type of 3-varaible relationship: Statistical Interaction Effects Example: Fertility -> Lower Mothers’ LFP The relationship between x 1 and y depends on the value of another variable, x 2 e.g., marital status -> earnings depends on gender 21
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.