Published by Bertram Miles. Modified over 9 years ago.
1
Statistics for Social and Behavioral Sciences. Part IV: Causality. Association and Causality. Session 22. Prof. Amine Ouazad
2
Statistics Course Outline
Part I. Introduction and Research Design (Week 1): Four Steps of “Thinking Like a Statistician”. Study Design: Simple Random Sampling, Cluster Sampling, Stratified Sampling. Biases: Nonresponse bias, Response bias, Sampling bias.
Part II. Describing Data (Weeks 2-4): Sample statistics: Mean, Median, SD, Variance, Percentiles, IQR, Empirical Rule. Bivariate sample statistics: Correlation, Slope.
Part III. Drawing Conclusions from Data: Inferential Statistics (Weeks 5-9): Estimating a parameter using sample statistics. Confidence Interval at 90%, 95%, 99%. Testing a hypothesis using the CI method and the t method.
Part IV. Correlation and Causation: Two Groups, Regression Analysis (Weeks 10-14): Multivariate regression now!
3
Coming up
“Comparison of Two Groups”. Last week.
“Univariate Regression Analysis”. Last Saturday. (Section 9.5)
“Association and Causality: Multivariate Regression”. Today, Monday, Tuesday. Chapters 10 and 11.
“Randomized Experiments and ANOVA”. Wednesday. Chapter 12.
“Robustness Checks and Wrap Up”. Last Thursday.
4
Outline
1. Correlation and Causation
2. Multiple Causes: Partly Spurious Association, Spurious Association, Chain Relationship
3. Interaction
Next time: Multivariate regression
5
What causes crime? National Neighborhood Crime Study (2002), Peterson, Ruth D., and Krivo, Lauren J. Ohio State University. N = 6,935 neighborhoods. Crime data from local police departments, and the Federal Bureau of Investigation. Total crime rate per 1,000 residents. Number of police officers. Ethnicity of police officers. Demographics of the neighborhood: poverty, unemployment rate, education.
6
Regression of Crime Rate on the Unemployment Rate
y: total crime per 1,000 residents. x: unemployment rate, from 0 to 100.
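To make the univariate regression concrete, here is a minimal sketch in Python with NumPy. It does not use the National Neighborhood Crime Study data: the neighborhoods are simulated, and the intercept 20, slope 1.5 and noise level are made-up assumptions chosen only for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x + e."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    a = y.mean() - b * x.mean()
    return a, b

# Simulated (not real) neighborhoods; 20, 1.5 and the noise SD are assumptions.
rng = np.random.default_rng(0)
n = 2000
unemployment = rng.uniform(0, 30, size=n)                  # x, in percent
crime = 20 + 1.5 * unemployment + rng.normal(0, 10, size=n)  # y, per 1,000 residents

a, b = fit_line(unemployment, crime)
print(f"predicted crime rate = {a:.1f} + {b:.2f} * unemployment rate")
```

The slope b is read as the difference in predicted crime rate between two neighborhoods whose unemployment rates differ by one percentage point. Whether that difference is causal is the subject of the rest of the session.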
7
Causation Matters
If the true relationship between X and Y is described by X -> Y, then changing (manipulating) X will affect Y.
Examples:
– If poverty -> crime, then addressing poverty (e.g. war on poverty, food stamps, welfare programs) will lower crime.
– If CO2 emissions -> global average temperature, then reducing CO2 emissions (e.g. through policies such as the Kyoto Protocol) will lower global temperature.
– If shoe size -> literacy, then changing shoe size will affect literacy! Nonsense.
– If Hepatitis B vaccination -> autism, then reducing vaccination rates will lower the incidence of autism.
8
True Model vs Statistical Model
X -> Y is your statistical model. But the true model may be different:
1. Order is wrong. Y causes X instead of X causing Y.
2. Multiple causes. X may not be the most practically significant determinant of Y.
3. Spurious association. X may not cause Y at all.
4. Chain relationship. The impact of X on Y may be mediated by another variable X2.
5. Interaction. The impact of X on Y may depend on the value of another variable X2.
9
Order is wrong?
True model: Y -> X. Statistical model: X -> Y.
Regression suggests that more police officers per 10,000 residents leads to a higher crime rate per capita!?
Beware of software and formulas. Use them wisely.
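The police example can be reproduced with simulated data. In the sketch below, crime (Y) drives the number of police officers (X) and police have no effect on crime at all, yet regressing crime on police still gives a positive slope. All numbers are made-up assumptions, not estimates from real data.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope from regressing y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Simulated cities. True model: Y -> X (more crime leads to more police).
rng = np.random.default_rng(1)
n = 5000
crime = rng.normal(50, 15, size=n)                   # Y
police = 5 + 0.2 * crime + rng.normal(0, 2, size=n)  # X, caused by Y

b_wrong_order = slope(police, crime)  # statistical model X -> Y
b_true_order = slope(crime, police)   # regression in the true causal direction

print(f"crime on police: b = {b_wrong_order:.2f}")
print(f"police on crime: b = {b_true_order:.2f}")
```

The first slope is positive even though, by construction, adding police officers would not change crime. The software cannot tell which variable is the cause; that choice is the researcher's.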
10
Outline
1. Correlation and Causation
2. Multiple Causes: Partly Spurious Association, Spurious Association, Chain Relationship
3. Interaction
Next time: Multivariate regression
11
Multiple Causes
Acknowledge that crime (Y) may be caused by a series of factors.
True model: X1, X2, X3, …, XK -> Y.
12
Multiple Causes
Acknowledge that the variable X1 that you were focused on may not be the most practically significant variable that determines Y.
Crime: finding the most important determinants of crime.
– Education? Poverty? Unemployment? Female-headed households? Ethnicity of police officers? Number of police officers per 10,000 residents? Incarceration rate?
13
From Univariate to Multivariate
Univariate regression:
True model: y = α + β x1 + ε
Statistical model: y = a + b x1 + e, with E(y|x1) = a + b x1 and SD(y|x1) = SD(e).
Multivariate regression:
True model: y = α + β1 x1 + β2 x2 + β3 x3 + ε
Statistical model: y = a + b1 x1 + b2 x2 + b3 x3 + e, with E(y|x1,x2,x3) = a + b1 x1 + b2 x2 + b3 x3 and SD(y|x1,x2,x3) = SD(e).
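As an illustration of the multivariate statistical model, the sketch below estimates a, b1, b2 and b3 by least squares on simulated data. The course itself uses Stata; Python with NumPy is used here only as an assumption-free stand-in, and the helper `ols` and the true coefficients (1.0, 2.0, -1.0, 0.5) are hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients [a, b1, ..., bK] of y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data from a made-up true model with three causes of y.
rng = np.random.default_rng(2)
n = 5000
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(0, 1, size=n)

a, b1, b2, b3 = ols(y, x1, x2, x3)
residuals = y - (a + b1 * x1 + b2 * x2 + b3 * x3)
sd_e = residuals.std(ddof=4)  # 4 estimated coefficients

print(f"a={a:.2f}, b1={b1:.2f}, b2={b2:.2f}, b3={b3:.2f}, SD(e)={sd_e:.2f}")
```

With a sample this large, the estimates a, b1, b2, b3 land close to the true α, β1, β2, β3, and SD(e) lands close to the noise SD used in the simulation.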
14
Including X2 may affect the coefficient b1 of X1
Race has a negative statistically significant impact on the crime rate.
Accounting for multiple variables avoids simplistic statements!
15
Partly Spurious Association between X1 and Y
The statistical model does not include X2. When including X2 in the regression, the effect of X1 is lower in magnitude. X2 has both a direct effect on Y and an indirect effect on Y through X1.
True model: X2 -> X1, X2 -> Y, X1 -> Y. Statistical model: X1 -> Y.
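A small simulation can show the coefficient of X1 shrinking once X2 is included. This is a sketch under assumed numbers: X2 raises both X1 and Y, X1 has a true effect of 1.0 on Y, and the helper `ols` is hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients [a, b1, ..., bK] of y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Made-up true model: X2 -> X1, X2 -> Y, and X1 -> Y with effect 1.0.
rng = np.random.default_rng(3)
n = 5000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

b1_without_x2 = ols(y, x1)[1]    # statistical model omits X2
b1_with_x2 = ols(y, x1, x2)[1]   # statistical model includes X2

print(f"b1 without X2: {b1_without_x2:.2f}")
print(f"b1 with X2:    {b1_with_x2:.2f}")
```

Without X2, the coefficient of X1 also picks up part of the effect of X2 on Y. Once X2 is included, b1 falls toward the true value but does not go to zero, which is what makes the association only partly spurious.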
16
Spurious Association
A statistically significant slope coefficient b does not mean that X1 causes Y. Another factor X2 may be causing both X1 and Y. When including X2 in the regression, the effect …
True model: X2 -> X1, X2 -> Y. Statistical model: X1 -> Y.
17
Shoe size and Literacy
Sample of N children from age 5 to age 16. Literacy measured in the Early Childhood Longitudinal Study. Including age in the regression will likely render the coefficient of shoe size statistically insignificant.
True model: Age -> Shoe size, Age -> Literacy. Statistical model: Shoe size -> Literacy.
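The shoe size example can be sketched with simulated children. The data below are not from the Early Childhood Longitudinal Study: ages, shoe sizes and literacy scores are generated from made-up formulas in which age drives both variables and shoe size has no effect on literacy.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients [a, b1, ..., bK] of y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Made-up true model: Age -> Shoe size and Age -> Literacy, nothing else.
rng = np.random.default_rng(4)
n = 5000
age = rng.uniform(5, 16, size=n)
shoe_size = 0.5 + 0.6 * age + rng.normal(0, 1, size=n)  # arbitrary units
literacy = 10 * age + rng.normal(0, 15, size=n)         # arbitrary score

b_shoe_alone = ols(literacy, shoe_size)[1]
_, b_shoe_with_age, b_age = ols(literacy, shoe_size, age)

print(f"shoe size alone:    b = {b_shoe_alone:.2f}")
print(f"shoe size with age: b = {b_shoe_with_age:.2f}, age: b = {b_age:.2f}")
```

Regressed alone, shoe size appears to have a large effect on literacy. With age in the regression, the shoe size coefficient drops to roughly zero and age carries the association.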
18
Correct approach
Make the true model and the statistical model coincide. Regress Y on both X1 and X2. Include all determinants of crime in the regression.
True model and statistical model: both include X1, X2 and Y.
19
What makes a good school?
Researchers had found that school funding is positively correlated (statistically significant and positive r and b) with student test scores. But when measures of teacher quality are included, the amount of money a school spends has no statistically significant impact on student test scores.
True model: includes Teacher quality, Funding and Student test score. Statistical model: Funding -> Student test score.
20
Chain Relationship
X1 causes Y, but the effect of X1 on Y is entirely due to its effect on X2. When not including X2 in the regression, the coefficient of X1 is statistically significant. When including X2 in the regression, the coefficient of X1 is not statistically significant.
True model: X1 -> X2 -> Y. Statistical model: X1 -> Y.
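A chain relationship can be simulated in a few lines. In this sketch X1 affects Y only through X2; the coefficients 1.5 and 2.0 and the helper `ols` are made-up assumptions.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients [a, b1, ..., bK] of y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Made-up true model: X1 -> X2 -> Y, with no direct arrow from X1 to Y.
rng = np.random.default_rng(5)
n = 5000
x1 = rng.normal(size=n)
x2 = 1.5 * x1 + rng.normal(size=n)
y = 2.0 * x2 + rng.normal(size=n)

b1_without_x2 = ols(y, x1)[1]
_, b1_with_x2, b2 = ols(y, x1, x2)

print(f"b1 without X2: {b1_without_x2:.2f}")
print(f"b1 with X2:    {b1_with_x2:.2f}, b2: {b2:.2f}")
```

Without X2, b1 is close to 1.5 × 2.0 = 3.0, the total effect of X1 passed along the chain. With X2 included, b1 is close to zero. The regression output looks the same as in the spurious case, so telling the two apart requires knowing which variable comes first, not more statistics.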
21
Outline
1. Correlation and Causation
2. Multiple Causes: Partly Spurious Association, Spurious Association, Chain Relationship
3. Interaction
Next time: Multivariate regression
22
Interaction
X2 affects how X1 causes Y. For instance, unemployment causes crime, but the impact is much lower in neighborhoods that have a higher income. When not accounting for X2, the coefficient of X1 measures the average impact of X1 on Y.
True model: X1 -> Y, with X2 changing the strength of that effect. Statistical model: X1 -> Y.
23
Accounting for the Interaction of X1 and X2
Include both X2 and the product of X1 and X2 in the regression.
Model: y = a + b1 x1 + b2 x2 + b3 x1*x2 + e
If b3 is positive, the impact of x1 on y is larger the higher the value of x2.
If b3 is negative, the impact of x1 on y is smaller the higher the value of x2.
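The interaction model can be estimated by adding the product x1*x2 as a regressor. The sketch below uses simulated neighborhoods in which x1 stands for the unemployment rate and x2 for the share of high-income residents; every number is a made-up assumption, and `effect_of_x1` is a hypothetical helper name.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares coefficients [a, b1, ..., bK] of y on a constant and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Made-up true model with a negative interaction between x1 and x2.
rng = np.random.default_rng(6)
n = 5000
x1 = rng.uniform(0, 30, size=n)   # unemployment rate, percent
x2 = rng.uniform(0, 50, size=n)   # share with high income, percent
y = 20 + 2.0 * x1 - 0.1 * x2 - 0.03 * x1 * x2 + rng.normal(0, 5, size=n)

a, b1, b2, b3 = ols(y, x1, x2, x1 * x2)

def effect_of_x1(x2_value):
    """Estimated impact on y of a one-unit increase in x1, at a given x2."""
    return b1 + b3 * x2_value

print(f"b1={b1:.2f}, b2={b2:.2f}, b3={b3:.3f}")
print(f"impact of x1 when x2=0:  {effect_of_x1(0):.2f}")
print(f"impact of x1 when x2=50: {effect_of_x1(50):.2f}")
```

In this model the impact of x1 is no longer a single number: it equals b1 + b3 * x2. Because b3 is negative here, the estimated impact of unemployment on crime is smaller in neighborhoods with a larger high-income share.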
24
Accounting for the Interaction of unemployment and income
Here, b3 is negative!
T_HINC75: percentage in neighborhood with high income.
25
Wrap up Know the difference between the true model and the statistical model. Learn how to perform a multivariate regression in Stata. Order X and Y correctly. Account for multiple causes. Account for spurious correlations. Account for chain relationships. Account for interactions.
26
Coming up: Schedule for next week
Chapter on “Association and Causality”, and “Multivariate Regression”.
Last online quiz sent last night, due Sunday 9am.
Make sure you come to sessions and recitations.
Sunday: Recitation.
Monday: Evening session, 7.30pm, West Administration 002.
Tuesday: Usual class, 12.45pm, usual room.
Wednesday: Evening session, 7.30pm, West Administration 001.
Thursday: Usual class, 12.45pm, usual room.