Statistics for Social and Behavioral Sciences
Part IV: Causality
Inference for Slope and Correlation (Section 9.5)
Prof. Amine Ouazad
Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)
– Four Steps of “Thinking Like a Statistician”
– Study Design: Simple Random Sampling, Cluster Sampling, Stratified Sampling
– Biases: Nonresponse bias, Response bias, Sampling bias
PART II. DESCRIBING DATA (Weeks 2-4)
– Sample statistics: Mean, Median, SD, Variance, Percentiles, IQR, Empirical Rule
– Bivariate sample statistics: Correlation, Slope
PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)
– Estimating a parameter using sample statistics
– Confidence Interval at 90%, 95%, 99%
– Testing a hypothesis using the CI method and the t method
PART IV. CORRELATION AND CAUSATION: TWO GROUPS, REGRESSION ANALYSIS
– Multivariate regression coming up!
Coming up
– “Comparison of Two Groups”: last session.
– “Univariate Regression Analysis”: this session, Saturday (Section 9.5).
– “Association and Causality: Multivariate Regression”: Thursday and extra session. Chapters 10 and 11.
– “Randomized Experiments (cont’d), ANOVA”: last Tuesday and extra session. Chapter 12.
– “Robustness Checks and Wrap Up”: last Thursday.
Outline
1. Burger prices (y) and poverty (x)
2. Inference for the slope coefficient b
3. Inference for the correlation coefficient r
Next time: Association and Causality
Burger prices and poverty
Do fast food chains price their burgers lower in poorer neighborhoods?
– Data on N = 167 stores in …
– y: price of the burger in each store.
– x: percentage in poverty in the store's ZIP code.
[Figure: Burger prices and percentage in poverty in ZIP code]
Recap on the slope coefficient
– y: dependent variable. x: explanatory (or independent) variable. Both y and x should be quantitative.
– Assume the relationship $y_i = a + b x_i + e_i$.
– Slope coefficient b: an increase of x by 1 is associated with an increase of y by b.
– Relationship between slope and correlation: $b = r \, \frac{s_y}{s_x}$, where $s_y$ and $s_x$ are the sample standard deviations of y and x.
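A minimal Python sketch (standard library only) of the least-squares intercept a, slope b, and correlation r, checking the identity b = r·s_y/s_x numerically. The data are hypothetical toy numbers, not the 167-store burger sample, and the helper names `fit_line` and `correlation` are illustrative.

```python
import math
import statistics


def fit_line(x, y):
    """Least-squares intercept a and slope b for y_i = a + b*x_i + e_i."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def correlation(x, y):
    """Sample correlation coefficient r."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: x = fraction in poverty, y = burger price
x = [0.05, 0.10, 0.15, 0.20, 0.30]
y = [1.10, 1.05, 1.08, 0.98, 0.95]

a, b = fit_line(x, y)
r = correlation(x, y)
# statistics.stdev divides by n - 1; the ratio s_y / s_x is unaffected
b_from_r = r * statistics.stdev(y) / statistics.stdev(x)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}, r * s_y / s_x = {b_from_r:.3f}")
```

Whatever the data, b and r always share the same sign, since s_y/s_x is positive.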
[Figure: Scatter plot; predicted values in red]
2. Inference for the slope coefficient b
Slope: Parameter vs Statistic
– True relationship in the population: $y = \alpha + \beta x + \varepsilon$.
– The parameter β would be measured if we had data on the entire population.
– On the sample, we measure the statistic b.
– In general, b will not be equal to β.
– What is the sampling distribution of b? What is the standard error of the slope coefficient b?
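The sampling distribution of b can be explored by simulation. This sketch assumes a hypothetical population relationship y = 1 + 2x + ε with ε ~ N(0, 1) and x drawn uniformly on [0, 10] (values chosen only for illustration), draws 2000 samples of size N = 50, and records the slope b in each sample.

```python
import random
import statistics

ALPHA, BETA, SD_EPS = 1.0, 2.0, 1.0  # hypothetical population parameters
N, REPS = 50, 2000                   # sample size and number of samples


def ols_slope(x, y):
    """Least-squares slope b."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


random.seed(42)
slopes = []
for _ in range(REPS):
    # Draw a fresh sample from the hypothetical population
    x = [random.uniform(0, 10) for _ in range(N)]
    y = [ALPHA + BETA * xi + random.gauss(0, SD_EPS) for xi in x]
    slopes.append(ols_slope(x, y))

# Each b differs from beta, but the estimates are centered on beta
print(f"mean of b = {statistics.fmean(slopes):.4f}")
print(f"SD of b across samples = {statistics.stdev(slopes):.4f}")
```

The standard deviation of the simulated slopes is the quantity that the standard error se(b) estimates from a single sample.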
Confidence interval for the slope
– Build a confidence interval for β: $b \pm t \cdot se(b)$.
– The t score is provided by Table 5.1, with df = N – 2 degrees of freedom.

t statistic for the slope
– Built in the same way as for previous statistics: $t = \frac{b}{se(b)}$ to test $H_0: \beta = 0$.
– Sampling distribution of the t statistic under H0: t distribution with N – 2 degrees of freedom.
– Standard deviation of the t statistic: close to 1 when N is large.
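A minimal sketch of the two formulas, with hypothetical inputs b = −0.5 and se(b) = 0.2 (not the burger regression). The critical value 1.974 is an assumption: it is approximately the two-sided 95% t score for df = 165, i.e. N = 167; in class, read the t score from Table 5.1.

```python
def slope_ci(b, se_b, t_crit):
    """Confidence interval b ± t_crit * se(b)."""
    margin = t_crit * se_b
    return b - margin, b + margin


def slope_t_stat(b, se_b, beta_0=0.0):
    """t statistic for H0: beta = beta_0 (beta_0 = 0 by default)."""
    return (b - beta_0) / se_b


b, se_b = -0.5, 0.2   # hypothetical estimate and standard error
t_crit_95 = 1.974     # approx. 95% two-sided t score for df = 165 (assumption)

low, high = slope_ci(b, se_b, t_crit_95)
t = slope_t_stat(b, se_b)
print(f"95% CI for beta: ({low:.4f}, {high:.4f})")
print(f"t = {t:.2f}")
```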
Testing the null hypothesis
– H0: “β = 0”. Ha: “β is different from 0”.
– Two methods: the confidence interval method and the t statistic method.
– Reject the null hypothesis at 95%:
– if the 95% confidence interval does not include 0,
– or, equivalently, if the t statistic is higher in absolute value than the t score at 95%.
– The same test can be performed for α using the intercept a and its standard error.
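The two methods always reach the same decision, because 0 lies outside b ± t·se(b) exactly when |b/se(b)| > t. A small sketch with hypothetical (b, se(b)) pairs; the function name `reject_h0` is illustrative, and the critical value is the same assumed df = 165 t score as above. Passing a and se(a) instead tests α = 0.

```python
def reject_h0(b, se_b, t_crit):
    """Return (CI-method decision, t-method decision) for H0: beta = 0."""
    low, high = b - t_crit * se_b, b + t_crit * se_b
    ci_method = not (low <= 0 <= high)   # reject if 0 is outside the interval
    t_method = abs(b / se_b) > t_crit    # reject if |t| exceeds the t score
    return ci_method, t_method


t_crit_95 = 1.974  # approx. 95% two-sided t score for df = 165 (assumption)
cases = [(-0.5, 0.2), (0.3, 0.2), (1.2, 0.5), (-0.1, 0.4)]  # hypothetical (b, se)
decisions = [reject_h0(b, se_b, t_crit_95) for b, se_b in cases]

for (b, se_b), (ci_method, t_method) in zip(cases, decisions):
    print(f"b = {b:+.2f}, se = {se_b:.2f}: CI rejects {ci_method}, t rejects {t_method}")
```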
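Standard Error of Slope b
– Given that the true relationship is $y_i = \alpha + \beta x_i + \varepsilon_i$, the estimated slope becomes
$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \beta + \frac{\sum_i (x_i - \bar{x})\,\varepsilon_i}{\sum_i (x_i - \bar{x})^2}$
– Realize that the uncertainty in b comes from the error term $\varepsilon_i$.
– Standard error: $se(b) = \frac{\hat{\sigma}_\varepsilon}{\sqrt{\sum_i (x_i - \bar{x})^2}}$, where $\hat{\sigma}_\varepsilon = \sqrt{\frac{1}{N}\sum_i e_i^2}$ (efficient) or $\hat{\sigma}_\varepsilon = \sqrt{\frac{1}{N-2}\sum_i e_i^2}$ (unbiased), with residuals $e_i = y_i - a - b x_i$.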
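A sketch of se(b) computed from data, showing both estimators of the residual standard deviation; the data and the function name `slope_with_se` are hypothetical. Dividing by N − 2 instead of N makes the standard error larger by the factor √(N/(N − 2)).

```python
import math
import statistics


def slope_with_se(x, y):
    """Return the OLS slope b and two standard errors of b.

    The residual SD is estimated by dividing the sum of squared
    residuals either by N - 2 (unbiased) or by N.
    """
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # sum of e_i^2
    se_unbiased = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)
    se_n = math.sqrt(ssr / n) / math.sqrt(sxx)
    return b, se_unbiased, se_n


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_unbiased, se_n = slope_with_se(x, y)
print(f"b = {b:.3f}, se (N - 2) = {se_unbiased:.4f}, se (N) = {se_n:.4f}")
```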
Two Examples
In both graphs, y = x + e.
– First graph, SD(e) = 0.8: b is precisely estimated and has a low standard error.
– Second graph, SD(e) = 2: b is imprecisely estimated and has a higher standard error.
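The two graphs can be mimicked by simulation. This sketch assumes hypothetical x values and standard normal shocks z, with e = 0.8·z in one case and e = 2·z in the other, so SD(e) is the only difference. Because the same shocks are rescaled, se(b) grows by exactly the factor 2/0.8 = 2.5.

```python
import math
import random
import statistics


def slope_se(x, y):
    """OLS slope b and its standard error (residual SD estimated with N - 2)."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(ssr / (n - 2) / sxx)


random.seed(1)
N = 100  # hypothetical sample size
x = [random.gauss(0, 1) for _ in range(N)]
z = [random.gauss(0, 1) for _ in range(N)]  # same standard normal shocks in both cases

results = {}
for sd_e in (0.8, 2.0):
    y = [xi + sd_e * zi for xi, zi in zip(x, z)]  # y = x + e with SD(e) = sd_e
    results[sd_e] = slope_se(x, y)
    b, se = results[sd_e]
    print(f"SD(e) = {sd_e}: b = {b:.3f}, se(b) = {se:.3f}")
```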
Back to Burgers
– How should we read this regression output?
– Can we reject H0: “The fraction in poverty (prppov) has no impact on the burger price”?
3. Inference for the correlation coefficient r
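Inference for the Correlation Coefficient
– True correlation ρ (a parameter) vs measured correlation r.
– H0: “ρ = 0”. Ha: “ρ is different from 0”.
– Realize that $\beta = \rho \, \frac{\sigma_y}{\sigma_x}$, hence β = 0 if and only if ρ = 0.
– The test therefore uses the t statistic of the slope, which can be written as $t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$ (using the unbiased estimator of SD(ε)).
– Degrees of freedom: N – 2.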
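A sketch checking numerically that the correlation-based t statistic equals b/se(b) when SD(ε) is estimated with N − 2 in the denominator. The data and the function names `t_from_slope` and `t_from_r` are hypothetical.

```python
import math
import statistics


def t_from_slope(x, y):
    """t = b / se(b), with the residual SD estimated using N - 2."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    return b / se_b


def t_from_r(x, y):
    """t = r * sqrt(N - 2) / sqrt(1 - r^2)."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical toy data
x = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35]
y = [1.10, 1.05, 1.08, 0.98, 0.95, 1.01]
t_b, t_r = t_from_slope(x, y), t_from_r(x, y)
print(f"t from slope = {t_b:.4f}, t from r = {t_r:.4f}")  # equal up to rounding
```

The two t statistics coincide and are compared with the same t score (df = N − 2), so testing ρ = 0 and testing β = 0 give the same conclusion.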
Coming up
– Reading: chapter on “Comparing Two Groups”. Next, Chapter 9 with t tests for slope coefficients.
– Online quiz this weekend on this material.
– Session on Saturday at … in the same room (catch-up for National Day).
– Make sure you come to sessions and recitations.
For help:
– Amine Ouazad, Office 1135, Social Science building. Office hour: Tuesday from 5 to 6.30pm.
– GAF: Irene Paneda. Sunday recitations; at the Academic Resource Center, Monday from 2 to 4pm.