Statistics for Social and Behavioral Sciences Part IV: Causality Association and Causality Session 22 Prof. Amine Ouazad.



Statistics Course Outline
Part I. Introduction and Research Design (Week 1). Four Steps of “Thinking Like a Statistician”. Study Design: Simple Random Sampling, Cluster Sampling, Stratified Sampling. Biases: Nonresponse bias, Response bias, Sampling bias.
Part II. Describing Data (Weeks 2-4). Sample statistics: Mean, Median, SD, Variance, Percentiles, IQR, Empirical Rule. Bivariate sample statistics: Correlation, Slope.
Part III. Drawing Conclusions from Data: Inferential Statistics (Weeks 5-9). Estimating a parameter using sample statistics. Confidence Interval at 90%, 95%, 99%. Testing a hypothesis using the CI method and the t method.
Part IV. Correlation and Causation: Two Groups, Regression Analysis. Multivariate regression now!

Coming up
“Comparison of Two Groups”. Last week.
“Univariate Regression Analysis”. Last Saturday. (Section 9.5)
“Association and Causality: Multivariate Regression”. Today, Monday, Tuesday. Chapters 10 and 11.
“Randomized Experiments and ANOVA”. Wednesday. Chapter 12.
“Robustness Checks and Wrap Up”. Last Thursday.

Outline
1. Correlation and Causation
2. Multiple Causes: Partly Spurious Association, Spurious Association, Chain Relationship
3. Interaction
Next time: Multivariate regression

What causes crime?
National Neighborhood Crime Study (2002), Peterson, Ruth D., and Krivo, Lauren J., Ohio State University. N = 6,935 neighborhoods.
Crime data from local police departments and the Federal Bureau of Investigation.
Total crime rate per 1,000 residents. Number of police officers. Ethnicity of police officers. Demographics of the neighborhood: poverty, unemployment rate, education.

Regression of Crime Rate on the Unemployment Rate
y: total crime per 1,000 residents.
x: unemployment rate, from 0 to 100.

Causation Matters
If the true relationship between X and Y is described by X -> Y, then changing (manipulating) X will affect Y.
Examples:
– If poverty -> crime, then addressing poverty (e.g. war on poverty, food stamps, welfare programs) will lower crime.
– If CO2 emissions -> global average temperature, then reducing CO2 emissions (e.g. through policies such as the Kyoto Protocol) will lower global temperature.
– If shoe size -> literacy, then changing shoe size will affect literacy! Nonsense.
– If Hepatitis B vaccination -> autism, then reducing vaccination rates will lower the incidence of autism.

True Model vs Statistical Model
X -> Y is your statistical model. But the true model may be different:
1. Order is wrong. Y causes X instead of X causing Y.
2. Multiple causes. X may not be the most practically significant determinant of Y.
3. Spurious association. X may not cause Y at all.
4. Chain relationship. The impact of X on Y may be mediated by another variable X2.
5. Interaction. The impact of X on Y may depend on the value of another variable X2.

Order is wrong?
True model: Y -> X. Statistical model: X -> Y.
Regression suggests that more police officers per 10,000 residents lead to a higher crime rate per capita!? Beware of software and formulas. Use them wisely.
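The police and crime puzzle can be illustrated with simulated data. This is only a sketch under stated assumptions: the numbers are made up (they are not from the National Neighborhood Crime Study), the helper `slope` is a hypothetical name, and Python's standard library stands in for the course software. In the simulation crime drives police staffing, yet regressing crime on police still yields a positive slope.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed (made-up) true model: crime causes police staffing, not the reverse.
n = 5000
crime = [rng.gauss(50, 10) for _ in range(n)]             # crimes per 1,000 residents
police = [5 + 0.2 * c + rng.gauss(0, 1) for c in crime]   # officers per 10,000 residents


def slope(x, y):
    """Least-squares slope b in the statistical model y = a + b*x + e."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Statistical model with the order reversed: regress crime (y) on police (x).
b = slope(police, crime)
print(f"slope of crime on police: {b:.2f}")  # positive, although police do not cause crime here
```

The regression formula has no notion of direction: it returns a slope for whichever variable is placed on the left-hand side.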

Outline
1. Correlation and Causation
2. Multiple Causes: Partly Spurious Association, Spurious Association, Chain Relationship
3. Interaction
Next time: Multivariate regression

Multiple Causes
Acknowledge that crime (Y) may be caused by a series of factors.
True model: X1, X2, X3, …, XK -> Y.

Multiple Causes
Acknowledge that the variable X1 that you were focused on may not be the most practically significant variable that determines Y.
Crime: finding the most important determinants of crime.
– Education? Poverty? Unemployment? Female-headed households? Ethnicity of police officers? Number of police officers per 10,000 residents? Incarceration rate?

From Univariate to Multivariate
Univariate regression:
True model: $y = \alpha + \beta x_1 + \varepsilon$
Statistical model: $y = a + b x_1 + e$, with $E(y \mid x_1) = a + b x_1$ and $SD(y \mid x_1) = SD(e)$.
Multivariate regression:
True model: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
Statistical model: $y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + e$, with $E(y \mid x_1, x_2, x_3) = a + b_1 x_1 + b_2 x_2 + b_3 x_3$ and $SD(y \mid x_1, x_2, x_3) = SD(e)$.
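To make the step from the true model to the statistical model concrete, here is a minimal sketch that simulates data from an assumed true model and then estimates a, b1, b2, b3 by least squares. The coefficient values are invented for illustration, `ols` is a hypothetical helper (not a library function), and the standard library is used so the example runs anywhere; in the course the same fit would be done in Stata.

```python
import random


def ols(columns, y):
    """Return [a, b1, ..., bk] for the least-squares fit y = a + b1*x1 + ... + bk*xk + e."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]  # 1.0 = intercept
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(1)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]

# Assumed true model (made-up values): intercept 2, slopes 1.5, -0.5, 0.8, error SD 1.
y = [2 + 1.5 * u - 0.5 * v + 0.8 * w + rng.gauss(0, 1) for u, v, w in zip(x1, x2, x3)]

# Statistical model: the estimates should land close to the true coefficients.
a, b1, b2, b3 = ols([x1, x2, x3], y)
print(f"a={a:.2f}  b1={b1:.2f}  b2={b2:.2f}  b3={b3:.2f}")
```

Because the statistical model here has the same form as the true model, the estimates differ from the true coefficients only by sampling error.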

Including X2 may affect the coefficient b1 of X1
Race has a negative statistically significant impact on the crime rate.
Accounting for multiple variables avoids simplistic statements!

Partly Spurious Association between X1 and Y
The statistical model does not include X2. When including X2 in the regression, the effect of X1 is lower in magnitude. X2 has both a direct and an indirect effect on Y.
True model: X2 -> X1, X2 -> Y, X1 -> Y. Statistical model: X1 -> Y.

Spurious Association
A statistically significant slope coefficient b does not mean that X1 causes Y. Another factor X2 may be causing both X1 and Y. When including X2 in the regression, the effect of X1 is no longer statistically significant.
True model: X2 -> X1, X2 -> Y. Statistical model: X1 -> Y.

Shoe size and Literacy
Sample of N children from age 5 to age 16. Literacy measured in the Early Childhood Longitudinal Study. Including age in the regression will likely render the coefficient of shoe size statistically nonsignificant.
True model: Age -> Shoe size, Age -> Literacy. Statistical model: Shoe size -> Literacy.
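The shoe size example can be reproduced with simulated children. This is a sketch under invented assumptions: the data are not from the Early Childhood Longitudinal Study, the growth and literacy coefficients are made up, and `ols` is a hypothetical helper written with the standard library only. Age drives both shoe size and literacy, and shoe size has no effect of its own.

```python
import random


def ols(columns, y):
    """Return [a, b1, ..., bk] for the least-squares fit y = a + b1*x1 + ... + bk*xk + e."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]  # 1.0 = intercept
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(2)
n = 5000
age = [rng.uniform(5, 16) for _ in range(n)]              # children aged 5 to 16
shoe = [0.9 * t + rng.gauss(0, 1) for t in age]           # age -> shoe size (made-up scale)
literacy = [6.0 * t + rng.gauss(0, 5) for t in age]       # age -> literacy; no shoe-size term

# Statistical model without age: shoe size looks like a strong predictor.
_, b_alone = ols([shoe], literacy)

# Statistical model that matches the true model: include age as well.
_, b_shoe, b_age = ols([shoe, age], literacy)

print(f"shoe size alone:      b = {b_alone:.2f}")
print(f"shoe size, given age: b = {b_shoe:.2f}  (age: b = {b_age:.2f})")
```

Once age is in the regression, the shoe size coefficient falls to roughly zero while the age coefficient stays near its simulated value.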

Correct approach
Make the true model and the statistical model coincide. Regress Y on both X1 and X2. Include all determinants of crime in the regression.
Both the true model and the statistical model now include X1, X2, and Y.

What makes a good school?
Researchers had found that school funding is positively correlated with student test scores (statistically significant and positive r and b). But when measures of teacher quality are included, the amount of money a school spends has no statistically significant impact on student test scores.
True model: Teacher quality, Funding, Student test score. Statistical model: Funding -> Student test score.

Chain Relationship
X1 causes Y, but the effect of X1 on Y is entirely due to its effect on X2. When not including X2 in the regression, the coefficient of X1 is statistically significant. When including X2 in the regression, the coefficient of X1 is not statistically significant.
True model: X1 -> X2 -> Y. Statistical model: X1 -> Y.
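A chain relationship can be simulated in the same spirit. The sketch below assumes made-up coefficients and uses a hypothetical `ols` helper built on the standard library; it is an illustration, not the course's Stata workflow. X1 affects Y only through X2.

```python
import random


def ols(columns, y):
    """Return [a, b1, ..., bk] for the least-squares fit y = a + b1*x1 + ... + bk*xk + e."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]  # 1.0 = intercept
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(3)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [1.5 * u + rng.gauss(0, 1) for u in x1]   # X1 -> X2 (made-up coefficient)
y = [2.0 * v + rng.gauss(0, 1) for v in x2]    # X2 -> Y; X1 has no direct term

# Without X2, the coefficient of X1 picks up the whole chain (about 1.5 * 2.0).
_, b_short = ols([x1], y)

# With X2 included, the coefficient of X1 drops to roughly zero.
_, b1, b2 = ols([x1, x2], y)

print(f"y on x1 alone:   b1 = {b_short:.2f}")
print(f"y on x1 and x2:  b1 = {b1:.2f}, b2 = {b2:.2f}")
```

The coefficient of X1 vanishes once X2 is included, the same pattern a spurious association produces, so the regression output alone cannot tell the two apart; that takes reasoning about which variable comes first.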

Outline
1. Correlation and Causation
2. Multiple Causes: Partly Spurious Association, Spurious Association, Chain Relationship
3. Interaction
Next time: Multivariate regression

Interaction
X2 affects how X1 causes Y. For instance, unemployment causes crime, but the impact is much lower in neighborhoods that have a higher income. When not accounting for X2, the coefficient of X1 measures the average impact of X1 on Y.
True model: X1 -> Y, with the strength of the effect depending on X2. Statistical model: X1 -> Y.

Accounting for the Interaction of X1 and X2
Include both X2 and the product of X1 and X2 in the regression.
Model: $y = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + e$
The impact of x1 on y is then b1 + b3 x2, so it changes with x2.
If b3 is positive, the impact of x1 on y is larger the higher the value of x2.
If b3 is negative, the impact of x1 on y is smaller the higher the value of x2.
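The interaction model can be tried on simulated neighborhoods. The sketch below is an illustration under invented assumptions: the variable names `unemp` and `high_inc`, the coefficients, and the helper `ols` are all hypothetical, and none of the numbers come from the crime study. The product term is added as a third regressor, and the impact of x1 is then read off as b1 + b3 * x2.

```python
import random


def ols(columns, y):
    """Return [a, b1, ..., bk] for the least-squares fit y = a + b1*x1 + ... + bk*xk + e."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]  # 1.0 = intercept
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(4)
n = 5000
unemp = [rng.uniform(0, 20) for _ in range(n)]      # x1: unemployment rate (made-up range)
high_inc = [rng.uniform(0, 10) for _ in range(n)]   # x2: share of high-income residents

# Assumed true model with a negative interaction (b3 = -0.15).
crime = [10 + 2.0 * u - 0.5 * h - 0.15 * u * h + rng.gauss(0, 1)
         for u, h in zip(unemp, high_inc)]

# Include x1, x2, and their product x1*x2 as regressors.
product = [u * h for u, h in zip(unemp, high_inc)]
a, b1, b2, b3 = ols([unemp, high_inc, product], crime)


def impact_of_x1(x2):
    """Estimated change in y per unit of x1, at a given value of x2."""
    return b1 + b3 * x2


print(f"b3 = {b3:.3f}")
print(f"impact of unemployment at x2 = 0:  {impact_of_x1(0):.2f}")
print(f"impact of unemployment at x2 = 10: {impact_of_x1(10):.2f}")
```

With a negative b3, the estimated impact of unemployment shrinks as the high-income share rises, which is the pattern described for the neighborhood data.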

Accounting for the Interaction of unemployment and income
T_HINC75: percentage in neighborhood with high income.
Here, b3 is negative!

Wrap up
Know the difference between the true model and the statistical model.
Learn how to perform a multivariate regression in Stata.
Order X and Y correctly.
Account for multiple causes.
Account for spurious correlations.
Account for chain relationships.
Account for interactions.

Coming up
Schedule for next week: Chapter on “Association and Causality”, and “Multivariate Regression”.
Last online quiz sent last night, due Sunday 9am.
Make sure you come to sessions and recitations.
Days: Sunday, Monday, Tuesday, Wednesday, Thursday.
Sessions, in the order listed: Recitation; Evening session, 7.30pm, West Administration 002; Usual class, 12.45pm, usual room; Evening session, 7.30pm, West Administration 001; Usual class, 12.45pm, usual room.