
Causation

Learning Objectives
By the end of this lecture, you should be able to:
– Describe causation and the ways in which it differs from correlation.
– Describe what is far and away the best method of establishing causation.
– Explain what a confounding variable is.

*** Correlation does not necessarily imply causation ***
A correlation does not mean that there is causation. As you know, correlation means that there is a relationship between two variables. Causation means that a change in the explanatory variable produces a change in the response variable.
– Example: If you give someone extra beers, it should cause a change in their BAC.
– Example: If you allow more powerboat licenses, it should cause a change in the number of manatee deaths.
Even if a correlation is very strong, this is not by itself good evidence that a change in x will cause a change in y.

Causation vs. Correlation
Causation means that whenever there is a change in an explanatory variable, it should cause a change in the response variable.
Correlation: Correlations between two variables are extremely common and easy to find. However, saying that two variables are correlated in NO way guarantees that there is causation.
Put another way: having correlation without causation means that changing the explanatory variable will NOT guarantee a change in the response variable.

In the real world…
Often (very, very often!) people report “associations” (i.e., correlations) between two variables. Yet upon further examination, it often turns out that there is no causation whatsoever! As humans, though, upon hearing about an “association” we often jump to an assumption of causation. Most of the time, the causation simply is NOT there.

Example
One study in Victorian England showed a strong correlation between people wearing top hats and their life expectancy. This relationship was shown to be very strong (high ‘r’). Does this mean that had Queen Victoria provided free top hats for all, the life expectancy in England would have shot up?
– There is a confirmed correlation. However, there is NO causation. That is, wearing top hats does not cause people to live longer. So, what’s going on here?
– Answer: There is a lurking variable! In this case, the lurking variable is income. People with higher incomes could afford doctors and medicines, which were in no way a given in Victorian England!
– So in this case, while there is correlation between top hats and life expectancy, there is no causation.
– However, there would be a causal relationship between income and life expectancy.

Reminder: Before embarking on a regression analysis…
Pop Quiz: After today, you should be able to answer the following question without looking at your notes: What are three key requirements that should be met before embarking on a regression analysis?
1. If you are doing a linear* regression analysis, the relationship must be, well, linear!
2. The correlation (‘r’) should not be very weak.
3. There must be causation.
Which of these would cause us to reject a regression analysis of the relationship between top hats and life expectancy? Answer: #3.
*There are versions of regression that can be done on non-linear relationships. However, we will not cover them in this course.

Example: Correlation vs. Causation
One study during the polio epidemic in the 1920s showed a strong correlation between ice cream consumption and cases of polio. As a result, the public was warned to avoid eating ice cream because it supposedly increased the risk of contracting the disease. Thoughts?
– Again, there was a strongly confirmed correlation. However, it turned out that there was NO causation. A properly controlled experiment could easily have shown that increased ice cream consumption did NOT increase the risk of polio.
– Again, there was a lurking variable hiding in the background. It turns out that the virus that causes polio (a virus of the Picornaviridae family, for anyone who cares) thrives in warmer weather. So the lurking variable here was temperature!

An example using R²
– Even when causation is present, does it give the whole picture?
– A mother’s weight and her daughter’s weight are clearly correlated. In addition, experimentation has shown that there is also causation. One study came up with r = 0.50, so R² = (0.50)² = 0.25.
– Why is R² so small?
– Answer: What’s missing is that weight gain is multifactorial. That is, it is caused by many things. While genetics clearly does play a huge role (i.e., people with a natural tendency to be overweight are more likely to have overweight kids), many other factors also contribute.
– These include: TV, dietary habits in the house, attitude towards exercise, etc. In other words, ‘mom’s weight’ is not a very useful explanatory variable in this case. It would be more helpful to try to analyze the genetic relationship separately from TV habits in the house, separately from exercise habits, etc.

How can we establish causation?
– So how CAN we establish whether causation is present?
– Answer: Only a well-designed experiment with proper control groups can prove causation.
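Why does a randomized experiment succeed where observation fails? Here is a minimal simulation, assuming an invented model in which an unmeasured variable z strongly affects the response and the true treatment effect is 2. When subjects choose their own treatment based on z, the difference in group means is badly inflated; when a coin flip assigns treatment, z is balanced across groups on average and the difference recovers the true effect. This is a bare two-group comparison, not a full experimental design.

```python
import random
import statistics


def estimated_effect(n: int, randomized: bool, true_effect: float = 2.0, seed: int = 4) -> float:
    """Mean response of treated minus mean response of control (invented model)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # unmeasured lurking variable
        if randomized:
            gets_treatment = rng.random() < 0.5  # coin flip, ignores z
        else:
            gets_treatment = z > 0               # subjects self-select on z
        y = 3 * z + true_effect * gets_treatment + rng.gauss(0, 1)
        (treated if gets_treatment else control).append(y)
    return statistics.fmean(treated) - statistics.fmean(control)


randomized = estimated_effect(20_000, randomized=True)
self_selected = estimated_effect(20_000, randomized=False)
print(f"randomized estimate:    {randomized:.2f}")
print(f"self-selected estimate: {self_selected:.2f}")
```

Under this model the self-selected comparison should land around 6.8 rather than 2, because the treated subjects already had higher z before any treatment was given.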

Let’s Play: Correlation, Causation or Both?
Can we simply sit back and allow time to magically improve child mortality rates? No. Strong correlation, but no causation. In this case, the state of medical research is the lurking variable.
Are kids with small feet doomed to be bad readers? No. Strong correlation, but no causation. In this case, the age of the child is the lurking variable.

Did you know??? Rooster crowing is perfectly correlated (r=1.0) with the sun rising?

Correlation is EXTREMELY common!
We are constantly bombarded with relationships between variables. However, even when you DO find correlations (and the 6:00 news loves to report them), there is very often no good evidence of causation. Causation… not so much!

Is there causation?
1. Student’s SAT score with subsequent college GPA – There is certainly a correlation, since good students will probably do well on the SAT and then again in college. However, if you sent everyone to a 4-week intensive SAT prep course, you would probably see improvement in scores on that exam, but the better SAT scores would not cause an improvement later in college.
2. Being married with being happy – People who are happier are statistically more likely to get married than people who are not.
3. Being deeply religious with life expectancy – People who are religious are less likely to be the kind of people who smoke, use drugs, etc.

Confounding variables
Two variables are confounded when their effects on a response variable cannot be distinguished from each other.
Example: Heavy drinking is strongly correlated with, and a cause of, decreased lifespan. Heavy drinkers are also statistically more likely to be smokers, less likely to adhere to a good diet, and more likely to have some form of depressive disorder. If you were trying to determine to what degree alcohol use decreased lifespan, it would be hard to do without “controlling” for these confounding variables.
– ‘Controlling’ is an important term in study design.

Figure 2.28, Introduction to the Practice of Statistics, Sixth Edition © 2009 W.H. Freeman and Company: Some possible explanations for an observed association. The dashed lines show an association. The solid arrows show a cause-and-effect link. x is explanatory, y is response, and z is a lurking variable.
I will not ask you to distinguish between common-response / lurking / confounding variables.

Establishing causation
It appears that lung cancer is associated with smoking. How do we know that both of these variables are not being affected by an unobserved third (lurking) variable? For instance, what if there is a genetic predisposition that causes people both to get lung cancer and to become addicted to smoking, but the smoking itself doesn’t CAUSE lung cancer?
We can evaluate the association using the following criteria:
1) The association is strong.
2) The association is consistent.
3) Higher doses are associated with stronger responses.
4) The alleged cause precedes the effect.
5) The alleged cause is plausible.
Ultimately, however, THERE IS NO SUBSTITUTE FOR AN EXPERIMENT!!!
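Of the criteria listed for evaluating the smoking and lung cancer association, "higher doses are associated with stronger responses" is the easiest to check mechanically. Below is a hypothetical helper (not from the lecture) that reports whether the average response rises with every increase in dose; the dose levels and rates in the usage lines are invented, not real lung-cancer data.

```python
def shows_dose_response(doses, responses):
    """Return True if the response goes up every time the dose goes up.

    Checks only the dose-response criterion; it says nothing about lurking variables.
    """
    pairs = sorted(zip(doses, responses))
    return all(r_low < r_high
               for (d_low, r_low), (d_high, r_high) in zip(pairs, pairs[1:])
               if d_high > d_low)


# Hypothetical dose groups (cigarettes per day) and invented response rates.
doses = [0, 5, 10, 20, 40]
rates = [1, 3, 6, 11, 20]
print(shows_dose_response(doses, rates))  # True
```

A True result supports the association but cannot prove causation: a lurking variable that rises along with dose would produce the same pattern.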