Correlation: Relationships Can Be Deceiving

Presentation transcript:

Correlation: Relationships Can Be Deceiving

The Impact Outliers Have on Correlation
An outlier that is consistent with the trend of the rest of the data will inflate the correlation. An outlier that is not consistent with the rest of the data can substantially decrease the correlation.
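
As a rough illustration of both effects, the sketch below computes the Pearson correlation for a small data set and then adds a single extreme point, once along the trend and once against it. The data values and the helper name `pearson_r` are made up for this example and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a positive but imperfect trend (not from the slides).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r_base = pearson_r(x, y)

# One extreme point that follows the trend: the correlation goes up.
r_consistent = pearson_r(x + [20], y + [20])

# One extreme point that contradicts the trend: the correlation drops.
r_inconsistent = pearson_r(x + [20], y + [0])

print(f"no outlier:           r = {r_base:.3f}")
print(f"consistent outlier:   r = {r_consistent:.3f}")
print(f"inconsistent outlier: r = {r_inconsistent:.3f}")
```

With only six points, one added observation is enough to move the correlation a long way in either direction, which is why small samples deserve extra caution.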

Ages of Husbands and Wives
Example of an outlier that should be removed: a subset of data on ages of husbands and wives, with one outlier added (82 entered instead of 28 for a husband’s age). The correlation for the data with the outlier is 0.39. With the outlier removed, the correlation for the remaining points is 0.964, a very strong linear relationship.

Legitimate Outliers, Illegitimate Correlation
Be careful when presented with data in which outliers are likely to occur, and when correlations are presented for a small sample. Outliers can be legitimate data and in some cases should not be removed without convincing justification. See Homework for an example.

The Missing Link: A Third Variable
Simpson’s Paradox: the data contain two or more groups. Within each group the two variables may be strongly correlated, yet when the groups are combined into one there is very little correlation between them. On further review, such groups should not have been combined.

The Fewer The Pages, The More Valuable The Book?
Pages versus Price for the Books on a Professor’s Shelf: an example of data that should be split into two groups. The correlation is r = -0.312. Do more pages mean less cost?

The Fewer The Pages, The More Valuable The Book?
Pages versus Price for the Books on a Professor’s Shelf: an example of data that should be split into two groups. The overall correlation is -0.312, which seems to say that more pages mean less cost. The scatterplot marks book type: H = hardcover, S = softcover. Correlation for H books: 0.64. Correlation for S books: 0.35. Combining the two types masked the positive correlation within each type and produced an illogical negative association.
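
The masking effect can be reproduced with a few made-up books. In the sketch below, the page counts and prices are hypothetical (the slides do not list the professor's data), chosen so that hardcovers are shorter and pricier while softcovers are longer and cheaper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical books (not the data behind the slides).
hard_pages = [100, 150, 200, 250]
hard_price = [40, 45, 48, 55]
soft_pages = [300, 350, 400, 450]
soft_price = [10, 12, 13, 17]

# Within each type, more pages go with a higher price.
r_hard = pearson_r(hard_pages, hard_price)
r_soft = pearson_r(soft_pages, soft_price)

# Pooling the two types hides that and flips the sign.
r_all = pearson_r(hard_pages + soft_pages, hard_price + soft_price)

print(f"hardcover: r = {r_hard:.2f}")
print(f"softcover: r = {r_soft:.2f}")
print(f"combined:  r = {r_all:.2f}")
```

The sign flips because book type is related to both variables: in this toy data, as in the slide, the split by type is what makes the positive relationship visible.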

Legitimate Correlation Does Not Imply Causation
An example of a silly correlation: weekly tissue sales and weekly hot chocolate sales for a city with extreme seasons would probably be correlated, because both tend to go up in the winter and down in the summer. Even if two variables are legitimately related or correlated, do not fall into the trap of believing there is a causal connection. This happens often with observational studies. In the absence of any other evidence, data from observational studies simply cannot be used to establish causation.

Prostate Cancer and Red Meat
Example showing that legitimate correlation does not imply causation. Study details: the study followed 48,000 men who filled out dietary questionnaires in […]. By 1990, 300 men had been diagnosed with prostate cancer and 126 had advanced cases. For advanced cases: “men who ate the most red meat had a 164% higher risk than those with the lowest intake.” A possible third variable that both leads men to consume more red meat and increases the risk of prostate cancer is the hormone testosterone.

Reasons Two Variables Could Be Related:
1. The explanatory variable is the direct cause of the response variable. E.g., amount of food consumed in the past hour and level of hunger.
2. The response variable is causing a change in the explanatory variable. E.g., explanatory = advertising expenditures and response = occupancy rates for hotels.
3. The explanatory variable is a contributing but not sole cause of the response variable. E.g., a carcinogen in the diet is not the sole cause of cancer, but rather a necessary contributor to it.

Reasons Two Variables Could Be Related:
4. Confounding variables may exist. A confounding variable is related to the explanatory variable and affects the response variable, so we can’t determine how much of the change is due to the explanatory variable and how much is due to the confounding variable(s). E.g., emotional support is a confounding variable for the relationship between happiness and length of life in […].
5. Both variables may result from a common cause. E.g., verbal SAT and GPA: the causes (such as time dedicated to studying) responsible for one variable being high (or low) are the same as those responsible for the other being high (or low).
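
A small simulation can make the common-cause case concrete. The sketch below assumes a toy model that is not in the slides: a single simulated "study time" score drives both a simulated SAT score and a simulated GPA, and neither outcome affects the other. All names, scales, and the random seed are illustrative.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# Assumed toy model: study time is the only shared influence.
study_time = [random.gauss(0, 2) for _ in range(n)]
sat = [z + random.gauss(0, 1) for z in study_time]
gpa = [z + random.gauss(0, 1) for z in study_time]

# The two outcomes are strongly correlated even though neither causes the other.
r_raw = pearson_r(sat, gpa)

# Subtract the common cause (known exactly here because we simulated it);
# what remains is independent noise, so the correlation vanishes.
sat_rest = [s - z for s, z in zip(sat, study_time)]
gpa_rest = [g - z for g, z in zip(gpa, study_time)]
r_rest = pearson_r(sat_rest, gpa_rest)

print(f"SAT vs GPA:                     r = {r_raw:.2f}")
print(f"after removing the common cause: r = {r_rest:.2f}")
```

In real data the common cause is rarely measured this cleanly, which is part of why observational correlations are hard to interpret.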

Reasons Two Variables Could Be Related:
6. Both variables are changing over time. Nonsensical associations result from correlating two variables that have both changed over time.
Example: Divorce Rates and Drug Offenses. The correlation between year and divorce rate is […]. The correlation between year and % admitted for drug offenses is […]. The correlation between divorce rate and % admitted for drug offenses is 0.67, quite strong. However, both simply reflect a trend across time.
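
The sketch below shows how two unrelated series can correlate strongly just because both rise over time. The series are invented for illustration, and the check used here, correlating year-to-year changes instead of levels, is one common way to look past a shared trend; it is not a method taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

years = list(range(10))

# Hypothetical series: each is a rising trend plus its own unrelated wiggle.
wiggle_a = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
wiggle_b = [1, 1, -1, -1, 1, 1, -1, -1, 1, 1]
series_a = [10 + 2 * t + w for t, w in zip(years, wiggle_a)]
series_b = [5 + 3 * t + w for t, w in zip(years, wiggle_b)]

# Levels: dominated by the shared upward trend.
r_levels = pearson_r(series_a, series_b)

# Year-to-year changes: the trend becomes a constant and drops out.
change_a = [b - a for a, b in zip(series_a, series_a[1:])]
change_b = [b - a for a, b in zip(series_b, series_b[1:])]
r_changes = pearson_r(change_a, change_b)

print(f"levels:  r = {r_levels:.2f}")
print(f"changes: r = {r_changes:.2f}")
```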

Reasons Two Variables Could Be Related:
7. The association may be nothing more than coincidence, even though the odds of it happening appear to be very small. Example: a new office building opened, and within a year there was an unusually high rate of brain cancer among workers in the building. Suppose the odds of having that many cases in one building were only 1 in 10,000. But there are thousands of new office buildings, so we should expect to see this phenomenon just by chance in about 1 of every 10,000 buildings.
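
The arithmetic behind this argument can be sketched as follows. The 1-in-10,000 figure is the one supposed in the example; the count of 5,000 buildings is an assumed stand-in for "thousands", and the calculation treats buildings as independent.

```python
# Chance that one given building shows such a cluster (from the example).
p_cluster = 1 / 10_000

# Assumed number of new office buildings; the slide only says "thousands".
n_buildings = 5_000

# Expected number of buildings showing a cluster purely by chance.
expected_clusters = n_buildings * p_cluster

# Probability that at least one building shows a cluster,
# assuming buildings are independent of one another.
p_at_least_one = 1 - (1 - p_cluster) ** n_buildings

print(f"expected chance clusters: {expected_clusters:.2f}")
print(f"P(at least one cluster):  {p_at_least_one:.3f}")
```

Under these assumptions an event that looks nearly impossible for any single building becomes fairly likely to show up somewhere.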

Confirming Causation
Evidence of a possible causal connection: there is a reasonable explanation of cause and effect; the connection happens under varying conditions; potential confounding variables are ruled out. The only legitimate way to try to establish a causal connection statistically is through the use of randomized experiments. If a randomized experiment cannot be done, then nonstatistical considerations must be used to determine whether a causal link is reasonable.
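
To see why randomization helps, the sketch below simulates a treatment with no effect at all on an outcome that depends on a hidden confounder. The model, the variable names, and the seed are assumptions made for illustration. When subjects choose the treatment based on the confounder, the groups differ sharply; when a coin flip assigns it, the difference nearly vanishes.

```python
import random

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def treated_minus_control(outcomes, treated):
    """Difference in mean outcome between treated and untreated subjects."""
    t = [y for y, flag in zip(outcomes, treated) if flag]
    c = [y for y, flag in zip(outcomes, treated) if not flag]
    return mean(t) - mean(c)

random.seed(1)  # fixed seed so the run is repeatable
n = 4000

# Hidden confounder that drives the outcome; the treatment itself does nothing.
confounder = [random.gauss(0, 1) for _ in range(n)]
outcome = [z + random.gauss(0, 1) for z in confounder]

# Observational setting: subjects with a high confounder value opt in.
self_selected = [z > 0 for z in confounder]

# Randomized experiment: a coin flip decides, independent of the confounder.
randomized = [random.random() < 0.5 for _ in range(n)]

diff_observational = treated_minus_control(outcome, self_selected)
diff_randomized = treated_minus_control(outcome, randomized)

print(f"observational difference: {diff_observational:.2f}")
print(f"randomized difference:    {diff_randomized:.2f}")
```

Random assignment balances the confounder across the two groups on average, so a remaining difference can more reasonably be attributed to the treatment.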