Anscombe’s Quartet.

Slides:



Advertisements
Similar presentations
1 Health Warning! All may not be what it seems! These examples demonstrate both the importance of graphing data before analysing it and the effect of outliers.
Advertisements

11 Simple Linear Regression and Correlation CHAPTER OUTLINE
Learning Objectives Copyright © 2004 John Wiley & Sons, Inc. Bivariate Correlation and Regression CHAPTER Thirteen.
Probabilistic & Statistical Techniques Eng. Tamer Eshtawi First Semester Eng. Tamer Eshtawi First Semester
Correlation and Linear Regression
CHAPTER 6 Introduction to Graphing and Statistics Slide 2Copyright 2012, 2008, 2004, 2000 Pearson Education, Inc. 6.1Tables and Pictographs 6.2Bar Graphs.
Multivariate Data Analysis Chapter 4 – Multiple Regression.
Simple Linear Regression Analysis
Quantitative Business Analysis for Decision Making Simple Linear Regression.
Business Statistics - QBM117 Statistical inference for regression.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS.
Correlation & Regression
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 12 Analyzing the Association Between Quantitative Variables: Regression Analysis Section.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-3 Regression.
Relationship of two variables
Slide Copyright © 2008 Pearson Education, Inc. Chapter 4 Descriptive Methods in Regression and Correlation.
Correlation.
STAT 203 Elementary Statistical Methods. Review of Basic Concepts Population and Samples Variables and Data Data Representation (Frequency Distn Tables,
Fundamental Statistics in Applied Linguistics Research Spring 2010 Weekend MA Program on Applied English Dr. Da-Fu Huang.
1 1 Slide © 2005 Thomson/South-Western Slides Prepared by JOHN S. LOUCKS St. Edward’s University Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
1 1 Slide © 2004 Thomson/South-Western Slides Prepared by JOHN S. LOUCKS St. Edward’s University Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha.
The Examination of Residuals. Examination of Residuals The fitting of models to data is done using an iterative approach. The first step is to fit a simple.
Correlation & Regression
Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of.
1 11 Simple Linear Regression and Correlation 11-1 Empirical Models 11-2 Simple Linear Regression 11-3 Properties of the Least Squares Estimators 11-4.
Chapter 8 Making Sense of Data in Six Sigma and Lean
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Worked Example Using R. > plot(y~x) >plot(epsilon1~x) This is a plot of residuals against the exploratory variable, x.
Chapter Thirteen Copyright © 2006 John Wiley & Sons, Inc. Bivariate Correlation and Regression.
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Model Building and Model Diagnostics Chapter 15.
Multivariate Data Analysis Chapter 2 – Examining Your Data
1 1 Slide © 2003 South-Western/Thomson Learning™ Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
© (2015, 2012, 2008) by Pearson Education, Inc. All Rights Reserved Chapter 11: Correlational Designs Educational Research: Planning, Conducting, and Evaluating.
Scatter Diagrams scatter plot scatter diagram A scatter plot is a graph that may be used to represent the relationship between two variables. Also referred.
We would expect the ENTER score to depend on the average number of hours of study per week. So we take the average hours of study as the independent.
1 Data Analysis Linear Regression Data Analysis Linear Regression Ernesto A. Diaz Department of Mathematics Redwood High School.
Chapter 13 Lesson 13.2a Simple Linear Regression and Correlation: Inferential Methods 13.2: Inferences About the Slope of the Population Regression Line.
Descriptive Statistics ( )
Thursday, May 12, 2016 Report at 11:30 to Prairieview
Lecture Slides Elementary Statistics Twelfth Edition
Data Visualization vs. Infographics
Warm Up Scatter Plot Activity.
MATH-138 Elementary Statistics
CORRELATION.
Statistical Quality Control, 7th Edition by Douglas C. Montgomery.
CHS 221 Biostatistics Dr. wajed Hatamleh
Visualization Analytics
Correlation – Regression
SIMPLE LINEAR REGRESSION MODEL
Elementary Statistics
CHAPTER 10 Correlation and Regression (Objectives)
…Don’t be afraid of others, because they are bigger than you
Theme 7 Correlation.
Lecture Slides Elementary Statistics Thirteenth Edition
Correlation and Regression
2. Find the equation of line of regression
Week 5 Lecture 2 Chapter 8. Regression Wisdom.
Two Way Frequency Tables
Lecture 14 Review of Lecture 13 What we’ll talk about today?
Make Your Data Tell a Story
The greatest blessing in life is
Good Morning AP Stat! Day #2
Simple Linear Regression
3 4 Chapter Describing the Relation between Two Variables
Chapter 14 Inference for Regression
MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019.
Review of Chapter 3 Examining Relationships
Scatterplots, Association, and Correlation
Presentation transcript:

Anscombe’s Quartet

Why Statistics Can Be Wrong Anscombe's quartet comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties.[1]

The Statistical Properties Property Value Mean of x in each case 9 (exact) Sample variance of x in each case 11 (exact) Mean of y in each case 7.50 (to 2 decimal places) Sample variance of y in each case 4.122 or 4.127 (to 3 decimal places) Correlation between x and y in each case 0.816 (to 3 decimal places) Linear regression line in each case y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively)

All four sets are identical when examined using simple summary statistics, but vary considerably when graphed

The Datasets I II III IV x y 10.0 8.04 9.14 7.46 8.0 6.58 6.95 8.14 6.77 5.76 13.0 7.58 8.74 12.74 7.71 9.0 8.81 8.77 7.11 8.84 11.0 8.33 9.26 7.81 8.47 14.0 9.96 8.10 7.04 6.0 7.24 6.13 6.08 5.25 4.0 4.26 3.10 5.39 19.0 12.50 12.0 10.84 9.13 8.15 5.56 7.0 4.82 7.26 6.42 7.91 5.0 5.68 4.74 5.73 6.89 A procedure to generate similar data sets with identical statistics and dissimilar graphics has since been developed.

The Representation The first scatter plot (top left) appears to be a simple linear relationship, corresponding to two variables correlated and following the assumption of normality. The second graph (top right) is not distributed normally; while an obvious relationship between the two variables can be observed, it is not linear, and the Pearson correlation coefficient is not relevant (a more general regression and the corresponding coefficient of determination would be more appropriate). In the third graph (bottom left), the distribution is linear, but with a different regression line, which is offset by the one outlier which exerts enough influence to alter the regression line and lower the correlation coefficient from 1 to 0.816 (a robust regression would have been called for). Finally, the fourth graph (bottom right) shows an example when one outlier is enough to produce a high correlation coefficient, even though the relationship between the two variables is not linear. The quartet is still often used to illustrate the importance of looking at a set of data graphically before starting to analyze according to a particular type of relationship, and the inadequacy of basic statistic properties for describing realistic datasets.[2][3][4][5][6]

References Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17– 21. JSTOR 2682899. Elert, Glenn. "Linear Regression". The Physics Hypertextbook. Janert, Philipp K. (2010). Data Analysis with Open Source Tools. O'Reilly Media, Inc. pp. 65–66. ISBN 0-596-80235-8. Chatterjee, Samprit; Hadi, Ali S. (2006). Regression analysis by example. John Wiley and Sons. p. 91. ISBN 0-471-74696-7. Saville, David J.; Wood, Graham R. (1991). Statistical methods: the geometric approach. Springer. p. 418. ISBN 0-387-97517-9. Tufte, Edward R. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, CT: Graphics Press. ISBN 0-9613921-4-2. Chatterjee, Sangit; Firat, Aykut (2007). "Generating Data with Identical Statistics but Dissimilar Graphics: A Follow up to the Anscombe Dataset". American Statistician 61 (3): 248–254.doi:10.1198/000313007X220057.