Data Screening: Don’t Leave Home Without It
AEA Coffee Break Webinar, 8 September 2011
Dale Berger, dale.berger@cgu.edu, http://wise.cgu.edu
Why screen data?
- Locate and correct errors
- Assure that assumptions of statistical procedures are met
- Become familiar with the data
- Context is critical
Bivariate data example
The data constructed by Anscombe consist of four sets of 11 pairs of X-Y scores. Can these four sets of data be pooled? Bumble applied regression to each set separately and recorded summary statistics.
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17-21.
These summary statistics are identical in all four sets:
  Sample size (N)    11
  Mean of X          9.0
  Mean of Y          7.5
  Correlation        0.816
  Linear equation    y′ = 3 + .5x
  Regression SS      27.50
  Residual SS        13.75 (df = 9)
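The summary table can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not part of the original slides: the data values are transcribed from the cited Anscombe (1973) paper, and the names `ANSCOMBE` and `summarize` are illustrative.

```python
from math import sqrt

# Anscombe's four data sets, transcribed from Anscombe (1973).
# Sets 1-3 share the same X values; set 4 has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the simple-regression summary statistics for paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)   # sum of squares of X
    syy = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares of Y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_regression = slope * sxy                 # SS explained by the line
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "r": sxy / sqrt(sxx * syy),
        "intercept": intercept,
        "slope": slope,
        "ss_regression": ss_regression,
        "ss_residual": syy - ss_regression,
    }


for label, (x, y) in ANSCOMBE.items():
    stats = summarize(x, y)
    print(label, {name: round(value, 3) for name, value in stats.items()})
```

The four printed rows agree to about two decimal places, which is exactly why the summary statistics alone cannot distinguish the sets.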
Bumble’s Conclusions:
- There is a strong linear relationship between X and Y, as is apparent from r = .816, F(1, 9) = 18.00, p = .00217.
- All four data sets are equivalent and probably were sampled from the same population.
What more would you like to know?
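The F ratio and the correlation that Bumble reports follow directly from the two sums of squares on the summary slide. A short sketch of that arithmetic (the variable names are illustrative; the p value is not recomputed here because it needs an F distribution function that the standard library does not provide):

```python
from math import sqrt

# Values taken from the summary-statistics slide.
ss_regression = 27.50
ss_residual = 13.75
df_regression = 1      # one predictor
df_residual = 9        # N - 2 = 11 - 2

# F is the ratio of the two mean squares.
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)

# r squared is the proportion of total SS explained by the regression.
r_squared = ss_regression / (ss_regression + ss_residual)
r = sqrt(r_squared)    # positive root, because the slope is positive

print(f"F({df_regression}, {df_residual}) = {f_ratio:.2f}")
print(f"r = {r:.3f}")
```

The result matches the tabled correlation of 0.816 and F(1, 9) = 18.00.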
Anscombe’s four data sets
[Figures: Data Set 1, Data Set 2, Data Set 3, Data Set 4]
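The figures themselves are not reproduced in this text, but even a crude text plot makes the differences among the four sets visible. The sketch below is an illustration only, not the plots from the webinar: the data are transcribed from the cited Anscombe (1973) paper, and `ascii_scatter` is a hypothetical helper written for this example.

```python
# Anscombe's four data sets, transcribed from Anscombe (1973).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(x, y, width=40, height=12):
    """Return a crude text scatterplot of the (x, y) pairs as one string."""
    x_min, y_min = min(x), min(y)
    x_span = (max(x) - x_min) or 1   # avoid dividing by zero if X is constant
    y_span = (max(y) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / x_span * (width - 1))
        row = round((yi - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"   # grid row 0 is the top of the plot
    return "\n".join("|" + "".join(cells) for cells in grid)


for label, (x, y) in ANSCOMBE.items():
    print(f"Data Set {label}")
    print(ascii_scatter(x, y))
    print()
```

Each plot is scaled to its own minimum and maximum, so the shapes are comparable but the axes are not. A graphics library would give a better picture; the point is only that any look at the raw data is more informative than the summary table.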
You can see a lot by just looking. (Yogi Berra)
If you don’t look, you won’t see it. (DB)
The Most Important Test in Statistics: The Intra-Ocular Trauma Test
To err is human; to really screw up, it takes a computer!
Thank you! Questions and comments? dale.berger@cgu.edu http://wise.cgu.edu