Statistically-Significant Correlations

Slides:



Advertisements
Similar presentations
Correlation and Regression
Advertisements

Chapter 12 Simple Linear Regression
© 2010 Pearson Prentice Hall. All rights reserved Least Squares Regression Models.
The Simple Regression Model
SIMPLE LINEAR REGRESSION
10-2 Correlation A correlation exists between two variables when the values of one are somehow associated with the values of the other in some way. A.
SIMPLE LINEAR REGRESSION
Topics: Significance Testing of Correlation Coefficients Inference about a population correlation coefficient: –Testing H 0 :  xy = 0 or some specific.
STATISTICS ELEMENTARY C.M. Pascual
SIMPLE LINEAR REGRESSION
Measures of Regression and Prediction Intervals
Relationship of two variables
Chapter 13: Inference in Regression
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Correlation and Regression
1 Chapter 9. Section 9-1 and 9-2. Triola, Elementary Statistics, Eighth Edition. Copyright Addison Wesley Longman M ARIO F. T RIOLA E IGHTH E DITION.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-1 Review and Preview.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
1 1 Slide Simple Linear Regression Coefficient of Determination Chapter 14 BA 303 – Spring 2011.
Production Planning and Control. A correlation is a relationship between two variables. The data can be represented by the ordered pairs (x, y) where.
Correlation and Regression Chapter 9. § 9.3 Measures of Regression and Prediction Intervals.
Elementary Statistics Correlation and Regression.
Correlation & Regression
Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis.
Normal Distribution.
CHEMISTRY ANALYTICAL CHEMISTRY Fall Lecture 6.
Math 4030 – 13a Correlation & Regression. Correlation (Sec. 11.6):  Two random variables, X and Y, both continuous numerical;  Correlation exists when.
Correlation & Regression Analysis
2015 ASA V4 1 by Milo Schield, Augsburg College Member: International Statistical Institute US Rep: International Statistical Literacy Project Director,
1 1 Slide The Simple Linear Regression Model n Simple Linear Regression Model y =  0 +  1 x +  n Simple Linear Regression Equation E( y ) =  0 + 
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
1 MVS 250: V. Katch S TATISTICS Chapter 5 Correlation/Regression.
Hypothesis Testing Example 3: Test the hypothesis that the average content of containers of a particular lubricant is 10 litters if the contents of random.
Chapter 13 Linear Regression and Correlation. Our Objectives  Draw a scatter diagram.  Understand and interpret the terms dependent and independent.
Using the regression equation (17.6). Why regression? 1.Analyze specific relations between Y and X. How is Y related to X? 2.Forecast / Predict the variable.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Logistic Regression using Excel OLS with Nudge
Correlation and Regression
Department of Mathematics
Chapter 4: Basic Estimation Techniques
Regression and Correlation
Lecture Slides Elementary Statistics Twelfth Edition
Chapter 4 Basic Estimation Techniques
Chapter 6 Inferences Based on a Single Sample: Estimation with Confidence Intervals Slides for Optional Sections Section 7.5 Finite Population Correction.
Regression Analysis AGEC 784.
Correlation and Simple Linear Regression
Basic Estimation Techniques
AND.
Elementary Statistics
Section 13.7 Linear Correlation and Regression
Math 4030 – 12a Correlation.
BIVARIATE REGRESSION AND CORRELATION
CHAPTER 10 Correlation and Regression (Objectives)
Basic Estimation Techniques
Elementary Statistics
Statistical Inference about Regression
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005
BIVARIATE ANALYSIS: Measures of Association Between Two Variables
Copyright © Cengage Learning. All rights reserved.
SIMPLE LINEAR REGRESSION
Simple Linear Regression
Product moment correlation
SIMPLE LINEAR REGRESSION
BIVARIATE ANALYSIS: Measures of Association Between Two Variables
Chapter Outline Inferences About the Difference Between Two Population Means: s 1 and s 2 Known.
Why 25% of Voter Polls are Wrong
Regression and Correlation of Data
Presentation transcript:

Statistically-Significant Correlations Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 Statistically-Significant Correlations Milo Schield Augsburg College Editor of www.StatLit.org US Rep: International Statistical Literacy Project Fall 2014 National Numeracy Network Conference www.StatLit.org/pdf/ 2014-Schield-NNN-Significant-Correlations-Slides.pdf 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 1 1

Segmented Regression Model Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 2 Exact Solutions For N random pairs from an uncorrelated bivariate normally-distributed distribution, the sampling distribution is not simple. Here are three common analytic approaches: Fisher transformation (using LN and Arctanh), an exact solution (using a Gamma function), or Student-t distribution: t=rSqrt[(n-2)/(1-r^2)]; df=n-2 For large n, the critical value of t (95% confidence) is 1.96. For small n, the critical value of t increases as n decreases. None of these are simple or memorable. 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 2 2

Segmented Regression Model Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 3 Sufficient Condition Approach: Find an equation generating a minimum correlation for statistical-significance given N. Given N, find the smallest value of r where the left end of a 95% confidence interval is non-negative. Use calculator at www.vassarstats.net/rho.html or www.danielsoper.com/statcalc3/calc.aspx?id=44 For Daniels, use the results for a two-tailed test. Generate correlation coefficient with simple model Calculate error difference between calculated and exact using the exact as the standard. If all errors are positive, then the model is sufficient. 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 3 3

Simple Model: 2/SQRT(n) Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 4 Simple Model: 2/SQRT(n) All errors positive means the model is sufficient. 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 4 4

Segmented Regression Model Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 5 Solution Minimum statistically-significant r = 2/Sqrt(n) “n” is the number of pairs being correlated Less than 5% over for n between 5 and 4,000. Simple and memorable for two variables. It is similar to the formula for the maximum 95% Margin of Error in samples from a binary variable: 95% ME = 1.96 Sqrt[p*(1-p)/n] < 2 Sqrt[1/(4n)] 95% ME < 1/Sqrt(n) Simple and memorable for one binary variable. 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 5 5

Time-Series Correlations www.tylervigen.com Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 6 Time-Series Correlations www.tylervigen.com 10 pairs; 2/Sqrt(10) = 0.63; Statistically significant 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 6 6

Correlation = -0.993 Bee colonies & MJ arrests Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 7 Correlation = -0.993 Bee colonies & MJ arrests 20 pairs; 2/Sqrt(20) = 0.45; Statistically-significant 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 7 7

Correlation = 0.664 Drownings & Cage films Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 8 Correlation = 0.664 Drownings & Cage films 11pairs; 2/Sqrt(11) = 0.60; Statistically-significant! 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 8 8

Segmented Regression Model Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 9 Something Seems Wrong! There is nothing linear about these associations. These correlations seem unbelievably high. ----------------------- #1: The correlation between two time-series eliminates the common factor: time. The question is whether their mutual association is linear. To see this, an XY-plot is generated. 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 9 9

Correlation = 0.664 Drownings & Cage films Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 10 Correlation = 0.664 Drownings & Cage films 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 10 10

#2: Very High Correlations. Three Explanations Segmented Regression Model 2014 #2: Very High Correlations. Three Explanations Association is causal. See Tyler Vigen’s video: www.youtube.com/watch?feature=player_embedded&v=g-g0ovHjQxs Association is spurious – just random chance. Five percent of random associations will be mistakenly classified as statistically significant. Association is cherry-picked -- after the fact. According to Tyler, “This server has generated 24,470 correlations.” Tyler just picked those with high or interesting correlations. 2014-Schield-Segmented-Regression-slides

Segmented Regression Model 2014 Conclusions Use 2/Sqrt(n) as the minimum correlation for statistical significance. This criteria is sufficient, fairly accurate (within 5%) and memorable. The correlation between two time-series eliminates time. Correlation determines the degree of linearity in their cross-sectional association. Do not use a test for statistical significance if the data pairs were selected – after the fact via data mining – solely because of their high correlation. 2014-Schield-Segmented-Regression-slides

Correlation = 0.993 Divorce & Margarine Usage Analyzing Numbers in the News Segmented Regression Model 15 May 2008 2014 13 Correlation = 0.993 Divorce & Margarine Usage Correlation: 0.993. N=10, SS_Rho = 1/sqrt(11) = 10 pairs; 2/Sqrt(10) = 0.64. Statistically-significant www.tylervigen.com/view_correlation?id=1703 2008SchieldNNN6up.pdf 2014-Schield-Segmented-Regression-slides 13 13