What is correlation? How is it computed? How is it interpreted?
Correlation describes the relationship between two variables: how the value of one variable changes when the value of the other variable changes.
A correlation coefficient is a numerical index that reflects the relationship between two variables.
Range: -1 to +1
Bivariate correlation: correlation between two variables
Parametric: Pearson product-moment correlation (named for its inventor, Karl Pearson)
Non-parametric: Spearman’s rank correlation; Kendall tau rank correlation coefficient
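To illustrate the difference between the parametric and the rank-based coefficients, here is a minimal sketch in plain Python. It assumes the common definition of Spearman's coefficient as Pearson's coefficient applied to ranks (ties get the average rank). The function names and the data are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson r of the two rank lists."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Because the ranks of x and y agree exactly, the rank-based coefficient reaches 1, while the Pearson coefficient stays slightly below 1 because the points do not lie on a straight line.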
Pearson correlation applies to two variables that are continuous in nature, such as height, age, test score, or income.
It does not apply to discrete or categorical variables, such as race, political affiliation, social class, or rank.
r_xy is the correlation between variable X and variable Y.
Direct correlation (positive correlation): both variables change in the same direction.
Indirect correlation (negative correlation): the two variables change in opposite directions.
Figure: correlation report for different currency exchange rates on November 13, 2014 (source: Bloomberg Terminal).
-0.8 and 0.5: which is stronger?
r_xy: the correlation coefficient between X and Y
n: the size of the sample
X: the individual’s score on the X variable
Y: the individual’s score on the Y variable
XY: the product of each X score times its corresponding Y score
X^2: the individual X score, squared
Y^2: the individual Y score, squared
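The symbols listed above match the standard raw-score (computational) form of the Pearson coefficient, r_xy = (n ΣXY - ΣX ΣY) / sqrt[(n ΣX^2 - (ΣX)^2)(n ΣY^2 - (ΣY)^2)]. The formula itself does not survive in this text, so treating it as the intended one is an assumption. A minimal sketch in Python, with a hypothetical function name and made-up scores:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from raw-score sums (assumed to be the slide's formula)."""
    n = len(xs)                                 # size of the sample
    sum_x = sum(xs)                             # sum of X scores
    sum_y = sum(ys)                             # sum of Y scores
    sum_xy = sum(x * y for x, y in zip(xs, ys)) # sum of X*Y products
    sum_x2 = sum(x * x for x in xs)             # sum of squared X scores
    sum_y2 = sum(y * y for y in ys)             # sum of squared Y scores
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical scores for five individuals (illustration only).
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
r = pearson_r(x_scores, y_scores)
print(round(r, 4))
```

For these scores the sums are n = 5, ΣX = 15, ΣY = 20, ΣXY = 66, ΣX^2 = 55, and ΣY^2 = 86, so the numerator is 30 and the denominator is the square root of 1500.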
Calculate the Pearson correlation coefficient for US school enrollment (unit: thousands) at several time points over the previous 50 years (source: United States Census Bureau).
1. Select two columns of data. Are they correlated?
2. What does this correlation mean?
Table columns: Year, G9-12 Public, G9-12 Private, College Public, College Private
Use the CORREL function or the PEARSON function.
Scatterplot (or scattergram) of X against Y
r = 1: a perfect direct (or positive) correlation.
In real-life cases, 0.7 or 0.8 may be the highest you will see.
Strength and direction are both important. The sign of the coefficient gives the direction; its absolute value gives the strength.
Four sets of data with the same correlation of
Linear correlation means that the X and Y values fall along one straight line.
Curvilinear correlation: the relationship follows a curve rather than a straight line, for example age and memory.
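A short sketch of why the distinction matters: the Pearson coefficient measures only the linear part of a relationship, so a perfectly curvilinear pattern can still give r = 0. The data below are hypothetical (an inverted U, not real age and memory figures), and the helper name is made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from raw-score sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical inverted-U data: y rises, peaks at x = 0, then falls.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [9 - value ** 2 for value in x]   # y is fully determined by x

r_curved = pearson_r(x, y)            # rising and falling halves cancel out
r_rising_half = pearson_r(x[:4], y[:4])  # only the rising part of the curve
print(r_curved, round(r_rising_half, 3))
```

Here y is completely determined by x, yet the coefficient over the whole range is zero, while the rising half alone shows a strong positive correlation. A scatterplot is the usual way to spot this kind of pattern.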
Variables: income, education, attitude, vote
How to calculate the correlation coefficients?
1. CORREL()
2. Correlation in the Data Analysis toolset
Correlation matrix (rows and columns: Income, Education, Attitude, Vote)
Data Analysis tool: Correlation
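The same kind of correlation matrix can be sketched outside a spreadsheet. The example below is a rough Python equivalent, not the Data Analysis tool itself. The numbers for the four variables are hypothetical, invented for six respondents purely to show the layout.

```python
from math import sqrt

def pearson(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for six respondents (not taken from the slides).
data = {
    "Income": [36, 48, 53, 61, 75, 82],
    "Education": [12, 12, 14, 16, 16, 18],
    "Attitude": [3, 4, 2, 5, 4, 5],
    "Vote": [0, 1, 0, 1, 1, 1],
}

# One coefficient for every pair of variables, including each with itself.
names = list(data)
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

# Print the matrix with a header row and one row per variable.
print(" " * 10 + "".join(f"{name:>10}" for name in names))
for a in names:
    print(f"{a:<10}" + "".join(f"{matrix[a][b]:>10.2f}" for b in names))
```

The diagonal is always 1 (each variable correlates perfectly with itself) and the matrix is symmetric, so only the cells on one side of the diagonal carry new information.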
Correlation value: from a negative finite number to a positive finite number
Correlation coefficient value: -1 to +1

r_xy value    Interpretation
0.8 to 1.0    Very strong relationship (share most things in common)
0.6 to 0.8    Strong relationship (share many things in common)
0.4 to 0.6    Moderate relationship (share something in common)
0.2 to 0.4    Weak relationship (share a little in common)
0.0 to 0.2    Weak or no relationship (share very little or nothing in common)
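The interpretation table can be expressed as a small lookup, sketched below. Two assumptions are made that the table does not state: the absolute value of r is used, so negative coefficients are graded by size, and a value sitting exactly on a boundary (such as 0.8) is placed in the stronger category. The function name is hypothetical.

```python
def interpret(r):
    """Map a correlation coefficient to the table's verbal label."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # assumption: strength ignores the sign
    if size >= 0.8:  # assumption: boundary values go to the stronger label
        return "very strong relationship"
    if size >= 0.6:
        return "strong relationship"
    if size >= 0.4:
        return "moderate relationship"
    if size >= 0.2:
        return "weak relationship"
    return "weak or no relationship"

for r in (-0.8, 0.5, 0.1):
    print(r, "->", interpret(r))
```

Under these assumptions, the earlier question about -0.8 and 0.5 has a direct answer: -0.8 is the stronger correlation, because its absolute value is larger.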
Coefficient of determination: the percentage of variance in one variable that is accounted for by the variance in the other variable. It equals the square of the correlation coefficient.
Example: with a correlation of 0.7 between GPA and studying time, 0.7^2 = 0.49, so 49% of the variance in GPA can be explained by the variance in studying time.
The amount of unexplained variance is called the coefficient of nondetermination (or coefficient of alienation).
Table columns: correlation, determination, interpretation
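The arithmetic behind explained and unexplained variance can be sketched in a few lines. This follows the definitions given here: explained share = r squared, unexplained share = 1 minus r squared. The function name and the list of r values are hypothetical.

```python
def variance_shares(r):
    """Return (explained, unexplained) shares of variance for a given r."""
    determination = r ** 2                # coefficient of determination
    unexplained = 1.0 - determination     # share of variance left unexplained
    return determination, unexplained

# Hypothetical r values, printed as percentages of variance.
for r in (0.1, 0.3, 0.5, 0.7, 0.9):
    explained, unexplained = variance_shares(r)
    print(f"r = {r:.1f}: explained {explained:.0%}, unexplained {unexplained:.0%}")
```

Note how quickly the explained share drops: a correlation of 0.5 accounts for only a quarter of the variance, and because r is squared, a negative correlation of the same size explains exactly the same share.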
In a small town in Greece, the local police found a direct correlation between ice cream and crime.
Correlation represents the association between two or more variables.
It says nothing about causality: two correlated variables do not necessarily have a cause-and-effect relationship.
Ice cream and crime are correlated, but ice cream does not cause crime.
Summer
Summer is when people get together. More specifically, casual drinkers and drug users are more likely to go to bars or parties on weekends and evenings, as opposed to a Tuesday morning. These people in the social mix, flooding the city’s streets and neighborhood bars, feed the peak times for murder, experts say.