CORRELATION
Bivariate Distribution
Observations are taken on two variables:
Two characteristics are measured on n individuals, e.g. the height (x) and weight (y) of 10 students.
A single characteristic is measured on two groups of individuals, e.g. the height of 10 males (x) and 10 females (y).
[Table: Height and Self-esteem (data values not recoverable)]
Definition
Correlation is used to measure and describe the relationship (association) between two variables.
The single number that describes the relationship between X and Y is the correlation coefficient, denoted by r (for a sample) or ρ (for a population).
Scatter Diagram
What is the relationship between level of education and lifetime earnings?
Direction of Relationship
A scatter plot shows at a glance the direction of the relationship.
A positive correlation indicates a direct relationship: as X increases, Y tends to increase.
A negative correlation indicates an inverse relationship: as X increases, Y tends to decrease.
No Correlation
When there is no correlation between two variables, the dots are scattered over the plot in an irregular pattern.
Correlation Coefficient
The correlation coefficient measures three characteristics of the relationship between X and Y:
The direction of the relationship.
The form of the relationship.
The degree of the relationship.
Karl Pearson Correlation
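Karl Pearson's coefficient is conventionally defined as r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]. A minimal sketch of that definition in plain Python follows; the function name `pearson_r` and the sample data are hypothetical, for illustration only.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's product-moment correlation coefficient (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sums of squared deviations from the means
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: here S_xy = 6, S_xx = 10, S_yy = 6, so r = 6 / sqrt(60)
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))
```

Computing the deviations from the means first mirrors the formula term by term, which makes it easy to check a hand calculation against the code.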
Calculation
Calculate Karl Pearson's correlation coefficient for the data in slide 3.
Answer: r = 0.73
Interpretation: the data exhibit a strong positive correlation, indicating that self-esteem tends to increase with height.
The data shows a high positive correlation between income and education.
Drawbacks
Presence of outliers.
A nonlinear (curved) relationship between the x and y values.
In the next slide, scatter plots are shown for 7 different datasets that all have the same correlation, r = 0.70. Is the use of r justified in each case?
Rank Correlation
[Table columns: Age (months) | Stopping distance | Age rank | Stopping-distance rank | d | d² (data values not recoverable)]
Scatter Plot
Calculations
Number in sample: n = 10
$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{195}{10 \times 99} = 1 - \frac{195}{990} \approx 0.803$
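The rank-correlation formula r_s = 1 − 6Σd² / (n(n² − 1)) can be sketched as below. The helper names are hypothetical. Tied values receive the average of the ranks they occupy (a common convention, assumed here), and with ties the d² formula is only an approximation to Pearson's r computed on the ranks. In the slide's arithmetic, 195 corresponds to 6Σd², i.e. Σd² = 32.5.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_from_sum_d2(sum_d2, n):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def spearman(x, y):
    """Rank both variables the same way, then apply the d^2 formula."""
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return spearman_from_sum_d2(sum_d2, len(x))

# Reproducing the slide's arithmetic: n = 10, sum of d^2 = 32.5
print(round(spearman_from_sum_d2(32.5, 10), 3))
```

Ranking both variables in ascending order is a design choice; ranking both in descending order gives the same d values up to sign, so Σd² and r_s are unchanged.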
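Probable Error
The probable error of r is $P.E. = 0.6745 \, \frac{1 - r^2}{\sqrt{n}}$.
If |r| > 6 P.E., the correlation is highly significant in the population; if |r| < P.E., it is not significant; in between, no definite conclusion can be drawn.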
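A minimal sketch of the probable-error rule, assuming the classical formula P.E. = 0.6745 (1 − r²)/√n. The function names are hypothetical, and the usage values (r = 0.73 with n = 10) are an assumption for illustration, since the sample size for the slide 3 data is not given here.

```python
import math

def probable_error(r, n):
    """Classical probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def judge_significance(r, n):
    """Apply the rule of thumb: |r| > 6 P.E. significant, |r| < P.E. not significant."""
    pe = probable_error(r, n)
    if abs(r) > 6 * pe:
        return "significant"
    if abs(r) < pe:
        return "not significant"
    return "inconclusive"

# Hypothetical usage: r = 0.73 with an assumed sample size of n = 10
r, n = 0.73, 10
print(f"P.E. = {probable_error(r, n):.4f}, verdict: {judge_significance(r, n)}")
```

Here P.E. ≈ 0.0996, so 6 P.E. ≈ 0.598 < 0.73 and the rule classifies the correlation as significant.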
Caution
Correlation does not imply causation.
Example: average monthly temperature (x) and number of ice cream vendors (y): r = 0.92 (highly positive).