Day 49 Causation and Correlation
Introduction
A linear relationship between two variables is often described using linear regression, and the strength of that linear relationship is measured by the correlation coefficient. Sometimes the correlation coefficient is strong, yet the change in the response (dependent) variable is not actually caused by the change in the independent variable being studied. In such cases, we try to explain why and to identify the variable that really drives the response. In this lesson, we explain why such cases occur.
Vocabulary:
Scatter plot: a graph made up of points that represent the relationship between data for two related variables.
Line of best fit: the line on a scatter plot that best represents the plotted points.
Causation: a relationship in which the change in the response variable is explained by the change in the independent variable.
Record these terms in your notebook or on vocabulary cards, whichever system you use.
Vocabulary:
Correlation: a measure of how strongly two variables tend to vary together, either directly (both increase together) or inversely (one increases as the other decreases).
Correlation and Causation
Correlation describes how two variables vary together, and the correlation coefficient measures the strength of that simultaneous change. When two variables consistently increase together at a steady rate, the correlation is strong and positive, whether or not one variable affects the other. Likewise, when one variable consistently decreases as the other increases, the correlation is strong and negative, again regardless of whether one variable affects the other.
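The correlation coefficient is usually Pearson's r, although the lesson does not name the formula. The minimal Python sketch below assumes Pearson's formula and computes r from how far each value sits from its variable's mean: r near +1 signals a strong direct relationship, r near -1 a strong inverse one. The helper name `pearson_r` is our own illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when the variables move together
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations: the spread of each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)

# Direct relationship: both variables increase together
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0
# Inverse relationship: one increases while the other decreases
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0
```

Note that r only measures how consistently the values move together; nothing in the formula asks whether one variable causes the other.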
If a change in the independent variable does not produce a significant change in the response variable, the relationship is not causal; that is, there is no causation. Causation occurs only if a change in one variable (the independent variable) leads to a change in another variable (the dependent variable).
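The difference between correlation and causation can be seen in a small simulation. This is a sketch under stated assumptions: a hypothetical hidden variable `z` (think of something like amount of training) drives both `x` and `y`, while `y` is never computed from `x`. The observed data still show a strong correlation. When `x` is instead set independently of `z`, as in a controlled experiment, the correlation disappears, because changing `x` does not change `y`. All names and numbers here are illustrative.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)   # fixed seed so every run gives the same data
N = 1000

# Hypothetical hidden variable z that drives both x and y
z = [rng.uniform(0, 10) for _ in range(N)]

# Observational data: x and y both follow z; y's formula never uses x
x_obs = [zi + rng.gauss(0, 1) for zi in z]
y_obs = [2 * zi + rng.gauss(0, 2) for zi in z]

# "Intervention": x is set independently of z; y is generated exactly as before
x_set = [rng.uniform(0, 10) for _ in range(N)]
y_set = [2 * zi + rng.gauss(0, 2) for zi in z]

r_observed = pearson_r(x_obs, y_obs)
r_intervened = pearson_r(x_set, y_set)
print(f"observed r:   {r_observed:.2f}")    # strong positive correlation
print(f"intervened r: {r_intervened:.2f}")  # close to zero
```

A strong observed r together with a near-zero r after intervention is the signature of correlation without causation.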
EXAMPLE
A teacher would like to know the relationship between age and proficiency in athletics. To find out, he selects a group of students and measures their proficiency in athletics as the time each one takes to complete one full lap around a field. Scores (%) were awarded based on the table below.

Finishing time (min)   Score (%)
1.0–1.2                95
1.3–1.5                90
1.6–1.8                85
1.9–2.1                80
2.2–2.4                75
2.5–2.7                70
2.8–3.0                65
3.1–3.3                60
3.4–3.6                55
3.7–3.9                50

Based on the above, the following data were collected:

Age (years)   10   11   12   13   15   16
Score (%)     60   65   70   75   85   95
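The scoring table is a simple lookup from a time band to a score. A minimal Python sketch of that lookup follows. It assumes finishing times are recorded to the nearest 0.1 minute, since the bands otherwise leave gaps (for example between 1.2 and 1.3). The names `SCORE_BANDS` and `score_for_time` are illustrative, not part of the lesson.

```python
# Scoring bands from the table: (lowest time, highest time, score %)
SCORE_BANDS = [
    (1.0, 1.2, 95), (1.3, 1.5, 90), (1.6, 1.8, 85), (1.9, 2.1, 80),
    (2.2, 2.4, 75), (2.5, 2.7, 70), (2.8, 3.0, 65), (3.1, 3.3, 60),
    (3.4, 3.6, 55), (3.7, 3.9, 50),
]

def score_for_time(minutes):
    """Return the score (%) for a finishing time, or None if it is off the table.

    Assumes times are recorded to the nearest 0.1 minute, matching the bands.
    """
    t = round(minutes, 1)
    for low, high, score in SCORE_BANDS:
        if low <= t <= high:
            return score
    return None   # faster than 1.0 min or slower than 3.9 min: not covered

print(score_for_time(1.1))   # 95
print(score_for_time(2.0))   # 80
print(score_for_time(3.8))   # 50
print(score_for_time(4.5))   # None
```

Each 0.3-minute band costs 5 percentage points, so a faster lap always earns a higher score.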
1. Draw a scatter plot of score against age.
2. Determine the equation of the line of best fit.
3. Determine the correlation coefficient.
4. Explain your answer and whether it agrees with the real cause of proficiency in athletics.
Solution
Plotting the points from the age and score table gives the following graph.
[Graph: scatter plot of score (%) against age (years)]
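The exercise also asks for the equation of the line and the correlation coefficient, which the graph alone does not show. The minimal Python sketch below computes both from the age and score table, assuming the line of best fit is the ordinary least-squares line (the lesson does not say which fitting method it used).

```python
import math

ages = [10, 11, 12, 13, 15, 16]
scores = [60, 65, 70, 75, 85, 95]

n = len(ages)
mean_age = sum(ages) / n
mean_score = sum(scores) / n

# Sums built from deviations from the means
sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
sxx = sum((a - mean_age) ** 2 for a in ages)
syy = sum((s - mean_score) ** 2 for s in scores)

# Least-squares line of best fit: score = slope * age + intercept
slope = sxy / sxx
intercept = mean_score - slope * mean_age

# Pearson correlation coefficient
r = sxy / math.sqrt(sxx * syy)

print(f"score = {slope:.2f} * age + {intercept:.2f}")   # score = 5.59 * age + 3.26
print(f"r = {r:.3f}")                                    # r = 0.993
```

An r this close to 1 confirms a very strong positive linear relationship between age and score; on its own, it says nothing about whether age causes the higher scores.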
From the analysis above, we find that the correlation between the two variables is very strong and positive: as age increases, the score (proficiency in athletics) also increases. However, this does not mean that age causes proficiency in athletics. Age is not a major determinant of athletic proficiency; the main determinants are a person's genetic makeup and the length of their training. Therefore, age and proficiency in athletics do not have a causal relationship. In this case, correlation does not imply causation.
Homework
1. Identify two variables for which correlation implies causation.
2. Identify two variables for which correlation does not imply causation.
Homework answers
1. Choose variables linked by direct cause and effect, so that a change in one produces the response in the other. Example: distance covered and time taken.
2. Choose variables that vary with respect to one another even though one does not necessarily cause the change in the other. Example: distance travelled and the lifetime of a vehicle.
THE END