Correlation


1 Correlation: We can often see the strength of the relationship between two quantitative variables in a scatterplot, but be careful. The two figures here are both scatterplots of the same data, on different scales, yet the second seems to show a stronger association. So we need a measure of association that is independent of the graphics.

2 Use the correlation coefficient, r. The correlation coefficient is a measure of the direction and strength of a linear relationship. It is calculated using the mean and the standard deviation of both the x and y variables. Correlation can only be used to describe quantitative variables; categorical variables don't have means and standard deviations.

3 The correlation coefficient r. Time to swim: x̄ = 35, s_x = 0.7. Pulse rate: ȳ = 140, s_y = 9.5.

4 Part of the calculation involves finding z, the standardized score, similar to the one we used when working with the normal distribution. Standardization allows us to compare correlations between data sets where the variables are measured in different units, or where the variables themselves are different. For instance, we might want to compare the correlation between [swim time and pulse] with the correlation between [swim time and breathing rate]. You DON'T want to do this by hand. Make sure you learn how to use your calculator or the computer to find r. (Figure axes: z for time, z for pulse.)
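To make the role of the z-scores concrete, here is a minimal sketch of the usual textbook calculation: standardize each variable, multiply the paired z-scores, add the products, and divide by n - 1. The function name `pearson_r` and the data values are assumptions made up for illustration; they are not the swim data behind the slides.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation as the sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_x = [(x - x_bar) / s_x for x in xs]  # z for time
    z_y = [(y - y_bar) / s_y for y in ys]  # z for pulse
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Hypothetical data (assumed for illustration only)
swim_time = [34.1, 34.6, 35.0, 35.3, 35.9, 36.2]
pulse = [152, 148, 139, 142, 131, 128]

r = pearson_r(swim_time, pulse)
print(round(r, 3))  # negative: slower swims go with lower pulse in this made-up data
```

In practice a calculator or statistics package does the same arithmetic for you.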

5 r does not distinguish between x and y. The correlation coefficient, r, treats x and y symmetrically. "Time to swim" is the explanatory variable here, and belongs on the x axis. However, r is the same in either plot (r = -0.75).
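The symmetry can be checked directly by swapping the two arguments. This is a sketch using a hypothetical helper `pearson_r` and made-up data, not the values from the slides.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical data (assumed for illustration only)
swim_time = [34.1, 34.6, 35.0, 35.3, 35.9, 36.2]
pulse = [152, 148, 139, 142, 131, 128]

r_xy = pearson_r(swim_time, pulse)  # time on the x axis
r_yx = pearson_r(pulse, swim_time)  # axes swapped
print(abs(r_xy - r_yx) < 1e-12)
```

Each product of z-scores is the same whichever variable is called x, so the sum cannot change.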

6 r has no unit of measure (unlike x and y). Changing the units of measure of the variables does not change the correlation coefficient r, because we "standardize out" the units when getting z-scores. Both plots give r = -0.75, and the z-score plot (axes: z for time, z for pulse) is the same for both.
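A quick way to see the units drop out is to rescale both variables and recompute r. In this sketch the data are made up, and the units (minutes and beats per minute) are assumptions chosen only to have something to convert.

```python
import math
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical data; the units are assumed for illustration
time_min = [34.1, 34.6, 35.0, 35.3, 35.9, 36.2]  # minutes (assumed)
pulse_bpm = [152, 148, 139, 142, 131, 128]       # beats per minute (assumed)

time_sec = [t * 60 for t in time_min]    # minutes -> seconds
pulse_bps = [p / 60 for p in pulse_bpm]  # beats per minute -> beats per second

r_original = pearson_r(time_min, pulse_bpm)
r_rescaled = pearson_r(time_sec, pulse_bps)
print(math.isclose(r_original, r_rescaled, rel_tol=1e-9))
```

Multiplying a variable by a positive constant scales its mean and standard deviation by the same constant, so every z-score is unchanged apart from rounding.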

7 r ranges from -1 to +1. r quantifies the strength and direction of a linear relationship between two quantitative variables. Strength: how closely the points follow a straight line. Direction: r is positive when individuals with higher X values tend to have higher values of Y, and negative when they tend to have lower values of Y.

8 When variability in one or both variables decreases (that is, when the points scatter less around the line), the correlation coefficient gets stronger (closer to +1 or -1).
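The effect of scatter on r can be illustrated with points built from the same straight line plus a fixed pattern of deviations, scaled up or down. Everything here (the line y = 2x, the deviation pattern, the two scale factors) is an assumption invented for the sketch.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

x = list(range(1, 11))
# Fixed, hypothetical pattern of deviations from the line (no randomness)
pattern = [1, -1, -1, 1, 1, -1, 1, -1, -1, 1]

def line_with_scatter(scale):
    """Points on y = 2x, pushed off the line by scale * pattern."""
    return [2 * xi + scale * p for xi, p in zip(x, pattern)]

r_wide = pearson_r(x, line_with_scatter(6.0))   # lots of scatter around the line
r_tight = pearson_r(x, line_with_scatter(1.0))  # little scatter around the line
print(round(r_wide, 2), round(r_tight, 2))
```

The underlying line is identical in both cases; only the spread of the points around it differs, and the tighter set gives the r closer to +1.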

9 Correlation coefficient r describes linear relationships. No matter how strong the association, r should not be used to describe non-linear relationships; we have other methods for those. Note: You can sometimes transform a non-linear association to a linear form, for instance by taking the logarithm. You can then calculate a correlation using the transformed data.
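As a sketch of the logarithm idea, take data that grow exactly exponentially: the association is perfect, yet r computed on the raw values falls well short of 1, while r computed after taking logs is essentially 1. The data and the helper `pearson_r` are assumptions made up for illustration.

```python
import math
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical exponential growth: y = 2 * 3**x
x = list(range(1, 9))
y = [2 * 3 ** xi for xi in x]

r_raw = pearson_r(x, y)                         # curved relationship
r_log = pearson_r(x, [math.log(v) for v in y])  # log(y) is linear in x
print(round(r_raw, 3), round(r_log, 3))
```

Real data will rarely straighten out this cleanly, so it is still worth looking at the scatterplot of the transformed values before relying on r.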

10 Influential points. Correlations are calculated using means and standard deviations, and thus are NOT resistant to outliers. Try the Statistical Applet under Resources in the eBook on the Stats Portal. Just moving one point away from the general trend here weakens the correlation from -0.91 to -0.75.

11 In this example, adding two outliers decreases r from 0.95 to 0.61. Go to the Stats Portal and, under Resources, try Statistical Applets and choose the Correlation and Regression one. Put some points in the scatterplot, watch the value of r, and see what happens when you put in an outlier or two.
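If the applet is not at hand, the same experiment can be sketched in a few lines: compute r for points that follow a line closely, then add two points far from the trend and compute r again. The data below are invented for the sketch and are not the values behind the 0.95 and 0.61 quoted above.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical points lying close to a straight line
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9]

# The same points plus two made-up outliers, (2, 15) and (7, 3)
x_out = x + [2, 7]
y_out = y + [15.0, 3.0]

r_clean = pearson_r(x, y)
r_outliers = pearson_r(x_out, y_out)
print(round(r_clean, 3), round(r_outliers, 3))
```

Because the outliers pull on both the means and the standard deviations, two points out of ten are enough to change r substantially.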

12 Homework: Read section 2.2, paying careful attention to the properties of the correlation coefficient, r. To explore how extreme outlying observations influence r, play around with the Statistical Applet on Correlation and Regression under Resources in the eBook on the Stats Portal. Then, using the computer to draw the scatterplots and do the computations as needed, do problems #2.42-2.44, 2.47, 2.53, 2.55, 2.56, and 2.60.

