Reading a scatterplot
Examples:
– Mars rocks (sulfate is measured as a percentage; redness is measured as the ratio of red to blue in the light spectrum)
– American Association of University Professors (cases are different academic disciplines)
– SAT score versus length of essay
– GPA versus SAT
Correlation coefficient
1. Point of averages: (average of variable 1, average of variable 2)
2. SDs of both variables
These describe the center and spread of the data.
Figure 1. Car ownership in Anytown, by household income
Correlation coefficient
r = correlation coefficient
Definition: a measure of linear association
r is always between -1 and 1.
A positive value of r means the data slope upward: both variables increase together.
A negative value of r means the data slope downward: as one variable increases, the other decreases, and vice versa.
Figure 2. Strong linear relationship of variables
Figure 3. Scattered data points
Figure 4. Very low or zero correlation
Figure 5. Data widely spread
The relationship between 2 variables can be summarized by:
1. Average of the x-values
2. SD of the x-values
3. Average of the y-values
4. SD of the y-values
5. r
The SD line passes through the point of averages and through all of the points that are an equal number of SDs away from the average for both variables.
The slope of the SD line is ±(SD of y)/(SD of x):
+ for a positive association
- for a negative association
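The SD line described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are made up, the helper names (mean, sd, sd_line) are hypothetical, and the SD is assumed to divide by n rather than n - 1.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sd(values):
    # Assumption: SD computed by dividing by n (not n - 1).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def sd_line(xs, ys, positive=True):
    # Slope is +(SD of y)/(SD of x) for a positive association,
    # -(SD of y)/(SD of x) for a negative one.
    slope = sd(ys) / sd(xs)
    if not positive:
        slope = -slope
    # The line must pass through the point of averages.
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

# Hypothetical data with a positive association (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
slope, intercept = sd_line(xs, ys, positive=True)
print(slope, intercept)
```

Moving one SD of x to the right of the point of averages along this line raises the height by one SD of y, which is the property stated above.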
Computing r
1. Convert each value of each variable into standard units.
2. Take the average of the products of the paired standard units.
Example:
x: 3, 4, 5, 8, 10
y: 12, 10, 7, 6, 2
x    y    deviation(x)    deviation(y)    z(x)    z(y)    product of z's
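The two steps for computing r can be checked with a short Python sketch that fills in the table columns for the example data. The function names are hypothetical, and the SD is assumed to divide by n, which matches taking a plain average of the products. The printed values are computed by the sketch, not taken from the slides.

```python
from math import sqrt

def standard_units(values):
    # Returns (deviations, z-scores); SD assumed to divide by n.
    n = len(values)
    avg = sum(values) / n
    deviations = [v - avg for v in values]
    sd = sqrt(sum(d ** 2 for d in deviations) / n)
    return deviations, [d / sd for d in deviations]

def correlation_table(xs, ys):
    # One row per case: x, y, deviation(x), deviation(y), z(x), z(y), product.
    dev_x, z_x = standard_units(xs)
    dev_y, z_y = standard_units(ys)
    return [
        (x, y, dx, dy, zx, zy, zx * zy)
        for x, y, dx, dy, zx, zy in zip(xs, ys, dev_x, dev_y, z_x, z_y)
    ]

x = [3, 4, 5, 8, 10]
y = [12, 10, 7, 6, 2]

rows = correlation_table(x, y)
for row in rows:
    print("  ".join(f"{value:7.2f}" for value in row))

# r is the average of the last column (the products of the z's).
r = sum(row[-1] for row in rows) / len(rows)
print("r =", round(r, 2))
```

For this example r comes out to about -0.96, a strong negative association: y falls as x rises.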