Published by Magdalene Casey. Modified over 6 years ago.
1
Chapter 7: Scatterplots, Association, and Correlation
2
Woman President? Regularly, since 1937, the Gallup Poll has asked likely US voters whether they would vote for a qualified woman for president if their preferred political party nominated one. Are people more likely to say yes to this question now than they were 70 years ago? Has the increase been consistent? Were there periods where there was no increase, or even a decrease?
3
Scatterplot of Survey
[Figure: scatterplot with % of voters saying yes on the y-axis and year on the x-axis]
4
Scatterplots
Show patterns, trends, relationships, and “extraordinary values”
Compare the association between bivariate (two-variable) data visually
5
Association
Direction: describes the direction of the data
  Negative: a pattern that would have a negative slope
  Positive: a pattern that would have a positive slope
Form: describes the shape of the scatterplot
  Linear (straight)
  Curved
  No pattern
Strength: tight single stream or vague cloud (“weak,” “moderate,” “strong”)
Unusual Features: outliers
6
Scatterplot Logistics
Graphed on a Cartesian plane with an x-axis and a y-axis
ALWAYS display quantitative variables
ALWAYS include units in your labels
7
Drug dosage (assuming the drug is a pain reliever) and degree of pain relief
The association is likely to be strong, positive and curved. Assuming, of course, that the drug is an effective pain reliever, as the dosage increases, the degree of pain relief will increase. Eventually, the association is likely to level off, until no further pain relief is possible, since the pain will be gone.
8
Calories consumed and weight loss.
The association is likely to be moderate, negative and linear. As fewer calories are consumed, more weight is likely to be lost. The association will not be strong, since some people lose weight more easily than others, and there are other variables involved, such as overall health, exercise and beginning weight.
9
Hours of sleep and score on a test.
The association is likely weak, positive and possibly linear. Generally, a well-rested person is expected to score higher on a test. The relationship is weak, since there are other variables involved. Maybe a person got less sleep because they were up studying.
10
Shoe size and grade point average.
There is no association between shoe size and GPA. The scatterplot is likely to be randomly scattered.
11
Time for a mile run and age.
The association between time for a mile run and age is likely to be moderate and curved, with no dominant direction. The very young will likely have high run times. Run times are likely to be the lowest for people in their late teens or early twenties. Older people are likely to have high run times.
12
Age of car and cost of repairs.
The association between age of car and cost of repairs is positive, moderate, and linear. As cars get older, they usually require more repairs.
13
Roles for Variables
Does it matter which variable I make “x” and which variable I make “y”?
Explanatory Variable (or predictor variable): plotted on the x-axis
Response Variable: plotted on the y-axis
Assign explanatory and response variables based on how you think the data will behave
14
Use the calculator to do the work for you!
15
Scatterplots on the Calculator
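Press 2nd Y= (STAT PLOT), turn a plot on, and choose the scatterplot type. Set the x-list to the list holding your x-axis data and the y-list to the list holding your y-axis data. Then use ZOOM to get the best view of the plot (ZOOM 9, ZoomStat, fits the window to the data).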
17
The explanatory variable is the height. Height is used to predict the weight.
The response variable is the weight. *Pay attention to which L1, L2, L3 or L4 you place in the x and y variable lists.*
18
Direction: describes the direction of the data
  Negative: a pattern that would have a negative slope
  Positive: a pattern that would have a positive slope
Form: describes the shape of the scatterplot
  Linear (straight)
  Curved
  No pattern
Strength: tight single stream or vague cloud (“weak,” “moderate,” “strong”)
Unusual Features: outliers
There is a moderate, positive, linear relationship between male height and male weight. Taller males are generally heavier. Likewise, there is a moderate, positive, linear relationship between female height and female weight. Taller females are generally heavier. There is one female who is much taller and heavier than the others. Her attributes fit the overall pattern, although her weight is a bit higher than we would expect based on the overall relationship.
19
You must look at the graph.
How linear does it look? Do not say correlation unless it is linear! In your sentences say association.
27
Standardizing Scatterplots
To standardize scatterplots, convert BOTH variables to z-scores: $z_x = \frac{x - \bar{x}}{s_x}$ and $z_y = \frac{y - \bar{y}}{s_y}$
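The standardizing step can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `z_scores` and the height and weight values are hypothetical, and the sample standard deviation (dividing by n - 1) is assumed.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, then divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]

# Hypothetical data for illustration only (not taken from the slides)
heights_in = [62, 65, 68, 70, 73]
weights_lb = [120, 140, 155, 170, 190]

z_heights = z_scores(heights_in)
z_weights = z_scores(weights_lb)

# Plotting z_weights against z_heights gives the standardized scatterplot:
# both axes are now unit-free and centered at 0.
for zh, zw in zip(z_heights, z_weights):
    print(f"{zh:6.2f} {zw:6.2f}")
```

After standardizing, each list has mean 0 and standard deviation 1, so the two variables share a common scale regardless of their original units.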
28
Correlation
Measured from the standardized scatterplot
Measures the strength of the linear association between two quantitative variables
Correlation Coefficient: $r = \frac{\sum z_x z_y}{n - 1}$, where $z_x$ and $z_y$ are the z-scores of x and y
Conditions that must be met:
  Quantitative Variables Condition
  Straight Enough Condition
  Outlier Condition
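As a rough sketch of how the correlation coefficient follows from the standardized values, the snippet below multiplies paired z-scores, sums the products, and divides by n - 1. The function name `correlation` and the sample lists are illustrative assumptions, not from the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation as the sum of products of paired z-scores, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: a perfectly straight increasing and decreasing pattern
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

The code only computes the number. The three conditions (quantitative variables, straight enough, no outliers) still have to be checked by looking at the scatterplot.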
29
Calculating r
To turn on correlation (you should only need to do this once, or again if you change your batteries): press 2nd 0 (CATALOG is printed above the 0 key), then scroll down to DiagnosticOn. Press ENTER twice and the calculator should display Done.
30
More Calculating r
Now that the diagnostics are on: go to the CALC menu under STAT and choose 8: LinReg(a+bx)
31
Correlation Properties
The sign of a correlation coefficient gives the direction of the association
Correlation is always between -1 and 1. It can be exactly equal to -1 or 1, but that is extremely rare in real life
Correlation treats x and y symmetrically: the correlation of x with y is the same as the correlation of y with x
32
More Correlation Properties
Correlation has no units
Correlation is not affected by changes in the center or the scale of either variable (it depends only on z-scores, which do not change when the data are shifted or rescaled)
Correlation measures the strength of the linear association between two variables
Correlation is sensitive to outliers
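Several of these properties can be checked numerically. The sketch below uses hypothetical height and weight values (not from the slides) and a hand-written `correlation` helper to illustrate symmetry, invariance under changes of center and scale, and sensitivity to a single outlier.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation as the sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data for illustration only
heights_in = [62, 65, 68, 70, 73]
weights_lb = [125, 138, 160, 165, 192]

r_original = correlation(heights_in, weights_lb)

# Symmetry: swapping the roles of x and y gives the same value
r_swapped = correlation(weights_lb, heights_in)

# Change of scale: inches to centimeters, pounds to kilograms
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]
r_rescaled = correlation(heights_cm, weights_kg)

# Change of center: subtract a constant from every height
r_shifted = correlation([h - 60 for h in heights_in], weights_lb)

# Sensitivity to outliers: add one short, very heavy (hypothetical) person
r_with_outlier = correlation(heights_in + [60], weights_lb + [250])

print(r_original, r_swapped, r_rescaled, r_shifted, r_with_outlier)
```

The first four values agree up to floating-point rounding, while the single added point is enough to change the correlation substantially.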
33
How Strong is Strong?
“Weak,” “Moderate,” and “Strong” are often used to categorize correlation, but there is no exact science to naming the strength of a correlation
Name the strength within the context of the data
34
Correlation Tables

              Assets  Sales  Market Value  Profits  Cash Flow  Employees
Assets        1.000
Sales         0.746   1.000
Market Value  0.682   0.879  1.000
Profits       0.602   0.814  0.968         1.000
Cash Flow     0.641   0.855  0.970         0.989    1.000
Employees     0.594   0.924  0.818         0.762    0.787      1.000
35
Correlation Tables
Compact and give a lot of summary information at a glance
Efficient way to start an analysis of a large data set, but also dangerous because a table does not check for linearity or outliers
Diagonal cells are always exactly 1
36
Straightening Scatterplots
When a scatterplot shows a bent form that consistently increases or decreases, we can often straighten the form of the plot by re-expressing one or both variables
37
Example of Straightening Scatterplots
Some camera lenses have an adjustable aperture, the hole that lets the light in. The size of the aperture is expressed in a number called the f/stop. Each increase of one f/stop corresponds to a halving of the light that is allowed to come through. When you halve the shutter speed, you cut down the light, so you have to open the aperture one notch. A table of recommended shutter speeds and f/stops for a camera lists the relationship like this:
38
Straightening Scatterplots
Shutter Speed:  1/1000  1/500  1/250  1/125  1/60  1/30  1/15  1/8
f/stop:         2.8     4      5.6    8      11    16    22    32

Make a scatterplot of f/stop vs. Shutter Speed (with f/stop as the response variable).
Calculate the correlation coefficient. Is this an appropriate measure of association?
Now, re-express the variable f/stop by squaring each data value. What do we notice?
The second plot looks much more nearly straight. The form of the plot is now straight, so the correlation is now an appropriate measure of association.
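The same exercise can be worked outside the calculator. The sketch below uses the shutter speeds and f/stops from the table above; the `correlation` helper and variable names are illustrative assumptions. Note that a correlation can come out high even when the form is curved, which is why the plot has to be checked before the number is trusted.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation as the sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Data from the table of recommended shutter speeds and f/stops
shutter_speed = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]
f_stop = [2.8, 4, 5.6, 8, 11, 16, 22, 32]

# Correlation on the original (bent) relationship
r_raw = correlation(shutter_speed, f_stop)

# Re-express the response by squaring each f/stop value
f_stop_squared = [f ** 2 for f in f_stop]
r_squared_fstop = correlation(shutter_speed, f_stop_squared)

print(f"r with f/stop:     {r_raw:.3f}")
print(f"r with (f/stop)^2: {r_squared_fstop:.3f}")
```

Squaring works here because (f/stop)² is nearly proportional to shutter speed in this table, so the re-expressed plot is close to a straight line through the points.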
39
Another Way to Create New Lists…
When we want to square the f/stop data, we can do so in the STAT menu or on the main screen. The STO→ key lets you store data into a variable or a list. Here, we stored the squares of our list FSTOP into L1.
40
What Can Go Wrong?
Don’t say correlation when you mean association
Don’t correlate categorical variables
Be sure the association is linear when using correlation
Beware of outliers
41
More Things That Can Go Wrong
“Correlation does not imply causation!”
Watch out for lurking variables: a variable other than x and y that simultaneously affects both variables and accounts for the correlation between the two
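A small simulation can make the lurking-variable warning concrete. Everything in this sketch is hypothetical: the variable `z` plays the lurking variable, and `x` and `y` are each built from `z` plus independent noise, so neither one influences the other.

```python
import random
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation as the sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000

# Simulated lurking variable z drives both x and y
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

# x and y are strongly correlated even though neither causes the other
r_xy = correlation(x, y)

# Once z's contribution is removed, what is left of x and y is essentially unrelated
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_without_z = correlation(x_without_z, y_without_z)

print(f"r(x, y) = {r_xy:.2f}, after removing z: {r_without_z:.2f}")
```

In real data the lurking variable is usually not measured, so it cannot simply be subtracted out like this. The simulation only shows how a strong correlation can arise without any causal link between x and y.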