1
Scatter plots & Association
Statistics is about … variation. Recognize, quantify and try to explain variation. Variation in contents of cola cans can be explained, in part, by the type of cola in the cans. Statistics is about variation. We wish to recognize, quantify and try to explain variation. We have done some of this in our example of weights of contents of cans of cola. We saw that there was variation in the weights. With the second histogram (intervals of 5 grams) we saw that there were two distinct mounds indicating the possibility of two groups. It turned out that there were two groups, diet and regular. Once we introduced this second (explanatory) variable we were able to see that diet colas tended to weigh less than regular colas.
2
Scatter plots & Association
Response variable – variable of primary interest. Explanatory variable – variable used to try to explain variation in the response. When we have two variables, we will distinguish between the two by identifying the one of primary interest as the response variable and the other as an explanatory variable. The explanatory variable will be used to try to explain variation in the response. In our cola example, our response variable was weight of contents (g) and the explanatory variable was a categorical variable, type of cola (Diet or Regular).
3
Scatter plots & Association
When both the response and the explanatory variables are quantitative, display them both in a scatter plot. Look for a general pattern of association. When both variables are quantitative, we can display the relationship between the explanatory and response variable in a scatter plot.
4
Scatter plots & Association
Example: Tar (mg) and carbon monoxide (mg) in cigarettes. y, Response: CO (mg). x, Explanatory: Tar (mg). Cases: 25 brands of cigarettes. Who? Brands of cigarettes. What? Tar (mg) and carbon monoxide (mg). We will use the tar value (which is usually given on the package of cigarettes) as the explanatory variable for the carbon monoxide content (something that is not advertised).
5
Scatter plot When constructing a scatter plot always put the explanatory variable on the horizontal (X) axis and the response on the vertical (Y) axis.
6
Positive Association Above average values of CO are associated with above average values of Tar. Below average values of CO are associated with below average values of Tar. Positive association or a general positive trend. As one variable increases (decreases) the other variable tends to increase (decrease). The variables move in the same direction.
7
Scatter plots & Association
Example: Outside temperature and amount of natural gas used. Response: Natural gas (1000 ft³). Explanatory: Outside temperature (°C). Cases: 26 days.
8
Negative Association A negative association occurs when as one variable increases (decreases) the other variable tends to decrease (increase). The variables move in opposite directions.
9
Negative Association Above average values of gas are associated with below average temperatures. Below average values of gas are associated with above average temperatures.
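The "above average / below average" reading of positive and negative association can be checked directly by comparing each point with the two means. The sketch below is illustrative only: the function name `quadrant_counts` and both small data sets are hypothetical, not the cigarette or natural gas measurements from the slides.

```python
from statistics import mean

def quadrant_counts(x, y):
    """Count points whose deviations from the two means have the same
    sign (both above or both below average) versus opposite signs."""
    mx, my = mean(x), mean(y)
    same = sum(1 for a, b in zip(x, y) if (a - mx) * (b - my) > 0)
    opposite = sum(1 for a, b in zip(x, y) if (a - mx) * (b - my) < 0)
    return same, opposite

# Hypothetical data with a positive trend (y tends to rise with x).
x_pos = [1.0, 4.0, 8.0, 12.0, 15.0, 17.0]
y_pos = [2.0, 5.0, 10.0, 12.0, 14.0, 18.0]

# Hypothetical data with a negative trend (y tends to fall as x rises).
x_neg = [-4.0, 0.0, 5.0, 10.0, 15.0]
y_neg = [10.0, 8.0, 7.0, 5.0, 2.0]

pos_counts = quadrant_counts(x_pos, y_pos)
neg_counts = quadrant_counts(x_neg, y_neg)
print("positive trend (same, opposite):", pos_counts)
print("negative trend (same, opposite):", neg_counts)
```

When most points fall in the "same" group the association is positive; when most fall in the "opposite" group it is negative.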
10
Correlation Linear Association
How closely do the points on the scatter plot represent a straight line? The correlation coefficient gives the direction of the linear association and quantifies the strength of the linear association between two quantitative variables. Correlation is a very special kind of association. If one variable tends to move linearly with another variable then that linear association can be quantified by the correlation between the two variables. Correlation applies only to linear association between two quantitative variables.
11
Correlation Standardize y Standardize x
In order to look at correlation we have to look at the explanatory and response variables on standardized scales.
12
ZxZy > 0 The green points have the property that the two standardized variables (zx and zy) have the same sign, either both positive (upper right) or both negative (lower left), so their product is positive. The red points have the property that the two standardized variables have opposite signs (if one is positive, the other is negative), so their product is negative. Multiplying the standardized scores for each point gives that point's contribution to the measure of the correlation (linear association) between the two variables. Dividing the sum of the products by n – 1 produces the Pearson product-moment correlation coefficient, r.
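The calculation described above (standardize both variables, multiply the z-scores point by point, sum, and divide by n – 1) can be sketched in a few lines. This is a minimal illustration, assuming the sample standard deviation is used for standardizing; the function names and the data are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(x, y):
    """Pearson r as the sum of z-score products divided by n - 1."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Hypothetical illustrative data (not the cigarette measurements).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Each product is one point's contribution to r.
products = [a * b for a, b in zip(standardize(x), standardize(y))]
r = correlation(x, y)
print("contributions:", [round(p, 3) for p in products])
print("r =", round(r, 4))
```

Points with a positive product pull r toward +1, and points with a negative product pull it toward –1.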
13
Correlation Coefficient
If we substitute the formulas for zx and zy, we get the second expression for the formula for the correlation coefficient. This is something we don’t want to calculate by hand and so we will leave this to JMP to compute for us.
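For reference, the two expressions can be written out as follows. This is a reconstruction from the description in the notes, assuming s_x and s_y denote the sample standard deviations of x and y and that sums run over all n cases.

```latex
z_x = \frac{x - \bar{x}}{s_x}, \qquad z_y = \frac{y - \bar{y}}{s_y}

r = \frac{\sum z_x z_y}{n - 1}
  = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}
```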
14
Correlation Conditions
Correlation applies only to quantitative variables. Correlation measures the strength of linear association. Outliers can distort the value of the correlation coefficient.
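The effect of an outlier on r can be seen in a small experiment. The sketch below uses made-up numbers and a hypothetical helper `pearson_r`; it compares r for a nearly linear data set with r after one unusual point is added.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation from sums of squared and cross deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data that lie close to a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 6, 8, 9]
r_clean = pearson_r(x, y)

# The same data plus a single point far from the pattern.
r_outlier = pearson_r(x + [7], y + [0])

print("without outlier: r =", round(r_clean, 3))
print("with outlier:    r =", round(r_outlier, 3))
```

A single point is enough to turn a strong correlation into a weak one here, which is why it is worth looking at the scatter plot before trusting the value of r.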
15
Correlation Coefficient
Tar and CO r =
16
Correlation Coefficient
There is a strong positive correlation, linear association, between the tar content and carbon monoxide content of the various cigarette brands.
17
JMP Analyze – Multivariate Methods – Multivariate. Y, Columns: Tar (mg), CO (mg). The blue triangle next to a variable (column) in JMP indicates that it is a quantitative variable. When both variables are quantitative then computing the correlation is appropriate.
18
JMP does not ask which variable is the response and which is the explanatory variable, so it puts each variable in both roles. The plot we are interested in is the lower left because CO is on the vertical axis and Tar is on the horizontal axis. Note that the correlation between Tar and CO is the same as the correlation between CO and Tar.
19
Correlation Properties
The sign of r indicates the direction of the association. The value of r is always between –1 and +1. Correlation has no units. Correlation is not affected by changes of center or scale. The value of r = –1 is a perfect negative linear association (all points in the scatter plot would fall on a straight line with a negative slope). The value of r = +1 is a perfect positive linear association (all points in the scatter plot would fall on a straight line with a positive slope).
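These properties can be checked numerically. The sketch below is an illustration with made-up temperatures and gas usage (not the 26 days from the example) and a hypothetical helper `pearson_r`; converting °C to °F changes both center and scale but should leave r unchanged, and points exactly on a line should give r = +1 or –1.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation from sums of squared and cross deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical temperatures (deg C) and gas usage, for illustration only.
temp_c = [-5.0, 0.0, 5.0, 10.0, 15.0]
gas = [10.0, 8.0, 7.0, 5.0, 2.0]

# Change of center and scale: Celsius to Fahrenheit.
temp_f = [9 / 5 * c + 32 for c in temp_c]

r_c = pearson_r(temp_c, gas)
r_f = pearson_r(temp_f, gas)

# Points exactly on a line with positive or negative slope.
line_up = pearson_r(temp_c, [2 * c + 1 for c in temp_c])
line_down = pearson_r(temp_c, [-2 * c + 1 for c in temp_c])

print("r in deg C:", round(r_c, 4), " r in deg F:", round(r_f, 4))
print("perfect lines:", round(line_up, 4), round(line_down, 4))
```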
20
Correlation Cautions “Correlation” and “Association” are different.
Correlation – specific (linear). Association – vague (trend). Don’t correlate categorical variables.
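The difference between association and correlation shows up clearly with a curved pattern. In the sketch below (hypothetical data and helper name), y is completely determined by x, so the association is as strong as it can be, yet the linear correlation is zero because the pattern is a symmetric curve rather than a line.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation from sums of squared and cross deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: y depends on x exactly, but along a parabola.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curve = pearson_r(x, y)
print("r for the curved pattern:", r_curve)
```

A value of r near zero therefore says only that there is no linear association, not that the variables are unrelated.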
21
Correlation Cautions Don’t confuse correlation with causation.
There is a strong positive correlation between the number of crimes committed in communities and the number of 2nd graders in those communities. Beware of lurking variables. A mantra for this, and subsequent, chapters is: “Correlation is not causation.”