1
Examining Relationships
Scatterplots and Correlation
2
The “W’s”
When you examine the relationship between two or more variables, first ask the familiar key questions about the data:
Who (are the individuals described by the data)
What (are the variables, and in what units)
Why (were the data collected, if possible)
When, where, how, and by whom (were the data produced)
Note: the answers to “who” and “what” are essential.
3
Response Variable – measures an outcome of a study (the dependent variable)
Explanatory Variable – helps explain or influences changes in a response variable (the independent variable)
Lurking Variable – a variable in the background that can strongly influence the relationship between two variables
4
Looking at Scatterplots
Scatterplots may be the most common and most effective display for data. In a scatterplot, you can see patterns, trends, relationships, and even the occasional extraordinary value sitting apart from the others. Scatterplots are the best way to start observing the relationship and the ideal way to picture associations between two quantitative variables.
5
Looking at Scatterplots (cont.)
When looking at scatterplots, we will look for direction, form, strength, and unusual features. Direction: A pattern that runs from the upper left to the lower right is said to have a negative direction. A trend running the other way has a positive direction.
6
Looking at Scatterplots (direction)
Figure 3.1 from the text shows a negative association between the percent of high school seniors who take the SAT and their mean SAT math scores.
7
Looking at Scatterplots (direction)
This figure shows a positive association between the year since 1900 and the % of people who say they would vote for a woman president. As the years have passed, the percentage who would vote for a woman has increased.
8
Looking at Scatterplots (form)
Linear – a straight-line relationship will appear as a cloud or swarm of points stretched out in a generally consistent, straight form.
9
Looking at Scatterplots (form)
Curved - the relationship curves gently, while still increasing or decreasing steadily or
10
Looking at Scatterplots (form)
Clustered – when there are clear groups within the data
11
Looking at Scatterplots (cont.)
Strength: At one extreme, the points appear to follow a single stream (whether straight, curved, or bending all over the place).
12
Looking at Scatterplots (cont.)
Strength: At the other extreme, the points appear as a vague cloud with no discernible trend or pattern. Note: we will quantify the amount of scatter soon.
13
Interpreting a Scatterplot
The scatterplot shows the mean SAT Math score in each state against the percent of that state’s high school seniors who took the SAT. The dotted lines intersect at the point (25, 560), the data for Colorado. Interpret the scatterplot: look for patterns and any important deviations from the pattern. See p. 176 for the text solution.
14
Adding Categorical Variables to Scatterplots
Dividing the states into southern and non-southern introduces a third variable into the scatterplot. This categorical variable has only two values.
15
Correlation
Data collected from students in Statistics classes included their heights (in inches) and weights (in pounds). Here we see a positive association and a fairly straight form, although there seems to be a high outlier.
16
Correlation (cont.)
How strong is the association between weight and height of Statistics students? If we had to put a number on the strength, we would not want it to depend on the units we used. A scatterplot of heights (in centimeters) and weights (in kilograms) doesn’t change the shape of the pattern.
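The unit-independence claim can be checked numerically. The sketch below uses made-up heights and weights (not the class data from the slides) and a hand-written `correlation` helper, both of which are assumptions for illustration; it computes the correlation once in inches and pounds and once in centimeters and kilograms.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data, NOT the class data from the slides.
heights_in = [62, 64, 66, 68, 70, 72, 74]
weights_lb = [115, 130, 128, 150, 160, 158, 185]

# Convert to metric units: a change of scale only.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)

# The two values agree up to floating-point rounding.
print(r_imperial, r_metric)
```

Multiplying a variable by a positive constant rescales its deviations and its spread by the same factor, so the factor cancels in the ratio.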
17
Correlation (cont.)
Since the units don’t matter, why not remove them altogether? We could standardize both variables and write the coordinates of a point as (z_x, z_y). Here is a scatterplot of the standardized weights and heights.
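Standardizing can be sketched in a few lines. The `standardize` helper and the heights below are hypothetical and only illustrate the idea: each value has the mean subtracted and is divided by the sample standard deviation, so the result carries no units.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return [(v - mean) / sd for v in values]

# Made-up illustrative heights, NOT the class data from the slides.
heights_in = [62, 64, 66, 68, 70, 72, 74]
heights_cm = [h * 2.54 for h in heights_in]

z_heights_in = standardize(heights_in)
z_heights_cm = standardize(heights_cm)

# The z-scores are the same whichever unit we started from.
for z_in, z_cm in zip(z_heights_in, z_heights_cm):
    print(round(z_in, 4), round(z_cm, 4))
```

Standardized values have mean 0 and standard deviation 1, which is why the standardized scatterplot is centered at the origin.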
18
Correlation
The correlation coefficient (r) gives us a numerical measurement of the direction and strength of the linear relationship between the explanatory and response variables. For the students’ heights and weights, the correlation is
19
How correlation measures the strength of a linear relationship. Patterns closer to a straight line have correlations closer to 1 or -1.
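One standard way to write the coefficient uses the standardized values z_x and z_y. The symbols n (the number of data points), s_x, and s_y (the sample standard deviations) are assumptions here, since the slides do not name them:

```latex
r = \frac{\sum z_x z_y}{n - 1},
\qquad
z_x = \frac{x - \bar{x}}{s_x},
\quad
z_y = \frac{y - \bar{y}}{s_y}
```

With this definition r always lies between -1 and 1, and it equals exactly 1 or -1 only when the points fall exactly on a straight line.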
20
Correlation Cautions
Correlation measures the strength of the linear association between two quantitative variables. Before you use correlation, you must check several conditions:
Quantitative Variables Condition
Straight Enough Condition
Outlier Condition
21
Correlation Conditions (cont.)
Quantitative Variables Condition: Correlation applies only to quantitative variables. Don’t apply correlation to categorical data masquerading as quantitative. Check that you know the variables’ units and what they measure.
22
Correlation Conditions (cont.)
Straight Enough Condition: You can calculate a correlation coefficient for any pair of variables. But correlation measures the strength only of the linear association, and will be misleading if the relationship is not linear.
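A small constructed example shows how misleading r can be for a curved pattern. The data and the `correlation` helper below are illustrative assumptions: y is determined exactly by x (y = x squared), yet the correlation comes out as zero because the relationship is not linear.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Constructed data: a perfect, but curved, relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = correlation(xs, ys)

# r is zero even though y is completely determined by x.
print(r_curved)
```

A scatterplot would reveal the U shape immediately, which is why the plot should be checked before the number is trusted.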
23
Correlation Conditions (cont.)
Outlier Condition: Correlation is not resistant. An outlier can make an otherwise small correlation look big or hide a large correlation. It can even give an otherwise positive association a negative correlation coefficient (and vice versa). When you see an outlier, it’s often a good idea to report the correlations with and without the point.
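Reporting the correlation with and without a suspect point can be sketched as follows. The data are made up for illustration, and the choice of which point counts as the outlier is an assumption made by hand here, not an automatic rule.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: six points with almost no linear trend...
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0]

# ...plus one point far from the rest (picked by hand as the outlier).
outlier_x, outlier_y = 20, 10

r_without = correlation(xs, ys)
r_with = correlation(xs + [outlier_x], ys + [outlier_y])

print("without outlier:", round(r_without, 3))
print("with outlier:   ", round(r_with, 3))
```

In this constructed case the single added point turns a weak, slightly negative correlation into a strong positive one, which illustrates both the size change and the sign flip described above.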
24
Variability
Correlation is given by r, which indicates the strength and direction of a linear association. The variability explained is given by r^2: 100r^2 is the percent of the variability in the dependent variable that can be explained by its linear relationship with the independent variable. The remaining variability is not explained by that relationship; it may be due to lurking variables or to random variation.
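The "percent of variability explained" reading can be checked on a small example. The sketch below uses made-up data and a hypothetical `summarize` helper; it assumes a least-squares line as the linear model (the slides do not introduce regression), and compares r squared with the fraction of the variation in y that the line accounts for.

```python
import math

def summarize(xs, ys):
    """Return (r, explained), where explained = 1 - SSE/SST
    for the least-squares line fitted to the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y (SST)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after fitting the line (SSE).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r, 1 - sse / syy

# Made-up illustrative data, NOT the class data from the slides.
heights_in = [62, 64, 66, 68, 70, 72, 74]
weights_lb = [115, 130, 128, 150, 160, 158, 185]

r, explained = summarize(heights_in, weights_lb)
percent_explained = 100 * r ** 2

print("r =", round(r, 3))
print("percent of variability explained =", round(percent_explained, 1))
```

The two quantities `r ** 2` and `explained` agree up to rounding, which is the sense in which r squared measures explained variability.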