1
Describing Scatterplots
Section 3.1
3
Data
Univariate data: data that involve a single variable per case
This is the type of data we have been working with so far.
6
Data
Bivariate data: data that involve two variables per case
For quantitative variables, bivariate data are often displayed as a scatterplot.
8
Scatterplot
A scatterplot shows the relationship between two quantitative variables, plotted as y vs. x.
10
Describing Scatterplots
Recall, for univariate data we use shape, center, and spread to summarize data. For bivariate data, we use shape, trend, and strength.
11
Describing Scatterplots
Describing scatterplots is a 6-step process.
17
6-Step Process
1. Identify cases and variables
2. Describe overall shape
3. Describe trend
4. Describe strength
5. Does the pattern generalize to other cases?
6. Is there a plausible explanation for the pattern, or is there a lurking variable?
19
1. Identify Cases and Variables
Need these to put the data in context; otherwise, data are meaningless
21
Cases and Variables
Scatterplot: each point represents one case
The x-coordinate equals the value of one variable and the y-coordinate equals the value of the other variable
Describe the scale (units of measurement) and range of each variable
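As a minimal sketch of this step, assume a small made-up data set. The names `cases`, `age_years`, and `height_in`, and all of the values, are hypothetical and do not come from the slides. Each case is stored as one record and becomes one (x, y) point, and the range of each variable is read off those points:

```python
# Hypothetical bivariate data: one record per case (all values are invented).
cases = [
    {"case": "A", "age_years": 2, "height_in": 35.0},
    {"case": "B", "age_years": 3, "height_in": 37.5},
    {"case": "C", "age_years": 4, "height_in": 40.5},
    {"case": "D", "age_years": 5, "height_in": 43.0},
]


def to_points(records, x_key, y_key):
    """Return one (x, y) point per case."""
    return [(record[x_key], record[y_key]) for record in records]


def variable_range(values):
    """Return (minimum, maximum) of one variable."""
    return min(values), max(values)


points = to_points(cases, "age_years", "height_in")
xs = [x for x, _ in points]
ys = [y for _, y in points]

# Report the scale (units) together with the range of each variable.
print("x = age (years), range:", variable_range(xs))
print("y = height (inches), range:", variable_range(ys))
```

Keeping the units in the printed labels is a reminder that the numbers mean little without their context.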
25
2. Describe Overall Shape
Linearity: is the pattern linear, curved, or none at all?
Clusters: is there just one cluster or more than one?
Outliers: are there any striking exceptions to the pattern?
26
Shape: Linear
Points do not have to form a line to have a linear pattern!
28
Shape: Curved
31
Shape: None
33
Outliers: Extreme x-value
35
Outliers: Extreme y-value
37
Outliers: Do Not Follow General Trend
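The first two kinds of outlier (extreme x-value, extreme y-value) can be screened for one variable at a time. The sketch below is one possible screen, not a method from the slides. It flags values that lie more than `k` median absolute deviations (MADs) from the median. The function name `extreme_indices`, the cutoff `k=3.0`, and the data are all assumptions made for illustration. It does not detect the third kind, points that break the general trend, which requires looking at both variables together.

```python
from statistics import median


def extreme_indices(values, k=3.0):
    """Return indices of values more than k MADs away from the median.

    MAD is the median absolute deviation. The cutoff k=3.0 is an
    arbitrary choice for this sketch, not a rule from the slides.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to compare against, so flag nothing
    return [i for i, v in enumerate(values) if abs(v - center) > k * mad]


# Hypothetical data: the last case has an unusually large x-value only.
xs = [1, 2, 3, 4, 5, 6, 20]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.0]

print("extreme x at indices:", extreme_indices(xs))
print("extreme y at indices:", extreme_indices(ys))
```

In this made-up data the last point is flagged for its x-value but not for its y-value. A flagged point is only a candidate for closer inspection; the plot itself remains the final check.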
39
Clusters
Is there just one cluster or is there more than one?
40
3. Describe Trend
If y tends to get larger as x gets larger, there is a positive trend.
42
Trend
Think of slope!
43
Trend
If y tends to get smaller as x gets larger, there is a negative trend.
45
Trend
If there is no shape, then there is no trend.
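"Think of slope" can be made concrete with a short sketch. It computes the least-squares slope of y on x and reads the trend from its sign. The function names, the `tolerance` parameter, and the data are assumptions for illustration. A slope of zero is only a rough stand-in for "no shape," and a single slope says little about a curved pattern.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def describe_trend(xs, ys, tolerance=1e-9):
    """Label the trend from the sign of the slope (tolerance is arbitrary)."""
    slope = least_squares_slope(xs, ys)
    if slope > tolerance:
        return "positive"
    if slope < -tolerance:
        return "negative"
    return "none"


# Hypothetical data sets sharing the same x-values.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 6]    # y tends to rise as x rises
ys_down = [6, 4, 5, 4, 2]  # y tends to fall as x rises
ys_flat = [3, 3, 3, 3, 3]  # y does not change with x

for label, ys in [("up", ys_up), ("down", ys_down), ("flat", ys_flat)]:
    print(label, describe_trend(xs, ys))
```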
47
4. Describe Strength
If the points cluster closely around an imaginary line or curve, the relationship is strong.
49
4. Describe Strength
As the points scatter farther from an imaginary line or curve, the strength decreases.
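One common numerical summary of how tightly points cluster around a line is the correlation coefficient r. These slides do not introduce it, so the sketch below is an assumption about how strength might be quantified, and the data are hypothetical. Note that r measures clustering around a line only, so a strong curved pattern can still produce a modest r.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r between two variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: same x-values, different amounts of scatter in y.
xs = [1, 2, 3, 4, 5, 6]
ys_tight = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0]  # points hug a line
ys_loose = [5, 1, 9, 4, 12, 7]              # points scatter widely

r_tight = correlation(xs, ys_tight)
r_loose = correlation(xs, ys_loose)
print("tight cluster: r =", round(r_tight, 3))
print("loose scatter: r =", round(r_loose, 3))
```

Both data sets have a positive trend, but the tightly clustered one gives an r much closer to 1.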
53
Variability of Strength
Is strength constant or does the strength vary?
55
Variability: Fairly Uniform
Could be strong, moderate, or weak and still be fairly uniform
56
Variability: Heteroscedasticity
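Heteroscedasticity means that the spread of y around the trend changes as x changes. A rough way to check for it, sketched below under stated assumptions, is to fit a least-squares line, split the cases into a smaller-x half and a larger-x half, and compare the typical size of the residuals in each half. The function names `fit_line` and `spread_ratio` and both data sets are hypothetical, and the sketch assumes the smaller-x half has some nonzero residual spread.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def spread_ratio(xs, ys):
    """Mean |residual| in the larger-x half divided by the smaller-x half.

    Assumes the smaller-x half has nonzero residual spread.
    """
    slope, intercept = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))  # order the cases by x
    residuals = [abs(y - (slope * x + intercept)) for x, y in pairs]
    half = len(residuals) // 2
    lower = sum(residuals[:half]) / half
    upper = sum(residuals[half:]) / (len(residuals) - half)
    return upper / lower


# Hypothetical data: deviations from the line y = 2x alternate in sign.
xs = list(range(1, 11))
ys_fanning = [2 * x + (-1) ** x * 0.3 * x for x in xs]  # spread grows with x
ys_uniform = [2 * x + (-1) ** x * 1.0 for x in xs]      # spread stays constant

print("fanning ratio:", round(spread_ratio(xs, ys_fanning), 2))
print("uniform ratio:", round(spread_ratio(xs, ys_uniform), 2))
```

A ratio near 1 suggests fairly uniform variability. A ratio well above 1 suggests that y varies more at larger x, and a ratio well below 1 suggests that y varies more at smaller x.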
58
5. Pattern Generalize?
Does the pattern generalize to other cases, or is the relationship a case of “what you see is all you get”?
62
6. Plausible Explanation
Are there plausible explanations for the pattern?
Is it reasonable to conclude that a change in one variable causes a change in the other?
Is there a third, or lurking, variable that might be causing both variables to change?
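A small simulation can illustrate the lurking-variable question. In the sketch below, which uses entirely simulated data and hypothetical names, a hidden variable `z` drives both `x` and `y`, and neither of them affects the other. The two still show a strong relationship, which disappears once the contribution of `z` is removed.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation coefficient r between two variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated lurking variable z; x and y each depend on z plus their own noise.
z = [rng.uniform(0, 10) for _ in range(200)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]
r_xy = correlation(x, y)

# Subtract z, which is possible only because this is a simulation where z
# is known exactly. What remains is the independent noise in x and in y.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = correlation(x_rest, y_rest)

print("r between x and y:", round(r_xy, 2))
print("r after removing z:", round(r_rest, 2))
```

With real data the lurking variable is usually unmeasured, so it cannot simply be subtracted. The point of the sketch is only that a strong pattern does not by itself show that one variable causes the other.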
64
Generalization vs Explanation
Prices of houses sold in Gainesville, FL, in one month
65
Example
Turn to page 110 and look at problem P1.
66
Page 110, P1
67
Page 110, P1
b. These data are not very interesting to describe. The x-axis shows ages 2 to 7 years, and the y-axis shows the median height of children at each age. The shape is linear, the trend is positive, and the strength is very strong. That is, the scatterplot shows a very strong positive linear trend. Students may mention that a typical child grows about 2.7 inches per year.
68
Page 110, P1
c. The linear trend could reasonably be expected to hold for another year. However, median height could not be expected to increase at this rate to age 50, as people typically stop growing around age 20.
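The danger of extending the line far beyond the data can be shown with a little arithmetic. The sketch below uses the growth rate of about 2.7 inches per year quoted for this problem. The starting value, a median height of 35 inches at age 2, is a hypothetical number chosen only for illustration, as is the function name.

```python
GROWTH_IN_PER_YEAR = 2.7   # from the problem: typical growth per year
HEIGHT_AT_AGE_2_IN = 35.0  # hypothetical starting value, not from the source


def predicted_height_in(age_years):
    """Height in inches if the linear trend continued unchanged from age 2."""
    return HEIGHT_AT_AGE_2_IN + GROWTH_IN_PER_YEAR * (age_years - 2)


# Age 7 is inside the data, age 8 is one year beyond, age 50 is far beyond.
for age in (7, 8, 50):
    height = predicted_height_in(age)
    print(f"age {age}: {height:.1f} in ({height / 12:.1f} ft)")
```

Under these assumptions the line gives a believable height of about 51 inches at age 8 but more than 13 feet at age 50, so the pattern generalizes only a short distance beyond ages 2 to 7.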
69
Page 110, P1
d. In the background is something called “growing up” that happens over time during the early years of life. That is, an increase in age is associated with an increase in height.
70
Page 111, E1
71
Page 111, E1
Plot a shows a positive relationship that is strong and linear. There is fairly uniform variation across all values of x.
72
Page 111, E1
Plot b shows a negative relationship that is strong and linear, again with fairly uniform variation across all values of x.
73
Page 111, E1
Plot c shows a positive relationship that is moderate and linear with fairly uniform variation across all values of x. One point lies a short distance from the bulk of the data.
74
Page 111, E1
Plot d shows a negative relationship that is moderate and linear with fairly uniform variation across all values of x. Again, there is one outlier.
75
Page 111, E1
Plot e shows a positive relationship that is strong and linear except for the outlier. The one outlier has dramatic influence on the strength of this relationship. There is fairly uniform variation across all values of x.
76
Page 111, E1
Plot f shows a negative relationship that is very strong and curved. One point on the far right lies in the general pattern but far away from the remainder of the data, which accentuates the strong relationship. Another outlier lies below the bulk of the data on the left.
77
Page 111, E1
Plot g shows a negative relationship that is strong and curved. The two points at either end of the array accentuate the curvature. There is a bit more variability among values of y for smaller values of x than for larger values of x.
78
Page 111, E1
Plot h shows a positive relationship that is strong and curved. Again, the outlier on the extreme right accentuates the curved pattern and would have dramatic influence on where a trend line might be placed. The variability in y is fairly constant across all values of x.
79
Page 113, E5
80
Page 113, E5
a. Plots A, B, and C are the most linear. Plot D is not linear because of the seven universities in the lower right, which may be different from the rest. Plot A, of graduation rate versus alumni giving rate, gives some impression of downward curvature. However, if you disregard the point in the upper right, the impression of any curvature disappears.
81
Page 113, E5
Plots A, B, and C all have just one cluster. However, plot D, the plot of graduation rate versus top 10% in high school, has two clusters. Most of the points follow the upward linear trend, but the cluster of seven points in the lower right with the highest percentage of freshmen in the top 10% shows little relationship with the graduation rate.
82
Page 113, E5
Plots A and C have possible outliers. Plot A, the plot of graduation rate versus alumni giving rate, has a possible outlier in the upper right. The point is below the general trend and its x-value (but not its y-value) is unusually large. In plot C, the plot of graduation rate versus SAT 75th percentile, the points toward the upper left and the middle right should be examined because they are farther from the general trend than the other points, although neither their x-values nor their y-values are unusual. The point in the lower right of plot D should also be examined, along with the other six points nearby.
83
Page 113, E5
b. Plots A and C have similar moderate positive linear trends. Plot D shows wide variation in both variables, with little or no trend. Plot B is the only plot that shows a negative
84
c. Among these four variables, it appears that the alumni giving rate is the best predictor of the graduation rate, and SAT scores (as measured by the 75th percentile) are second best. However, both of these relationships are moderate and neither is a strong predictor of graduation rate. Ranking in high school class (as measured by the top 10%) is almost useless as a predictor of college graduation rate. Plot A, of graduation rate versus alumni giving rate, owes part of the impression of a strong relationship to the point in the upper right. This plot shows some heteroscedasticity, with the graduation rate varying more with smaller alumni giving rates.
85
Questions?