1
Exploring Data Chapter 1
2
Patterns from Histogram A
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
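A minimal Python sketch of these two summaries, assuming the observations are a plain list of numbers. The function name `center_and_spread` is hypothetical. It takes the center as the median, since that is the value that splits the observations roughly in half, and reports the spread as the smallest and largest values.

```python
from statistics import median


def center_and_spread(values):
    """Return (center, (smallest, largest)) for a list of numbers.

    The center is the median; the spread is reported as the range endpoints.
    """
    if not values:
        raise ValueError("need at least one observation")
    data = sorted(values)
    return median(data), (data[0], data[-1])


# Hypothetical observations, chosen only to illustrate the output format.
print(center_and_spread([25, 30, 35, 40, 45]))  # (35, (25, 45))
```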
3
Histogram A example: center 35, spread from 25 to 45
4
Histogram A example: center, spread
5
Patterns from Histogram B
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Shape: overall appearance of the distribution
6
Histogram B example: skewed right
7
Histogram B example: skewed left
8
Histogram B example: symmetrical, mound-shaped
9
Histogram B example: uniform
10
Histogram B example: bimodal
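Shape is usually judged by eye, but the direction of skew can also be checked numerically. This sketch assumes the moment-based skewness coefficient g1 = m3 / m2^(3/2) as the measure (a standard statistic that these slides do not define) and a hypothetical cutoff of 0.5 for calling a distribution skewed. The number says nothing about uniform or bimodal shapes: a symmetric uniform or bimodal distribution simply comes out as roughly symmetrical.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 using population moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return m3 / m2 ** 1.5


def describe_skew(values, cutoff=0.5):
    """Label the skew direction; the 0.5 cutoff is an arbitrary illustration."""
    g1 = skewness(values)
    if g1 > cutoff:
        return "skewed right"  # long tail toward larger values
    if g1 < -cutoff:
        return "skewed left"   # long tail toward smaller values
    return "roughly symmetrical"
```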
11
Patterns from Histogram C
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Shape: overall appearance of the distribution
Unusual features: gaps, clusters, and outliers
12
Histogram C example: roughly symmetrical, with gaps at 30 and 40
13
Histogram C example: uniform, with a possible outlier at 5
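Gaps like the ones in the first Histogram C example show up as empty class intervals with data on both sides. A minimal sketch, assuming the histogram is given as a list of bin edges plus a list of counts; `find_gaps` and the sample counts are hypothetical. Whether an isolated value, such as the one near 5, is an outlier remains a judgment call and is not flagged here.

```python
def find_gaps(edges, counts):
    """Return (low, high) for every empty bin lying between non-empty bins.

    edges has one more entry than counts: bin i covers [edges[i], edges[i+1]).
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    occupied = [i for i, c in enumerate(counts) if c > 0]
    if not occupied:
        return []
    first, last = occupied[0], occupied[-1]
    # Empty bins at either end are not gaps; only interior empty bins count.
    return [(edges[i], edges[i + 1]) for i in range(first, last + 1) if counts[i] == 0]


# Hypothetical counts, for illustration only.
print(find_gaps([20, 25, 30, 35, 40, 45, 50], [2, 3, 0, 4, 0, 1]))  # [(30, 35), (40, 45)]
```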
14
Displaying Distributions with Graphs
Categorical versus quantitative data
Categorical: bar graphs, pie charts
Quantitative: dotplots, histograms, stemplots, boxplots
15
Frequency Distributions
A frequency distribution is a table that displays the categories, frequencies, relative frequencies, and/or cumulative relative frequencies. The frequency for a particular category is the number of observed responses that fall into that category. The corresponding relative frequency is the fraction or proportion of observed responses in the category (its frequency divided by the total number of responses). The cumulative relative frequency is the fraction or proportion of observed responses in the current category and all categories listed before it.
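All three columns follow from a simple count. A minimal sketch, assuming the responses arrive as a list of category labels and the category order is supplied explicitly, because the cumulative column depends on that order. The function name and the sample responses are hypothetical.

```python
from collections import Counter


def frequency_distribution(responses, categories):
    """Return (category, frequency, relative freq, cumulative relative freq) rows.

    Categories are processed in the order given, since the cumulative column
    depends on that order.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    unknown = set(counts) - set(categories)
    if unknown:
        raise ValueError(f"responses outside the listed categories: {unknown}")
    rows, running = [], 0
    for cat in categories:
        freq = counts[cat]  # Counter returns 0 for unseen categories
        running += freq
        rows.append((cat, freq, freq / n, running / n))
    return rows


# Hypothetical survey responses, for illustration only.
for row in frequency_distribution(["A", "B", "A", "C"], ["A", "B", "C"]):
    print(row)
```

The last cumulative relative frequency is always 1, since every response falls in some listed category.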
16
Creating Histograms
The difficulty with continuous data is that there are no natural categories, so we must define our own categories, or class intervals. The quantity √n, where n is the number of observations, often gives a rough estimate of an appropriate number of intervals. Alternatively, Sturges' Rule takes 1 + log₂ n classes, rounded to the nearest integer.
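Both rules of thumb are quick to compute. A minimal sketch, assuming the square-root guideline means about √n classes and Sturges' Rule means 1 + log₂ n classes, each rounded to the nearest integer; the function names are hypothetical.

```python
import math


def sqrt_rule(n):
    """Square-root guideline: about sqrt(n) class intervals."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return round(math.sqrt(n))


def sturges_rule(n):
    """Sturges' Rule: 1 + log2(n) class intervals, rounded to the nearest integer."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return round(1 + math.log2(n))


for n in (27, 100):
    print(n, sqrt_rule(n), sturges_rule(n))
```

The two rules can disagree (for n = 100 they give 10 and 8 classes), so either result is a starting point rather than a requirement.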
17
Example
Exit  Name                    Miles
1     Ohio Gateway            *
1A    New Castle              8
2     Beaver Valley           3.4
3     Cranberry               15.6
4     Butler Valley           10.7
5     Allegheny Valley        8.6
6     Pittsburgh              8.9
7     Irwin                   10.8
8     New Stanton             8.1
9     Donegal                 15.2
10    Somerset                19.2
11    Bedford                 35.6
12    Breezewood              15.9
13    Fort Littleton          18.1
14    Willow Hill             9.1
15    Blue Mountain           12.7
16    Carlisle                25
17    Gettysburg Pike         9.8
18    Harrisburg West Shore   5.9
19    Harrisburg East         5.4
20    Lebanon-Lancaster
21    Reading                 19.1
22    Morgantown              12.8
23    Downingtown             13.7
24    Valley Forge            14.3
25    Norristown              6.8
26    Fort Washington
27    Willow Grove            4.4
28    Philadelphia            8.4
29    Delaware Valley         6.4
30
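A sketch of sorting the Miles column into class intervals for a histogram, assuming a class width of 5 miles starting at 0 (an illustrative choice, not one made on the slide). The list holds only the 27 distances legible in the example table; exits with no value shown are left out, and `class_frequencies` is a hypothetical helper.

```python
import math
from collections import Counter

# Miles values legible in the example table (exits with no value are omitted).
MILES = [8, 3.4, 15.6, 10.7, 8.6, 8.9, 10.8, 8.1, 15.2, 19.2, 35.6, 15.9, 18.1, 9.1,
         12.7, 25, 9.8, 5.9, 5.4, 19.1, 12.8, 13.7, 14.3, 6.8, 4.4, 8.4, 6.4]


def class_frequencies(values, width, start=0):
    """Count values in intervals [start + k*width, start + (k+1)*width).

    Every interval between the lowest and highest occupied one is listed,
    including empty ones, so gaps stay visible.
    """
    keys = Counter(math.floor((v - start) / width) for v in values)
    return [(start + k * width, start + (k + 1) * width, keys[k])
            for k in range(min(keys), max(keys) + 1)]


# Print a sideways text histogram, one row per class interval.
for low, high, count in class_frequencies(MILES, 5):
    print(f"{low:>2} to <{high:<2} {'*' * count}")
```

With these values the 5 to <10 interval holds the most exits (11 of 27), and the empty 30 to <35 interval separates the single 35.6-mile value from the rest.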
18
Stemplots
[Stemplot, stems 2 through 9]
Key: 2|2 means 22 wpm
Median: 62
Spread: from 22 to 91
Shape: fairly symmetrical
No unusual features
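A minimal stemplot builder, assuming the key's convention that the stem is the tens digit and the leaf is the ones digit (2|2 means 22). The function names and the sample values are hypothetical. Decimal values are truncated rather than rounded, which is one common convention.

```python
def stemplot(values):
    """Return {stem: sorted leaves}, with stem = tens digit and leaf = ones digit.

    Values are truncated to whole numbers first; empty stems between the
    smallest and largest stem are kept so gaps stay visible.
    """
    if not values or any(v < 0 for v in values):
        raise ValueError("need at least one value, all non-negative")
    whole = [int(v) for v in values]  # int() truncates, e.g. 35.6 -> 35
    stems = {s: [] for s in range(min(whole) // 10, max(whole) // 10 + 1)}
    for w in whole:
        stems[w // 10].append(w % 10)
    for leaves in stems.values():
        leaves.sort()
    return stems


def render(stems):
    """Format the stemplot as 'stem | leaves' lines."""
    return "\n".join(f"{s} | {' '.join(map(str, leaves))}".rstrip()
                     for s, leaves in stems.items())


# Hypothetical speeds in wpm, for illustration only.
print(render(stemplot([22, 35, 41, 41, 62, 91])))
```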
19
Alfred Hitchcock Stemplot
[Stemplot, stems 8 through 13]
Key: 8|1 means 81 minutes
20
Split Stemplot
As with a histogram, we want to avoid piling too many data points into a small range (a single stem).
Example: ages at which a sample of 35 American mothers first gave birth
[Stemplot, stems 1 through 4]
Key: 1|4 means 14 years old
21
Split Stemplot
A split stemplot typically breaks each stem into a high half (leaves 5-9) and a low half (leaves 0-4).
[Stemplot, stems 1 through 4]
Key: 1|4 means 14 years old
22
Split Stemplot
[Split stemplot, stems 1L, 1H, 2L, 2H, 3L, 3H, 4L]
Key: 1|4 means 14 years old
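Splitting stems amounts to grouping values by value // 5 instead of value // 10. A minimal sketch, assuming the L (leaves 0-4) and H (leaves 5-9) labels used on the slide; the function name and the sample ages are hypothetical.

```python
def split_stemplot(values):
    """Return {label: sorted leaves}, splitting each tens stem into L (0-4) and H (5-9)."""
    if not values or any(v < 0 for v in values):
        raise ValueError("need at least one value, all non-negative")
    whole = [int(v) for v in values]  # truncate decimals to whole numbers

    def label(half_index):
        # half_index = value // 5, so two consecutive half-stems share one tens digit.
        stem, high = divmod(half_index, 2)
        return f"{stem}{'H' if high else 'L'}"

    # Keep every half-stem between the lowest and highest, even if empty.
    rows = {label(k): [] for k in range(min(whole) // 5, max(whole) // 5 + 1)}
    for w in whole:
        rows[label(w // 5)].append(w % 10)
    for leaves in rows.values():
        leaves.sort()
    return rows


# Hypothetical ages, for illustration only.
for stem, leaves in split_stemplot([14, 16, 19, 20, 23, 25, 31, 42]).items():
    print(stem, "|", " ".join(map(str, leaves)))
```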
25
Back-to-Back Stemplots
[Back-to-back stemplot of home runs: Babe Ruth and Roger Maris]
Key: 4 | 1 means 41
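A back-to-back stemplot shares one column of stems between two data sets, with the left-hand leaves written in decreasing order so both sides grow outward. A minimal sketch, assuming tens-digit stems as in the key (4 | 1 means 41); the function names and the sample values are hypothetical, not the Ruth and Maris data.

```python
def _leaves(values, stem, descending=False):
    """Ones digits of the values on this stem, as a space-separated string."""
    chosen = sorted((v for v in values if v // 10 == stem), reverse=descending)
    return " ".join(str(v % 10) for v in chosen)


def back_to_back(left, right):
    """Return 'left leaves | stem | right leaves' lines over shared tens-digit stems."""
    raw = list(left) + list(right)
    if not raw or any(v < 0 for v in raw):
        raise ValueError("need at least one value, all non-negative")
    lw, rw = [int(v) for v in left], [int(v) for v in right]
    both = lw + rw
    stems = range(min(both) // 10, max(both) // 10 + 1)
    left_text = {s: _leaves(lw, s, descending=True) for s in stems}
    width = max(len(t) for t in left_text.values())  # right-align the left side
    return [f"{left_text[s]:>{width}} | {s} | {_leaves(rw, s)}".rstrip() for s in stems]


# Hypothetical season totals, for illustration only.
print("\n".join(back_to_back([22, 41, 46], [8, 13, 24])))
```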
26
Babe Ruth vs. Roger Maris
Generally, we can see that Babe Ruth hit more home runs than Roger Maris. Ruth's center is higher, at 46 home runs, than Maris's at 24.5. Maris has an outlier at 61, while Ruth has no outliers. If we exclude that outlier, Ruth also has the larger spread, from 22 to 60, compared with Maris's 8 to 39. Both distributions are fairly symmetrical.
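The qualifier about excluding the outlier matters. Working the ranges (largest minus smallest) from the values quoted above:

```latex
\begin{aligned}
\text{Ruth: } & 60 - 22 = 38 \\
\text{Maris, excluding 61: } & 39 - 8 = 31 \\
\text{Maris, including 61: } & 61 - 8 = 53
\end{aligned}
```

So Ruth's spread is larger only once the 61 is set aside; with it included, Maris's range is wider.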