Exploring Data Chapter 1.


1 Exploring Data Chapter 1

2 Patterns from Histogram A
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
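The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `center_and_spread` and the data values are hypothetical, and the median is used as one reasonable choice for "the value that divides the observations roughly in half".

```python
from statistics import median

def center_and_spread(values):
    """Return (center, (smallest, largest)) for a list of numbers.

    Center is taken to be the median; spread is reported as the
    extent of the data from the smallest to the largest value.
    """
    return median(values), (min(values), max(values))

# Hypothetical observations, chosen only for illustration
data = [25, 30, 33, 35, 36, 40, 45]
center, (low, high) = center_and_spread(data)
print(f"center: {center}, spread: {low} to {high}")
```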

3 Histogram A example center: 35, spread: 25 to 45

4 Histogram A example center, spread

5 Patterns from Histogram B
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Shape: overall appearance of distribution

6 Histogram B example skewed right

7 Histogram B example skewed left

8 Histogram B example symmetrical, mound shaped

9 Histogram B example uniform

10 Histogram B example bimodal

11 Patterns from Histogram C
Center: the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Shape: overall appearance of distribution
Unusual features: gaps/clusters and outliers

12 Histogram C example roughly symmetrical with gaps at 30 and 40

13 Histogram C example uniform with possible outlier at 5

14 Displaying Distributions with Graphs
Categorical versus quantitative
Categorical: bar graphs, pie charts
Quantitative: dotplots, histograms, stemplots, boxplots

15 Frequency Distributions
A frequency distribution is a table that displays the categories together with their frequencies, relative frequencies, and/or cumulative relative frequencies. The frequency for a particular category is the number of observed responses that fall into that category. The corresponding relative frequency is the fraction or proportion of observed responses in the category. The cumulative relative frequency is the fraction or proportion of observed responses in all categories up to and including the current one.
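As a rough sketch of how these three columns relate, the following Python builds a frequency table from a list of responses. The function name `frequency_table` and the sample responses are hypothetical, and categories are simply listed in sorted order.

```python
from collections import Counter

def frequency_table(responses):
    """Return rows of (category, frequency, relative freq, cumulative relative freq)."""
    counts = Counter(responses)
    n = len(responses)
    rows = []
    cumulative = 0  # running count of responses seen so far
    for category in sorted(counts):
        freq = counts[category]
        cumulative += freq
        rows.append((category, freq, freq / n, cumulative / n))
    return rows

# Hypothetical categorical responses, for illustration only
responses = ["A", "B", "A", "C", "B", "A", "D", "A"]
for category, freq, rel, cum in frequency_table(responses):
    print(f"{category:<4}{freq:>4}{rel:>8.3f}{cum:>8.3f}")
```

The cumulative relative frequency in the last row is always 1, which is a quick check that no responses were lost.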

16 Creating Histograms
The difficulty with continuous data is that there are no natural categories, so we must define our own categories, or class intervals. The quantity often gives a rough estimate for an appropriate number of intervals. Alternatively, Sturges's rule is to take 1 + log2(n) classes (approximately 1 + 3.3 log10(n)), rounded to the nearest integer, where n is the number of observations.
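A minimal sketch of choosing the number of class intervals follows. Sturges's rule is taken in its usual form, 1 + log2(n). The square-root rule is included as another common rule of thumb; whether it is the quantity the slide had in mind is an assumption. The function names and the sample size are hypothetical.

```python
import math

def sqrt_rule(n):
    """Common rule of thumb (assumed here): about sqrt(n) class intervals."""
    return round(math.sqrt(n))

def sturges_rule(n):
    """Sturges's rule: 1 + log2(n) classes, rounded to the nearest integer."""
    return round(1 + math.log2(n))

n = 30  # hypothetical number of observations
print(f"sqrt rule: {sqrt_rule(n)} intervals")
print(f"Sturges's rule: {sturges_rule(n)} intervals")
```

Both rules give only a starting point; the final choice usually also depends on picking convenient interval boundaries.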

17 Example
Exit  Name                   Miles
1     Ohio Gateway           *
1A    New Castle             8
2     Beaver Valley          3.4
3     Cranberry              15.6
4     Butler Valley          10.7
5     Allegheny Valley       8.6
6     Pittsburgh             8.9
7     Irwin                  10.8
8     New Stanton            8.1
9     Donegal                15.2
10    Somerset               19.2
11    Bedford                35.6
12    Breezewood             15.9
13    Fort Littleton         18.1
14    Willow Hill            9.1
15    Blue Mountain          12.7
16    Carlisle               25
17    Gettysburg Pike        9.8
18    Harrisburg West Shore  5.9
19    Harrisburg East        5.4
20    Lebanon-Lancaster
21    Reading                19.1
22    Morgantown             12.8
23    Downingtown            13.7
24    Valley Forge           14.3
25    Norristown             6.8
26    Fort Washington
27    Willow Grove           4.4
28    Philadelphia           8.4
29    Delaware Valley        6.4
30

18 Stemplots
[Stemplot; key: 2|2 means 22 wpm]
Median: 62
Spread: from 22 to 91
Fairly symmetrical
No unusual features
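For readers who want to build such a plot themselves, here is a small sketch of a stemplot for two-digit whole numbers, using the tens digit as the stem and the ones digit as the leaf. The function name `stemplot` and the data are hypothetical and are not the values from the slide.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include every stem between the smallest and largest so gaps stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines

# Hypothetical values in words per minute, for illustration only
speeds = [22, 35, 38, 41, 47, 53, 62, 62, 65, 78, 84, 91]
print("\n".join(stemplot(speeds)))
print("Key: 2 | 2 means 22 wpm")
```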

19 Alfred Hitchcock Stemplot
[Stemplot with stems 8 to 13; key: 8|1 means 81 minutes]

20 Split Stemplot
As with a histogram, we want to avoid too many data points in a small range.
Data: ages at which a sample of 35 American mothers first gave birth
[Stemplot with stems 1 to 4; key: 1|4 means 14 years old]

21 Split Stemplot
A split stemplot typically breaks each stem into High (leaves 5-9) and Low (leaves 0-4).
[Stemplot with stems 1 to 4; key: 1|4 means 14 years old]

22 Split Stemplot
A split stemplot typically breaks each stem into High (leaves 5-9) and Low (leaves 0-4).
[Stemplot with stems 1 to 4 alongside split stems 1L, 1H, 2L, 2H, 3L, 3H, 4L; key: 1|4 means 14 years old]

24 Split Stemplot
A split stemplot typically breaks each stem into High (leaves 5-9) and Low (leaves 0-4).
[Split stemplot with stems 1L, 1H, 2L, 2H, 3L, 3H, 4L; key: 1|4 means 14 years old]
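The High/Low splitting described on these slides can be sketched as follows. The function name `split_stemplot` and the ages are hypothetical; they are not the 35 values from the sample of mothers. Leaves 0-4 go on the L row of a stem and leaves 5-9 on the H row.

```python
def split_stemplot(values):
    """Return split-stemplot lines: each stem has a Low (0-4) and a High (5-9) row."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = "L" if leaf <= 4 else "H"
        rows.setdefault((stem, half), []).append(leaf)
    lo_stem = min(stem for stem, _ in rows)
    hi_stem = max(stem for stem, _ in rows)
    lines = []
    for stem in range(lo_stem, hi_stem + 1):
        for half in ("L", "H"):
            leaf_text = " ".join(str(d) for d in rows.get((stem, half), []))
            lines.append(f"{stem}{half} | {leaf_text}".rstrip())
    # Drop empty rows at either end; empty rows in the middle are kept as gaps
    while lines and lines[0].endswith("|"):
        lines.pop(0)
    while lines and lines[-1].endswith("|"):
        lines.pop()
    return lines

# Hypothetical ages in years, for illustration only
ages = [14, 17, 18, 21, 22, 24, 25, 26, 29, 30, 33, 36, 41]
print("\n".join(split_stemplot(ages)))
print("Key: 1L | 4 means 14 years old")
```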

25 Back to Back Stemplots
[Back-to-back stemplot with Babe Ruth on one side and Roger Maris on the other; key: 4 | 1 means 41]

26 Babe Ruth vs. Roger Maris
Generally, Babe Ruth hit more home runs than Roger Maris. Ruth's center (46 home runs) is higher than Maris's (24.5 home runs). Maris has an outlier at 61, while Ruth has no outliers. Ruth's spread, from 22 to 60, is larger than Maris's, from 8 to 39 if we exclude the outlier. Both distributions are fairly symmetrical.
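The centers and spreads quoted above can be checked with a short Python sketch. The season home-run totals below are an assumption: they are the values usually given in textbook versions of this comparison and are not taken from this transcript. The function name `summarize` is hypothetical.

```python
from statistics import median

# Assumed season totals (standard textbook data, not from this transcript)
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

def summarize(values):
    """Return the median (center) and the smallest and largest values (spread)."""
    return {"median": median(values), "min": min(values), "max": max(values)}

# The slide treats 61 as an outlier, so Maris's spread is also shown without it
maris_without_61 = [x for x in maris if x != 61]

print("Ruth: ", summarize(ruth))
print("Maris:", summarize(maris))
print("Maris without 61:", summarize(maris_without_61))
```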

