Download presentation
1
Chapter 2 Summarizing and Graphing Data
Overview Frequency Distributions Statistical Graphics (stemplots, dotplots, etc) Histograms
2
Frequency Distributions
The amount of data collected in some real-world situations can be overwhelming. By suitably organizing data, we can often make a large and complicated set of data more compact and easier to understand. Here, we discuss Frequency Distributions which involves putting data into groups rather than treating each observation individually.
3
Grouping Quantitative Data
Days to Maturity for Short-Term Investments The table below displays the number of days to maturity for 40 short-term investments.
4
Grouping Quantitative Data
Getting a clear picture of the data in the table is difficult.
5
Grouping Quantitative Data
By grouping the data into categories, or classes, we can make the data easier to comprehend.
6
Grouping Quantitative Data
The first step is to decide on the classes. A convenient way to group these data is by 10s.
7
Grouping Quantitative Data
Since the shortest maturity period is 36 days, our first class is for maturity periods from 30 days up to, but not including, 40 days. Smallest Observation
8
Grouping Quantitative Data
The longest maturity period is 99 days, so grouping by 10s results in the seven classes. Largest Observation Smallest Observation
9
Grouping Quantitative Data
The final step for grouping the data is to find the number of investments in each class.
10
Grouping Quantitative Data
The final step for grouping the data is to find the number of investments in each class.
11
Grouping Quantitative Data
The final step for grouping the data is to find the number of investments in each class.
12
Grouping Quantitative Data
In the previous example, we used a commonsense approach to grouping data into classes. Some of that common sense can be used as guidelines for grouping. Three of the most important guidelines are the following.
13
Grouping Quantitative Data
The number of classes should be small enough to provide an effective summary but large enough to display the relevant characteristics of the data (in general between 5 and 20 ). Each observation must belong to one, and only one, class. Whenever feasible, all classes should have the same width.
14
Frequency Distributions
The number of observations that fall into a particular class is called the frequency or count of that class. A table that provides all classes and their frequencies is called a frequency distribution.
15
Frequency Distributions
The frequency distribution in this example is
16
Relative-Frequency Distributions
In addition to the frequency of a class, we are often interested in the percentage of a class.
17
Relative-Frequency Distributions
The relative frequency is the percent of observations within a category and is found using the formula: A relative frequency distribution lists the relative frequency of each category of data.
18
Terms Used in Grouping Lower class limits: are the smallest numbers that can actually belong to different classes. Upper class limits: are the largest numbers that can actually belong to different classes. Class boundaries: are the numbers used to separate classes, but without the gaps created by class limits. Class midpoints: are the midpoints of the classes and can be found by adding the lower class limit to the upper class limit and dividing the sum by two. Class width: is the difference between two consecutive lower class limits.
19
Reasons for Constructing Frequency Distributions
Large data sets can be summarized. We can gain some insight into the nature of data. We have a basis for constructing important graphs.
20
Another Example of Frequency Distribution Illustrating these Concepts
21
Frequency Distribution: Ages of Best Actresses Frequency Distribution
Original Data Frequency Distribution
22
Lower Class Limits Lower Class Limits
23
Upper Class Limits Upper Class Limits
24
Class Boundaries Class Boundaries Editor: Substitute Table 2-2 20.5
30.5 40.5 50.5 60.5 70.5 80.5
25
Class Midpoints Class Midpoints 25.5 35.5 45.5 55.5 65.5 75.5
26
Class Width Editor: Substitute Table 2-2 Class Width 10
27
Relative Frequency Distribution
28/76 = 37% 30/76 = 39% etc. Total Frequency = 76
28
Cumulative Frequency Distribution
Frequencies
29
Frequency Tables
30
Single-Value Grouping
If the data is discrete, the categories of data will be the observations (when total number of them is a relatively small number). Consider the following example: The following data represent the number of available cars in a household based on a random sample of 50 households. Construct a frequency and relative frequency distribution.
31
Single-Value Grouping
32
Grouping Qualitative Data
The concepts of class limits and midpoints are not appropriate for qualitative data. For instance, if we have data that categorize people as male or female, then the classes are “male” and “female.” We can still group qualitative data and compute frequencies and relative frequencies for classes. For qualitative data, the classes coincide with the observed values of the corresponding variable.
33
Grouping Qualitative Data
Example: The data on the next slide represent the color of M&Ms in a bag of plain M&Ms. Construct a frequency distribution of the color of plain M&Ms.
34
Yellow Orange Brown Green Blue Red
35
Yellow Orange Brown Green Blue Red
36
Frequency Tables
37
Distributions of Data Sets
The distribution of a data set is a table, graph, or formula that provides the values of the observations and how often they occur.
38
Displaying Distributions Qualitative Variable Quantitative Variable Frequency Table Stem & Leaf Plot Frequency Table Histogram Bar Graph Pie Chart Dot Plot
39
Displaying Distributions of Qualitative Data
40
Example: how well educated are 30-something young adults
Example: how well educated are 30-something young adults? Here is the distribution of the highest level of education for people aged 25 to 34 years:
41
Bar Graph
42
Label each category of data on a horizontal axis and the frequency or relative frequency of the category on the vertical axis.
43
A rectangle of equal width is drawn for each category whose height is equal to the category's frequency or relative frequency.
44
The bar graph quickly compares the sizes of the five education groups
The bar graph quickly compares the sizes of the five education groups. The heights of the bars show the counts in the five categories.
45
Pie Chart
46
The pie chart helps us see what part of the whole each group forms
The pie chart helps us see what part of the whole each group forms. For example, the “HS grad” slice makes up 30.7% of the pie because 30.7% of young adults have only a high school education.
47
Displaying Distributions of Quantitative Data
48
Stemplots A stemplot (also called a stem-and-leaf plot) gives a quick picture of the shape of a distribution while including the actual numerical values in the graph. Stemplots work best for small numbers of observations that are all greater than 0.
49
How to Make a Stemplot Separate each observation into a stem consisting of all but the final (rightmost) digit and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
50
Here are the numbers of home runs that Babe Ruth hit in each of his 15 years with the New York Yankees, 1920 to 1934.
51
Possible Modifications
Stemplots do not work well for large data sets, where each stem must hold a large number of leaves. Possible Modifications We can increase the number of stems in a plot by splitting each stem into two: one with leaves 0 to 4 and the other with leaves 5 to 9. When the observed values have many digits, it is often best to round the numbers to just a few digits before making a stemplot.
52
To make a stemplot of this distribution,
First round the purchases to the nearest dollar. Then use tens of dollars as stems and dollars as leaves.
56
To see the shape of a distribution more clearly, turn the stemplot on its side so that the larger values lie to the right.
57
Remarks About Stemplots
Stemplots display the actual values of the observations. This feature makes stemplots awkward for large data sets. The construction of a stemplot is a quick an easy way to sort data ( arrange them in order), and sorting is required for some other statistical procedures.
58
Dot Plot Consists of a graph in which each data value is plotted as a point (or dot) along a scale of values
59
Histograms A histogram is a bar graph of the frequency or the relative frequency distribution of a quantitative data. The horizontal scale represents classes of data values and the vertical scale represents frequency or the relative frequency. The heights of the bars correspond to frequency or the relative frequency values, and the bars are drawn adjacent to each other (without gaps).
60
Example: Percent of Hispanics in the adult population, by state (2000)
61
Step 1: Divide the range of the data into classes of equal width.
The data in the table range from 0.6 to 38.7 We choose 8 intervals of length 5%, that is, We choose our classes as follows
62
Example: Percent of Hispanics in the adult population, by state (2000)
63
Step 2: Count the number of individuals in each class.
These counts are called frequencies A table of frequencies for all classes is a frequency table.
64
Step 3: Draw the histogram.
First mark the scale for the variable whose distribution you are displaying on the horizontal axis. In this case, “percent of adults who are Hispanic” The vertical axis contains the scale of counts.
67
Two Types of Histograms
68
Histograms: Another Example
Days to Maturity for Short-Term Investments Frequency histogram
69
Histograms: Another Example
Days to Maturity for Short-Term Investments Frequency histogram
70
Histograms: Another Example
Days to Maturity for Short-Term Investments Relative-frequency histogram
71
Histograms: Another Example
Days to Maturity for Short-Term Investments Relative-frequency histogram
72
Examining distributions
In any graph of data, we look for the overall pattern and for striking deviations from that pattern. We can describe the overall pattern of a distribution by its shape, center, and spread. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern.
73
Distributions Shapes An important aspect of the distribution of a quantitative data set is its shape. The shape of a distribution plays a role in determining the appropriate method of statistical analysis. To identify the shape of a distribution, the best approach usually is to use a smooth curve that approximates the overall shape.
74
Distributions Shapes In later chapters, there will be frequent reference to data with a normal distribution. One key characteristic of a normal distribution is that it has a “bell” shape. The frequencies start low, then increase to some maximum frequency, then decrease to a low frequency. The distribution should be approximately symmetric.
75
For instance, the following table displays a frequency and a relative-frequency distribution for the heights of the 3264 female students who attend a Midwestern college. This is a good example of normal distribution
76
The figure displays a relative-frequency histogram for the same data
The figure displays a relative-frequency histogram for the same data. Notice the bell shape. Included is a smooth curve that approximates the overall shape of the distribution.
77
The figure displays a relative-frequency histogram for the same data
The figure displays a relative-frequency histogram for the same data. Notice the bell shape. Included is a smooth curve that approximates the overall shape of the distribution.
78
Both the histogram and the smooth curve show that this distribution of heights is bell shaped but the smooth curve makes seeing the shape a little easier
79
Advantages of using smooth curves
We do not need to worry about minor differences in shape. We can concentrate on overall patterns, which, in turn, allows us to classify most distributions by designating relatively few shapes. and most importantly, allows us to use all the tools from Calculus.
80
Some Common Distributions Shapes
81
The distribution of household sizes is right skewed.
The relative-frequency histogram for household size in the United States The distribution of household sizes is right skewed.
82
The distribution of Babe Ruth’s home run counts is symmetric and unimodal.
Range is from 22 to 60. There are no outliers. The distribution of supermarket spending is skewed to the right. Range is from $3 to $93. Outliers?
83
Population and Sample Distributions
The data set obtained by observing the values of a variable for an entire population is called population data or census data. The data set obtained by observing the values of a variable for a sample of the population is called sample data. To distinguish their distributions, we use the terminology population distribution and sample distribution.
84
Population and Sample Distributions
For a particular population and variable, sample distributions vary from sample to sample. However, there is only one population distribution, namely, the distribution of the variable under consideration on the population under consideration.
85
Population distribution and six sample distributions for household size
86
Population and Sample Distributions
In practice, we usually do not know the population distribution. We can use the distribution of a simple random sample from the population to get a rough idea of the population distribution. The larger the sample the better the sample distribution will approximate the population distribution.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.