1
Chapter 1: Exploring Data
Section 1.1 Displaying Distributions with Graphs The Practice of Statistics, 3rd edition - For AP* STARNES, YATES, MOORE
2
Section 1.1 Displaying Distributions with Graphs
Learning Objectives
After this section, you should be able to:
- CONSTRUCT and INTERPRET bar graphs and pie charts
- RECOGNIZE “good” and “bad” graphs
- CONSTRUCT and INTERPRET two-way tables
- DESCRIBE relationships between two categorical variables
- ORGANIZE statistical problems
3
The Three Rules of Data Analysis
The three rules of data analysis won’t be difficult to remember:
1. Make a picture: things may be revealed that are not obvious in the raw data. These will be things to think about.
2. Make a picture: important features of and patterns in the data will show up. You may also see things that you did not expect.
3. Make a picture: the best way to tell others about your data is with a well-chosen picture.
4
Categorical Variables
Categorical variables place individuals into one of several groups or categories.
- The values of a categorical variable are labels for the different categories.
- The distribution of a categorical variable lists the count or percent of individuals who fall into each category.

Example, page 39: frequency table and relative frequency table. The variable is Format; its values are the format names in the first column.

Format                Count of Stations   Percent of Stations
Adult Contemporary    1556                11.2
Adult Standards       1196                8.6
Contemporary Hit      569                 4.1
Country               2066                14.9
News/Talk             2179                15.7
Oldies                1060                7.7
Religious             2014                14.6
Rock                  869                 6.3
Spanish Language      750                 5.4
Other Formats         1579                11.4
Total                 13838               99.9

Notice the roundoff error: the percents total 99.9 rather than 100.
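As an illustration that is not part of the original slides, the percent column and its roundoff error can be reproduced from the counts. This is a minimal Python sketch; the function name `relative_frequencies` is made up for the example.

```python
# Counts copied from the radio station frequency table.
counts = {
    "Adult Contemporary": 1556,
    "Adult Standards": 1196,
    "Contemporary Hit": 569,
    "Country": 2066,
    "News/Talk": 2179,
    "Oldies": 1060,
    "Religious": 2014,
    "Rock": 869,
    "Spanish Language": 750,
    "Other Formats": 1579,
}


def relative_frequencies(counts, digits=1):
    """Return each category's percent of the total, rounded to `digits` places."""
    total = sum(counts.values())
    return {name: round(100 * n / total, digits) for name, n in counts.items()}


percents = relative_frequencies(counts)
for name, pct in percents.items():
    print(f"{name:<20}{counts[name]:>6}{pct:>7.1f}")

# Each percent is rounded separately, so the rounded values
# need not add up to exactly 100.
print(f"{'Total':<20}{sum(counts.values()):>6}{sum(percents.values()):>7.1f}")
```

The last line prints a total of 13838 stations and 99.9 percent, matching the table: the shortfall comes only from rounding each percent to one decimal place.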
5
Analyzing Categorical Data
Displaying categorical data
Frequency tables can be difficult to read. Sometimes it is easier to analyze a distribution by displaying it with a bar graph or pie chart.
6
Bar Graphs (Charts)
A bar graph (chart) displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison. A bar chart stays true to the area principle: the area occupied by each part of the graph corresponds to the magnitude of the value it represents.
7
Bar Graphs (Charts) (cont.)
A relative frequency bar chart displays the relative proportion of counts for each category, replacing counts with percentages. A relative frequency bar chart also stays true to the area principle.
8
Top 10 causes of deaths in the United States 2001
Bar graph sorted by rank: easy to analyze.
Bar graph sorted alphabetically: much less useful.
9
Analyzing Categorical Data
Graphs: Good and Bad
Bar graphs compare several quantities by comparing the heights of bars that represent those quantities. Our eyes react to the area of the bars as well as to their height, so be sure to make your bars equally wide. Avoid the temptation to replace the bars with pictures for greater appeal; this can be misleading!

Alternate Example: This ad for DIRECTV has multiple problems. How many can you point out?
First, the heights of the bars are not accurate. The graph makes the difference between 81 and 95 (14 points) look much greater than the difference between 56 and 81 (25 points). Also, the extra width of the DIRECTV bar is deceptive, since our eyes respond to the area, not just the height.
10
Pie Charts
When you are interested in emphasizing each category's relation to the whole, use a pie chart (also called a circle graph). Pie charts show the whole group of cases as a circle. They slice the circle into pieces whose sizes are proportional to the fraction of the whole in each category.
11
Stemplots
Another simple graphical display for small data sets is a stemplot (also called a stem-and-leaf plot). Stemplots give us a quick picture of the distribution while including the actual numerical values.
How to make a stemplot:
1. Separate each observation into a stem (all but the final digit) and a leaf (the final digit).
2. Write all possible stems from the smallest to the largest in a vertical column and draw a vertical line to the right of the column.
3. Write each leaf in the row to the right of its stem.
4. Arrange the leaves in increasing order out from the stem.
5. Provide a key that explains in context what the stems and leaves represent.
12
Stemplots
These data represent the responses of 20 female AP Statistics students to the question, “How many pairs of shoes do you have?” Construct a stemplot.
50 26 31 57 19 24 22 23 38 13 34 30 49 15 51
Steps shown: write the stems (1 to 5), add the leaves, order the leaves, and add a key.
Key: 4|9 represents a female student who reported having 49 pairs of shoes.
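The stemplot steps can be sketched in a few lines of Python. This is an illustration rather than part of the original slides: the function name `stemplot` is made up, it assumes non-negative whole numbers, and it uses only the values as they are listed above.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows: stem = all but the final digit, leaf = final digit."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)

    rows = []
    # Write every possible stem from smallest to largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        # Arrange the leaves in increasing order out from the stem.
        ordered = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        rows.append(f"{stem} | {ordered}".rstrip())
    return rows


shoes = [50, 26, 31, 57, 19, 24, 22, 23, 38, 13, 34, 30, 49, 15, 51]
rows = stemplot(shoes)
print("\n".join(rows))
print("Key: 4|9 represents a student who reported having 49 pairs of shoes.")
```

For the listed values the rows are `1 | 359`, `2 | 2346`, `3 | 0148`, `4 | 9` and `5 | 017`.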
13
Stemplots
When data values are “bunched up”, we can get a better picture of the distribution by splitting stems. Two distributions of the same quantitative variable can be compared using a back-to-back stemplot with common stems.
Females: 50 26 31 57 19 24 22 23 38 13 34 30 49 15 51
Males: 14 7 6 5 12 38 8 10 11 4 22 35
[Figure: back-to-back stemplot with “split stems”; female leaves extend to the left of the stems, male leaves to the right.]
Key: 4|9 represents a student who reported having 49 pairs of shoes.
14
HW # Read pg 37-46 pg 46 #1a&b only, 2,3,5
17
Can you find the errors in these two graphs?
Chapter 1: Exploring Data Section 1.1 Histograms and Dotplots The Practice of Statistics, 3rd edition - For AP* STARNES, YATES, MOORE
18
Number of Goals Scored Per Game by the 2012 US Women’s Soccer Team
Dotplots
One of the simplest graphs to construct and interpret is a dotplot. Each data value is shown as a dot above its location on a number line.
How to make a dotplot:
1. Draw a horizontal axis (a number line) and label it with the variable name.
2. Scale the axis from the minimum to the maximum value.
3. Mark a dot above the location on the horizontal axis corresponding to each data value.
Data: Number of Goals Scored Per Game by the 2012 US Women’s Soccer Team
2 1 5 3 4 13 14
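The three dotplot steps can be sketched as a text-only plot in Python. This is an illustration rather than part of the original slides: the function name `dotplot` is made up, it assumes whole-number data, and the sample values are invented for the example (they are not the team's actual results).

```python
from collections import Counter


def dotplot(values, label):
    """Return the lines of a text dotplot for whole-number data."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = len(str(hi)) + 1  # column width, wide enough for the largest tick label

    lines = []
    # Build the stacks of dots from the top row down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            ("o" if counts[v] >= level else " ").rjust(width)
            for v in range(lo, hi + 1)
        )
        lines.append(row.rstrip())

    # Scale the axis from the minimum to the maximum value, then label it.
    lines.append("".join(str(v).rjust(width) for v in range(lo, hi + 1)))
    lines.append(label)
    return lines


# Hypothetical data, made up for illustration only.
goals = [2, 1, 5, 3, 4, 1, 3, 3, 2, 0]
print("\n".join(dotplot(goals, "Goals scored per game")))
```

Each column of `o` characters is the stack of dots above one value on the number line, so the tallest stack marks the most frequent value.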
19
Examining the Distribution of a Quantitative Variable
The purpose of a graph is to help us understand the data. After you make a graph, always ask, “What do I see?”
How to Examine the Distribution of a Quantitative Variable
In any graph, look for the overall pattern and for striking departures from that pattern.
Describe the overall pattern of a distribution by its:
- Shape
- Center
- Spread
Note individual values that fall outside the overall pattern. These departures are called outliers.
Don’t forget your SOCS!
20
Describing Shape
When you describe a distribution’s shape, concentrate on the main features. Look for rough symmetry or clear skewness.
- A distribution is roughly symmetric if the right and left sides of the graph are approximately mirror images of each other.
- A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
- It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.
[Figure panels: Symmetric, Skewed-left, Skewed-right]
21
Comparing Distributions
Some of the most interesting statistics questions involve comparing two or more groups. Always discuss shape, center, spread, and possible outliers whenever you compare distributions of a quantitative variable. Compare the distributions of household size for these two countries. Don’t forget your SOCS!
22
Stemplots versus histograms
Stemplots are quick and dirty histograms that can easily be done by hand, so they are very convenient for smaller data sets. However, they are rarely found in scientific or lay publications. When might you NOT want to use a stemplot?
23
Displaying quantitative data: Histograms
Quantitative variables often take many values. A graph of the distribution may be clearer if nearby values are grouped together. The most common graph of the distribution of one quantitative variable is a histogram.
A histogram:
- Displays counts or percents
- Shows the trend of the data
- Lets the user define the number of classes
- Is good for large data sets
- Does not display actual data values
The bars have the same width and always touch (the edges of the bars are on class boundaries, which are described below). The width of a bar spans an interval of values of a quantitative variable x, such as age, rather than a category. The height of each bar indicates frequency.
24
Displaying Quantitative Data
How to Make a Histogram
1. Divide the range of data into classes of equal width.
2. Find the count (frequency) or percent (relative frequency) of individuals in each class.
3. Label and scale your axes and draw the histogram. The height of each bar equals the frequency of its class. Adjacent bars should touch, unless a class contains no individuals.

To find the class width, first compute:
(largest value - smallest value) / (desired number of classes)
Then increase the computed value to the next highest whole number, even if the computed value was already a whole number. This will ensure the classes cover the data.
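The class-width rule and the counting step can be sketched in Python. This is an illustration rather than part of the original slides: the function names are made up, the sketch assumes whole-number data with the first class starting at the smallest value, and it reuses the shoe counts listed in the stemplot example.

```python
import math


def class_width(values, num_classes):
    """Range divided by the number of classes, raised to the next whole number.

    floor(raw) + 1 moves up even when raw is already whole, as the rule requires.
    """
    raw = (max(values) - min(values)) / num_classes
    return math.floor(raw) + 1


def frequency_table(values, num_classes):
    """Return (lower limit, upper limit, count) for each equal-width class."""
    width = class_width(values, num_classes)
    start = min(values)
    table = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width - 1  # class limits for whole-number data
        count = sum(lower <= v <= upper for v in values)
        table.append((lower, upper, count))
    return table


shoes = [50, 26, 31, 57, 19, 24, 22, 23, 38, 13, 34, 30, 49, 15, 51]
print("class width:", class_width(shoes, 5))
for lower, upper, count in frequency_table(shoes, 5):
    print(f"{lower}-{upper}: {count}")
```

Here the range is 57 - 13 = 44, so 44 / 5 = 8.8 is raised to a class width of 9, giving the classes 13-21, 22-30, 31-39, 40-48 and 49-57. The 40-48 class is empty, which is the one case in which adjacent bars do not touch.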
25
How to create a histogram
It is an iterative process: try and try again.
What bin size should you use?
- Not too many bins with either 0 or 1 counts
- Not so summarized that you lose all the information
- Not so detailed that it is no longer a summary
Rule of thumb: start with 5 to 10 bins. Look at the distribution and refine your bins. (There isn’t a unique or “perfect” solution.)
26
Using the TI-83/84 to make histograms
The TI-83/84 can be used to make histograms, and will allow you to change the location and widths of the ranges. Turn to Page 59 in your textbook and follow the directions in the Technology Corner. Use the presidential data from Exercise 1.11 (pg. 57).
27
Using the TI-83 to make histograms
You can change the size and location of the ranges by using the Window button. Use the scale key to change the number of classes. Enter the CLASS WIDTH. Press the Graph button to see the results.
28
Histogram Tips
Be sure to choose classes that are all the same width. Use your judgment in choosing classes to display the shape. Too few classes will give a 'skyscraper' graph; too many will produce a 'pancake' graph.
29
Same data set shown twice: one histogram is not summarized enough, the other is too summarized.
30
Describing the Shape of a Histogram
Does the histogram have a single, central hump or several separated bumps? Humps in a histogram are called modes. A histogram with one main peak is dubbed unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.
31
Humps and Bumps (cont.)
A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform.
32
Most common distribution shapes
Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
Skewed distribution: A distribution is skewed to the right if the right side of the histogram (the side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.
33
Anything Unusual?
Don’t forget to make note of any unusual features in the shape of the distribution. Sometimes it’s the unusual features that tell us something interesting or exciting about the data. You should always mention any stragglers, or outliers, that stand off away from the body of the distribution. Are there any gaps in the distribution? If so, we might have data from more than one group.
34
Histograms are Similar to Bar Graphs and so:
A relative frequency histogram displays the percentage of cases in each bin instead of the count. Relative frequency histograms are good for comparing distributions of unequal counts.
35
Notice the shape does not change when comparing frequency and relative frequency histograms
AP Statistics, Section 1.1, Part 4
36
Displaying Quantitative Data
Using Histograms Wisely
Here are several cautions based on common mistakes students make when using histograms.
Cautions
- Although they are similar, don’t confuse histograms and bar graphs.
- Don’t use counts (in a frequency table) or percents (in a relative frequency table) as data.
- Use percents instead of counts on the vertical axis when comparing distributions with different numbers of observations.
- Just because a graph looks nice, it’s not necessarily a meaningful display of data.
37
HW pg 54 #7, 8, 11, 12
40
2-minute quick write
Describe the data. Be ready to discuss.
Chapter 1: Exploring Data Section 1.1 Relative Frequency and Cumulative Frequency The Practice of Statistics, 3rd edition - For AP* STARNES, YATES, MOORE
41
Definition
An ogive is a graph that represents cumulative frequencies or cumulative relative frequencies of a data set. It is constructed from a cumulative frequency histogram or from a cumulative relative frequency histogram. For our purposes, we will create just the cumulative relative frequency ogive, as it enables us to estimate percentiles.
42
Percentiles
In statistics, a percentile is the value of a variable below which a certain percent of observations fall. For example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found. The term percentile and the related term percentile rank are often used in the reporting of scores from norm-referenced tests. For example, if a score is at the 86th percentile, it is higher than 86% of the scores. The median of a data set is also known as the 50th percentile.
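The definition can be checked directly by counting the observations below a value. This Python sketch is an illustration rather than part of the original slides: the function name `percent_below` is made up, and the scores are invented for the example.

```python
import statistics


def percent_below(data, value):
    """Percent of observations that fall strictly below `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Hypothetical test scores, made up for illustration only.
scores = [55, 60, 62, 68, 70, 71, 75, 80, 85, 90]

print(percent_below(scores, 80))       # 7 of the 10 scores are below 80
median = statistics.median(scores)     # midpoint of the two middle scores
print(median, percent_below(scores, median))
```

With these ten scores, 70 percent fall below 80, so a score of 80 sits at the 70th percentile. Half of the scores fall below the median of 70.5, which is why the median is the 50th percentile.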
43
Constructing an Ogive
Here is the frequency distribution for the attendance (in thousands) at the Super Bowl, for games I to XXXVI (1 to 36). Notice the two extra columns.

Class Limits   Boundaries   Freq.   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
62-69                       3       .08
70-77                       19      .53
78-85                       8       .22
86-93                       1       .03
94-101                      2       .06
44
Cumulative Values
Cumulative frequencies and cumulative relative frequencies are “running totals” of the frequency and relative frequency columns that precede them. Below is a “complete” frequency distribution.
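The “running totals” idea can be sketched in Python. This is an illustration rather than part of the original slides: the function name `cumulative_columns` is made up, and the class frequencies are invented for the example rather than taken from the Super Bowl table.

```python
from itertools import accumulate


def cumulative_columns(freqs):
    """Return (relative, cumulative, cumulative relative) frequency columns."""
    total = sum(freqs)
    relative = [f / total for f in freqs]
    cumulative = list(accumulate(freqs))  # running total of the counts
    # Divide the running counts by the total rather than summing rounded
    # relative frequencies, so the last entry is exactly 1.
    cumulative_relative = [c / total for c in cumulative]
    return relative, cumulative, cumulative_relative


# Hypothetical class frequencies, made up for illustration only.
freqs = [4, 10, 6]
relative, cumulative, cumulative_relative = cumulative_columns(freqs)
for row in zip(freqs, relative, cumulative, cumulative_relative):
    print(row)
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1, which is a quick check on a completed table.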
45
Step 1: Draw a Cumulative Relative Frequency Histogram.
46
Step 2: Draw the Ogive Curve
47
Step 3: Remove the Histogram (optional)
48
Step 4: Estimate Percentiles
About 77,000 people.
49
Estimate the 80th Percentile
About 88,500 people.
50
What percentile would represent 70,000 in attendance?
About the 15th Percentile.
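Reading a percentile off an ogive amounts to linear interpolation between the plotted points. The Python sketch below is an illustration rather than part of the original slides: the function names are made up, the class boundaries and cumulative relative frequencies are invented rather than taken from the Super Bowl data, and it assumes the ogive starts at 0 at the lowest boundary and is drawn with straight segments.

```python
def ogive_points(boundaries, cum_rel):
    """Pair each class boundary with the cumulative relative frequency reached there."""
    return list(zip(boundaries, [0.0] + list(cum_rel)))


def value_at_percentile(boundaries, cum_rel, p):
    """Estimate the value below which a proportion p (0 to 1) of the data falls."""
    points = ogive_points(boundaries, cum_rel)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= p <= y1 and y1 > y0:
            return x0 + (p - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("p is outside the range covered by the ogive")


def percentile_of_value(boundaries, cum_rel, x):
    """Estimate the proportion of the data that falls below the value x."""
    points = ogive_points(boundaries, cum_rel)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x is outside the class boundaries")


# Hypothetical ogive, made up for illustration only.
boundaries = [0, 10, 20, 30]
cum_rel = [0.2, 0.7, 1.0]
print(value_at_percentile(boundaries, cum_rel, 0.5))  # close to 16
print(percentile_of_value(boundaries, cum_rel, 15))   # close to 0.45
```

The first function answers questions like “estimate the 80th percentile”; the second answers questions like “what percentile would represent 70,000 in attendance?”. Both give estimates only, because the ogive treats the data as spread evenly within each class.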
51
Line graphs: time plots
In a time plot, time always goes on the horizontal (x) axis. We describe time series by looking for an overall pattern and for striking deviations from that pattern. In a time series:
- A trend is a rise or fall that persists over time, despite small irregularities.
- A pattern that repeats itself at regular intervals of time is called seasonal variation.
52
Retail price of fresh oranges over time
Time is on the horizontal, x axis. The variable of interest—here “retail price of fresh oranges”— goes on the vertical, y axis. This time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing most likely due to similar seasonal variations in the production of fresh oranges. There is also an overall upward trend in pricing over time. It could simply be reflecting inflation trends or a more fundamental change in this industry.
53
A time plot can be used to compare two or more data sets covering the same time period.
The pattern over time for the number of flu diagnoses closely resembles that for the number of deaths from the flu, indicating that about 8% to 10% of the people diagnosed that year died shortly afterward from complications of the flu.
54
Scales matter
How you stretch the axes and choose your scales can give a different impression. A picture is worth a thousand words, BUT there is nothing like hard numbers. Look at the scales.
55
HW pg #13,16,18
58
Section 1.2 Displaying Quantitative Data with Graphs
Summary
In this section, we learned that:
- You can use a dotplot, stemplot, or histogram to show the distribution of a quantitative variable.
- When examining any graph, look for an overall pattern and for notable departures from that pattern. Describe the shape, center, spread, and any outliers. Don’t forget your SOCS!
- Some distributions have simple shapes, such as symmetric or skewed. The number of modes (major peaks) is another aspect of overall shape.
- When comparing distributions, be sure to discuss shape, center, spread, and possible outliers.
- Histograms are for quantitative data; bar graphs are for categorical data.
- Use relative frequency histograms when comparing data sets of different sizes.