Chapter 2 Summarizing and Graphing Data


Overview: Frequency Distributions; Statistical Graphics (stemplots, dotplots, etc.); Histograms

Frequency Distributions The amount of data collected in some real-world situations can be overwhelming. By suitably organizing data, we can often make a large and complicated data set more compact and easier to understand. Here, we discuss frequency distributions, which involve putting data into groups rather than treating each observation individually.

Grouping Quantitative Data Days to Maturity for Short-Term Investments The table below displays the number of days to maturity for 40 short-term investments.

Grouping Quantitative Data Getting a clear picture of the data in the table is difficult.

Grouping Quantitative Data By grouping the data into categories, or classes, we can make the data easier to comprehend.

Grouping Quantitative Data The first step is to decide on the classes. A convenient way to group these data is by 10s.

Grouping Quantitative Data Since the shortest maturity period is 36 days, our first class is for maturity periods from 30 days up to, but not including, 40 days.

Grouping Quantitative Data The longest maturity period is 99 days, so grouping by 10s results in seven classes, from 30-under 40 through 90-under 100.

Grouping Quantitative Data The final step for grouping the data is to find the number of investments in each class.

Grouping Quantitative Data In the previous example, we used a commonsense approach to grouping data into classes. Some of that common sense can be used as guidelines for grouping. Three of the most important guidelines are the following.

Grouping Quantitative Data The number of classes should be small enough to provide an effective summary but large enough to display the relevant characteristics of the data (in general, between 5 and 20). Each observation must belong to one, and only one, class. Whenever feasible, all classes should have the same width.

Frequency Distributions The number of observations that fall into a particular class is called the frequency or count of that class. A table that provides all classes and their frequencies is called a frequency distribution.
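The grouping procedure described above (choose classes of equal width, then count the observations in each class) can be sketched in a few lines of Python. The table of 40 maturity periods is not reproduced in this transcript, so the list `days` below holds stand-in values chosen only to match the stated count (40), minimum (36), and maximum (99); the function name `frequency_distribution` is likewise just an illustrative choice.

```python
def frequency_distribution(data, first_lower, width):
    """Count integer observations in classes [lower, lower + width)."""
    if min(data) < first_lower:
        raise ValueError("first class must start at or below the smallest value")
    n_classes = (max(data) - first_lower) // width + 1
    counts = [0] * n_classes
    for value in data:
        # Each observation falls in exactly one class.
        counts[(value - first_lower) // width] += 1
    return [
        (first_lower + i * width, first_lower + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


# Stand-in data (assumed, not taken from the slide's table).
days = [
    70, 64, 99, 55, 64, 89, 87, 65, 62, 38,
    67, 70, 60, 69, 78, 39, 75, 56, 71, 51,
    99, 68, 95, 86, 57, 53, 47, 50, 55, 81,
    80, 98, 51, 36, 63, 66, 85, 79, 83, 70,
]

table = frequency_distribution(days, first_lower=30, width=10)
for lower, upper, count in table:
    print(f"{lower}-under {upper}: {count}")
```

Starting the first class at 30 rather than at the minimum of 36 keeps the class limits at round numbers, which is the "group by 10s" choice made in the example.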

Frequency Distributions The frequency distribution in this example is

Relative-Frequency Distributions In addition to the frequency of a class, we are often interested in the percentage of observations that fall in that class.

Relative-Frequency Distributions The relative frequency is the proportion (often expressed as a percent) of observations within a category and is found using the formula: relative frequency = (frequency of the category) / (total number of observations). A relative frequency distribution lists the relative frequency of each category of data.
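As a quick check of the definition, here is a minimal sketch that divides a class frequency by the total number of observations. The counts 28 and 30 out of 76 are the ones quoted for the Best Actress example later in this chapter; the helper name `relative_frequency` is only an illustrative choice.

```python
def relative_frequency(frequency, total):
    """Return the fraction of all observations that fall in one class."""
    if total <= 0:
        raise ValueError("total number of observations must be positive")
    return frequency / total


total = 76
for frequency in (28, 30):
    share = relative_frequency(frequency, total)
    # The ".0%" format multiplies by 100 and rounds to a whole percent.
    print(f"{frequency}/{total} = {share:.0%}")
```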

Terms Used in Grouping
Lower class limits: the smallest numbers that can actually belong to the different classes.
Upper class limits: the largest numbers that can actually belong to the different classes.
Class boundaries: the numbers used to separate classes, but without the gaps created by class limits.
Class midpoints: the midpoints of the classes, found by adding the lower class limit to the upper class limit and dividing the sum by two.
Class width: the difference between two consecutive lower class limits.

Reasons for Constructing Frequency Distributions Large data sets can be summarized. We can gain some insight into the nature of data. We have a basis for constructing important graphs.

Another Example of Frequency Distribution Illustrating these Concepts

Frequency Distribution: Ages of Best Actresses (original data and the resulting frequency distribution)

Lower Class Limits

Upper Class Limits

Class Boundaries: 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5

Class Midpoints: 25.5, 35.5, 45.5, 55.5, 65.5, 75.5

Class Width: 10
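The boundaries, midpoints, and width listed for this example can be reproduced from the class limits. The sketch below assumes the classes are 21-30, 31-40, ..., 71-80; these limits are inferred from the listed boundaries and midpoints and are not shown in the transcript itself.

```python
# Assumed class limits (inferred, not given in the transcript).
lower_limits = [21, 31, 41, 51, 61, 71]
upper_limits = [30, 40, 50, 60, 70, 80]

# Midpoint: (lower limit + upper limit) / 2 for each class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Boundaries sit halfway across the gap between one class's upper limit
# and the next class's lower limit; the same half-gap is applied at both ends.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print("midpoints: ", midpoints)
print("boundaries:", boundaries)
print("width:     ", class_width)
```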

Relative Frequency Distribution: 28/76 = 37%, 30/76 = 39%, etc. Total frequency = 76.

Cumulative Frequency Distribution The cumulative frequency for a class is the sum of the frequencies for that class and all previous classes.
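A cumulative frequency distribution can be built by keeping a running total of the class frequencies. In the sketch below, only the first two frequencies (28 and 30) and the total (76) appear in this transcript; the remaining four counts and the class upper limits are assumed values, chosen so that the frequencies sum to 76.

```python
from itertools import accumulate

# First two counts are from the slides; the last four are assumptions.
frequencies = [28, 30, 12, 2, 2, 2]
upper_limits = [30, 40, 50, 60, 70, 80]  # assumed class upper limits

# Running total: each entry adds the current class to all earlier classes.
cumulative = list(accumulate(frequencies))

for upper, running_total in zip(upper_limits, cumulative):
    print(f"ages up to {upper}: {running_total}")
```

The last cumulative frequency always equals the total number of observations, which is a convenient check on the arithmetic.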

Frequency Tables

Single-Value Grouping If the data are discrete and take only a relatively small number of distinct values, each distinct value can serve as its own class. Consider the following example: the data below represent the number of available cars in a household, based on a random sample of 50 households. Construct a frequency distribution and a relative frequency distribution.

Single-Value Grouping 3 0 1 2 1 1 1 2 0 2 4 2 2 2 1 2 2 0 2 4 1 1 3 2 4 1 2 1 2 2 3 3 2 1 2 2 0 3 2 2 2 3 2 1 2 2 1 1 3 5
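One way to carry out this exercise is with Python's `collections.Counter`, which tallies each distinct value. The sketch below uses the 50 observations listed above; the variable names are illustrative only.

```python
from collections import Counter

cars = [
    3, 0, 1, 2, 1, 1, 1, 2, 0, 2,
    4, 2, 2, 2, 1, 2, 2, 0, 2, 4,
    1, 1, 3, 2, 4, 1, 2, 1, 2, 2,
    3, 3, 2, 1, 2, 2, 0, 3, 2, 2,
    2, 3, 2, 1, 2, 2, 1, 1, 3, 5,
]

counts = Counter(cars)  # frequency of each distinct value
n = len(cars)           # total number of households

print("cars  frequency  relative frequency")
for value in sorted(counts):
    print(f"{value:>4}  {counts[value]:>9}  {counts[value] / n:>18.2f}")
```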

Grouping Qualitative Data The concepts of class limits and midpoints are not appropriate for qualitative data. For instance, if we have data that categorize people as male or female, then the classes are “male” and “female.” We can still group qualitative data and compute frequencies and relative frequencies for classes. For qualitative data, the classes coincide with the observed values of the corresponding variable.

Grouping Qualitative Data Example: The data on the next slide represent the color of M&Ms in a bag of plain M&Ms. Construct a frequency distribution of the color of plain M&Ms.

Yellow Orange Brown Green Blue Red

Distributions of Data Sets The distribution of a data set is a table, graph, or formula that provides the values of the observations and how often they occur.

Displaying Distributions
Qualitative variable: frequency table, bar graph, pie chart.
Quantitative variable: frequency table, stem-and-leaf plot, histogram, dot plot.

Displaying Distributions of Qualitative Data

Example: How well educated are 30-something young adults? Here is the distribution of the highest level of education for people aged 25 to 34 years:

Bar Graph

Label each category of data on a horizontal axis and the frequency or relative frequency of the category on the vertical axis.

For each category, draw a rectangle of equal width whose height equals the category's frequency or relative frequency.

The bar graph quickly compares the sizes of the five education groups. The heights of the bars show the counts in the five categories.

Pie Chart

The pie chart helps us see what part of the whole each group forms. For example, the “HS grad” slice makes up 30.7% of the pie because 30.7% of young adults have only a high school education.

Displaying Distributions of Quantitative Data

Stemplots A stemplot (also called a stem-and-leaf plot) gives a quick picture of the shape of a distribution while including the actual numerical values in the graph. Stemplots work best for small numbers of observations that are all greater than 0.

How to Make a Stemplot
Step 1: Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
Step 2: Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
Step 3: Write each leaf in the row to the right of its stem, in increasing order out from the stem.

Here are the numbers of home runs that Babe Ruth hit in each of his 15 years with the New York Yankees, 1920 to 1934.
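The stemplot recipe can be sketched in Python as follows. The slide's table of home run counts is not included in this transcript, so the list below is supplied from the historical record of Ruth's 1920 to 1934 seasons; treat it as an assumption about what the slide showed. The function `stemplot` is a hypothetical helper for nonnegative whole numbers.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows for nonnegative whole numbers."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):          # sorting puts leaves in increasing order
        stem, leaf = divmod(value, 10)    # stem: all but the last digit; leaf: last digit
        leaves_by_stem[stem].append(str(leaf))
    smallest, largest = min(leaves_by_stem), max(leaves_by_stem)
    # Include every stem in the range, even one that has no leaves.
    return [
        f"{stem} | {''.join(leaves_by_stem.get(stem, []))}"
        for stem in range(smallest, largest + 1)
    ]


# Home runs per season, 1920 to 1934 (supplied from the historical record).
home_runs = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

rows = stemplot(home_runs)
for row in rows:
    print(row)
```

Because the leaves are written in sorted order, reading the rows from top to bottom also gives the data in increasing order.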

Possible Modifications Stemplots do not work well for large data sets, where each stem must hold a large number of leaves. We can increase the number of stems in a plot by splitting each stem into two: one with leaves 0 to 4 and the other with leaves 5 to 9. When the observed values have many digits, it is often best to round the numbers to just a few digits before making a stemplot.

To make a stemplot of the supermarket spending data, first round the purchases to the nearest dollar. Then use tens of dollars as stems and dollars as leaves.
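Both modifications, rounding first and splitting each stem in two, can be combined in one short sketch. The purchase amounts below are hypothetical (the slide's data are not in this transcript), and `split_stemplot` is an illustrative helper name, not part of any library.

```python
def split_stemplot(amounts):
    """Round nonnegative amounts to whole dollars, then build a split-stem plot."""
    # int(a + 0.5) rounds halves upward for nonnegative amounts.
    dollars = sorted(int(a + 0.5) for a in amounts)
    rows = {}
    for d in dollars:
        stem, leaf = divmod(d, 10)        # tens of dollars, single dollars
        key = (stem, leaf >= 5)           # False: leaves 0-4, True: leaves 5-9
        rows.setdefault(key, []).append(str(leaf))
    first, last = min(rows), max(rows)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for upper_half in (False, True):
            key = (stem, upper_half)
            if first <= key <= last:      # skip empty rows before/after the data
                lines.append(f"{stem} | {''.join(rows.get(key, []))}")
    return lines


# Hypothetical purchase amounts in dollars.
purchases = [3.11, 8.88, 9.26, 13.78, 17.09, 18.36, 24.58, 26.24, 32.03]

split_rows = split_stemplot(purchases)
for line in split_rows:
    print(line)
```

Each stem appears twice, so a data set that would crowd a few long rows is spread over twice as many shorter ones.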

To see the shape of a distribution more clearly, turn the stemplot on its side so that the larger values lie to the right.

Remarks About Stemplots Stemplots display the actual values of the observations. This feature makes stemplots awkward for large data sets. Constructing a stemplot is a quick and easy way to sort data (arrange them in order), and sorting is required for some other statistical procedures.

Dot Plot A dot plot is a graph in which each data value is plotted as a point (or dot) along a scale of values.
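A rough text version of a dot plot can be produced by printing one mark per observation next to each value on the scale. This is only a sketch for small whole-number data; the sample values are hypothetical and the scale runs sideways (one row per value) rather than along a horizontal axis.

```python
from collections import Counter


def dotplot_rows(values):
    """Return one row per value on the scale, with one '*' per observation."""
    counts = Counter(values)
    width = len(str(max(values)))
    # Cover the whole scale so that gaps in the data stay visible.
    return [
        f"{v:>{width}} | " + "*" * counts[v]
        for v in range(min(values), max(values) + 1)
    ]


sample = [1, 2, 2, 3, 3, 3, 5]  # hypothetical data

dot_rows = dotplot_rows(sample)
for row in dot_rows:
    print(row)
```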

Histograms A histogram is a bar graph of the frequency or relative-frequency distribution of quantitative data. The horizontal scale represents classes of data values, and the vertical scale represents frequencies or relative frequencies. The heights of the bars correspond to the frequency or relative-frequency values, and the bars are drawn adjacent to each other (without gaps).

Example: Percent of Hispanics in the adult population, by state (2000)

Step 1: Divide the range of the data into classes of equal width. The data in the table range from 0.6 to 38.7, so we choose 8 classes, each 5 percentage points wide, which together cover 0 to 40.

Step 2: Count the number of individuals in each class. These counts are called frequencies. A table of frequencies for all classes is a frequency table.

Step 3: Draw the histogram. First mark the scale for the variable whose distribution you are displaying on the horizontal axis; in this case, that is “percent of adults who are Hispanic.” The vertical axis contains the scale of counts.
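The three steps can be sketched in Python with a text rendering in place of a drawn graph. The state-by-state table is not part of this transcript, so `percents` below is a small hypothetical sample that merely includes the stated minimum (0.6) and maximum (38.7). The sketch also assumes that each class includes its left endpoint and excludes its right endpoint; the slide's own convention is not shown.

```python
def histogram_counts(values, start, width, n_classes):
    """Step 2: count the values in classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} lies outside the chosen classes")
        counts[index] += 1
    return counts


# Hypothetical percentages (assumed, not the slide's data).
percents = [0.6, 2.1, 3.3, 4.7, 1.5, 6.4, 8.8, 12.9, 17.3, 22.0, 27.6, 38.7]

# Step 1: eight classes of width 5 covering 0 to 40.
start, width, n_classes = 0, 5, 8
counts = histogram_counts(percents, start, width, n_classes)

# Step 3: one bar per class, drawn sideways with '#' marks.
for i, count in enumerate(counts):
    lower = start + i * width
    print(f"{lower:>2} to under {lower + width:<2} | {'#' * count}")
```

Dividing each count by `len(percents)` before drawing would give the relative-frequency version of the same histogram; the shape is unchanged, only the vertical scale differs.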

Two Types of Histograms

Histograms: Another Example Days to Maturity for Short-Term Investments Frequency histogram

Histograms: Another Example Days to Maturity for Short-Term Investments Relative-frequency histogram

Examining distributions In any graph of data, we look for the overall pattern and for striking deviations from that pattern. We can describe the overall pattern of a distribution by its shape, center, and spread. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern.

Distribution Shapes An important aspect of the distribution of a quantitative data set is its shape. The shape of a distribution plays a role in determining the appropriate method of statistical analysis. To identify the shape of a distribution, the best approach usually is to use a smooth curve that approximates the overall shape.

Distribution Shapes In later chapters, there will be frequent reference to data with a normal distribution. One key characteristic of a normal distribution is that it has a “bell” shape. The frequencies start low, then increase to some maximum frequency, then decrease to a low frequency. The distribution should be approximately symmetric.

For instance, the following table displays a frequency and a relative-frequency distribution for the heights of the 3264 female students who attend a Midwestern college. This is a good example of an approximately normal distribution.

The figure displays a relative-frequency histogram for the same data. Notice the bell shape. Included is a smooth curve that approximates the overall shape of the distribution.

Both the histogram and the smooth curve show that this distribution of heights is bell shaped, but the smooth curve makes seeing the shape a little easier.

Advantages of using smooth curves We do not need to worry about minor differences in shape. We can concentrate on overall patterns, which, in turn, allows us to classify most distributions by designating relatively few shapes. Most importantly, smooth curves allow us to use the tools of calculus.

Some Common Distribution Shapes

The relative-frequency histogram for household size in the United States shows that the distribution of household sizes is right skewed.

The distribution of Babe Ruth’s home run counts is symmetric and unimodal. Range is from 22 to 60. There are no outliers. The distribution of supermarket spending is skewed to the right. Range is from $3 to $93. Outliers?

Population and Sample Distributions The data set obtained by observing the values of a variable for an entire population is called population data or census data. The data set obtained by observing the values of a variable for a sample of the population is called sample data. To distinguish their distributions, we use the terminology population distribution and sample distribution.

Population and Sample Distributions For a particular population and variable, sample distributions vary from sample to sample. However, there is only one population distribution, namely, the distribution of the variable under consideration on the population under consideration.

Population distribution and six sample distributions for household size

Population and Sample Distributions In practice, we usually do not know the population distribution. We can use the distribution of a simple random sample from the population to get a rough idea of the population distribution. In general, the larger the sample, the better the sample distribution tends to approximate the population distribution.
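The relationship between sample and population distributions can be explored with a small simulation. The population below is entirely hypothetical (1000 made-up household sizes, not U.S. data), and the helper names are illustrative. The sketch draws simple random samples without replacement and reports the largest gap between the sample and population relative frequencies.

```python
import random
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to its share of the observations."""
    counts = Counter(values)
    n = len(values)
    return {v: counts[v] / n for v in sorted(counts)}


def max_gap(sample_dist, population_dist):
    """Largest absolute difference in relative frequency over all values."""
    return max(abs(sample_dist.get(v, 0.0) - p) for v, p in population_dist.items())


# Hypothetical population of 1000 household sizes.
population = [1] * 270 + [2] * 340 + [3] * 160 + [4] * 140 + [5] * 60 + [6] * 20 + [7] * 10
population_dist = relative_frequencies(population)

rng = random.Random(0)  # fixed seed so that repeated runs give the same samples
for n in (10, 100, len(population)):
    sample = rng.sample(population, n)  # simple random sample without replacement
    gap = max_gap(relative_frequencies(sample), population_dist)
    print(f"sample size {n:>4}: largest gap = {gap:.3f}")
```

Individual small samples can land close to or far from the population distribution by chance, so the gap need not shrink on every step; the one guaranteed case is the last, where the sample is the whole population and the gap is zero.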