Chapter 2 Summarizing and Graphing Data

Presentation transcript:

Chapter 2 Summarizing and Graphing Data

Recall: The 2 Types of data variables:

2.1 Graphs for qualitative variables
- Bar graphs (frequency and relative frequency)
- Pie charts
- Pareto charts

Graphs for qualitative variables
The values of a qualitative (categorical) variable are labels. The distribution of a categorical variable lists the count or percentage of individuals in each category.
Example: a sample of 400 wireless internet users. Counts: 212, 168, 20 (category labels not recovered from the slide).

Wireless internet users
Male      288   (72%)
Female    112   (28%)
Total     400   (100%)

A frequency distribution (or frequency table) lists each category of data and the number of occurrences for each category of data.

Frequency Distribution: Ages of Best Actresses
[Figure: original data shown next to its frequency distribution]

Lower class limits are the smallest numbers that can actually belong to the different classes.

Upper class limits are the largest numbers that can actually belong to the different classes.

Class midpoints can be found by adding the lower class limit to the upper class limit and dividing the sum by two.
Class midpoints: 25.5, 35.5, 45.5, 55.5, 65.5, 75.5

Class width is the difference between two consecutive lower class limits or two consecutive lower class boundaries.
Class width: 10
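As a rough check of the two definitions above, the short Python sketch below computes class midpoints and the class width. The class limits 21-30, 31-40, ..., 71-80 are an assumption: they are not printed in this transcript and were chosen only because they reproduce the midpoints and the width shown on the slides.

```python
# ASSUMPTION: these class limits are not in the transcript; they are chosen
# because they reproduce the slide's midpoints (25.5, ...) and width (10).
lower_limits = [21, 31, 41, 51, 61, 71]
upper_limits = [30, 40, 50, 60, 70, 80]

# Midpoint = (lower class limit + upper class limit) / 2
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Width = difference between two consecutive lower class limits
widths = [b - a for a, b in zip(lower_limits, lower_limits[1:])]
class_width = widths[0]

print(midpoints)
print(class_width)
```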

EXAMPLE Organizing Qualitative Data into a Frequency Distribution
The data on the next slide represent the color of M&Ms in a bag of plain M&Ms. Construct a frequency distribution of the color of plain M&Ms.

[Table: frequency table for the M&M color data]

The relative frequency is the proportion (or percent) of observations within a category and is found using the formula:
relative frequency = frequency / sum of all frequencies
A relative frequency distribution lists the relative frequency of each category of data.
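A minimal Python sketch of a relative frequency distribution, using the wireless internet user counts given earlier in these slides (288 male, 112 female). The variable names are illustrative only.

```python
# Counts from the wireless internet users example in the slides.
counts = {"Male": 288, "Female": 112}

total = sum(counts.values())

# Relative frequency of a category = its frequency / sum of all frequencies.
relative = {category: n / total for category, n in counts.items()}

for category, rf in relative.items():
    print(f"{category:<8}{counts[category]:>5}{rf:>8.2%}")
```

The relative frequencies of all categories should add up to 1 (or 100%), which is a quick sanity check on any such table.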

EXAMPLE Organizing Qualitative Data into a Relative Frequency Distribution
Use the frequency distribution obtained in the prior example to construct a relative frequency distribution of the color of plain M&Ms.

[Table: relative frequency of each M&M color. Recovered values: 0.2222, 0.2, 0.1333, 0.0667, 0.1111; the color labels were not recovered.]

Bar Graphs A bar graph is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis.

EXAMPLE Constructing a Frequency and Relative Frequency Bar Graph
Use the M&M data to construct a frequency bar graph and a relative frequency bar graph.

Actresses example: total frequency = 76, so the relative frequencies are 28/76 = 37%, 30/76 = 39%, etc.

Frequency bar graph: the horizontal scale represents the classes of data values; the vertical scale represents the frequencies.
[Figure: frequency bar graph, horizontal axis marked from 20 to 80 in steps of 10]

Relative frequency graph: has the same shape and horizontal scale as the frequency bar graph, but the vertical scale is marked with relative frequencies instead of actual frequencies.

Interpreting Frequency Distributions In later chapters, there will be frequent reference to data with a normal distribution. One key characteristic of a normal distribution is that it has a “bell” shape. The frequencies start low, then increase to some maximum frequency, then decrease to a low frequency. The distribution should be approximately symmetric.

Example: “bell” shape

EXAMPLE Comparing Two Data Sets
The following data represent the marital status (in millions) of U.S. residents 18 years of age or older in 1990 and 2006. Draw a side-by-side relative frequency bar graph of the data.

Marital Status    1990     2006
Never married     40.4     55.3
Married           112.6    127.7
Widowed           13.8     13.9
Divorced          15.1     22.8
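Because the two years have different totals, the comparison has to be made on relative frequencies rather than raw counts. The following Python sketch computes the values that a side-by-side relative frequency bar graph would display; the function and variable names are illustrative.

```python
status = ["Never married", "Married", "Widowed", "Divorced"]
millions_1990 = [40.4, 112.6, 13.8, 15.1]
millions_2006 = [55.3, 127.7, 13.9, 22.8]


def relative_frequencies(values):
    """Divide each frequency by the sum of all frequencies."""
    total = sum(values)
    return [v / total for v in values]


rf_1990 = relative_frequencies(millions_1990)
rf_2006 = relative_frequencies(millions_2006)

print(f"{'Marital status':<15}{'1990':>8}{'2006':>8}")
for name, a, b in zip(status, rf_1990, rf_2006):
    print(f"{name:<15}{a:>8.3f}{b:>8.3f}")
```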

[Figure: side-by-side relative frequency bar graph, "Marital Status in 1990 vs. 2006"; vertical axis: relative frequency]

Another example: On the morning of April 10, 1912, the Titanic sailed from the port of Southampton (UK), bound for New York. Altogether there were 2,201 passengers and crew members on board. The table below classifies everyone on board by outcome, sex, and class.

                 Survived           Dead
                 Male    Female     Male    Female
First class      62      141        118     4
Second class     25      93         154     13
Third class      88      90         422     106
Crew members     192     20         670     3

Define the categorical variables.

Bar chart representing the data in the table above (in percentages)
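A small Python sketch of how percentages for such a chart could be computed from the Titanic table. It reads the four numbers in each row as survived male, survived female, dead male, dead female, a reading under which the rows add up to the stated 2,201 people on board. The choice of "percentage surviving within each class" is an assumption; the slide does not say which percentages its chart shows.

```python
# Each row: (survived male, survived female, dead male, dead female)
titanic = {
    "First class": (62, 141, 118, 4),
    "Second class": (25, 93, 154, 13),
    "Third class": (88, 90, 422, 106),
    "Crew members": (192, 20, 670, 3),
}

total_on_board = sum(sum(row) for row in titanic.values())

# ASSUMPTION: percentage of each group that survived (one possible chart).
survival_pct = {}
for group, (surv_m, surv_f, dead_m, dead_f) in titanic.items():
    group_total = surv_m + surv_f + dead_m + dead_f
    survival_pct[group] = 100 * (surv_m + surv_f) / group_total

print("Total on board:", total_on_board)
for group, pct in survival_pct.items():
    print(f"{group:<13}{pct:6.1f}% survived")
```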

A Pareto chart is a bar graph where the bars are drawn in decreasing order of frequency or relative frequency.
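The only step that distinguishes a Pareto chart from an ordinary bar graph is the ordering of the bars. A minimal Python sketch, using the 2006 marital status frequencies (in millions) from the examples in these slides:

```python
# 2006 marital status frequencies (in millions) from the slides.
freq_2006 = {
    "Never married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

# Pareto ordering: categories sorted by decreasing frequency.
pareto_order = sorted(freq_2006.items(), key=lambda item: item[1], reverse=True)

for category, freq in pareto_order:
    print(f"{category:<15}{freq:>7.1f}")
```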

[Figure: Pareto chart]

Pie Chart A pie chart is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.

EXAMPLE Constructing a Pie Chart
The following data represent the marital status (in millions) of U.S. residents 18 years of age or older in 2006. Draw a pie chart of the data.

Marital Status    Frequency
Never married     55.3
Married           127.7
Widowed           13.9
Divorced          22.8
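Since the area of each sector is proportional to the category's frequency, the central angle of a sector is 360 degrees times the relative frequency. A short Python sketch for the 2006 marital status data; the names are illustrative.

```python
freq_2006 = {
    "Never married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

total = sum(freq_2006.values())

# Central angle of each sector, in degrees: 360 * relative frequency.
angles = {category: 360 * f / total for category, f in freq_2006.items()}

for category, angle in angles.items():
    print(f"{category:<15}{angle:>7.1f} degrees")
```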

Other example: A graph depicting qualitative data as slices of a pie

2.2 Graphs for quantitative variables:
- Histograms (discrete data and continuous data)
- Stem-and-leaf plots
- Time series
- Dot plots
- Distributions

Histogram example: CEO salaries
Forbes magazine published data on the best small firms in 1993. These were firms with annual sales of more than $5 million and less than $350 million. Firms were ranked by five-year average return on investment. The data extracted are the age and annual salary of the chief executive officer for the first 60 ranked firms. (Data at http://lib.stat.cmu.edu/DASL/DataArchive.html )

Salary of chief executive officer (including bonuses), in $thousands:
145 621 262 208 362 424 339 736 291 58
498 643 390 332 750 368 659 234 396 300
343 536 543 217 298 1103 406 254 862 204
206 250 21 298 350 800 726 370 536 291
808 543 149 350 242 198 213 296 317 482
155 802 200 282 573 388 250 396 572

[Figure: histogram of CEO salaries]

Drawing a histogram
- Construct a distribution table:
  - Define class intervals or bins (choose intervals of equal width!).
  - Count the percentage of observations in each interval.
  - End-point convention: the left endpoint of the interval is included and the right endpoint is excluded, i.e. [a, b).
- Draw the horizontal axis.
- Construct the blocks: height of block = percentage!
- The total area under a histogram must be 100%.

Distribution table for the CEO salaries (use the left end-point convention):

Class interval   Frequency   Percentage = (frequency/total) x 100
0-100            2           2/59 x 100 = 3.39
100-200          4           4/59 x 100 = 6.78
200-300          18          30.50
300-400          14          23.73
400-500          4           6.78
500-600          6           10.18
600-700          3           5.08
700-800          3           5.08
800-900          4           6.78
900-1000         0           0
1000-1100        1           1.70
Total            59          100%
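The distribution table can be checked with a few lines of Python. This is only a sketch, assuming bins of width 100 and the left end-point convention [a, b) described in the slides. One caveat: under that convention the largest salary, 1103, lands in [1100, 1200), whereas the slide's table records it under 1000-1100; all other counts agree.

```python
from collections import Counter

# CEO salaries (in $thousands) as listed in the slides: 59 values.
salaries = [
    145, 621, 262, 208, 362, 424, 339, 736, 291, 58,
    498, 643, 390, 332, 750, 368, 659, 234, 396, 300,
    343, 536, 543, 217, 298, 1103, 406, 254, 862, 204,
    206, 250, 21, 298, 350, 800, 726, 370, 536, 291,
    808, 543, 149, 350, 242, 198, 213, 296, 317, 482,
    155, 802, 200, 282, 573, 388, 250, 396, 572,
]

width = 100
n = len(salaries)

# Left end-point convention: a value s belongs to [left, left + width),
# where left = (s // width) * width.
counts = Counter((s // width) * width for s in salaries)

last_left = (max(salaries) // width) * width
for left in range(0, last_left + width, width):
    freq = counts[left]
    print(f"[{left:>4}, {left + width:>4})  {freq:>3}  {100 * freq / n:6.2f}%")
```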

[Figure: histogram of CEO salaries with blocks labeled 30.50%, 23.73%, 3.39%, 1.70%]
The area of each block represents the percentage of cases in the corresponding class interval (or bin).

Remarks
- A histogram represents percent by area. The area of each block represents the percentage of cases in the corresponding class interval.
- The total area under a histogram is 100%.
- There is no fixed choice for the number of classes in a histogram:
  - If class intervals are too small, the histogram will have spikes.
  - If class intervals are too large, some information will be missed.
  - Use your judgment!
- Typically statistical software will choose the class intervals for you, but you can modify them.
Let's try various binning levels.

Example: Smoking
In a Public Health Service study, a histogram was plotted showing the number of cigarettes smoked per day by each subject (male current smokers), as shown below. The density is marked in parentheses. The class intervals include the left endpoint, but not the right. (Note: there are 20 cigarettes in a pack.)
1. The percentage who smoked less than two packs a day but at least a pack is around: 1.5% / 15% / 30% / 50%
2. The percent who smoked at least a pack a day is around: 1.5% / 15% / 30% / 50%
3. The percent who smoked at least 3 packs a day is around: 0.25 of 1% / 0.5 of 1% / 10%
4. The percent who smoked 20 cigarettes a day is around: 0.35 of 1% / 0.5 of 1% / 1.5% / 3.5% / 10%

Answers:
1. The percentage who smoked less than two packs a day but at least a pack is given by the area of the third block: 1.5 x (40 - 20) = 1.5 x 20 = 30%.
2. The percent who smoked at least a pack a day is given by the area of the third and fourth blocks: 30 + 0.5 x 40 = 50%.
3. The percent who smoked at least 3 packs a day is the area of the block for 60 or more cigarettes. This is half of the fourth block: 0.5 x 20 = 10%.
4. The percent who smoked 20 cigarettes a day: use the left endpoint convention, so 20 belongs to the third block. The answer is 1.5%.
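The answers all follow from "area = density x width". The Python sketch below redoes the arithmetic. It uses only the two blocks that the answers describe: a third block on [20, 40) with density 1.5% per cigarette, and a fourth block with density 0.5% per cigarette that is taken to span [40, 80), an extent inferred from the "0.5 x 40" term in the answers. The helper name `area` is illustrative.

```python
# (left endpoint, right endpoint, density in percent per cigarette)
third_block = (20, 40, 1.5)
fourth_block = (40, 80, 0.5)  # ASSUMPTION: extent inferred from "0.5 x 40"


def area(left, right, density, lo=None, hi=None):
    """Percent of cases in the part of a block lying between lo and hi."""
    lo = left if lo is None else max(lo, left)
    hi = right if hi is None else min(hi, right)
    return density * max(hi - lo, 0)


one_to_two_packs = area(*third_block)                       # [20, 40)
at_least_one_pack = area(*third_block) + area(*fourth_block)
at_least_three_packs = area(*fourth_block, lo=60)           # [60, 80)
exactly_20 = area(*third_block, lo=20, hi=21)               # one cigarette wide

print(one_to_two_packs, at_least_one_pack, at_least_three_packs, exactly_20)
```

The last line makes explicit why the answer to the fourth question equals the density itself: "exactly 20 cigarettes" is an interval one cigarette wide inside the third block.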

Using histograms for comparisons
Fuel economy for model year 2001 compact and two-seater cars (Table 1.8, pg 38).
[Figures: histograms of city consumption and highway consumption]

Stemplot (or Stem-and-Leaf Plot)
Represents data by separating each value into two parts: the stem (the leftmost digits) and the leaf (the rightmost digit).
Example: a data value of 147 would have 14 as the stem and 7 as the leaf.
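For whole-number data the stem/leaf split is an integer division by 10. A minimal Python sketch follows; apart from 147, the sample values are hypothetical and serve only to illustrate the layout.

```python
from collections import defaultdict


def split_stem_leaf(value):
    """Split a non-negative integer into (stem, leaf): 147 -> (14, 7)."""
    return divmod(value, 10)


def stemplot(values):
    """Return {stem: sorted leaves}, including empty stems in between."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = split_stem_leaf(v)
        rows[stem].append(leaf)
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


# Hypothetical data (only 147 comes from the slide).
sample = [147, 152, 139, 147, 160, 158, 131]

for stem, leaves in stemplot(sample).items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```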

To make a Stemplot:

Example:

Advantage of Stem-and-Leaf Diagrams over Histograms
Once a frequency distribution or histogram of continuous data is created, the raw data are lost (unless reported with the frequency distribution). In a stem-and-leaf plot, by contrast, the raw data can be retrieved from the plot itself.

Dot plots
A dot plot is drawn by placing each observation horizontally in increasing order and placing a dot above the observation each time it is observed.

EXAMPLE Drawing a Dot Plot
The following data represent the number of available cars in a household based on a random sample of 50 households. Draw a dot plot of the data.

3 0 1 2 1 1 1 2 0 2
4 2 2 2 1 2 2 0 2 4
1 1 3 2 4 1 2 1 2 2
3 3 2 1 2 2 0 3 2 2
2 3 2 1 2 2 1 1 3 5

Data based on results reported by the United States Bureau of the Census.
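A text-only Python sketch of this dot plot: it counts how often each value occurs and prints one mark per observation. The plot is drawn sideways (one row per value) to keep the code short; a drawn dot plot would stack the dots vertically above a horizontal axis.

```python
from collections import Counter

# Number of available cars in each of the 50 sampled households.
cars = [
    3, 0, 1, 2, 1, 1, 1, 2, 0, 2,
    4, 2, 2, 2, 1, 2, 2, 0, 2, 4,
    1, 1, 3, 2, 4, 1, 2, 1, 2, 2,
    3, 3, 2, 1, 2, 2, 0, 3, 2, 2,
    2, 3, 2, 1, 2, 2, 1, 1, 3, 5,
]

counts = Counter(cars)

# One row per value, one '*' per household with that number of cars.
for value in range(min(cars), max(cars) + 1):
    print(f"{value} | " + "*" * counts[value])
```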


Examining distributions
Purpose of a graph: to understand the data better. Histograms and stemplots display the main features of a distribution similarly.
Features to be observed:
- Modes (how many?)
- Symmetry vs. skewness
- Outliers


EXAMPLE Identifying the Shape of the Distribution
Identify the shape of the following histogram, which represents the time between eruptions at Old Faithful.

Time-Series Graphs
A time-series graph displays data that have been collected at different points in time.

Example:

Time series graph:

Time series graph with seasonal variation:

Other types of graphs:
- Frequency polygon
- Ogive (cumulative frequencies)
- Scatter plot (to relate two variables)

Frequency polygons
The class midpoint is found by adding consecutive lower class limits and dividing the result by 2. A frequency polygon is drawn by plotting a point above each class midpoint on a horizontal axis at a height equal to the frequency of the class. After the points for each class are plotted, draw straight lines between consecutive points.

Time between Eruptions (seconds)   Class Midpoint   Frequency   Relative Frequency
670 – 679                          675              2           0.0444
680 – 689                          685              (not recovered)
690 – 699                          695              7           0.1556
700 – 709                          705              9           0.2
710 – 719                          715              (not recovered)
720 – 729                          725              11          0.2444
730 – 739                          735              (not recovered)
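The class midpoints in this table follow the frequency-polygon definition (average of consecutive lower class limits). A short Python sketch, assuming the next lower limit after the last class is 740, i.e. the last lower limit plus the class width:

```python
# Lower class limits of the eruption-time classes: 670, 680, ..., 730.
lower_limits = list(range(670, 740, 10))
class_width = lower_limits[1] - lower_limits[0]

# ASSUMPTION: the lower limit following the last class is 730 + width = 740.
next_limits = lower_limits[1:] + [lower_limits[-1] + class_width]

# Midpoint = (lower limit + next lower limit) / 2; these are the x-coordinates
# of the points of the frequency polygon.
midpoints = [(a + b) / 2 for a, b in zip(lower_limits, next_limits)]

print(midpoints)
```

The frequencies would supply the heights of the polygon's points; they are left out here because not every frequency could be recovered from the slide.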

[Figure: frequency polygon; horizontal axis: time (seconds)]

Practice

CO2 emission levels in the world: Burning fuel in power plants or motor vehicles emits carbon dioxide (CO2), which contributes to global warming. The table in the next slide displays CO2 emissions per person for countries with populations of at least 20 million.
Questions:
(a) Why do you think we choose to measure emissions per person rather than total CO2 emissions for each country?
(b) Display the data of the table in a graph. Describe the shape, center, and spread of the distribution. Which countries are outliers? Make a stemplot, then a histogram.

2.3 3.9 17 0.2 1.8 16 2.5 1.4 1.7 6.1 10 0.9 1.2 7.3 3.8 3.6 9.1 0.3 9.7 8.8 4.6 3.7 1.0 0.1 0.7 0.8 8 10.2 11 8.1 6.8 2.8 7.6 9 19.9 4.8 5.1 0.5

Answer: (a) Total emissions would almost certainly be higher for very large countries; for example, we would expect that even with great attempts to control emissions, China (with over 1 billion people) would have higher total emissions than the smallest countries in the data set.

Answer: (stemplot) (b) Graph representation of the data:
1) Stemplot (stem = whole metric tons per person, leaf = tenths):

 0 | 0 0 0 1 1 2 2 2 2 3 3 5 7 8 9 9
 1 | 0 2 4 7 8
 2 | 3 5 5 8
 3 | 6 7 8 9 9
 4 | 6 8
 5 | 1
 6 | 1 8
 7 | 3 6
 8 | 0 1 8
 9 | 0 1 7
10 | 0 2
11 | 0
12 |
13 |
14 |
15 |
16 | 0
17 | 0
18 |
19 | 9

Answer: (histogram) (b), continued: Graph representation of the data:
2) Histogram (for example, using Excel; demo in class). Note: in Excel, the convention is "right point belongs in bin, left point out".
Summary of steps:
- Find the min and max of the data.
- Choose the binning.
- From the menus: Tools, Data Analysis, Histograms.
- Define: Input range, Bin range, Output range.
- Check Chart output.
- Click OK.
- Adjust the width between bars (right-click on the bars, Format Data Series, Options, set gap width to zero).
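The two end-point conventions only differ for values that fall exactly on a bin boundary. The Python sketch below contrasts the left-closed convention [a, b) used earlier in these slides with the right-closed convention (a, b] that the slide attributes to Excel. The function names are illustrative, the bin width of 2 is chosen for illustration, and the three sample values (8, 10, 2.3) are taken from the CO2 data.

```python
import math


def left_closed_bin(x, width):
    """Left endpoint a of the bin [a, a + width) that contains x."""
    return math.floor(x / width) * width


def right_closed_bin(x, width):
    """Upper limit b of the bin (b - width, b] that contains x."""
    return math.ceil(x / width) * width


width = 2
for x in (8, 10, 2.3):
    a = left_closed_bin(x, width)
    b = right_closed_bin(x, width)
    print(f"x = {x}: [{a}, {a + width}) versus ({b - width}, {b}]")
```

A value such as 8 sits in [8, 10) under the first convention but in (6, 8] under the second, so bin counts can differ between the two approaches even though the data and the bin edges are the same.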

Answer: (histogram) (b), continued: Histogram:
[Table: min, max, and bin/frequency columns, only partially recovered. Recovered values: max 19.9; Bin Frequency 2 18 4 9 6 3 8 5 10 12 14 16 1 20 22]
Interpretation of graphs: The graph is not symmetric. There is a strong right skew with a high peak at low metric tons per person. The three highest countries (the U.S., Canada, and Australia) appear to be outliers; apart from those countries, the distribution is spread from 0 to 11 metric tons per person (see table).