II. Graphical Displays of Data

Like many other things, statistical analysis can suffer from “garbage in, garbage out.” This often happens because no one bothered to look at the data. Simple data displays can convey a lot of information.

A. Stem-and-Leaf Displays

Purpose: To provide a basis for evaluating the “shape” of the data without losing any information.

1. Basic Stem-and-Leaf Display

This technique is best illustrated by an example. Pencil lead is actually a ceramic matrix filled with graphite. A common measure of the quality of a ceramic body is its porosity, which measures the void space in the body. The following data set represents the results of a porosity test on “good” pencil lead.

Let the digits to the left of the decimal point be the “stems” and the digit to the right of the decimal point be the “leaves”. First list the stems:

Stem | Leaves
 11  |
 12  |
 13  |
 14  |

Representing the value 12.1:

Stem | Leaves
 11  |
 12  | 1
 13  |
 14  |

Representing the first row of data in this way, and then the entire data set, gives the “raw” stem-and-leaf display.

[Figure: raw stem-and-leaf displays for the first row of data and for the entire data set]
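The construction of the raw display can be sketched in a few lines of Python. This is a minimal sketch that assumes positive values recorded to one decimal place (stem = integer part, leaf = tenths digit). The function names are hypothetical, and the sample values are made up for illustration, not taken from the porosity data.

```python
from collections import defaultdict


def raw_stem_and_leaf(values):
    """Group one-decimal values by stem (integer part), keeping the leaves
    (tenths digit) in the order the observations were recorded."""
    display = defaultdict(list)
    for v in values:
        # Work in tenths to avoid float artifacts: 12.1 -> 121 -> stem 12, leaf 1
        stem, leaf = divmod(round(v * 10), 10)
        display[stem].append(leaf)
    return dict(display)


def format_display(display):
    """Render every stem in the range, including stems with no leaves."""
    lines = []
    for stem in range(min(display), max(display) + 1):
        leaves = "".join(str(leaf) for leaf in display.get(stem, []))
        lines.append(f"{stem}: {leaves}".rstrip())
    return "\n".join(lines)


# Hypothetical values, not the course's porosity data
print(format_display(raw_stem_and_leaf([12.1, 13.5, 11.7, 12.6])))
```

Because the leaves keep their recorded order, this display is still “raw”. Ordering the leaves is the next refinement.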

Usually, we “refine” the stem-and-leaf display. First, we order the leaves on each stem.

[Figure: stem-and-leaf display with ordered leaves]

Next, we add the depth information. The depth of a data value is its position counted from the closer end of the ordered data set. For example, the data value 11.7 is the smallest observation, so it has a depth of 1. What is the depth of the data value 14.3? What is the depth of the data value 12.6? In the completed display, each stem in the top part is labeled with the depth of its last value, and each stem in the bottom part is labeled with the depth of its first value.

We do not give the depth for the stem that contains the middle value of the data set, because there the depth would be ambiguous. To help find the depths, we also report the number of leaves (No.) on each stem. Until we reach the middle stem, the depth for any stem is the depth reported for the previous stem plus the number of leaves on the current stem. Below the middle stem, the same rule applies counting up from the bottom of the display.

[Figure: ordered stem-and-leaf display with No. and Depth columns]
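The depth rule can also be sketched in code. This sketch makes two assumptions: the data are positive values with one decimal place, and a stem “contains the middle value” when its ranks cover the median location $(n+1)/2$. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def stem_leaf_with_depth(values):
    """Return rows (stem, ordered leaves, No., depth) for one-decimal data.

    Depth is None for the stem holding the middle value, where it would be ambiguous.
    """
    stems = defaultdict(list)
    for v in values:
        stem, leaf = divmod(round(v * 10), 10)
        stems[stem].append(leaf)
    n = len(values)
    middle = (n + 1) / 2                 # location of the middle value
    rows, before = [], 0                 # before = number of values on earlier stems
    for stem in range(min(stems), max(stems) + 1):
        leaves = sorted(stems.get(stem, []))
        through = before + len(leaves)   # values on this stem and all earlier ones
        if through < middle:             # top part: depth of the stem's last value
            depth = through
        elif before + 1 > middle:        # bottom part: depth of the stem's first value
            depth = n - before
        else:                            # stem contains the middle value
            depth = None
        rows.append((stem, leaves, len(leaves), depth))
        before = through
    return rows


# Hypothetical data (n = 9), not the course's porosity data
for stem, leaves, count, depth in stem_leaf_with_depth(
        [11.7, 12.1, 12.6, 12.9, 13.0, 13.2, 13.5, 13.7, 14.3]):
    shown = "" if depth is None else depth
    print(f"{stem}: {''.join(map(str, leaves)):<6} {count:>3} {shown!s:>5}")
```

In this sketch, when $n$ is even and the two middle values fall on different stems, no stem contains the middle location, so every stem receives a depth.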

2. Stretched Stem-and-Leaf

Consider a “stretched” stem-and-leaf display, which splits each simple stem into two. Let stem X* hold the values X.0–X.4, and let stem X hold the values X.5–X.9.

[Figure: stretched stem-and-leaf display with No. and Depth columns, beginning at stem 11*]

Two other extensions of the basic stem-and-leaf display are the squeezed stem-and-leaf display and the side-by-side (or back-to-back) stem-and-leaf display.
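Stretching the stems amounts to sorting each leaf into one of two half-lines. The sketch below uses the X* / X labels from the text and assumes one-decimal positive data. The function name and data are hypothetical.

```python
from collections import defaultdict


def stretched_stem_and_leaf(values):
    """Split each stem X into X* (leaves 0-4) and X (leaves 5-9).

    Returns (label, ordered leaves) rows from the smallest to the largest line.
    """
    lines = defaultdict(list)
    for v in values:
        stem, leaf = divmod(round(v * 10), 10)
        half = 0 if leaf <= 4 else 1          # 0 -> X* line, 1 -> X line
        lines[(stem, half)].append(leaf)
    key, last = min(lines), max(lines)
    rows = []
    while key <= last:
        stem, half = key
        label = f"{stem}*" if half == 0 else f"{stem}"
        rows.append((label, sorted(lines.get(key, []))))
        key = (stem, 1) if half == 0 else (stem + 1, 0)   # next line down the display
    return rows


# Hypothetical data, not the course's porosity data
for label, leaves in stretched_stem_and_leaf(
        [11.7, 12.1, 12.6, 12.9, 13.0, 13.2, 13.5, 13.7, 14.3]):
    print(f"{label:>4}: {''.join(map(str, leaves))}")
```

The display starts at the first half-line that actually holds data. In this example there is no 11* line because no value falls between 11.0 and 11.4.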

3. Reading a Data Display

The goal of a data display is to let the data speak to you. As in any conversation, some points are obvious, while others emerge only when you question the data. Some obvious questions: What is the “center” of the data? What is the “spread” of the data? More subtle questions: Do the data follow some pattern? Is the pattern symmetric?

If the pattern is not symmetric, is it right- or left-tailed?

[Figure: a right-tailed (right-skewed) pattern and a left-tailed (left-skewed) pattern]

Are there multiple peaks? What do multiple peaks suggest? Are there outliers?

B. Box Plots

Purpose: To give a quick display of some important features of the data.

Note: The box plot is a distillation of the data. The stem-and-leaf display loses only the time order of the data, but the box plot loses some of the information in the data. However, under several very reasonable assumptions, the information lost is of little or no value.

1. Preliminaries

The box plot is based on the median and the quartiles. To find these quantities, we must first order the data set.

Let $y_1, y_2, \cdots, y_n$ denote our data set. Rearrange the data in ascending order, and denote the ordered data set by $y_{(1)}, y_{(2)}, \cdots, y_{(n)}$, where $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$. Note: a stem-and-leaf display with ordered leaves is such an ordered data set.

a. The median

The median, $\tilde{y}$, is the middle value of the ordered data set and is a measure of the “center”. Literally, the median splits the data set into two equal parts.

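Let $\ell_{\tilde{y}} = (n+1)/2$ denote the “location” of the median $\tilde{y}$ in the ordered data set $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$. If $n$ is odd, then $\ell_{\tilde{y}}$ is an integer; thus, $\tilde{y} = y_{(\ell_{\tilde{y}})}$. If $n$ is even, then $\ell_{\tilde{y}}$ contains the fraction 1/2. In that case, the median is the average of the two values “closest” to the “center”: $\tilde{y} = \left(y_{(n/2)} + y_{(n/2+1)}\right)/2$.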
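The location rule translates directly into code. The Python sketch below is a minimal version that converts the 1-based location $(n+1)/2$ into a 0-based list index. The sample values are hypothetical and are not the ash-content data.

```python
def median(values):
    """Median via the location (n + 1) / 2 in the ordered data."""
    y = sorted(values)            # y[i - 1] plays the role of y_(i)
    n = len(y)
    location = (n + 1) / 2
    if n % 2 == 1:                # location is an integer
        return y[int(location) - 1]
    # location ends in 1/2: average y_(n/2) and y_(n/2 + 1)
    return (y[n // 2 - 1] + y[n // 2]) / 2


print(median([3.1, 2.4, 2.9, 3.6, 2.7]))   # hypothetical values, n odd
print(median([3.1, 2.4, 2.9, 3.6]))        # hypothetical values, n even
```

Python's standard `statistics.median` follows the same odd/even rule. The explicit version above simply mirrors the location argument in the text.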

First Example: The following five values represent the ash content of pencil lead.

Second example: the porosities of “good” pencil lead. Note: the stem-and-leaf display is an ordered data set.

[Figure: stretched stem-and-leaf display with No. and Depth columns]

b. The upper and lower quartiles

While the median divides the data into two parts containing equal numbers of values, the quartiles ($Q_1$, $Q_3$) divide the data into four parts. Note: the second quartile ($Q_2$) is the median. Let $\ell_q$ denote the location of the first and third quartiles.

If $\ell_q$ is an integer, then the quartile is the data value at that location. If $\ell_q$ is not an integer, then the quartile is the average of the two values closest to it.
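The formula for $\ell_q$ is not preserved in this transcript, so the sketch below assumes one common convention: Tukey's hinges. Under that convention the quartile location is $\ell_q = (\lfloor \ell_{\tilde{y}} \rfloor + 1)/2$, counted from the bottom for $Q_1$ and from the top for $Q_3$. Treat it as an illustration only. The course's rule may differ, and software packages (for example, NumPy's percentile functions) often use yet other interpolation rules. The function name and the test inputs are hypothetical.

```python
import math


def quartiles(values):
    """Return (Q1, Q3) using Tukey's depth rule (an assumed convention)."""
    y = sorted(values)
    n = len(y)
    median_loc = (n + 1) / 2
    q_loc = (math.floor(median_loc) + 1) / 2   # Q1 from the bottom, Q3 from the top

    def at(loc, from_top):
        # Indices of the value(s) closest to a 1-based location; one index if loc is whole
        lo, hi = math.floor(loc), math.ceil(loc)
        if from_top:
            i, j = n - lo, n - hi
        else:
            i, j = lo - 1, hi - 1
        return (y[i] + y[j]) / 2               # same value twice when loc is an integer

    return at(q_loc, False), at(q_loc, True)


print(quartiles(range(1, 10)))   # hypothetical data 1..9
print(quartiles(range(1, 9)))    # hypothetical data 1..8
```

With this rule the location is always a whole number or a whole number plus 1/2, so “the average of the two values closest to it” is always well defined.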

First example: The “good” pencil lead data

Second example: breaking strength of yarn

2. The Box Plot Itself

We shall illustrate this technique using the porosity data for the “good” pencil lead.

1. Construct a horizontal scale, marked conveniently, that covers at least the range of the data.
2. Find the median, $Q_1$, and $Q_3$. Use $Q_1$ and $Q_3$ to make a rectangular box above the scale, and draw a vertical line across the box at the median.

3. Determine the “step”. The interquartile range, $Q_3 - Q_1$, is a measure of variability, or spread. We define the step size by Step $= 1.5\,(Q_3 - Q_1)$. For the good pencil lead data, $Q_3 - Q_1 = 13.7 - 12.6 = 1.1$, so Step $= 1.5(1.1) = 1.65$.

4. Determine the “inner fences”. The fences help us isolate possible outliers; the inner fences bound the unquestionably good data. The upper inner fence (UIF) is UIF $= Q_3 + \text{Step}$, and the lower inner fence (LIF) is LIF $= Q_1 - \text{Step}$. For the good pencil lead data, UIF $= 13.7 + 1.65 = 15.35$ and LIF $= 12.6 - 1.65 = 10.95$.
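The step and inner-fence arithmetic for the good pencil lead can be checked with a short sketch. It uses only the quartiles quoted in the text ($Q_1 = 12.6$, $Q_3 = 13.7$). The function name is hypothetical.

```python
def step_and_inner_fences(q1, q3):
    """Return (step, LIF, UIF) from the lower and upper quartiles."""
    iqr = q3 - q1               # interquartile range
    step = 1.5 * iqr
    return step, q1 - step, q3 + step


step, lif, uif = step_and_inner_fences(12.6, 13.7)   # quartiles quoted in the text
print(f"IQR = {13.7 - 12.6:.2f}, step = {step:.2f}, LIF = {lif:.2f}, UIF = {uif:.2f}")
# prints: IQR = 1.10, step = 1.65, LIF = 10.95, UIF = 15.35
```

The values are formatted to two decimals because binary floating point cannot represent 12.6 or 13.7 exactly, so the raw results carry tiny rounding errors.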

5. Locate the most extreme data points that lie on or within the inner fences. These data values are called the adjacents. Draw vertical lines at these points, and connect each one to the box with a horizontal line called a whisker. For the good pencil lead data, all of the values fall within the inner fences, so the adjacents are 11.7 and 14.3.
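Finding the adjacents is a filter followed by a minimum and a maximum. In the sketch below, the function name and the data values are hypothetical. The fences are the ones computed for the good pencil lead.

```python
def adjacents(values, lif, uif):
    """Most extreme data values lying on or within the inner fences."""
    inside = [v for v in values if lif <= v <= uif]
    return min(inside), max(inside)


# Hypothetical data; the fences are the ones computed for the good pencil lead
print(adjacents([11.7, 12.1, 12.6, 13.0, 13.7, 14.3], 10.95, 15.35))
```

Values outside the fences are skipped, so the whiskers stop at the last “good” observation rather than at the data extremes.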

6. Calculate the “outer fences”. The outer fences allow us to discriminate between “mild” and “extreme” outliers: data values between the inner and outer fences are considered mild, and data values beyond the outer fences are considered extreme. The upper outer fence (UOF) is UOF $= Q_3 + 2(\text{Step})$, and the lower outer fence (LOF) is LOF $= Q_1 - 2(\text{Step})$. Equivalently, the outer fences lie $3(Q_3 - Q_1)$ beyond the quartiles.

For the good pencil lead data, UOF $= 13.7 + 2(1.65) = 17.0$ and LOF $= 12.6 - 2(1.65) = 9.3$.

7. Mark possible “outliers”. We use a ◦ to denote mild outliers and a distinct symbol to denote extreme outliers. Note: no outliers occur in our example.
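Steps 4 through 7 can be combined into a single classification sketch. The boundary handling here is an assumption. A value exactly on an inner fence counts as good data, matching “on or within the inner fences”. A value exactly on an outer fence counts as mild. The function name and the test values are hypothetical. The real good pencil lead data have no outliers.

```python
def classify_outliers(values, q1, q3):
    """Split values into mild and extreme outliers using inner and outer fences."""
    step = 1.5 * (q3 - q1)
    lif, uif = q1 - step, q3 + step            # inner fences
    lof, uof = q1 - 2 * step, q3 + 2 * step    # outer fences
    mild, extreme = [], []
    for v in values:
        if v < lof or v > uof:                 # beyond an outer fence
            extreme.append(v)
        elif v < lif or v > uif:               # between inner and outer fences
            mild.append(v)
    return mild, extreme


# Hypothetical values checked against the good pencil lead quartiles
mild, extreme = classify_outliers([11.7, 12.1, 16.0, 18.0, 10.0, 9.0], 12.6, 13.7)
print("mild:", mild, "extreme:", extreme)
```

With $Q_1 = 12.6$ and $Q_3 = 13.7$, the values 16.0 and 10.0 fall between the fences (mild), while 18.0 and 9.0 fall beyond the outer fences at 17.0 and 9.3 (extreme).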

Parallel box plots allow us to compare two or more sets of data. The key: the plots must use a common scale. Place the box plots above each other or side by side.

[Figure: three parallel box plots drawn above a common horizontal scale, with outliers marked by o]
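The essential point, one scale shared by every plot, can be shown with a small text renderer. This is a hypothetical helper, not the software used in the course. It assumes each five-number-style summary (adjacents, quartiles, median, plus any outliers) has already been computed. It draws `|` for adjacents, `-` for whiskers, `[` and `]` for the box ends, `M` for the median, and `o` for every outlier, without separating mild from extreme. It also assumes the data do not all share a single value, since the scale would then have zero width.

```python
def render_parallel(plots, width=41):
    """Draw text box plots that share one horizontal scale.

    plots: list of (label, outliers, low_adj, q1, median, q3, high_adj)
    """
    values = [v for p in plots for v in (*p[1], p[2], p[6])]
    lo, hi = min(values), max(values)        # common scale covers every plot

    def col(x):
        return round((x - lo) / (hi - lo) * (width - 1))

    lines = []
    for label, outliers, a_lo, q1, med, q3, a_hi in plots:
        row = [" "] * width
        for c in range(col(a_lo), col(a_hi) + 1):
            row[c] = "-"                     # whiskers
        for c in range(col(q1), col(q3) + 1):
            row[c] = "="                     # box interior
        row[col(q1)], row[col(q3)] = "[", "]"
        row[col(med)] = "M"                  # median
        row[col(a_lo)] = row[col(a_hi)] = "|"  # adjacents
        for v in outliers:
            row[col(v)] = "o"
        lines.append(f"{label:>4} {''.join(row)}")
    lines.append(f"{'':>4} {lo:<{width // 2}}{hi:>{width - width // 2}}")
    return "\n".join(lines)


# Hypothetical summaries for two groups
print(render_parallel([("A", [], 11.7, 12.6, 13.1, 13.7, 14.3),
                       ("B", [16.0], 12.0, 13.0, 13.5, 14.0, 15.0)]))
```

Because `col` is computed from the pooled minimum and maximum, a box drawn further to the right always means larger values, which is what makes side-by-side comparison valid.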

Box Plots can also be used to analyze designed experiments. When there are categorical factors, the design can be “unstripped” and analyzed using parallel box plots. Example: Consider an experiment to study the influence of operating temperature and glass type on light output.

The resulting box plot is given below. The box plots show that a higher temperature yields higher light output. Also, at the low temperature glass type does not affect light output, but at the high temperature glass type A produces higher light output.

[Figure: parallel box plots of light output by temperature and glass type]

Importance of box plots: box plots allow us to tell at a glance:
1. the center
2. the spread
3. any outliers

Other important data displays include histograms and time plots. We generally use software to generate all data displays. The instructor should give a class demonstration using the software the instructor has selected.