QBM117 Business Statistics


Descriptive Statistics: Descriptive Measures for Grouped Data, Percentiles and Box Plots

Objectives
- To learn how to calculate the approximate mean and standard deviation for grouped data.
- To introduce percentiles as another descriptive measure.
- To introduce the box plot as another graphical technique.

Descriptive Measures for Grouped Data
In most cases, measures of location and variability are computed from the individual data values. Sometimes, however, we only have data that have been grouped into a frequency distribution, and we do not have access to the raw data. It is therefore useful to be able to calculate approximate descriptive measures directly from a frequency distribution.

Approximate Mean and Standard Deviation for Grouped Data
The mean and the standard deviation are the most widely used descriptive measures, so we will look at how to calculate their approximate values for grouped data. Keep in mind that grouping the data loses information, so the descriptive measures obtained from the grouped data only approximate those of the ungrouped data.

Calculating the Approximate Mean and Standard Deviation for Grouped Data
You can calculate the approximate mean and standard deviation for grouped data using the statistics mode on your calculator. We start by calculating the midpoint of each class of the frequency distribution.

We then assume that every observation in a class is equal to the midpoint of that class. Next, we enter each midpoint into the calculator as many times as its class frequency and obtain the mean and standard deviation. This is demonstrated in the following example.
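Written as formulas, the calculator procedure described above amounts to the following sketch. The notation is assumed here rather than taken from the slides: k classes, class midpoints m_1, ..., m_k and class frequencies f_1, ..., f_k. The n - 1 divisor gives the sample standard deviation.

```latex
n = \sum_{j=1}^{k} f_j, \qquad
\bar{x} \approx \frac{1}{n}\sum_{j=1}^{k} f_j m_j, \qquad
s \approx \sqrt{\frac{1}{n-1}\sum_{j=1}^{k} f_j \left(m_j - \bar{x}\right)^2}
```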

Example 1
Revisit Example 5 from Week 1, Lecture 3 (Exercise 2.41 from the text). The number of items returned to a leading Brisbane retailer by its customers was recorded for 25 days.

The frequency distribution for the data is given below:

Number of items                   Frequency
>5 up to and including 10         5
>10 up to and including 15        3
>15 up to and including 20        9
>20 up to and including 25        7
>25 up to and including 30        1

We now need to calculate the midpoint of each class, which is the average of its two class limits; for example, (5 + 10)/2 = 7.5.

Number of items                   Midpoint   Frequency
>5 up to and including 10         7.5        5
>10 up to and including 15        12.5       3
>15 up to and including 20        17.5       9
>20 up to and including 25        22.5       7
>25 up to and including 30        27.5       1

We now enter the data into the calculator. With the calculator in statistics mode, enter the value 7.5 five times (that is, with frequency 5):
New Casios: 7.5 SHIFT ; 5 M+
Older Casios: 7.5 X 5 M+
New Sharps: 7.5 2ndF , 5 M+

Then enter 12.5 three times, 17.5 nine times, 22.5 seven times and, finally, 27.5 once.

Once you have entered the data, check that the calculator shows 25 data values. Then obtain the mean and the sample standard deviation of these values:
mean = 16.7
standard deviation = 5.89 (2 d.p.)
Hence the approximate mean and standard deviation for the grouped data are 16.7 and 5.89 respectively. Note that the true mean and standard deviation of the ungrouped data are 17 and 6.20 (2 d.p.) respectively.
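The calculator steps in Example 1 can be mirrored in a short Python sketch: compute each class midpoint, repeat it as many times as its frequency, then take the mean and the sample standard deviation (`statistics.stdev`, which divides by n - 1). The tuple layout and the name `classes` are illustrative choices, not from the text.

```python
import statistics

# Example 1 frequency distribution: (lower limit, upper limit, frequency),
# where each class is ">lower up to and including upper".
classes = [(5, 10, 5), (10, 15, 3), (15, 20, 9), (20, 25, 7), (25, 30, 1)]

# Treat every observation in a class as equal to the class midpoint,
# which is what entering each midpoint f times into the calculator does.
values = []
for lower, upper, freq in classes:
    midpoint = (lower + upper) / 2
    values.extend([midpoint] * freq)

n = len(values)
mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)

print(n, round(mean, 2), round(sd, 2))  # 25 16.7 5.89
```

Using `statistics.pstdev` instead would divide by n and give a smaller value (about 5.78), so it is worth checking which standard deviation your calculator key reports.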

Measures of Relative Standing
Measures of central tendency and dispersion are important; however, they are not the only numerical measures that can be used to describe a data set. Measures of relative standing, or order statistics, give information about the position of an observation within the sample.

Median
We have already looked at one measure of relative standing, the median, which is also a measure of central tendency. Recall that the median is the middle value when the data are arranged in order. Hence the median divides the data set into two halves.

Percentiles
In some situations it is useful to know which data value has a certain percentage of the observations below or above it. This measure is known as a percentile of the data. The pth percentile is the value that has at most p% of the observations less than it and at most (100 - p)% of the observations greater than it.

Quartiles
The 25th, 50th and 75th percentiles have special names. Together they divide the ordered data into four quarters, and hence they are called quartiles.
- The 25th percentile is the lower quartile, Q1.
- The 50th percentile is the middle quartile, Q2, more commonly called the median, M.
- The 75th percentile is the upper quartile, Q3.

Calculating Percentiles
1. Arrange the data in ascending order.
2. Find the position of the pth percentile by calculating i = (p/100) x n, where n is the number of observations.
3. If i is not an integer, round up: the next integer greater than i gives the position of the pth percentile.
4. If i is an integer, the pth percentile is the average of the data values in positions i and i + 1.

Example 3.14 from text
Calculate the quartiles for the following set of measurements:
7 18 12 17 29 18 4 27 30 2 4 10 21 5 8
First we need to order the data:
2 4 4 5 7 8 10 12 17 18 18 21 27 29 30

The lower quartile is the 25th percentile.
p = 25, n = 15, i = (p/100) x n = (25/100) x 15 = 3.75
Since i = 3.75 is not an integer, we round up to 4, so the lower quartile is the 4th value in the ordered data:
2 4 4 5 7 8 10 12 17 18 18 21 27 29 30
Hence the lower quartile is 5.

The median (middle quartile) is the 50th percentile.
p = 50, n = 15, i = (p/100) x n = (50/100) x 15 = 7.5
Since i = 7.5 is not an integer, we round up to 8, so the median is the 8th value in the ordered data:
2 4 4 5 7 8 10 12 17 18 18 21 27 29 30
Hence the median is 12.

The upper quartile is the 75th percentile.
p = 75, n = 15, i = (p/100) x n = (75/100) x 15 = 11.25
Since i = 11.25 is not an integer, we round up to 12, so the upper quartile is the 12th value in the ordered data:
2 4 4 5 7 8 10 12 17 18 18 21 27 29 30
Hence the upper quartile is 21.
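The position rule used in the three calculations above can be sketched in Python as follows. It is a minimal sketch of this textbook rule only; library routines such as `numpy.percentile` or Excel's `PERCENTILE` interpolate differently and can return different values. The function name `percentile` is hypothetical, and `p` is assumed to be a whole number from 1 to 99.

```python
def percentile(data, p):
    """pth percentile using the position rule from the slides.

    i = (p/100) x n; if i is not an integer, round it up and take the value
    at that position; if i is an integer, average the values at positions
    i and i + 1. Positions are 1-based, as on the slides.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("p must be an integer from 1 to 99")
    x = sorted(data)
    n = len(x)
    # Work with p * n (= 100 * i) in integer arithmetic to avoid rounding errors.
    hundred_i = p * n
    if hundred_i % 100 == 0:
        i = hundred_i // 100
        return (x[i - 1] + x[i]) / 2   # values at positions i and i + 1
    position = hundred_i // 100 + 1     # round i up to the next integer
    return x[position - 1]


data = [7, 18, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 5 12 21
```

For an even number of observations the averaging branch is used; for example, the median of 1, 2, 3, 4 comes out as 2.5.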

Calculating Percentiles in Excel
To calculate percentiles in Excel, go to Tools > Data Analysis > Descriptive Statistics. To produce the median, select Summary Statistics. To produce the lower quartile, select Kth Smallest and enter the position of the lower quartile. To produce the upper quartile, select Kth Largest and enter the position of the upper quartile counted from the largest value. For Example 3.14, the upper quartile is the 12th smallest of 15 values, which is the 4th largest (15 - 12 + 1 = 4).

Five-Number Summary
In a five-number summary, the following five numbers are used to summarise the data:
- Smallest data value
- Lower quartile
- Median
- Upper quartile
- Largest data value

Example 3.14 revisited
The five-number summary for the set of measurements in Example 3.14 is:
Min = 2, Q1 = 5, M = 12, Q3 = 21, Max = 30

Interquartile Range (IQR)
The interquartile range is the difference between the upper and lower quartiles:
IQR = Q3 - Q1
It is the range of the middle 50% of the data, and it is a measure of dispersion that is not sensitive to outliers.

Example 3.14 revisited
Calculate the interquartile range for the set of measurements in Example 3.14.
Q1 = 5, Q3 = 21
IQR = Q3 - Q1 = 21 - 5 = 16
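A short sketch that builds the five-number summary and the IQR for Example 3.14 using the same percentile position rule. `five_number_summary` is a hypothetical helper name, and the compact `percentile` below restates the rule so that the block runs on its own.

```python
def percentile(data, p):
    """pth percentile (integer p from 1 to 99) using the slides' position rule."""
    x = sorted(data)
    hundred_i = p * len(x)               # 100 * i, kept as an integer
    if hundred_i % 100 == 0:             # i is an integer: average positions i and i + 1
        i = hundred_i // 100
        return (x[i - 1] + x[i]) / 2
    return x[hundred_i // 100]           # round i up; 1-based position -> 0-based index


def five_number_summary(data):
    """Min, Q1, median, Q3, max."""
    x = sorted(data)
    return x[0], percentile(x, 25), percentile(x, 50), percentile(x, 75), x[-1]


data = [7, 18, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8]
mn, q1, m, q3, mx = five_number_summary(data)
iqr = q3 - q1
print(mn, q1, m, q3, mx, iqr)  # 2 5 12 21 30 16
```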

Box Plots
Now that we have introduced quartiles, we can present one more graphical technique for quantitative data. A box plot is a graphical display of the five-number summary. It can be used to identify the central location, spread and shape of the data, and it highlights any possible outliers.

Constructing a Box Plot
1. Order the data. The most efficient way to do this is to construct a stem-and-leaf display.
2. Calculate the five-number summary.
3. Draw a box with its ends located at the lower and upper quartiles.
4. Draw a vertical line in the box at the location of the median.
5. Identify any outliers. An outlier is any value located more than 1.5 x IQR from the box, i.e. below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR.
6. Draw lines extending from the box to the smallest and largest values within 1.5 x IQR of the box, i.e. to the most extreme values that are not outliers. These lines are called whiskers.
7. Plot any outliers individually.

Example 3.14 revisited
Construct a box plot for the set of measurements
7 18 12 17 29 18 4 27 30 2 4 10 21 5 8
The five-number summary is Min = 2, Q1 = 5, M = 12, Q3 = 21, Max = 30, and the interquartile range is IQR = 16.

1.5 x IQR = 1.5 x 16 = 24
Q1 - 1.5 x IQR = 5 - 24 = -19
Q3 + 1.5 x IQR = 21 + 24 = 45
There are no data values less than -19 or greater than 45. Therefore there are no outliers.
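The outlier check and whisker ends from the box plot steps can be sketched as follows, taking Q1 and Q3 as already calculated. `box_plot_parts` is a hypothetical helper; the 1.5 x IQR fences follow the slides, with a value counted as an outlier only if it lies strictly beyond a fence.

```python
def box_plot_parts(data, q1, q3):
    """Fences, whisker ends and outliers for a box plot with the given quartiles."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = sorted(v for v in data if v < lower_fence or v > upper_fence)
    # Whiskers run to the most extreme values that are not outliers.
    inside = sorted(v for v in data if lower_fence <= v <= upper_fence)
    return {
        "IQR": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


data = [7, 18, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8]
parts = box_plot_parts(data, q1=5, q3=21)
print(parts)  # {'IQR': 16, 'fences': (-19.0, 45.0), 'whiskers': (2, 30), 'outliers': []}
```

Because there are no outliers here, the whiskers reach all the way to the minimum (2) and maximum (30).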

Constructing Box Plots in Excel
Instructions for constructing a box plot in Excel are on page 96 of the text (page 94 of the abridged edition). You will need to use Data Analysis Plus, the macros on the disk that accompanies the text.

Example 3.14 revisited
Construct a box plot in Excel for the set of measurements in Example 3.14.

Using the Box Plot to Identify Skewness
If the data set is perfectly symmetric, the box plot will be symmetric. The length of the left whisker will equal the length of the right whisker, and the median will divide the box in half.

If the data is positively skewed, the length of the right whisker will be greater than the length of the left whisker, and/or the portion of the box to the right of the median will be greater than the portion of the box to the left of the median.

If the data is negatively skewed, the length of the left whisker will be greater than the length of the right whisker, and/or the portion of the box to the left of the median will be greater than the portion of the box to the right of the median.

Outliers
As well as providing a graphical summary of a data set, a box plot is useful for identifying outliers. When presenting and analysing data, it is important to identify and review outliers. An outlier may be an observation that has been incorrectly recorded. If so, it needs to be corrected before further analysis.

An outlier may also be an observation that was incorrectly included in the data set. If so, it can be removed. An outlier may just be an unusual observation that has been recorded correctly and does belong to the data set. In such cases the observation should remain.

Using Box Plots to Compare Data Sets
We can compare several data sets by constructing a box plot for each one and displaying the box plots on the same scale. We can then compare the centre, spread and shape of the distributions of the different data sets. If the box plots are not on the same scale, more care needs to be taken when comparing the distributions.

Example
In automobile mileage and gasoline-consumption testing, 13 automobiles were road tested for 300 miles in both city and country driving conditions. The following miles-per-gallon data were recorded.
City: 16.2 16.7 15.9 14.4 13.2 15.3 16.8 16.0 16.1 15.3 15.2 15.3 16.2
Country: 19.4 20.6 18.3 18.6 19.2 17.4 17.2 18.6 19 21.1 19.4 18.5 18.7
Construct box plots for both data sets and compare the performance for city and country driving.
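As a check on hand calculations for this example, the following sketch computes each data set's five-number summary, IQR, whisker ends and 1.5 x IQR outliers using the percentile position rule from the slides. The helper names `percentile` and `summarise` are hypothetical, and the IQR is rounded only to hide floating-point noise.

```python
def percentile(data, p):
    """pth percentile (integer p from 1 to 99) using the slides' position rule."""
    x = sorted(data)
    hundred_i = p * len(x)               # 100 * i, kept as an integer
    if hundred_i % 100 == 0:             # i is an integer: average positions i and i + 1
        i = hundred_i // 100
        return (x[i - 1] + x[i]) / 2
    return x[hundred_i // 100]           # round i up; 1-based position -> 0-based index


def summarise(data):
    """Five-number summary, IQR, whisker ends and 1.5 x IQR outliers."""
    x = sorted(data)
    q1, m, q3 = (percentile(x, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in x if lower_fence <= v <= upper_fence]
    return {
        "min": x[0], "Q1": q1, "M": m, "Q3": q3, "max": x[-1],
        "IQR": round(iqr, 2),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in x if v < lower_fence or v > upper_fence],
    }


city = [16.2, 16.7, 15.9, 14.4, 13.2, 15.3, 16.8, 16.0, 16.1, 15.3, 15.2, 15.3, 16.2]
country = [19.4, 20.6, 18.3, 18.6, 19.2, 17.4, 17.2, 18.6, 19, 21.1, 19.4, 18.5, 18.7]

for name, mpg in (("City", city), ("Country", country)):
    print(name, summarise(mpg))
```

On the slide's data this gives, for city driving, Q1 = 15.3, M = 15.9, Q3 = 16.2 (IQR 0.9) with 13.2 beyond the lower fence, and for country driving, Q1 = 18.5, M = 18.7, Q3 = 19.4 (IQR 0.9) with 21.1 beyond the upper fence. To draw the plots, a library routine such as matplotlib's `pyplot.boxplot` could be used, but its quartiles are interpolated by default and may not match the textbook rule for every sample size.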

Reading for next lecture
Chapter 4, Sections 4.1-4.3
Exercises 3.47, 3.54, 3.57, 3.59, 3.61