Chapter 3: Descriptive Statistics


Learning Objectives
LO1 Apply various measures of central tendency, including the mean, median, and mode, to a set of ungrouped data.
LO2 Apply various measures of variability, including the range, interquartile range, mean absolute deviation, variance, and standard deviation (using the empirical rule and Chebyshev's theorem), to a set of ungrouped data.
LO3 Compute the mean, median, mode, standard deviation, and variance of grouped data.
LO4 Describe a data distribution statistically and graphically using skewness, kurtosis, and box-and-whisker plots.
LO5 Use computer packages to compute various measures of central tendency, variation, and shape on a set of data, as well as to describe the data distribution graphically.

Measures of Central Tendency: Ungrouped Data
Ungrouped data are raw numbers that have not been summarized or grouped into classes by statistical techniques. Measures of central tendency reveal information about the values at the center, or middle part, of a group of numbers (or ordered array). Common measures of central tendency are the mean, median, mode, percentiles, and quartiles.

The Arithmetic Mean
The arithmetic mean, commonly called ‘the mean’, is the average of a group of numbers. It applies to interval and ratio data; it is not applicable to nominal or ordinal data. The mean is computed by summing all values in the data set and dividing the sum by the number of values in the data set. Thus, its value is affected by every value in the data set, including extreme values.

Application of the Arithmetic Mean in Statistics
The mean serves as a summary statistic of central tendency for data produced by business and economic processes. In these settings it is important to distinguish between the population mean, $\mu$, and the sample mean, $\bar{x}$. The population mean is based on all of the values in the population; the sample mean uses only some of the values in the population.

Computing the Population Mean
Suppose a company has five departments with 24, 13, 19, 26, and 11 workers. The population mean number of workers per department is 18.6 workers. The computations follow:
$\mu = \frac{\sum x}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$

Computing the Sample Mean
The sample mean is calculated with the same algorithm as the population mean and produces the same answer when computed on the same data. However, separate symbols are used for the population mean ($\mu$) and the sample mean ($\bar{x}$). Given the numbers 57, 86, 42, 38, 90, and 66, the sample mean is 63.167. The computations follow:
$\bar{x} = \frac{\sum x}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$
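The two mean calculations above can be checked with a short sketch. The helper name `mean` is illustrative and is not from the slides. It uses the same sum-divided-by-count formula for both cases, because, as the slide notes, only the symbol ($\mu$ versus $\bar{x}$) differs.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

# Population: workers in each of the company's five departments (N = 5).
departments = [24, 13, 19, 26, 11]
mu = mean(departments)       # 93 / 5

# Sample: the six values from the sample-mean example (n = 6).
sample = [57, 86, 42, 38, 90, 66]
x_bar = mean(sample)         # 379 / 6

print(f"population mean mu = {mu:.1f}")     # population mean mu = 18.6
print(f"sample mean x-bar = {x_bar:.3f}")   # sample mean x-bar = 63.167
```

Python's standard `statistics.mean` would return the same values. The explicit version simply makes the formula visible.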

Impact of Extreme Values on the Mean
The mean is the most commonly used measure of central tendency because of its mathematical properties and because it uses all the data points in the data set. However, the mean is affected by extremely large or extremely small numbers. In the sample mean example, if the largest number, 90, is replaced by 1,000, the mean becomes 214.833 instead of 63.167. If the smallest number, 38, is replaced by 5, the mean becomes 57.667 instead of 63.167. Extreme values can therefore distort the mean significantly.

The Median
The median is the middle value in an ordered array of numbers. It applies to ordinal, interval, and ratio data. An advantage of the median is that it is unaffected by extremely large and extremely small values in the data set. A disadvantage is that not all the information in the numbers is used.

Computing the Median
First step: Arrange the observations in an ordered array.
Second step: For an array with an odd number of terms, the median is the middle number.
Third step: For an array with an even number of terms, the median is the average of the two middle numbers.
Locating the median: The median's location in an ordered array is found by $(n + 1)/2$.

Median Example with an Odd Number of Data
Let X be an ordered array with the following values: 3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22. There are 17 values in the ordered array. Position of median = $(n + 1)/2 = (17 + 1)/2$ = 9th position. Counting from left to right to the 9th position, the median is 15.
Advantage: Extreme values do not distort the median. If 22 (the maximum value) is replaced by 100, the median is still 15. If 3 (the minimum value) is replaced by -103, the median is still 15.

Median Example with an Even Number of Data
Let X be an ordered array with the following values: 3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21. There are 16 values in the ordered array. Position of median = $(n + 1)/2 = (16 + 1)/2$ = 8.5th position. The median therefore lies between the 8th and 9th observations in the ordered array: the median is 14 + 0.5(15 - 14) = 14.5, or simply (14 + 15)/2 = 14.5.
Advantage: Extreme values do not distort the median. If 21 (the maximum value) is replaced by 100, the median is still 14.5. If 3 (the minimum value) is replaced by -88, the median is still 14.5.
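The three median steps can be sketched in a few lines. This sketch also compares the median's robustness with the mean's sensitivity to the same kind of change (the 90 to 1,000 replacement from the extreme-values slide). The function name `median` is an illustrative choice. The code uses 0-based list indices, so the 1-based position $(n + 1)/2$ for odd n corresponds to index `n // 2`.

```python
def median(values):
    """Median via the slides' steps: sort, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ordered = sorted(values)               # first step: ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2                           # 0-based index of position (n + 1) / 2 when n is odd
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even = odd[:-1]                            # drop 22, leaving 16 values

print(median(odd))                 # 15
print(median(even))                # 14.5
print(median(odd[:-1] + [100]))    # 15: replacing the maximum does not move the median

# By contrast, the mean of the sample-mean example jumps when 90 becomes 1,000.
sample = [57, 86, 42, 38, 90, 66]
outlier = [57, 86, 42, 38, 1000, 66]
print(round(sum(sample) / 6, 3), round(sum(outlier) / 6, 3))   # 63.167 214.833
```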

The Mode
The mode is the value that occurs most frequently in an array of data. The mode applies to all levels of data measurement: nominal, ordinal, interval, and ratio.
Unimodal: data sets with a single mode.
Bimodal: data sets with two modes.
Multimodal: data sets with more than two modes.

Example of the Mode
Organizing the data into an ordered array helps to locate the mode. In the ordered array for this example, 44 is the value that occurs most frequently (five times), so the mode is 44.
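The original slide's data array is not reproduced in this transcript, so the sketch below reuses the 17-value array from the odd-n median example instead. That array turns out to be bimodal. The helper `modes` is a hypothetical name. It returns every value tied for the highest frequency, and it also works on nominal data such as category labels.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency, in sorted order.
    Note: if every value occurs once, every value is returned; some texts
    instead say such a data set has no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
print(modes(data))                       # [16, 19]: a bimodal data set
print(modes(["red", "blue", "red"]))     # ['red']: the mode also applies to nominal data
```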

Percentiles
Percentiles are measures of central tendency that divide a group of data into 100 parts. The nth percentile is the value such that at least n percent of the data are below that value and at most (100 - n) percent are above it. For example, if a plant operator takes a safety examination and 87.6% of the safety exam scores are below that person's score, he or she still scores at only the 87th percentile, even though more than 87% of the scores are lower. The median is the 50th percentile: the two have the same value.

Percentiles
Percentiles are "stair-step" values: for example, there is no percentile between the 87th and the 88th percentiles. Percentiles apply to ordinal, interval, and ratio data; they are not applicable to nominal data. In general, percentiles are not influenced by extreme values in the data set.

Steps in Determining the Location of the Percentile
First step: Organize the data into ascending order.
Second step: Calculate the percentile location, $i$, using $i = \frac{P}{100}(n)$, where $P$ = percentile, $i$ = percentile location, and $n$ = number of values in the data set.
Third step: Determine the location. If $i$ is a whole number, the Pth percentile is the average of the value at the ith location and the value at the (i + 1)th location. If $i$ is not a whole number, the Pth percentile value is located at the whole-number part of $i + 1$.

Calculating Percentiles: An Example
Raw data: 14, 12, 19, 23, 5, 13, 28, 17
Ordered array: 5, 12, 13, 14, 17, 19, 23, 28
Problem: Find the 30th percentile. Number of observations: n = 8.
Location of the 30th percentile: $i = \frac{30}{100}(8) = 2.4$
The location index, i, is not a whole number, so the location is the whole-number portion of $i + 1 = 2.4 + 1 = 3.4$, which is 3. The 30th percentile is at the 3rd location of the array: 30th percentile = 13.
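The location rule above can be sketched as a small function. The name `percentile` is hypothetical, and it implements only the slides' rule. Software packages often use other definitions (NumPy's `percentile`, for instance, interpolates linearly by default), so their results can differ slightly. The location index is kept as an exact `Fraction` so that a whole-number $i$ is not missed because of floating-point round-off.

```python
import math
from fractions import Fraction

def percentile(values, p):
    """P-th percentile using the location rule i = (P / 100) * n.

    If i is a whole number, average the values at 1-based positions i and i + 1;
    otherwise take the value at position floor(i) + 1 (whole-number part of i + 1).
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or not 0 < p < 100:
        raise ValueError("need at least one value and 0 < p < 100")
    i = Fraction(p) * n / 100            # exact location index
    if i.denominator == 1:               # whole number
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.floor(i)]        # 1-based position floor(i) + 1 is 0-based index floor(i)

raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw, 30))   # i = 2.4, so position 3, which holds 13
```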

Quartiles
Quartiles are measures of central tendency that divide a group of data into four subgroups or parts.
Q1: 25% of the data set is below the first quartile.
Q2: 50% of the data set is below the second quartile.
Q3: 75% of the data set is below the third quartile.
Relationship between quartiles and percentiles: Q1 equals the 25th percentile; Q2 equals the 50th percentile, which is the median; Q3 equals the 75th percentile. Quartile values are not necessarily members of the data set.

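Calculating Quartiles: An Example
Let X be the ordered array X = {106, 109, 114, 116, 121, 122, 125, 129}, so n = 8.
Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$
Note that when i is a whole number, the quartile is the average of the ith and (i + 1)th values in the ordered set.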
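Because quartiles are just the 25th, 50th, and 75th percentiles, the same location rule yields all three. The sketch below repeats a compact version of that rule (the function name is illustrative and assumes 0 < p < 100) so that it runs on its own.

```python
import math
from fractions import Fraction

def percentile(values, p):
    """Location rule i = (P/100) * n: for whole i, average positions i and i + 1;
    otherwise use position floor(i) + 1 (1-based). Assumes 0 < p < 100."""
    ordered = sorted(values)
    i = Fraction(p) * len(ordered) / 100
    if i.denominator == 1:
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.floor(i)]

x = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q2, q3 = (percentile(x, p) for p in (25, 50, 75))
print(q1, q2, q3)   # 111.5 118.5 123.5
```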

Measures of Variability: Ungrouped Data
Measures of variability describe the spread, or dispersion, of data. Used together with measures of central tendency, they give a more complete description of the data. Measures of variability for ungrouped data include the range, interquartile range, mean absolute deviation, variance, standard deviation, z scores, and coefficient of variation.

Measures of Variability: Ungrouped Data
Measures of variability describe how spread out (dispersed) or how tightly clustered (convergent) a set of data is. Dispersion describes how far the data are spread apart from the mean; convergence describes how closely the data cluster around the mean. Variability is most frequently expressed in terms of deviation from the mean. The images in the next slides express this visually.

Variability
[Figure: two panels of cash flows shown against their mean: no variability in cash flow (same amounts) and variability in cash flow (different amounts).]

Variability
[Figure: two panels contrasting no variability with variability.]

Range
The range is the difference between the largest and smallest values in the data set: Range = $x_{max} - x_{min}$.
Advantage: Simple to compute.
Disadvantages: Ignores all data points except the two extremes; is influenced by extreme values; has no reference point; has limited use by itself.
Example of the range using the data provided:

Interquartile Range
Interquartile range (IQR) = Q3 - Q1
The interquartile range contains all values in the interval between the first and third quartiles, so it accounts for the middle 50% of values in the ordered data set. It is especially useful when data users are more interested in values toward the middle and less interested in the extremes. The interquartile range is less influenced by extreme values than the range is.
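As a small illustration of the range and the IQR, the sketch below uses the ordered array from the quartile example. The quartiles are entered directly as the values that the slides' location rule gives for that array (Q1 = 111.5, Q3 = 123.5) rather than being recomputed. The variable names are illustrative.

```python
x = [106, 109, 114, 116, 121, 122, 125, 129]

# Range: uses only the two extreme values (129 - 106).
data_range = max(x) - min(x)

# Quartiles of x under the location rule i = (P / 100) * n (quartile example).
q1, q3 = 111.5, 123.5
iqr = q3 - q1          # spread of the middle 50% of the ordered values

print(data_range, iqr)   # 23 12.0
```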

Deviation from the Mean
An examination of deviations from the mean can reveal information about the variability of data. However, individual deviations are used mostly as a tool to compute other measures of variability.
Example: The data set 5, 9, 16, 17, 18 has a mean of $\mu = 13$. The deviations $(x - \mu)$ show the distance of each value from the mean: -8, -4, 3, 4, 5. These deviations sum to zero, so their total cannot by itself measure variability.

Mean Absolute Deviation
The mean absolute deviation (MAD) is the average of the absolute values of the deviations from the mean: $MAD = \frac{\sum |x - \mu|}{N}$. It is easy to calculate, but it is less useful in further statistical analysis than the variance and standard deviation. For the data set 5, 9, 16, 17, 18 (with $\mu = 13$), the absolute deviations are 8, 4, 3, 4, and 5, so $MAD = \frac{24}{5} = 4.8$.
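The sketch below shows why absolute values are needed. The signed deviations of 5, 9, 16, 17, 18 cancel to zero, while their absolute values average to 4.8. The function name `mean_absolute_deviation` is illustrative.

```python
def mean_absolute_deviation(values):
    """MAD = sum(|x - mean|) / N: the average absolute distance from the mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

data = [5, 9, 16, 17, 18]
mu = sum(data) / len(data)               # 13.0
deviations = [x - mu for x in data]      # [-8.0, -4.0, 3.0, 4.0, 5.0]

print(sum(deviations))                   # 0.0: signed deviations cancel out
print(mean_absolute_deviation(data))     # 4.8
```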

Population Variance
The population variance is the sum of the squared deviations from the mean divided by the number of observations: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$. It is expressed in squared units of measurement. Because squared units are hard to interpret, the variance is typically used as a step toward obtaining the standard deviation of a data set.

Example of Population Variance
For the x values 5, 9, 16, 17, 18 ($\mu = 13$), the squared deviations are 64, 16, 9, 16, and 25, which sum to 130, so $\sigma^2 = \frac{130}{5} = 26.0$ units squared.

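Population Standard Deviation
The population standard deviation is the square root of the population variance, $\sigma = \sqrt{\sigma^2}$. It is easier to interpret in practice than the variance because it is expressed in the original units, and it measures the dispersion of the population data around the mean. For a population variance of 26.0, $\sigma = \sqrt{26.0} \approx 5.1$.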

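Example of Sample Variance
The sample variance divides the sum of squared deviations by $n - 1$ rather than by $n$: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$. Like the population variance, it is expressed in squared units.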

Example of Sample Standard Deviation
The sample standard deviation is the square root of the sample variance, $s = \sqrt{s^2}$. It is easier to interpret in practice than squared units, and it is used as an estimator of the population standard deviation.
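The sketch below computes the population and sample versions side by side on 5, 9, 16, 17, 18. These values are treated as a sample purely for illustration, because the slide's own sample-variance data are not reproduced here. The only difference is the divisor (N versus n - 1). The hypothetical helper `variance` is cross-checked against Python's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

def variance(values, sample=False):
    """Sum of squared deviations from the mean, divided by N (population)
    or by n - 1 (sample)."""
    m = sum(values) / len(values)
    ss = sum((x - m) ** 2 for x in values)        # sum of squared deviations
    return ss / (len(values) - 1 if sample else len(values))

data = [5, 9, 16, 17, 18]

pop_var = variance(data)                  # 130 / 5 = 26.0 (units squared)
pop_sd = math.sqrt(pop_var)               # about 5.10 (original units)
samp_var = variance(data, sample=True)    # 130 / 4 = 32.5
samp_sd = math.sqrt(samp_var)             # about 5.70

# The standard library agrees with the explicit formulas.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))

print(pop_var, round(pop_sd, 2), samp_var, round(samp_sd, 2))   # 26.0 5.1 32.5 5.7
```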

Standard Deviation
The standard deviation is the square root of the variance. The standard deviation of a population is denoted by $\sigma$; the standard deviation of a sample is denoted by $s$.

Uses of Standard Deviation
Indicator of financial risk.
Quality control: construction of quality control charts; process capability studies.
Comparing two or more populations: household incomes in two cities; employee absenteeism at two plants.
Expressed as a percentage of the mean, it gives the coefficient of variation (CV).

Standard Deviation as an Indicator of Financial Risk

Symmetric and Asymmetric Distributions
Data are either symmetric or non-symmetric with respect to some measure of central tendency. Statisticians have observed that many types of business and economic data tend to have symmetric, or normal (mound-shaped), distributions. For data that are normally distributed, the empirical rule describes how concentrated the values are around the mean. For any distribution, symmetric or not, Chebyshev's theorem gives the minimum proportion of values that lie within a specified number of standard deviations of the mean.

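Empirical Rule
When data are normally distributed, or approximately normal:
About 68% of the values lie within $\mu \pm 1\sigma$.
About 95% of the values lie within $\mu \pm 2\sigma$.
About 99.7% of the values lie within $\mu \pm 3\sigma$.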

Chebyshev's Theorem: When Data Are Not Normally Distributed or Are Nonsymmetric
Chebyshev's theorem applies to all distributions. It gives the minimum proportion of values that lie within a specified number of standard deviations of the mean: at least $1 - \frac{1}{k^2}$ of the values lie within $\mu \pm k\sigma$, for $k > 1$.

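Number of Standard Deviations: Chebyshev's Theorem
Chebyshev's theorem is a general result that applies to all distributions. Calculations are shown for k = 2, 3, and 4. The theorem is stated for k > 1, because k = 1 gives a minimum proportion of 0, which conveys no information.

Number of standard deviations, k | Distance from the mean | Minimum proportion of values falling within distance from the mean
2 | $\mu \pm 2\sigma$ | $1 - 1/2^2 = 0.75$
3 | $\mu \pm 3\sigma$ | $1 - 1/3^2 = 0.89$
4 | $\mu \pm 4\sigma$ | $1 - 1/4^2 = 0.94$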
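To see how conservative Chebyshev's guarantee is, the sketch below prints its minimum proportion next to the exact share that a normal distribution places within k standard deviations. The normal shares are the exact values behind the empirical rule's rounded percentages. It uses `statistics.NormalDist` (Python 3.8+). The function name `chebyshev_min_proportion` is illustrative.

```python
from statistics import NormalDist

def chebyshev_min_proportion(k):
    """Chebyshev: at least 1 - 1/k**2 of any distribution lies within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k**2

z = NormalDist()   # standard normal: mean 0, standard deviation 1
for k in (2, 3, 4):
    normal_share = z.cdf(k) - z.cdf(-k)    # exact proportion within k sd for normal data
    print(k, round(chebyshev_min_proportion(k), 2), round(normal_share, 4))
# 2 0.75 0.9545
# 3 0.89 0.9973
# 4 0.94 0.9999
```

For normal data the actual concentration is much higher than Chebyshev's floor. The theorem trades precision for holding under any distribution shape.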

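Z Scores
The z score represents the number of standard deviations a value (x) is above or below the mean: $z = \frac{x - \mu}{\sigma}$. A z score can be computed for any data set; interpreting it with normal-distribution probabilities (or the empirical rule) assumes the data are normally distributed.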
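A minimal sketch of the z-score formula, applied to the population 5, 9, 16, 17, 18 used in the variance examples ($\mu = 13$, $\sigma = \sqrt{26}$). The helper name `z_score` is illustrative. Negative results mark values below the mean, and positive results mark values above it.

```python
import math

data = [5, 9, 16, 17, 18]
mu = sum(data) / len(data)                                        # 13.0
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))   # sqrt(26.0), about 5.1

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

for x in data:
    print(x, round(z_score(x, mu, sigma), 2))
```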

Coefficient of Variation
The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage. It is a measure of relative dispersion: $CV = \frac{\sigma}{\mu}(100)$

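Examples of Coefficient of Variation
First data set: $\mu_1 = 29$, $\sigma_1 = 4.6$, so $CV_1 = \frac{4.6}{29}(100) = 15.86\%$
Second data set: $\mu_2 = 84$, $\sigma_2 = 10$, so $CV_2 = \frac{10}{84}(100) = 11.90\%$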
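The two CV examples can be reproduced as follows. The second data set has the larger standard deviation (10 versus 4.6) but the smaller relative dispersion, which is exactly the comparison the CV is designed to make. The function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV = (sd / mean) * 100: the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100

print(round(coefficient_of_variation(4.6, 29), 2))   # 15.86
print(round(coefficient_of_variation(10, 84), 2))    # 11.9
```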

Measures of Central Tendency and Variability: Grouped Data
Measures of central tendency: mean, median, mode.
Measures of variability: variance, standard deviation.

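Mean of Grouped Data
The mean of grouped data is a weighted average of the class midpoints, with the class frequencies as the weights:
$\mu_{grouped} = \frac{\sum fM}{N} = \frac{\sum fM}{\sum f}$, where $f$ = class frequency, $M$ = class midpoint, and $N$ = total number of frequencies.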

Example Calculation of Grouped Mean
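The slide's frequency table is not reproduced in this transcript, so the sketch below uses a hypothetical table purely for illustration. The classes are written as "lower-under upper", and the modal class happens to be 7-under 9. The function computes the weighted average of class midpoints, $\sum fM / \sum f$.

```python
# Hypothetical frequency table (illustrative only): (lower limit, upper limit, frequency).
classes = [
    (1, 3, 4),
    (3, 5, 12),
    (5, 7, 13),
    (7, 9, 19),
    (9, 11, 7),
    (11, 13, 5),
]

def grouped_mean(classes):
    """Weighted average of class midpoints M, weighted by class frequencies f: sum(f*M) / sum(f)."""
    total_f = sum(f for _, _, f in classes)
    weighted = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return weighted / total_f

print(round(grouped_mean(classes), 3))   # 416 / 60 = 6.933
```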

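Median of Grouped Data
$Median = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W)$, where $L$ = lower limit of the median class, $cf_p$ = cumulative frequency of the classes preceding the median class, $f_{med}$ = frequency of the median class, $W$ = width of the median class, and $N$ = total number of frequencies.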

Calculating the Median of Grouped Data
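The grouped-median formula can be sketched by walking the cumulative frequencies until they reach $N/2$ and then interpolating within that class. The frequency table below is hypothetical and is not the slide's data. Here N = 60, so the median class is 7-under 9 with $cf_p = 29$ and $f_{med} = 19$.

```python
# Hypothetical frequency table (illustrative only): (lower limit, upper limit, frequency).
classes = [(1, 3, 4), (3, 5, 12), (5, 7, 13), (7, 9, 19), (9, 11, 7), (11, 13, 5)]

def grouped_median(classes):
    """Median = L + (N/2 - cf_p) / f_med * W, interpolating inside the median class."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty frequency table")
    half = n / 2
    cum = 0                                  # cumulative frequency before the current class
    for lo, hi, f in classes:
        if cum + f >= half:                  # first class whose cumulative frequency reaches N/2
            return lo + (half - cum) / f * (hi - lo)
        cum += f

print(round(grouped_median(classes), 3))   # 7 + (30 - 29) / 19 * 2 = 7.105
```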

Estimating the Mode from Grouped Data
The modal class is the class interval with the greatest frequency (7-under 9 in the example below). The mode of grouped data is estimated as the class midpoint of the modal class: Mode = 8 for the example below.

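Variance and Standard Deviation from Grouped Data
Population: $\sigma^2 = \frac{\sum f(M - \mu)^2}{N}$ and $\sigma = \sqrt{\sigma^2}$
Sample: $s^2 = \frac{\sum f(M - \bar{x})^2}{n - 1}$ and $s = \sqrt{s^2}$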

Population Variance and Standard Deviation of Grouped Data
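The grouped variance formulas can be sketched the same way as the grouped mean. Each class midpoint stands in for every value in its class and is weighted by the class frequency. The table is again hypothetical, not the slide's data. Setting `sample=True` switches the divisor from N to n - 1.

```python
import math

# Hypothetical frequency table (illustrative only): (lower limit, upper limit, frequency).
classes = [(1, 3, 4), (3, 5, 12), (5, 7, 13), (7, 9, 19), (9, 11, 7), (11, 13, 5)]

def grouped_variance(classes, sample=False):
    """sum(f * (M - mean)**2) divided by N (population) or n - 1 (sample)."""
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
    n = sum(f for _, f in mids)
    mean = sum(f * m for m, f in mids) / n
    ss = sum(f * (m - mean) ** 2 for m, f in mids)
    return ss / (n - 1 if sample else n)

pop_var = grouped_variance(classes)
samp_var = grouped_variance(classes, sample=True)
print(round(pop_var, 3), round(math.sqrt(pop_var), 3))     # population variance and sd
print(round(samp_var, 3), round(math.sqrt(samp_var), 3))   # sample variance and sd
```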

Descriptions and Measures of Shape
Skewness: absence of symmetry; presence of extreme values on one side of a distribution.
Kurtosis: peakedness of a distribution.
Leptokurtic: high and thin peak.
Mesokurtic: normal, mound-shaped top.
Platykurtic: flat topped and spread out.
Box-and-whisker plots: graphic display of a distribution using five summary statistics; reveal skewness and data location or clustering.

Probability Distributions Showing Symmetry and Skewness
[Figure: three distributions labeled symmetrical, right or positively skewed, and left or negatively skewed.]

Symmetrical Shape Frequency Histogram Showing Relationship of Mean, Median and Mode

Coefficient of Skewness
A summary measure for skewness based on the relationship of the mean to the median and the variation in the data: $S_k = \frac{3(\mu - M_d)}{\sigma}$, where $M_d$ is the median.
If $S_k < 0$, the distribution is negatively skewed (skewed to the left).
If $S_k = 0$, the distribution is symmetric (not skewed).
If $S_k > 0$, the distribution is positively skewed (skewed to the right).
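A minimal sketch of the coefficient of skewness, assuming the mean-median form $S_k = 3(\mu - M_d)/\sigma$ with the population standard deviation. The sign tells the direction of skew. The function name and the three small data sets are illustrative.

```python
import statistics

def pearson_skewness(values):
    """Sk = 3 * (mean - median) / sd, using the population standard deviation."""
    mu = statistics.fmean(values)
    md = statistics.median(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (mu - md) / sd

print(pearson_skewness([5, 9, 16, 17, 18]) < 0)   # True: mean 13 < median 16, skewed left
print(pearson_skewness([1, 2, 3, 4, 5]))          # 0.0: symmetric
print(pearson_skewness([1, 2, 3, 4, 50]) > 0)     # True: a large value pulls the mean right
```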

Effect of Changes in Mean on the Coefficient of Skewness

Types of Kurtosis

Requirements for a Box-and-Whisker Plot
Five specific numbers are used: the median, Q2; the first quartile, Q1; the third quartile, Q3; the minimum value in the data set; and the maximum value in the data set.
Inner fences (first indicators of extreme values): IQR = Q3 - Q1; lower inner fence = Q1 - 1.5 IQR; upper inner fence = Q3 + 1.5 IQR.
Outer fences (strong indicators of extreme values): lower outer fence = Q1 - 3.0 IQR; upper outer fence = Q3 + 3.0 IQR.
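The fence rules can be sketched as two small helpers (hypothetical names). They are applied here to the quartiles of {106, 109, 114, 116, 121, 122, 125, 129} from the quartile example (Q1 = 111.5, Q3 = 123.5, IQR = 12). A value outside an inner fence is a first indication of an extreme value, and a value outside an outer fence is a strong indication.

```python
def fences(q1, q3):
    """Inner and outer fences built from Q1, Q3 and IQR = Q3 - Q1."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

def classify(x, f):
    """Label a value by where it falls relative to the fences."""
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "beyond outer fence"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "between inner and outer fences"
    return "inside inner fences"

f = fences(111.5, 123.5)
print(f)   # {'lower_outer': 75.5, 'lower_inner': 93.5, 'upper_inner': 141.5, 'upper_outer': 159.5}
print(classify(150, f))   # between inner and outer fences
```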

Skewness and the Box Plot
A box-and-whisker plot can show the skewness of a distribution. The location of the median in the box indicates the skewness of the middle 50% of the data: if the median is on the right side of the box, the middle 50% are skewed to the left; if the median is on the left side, the middle 50% are skewed to the right. A researcher can also judge skewness from the lengths of the whiskers: if the longer whisker is to the right of the box, the outer data are skewed to the right, and vice versa. See the box-and-whisker plot in the next slide.

Box and Whisker Plot

COPYRIGHT Copyright © 2014 John Wiley & Sons Canada, Ltd. All rights reserved. Reproduction or translation of this work beyond that permitted by Access Copyright (The Canadian Copyright Licensing Agency) is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons Canada, Ltd. The purchaser may make back-up copies for his or her own use only and not for distribution or resale. The author and the publisher assume no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.