1
Descriptive Statistics I
This module covers statistics commonly used to describe or summarize a set of data, including measures of central tendency (mean, median, mode) and measures of variability (range, standard deviation, variance). Author: Phillip E. Pfeifer © 2011 Phillip E. Pfeifer and Management by the Numbers, Inc.
2
Two Kinds of Descriptive Statistics
Measures of Central Tendency: Mean, Median, Mode
Measures of Variability: Range (Maximum – Minimum), Standard Deviation, Variance
This MBTN module covers these six statistical measures. The first three describe the “center” of a data set. The latter three describe the spread of a data set. With each definition, we identify and explain the Excel function one can use to calculate the measure.
MBTN | Management by the Numbers
3
The Sample Mean
Definition
The Sample Mean = the arithmetic average of the set of data = (number1 + number2 + … + numbern) / n
Excel Function = Average(num1, num2, …, numn) or Average(first cell:last cell)
Insight
If you know the sample mean and the number of data values, you can multiply the two to calculate the total. This is one reason the sample mean is such a popular statistic.
4
The Sample Mean
Question 1: What is the sample mean of the following set of daily vehicle sales for a week? M=2, T=8, W=4, R=13, F=2
Answer: We know that sample mean = (number1 + number2 + … + numbern) / n
Therefore, substituting in our values: Sample Mean = (2 + 8 + 4 + 13 + 2) / 5 = 29 / 5 = 5.8
We can also quickly recover the total by multiplying the average by the number of days: 5.8 vehicles per day x 5 days = 29 vehicles for the week.
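For readers working outside Excel, the same calculation can be sketched in Python. This is only an illustration using the standard-library statistics module; the variable names are assumptions, not part of the module.

```python
from statistics import mean

# Daily vehicle sales from the question: Monday through Friday.
sales = [2, 8, 4, 13, 2]

sample_mean = mean(sales)           # (2 + 8 + 4 + 13 + 2) / 5
total = sample_mean * len(sales)    # mean x number of values recovers the total

print("sample mean:", sample_mean)
print("weekly total:", total)
```

The second line of output illustrates the insight that the mean and the count together determine the total.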
5
The Median
Definition
The Median = the point in the middle of the sorted data. An equal number of values lie above and below the median.
Note: If there is an even number of data values, the median is the average of the two middle values.
Excel Function = Median(num1, num2, …, numn) or Median(first cell:last cell)
Insight
Sorting the data makes it much easier to find the median.
6
The Median
Question 1: What is the median of the following set of daily vehicle sales for a week? M=2, T=8, W=4, R=13, F=2
Answer: We know that the median is the point in the middle of the sorted data set.
Therefore, sorting our values: Sorted Set = 2, 2, 4, 8, 13
Median = 4
Note that two values are below the median (2, 2) and two values are above it (8, 13).
7
The Median
Question 2: What would be the median if our data set consisted of vehicle sales for Tuesday through Friday? T=8, W=4, R=13, F=2
Answer: We know that the median is the point in the middle of the sorted data set.
Therefore, sorting our values: Sorted Set = 2, 4, 8, 13
In this example there are two points in the middle, 4 and 8, so we take the average of the two.
Median = (4 + 8) / 2 = 6
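Both median cases (odd and even number of values) can be checked with a short Python sketch. It assumes the standard-library statistics.median function, which sorts the data internally; the list names are illustrative.

```python
from statistics import median

week = [2, 8, 4, 13, 2]       # Monday through Friday (5 values)
tue_to_fri = [8, 4, 13, 2]    # Tuesday through Friday (4 values)

# Odd count: the single middle value of the sorted data.
median_week = median(week)

# Even count: the average of the two middle values, 4 and 8.
median_tue_fri = median(tue_to_fri)

print("sorted week:", sorted(week), "median:", median_week)
print("sorted Tue-Fri:", sorted(tue_to_fri), "median:", median_tue_fri)
```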
8
The Mode
Definition
The Mode = the value occurring most often.
Note: If there are no repeated values, rather than say all values “tie” for most occurring, we say the data do not have a mode.
Excel Function = Mode(num1, num2, …, numn) or Mode(first cell:last cell)
Definitions
Unimodal = where only one value occurs most often
Bimodal = where two values tie for occurring most often
9
The Mode
Question 1: What is the mode of the following set of daily vehicle sales for a week? M=2, T=8, W=4, R=13, F=2
Answer: We know that the mode is the value that occurs most often.
Therefore, sorting our values: 2, 2, 4, 8, 13
The mode is 2, as it occurs twice and the other three values occur only once. We can also describe this data set as unimodal because there is only one mode.
10
The Mode
Question 2: If the data set also included Saturday sales of 13 vehicles, what would be the mode of the 6-observation data set? M=2, T=8, W=4, R=13, F=2, S=13
Answer: We know that the mode is the value that occurs most often.
Therefore, sorting our values: 2, 2, 4, 8, 13, 13
The values 2 and 13 are both modes for this bimodal data set.
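The unimodal, bimodal, and "no mode" cases can be sketched in Python as follows. The helper name modes is hypothetical; it wraps the standard-library statistics.multimode and adds this module's convention that data with no repeated values have no mode.

```python
from statistics import multimode

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats.

    The empty result follows this module's convention that data without
    repeated values do not have a mode; it is not multimode's own behavior.
    """
    if len(set(values)) == len(values):
        return []
    return sorted(multimode(values))

five_days = [2, 8, 4, 13, 2]
six_days = [2, 8, 4, 13, 2, 13]

print("five days:", modes(five_days))   # one mode: unimodal
print("six days:", modes(six_days))     # two modes: bimodal
print("no repeats:", modes([1, 2, 3]))  # no mode
```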
11
Measures Of Central Tendency
Sample Mean = the arithmetic average
Median = the middle value
Mode = the value occurring most often
The ensemble of sample mean, median, and mode can tell you a lot about how the data values are distributed, as we shall now see.
12
Symmetry and Skewness
Definitions
If the data are unimodal and the mean, median, and mode are all equal, the data are said to be symmetric.
If the data are unimodal and the mean, median, and mode are all different, the data are said to be asymmetric.
Data are said to be skewed to the right when they are characterized by a few large values and many small values. In this circumstance, the sample mean is normally greater than the median.
Data are said to be skewed to the left when they are characterized by a few small values and many large values. In this circumstance, the sample mean is normally less than the median.
13
Symmetry and Skewness
Question 1: Describe the following data of car sales for a week in terms of symmetry and skewness. M=2, T=12, W=9, R=7, F=5, S=7
Answer: First, let’s sort our values, giving us: 2, 5, 7, 7, 9, 12
Mean = (2 + 5 + 7 + 7 + 9 + 12) / 6 = 42 / 6 = 7
Median = (7 + 7) / 2 = 7 (average of the two middle values)
Mode = 7 (occurs twice)
The mean, median, and mode are all equal, so the data set would be described as symmetric (not skewed).
14
Symmetry and Skewness
Question 2: Describe the following data of car sales for a week in terms of symmetry and skewness. M=2, T=21, W=9, R=2, F=3, S=5
Answer: First, let’s sort our values, giving us: 2, 2, 3, 5, 9, 21
Mean = (2 + 2 + 3 + 5 + 9 + 21) / 6 = 42 / 6 = 7
Median = (3 + 5) / 2 = 4 (average of the two middle values)
Mode = 2 (occurs twice)
The mean, median, and mode are not equal, so the data would be considered asymmetric. Because the mode is less than the median which, in turn, is less than the sample mean, we say the data are skewed to the right.
Insight
Business data sets are often skewed to the right (think of salaries, sales by customer, etc.).
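The comparison of mean, median, and mode used in these two questions can be sketched in Python. The function describe_shape is a hypothetical helper that encodes only this module's rule of thumb (it is not a formal skewness statistic), and its exact-equality test is fragile for non-integer data.

```python
from statistics import mean, median, multimode

def describe_shape(values):
    """Classify data by comparing mean, median, and mode (rule of thumb only)."""
    modes = multimode(values)
    avg, mid = mean(values), median(values)
    # Unimodal with all three measures equal: treat as symmetric.
    if len(modes) == 1 and avg == mid == modes[0]:
        return "symmetric"
    # A mean above the median suggests a few large values pulling it up.
    if avg > mid:
        return "skewed right"
    # A mean below the median suggests a few small values pulling it down.
    if avg < mid:
        return "skewed left"
    return "asymmetric"

question_1 = [2, 12, 9, 7, 5, 7]
question_2 = [2, 21, 9, 2, 3, 5]

print("Question 1:", describe_shape(question_1))
print("Question 2:", describe_shape(question_2))
```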
15
Measures of Variability
Many business decisions are based not only on averages, but also on variability around the average. Variability in temperature, for example, leads to higher heating/cooling cost.
We turn now to three statistics that describe the spread of the data, i.e. measures of variability.
Measures of Variability: Range (Maximum – Minimum), Standard Deviation, Variance
16
The Range
Definition
The Range = the difference between the maximum and minimum values in a data set.
Excel Function = Excel has no built-in range function; use Max(num1, num2, …, numn) - Min(num1, num2, …, numn) or Max(first cell:last cell) - Min(first cell:last cell)
Question 1: What is the range of the following set of daily vehicle sales for a week? M=2, T=8, W=4, R=13, F=2
Answer: We know that the range = Maximum - Minimum
Therefore, substituting in our values: Range = 13 – 2 = 11
Note that in everyday speech the “range” is from 2 to 13, but the range of the data, as a statistic, is 11.
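Maximum minus minimum translates directly into code. A minimal Python sketch using only built-in functions; the variable names are illustrative.

```python
# Daily vehicle sales from the question: Monday through Friday.
sales = [2, 8, 4, 13, 2]

largest = max(sales)
smallest = min(sales)
data_range = largest - smallest   # Range = Maximum - Minimum

print("values run from", smallest, "to", largest)
print("range of the data:", data_range)
```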
17
Sample Standard Deviation
Definition
The Sample Standard Deviation is the square root of the “average” squared distances of the points from the sample average.
StdDev = [ ((num1 – x̄)^2 + (num2 – x̄)^2 + … + (numn – x̄)^2) / (n – 1) ] ^ (1/2)
Where x̄ = sample average and n = number of data points in the data set
Excel 2010 Function = stdev.s(num1, num2, …, numn)
Excel 2007 Function = stdev(num1, num2, …, numn)
Insight
Think of the sample standard deviation as a measure of how variable the data are. If all the data take on the same value, the standard deviation will be zero.
18
Sample Standard Deviation
Question 1: What is the sample standard deviation of the following set of daily vehicle sales for a week? M=2, T=8, W=4, R=13, F=2
Answer: We know that sample mean = (number1 + number2 + … + numbern) / n
Therefore, substituting in our values: Sample Mean = (2 + 8 + 4 + 13 + 2) / 5 = 5.8
Then continuing our calculation for the sample standard deviation:
Sum of Squared Differences = (2 – 5.8)^2 + (8 – 5.8)^2 + (4 – 5.8)^2 + (13 – 5.8)^2 + (2 – 5.8)^2 = 14.44 + 4.84 + 3.24 + 51.84 + 14.44 = 88.8
Std Dev = (88.8 / (5 – 1))^.5 = (22.2)^.5 = 4.71
Doing just one by hand will quickly demonstrate why Excel is such a valuable tool for statistics!
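The hand calculation can be traced step by step in Python and compared with a library routine. This sketch assumes the standard-library statistics.stdev, which (like the calculation here) divides by n - 1; the intermediate variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

sales = [2, 8, 4, 13, 2]
n = len(sales)

x_bar = mean(sales)                                # sample average
squared_diffs = [(x - x_bar) ** 2 for x in sales]  # (num_i - x_bar)^2 for each value
sum_sq = sum(squared_diffs)                        # sum of squared differences
s = sqrt(sum_sq / (n - 1))                         # divide by n - 1, then take the root

print("squared differences:", [round(d, 2) for d in squared_diffs])
print("sum of squared differences:", round(sum_sq, 2))
print("sample standard deviation:", round(s, 2))
print("library result:", round(stdev(sales), 2))
```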
19
Sample Standard Deviation
Insight
The sample standard deviation is a better measure of variability than the range because it uses all the data points (and for other technical reasons we will not get into).
To find a sample standard deviation, you will almost always use Excel, even if there are few data points.
If there are lots of data points with a unimodal, symmetric (bell-shaped) distribution, a rough rule of thumb says that 68% of the values will fall within one standard deviation of the sample average.
Using our previous example, where the sample average = 5.8 and the standard deviation = 4.71 (and presuming a bell-shaped distribution, which is not the case here), our rule of thumb would say that we would expect 68% of the values to fall between 5.8 – 4.71 and 5.8 + 4.71 (or between approx. 1.1 and 10.5).
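The one-standard-deviation interval can be computed and checked against the actual data with a short Python sketch. It is only an illustration on the five sales figures, which are too few and not bell-shaped, so the 68% figure should not be expected to hold exactly; the variable names are assumptions.

```python
from statistics import mean, stdev

sales = [2, 8, 4, 13, 2]

x_bar = mean(sales)
s = stdev(sales)

low, high = x_bar - s, x_bar + s   # one standard deviation either side of the average

inside = [x for x in sales if low <= x <= high]
share = len(inside) / len(sales)   # fraction of values inside the interval

print("interval:", round(low, 2), "to", round(high, 2))
print("values inside:", inside)
print("share inside:", share)
```

For these five values, four of the five (80%) fall inside the interval, which is consistent with the caveat that the rule of thumb applies to large, bell-shaped data sets.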
20
Sample Variance
Definition
The Sample Variance is the “average” squared distances of the points from the sample average (also the square of the standard deviation).
Sample Variance = ((num1 – x̄)^2 + (num2 – x̄)^2 + … + (numn – x̄)^2) / (n – 1)
Where x̄ = sample average and n = number of data points in the data set
Excel 2010 Function = var.s(num1, num2, …, numn)
Excel 2007 Function = var(num1, num2, …, numn)
Insight
If this looks familiar, it should! Calculating sample variance requires all the steps in calculating sample standard deviation except the final square root. Therefore, variance also equals StdDev ^ 2.
21
Sample Variance
Question 1: What is the sample variance of the following set of daily vehicle sales for a week? M=2, T=8, W=4, R=13, F=2
Answer: Sample Mean = (2 + 8 + 4 + 13 + 2) / 5 = 5.8
Then continuing our calculation for the sample variance:
Sum of Squared Differences = (2 – 5.8)^2 + (8 – 5.8)^2 + (4 – 5.8)^2 + (13 – 5.8)^2 + (2 – 5.8)^2 = 88.8
Variance = 88.8 / (5 – 1) = 22.2
Insight
Since the sample variance is the square of the sample standard deviation, if you know one you can easily calculate the other. Generally, the standard deviation is much easier to interpret, in part because it has the same units as the data (e.g. the 4.71 sample standard deviation calculated earlier is 4.71 cars, whereas the 22.2 is in cars^2).
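The relationship between sample variance and sample standard deviation can be confirmed with a brief Python sketch. It assumes the standard-library statistics.variance and statistics.stdev, both of which use the n - 1 divisor; the variable names are illustrative.

```python
from statistics import stdev, variance

sales = [2, 8, 4, 13, 2]

var = variance(sales)   # sum of squared differences / (n - 1)
sd = stdev(sales)       # square root of the sample variance

print("sample variance:", round(var, 2), "(cars squared)")
print("sample standard deviation:", round(sd, 2), "(cars)")
print("standard deviation squared:", round(sd ** 2, 2))
```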
22
Descriptive Statistics
Measures of Central Tendency: Mean, Median, Mode
Measures of Variability: Range (Maximum – Minimum), Standard Deviation, Variance
This completes our introduction to the six descriptive statistics listed above. What follows are a couple of slides that show how these statistics behave if you multiply the data by a constant “b” and add another constant “a”. This is called a linear transformation. The transformations used to convert pounds to kilograms, feet to miles, and millions to billions are all examples of linear transformations.
23
Descriptive Statistics for Transformed Data
Let X represent the original data. Let Y = a + b * X be the transformed data.
Sample Mean(Y) = a + b * Sample Mean(X)
Median(Y) = a + b * Median(X)
Mode(Y) = a + b * Mode(X)
Insight
The mean, median, and mode all behave in the logical way for the linearly transformed data. Thus, if the median temperature was 68 degrees Fahrenheit, the median temperature (if calculated using the same data expressed in degrees Celsius) would be (5/9) * (68 – 32) = 20 degrees Celsius. Here b = 5/9 and a = –160/9. This is true because the transformation of Fahrenheit to Celsius is linear, and because of the way the three statistics behave.
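These three identities can be checked numerically with a Python sketch. The temperature readings below are hypothetical values chosen so that the median is 68 degrees Fahrenheit; the constants A and B and the helper to_celsius are illustrative names for the Fahrenheit-to-Celsius transformation.

```python
from math import isclose
from statistics import mean, median, mode

# Celsius = A + B * Fahrenheit, which is the same as (5/9) * (F - 32).
A, B = -160 / 9, 5 / 9

def to_celsius(f):
    return A + B * f

temps_f = [50, 68, 68, 77, 86]               # hypothetical readings, degrees F
temps_c = [to_celsius(f) for f in temps_f]   # the same readings, degrees C

results = {}
for stat in (mean, median, mode):
    direct = stat(temps_c)                    # statistic of the transformed data
    transformed = to_celsius(stat(temps_f))   # transformed statistic of the original
    results[stat.__name__] = (direct, transformed)
    print(stat.__name__, round(direct, 2), round(transformed, 2),
          isclose(direct, transformed))
```

Each line of output should show the two routes agreeing, up to floating-point rounding.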
24
Descriptive Statistics for Transformed Data
Let X represent the original data. Let Y = a + b * X be the transformed data.
Range(Y) = abs(b) * Range(X)
Standard Deviation(Y) = abs(b) * Standard Deviation(X)
Variance(Y) = b^2 * Variance(X)
Insight
Since range, standard deviation, and variance all measure variability, it might come as no surprise that adding a constant to the data does NOT affect these three statistics. Multiplying the data by a constant, however, multiplies the range and standard deviation by the absolute value of the constant and multiplies the variance by the constant squared. Thus, if the standard deviation of temperatures was 10 degrees Fahrenheit, the standard deviation of the same data would be (5/9) * 10 = 50/9 (about 5.6) degrees Celsius.
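The behavior of the three spread measures under a linear transformation can be verified with a Python sketch. The temperature readings are hypothetical, and the helper data_range and the constants A and B are illustrative names; the library functions stdev and variance come from the standard statistics module.

```python
from math import isclose
from statistics import stdev, variance

# Celsius = A + B * Fahrenheit; A shifts the data, B rescales it.
A, B = -160 / 9, 5 / 9

def data_range(values):
    return max(values) - min(values)

temps_f = [50, 68, 68, 77, 86]          # hypothetical readings, degrees F
temps_c = [A + B * f for f in temps_f]  # the same readings, degrees C

# The added constant A drops out; only B matters for spread.
range_ok = isclose(data_range(temps_c), abs(B) * data_range(temps_f))
stdev_ok = isclose(stdev(temps_c), abs(B) * stdev(temps_f))
var_ok = isclose(variance(temps_c), B ** 2 * variance(temps_f))

print("range scales by |b|:", range_ok)
print("standard deviation scales by |b|:", stdev_ok)
print("variance scales by b^2:", var_ok)
print("10 degrees F of spread in degrees C:", round(abs(B) * 10, 2))
```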
25
Descriptive Statistics – Further Reference
Any introductory statistics book, such as Introductory Statistics (9th Edition), Neil A. Weiss, Pearson Publishing, 2010.
Two-Variable Descriptive Statistics (advanced MBTN module, coming soon). This module provides further insight into statistics, including correlation and regression.