Chapter III Descriptive Statistics: Numerical Methods
Key Learning Objectives and Topics in this Chapter
Measures of Location (Mean, Median, Mode, Percentiles, Quartiles)
Measures of Dispersion/Variability (Range, Variance, Standard Deviation, Coefficient of Variation)
Measures of distribution shape and of association between two variables
Important Note
In all cases: know the formulas, learn the computation procedures (i.e., apply the formulas), and know the meaning (interpretation) of the measures computed.
Use Excel, and practice, practice, practice!
3.1. Introduction
When describing data, we usually focus our attention on two types of measures:
Central location (e.g., average)
Variability or spread
These measures can be computed for a
Population: the measures are called parameters
Sample: the measures are called statistics
3.2 Measures of Central Location
A center is a reference point. Thus a good measure of central location is expected to reflect the locations of all the actual points in the data. How?
With one data point, the central location is clearly at the point itself.
With two data points, the central location should fall midway between them (in order to reflect the location of both of them).
If a third data point appears to the left of the center, it should "pull" the central location to the left.
Measures of Location
Mean, Median, Mode, Percentiles, Quartiles
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.
i) The Arithmetic Mean (µ)
This is the most popular and useful measure of central location.
$\text{Mean} = \dfrac{\text{Sum of the observations}}{\text{Number of observations}}$
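i) The Arithmetic Mean
Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
Here $x_i$ are the observations in the data, $\sum x_i$ is the sum of the values of the observations, $n$ is the number of observations in the sample (sample size), and $N$ is the number of observations in the population (population size).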
i) The Arithmetic Mean
Example 1
The times (in hours) spent on the Internet by 10 adults are as follows: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22. Based on these data, compute the mean (average) amount of time spent on the Internet.
Mean = (0 + 7 + 12 + 5 + 33 + 14 + 8 + 0 + 9 + 22)/10 = 110/10 = 11
Based on these data, the average amount of time spent on the Internet by a typical adult is 11 hours.
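The computation in Example 1 can be sketched in Python as follows. The function name `mean` and the variable `hours` are illustrative choices, not part of the original slides.

```python
# Internet-use data from Example 1 (hours per adult).
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]


def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)


print(mean(hours))  # 11.0
```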
ii) The Median
The median of a set of observations is the value that falls in the middle when the data are arranged in order (ascending or descending).
It is the value that divides the observations into two equal halves.
ii) The Median
To find the median:
Put the data in an array (in increasing or decreasing order).
If the total number of observations in the data set is ODD, the median is the middle value.
If the total number of observations in the data set is EVEN, the median is the AVERAGE of the middle two values.
ii) The Median
Example 2a (odd number of observations)
Find the median for the following observations: 0, 7, 12, 5, 14, 8, 0, 9, 22
Step-1: Arrange the data in increasing/decreasing order: 0, 0, 5, 7, 8, 9, 12, 14, 22
Step-2: Count the total number of observations in the data (9). Because 9 is odd, the median is the middle (5th) value.
Median = 8
ii) The Median
Example 2b (even number of observations)
Find the median for the following observations: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22
Step-1: Arrange the data in increasing/decreasing order: 0, 0, 5, 7, 8, 9, 12, 14, 22, 33
Step-2: Count the total number of observations in the data (10). Because 10 is even, the median is the average of the middle two (5th and 6th) values.
Median = (8 + 9)/2 = 8.5
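The two-step procedure (sort, then pick the middle value or average the two middle values) might be written in Python as in the sketch below. The function name `median` and the variable names are illustrative assumptions.

```python
def median(values):
    """Median by the slide procedure: sort, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two


odd_data = [0, 7, 12, 5, 14, 8, 0, 9, 22]        # Example 2a
even_data = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]   # Example 2b

print(median(odd_data))   # 8
print(median(even_data))  # 8.5
```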
ii) The Median
Note:
The median of an odd number of observations (8 in Example 2a) is a member of the data values.
The median of an even number of observations (8.5 in Example 2b) is not necessarily a member of the set of values.
Unlike the mean, the median is not affected by extreme values (very large or very small observations) in the data set.
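A small sketch can illustrate how an extreme value moves the mean but not the median. It uses Python's standard `statistics` module. Replacing 33 with 330 is a hypothetical change made only for illustration and is not part of the original data.

```python
from statistics import mean, median

# Data from Example 2b.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Hypothetical change for illustration: the largest value 33 becomes 330.
hours_with_outlier = [330 if h == 33 else h for h in hours]

print(mean(hours), median(hours))
print(mean(hours_with_outlier), median(hours_with_outlier))
```

In this illustration the mean rises from 11 to 40.7, while the median stays at 8.5.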
iii) The Center: Mode
The mode is the value that occurs most frequently in the data, that is, the value with the highest frequency.
In any data set there is only one value for the mean and one for the median. However, a data set may have more than one mode.
iii) The Center: Mode
[Figure: histograms of income distribution, one with one modal class and one with two modal classes]
iii) The Center: Mode
Example 3: What is the mode for the following data? 0, 7, 12, 5, 33, 14, 8, 0, 9, 22
Solution
All observations except "0" occur once. There are two "0" values. Thus, the mode is zero.
Is this a good measure of central location? The value "0" does not lie at the center of this data set (compare with the mean = 11.0 and the median = 8.5).
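Finding the mode amounts to counting how often each value occurs. The sketch below returns all modes, since a data set may have more than one. The function name `modes` and the second data set are hypothetical and used only for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([0, 7, 12, 5, 33, 14, 8, 0, 9, 22]))  # [0]  (Example 3)
print(modes([1, 1, 2, 2, 3]))                     # [1, 2]  (hypothetical bimodal data)
```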
Comparing Measures of Central Tendency: Mean, Median, Mode If mean = median = mode, the shape of the distribution is symmetric.
Comparing Measures of Central Tendency: Mean, Median, Mode
If mode < median < mean, the distribution trails to the right and is positively skewed.
If mode > median > mean, the distribution trails to the left and is negatively skewed.
[Figure: a positively skewed distribution ("skewed to the right") and a negatively skewed distribution ("skewed to the left"), each with the mode, median, and mean marked]
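The comparison rules above can be written as a simple check. This is only a rule-of-thumb sketch, and the function name `skew_direction` is an illustrative assumption. It is applied to the values found for the Internet-use data (mean 11.0, median 8.5, mode 0).

```python
def skew_direction(mean, median, mode):
    """Classify the shape using the mean/median/mode ordering rules."""
    if mean == median == mode:
        return "symmetric"
    if mode < median < mean:
        return "positively skewed (trails to the right)"
    if mode > median > mean:
        return "negatively skewed (trails to the left)"
    return "undetermined by this rule"


print(skew_direction(mean=11.0, median=8.5, mode=0))
# positively skewed (trails to the right)
```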
Percentiles
A percentile provides information about how the data are spread over the interval from the smallest to the largest value.
It is a measure of relative location, but not necessarily of central location.
A percentile tells us the proportion of observations that lie below or above a certain value in the data.
Example: Admission test scores for colleges and universities are frequently reported in terms of percentiles.
Percentiles Definition: The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
Computing Percentiles
Step-1: Arrange the data in ascending order.
Step-2: Compute the position i of the pth percentile: i = (p/100)n, where n is the number of observations.
Step-3: If i is not an integer, round up. The pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
Computing Percentiles
Compute the 75th percentile of the following data.
i = (p/100)n = (75/100) × 10 = 7.5
Rounding 7.5 up to 8, we note that the 8th data value is 435.
The 75th Percentile = 435
Computing Percentiles
Compute the 50th percentile of the following data.
i = (p/100)n = (50/100) × 10 = 5
Because i is an integer, we average the 5th and 6th data values:
50th Percentile = (435 + 435)/2 = 435
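The percentile procedure (sort, compute i = (p/100)n, round up or average) can be sketched as below. The function name `percentile` is an illustrative assumption, and the data are the Internet-use observations from Example 1, reused here only as sample input. Other textbooks and software packages use slightly different percentile rules, so their results may differ.

```python
import math


def percentile(values, p):
    """pth percentile by the slide rule: position i = (p/100) * n.

    If i is not an integer, round up and take that position.
    If i is an integer, average the values in positions i and i + 1.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100
    if not i.is_integer():
        position = math.ceil(i)        # 1-based position after rounding up
        return ordered[position - 1]
    position = int(i)                  # 1-based position i
    return (ordered[position - 1] + ordered[position]) / 2


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(percentile(hours, 75))  # 14   (i = 7.5, so the 8th sorted value)
print(percentile(hours, 50))  # 8.5  (i = 5, so average of 5th and 6th values)
```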
Quartiles
Quartiles are specific percentiles that divide a data set into four equal parts.
First Quartile (Q1) = 25th Percentile
Second Quartile (Q2) = 50th Percentile = the Median
Third Quartile (Q3) = 75th Percentile
3.3 Measures of Variability
Measures of central location fail to tell the whole story about the distribution.
A question that remains unanswered even after obtaining measures of central location is: how spread out are the observations around the central (say, mean) value?
Variability is important in business decisions. For example, in choosing between two suppliers A and B, we might consider not only the average delivery time for each, but also the variability in delivery time for each.
Measures of Variability
Range, Inter-Quartile Range, Variance, Standard Deviation, Coefficient of Variation
i) The Range
The range of a set of observations is the difference between the largest and smallest observations.
Range = largest value - smallest value
Its major advantage is the ease with which it can be computed.
Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points.
It is also very sensitive to the smallest and largest data values.
ii) Inter-Quartile Range
This is a measure of the spread of the middle 50% of the observations.
Inter-quartile range = Q3 - Q1
A large value indicates a large spread of the observations.
It is not sensitive to extreme data values.
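The range and the inter-quartile range can be sketched together. The quartiles here follow the percentile rule i = (p/100)n given earlier in the chapter. The helper names `percentile`, `data_range`, and `interquartile_range` are illustrative assumptions, and the Internet-use data from Example 1 serve only as sample input.

```python
import math


def percentile(values, p):
    """pth percentile using position i = (p/100) * n (round up, or average)."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if not i.is_integer():
        return ordered[math.ceil(i) - 1]
    position = int(i)
    return (ordered[position - 1] + ordered[position]) / 2


def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """IQR = Q3 - Q1, the spread of the middle 50% of the observations."""
    return percentile(values, 75) - percentile(values, 25)


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(data_range(hours))           # 33  (33 - 0)
print(interquartile_range(hours))  # 9   (Q3 = 14, Q1 = 5)
```

The single large value 33 determines the range but has no effect on the inter-quartile range.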
iii) The Variance
The variance is the average of the squared differences between each data value and the measure of central location (the mean).
It is calculated differently for a population and for a sample.
The variance is a measure of variability that utilizes all the data.
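iii) The Variance
Variance of a population: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Variance of a sample: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Here $\mu$ is the population mean, $\bar{x}$ is the sample mean, $N$ is the population size, and $n$ is the sample size.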
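iii) The Variance
Why square the differences? The sum of the deviations from the mean is always zero, so the unsquared deviations cannot measure spread.
Why divide by n-1 instead of n? Dividing by n-1 makes the sample variance a better approximation (an unbiased estimator) of the population variance.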
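A short derivation shows why the deviations from the mean always sum to zero. It uses the standard notation $x_i$ for the observations and $\bar{x}$ for the sample mean, and checks the result on the sample 9, 11, 8, 12 used in the variance example.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .

% Check with the sample 9, 11, 8, 12 (mean 10):
(9-10) + (11-10) + (8-10) + (12-10) = -1 + 1 - 2 + 2 = 0
```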
Example: Computing the Variance Based on Sample Data
Find the variance of the following sample observations: 9, 11, 8, 12
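Computing the Variance of a Sample
Step-1: Find the mean: (9 + 11 + 8 + 12)/4 = 40/4 = 10
Step-2: Compute the deviations from the mean: 9-10 = -1, 11-10 = +1, 8-10 = -2, 12-10 = +2
Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n-1:
$s^2 = \dfrac{(-1)^2 + (+1)^2 + (-2)^2 + (+2)^2}{4 - 1} = \dfrac{10}{3} \approx 3.33$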
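The three steps can be sketched in Python as follows. The function names are illustrative assumptions. The population version is included only to show the effect of dividing by n rather than n-1.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                                   # Step 1
    deviations = [x - mean for x in values]                  # Step 2
    return sum(d ** 2 for d in deviations) / (n - 1)         # Step 3


def population_variance(values):
    """Population variance: same sum of squared deviations, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


sample = [9, 11, 8, 12]
print(round(sample_variance(sample), 2))  # 3.33
print(population_variance(sample))        # 2.5
```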
iv) Standard Deviation
The standard deviation of a set of observations is the square root of the variance.
Why Standard Deviation?
The standard deviation is reported in the same unit of measure in which the data are recorded.
Thus it can be used to compare the variability of several distributions that are measured in the same units.
It can also be used to make a statement about the general shape of a distribution (Kurtosis).
Computing the Standard Deviation
Step-1: Find the mean: (9 + 11 + 8 + 12)/4 = 10
Step-2: Compute the deviations from the mean: 9-10 = -1, 11-10 = +1, 8-10 = -2, 12-10 = +2
Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n-1: (1 + 1 + 4 + 4)/(4 - 1) = 10/3 ≈ 3.33
Step-4: Take the square root of the variance: $s = \sqrt{3.33} \approx 1.83$
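The four steps can be sketched as below. The function name `sample_std_dev` is an illustrative assumption. Python's standard `statistics.stdev` computes the same sample quantity and is used here only as a cross-check.

```python
import math
from statistics import stdev


def sample_std_dev(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    mean = sum(values) / n                                        # Step 1
    squared_devs = [(x - mean) ** 2 for x in values]              # Steps 2 and 3
    variance = sum(squared_devs) / (n - 1)
    return math.sqrt(variance)                                    # Step 4


sample = [9, 11, 8, 12]
print(round(sample_std_dev(sample), 2))  # 1.83
print(round(stdev(sample), 2))           # 1.83 (cross-check with the standard library)
```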
v) Coefficient of Variation
The coefficient of variation is a measure of how large the standard deviation is relative to the mean.
The coefficient of variation is computed as follows:
$CV = \dfrac{s}{\bar{x}} \times 100\%$ for a sample
$CV = \dfrac{\sigma}{\mu} \times 100\%$ for a population
Why Coefficient of Variation?
Example: Is a standard deviation of 10 large?
A standard deviation of 10 may be perceived as large when the mean value is 100 (10% of the mean), but it is only moderately large if the mean value is 500 (2% of the mean).
The coefficient of variation can be used to compare variability in data sets that are measured in different units.
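The comparison can be sketched with a small helper that expresses the standard deviation as a percentage of the mean. The function name `coefficient_of_variation` is an illustrative assumption.

```python
def coefficient_of_variation(std_dev, mean):
    """CV: the standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


print(coefficient_of_variation(10, 100))  # 10.0  (standard deviation is 10% of the mean)
print(coefficient_of_variation(10, 500))  # 2.0   (standard deviation is 2% of the mean)
```

The same standard deviation of 10 therefore represents five times as much relative variability when the mean is 100 as when it is 500.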
Variance, Standard Deviation, and Coefficient of Variation
Variance
Standard Deviation
Coefficient of Variation: the standard deviation is about 11% of the mean
Compute the Mean, Median, Mode, Range, Variance, Standard Deviation and Coefficient of Variation for income (in $1000) data from the following cities.
City                 Income ($1000)
Akron, OH            74.1
Atlanta, GA          82.4
Birmingham, AL       71.2
Cleveland, OH        62.3
Columbia, SC         79.9
Danbury, CT          66.8
Denver, CO           132.3
Detroit, MI          83.4
Lancaster, PA        100.0
Madison, WI          77.0
Minneapolis, MN      67.8
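Hand or Excel answers to this exercise could be checked with a sketch like the one below, which uses Python's standard `statistics` module. Treating the 11 cities as a sample (dividing by n-1) is an assumption, since the exercise does not say whether the data are a sample or a population. Variable names are illustrative.

```python
from statistics import mean, median, multimode, stdev, variance

# Income (in $1000) for the 11 cities listed in the exercise.
income = [74.1, 82.4, 71.2, 62.3, 79.9, 66.8, 132.3, 83.4, 100.0, 77.0, 67.8]

avg = mean(income)
mid = median(income)
most_common = multimode(income)        # returns every value tied for highest frequency
spread = max(income) - min(income)     # range
var = variance(income)                 # sample variance (n - 1), an assumption
sd = stdev(income)                     # sample standard deviation
cv = 100 * sd / avg                    # coefficient of variation, in percent

for name, value in [("mean", avg), ("median", mid), ("range", spread),
                    ("variance", var), ("std dev", sd), ("CV %", cv)]:
    print(f"{name}: {value:.2f}")
print("modes:", most_common)
```

If every income value appears exactly once, `multimode` returns all of them, which means the data have no single mode.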
Compute every measure of central location and variability you have learned in this chapter for the following sample rent data on 70 efficiency apartments.