1
Descriptive Statistics: Numerical Methods
Chapter III Descriptive Statistics: Numerical Methods
2
Key Learning Objectives and Topics in this Chapter
Measures of location: mean, median, mode, percentiles, quartiles
Measures of dispersion/variability: range, variance, standard deviation, coefficient of variation
Measures of distribution shape, and of association between two variables
3
Important Note
In all cases: know the formulas, learn the computation procedures (i.e., apply the formulas), and know the meaning (interpretation) of the measures computed. Use Excel. Practice! Practice! and Practice!
4
3.1. Introduction
When describing data, we usually focus on two types of measures:
Central location (e.g., average)
Variability or spread
These measures can be computed for:
A population, in which case they are called parameters
A sample, in which case they are called statistics
5
3.2 Measures of Central Location
A center is a reference point. Thus a good measure of central location is expected to reflect the locations of all the actual points in the data. How? With one data point, the central location is clearly at the point itself. With two data points, the central location should fall midway between them (in order to reflect the location of both). If a third data point appears to the left of that center, it should “pull” the central location to the left.
6
Measures of Location Mean Median Mode Percentiles Quartiles
If the measures are computed for data from a sample, they are called sample statistics. If the measures are computed for data from a population, they are called population parameters. A sample statistic is referred to as the point estimator of the corresponding population parameter.
7
i) The Arithmetic Mean (µ)
This is the most popular and useful measure of central location.
Mean = (Sum of the observations) / (Number of observations)
8
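i) The Arithmetic Mean
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where the numerator is the sum of the values of the observations in the data and $n$ is the number of observations in the sample (sample size).
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the number of observations in the population (population size).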
9
i) The Arithmetic Mean Example 1
The times (in hours) spent by 10 adults on the Internet are as follows: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22. Based on these data, compute the mean (average) amount of time spent on the Internet.
Mean = (0 + 7 + 12 + 5 + 33 + 14 + 8 + 0 + 9 + 22) / 10 = 110 / 10 = 11
Based on these data, the average amount of time spent on the Internet by a typical adult is 11 hours.
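The computation in Example 1 can be reproduced with a few lines of Python. This is only a minimal sketch; the function name `mean` and the variable names are illustrative choices, not part of the original slides.

```python
# Hours spent on the Internet by the 10 adults in Example 1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]


def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)


sample_mean = mean(hours)
print(sample_mean)  # 11.0
```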
10
ii) The Median The median of a set of observations is the value that falls in the middle of a data set arranged in order (ascending or descending). It is the value that divides the observations into two equal halves.
11
ii) The Median To find the median:
Put the data in an array (in increasing or decreasing order). If the total number of observations in the data set is ODD, the median is the middle value. If the total number of observations in the data set is EVEN, the median is the AVERAGE of the middle two values.
12
Odd Number of Observations
ii) The Median Example 2a Find the median for the following observations: 0, 7, 12, 5, 14, 8, 0, 9, 22
Step-1: Arrange the data in increasing/decreasing order: 0, 0, 5, 7, 8, 9, 12, 14, 22
Step-2: Count the total number of observations in the data (9). The number of observations is odd, so the median is the middle (5th) value.
Median = 8
13
Even Number of Observations
ii) The Median Example 2b Find the median for the following observations: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22
Step-1: Arrange the data in increasing/decreasing order: 0, 0, 5, 7, 8, 9, 12, 14, 22, 33
Step-2: Count the total number of observations in the data (10). The number of observations is even, so the median is the average of the two middle (5th and 6th) values.
Median = (8 + 9)/2 = 8.5
14
ii) The Median Note: The median of a data set with an odd number of observations (8 in Example 2a) is a member of the data values. The median of a data set with an even number of observations (8.5 in Example 2b) is not necessarily a member of the data values. Unlike the mean, the median is not affected by extreme values in the data set.
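Examples 2a and 2b can be checked with a short sketch of the odd/even rule. The last lines use a hypothetical variation of the data (the value 33 replaced by 330, which is not in the original slides) to illustrate how an extreme value moves the mean while leaving the median unchanged.

```python
def median(values):
    """Median via the sort-and-pick-the-middle rule."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


example_2a = [0, 7, 12, 5, 14, 8, 0, 9, 22]
example_2b = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(median(example_2a))  # 8
print(median(example_2b))  # 8.5

# Hypothetical data: Example 2b with 33 replaced by 330.
with_outlier = [0, 7, 12, 5, 330, 14, 8, 0, 9, 22]
outlier_median = median(with_outlier)
outlier_mean = sum(with_outlier) / len(with_outlier)
print(outlier_median)  # 8.5 (unchanged)
print(outlier_mean)    # 40.7 (was 11.0)
```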
15
III) The Center: Mode The mode is the most frequent value.
The Mode is the value that occurs most frequently in the data. It is the value with the highest frequency In any data set there is only one value for the mean or the median. However, a data set may have more than one value for the mode.
16
III) The Center: Mode
[Figure: histograms of income distribution, one with one modal class and one with two modal classes]
17
III) The Center: Mode Example 3: What is the mode for the following data? 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 Solution: All observations except “0” occur once; the value “0” occurs twice. Thus, the mode is zero. Is this a good measure of central location? The value “0” does not reside at the center of this data set (compare with the mean = 11.0 and the median = 8.5).
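Counting frequencies makes the answer to Example 3 easy to verify. The sketch below returns every value tied for the highest frequency, so it also handles data sets with more than one mode; the function name `modes` and the second data set are hypothetical illustrations.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


example_3 = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(modes(example_3))  # [0]

# Hypothetical data with two modes.
bimodal = [1, 1, 2, 3, 3]
print(modes(bimodal))  # [1, 3]
```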
18
Comparing Measures of Central Tendency: Mean, Median, Mode
If mean = median = mode, the shape of the distribution is symmetric.
19
Comparing Measures of Central Tendency: Mean, Median, Mode
If mode < median < mean, the distribution trails to the right and is positively skewed (“skewed to the right”). If mode > median > mean, the distribution trails to the left and is negatively skewed (“skewed to the left”).
[Figure: a positively skewed distribution and a negatively skewed distribution, with the positions of the mode, median, and mean marked]
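As an illustration, the ordering rule can be applied to the Internet-usage data from Example 1 (mode 0, median 8.5, mean 11). The helper `describe_shape` is a hypothetical name, and the rule itself is only a rule of thumb, so the sketch reports "no clear pattern" when none of the three orderings holds.

```python
import statistics


def describe_shape(mean, median, mode):
    """Classify the shape from the ordering of mode, median, and mean."""
    if mode < median < mean:
        return "positively skewed"
    if mode > median > mean:
        return "negatively skewed"
    if mean == median == mode:
        return "symmetric"
    return "no clear pattern"


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
shape = describe_shape(
    statistics.mean(hours),
    statistics.median(hours),
    statistics.mode(hours),
)
print(shape)  # positively skewed
```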
20
Percentiles
A percentile provides information about the relative location and spread of the data between the smallest and the largest values. It is a measure of relative location, but not necessarily of central location. A percentile tells us the proportion of observations that lie below or above a certain value in the data. Example: Admission test scores for colleges and universities are frequently reported in terms of percentiles.
21
Percentiles Definition:
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
22
Computing Percentiles
Arrange the data in ascending order. Compute the position i of the pth percentile: i = (p/100)n, where n is the number of observations. If i is not an integer, round up; the pth percentile is the value in the ith position. If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
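The procedure above can be sketched in Python as follows. The sketch assumes p is a whole number strictly between 0 and 100, and it uses the Example 1 Internet-usage data purely for illustration. Other textbooks and software packages use different interpolation rules, so their results may differ from this one.

```python
def percentile(values, p):
    """pth percentile using the position rule i = (p/100) * n.

    Assumes p is an integer with 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    scaled = p * n  # i = scaled / 100, kept as an integer to avoid float error
    if scaled % 100 == 0:
        # i is an integer: average the values in positions i and i + 1.
        i = scaled // 100
        return (ordered[i - 1] + ordered[i]) / 2
    # i is not an integer: round up and take the value in that position.
    i = scaled // 100 + 1
    return ordered[i - 1]


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(percentile(hours, 75))  # 14  (i = 7.5, rounded up to position 8)
print(percentile(hours, 50))  # 8.5 (i = 5, average of positions 5 and 6)
```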
23
Computing Percentiles
Compute the 75th percentile of the following data. i = (p/100)n = (75/100) × 10 = 7.5. Rounding 7.5 up, we note that the 8th data value is 435. The 75th percentile = 435
24
Computing Percentiles
Compute the 50th percentile of the following data. i = (p/100)n = (50/100) × 10 = 5. Because i is an integer, we average the 5th and 6th data values: 50th percentile = ( )/2 = 435
25
Quartiles Quartiles are specific percentiles.
First Quartile = 25th Percentile Second Quartile = 50th Percentile = the Median Third Quartile = 75th Percentile
26
Quartiles Divide a data set into four equal parts
27
3.3 Measures of Variability
28
3.3 Measures of Variability
Measures of central location fail to tell the whole story about the distribution. A question that remains unanswered even after obtaining measures of central location is: how spread out are the observations around the central (say, mean) value? Variability is important in business decisions. For example, in choosing between two suppliers A and B, we might consider not only the average delivery time for each, but also the variability in delivery time for each.
29
Measures of Variability
Range Inter-Quartile Range Variance Standard Deviation Coefficient of Variation
30
i) The Range
The range of a set of observations is the difference between the largest and smallest observations, that is, the distance between the smallest and the largest data values in the set.
Range = largest value - smallest value
Its major advantage is the ease with which it can be computed. Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points. It is also very sensitive to the smallest and largest data values.
31
ii) Inter Quartile Range
The inter-quartile range is a measure of the spread of the middle 50% of the observations. A large value indicates a large spread of the observations. It is not sensitive to extreme data values.
Inter-quartile range = Q3 - Q1
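The range and the inter-quartile range can be compared on the Example 1 Internet-usage data with the sketch below. It takes Q1 and Q3 as the 25th and 75th percentiles under the chapter's position rule. The helper name `percentile` is illustrative, and other quartile conventions would give slightly different values.

```python
def percentile(values, p):
    """pth percentile via the position rule i = (p/100) * n (integer 0 < p < 100)."""
    ordered = sorted(values)
    scaled = p * len(ordered)
    if scaled % 100 == 0:
        i = scaled // 100
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[scaled // 100]  # position i = floor + 1, i.e. index floor


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

data_range = max(hours) - min(hours)
q1 = percentile(hours, 25)
q3 = percentile(hours, 75)
iqr = q3 - q1

print(data_range)  # 33
print(q1, q3)      # 5 14
print(iqr)         # 9
```

The range (33) is driven entirely by the two extreme observations, while the inter-quartile range (9) describes only the middle half of the data.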
32
iii) The Variance The variance is the average of the squared differences between each data value and the measure of central location (the mean). It is calculated differently for a population and for a sample. The variance is a measure of variability that utilizes all the data.
33
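iii) The Variance
Variance of a population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the population size.
Variance of a sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean and $n$ is the sample size.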
34
iii) The Variance
Why square the differences? Because the sum of the deviations from the mean is always zero.
Why divide by n-1 instead of n? Because it gives a better approximation of the population variance.
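The claim that the deviations from the mean sum to zero follows directly from the definition of the sample mean. A short derivation, using $\bar{x}$ for the sample mean and $n$ for the sample size:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

Because positive and negative deviations always cancel in this way, the unsquared sum carries no information about spread. Squaring makes every term non-negative.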
35
Example: Computing the Variance Based on Sample Data
Find the variance of the following sample observations, using the formula for the variance of a sample.
36
Computing the Variance of a Sample
Step-1: Find the mean (here, 10).
Step-2: Compute the deviations from the mean: 9-10 = -1, 11-10 = +1, 8-10 = -2, 12-10 = +2
Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n-1
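The three steps can be traced in code. The full sample behind this example is not listed, so the sketch below assumes, purely for illustration, that the sample consists of exactly the four values whose deviations are shown (9, 11, 8, 12). If the original sample contained additional observations, the result would differ.

```python
import statistics

# ASSUMPTION: the sample is exactly the four values visible in the deviations.
sample = [9, 11, 8, 12]
n = len(sample)

# Step 1: find the mean.
mean = sum(sample) / n

# Step 2: compute the deviations from the mean.
deviations = [x - mean for x in sample]

# Step 3: square the deviations, add them, and divide by n - 1.
sum_squared = sum(d ** 2 for d in deviations)
sample_variance = sum_squared / (n - 1)

print(mean)             # 10.0
print(deviations)       # [-1.0, 1.0, -2.0, 2.0]
print(sum_squared)      # 10.0
print(sample_variance)  # 10/3, roughly 3.33

# Cross-check against the standard library's sample variance.
library_variance = statistics.variance(sample)
```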
37
iv) Standard Deviation
The standard deviation of a set of observations is the square root of the variance .
38
Why Standard Deviation?
The standard deviation is expressed in the same unit of measure in which the data are recorded. Thus it can be used to compare the variability of several distributions that are measured in the same units. It can also be used to make a statement about the general shape of a distribution (kurtosis).
39
Computing the Standard Deviation
Step-1: Find the mean (here, 10).
Step-2: Compute the deviations from the mean: 9-10 = -1, 11-10 = +1, 8-10 = -2, 12-10 = +2
Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n-1
Step-4: Take the square root of the variance
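The same four steps are sketched below on the Example 1 Internet-usage data, chosen because that data set is listed in full. The data are treated as a sample, so the divisor is n - 1; the function name `sample_std_dev` is an illustrative choice.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    mean = sum(values) / n                               # Step 1
    deviations = [x - mean for x in values]              # Step 2
    variance = sum(d ** 2 for d in deviations) / (n - 1) # Step 3
    return math.sqrt(variance)                           # Step 4


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
s = sample_std_dev(hours)
print(round(s, 2))  # 10.12 (hours, the same unit as the data)

# Cross-check against the standard library.
library_s = statistics.stdev(hours)
```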
40
V) Coefficient of Variation
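The coefficient of variation (CV) is a measure of how large the standard deviation is relative to the mean. It is computed as follows:
CV = $\frac{s}{\bar{x}} \times 100\%$ for a sample
CV = $\frac{\sigma}{\mu} \times 100\%$ for a population
Here $s$ and $\sigma$ are the sample and population standard deviations, and $\bar{x}$ and $\mu$ are the corresponding means.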
41
Why Coefficient of Variation?
Example: Is a standard deviation of 10 large? A standard deviation of 10 may be perceived as large when the mean value is 100, but it is only moderately large if the mean value is 500. The coefficient of variation can be used to compare variability in data sets that are measured in different units.
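The comparison in this example can be made concrete by expressing the standard deviation as a percentage of the mean. A minimal sketch follows; the function name is an illustrative choice, and the measure is assumed to be used only with a non-zero mean.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean


cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)
print(cv_mean_100)  # 10.0 (percent)
print(cv_mean_500)  # 2.0 (percent)
```

The same standard deviation of 10 amounts to 10% of a mean of 100 but only 2% of a mean of 500.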
42
Coefficient of Variation
[Worked example computing the variance, standard deviation, and coefficient of variation. Result: the standard deviation is about 11% of the mean.]
43
Compute the mean, median, mode, range, variance, standard deviation, and coefficient of variation for the income data (in $1000) from the following cities:
City               Income
Akron, OH          74.1
Atlanta, GA        82.4
Birmingham, AL     71.2
Cleveland, OH      62.3
Columbia, SC       79.9
Danbury, CT        66.8
Denver, CO         132.3
Detroit, MI        83.4
Lancaster, PA      100.0
Madison, WI        77.0
Minneapolis, MN    67.8
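One way to check hand or Excel answers for this exercise is a short script built on Python's standard `statistics` module. The sketch assumes the 11 cities are treated as a sample (divisor n - 1); if they are regarded as a population, `statistics.pvariance` and `statistics.pstdev` would be the corresponding calls. `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Income in $1000, keyed by city, as given in the exercise.
incomes = {
    "Akron, OH": 74.1, "Atlanta, GA": 82.4, "Birmingham, AL": 71.2,
    "Cleveland, OH": 62.3, "Columbia, SC": 79.9, "Danbury, CT": 66.8,
    "Denver, CO": 132.3, "Detroit, MI": 83.4, "Lancaster, PA": 100.0,
    "Madison, WI": 77.0, "Minneapolis, MN": 67.8,
}
values = list(incomes.values())

mean = statistics.mean(values)
median = statistics.median(values)
# multimode returns every value tied for the highest frequency; if it
# returns all the values, no value repeats and there is no useful mode.
mode_values = statistics.multimode(values)
data_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance (n - 1)
std_dev = statistics.stdev(values)      # sample standard deviation
cv = 100 * std_dev / mean               # percent of the mean

for name, result in [("mean", mean), ("median", median),
                     ("range", data_range), ("variance", variance),
                     ("std dev", std_dev), ("CV (%)", cv)]:
    print(f"{name}: {result:.2f}")
print("repeated values:", len(mode_values) < len(values))
```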
44
Compute every single measure of central location and Variability you have learned in this chapter for the following sample rent data on 70 efficiency apartments