Chapter 3 Numerical Descriptive Measures
3.1 Measures of central tendency for ungrouped data A measure of central tendency gives the center of a histogram or a frequency distribution curve. There are 3 different types: mean (average), median (middle number), and mode (most frequent value).
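Mean Mean is the average. It is the most often used measure of central tendency. Mean Formulas: population mean $\mu = \frac{\sum x}{N}$; sample mean $\bar{x} = \frac{\sum x}{n}$, where N is the population size and n is the sample size.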
Example of Mean (by hand then in calculator) 20 Test Scores for my class: 97, 98, 93, 95, 94, 81, 96, 80, 86, 95, 100, 83, 92, 97, 95, 80, 92, 93, 99, 97 a.) Let's calculate the population mean using $\mu = \frac{\sum x}{N}$: the 20 scores add up to 1843, so $\mu = 1843 / 20 = 92.15$. b.) Now take a sample of 5 scores, compute the sample mean using $\bar{x} = \frac{\sum x}{n}$, and compare the two results.
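The same calculation can be sketched in Python as a check on the hand and calculator work. This is only an illustration: the fixed random seed is an assumption added so the sample of 5 is repeatable, and a different seed gives a different sample mean.

```python
import random
import statistics

scores = [97, 98, 93, 95, 94, 81, 96, 80, 86, 95,
          100, 83, 92, 97, 95, 80, 92, 93, 99, 97]

# a.) Population mean: add all N values and divide by N.
population_mean = sum(scores) / len(scores)

# b.) Sample mean: same arithmetic on a random sample of n = 5 scores.
rng = random.Random(0)  # seed chosen arbitrarily (assumption) for repeatability
sample = rng.sample(scores, 5)
sample_mean = statistics.mean(sample)

print("population mean:", population_mean)
print("sample:", sample)
print("sample mean:", sample_mean)
```

Rerunning part b.) with other seeds shows the sample mean changing while the population mean stays fixed.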
Difference between two means The value of the population mean is constant. However, the value of the sample mean varies from sample to sample. Sometimes a data set may contain a few very small or a few very large values relative to the majority. They are called outliers or extreme values. A major shortcoming of the MEAN is that it is VERY sensitive to outliers.
Example Suppose 6 people take a test with the following scores: 10, 100, 95, 90, 88, 85 The score of 10 affects the mean greatly: with it the mean is 78, and without it the mean is 91.6. Therefore the mean is NOT always the best measure of central tendency.
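A short Python sketch of the outlier effect, using the scores listed above. Dropping the 10 and adding the median for comparison are illustrations, not part of the original example.

```python
import statistics

scores = [10, 100, 95, 90, 88, 85]
without_outlier = [x for x in scores if x != 10]

mean_all = statistics.mean(scores)               # 468 / 6 = 78
mean_trimmed = statistics.mean(without_outlier)  # 458 / 5 = 91.6
median_all = statistics.median(scores)           # (88 + 90) / 2 = 89

print("mean with outlier:", mean_all)
print("mean without outlier:", mean_trimmed)
print("median with outlier:", median_all)
```

The single low score pulls the mean down by more than 13 points, while the median stays close to the bulk of the scores.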
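3.3 Part 1 Mean for Grouped Data Example (by hand and in calculator) Table columns: Grade, Freq. (f), Midpt (m), m*f. Each row multiplies the class midpoint m by its frequency f (the frequencies are 4, 1, 12, and 3). FORMULAS: population mean $\mu = \frac{\sum mf}{N}$; sample mean $\bar{x} = \frac{\sum mf}{n}$. **Note: A mean for grouped data is NOT the exact value. It is an approximation, because every value in a class is replaced by the class midpoint.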
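The grouped-data formula can be sketched in Python as below. The frequencies 4, 1, 12, 3 are taken from the slide, but the class intervals and midpoints are hypothetical, chosen only to make the example runnable.

```python
def grouped_mean(midpoints, freqs):
    """Approximate mean for grouped data: sum of m*f divided by sum of f."""
    total_mf = sum(m * f for m, f in zip(midpoints, freqs))
    return total_mf / sum(freqs)

# Hypothetical classes 60-69, 70-79, 80-89, 90-99 (assumption, not from the slide).
midpoints = [64.5, 74.5, 84.5, 94.5]
freqs = [4, 1, 12, 3]

# m*f column: 258.0, 74.5, 1014.0, 283.5; their sum is 1630.0
approx_mean = grouped_mean(midpoints, freqs)
print("approximate mean:", approx_mean)  # 1630.0 / 20 = 81.5
```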
Median Median is the value of the middle term in a data set that has been ranked in increasing order. You cannot find the median for grouped data. ***When finding the median: 1.) Rank the data in increasing order. 2.) Find the middle term. If the number of values is even, the median is the average of the two middle terms.
Example of Median 1.) 10, 5, 19, 8, 3 Rearrange to be: 3, 5, 8, 10, 19. The median is 8. 2.) Let's find the median of our test score example. Rearranged we have: 80, 80, 81, 83, 86, 92, 92, 93, 93, 94, 95, 95, 95, 96, 97, 97, 97, 98, 99, 100. There are 20 values, so the median is the average of the 10th and 11th terms: (94 + 95) / 2 = 94.5.
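The rank-then-find-the-middle procedure can be sketched in Python. The helper name `median_by_hand` is made up for this illustration; the built-in `statistics.median` gives the same answers.

```python
import statistics

def median_by_hand(values):
    """Rank the data, then take the middle term (or average the two middle terms)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # odd count: single middle term
    return (ranked[mid - 1] + ranked[mid]) / 2  # even count: average of the two

small = [10, 5, 19, 8, 3]
scores = [97, 98, 93, 95, 94, 81, 96, 80, 86, 95,
          100, 83, 92, 97, 95, 80, 92, 93, 99, 97]

print(median_by_hand(small))      # 8
print(median_by_hand(scores))     # 94.5
print(statistics.median(scores))  # 94.5, same result from the standard library
```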
Advantages Median is NOT influenced by outliers. The median gives the center of the histogram, with half the data to the left and half to the right.
Mode Mode (think "most") is the value that occurs with the highest frequency in a data set. The modes of our example of test scores: 97, 98, 93, 95, 94, 81, 96, 80, 86, 95, 100, 83, 92, 97, 95, 80, 92, 93, 99, 97 are 95 and 97, since they both appear 3 times. Our example is bimodal: it has two most frequent values.
Example of Mode 77, 69, 74, 81, 71, 68, 74, 73 Mode is 74. This is unimodal since there is only one mode. A data set with more than 2 modes is called multimodal. Advantage: the mode can be calculated for both quantitative and qualitative data, whereas the mean and median can only be used for quantitative data.
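A quick way to check modes in Python is `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest frequency. The list of colors is a hypothetical example of qualitative data, added only for illustration.

```python
import statistics

test_scores = [97, 98, 93, 95, 94, 81, 96, 80, 86, 95,
               100, 83, 92, 97, 95, 80, 92, 93, 99, 97]
second_set = [77, 69, 74, 81, 71, 68, 74, 73]
colors = ["red", "blue", "red", "green"]  # hypothetical qualitative data

modes_scores = statistics.multimode(test_scores)  # two modes: bimodal
modes_second = statistics.multimode(second_set)   # one mode: unimodal
modes_colors = statistics.multimode(colors)       # mode of qualitative data

print(sorted(modes_scores))  # [95, 97]
print(modes_second)          # [74]
print(modes_colors)          # ['red']
```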
Relationship among the mean, median, and mode For symmetric graphs with one peak: the values of the mean, median, and mode are identical and lie at the center of the distribution. For skewed right graphs (tail to the right): the mean is the largest, the mode is the smallest, and the median lies between these two. For skewed left graphs (tail to the left): the mean is the smallest, the mode is the largest, and the median lies between these two.
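The ordering for skewed data can be seen on a tiny made-up data set. Both lists below are hypothetical and chosen only so the tail is obvious. Real data sets do not always follow this ordering exactly.

```python
import statistics

skewed_right = [1, 2, 2, 3, 4, 12]     # hypothetical data, long tail to the right
skewed_left = [1, 9, 10, 11, 11, 12]   # mirror image, long tail to the left

for name, data in [("right", skewed_right), ("left", skewed_left)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(name, "mean:", mean, "median:", median, "mode:", mode)

# right: mode 2 < median 2.5 < mean 4
# left:  mean 9 < median 10.5 < mode 11
```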
Homework Page 82 (3.12, 3.14, 3.16, 3.18) Page (3.64, 3.66, 3.68 find mean of the grouped data only.) Tomorrow we will do standard deviation and variance.
3.2 Measures of dispersion for ungrouped data Measures of dispersion describe the spread, or variation, of the data. Ex: Ages of workers at 2 different companies. Comp 1: 47, 38, 35, 40, 36, 45, 39 Comp 2: 70, 33, 18, 52, 27 **Both means are 40, but the spread of the data is much different! Company 2 has a MUCH larger spread. People often want more consistency in data, with LESS variation.
3 more ways to measure data Together with Measures of Central Tendency we can also get a better picture from: Range Standard Deviation Variance
Example: Wait times (minutes) at two local banks. Bank 1: 6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7 Bank 2: 4.2, , 6.2, 6.7, 7.7, 7.7, 8.5, 10.0 Both have mean = 7.15, median = 7.2, and mode = 7.7. Therefore, we need other ways to compare these sets of data.
Range for two banks is MUCH different. 1.) Range = Largest – Smallest. For the two companies: Comp 1: Range = 47 – 35 = 12. Comp 2: Range = 70 – 18 = 52. For the two banks: Bank 1: Range = 7.7 – 6.5 = 1.2. Bank 2: Range = 10.0 – 4.2 = 5.8. **Customers would appreciate more consistent wait times, and therefore might prefer Bank 1 since it has a smaller range. Disadvantages of range: it is influenced by outliers, and it uses only two values of the data.
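As a small check of the range calculation, here is a Python sketch using the two company age lists from earlier in this section. The helper name `data_range` is made up for the illustration.

```python
import statistics

comp1 = [47, 38, 35, 40, 36, 45, 39]
comp2 = [70, 33, 18, 52, 27]

def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Same mean, very different spread.
print("means:", statistics.mean(comp1), statistics.mean(comp2))  # 40 and 40
print("ranges:", data_range(comp1), data_range(comp2))           # 12 and 52
```

Only the two extreme values enter the calculation, which is why a single outlier can change the range so much.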
Standard Deviation 2.) Standard deviation is the most used measure of dispersion. It tells us how closely the values of a data set are clustered around the mean. A smaller standard deviation means the values are spread over a smaller range around the mean. More data points lie close to the mean, and thus the data is more consistent. A larger standard deviation indicates that the values of the data set are spread over a relatively larger range around the mean. Thus, the data is less consistent.
Standard Deviation cont… $s$ is the sample standard deviation. $\sigma$ (sigma) is the population standard deviation.
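For reference, the standard textbook definitions of the two quantities are written out below. The formulas did not survive on the slide itself, so the notation here ($N$ for population size, $n$ for sample size, $\mu$ and $\bar{x}$ for the two means) is assumed to match the rest of the chapter.

```latex
% Population standard deviation
\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}

% Sample standard deviation (note the divisor n - 1)
s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
```

In both cases each value's deviation from the mean is squared, the squares are added, and the square root is taken at the end.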
Example: Deviations for a sample of four test scores with a mean of 84. Table columns: x, x – 84, (x – 84)². First row: x = 82, 82 – 84 = –2, (–2)² = 4. ***The standard deviation is quite large. This is because the data has a large range and is NOT very consistent.
Variance Variance is the square of the standard deviation. $\sigma^2$ is the population variance. $s^2$ is the sample variance. The values of the standard deviation and variance are NEVER negative. If the variance is zero, then all the data values are the same. Ex: 25, 25, 25 has a variance of ZERO.
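The sketch below works through the sample variance and standard deviation step by step in Python and compares the result with the standard library. It reuses the company age lists from section 3.2 and treats each as a sample, which is an assumption made for the illustration. The helper name `sample_variance` is also made up.

```python
import math
import statistics

comp1 = [47, 38, 35, 40, 36, 45, 39]
comp2 = [70, 33, 18, 52, 27]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    squared_devs = [(x - mean) ** 2 for x in values]
    return sum(squared_devs) / (len(values) - 1)

var1 = sample_variance(comp1)  # 120 / 6 = 20.0
var2 = sample_variance(comp2)  # 1746 / 4 = 436.5
s1 = math.sqrt(var1)           # standard deviation = square root of variance
s2 = math.sqrt(var2)

print("Comp 1: variance", var1, "std dev", round(s1, 1))
print("Comp 2: variance", var2, "std dev", round(s2, 1))

# Identical values have zero variance.
print(statistics.variance([25, 25, 25]))  # 0
```

Both companies have a mean age of 40, but Company 2's standard deviation is several times larger, which matches its wider range.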
Some more points… Standard deviation has the same units of measure as the original data. Always round standard deviation and variance to one more decimal place than the original data. You can use a calculator to find standard deviation: 1.) Press STAT, choose EDIT, and enter the data in L1. 2.) Press STAT, choose CALC, then 1-Var Stats.
Summary and Homework Sample statistics: $\bar{x}$, $s$, $s^2$. Population parameters: $\mu$, $\sigma$, $\sigma^2$. Do page 91 (3.42, 3.46, 3.52, and 3.54)
3.3 Cont… Standard Deviation and Variance for Grouped Data Store the midpoint of each group in L1 and the frequency of each group in L2. Then press STAT, choose CALC, and run 1-Var Stats L1, L2. EXAMPLE 3.17 on page 97. HOMEWORK: Do Page (64, 66, 68). For 66 find only $s$. For 64 and 68 find $s$ and $s^2$.
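The calculator steps have a rough Python counterpart: one list of midpoints and one list of frequencies. The sketch below uses the common shortcut formula for the sample variance of grouped data, $s^2 = \frac{\sum m^2 f - (\sum mf)^2 / n}{n - 1}$. The class midpoints are hypothetical and this is not the data from Example 3.17. The helper name `grouped_sample_variance` is made up.

```python
import math
import statistics

def grouped_sample_variance(midpoints, freqs):
    """Shortcut formula: (sum(m^2 f) - (sum(m f))^2 / n) / (n - 1)."""
    n = sum(freqs)
    sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))
    return (sum_m2f - sum_mf ** 2 / n) / (n - 1)

# Hypothetical classes 60-69, 70-79, 80-89, 90-99 (assumption for illustration).
midpoints = [64.5, 74.5, 84.5, 94.5]  # plays the role of L1
freqs = [4, 1, 12, 3]                 # plays the role of L2

s_squared = grouped_sample_variance(midpoints, freqs)
s = math.sqrt(s_squared)
print("s^2 =", round(s_squared, 2), " s =", round(s, 2))

# Cross-check: repeat each midpoint f times and use the standard library.
expanded = [m for m, f in zip(midpoints, freqs) for _ in range(f)]
print("check:", round(statistics.variance(expanded), 2))
```

As with the grouped mean, these are approximations, since every value in a class is treated as if it sat at the class midpoint.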