Math 3680 Lecture #2 Mean and Standard Deviation
Mean vs. Median
Example: In a certain class of 13 students, 10 showed up for the first exam, while 3 blew it off. Here are the grades, in order:
(A) Calculate the class median. (i) Include all students. (ii) Ignore the students who slept in.
(B) Calculate the class mean (average). (i) Include all students. (ii) Ignore the students who slept in.
Definition: Sample mean. For a data set of size n, the sample mean is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
Definition: Population mean. For a finite population of size N, the population mean is $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$.
Example: Suppose the student who got a 55 instead got a 15. Would the median change? Would the mean?
Example: Suppose the 98 is replaced by 980. Would the median change? Would the mean? By how much?
Note: The mean is much more sensitive to wild outliers than the median.
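To see the note in action, here is a small sketch in Python. The scores are made up for illustration and are not the class grades from the example; only the swap of 98 for 980 comes from the example above.

```python
from statistics import mean, median

# Made-up exam scores for illustration only (NOT the class grades from the example).
scores = [55, 62, 68, 70, 74, 77, 81, 85, 90, 98]

# Replace the top score 98 with the wild outlier 980.
with_outlier = [980 if s == 98 else s for s in scores]

median_before, median_after = median(scores), median(with_outlier)
mean_before, mean_after = mean(scores), mean(with_outlier)

print("median:", median_before, "->", median_after)
print("mean:  ", mean_before, "->", mean_after)

# The mean shifts by (980 - 98) / n; the median does not move,
# because the middle of the ordered list is unchanged.
print("shift in mean:", mean_after - mean_before)
```

With n scores, changing one value by 882 moves the mean by 882/n, while the median only depends on the middle of the ordered list.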
Exercise: For registered students at universities in the U.S., which is larger: average age or median age? Repeat for the heights of 12-year-olds. Repeat for the weights of 12-year-olds. Repeat for the scores on a college final exam.
Like the median, the mean only captures central behavior and does not contain information about the spread of the data. Physical interpretation of the mean: a “balance.” Physical interpretation of the median: half the area lies on each side.
Median = 110,149
Average = 115,953
We have just explored the ideas of mean (average), median and mode. These measures give succinct numerical summaries of the central tendency of a data set.
Exercise: Two different groups of 10 students are given identical quizzes with the following results. Compute the mean, median, and mode. Group A Group B
Standard Deviation
Definition: Sample Standard Deviation. For a data set of size n, the sample standard deviation is $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$. In words:
1. Square all of the deviations from the average.
2. Sum the squares, then divide by n - 1 (the degrees of freedom).
3. Take the square root of the result of step 2.
Intuition: The standard deviation gives a measure of how “spread out” the data is.
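The three steps can be sketched directly in Python. The data list below is made up for illustration; the last line cross-checks the hand computation against `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math
from statistics import stdev

# Illustrative data, made up for this sketch.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n  # sample mean

# Step 1: square all of the deviations from the average.
squared_devs = [(x - x_bar) ** 2 for x in data]

# Step 2: sum the squares, then divide by n - 1.
variance = sum(squared_devs) / (n - 1)

# Step 3: take the square root.
s = math.sqrt(variance)

print("mean:", x_bar)
print("s by hand:", s)
print("s from statistics.stdev:", stdev(data))
```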
Exercise: For each list below, find $\bar{x}$ and $s$:
(i) 1, 4, 6, 7, 8, 10
(ii) 5, 8, 10, 11, 12, 14
(iii) 3, 12, 18, 21, 24, 30
Example: Each of the following lists has an average of 50. For which one is the SD of the numbers the biggest? Smallest?
0, 20, 40, 50, 60, 80, 100
0, 48, 49, 50, 51, 52, 100
0, 1, 2, 50, 98, 99, 100
Example: For a list of positive numbers, can the SD ever be larger than the average?
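One way to check a guess for the first example is to compute the three SDs directly. This is a sketch using the sample (n - 1) formula; the labels A, B, C are only names chosen here for the three lists, in the order given.

```python
from statistics import mean, stdev

# The three lists from the example; the labels A, B, C are just for this sketch.
lists = {
    "A": [0, 20, 40, 50, 60, 80, 100],
    "B": [0, 48, 49, 50, 51, 52, 100],
    "C": [0, 1, 2, 50, 98, 99, 100],
}

# Sample SD (n - 1 in the denominator) for each list.
sds = {name: stdev(values) for name, values in lists.items()}

for name, values in lists.items():
    print(name, "mean =", mean(values), "SD =", round(sds[name], 2))
```

The list whose entries crowd near 50 has the smallest SD, and the list whose entries sit far out near 0 and 100 has the largest, even though all three share the same average and the same range.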
For large data sets, Microsoft Excel can compute the mean and standard deviation (STDEV uses the n - 1 sample formula):
=AVERAGE(A1:E10)
=STDEV(A1:E10)
1) The SD says how far away numbers on a list are from their average. Most entries on the list will be somewhere around one SD away from the average. Very few will be more than two or three SDs away.
2) Roughly 68% of the values will be within one SD of the average, and 95% will be within two SDs. (This is only a rule of thumb!)
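The rule of thumb can be tried out on simulated data. This sketch assumes roughly bell-shaped data, generated here with `random.gauss`; the center 50, spread 10, and sample size are arbitrary choices, and for skewed or lumpy data the percentages can be quite different.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Simulated bell-shaped data (an assumption of this sketch, not real measurements).
data = [random.gauss(50, 10) for _ in range(10_000)]

avg, sd = mean(data), stdev(data)

# Fraction of values within one and two SDs of the average.
within_1 = sum(abs(x - avg) <= sd for x in data) / len(data)
within_2 = sum(abs(x - avg) <= 2 * sd for x in data) / len(data)

print("within 1 SD:", within_1)
print("within 2 SDs:", within_2)
```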
Example: Estimate the mean of the high temperatures recorded in Denton over the past 30 days. Then estimate the standard deviation.
Definition: Population standard deviation. For a finite population of size N, $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$. This formula should be used on the (rare) occasion that the entire population is known, not a sample.
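Definition: Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Definition: Population variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$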
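The only difference between the sample and population versions is the divisor (n - 1 versus N). A short sketch with Python's standard `statistics` module, on a made-up list, shows both side by side.

```python
from statistics import pstdev, pvariance, stdev, variance

# Illustrative data, made up for this sketch.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)   # divides the sum of squared deviations by N
samp_var = variance(data)   # divides the same sum by n - 1

print("population variance:", pop_var, " population SD:", pstdev(data))
print("sample variance:    ", samp_var, " sample SD:    ", stdev(data))
```

Because n - 1 is smaller than N, the sample versions always come out a little larger than the population versions on the same list; the gap shrinks as the list gets longer.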
Grouped Data
To handle grouped data, we pretend that all members of each class are located at the midpoint (called the mark). Find $\mu$ and $\sigma$ for the age of the population under 50% of the poverty threshold.
Q: Why aren’t we finding $\bar{x}$ and $s$?
Q: Will our answer be exact?
Now compute the mean and standard deviation:
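Definition: Grouped mean: $\mu = \frac{1}{N} \sum_{i=1}^{m} f_i x_i$, where m = number of groups, $x_i$ is the mark of the i-th group, $f_i$ is its frequency, and $N = \sum_{i=1}^{m} f_i$. (For a sample, write $\bar{x}$ and n in place of $\mu$ and N.)
Definition: Grouped population variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{m} f_i (x_i - \mu)^2$
Definition: Grouped sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{m} f_i (x_i - \bar{x})^2$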
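The grouped formulas can be sketched in Python as follows. The class boundaries and frequencies are hypothetical, chosen only to illustrate the computation; they are not the poverty-threshold data from the slide.

```python
import math

# Hypothetical grouped data: (lower bound, upper bound, frequency) for each class.
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]

# Pretend every member of a class sits at the midpoint (the mark).
marks = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]
N = sum(freqs)

# Grouped mean: frequency-weighted average of the marks.
mu = sum(f * x for f, x in zip(freqs, marks)) / N

# Frequency-weighted sum of squared deviations from the mean.
ss = sum(f * (x - mu) ** 2 for f, x in zip(freqs, marks))

pop_var = ss / N           # grouped population variance
sample_var = ss / (N - 1)  # grouped sample variance
sigma = math.sqrt(pop_var)

print("grouped mean:", mu)
print("grouped population SD:", sigma)
print("grouped sample SD:", math.sqrt(sample_var))
```

The result is only as good as the midpoint assumption: if the members of a class are bunched toward one end of the interval, the true mean and SD will differ from the grouped values.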