Working with One-Variable Data
Measures of Central Tendency In statistics, the three most commonly used measures of central tendency are the mean, the median, and the mode. Each measure has particular advantages and disadvantages for a given set of data.
Mean The mean is most commonly referred to as the average. To find the mean, add up all of the numbers in your list and divide by the number of numbers. It works well when the data are fairly close together, and it is the most commonly used measure.
Mean In statistics, it is important to distinguish between the mean of a population and the mean of a sample of that population.
Mean - Population The Greek letter mu, $\mu$, represents a population mean: $\mu = \frac{\sum x}{N}$, where $\sum x$ is the sum of all values of X in the population and $N$ is the number of values in the entire population.
Mean - Sample The symbol $\bar{x}$, read as “x-bar”, represents a sample mean: $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of all values of X in the sample and $n$ is the number of values in the sample.
Median The median is the middle entry in an ordered list. There are as many data points above it as below it. When there is an even number of values, the median is the midpoint between the two middle values.
Mode The mode is the most frequent value in a data set. A data set can have no mode, or more than one mode. The mode is useful when the value itself is the most important information (e.g. shoe size). It is the only choice with categorical data.
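The "no mode" and "more than one mode" cases can be sketched in Python. This is a minimal illustration, not part of the original slides: the helper name `modes` and both data lists are made up, and treating "every value occurs equally often" as "no mode" is one common convention that is assumed here.

```python
from collections import Counter

def modes(data):
    """Return a list of the most frequent values (hypothetical helper).

    Assumed convention: if there are several distinct values and all of
    them occur equally often, the data set is reported as having no mode.
    """
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every value ties, so no value stands out
    return [value for value, c in counts.items() if c == highest]

shoe_sizes = [8, 9, 9, 10, 10, 11]           # made-up numeric data
colours = ["red", "blue", "red", "green"]    # made-up categorical data

print(modes(shoe_sizes))   # [9, 10]  (two modes)
print(modes(colours))      # ['red']  (mode works for categories)
print(modes([1, 2, 3]))    # []       (no mode)
```

The mean and median cannot be computed for the `colours` list, which is why the mode is the only option for categorical data.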
Outliers Values distant from the majority of the data. The median is often a better measure of central tendency than the mean for small data sets that contain outliers. For larger data sets, the effect of outliers on the mean is less significant.
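A small Python sketch can show how an outlier pulls the mean but barely moves the median. The salary figures below are invented for illustration only; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Made-up data (in thousands of dollars), not from the slides.
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [400]      # add one extreme value

print(mean(salaries), median(salaries))          # 44.8 45
print(mean(with_outlier), median(with_outlier))  # 104 46.0

# In a larger data set the same outlier shifts the mean much less.
large = [45] * 99 + [400]
print(mean(large), median(large))                # 48.55 45.0
```

In the small set the single outlier more than doubles the mean, while the median moves from 45 to 46. In the 100-value set the mean moves only from 45 to 48.55.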
Choosing a Measure of Central Tendency If the data contain outliers, use the median. If the data are strongly skewed, use the median. If the data are roughly symmetrical, the mean and the median will be close, so either is appropriate. If the data are not numeric, use the mode.
Example The physics exam had the following results. 71, 82, 55, 76, 66, 71, 90, 84, 90, 64, 71, 70, 83, 45, 73, 51, 68. Determine the mean, median, and mode.
Example - Mean The physics exam had the following results. 71, 82, 55, 76, 66, 71, 90, 84, 90, 64, 71, 70, 83, 45, 73, 51, 68. The sum of the 17 results is 1210, so the mean is $\frac{1210}{17} \approx 71.2$.
Example - Median The physics exam had the following results. 71, 82, 55, 76, 66, 71, 90, 84, 90, 64, 71, 70, 83, 45, 73, 51, 68. Order the data: 45, 51, 55, 64, 66, 68, 70, 71, 71, 71, 73, 76, 82, 83, 84, 90, 90. There are 17 values, so the median is the 9th value. Therefore the median is 71.
Example - Mode The physics exam had the following results. 71, 82, 55, 76, 66, 71, 90, 84, 90, 64, 71, 70, 83, 45, 73, 51, 68. The value 71 occurs three times, more often than any other value. Therefore the mode is 71.
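The three results for the physics exam can be checked with Python's standard `statistics` module. This is only a verification sketch; the list below is the 17 scores that appear in the ordered list of the median example, and the variable name `scores` is chosen here for illustration.

```python
from statistics import mean, median, mode

# The 17 exam results from the example.
scores = [71, 82, 55, 76, 66, 71, 90, 84, 90,
          64, 71, 70, 83, 45, 73, 51, 68]

print(len(scores))             # 17
print(sum(scores))             # 1210
print(round(mean(scores), 1))  # 71.2
print(median(scores))          # 71
print(mode(scores))            # 71
```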
Weighted Mean Sometimes, certain data within a set are more significant than others. A weighted mean gives a measure of central tendency that reflects the relative importance of the data. Weighted means are often used in calculations of indices.
Weighted Mean $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$, where $\sum w_i x_i$ is the sum of the weighted values and $\sum w_i$ is the sum of the various weighting factors.
Weighted Mean - Example The averages (means) of five Data Management classes are 69, 72, 66, 75, and 78. If the class sizes were 26, 33, 25, 35, and 37 respectively, determine the overall average (mean) for the entire grade.
Weighted Mean - Example
Class    Mean, x_i    Class Size (Weight Factor), w_i
1        69           26
2        72           33
3        66           25
4        75           35
5        78           37
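Weighted Mean $\frac{\sum w_i x_i}{\sum w_i} = \frac{(26)(69) + (33)(72) + (25)(66) + (35)(75) + (37)(78)}{26 + 33 + 25 + 35 + 37} = \frac{11331}{156} \approx 72.6$ The average for the entire grade is approximately 72.6%.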
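The class-average calculation can be reproduced with a short Python sketch. The function name `weighted_mean` is a hypothetical helper written for this illustration; the class means and class sizes are the ones given in the example.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

class_means = [69, 72, 66, 75, 78]   # x_i, from the example
class_sizes = [26, 33, 25, 35, 37]   # w_i, from the example

overall = weighted_mean(class_means, class_sizes)
print(round(overall, 1))                      # 72.6

# For comparison: the unweighted mean of the five class averages.
print(sum(class_means) / len(class_means))    # 72.0
```

The unweighted mean of the five averages is 72.0, slightly lower than the weighted result, because the two largest classes also have the highest averages.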
Mean for Group Data Whenever possible, the mean should be calculated using the original data before they are grouped into intervals. If you are presented with data already summarized in a frequency table, an approximation of the centre of the data can still be made.
Mean for Group Data $\bar{x} \approx \frac{\sum f_i m_i}{\sum f_i}$, where $\sum f_i m_i$ is the sum of the interval midpoints times the number of data in the interval and $\sum f_i$ is the sum of all the frequencies.
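The grouped-data approximation can be sketched in Python as follows. The function name `grouped_mean` and the small frequency table are invented for illustration; they are not the TV-viewing data from the example that follows.

```python
def grouped_mean(intervals, frequencies):
    """Approximate the mean as sum(f_i * m_i) / sum(f_i).

    intervals   -- list of (lower, upper) bounds, one pair per interval
    frequencies -- number of data in each interval (f_i)
    """
    if len(intervals) != len(frequencies):
        raise ValueError("need one frequency per interval")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    # m_i is the midpoint of each interval
    midpoints = [(low + high) / 2 for low, high in intervals]
    return sum(f * m for f, m in zip(frequencies, midpoints)) / total

# Hypothetical frequency table (made up for this sketch).
intervals = [(0, 2), (2, 4), (4, 6)]
frequencies = [3, 5, 2]

print(grouped_mean(intervals, frequencies))   # 2.8
```

Here the midpoints are 1, 3, and 5, so the sum of $f_i m_i$ is 3 + 15 + 10 = 28 and the estimate is 28 / 10 = 2.8. The result is only an approximation because every datum in an interval is treated as if it sat at the midpoint.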
Mean for Group Data - Example The following table represents the number of hours per day of watching TV in a sample of 500 people.
Number of hours    Frequency
a) What is the mean number of TV viewing hours in this group?
b) What length of time is most often spent in front of a TV by this group?
c) What is the median number of TV viewing hours?
Interval    Midpoint (m_i)    Frequency (f_i)    Cumulative Frequency    f_i m_i
The interval midpoints used in the f_i m_i column are 0.5, 2.5, 4.5, 6.5, 8.5, 10.5, and 12.5.
1. Find the midpoints and cumulative frequencies for the intervals.
2. Calculate the midpoint times the frequency for each interval.
3. Determine the sum of the frequencies and the sum of f_i m_i.
Mean for Group Data The mean number of viewing hours for this group was approximately five hours.
Mean for Group Data - Example b) What length of time is most often spent in front of a TV by this group? The mode is the answer to this question. From the frequency table, the modal interval is identified by the largest frequency. The most frequent period of time spent in front of a TV by this group is between four and five hours.
Mean for Group Data - Example c) What is the median number of TV viewing hours? The median is the middlemost datum. With 500 data, the median is the average of the 250th and 251st data. Referring to the cumulative frequency column, we notice that the 250th and 251st data occur in the interval 4-5. We would then estimate the median to be 4.5 hours of viewing time.
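The modal interval and the median interval can be located from a frequency table with a short Python sketch. The function names and the frequency table below are hypothetical, made up for illustration, and are not the TV-viewing data. The sketch follows the same approach as the example: find the interval with the largest frequency, then use cumulative frequencies to find the interval holding the middle datum and take its midpoint as the estimate.

```python
from itertools import accumulate

def modal_interval(intervals, frequencies):
    """Interval with the largest frequency (first one if there is a tie)."""
    return intervals[frequencies.index(max(frequencies))]

def median_interval(intervals, frequencies):
    """Interval whose cumulative frequency first reaches the middle position.

    For an even total this is the interval holding the upper of the two
    middle values; if the two middle values fall in different intervals,
    the estimate should be reconsidered by hand.
    """
    total = sum(frequencies)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    middle = (total + 1) / 2          # e.g. 250.5 for 500 data
    for interval, cumulative in zip(intervals, accumulate(frequencies)):
        if cumulative >= middle:
            return interval

# Hypothetical frequency table (made up for this sketch).
intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [4, 7, 10, 3]           # cumulative: 4, 11, 21, 24

low, high = median_interval(intervals, frequencies)
print(modal_interval(intervals, frequencies))   # (20, 30)
print((low, high))                              # (20, 30)
print((low + high) / 2)                         # 25.0, the median estimate
```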
Homework Pg 133 #1,3,5,7,8,9,11