1.4 Defining Data Spread

An average alone doesn't always describe a set of data effectively or completely. It doesn't indicate whether the data clusters, whether the set contains outliers, what the range is, or how the data is spread. In general, it tells us nothing about the set's distribution. The various data distribution plots we have studied help to do that. Investigate the following to discover a way to determine a single number that indicates the spread and variation in a data set.
For the following data, try to determine the average (mean) distance of the values from the mean of the set.

Test Scores

Step 1 – Calculate the mean of the set.
mean = 67.7, approx. 68

Step 2 – Calculate the distance of each value from the mean (mean – value). This is called the deviation from the mean.
Data Value    Deviation (mean – value)
Step 3 – Square each deviation (to remove the negatives).
Data Value    Deviation    Squared Deviation
Step 4 – Find the mean of the squared deviations.
(sum of the squared deviations) / 12 = approx. 299

Step 5 – Find the square root of the result of Step 4 (the mean of the squared deviations).
√299 ≈ 17.3
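The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `population_std_dev` and the sample list `scores` are hypothetical and are not the test scores from this lesson. Because Step 4 divides by the number of values, the result is what statistics texts usually call the population standard deviation.

```python
import math

def population_std_dev(values):
    """Follow Steps 1-5: mean, deviations, squares, mean of squares, square root."""
    n = len(values)
    mean = sum(values) / n                          # Step 1: mean of the set
    deviations = [mean - v for v in values]         # Step 2: deviation = mean - value
    squared = [d ** 2 for d in deviations]          # Step 3: square each deviation
    mean_of_squares = sum(squared) / n              # Step 4: mean of the squared deviations
    return math.sqrt(mean_of_squares)               # Step 5: square root of Step 4

# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std_dev(scores))  # prints 2.0
```

For this made-up list the mean is 5, the squared deviations add up to 32, their mean is 4, and the square root of 4 is 2.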
You just found what is called the standard deviation.

Standard deviation – a number that describes the spread/variation within a set of data. It represents, roughly, the average distance of the data values from the mean of the set.

The greater the standard deviation…
- the more spread/variation
- the farther a randomly chosen data value is likely to be from the mean

The lower the standard deviation…
- the closer a randomly chosen data value is likely to be to the mean
- the more clustering around the mean
- the less variation/spread
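To see how the standard deviation separates clustered data from spread-out data, here is a small comparison sketch. The two lists `clustered` and `spread_out` are hypothetical, built to share the same mean. It uses `pstdev` from Python's standard `statistics` module, which divides by the number of values just as Step 4 does.

```python
from statistics import mean, pstdev

# Two hypothetical data sets with the same mean (50) but different spread.
clustered = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

for name, data in [("clustered", clustered), ("spread_out", spread_out)]:
    # pstdev divides by n (population standard deviation), matching the steps above.
    print(name, "mean =", mean(data), "standard deviation =", round(pstdev(data), 2))
```

Both sets have a mean of 50, so the average alone cannot tell them apart. The standard deviation can: it is about 1.41 for the clustered set and about 28.28 for the spread-out set.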