Measures of Dispersion

Presentation on theme: "Measures of Dispersion"— Presentation transcript:

1 Measures of Dispersion
Measures of dispersion are descriptive statistics that show how similar or varied the data are for a particular variable (or data item). Measures of spread include the range, quartiles and the interquartile range, variance, standard deviation and coefficient of variation. Measures of dispersion (variability) will provide more information, specifically about the level of spread of the data around the mean, which will make the data more useful for the user.

2 Summarising the dataset can help us understand the data, especially when the dataset is large.

3 Why look at dispersion?
The mode, median, and mean summarise the data into a single value that is typical or representative of all the values in the dataset. But this is only part of the 'picture' that summarises a dataset. Measures of spread summarise the data in a way that shows how scattered the values are and how much they differ from the mean value.
Batsman A has four innings and scores 25, 25, 25, 25
Batsman B has four innings and scores 0, 0, 0, 100
They both average 25 but they are very different scores.

4 Measures of Dispersion
Measures of dispersion are sometimes referred to as variation or spread. The main measures of dispersion are:
Range
Quartile deviation
Mean deviation
Standard deviation
Variance
Coefficient of variation
Write these on the board

5 Range
Measures the difference between the highest and the lowest item of the data.
Range = highest observation – lowest observation
While easy to calculate and understand, the range can easily be distorted by extreme values.
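As a minimal sketch of the range formula, the snippet below uses the two batsmen's scores from the earlier slide. The function name `data_range` is an illustrative choice, not something from the slides.

```python
def data_range(values):
    """Range = highest observation - lowest observation."""
    return max(values) - min(values)


batsman_a = [25, 25, 25, 25]
batsman_b = [0, 0, 0, 100]

print(data_range(batsman_a))  # 0
print(data_range(batsman_b))  # 100
```

Batsman B's range of 100 comes entirely from one innings, which illustrates how a single extreme value can dominate this measure.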

6 Example using Range

7 Quartiles
The quartiles divide the set of measurements into four equal parts.
Twenty-five per cent of the measurements are less than the lower quartile.
Fifty per cent of the measurements are less than the median.
Seventy-five per cent of the measurements are less than the upper quartile.
So, fifty per cent of the measurements are between the lower quartile and the upper quartile.
The lower quartile, median and upper quartile are often denoted by Q1, Q2 and Q3 respectively. The median is also denoted by m.

8 Quartiles
A quartile is found by dividing the arrayed data into four quarters. There will be three quartiles (not four!).
Draw a line on the board and split into quartiles – label Q1 Q2 Q3

9 To determine the interquartile range
deduct Q1 from Q3

10 Quartile Deviation

11 Calculating quartiles
Let n = the number of observations.
Where n/4 is not a whole number, let m = the next whole number larger than n/4.
The lower quartile is the mth observation of the sorted data, counting from the lower end.
The upper quartile is the mth observation of the sorted data, counting from the upper end.
Write this on board

12 Calculating quartiles
Where n/4 is a whole number, let m = n/4.
The lower quartile is halfway between the mth observation and the (m + 1)th observation of the sorted data, counting from the lower end.
The upper quartile is similarly defined, counting from the upper end.
Write this on board
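The two counting rules for quartiles can be sketched in code as follows. This follows the rule exactly as stated on these slides (other textbooks and software use slightly different quartile conventions). The function name `quartiles` and the two small data sets are made up for illustration, and a non-empty data set is assumed.

```python
import math


def quartiles(values):
    """Return (Q1, Q3) using the n/4 counting rule from the slides."""
    data = sorted(values)
    n = len(data)
    if n % 4 != 0:
        # n/4 is not a whole number: m is the next whole number above n/4
        m = math.ceil(n / 4)
        q1 = data[m - 1]   # mth observation from the lower end
        q3 = data[n - m]   # mth observation from the upper end
    else:
        # n/4 is a whole number: go halfway between the mth and (m + 1)th
        m = n // 4
        q1 = (data[m - 1] + data[m]) / 2
        q3 = (data[n - m] + data[n - m - 1]) / 2
    return q1, q3


# Hypothetical data, chosen only to exercise both branches
q1, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8])       # n/4 = 2 (whole number)
print(q1, q3, q3 - q1)                             # 2.5 6.5 4.0

q1, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # n/4 = 2.5, so m = 3
print(q1, q3, q3 - q1)                             # 3 8 5
```

The last value printed on each line is the interquartile range, Q3 minus Q1.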

13 Array data across board

14 The median of an even data set is calculated as
the average of the (n/2)th and [(n/2) + 1]th observations of the sorted data.
Work out the mean as well for next segment on Mean Deviation.
n/4 is a whole number, so let m = n/4. The lower quartile is halfway between the mth observation and the (m + 1)th observation of the sorted data, counting from the lower end. The upper quartile is similarly defined, counting from the upper end.

15

16 Benefits of the interquartile range
By measuring the middle 50% of values only, the interquartile range overcomes the problem of outlying observations. It may be calculated from grouped frequency distributions that contain open-ended class intervals, but it still ignores 50% of the values in the distribution.

17 Mean Deviation
A measure that does take into account the actual value of each observation is the Mean Deviation. Deviation is the difference between each item of data and the mean. The mean deviation measures the average distance of each observation away from the mean of the data. Mean deviation gives an equal weight to each observation and is generally more sensitive than either the range or interquartile range, since a change in any value will affect it.

18 Calculate the Mean Deviation
1. Calculate the mean of the data
2. Subtract the mean from each observation and record the difference
3. Write down the absolute value of each of the differences (i.e. ignore positive and negative signs)
4. Calculate the mean of the absolute values

19 Using Statistical Notation
The four steps for mean deviation are written as
1. Find x̅
2. For each x, find x - x̅
3. Now find |x - x̅| for each x
4. Find Σ|x - x̅| and divide by n
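The four steps can be sketched directly in code. The function name `mean_deviation` is an illustrative choice, and the data are the two batsmen's scores from the opening slides.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n                          # step 1: find the mean
    deviations = [x - mean for x in values]         # step 2: x minus the mean
    absolute = [abs(d) for d in deviations]         # step 3: drop the signs
    return sum(absolute) / n                        # step 4: average them


batsman_a = [25, 25, 25, 25]
batsman_b = [0, 0, 0, 100]

print(mean_deviation(batsman_a))  # 0.0
print(mean_deviation(batsman_b))  # 37.5
```

Both batsmen average 25, but Batsman B's scores sit 37.5 runs from that average on a typical innings, while Batsman A's never move from it.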

20

21 Example using Mean Deviation
The batting scores of two cricketers, Joe and John, were recorded over their 10 completed innings to date. Their scores were (table of scores for Joe and John)
1. For each cricketer calculate the batting average (mean score) and the mean deviation
2. There is only one batting position left on the team for the next match. Would you pick Joe or John? Why?

22 Joe’s batting average
x̅ = Σx / n = 305 / 10 = 30.5 runs

23 John’s batting average
x̅ = Σx / n = 305 / 10 = 30.5 runs

24 Mean Deviation calculations for Joe
Score (x)   Deviation from mean (x - x̅)   Absolute value of deviation |x - x̅|
32          +1.5                           1.5
27          -3.5                           3.5
38          +7.5                           7.5
25          -5.5                           5.5
20          -10.5                          10.5
34          +3.5                           3.5
28          -2.5                           2.5
40          +9.5                           9.5
29          -1.5                           1.5
32          +1.5                           1.5
Mean = 30.5   Σ(x - x̅) = 0                 Σ|x - x̅| = 47.0

25 Joe’s mean deviation
Mean deviation (Joe) = Σ|x - x̅| / n = 47.0 / 10 = 4.7

26 Mean Deviation calculations for John
Score (x)   Deviation from mean (x - x̅)   Absolute value of deviation |x - x̅|
3           -27.5                          27.5
80          +49.5                          49.5
64          +33.5                          33.5
5           -25.5                          25.5
11          -19.5                          19.5
87          +56.5                          56.5
0           -30.5                          30.5
2           -28.5                          28.5
53          +22.5                          22.5
0           -30.5                          30.5
Mean = 30.5   Σ(x - x̅) = 0                 Σ|x - x̅| = 324.0
Mean deviation = 324/10 = 32.4

27 John’s mean deviation
Mean deviation (John) = Σ|x - x̅| / n = 324.0 / 10 = 32.4

28 Who fills the batting position?
It depends on your priorities! If you are looking for a consistent batter, the choice will be Joe, since he has a much smaller mean deviation. While he probably would not make a large score, his past record indicates he can be relied on to make a score fairly close to his average (the mean deviation of his score is less than 5).

29 If you are looking for a batter who could possibly obtain a large score (and in doing so considerably help to win a match) then John will be the choice. However, there also seems to be a high risk that he would get a very low score.

30 Standard Deviation
The standard deviation measures the average distance each item of data is from the mean. It differs from the mean deviation in that it squares each deviation, averages the squares and then takes the square root, rather than taking the absolute value of each deviation. Standard deviation is the measure of dispersion most commonly used by statisticians.

31 The aim is basically to find an ‘average’ measure of each observation away from the mean of the set of observations.

32 Standard deviation formula for Populations can be written as
Talk here about the formula for Samples?

33 The formula for Population standard deviation can also be written
Write this on board with ‘population’
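The formulas shown on these two slides are not reproduced in the transcript. As a reference sketch, the standard definition of the population standard deviation and an algebraically equivalent computational form are given below. The symbols are assumptions chosen to match the later slides: x is each observation, μ is the population mean and N is the number of observations in the population. The second form may not be the exact one written on the slide.

```latex
\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}
       = \sqrt{\frac{\sum x^2}{N} - \mu^2}
```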

34 Sample Standard Deviation
In practice, it is rare to calculate the value of σ since populations are usually very large. Instead, it is far more likely that the sample standard deviation (denoted by S) will be required. Although it would be tempting, the formula for calculating S is not the same as simply substituting S for σ and n for N. There are good theoretical reasons for not doing so.

35 The use of n-1
If we did this, and used the value of S to estimate the value of σ, the result would be too small. To correct this error, instead of dividing by n we divide by (n - 1). This results in the following formula for S:
S = √( Σ(x - x̅)² / (n - 1) )
Write this on board with ‘sample’

36 The Formula for Samples
What do all these letters stand for?

37 Standard deviation example
A market researcher, Gavin, was interested in the discrepancy in the prices charged by supermarkets for a leading brand of pet food. To check this he selected a random sample of 12 stores and recorded the price displayed for the same 400 gram can. The prices in cents were 89 72 77 78 82 94 80 88 85 73 76
Find
a) the mean
b) the range of prices
c) the mean deviation of prices
d) the standard deviation of prices
This is from Croucher 5th edition p351

38

39

40 Now use the Financial Calculator to Find the Mean and Standard Deviation… check the question to see if it a sample or a population. Distribute handout

41 Important points about the Standard Deviation
The standard deviation cannot be negative.
The more scattered the data, the greater the standard deviation.
The standard deviation of a set of data is zero if, and only if, the observations are of equal value.
A rough guide to whether a calculated answer is ‘reasonable’ is for the standard deviation to be approximately 30% of the range.

42 Note for this data set is the standard deviation around 30% of the range?
Range is … 94 – 72 = 22
Standard Deviation is … 22 x 0.3 = 6.6 … It won’t always be this close
Distribute handout

43 More important points on the Standard Deviation
The standard deviation can never exceed the range of data.
Due to the squaring operation involved in its calculation, the standard deviation is more influenced by extreme values than is the mean deviation, and is usually slightly larger than the mean deviation.
The square of the standard deviation is called variance.

44 Variance
Variance measures the spread (in total) of the data. Variance is equal to the square of the standard deviation, so
Variance = (Standard Deviation)²

45 Example using standard deviation
Batsman A has four innings & scores 25, 25, 25, 25 Batsman B scores 0, 0, 0, 100 What are their averages ? What are their Standard Deviations?

46 Using the calculator Stat Mode 1,1 then 25, xy, 0, ENT, 25,xy, 0, ENT,
RCL 4 and RCL 7 will give the calculation for the mean score for each batsman. Both have an average of 25 but Batsman A has a standard deviation of 0 and Batsman B has a Standard Deviation of 43.3.

47

48 What is the difference between the Population and a Sample?
How can I remember that on my calculator? Sample smaller than the population 5<6 and 8<9? OR “S” for sample

49 Back to our batsmen …
Batsman A has four innings and scores 25, 25, 25, 25
Batsman B scores 0, 0, 0, 100
What are their Standard Deviations?
The answer depends on whether the four scores are a sample (perhaps there were 20 innings and we sampled 4 innings) or the population (they had only batted 4 times, so these were the complete scores).
Batsman A has a standard deviation of 0 whether it is a sample or not (RCL 5, RCL 6). Batsman B has a Standard Deviation of 50 if it was a sample (RCL 8) and 43.3 if it was the population (total data) (RCL 9).
Long Hand calculation:

50 Long Hand calculation:
Sample for A: (0^2 + 0^2 + 0^2 + 0^2) / 3 = 0
Population for A: (0^2 + 0^2 + 0^2 + 0^2) / 4 = 0
Batsman B
Innings   Score   Dev from mean   Squared
1         0       -25             625
2         0       -25             625
3         0       -25             625
4         100     75              5625
Total                             7500
Sample: sum of squared deviations divided by 3 = 2500. Now find the square root = 50
Population: sum of squared deviations divided by 4 = 1875. Now find the square root = 43.3
Answers given here for both population and sample
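The long-hand calculation can be checked with a short sketch. The helper `std_dev` and its `sample` flag are illustrative names, and the result is cross-checked against `statistics.stdev` and `statistics.pstdev` from the Python standard library, which divide by n - 1 and n respectively.

```python
import math
import statistics


def std_dev(values, sample):
    """Standard deviation; divide by n - 1 for a sample, n for a population."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(sum_of_squares / divisor)


batsman_a = [25, 25, 25, 25]
batsman_b = [0, 0, 0, 100]

print(std_dev(batsman_a, sample=True))             # 0.0
print(std_dev(batsman_b, sample=True))             # 50.0
print(round(std_dev(batsman_b, sample=False), 1))  # 43.3

# Cross-check against the standard library
print(statistics.stdev(batsman_b))                 # 50.0 (sample)
print(round(statistics.pstdev(batsman_b), 1))      # 43.3 (population)
```

The variance is the value inside the square root: 2500 for the sample and 1875 for the population.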

51 Coefficient of variation
This is a measure of relative variability. It is used to measure the changes that have taken place in a population over time, or to compare the variability of two populations that are expressed in different units of measurement. It is expressed as a percentage rather than in terms of the units of the particular data.

52 Another formula
The formula for the coefficient of variation, denoted by V, is:
V = (S / x̅) x 100%
That is, V = 100 multiplied by S and divided by x̅, where
x̅ = the mean of the sample
S = the standard deviation of the sample

53

54

55

56 This is the Standard Deviation divided by the mean, that is, the ratio of the standard deviation to the mean. The higher the figure, the greater the deviation.
Back to Batsman B: we would have a Coefficient of variation of 50 / 25 = 2, or 200%, which is quite a significant variation.
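A minimal sketch of the coefficient of variation for the two batsmen, treating the four scores as a sample so that S is the sample standard deviation. The function name `coefficient_of_variation` is an illustrative choice, and the formula is undefined when the mean is zero.

```python
import statistics


def coefficient_of_variation(values):
    """V = 100 * S / mean, as a percentage, with S the sample standard deviation."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


batsman_a = [25, 25, 25, 25]
batsman_b = [0, 0, 0, 100]

print(coefficient_of_variation(batsman_a))  # 0.0
print(coefficient_of_variation(batsman_b))  # 200.0
```

Because V is a percentage of the mean, it can be compared across data sets measured in different units.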

57 Using the calculator for the Standard Deviation – Mode 1,0 , then 10, ENT, 15, ENT………. Then RCL 5 since the question said it was a sample ( not RCL 6) Answer is

58

59 Using Calc – Mode, 1,0 (2nd f, Alpha, 0, 0 – to clear just in case)
36, xy, 3, ENT, 37, xy, 3, ENT ………. Then RCL 4 for the mean and RCL 5 for sample deviation = 1.70

60 Note we will get the calculator to calculate the standard deviation – just demo long hand calculation here – also shouldn’t be asked for the Mean Deviation in a class test.

61 Suggested Questions from Textbook……
Select a range of questions from the Problems in this chapter – enough so that you feel comfortable with this topic

