Measures of Dispersion


Measures of Dispersion Measures of dispersion are descriptive statistics that show how similar or varied the data are for a particular variable (or data item). Measures of spread include the range, quartiles and the interquartile range, variance, standard deviation and coefficient of variation. Measures of dispersion (variability) will provide more information, specifically about the level of spread of the data around the mean, which will make the data more useful for the user.

Summarising the dataset can help us understand the data, especially when the dataset is large.

Why look at dispersion? The mode, median, and mean summarise the data into a single value that is typical or representative of all the values in the dataset. But this is only part of the 'picture' that summarises a dataset. Measures of spread summarise the data in a way that shows how scattered the values are and how much they differ from the mean value. Batsman A has four innings and scores 25, 25, 25, 25. Batsman B has four innings and scores 0, 0, 0, 100. They both average 25, but their scores are very different.

Measures of Dispersion Measures of dispersion are sometimes referred to as variation or spread. The main measures of dispersion are: range, quartile deviation, mean deviation, standard deviation, variance, and coefficient of variation. Write these on the board

Range Measures the difference between the highest and the lowest item of the data. Range = highest observation – lowest observation. While easy to calculate and understand, the range can easily be distorted by extreme values.
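As a minimal sketch of the range, using the two batsmen's scores from the earlier slide (the function name `data_range` is an illustrative choice, not something from the slides):

```python
def data_range(values):
    """Range = highest observation - lowest observation."""
    return max(values) - min(values)

# Scores from the "Why look at dispersion?" slide.
batsman_a = [25, 25, 25, 25]
batsman_b = [0, 0, 0, 100]

print(data_range(batsman_a))  # 0
print(data_range(batsman_b))  # 100
```

Batsman B's range is set entirely by the single score of 100, which is the kind of distortion by an extreme value that the slide warns about.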

Example using Range

Quartiles The quartiles divide the set of measurements into four equal parts. Twenty-five per cent of the measurements are less than the lower quartile. Fifty per cent of the measurements are less than the median. Seventy-five per cent of the measurements are less than the upper quartile. So, fifty per cent of the measurements are between the lower quartile and the upper quartile. The lower quartile, median and upper quartile are often denoted by Q1, Q2 and Q3 respectively. The median is also denoted by m.

Quartiles A quartile is found by dividing the arrayed data into four quarters. There will be three quartiles (not four!). Draw a line on the board and split into quartiles – label Q1 Q2 Q3

To determine the interquartile range deduct Q1 from Q3

Quartile Deviation

Calculating quartiles Let n = the number of observations. Where n/4 is not a whole number, let m = the next whole number larger than n/4. The lower quartile is the mth observation of the sorted data counting from the lower end. The upper quartile is the mth observation of the sorted data counting from the upper end. Write this on board

Calculating quartiles Where n/4 is a whole number, let m = n/4. The lower quartile is halfway between the mth observation and the (m + 1)th observation of the sorted data counting from the lower end. The upper quartile is similarly defined counting from the upper end. Write this on board
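The two quartile rules can be sketched in Python as follows. This is only an illustration of the method described on these slides; other textbooks and software use slightly different quartile conventions. The function name `quartiles` and both data sets are made up for the example.

```python
import math

def quartiles(data):
    """Return (Q1, Q3) using the n/4 rule from the slides."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    if n % 4 != 0:
        # n/4 is not whole: m is the next whole number above n/4.
        m = math.ceil(n / 4)
        q1 = xs[m - 1]   # mth observation from the lower end
        q3 = xs[n - m]   # mth observation from the upper end
    else:
        # n/4 is whole: halfway between the mth and (m + 1)th observations.
        m = n // 4
        q1 = (xs[m - 1] + xs[m]) / 2
        q3 = (xs[n - m] + xs[n - m - 1]) / 2
    return q1, q3

# Hypothetical data: n = 8, so n/4 is a whole number.
q1, q3 = quartiles([1, 3, 5, 7, 9, 11, 13, 15])
print(q1, q3, q3 - q1)  # 4.0 12.0 8.0

# Hypothetical data: n = 10, so n/4 is not a whole number.
q1, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, q3, q3 - q1)  # 6 16 10
```

The last value printed on each line is the interquartile range, Q3 minus Q1.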

Array data across board

The median of an even data set is calculated as the average of the (n/2)th and [(n/2) + 1]th observations. Work out the mean as well for the next segment on Mean Deviation. Here n/4 is a whole number, so let m = n/4: the lower quartile is halfway between the mth observation and the (m + 1)th observation of the sorted data counting from the lower end, and the upper quartile is similarly defined counting from the upper end.

Benefits of the interquartile range By measuring the middle 50% of values only, the interquartile range overcomes the problem of outlying observations. It may be calculated from grouped frequency distributions that contain open-ended class intervals but still ignores 50% of the values in the distribution

Mean Deviation Deviation is the difference between each item of data and the mean. The mean deviation measures the average distance of each observation from the mean of the data. Because it takes into account the actual value of each observation and gives each an equal weight, it is generally more sensitive than either the range or interquartile range: a change in any value will affect it.

Calculate the Mean Deviation 1. Calculate the mean of the data. 2. Subtract the mean from each observation and record the difference. 3. Write down the absolute value of each of the differences (i.e. ignore positive and negative signs). 4. Calculate the mean of the absolute values.

Using Statistical Notation The four steps for mean deviation are written as: 1. Find x̅. 2. For each x, find x - x̅. 3. Now find |x - x̅| for each x. 4. Find Σ|x - x̅| and divide by n.

Example using Mean Deviation The batting scores of two cricketers, Joe and John, were recorded over their 10 completed innings to date. Their scores were:
Joe: 32 27 38 25 20 32 34 28 40 29
John: 3 80 64 5 11 87 0 2 53 0
1. For each cricketer calculate the batting average (mean score) and the mean deviation. 2. There is only one batting position left on the team for the next match. Would you pick Joe or John? Why?

Joe’s batting average x̅ = (32 + 27 + 38 + 25 + 20 + 32 + 34 + 28 + 40 + 29) / 10 = 305 / 10 = 30.5 runs

John’s batting average x̅ = (3 + 80 + 64 + 5 + 11 + 87 + 0 + 2 + 53 + 0) / 10 = 305 / 10 = 30.5 runs

Mean Deviation calculations for Joe (mean = 30.5)
Score (x)   Deviation from mean (x - x̅)   Absolute value of deviation |x - x̅|
32          +1.5                           1.5
27          -3.5                           3.5
38          +7.5                           7.5
25          -5.5                           5.5
20          -10.5                          10.5
32          +1.5                           1.5
34          +3.5                           3.5
28          -2.5                           2.5
40          +9.5                           9.5
29          -1.5                           1.5
Total       Σ(x - x̅) = 0                  Σ|x - x̅| = 47.0

Joe’s mean deviation = Σ|x - x̅| / n = 47.0 / 10 = 4.7

Mean Deviation calculations for John (mean = 30.5)
Score (x)   Deviation from mean (x - x̅)   Absolute value of deviation |x - x̅|
3           -27.5                          27.5
80          +49.5                          49.5
64          +33.5                          33.5
5           -25.5                          25.5
11          -19.5                          19.5
87          +56.5                          56.5
0           -30.5                          30.5
2           -28.5                          28.5
53          +22.5                          22.5
0           -30.5                          30.5
Total       Σ(x - x̅) = 0                  Σ|x - x̅| = 324.0
Mean deviation = 324 / 10 = 32.4

John’s mean deviation = Σ|x - x̅| / n = 324.0 / 10 = 32.4
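The four steps for the mean deviation can be checked with a short Python sketch. The function name `mean_deviation` is an illustrative choice; the scores are Joe's and John's from the example.

```python
def mean_deviation(scores):
    """Return (mean, mean deviation) following the four steps."""
    n = len(scores)
    mean = sum(scores) / n                        # step 1: find the mean
    deviations = [x - mean for x in scores]       # step 2: x minus the mean
    abs_deviations = [abs(d) for d in deviations] # step 3: drop the signs
    return mean, sum(abs_deviations) / n          # step 4: average them

joe = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]
john = [3, 80, 64, 5, 11, 87, 0, 2, 53, 0]

for name, scores in (("Joe", joe), ("John", john)):
    mean, md = mean_deviation(scores)
    print(f"{name}: mean = {mean:.1f}, mean deviation = {md:.1f}")
# Joe: mean = 30.5, mean deviation = 4.7
# John: mean = 30.5, mean deviation = 32.4
```

Both batsmen have the same average, so only the mean deviation separates them.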

Who fills the batting position? It depends on your priorities! If you are looking for a consistent batter, the choice will be Joe, since he has a much smaller mean deviation. While he probably would not make a large score, his past record indicates he can be relied on to make a score fairly close to his average (the mean deviation of his score is less than 5).

If you are looking for a batter who could possibly obtain a large score (and in doing so considerably help to win a match) then John will be the choice. However, there also seems a high risk that he would get a very low score.

Standard Deviation The standard deviation measures the average distance each item of data is from the mean. It differs from the mean deviation in that it squares each deviation, averages the squares and then takes the square root of the result, rather than averaging the absolute values. Standard deviation is the measure of dispersion most commonly used by statisticians.

The aim is basically to find an ‘average’ measure of each observation away from the mean of the set of observations.

Standard deviation formula for Populations can be written as σ = √( Σ(x - μ)² / N ). Talk here about the formula for Samples?

The formula for Population standard deviation can also be written . . _____ Write this on board with ‘population’

Sample Standard Deviation In practice, it is rare to calculate the population standard deviation σ, since populations are usually very large. Instead, it is far more likely that the sample standard deviation (denoted by S) will be required. The formula for calculating S is not the same as simply substituting S for σ and n for N. There are good theoretical reasons for not doing so. Although it would be tempting…the formula

The use of n-1 If we did this, and used the value of S to estimate the value of σ, the result would be too small. To correct this error, instead of dividing by n we divide by (n-1). This results in the following formula for S: S = √( Σ(x - x̅)² / (n - 1) ). Write this on board with ‘sample’

The Formula for Samples What do all these letters stand for?

Standard deviation example A market researcher, Gavin, was interested in the discrepancy in the prices charged by supermarkets for a leading brand of pet food. To check this he selected a random sample of 12 stores and recorded the price displayed for the same 400 gram can. The prices in cents were 89 72 77 78 82 94 80 88 85 73 76 Find a) the mean b) the range of prices c) the mean deviation of prices d) the standard deviation of prices This is from Croucher 5th edition p351

Now use the Financial Calculator to Find the Mean and Standard Deviation… check the question to see if it is a sample or a population. Distribute handout

Important points about the Standard Deviation The standard deviation cannot be negative. The more scattered the data, the greater the standard deviation. The standard deviation of a set of data is zero if, and only if, the observations are of equal value. A rough guide to whether a calculated answer is ‘reasonable’ is for the standard deviation to be approximately 30% of the range.

Note: for this data set, is the standard deviation around 30% of the range? The range is 94 – 72 = 22 and the standard deviation is 6.7; 22 x 0.3 = 6.6. It won’t always be this close. Distribute handout

More important points on the Standard Deviation The standard deviation can never exceed the range of the data. Due to the squaring operation involved in its calculation, the standard deviation is more influenced by extreme values than is the mean deviation, and is usually slightly larger than the mean deviation. The square of the standard deviation is called the variance.

Variance Variance measures the spread (in total) of the data. Variance is equal to the square of the standard deviation, so Variance = (Standard Deviation)²

Example using standard deviation Batsman A has four innings & scores 25, 25, 25, 25 Batsman B scores 0, 0, 0, 100 What are their averages ? What are their Standard Deviations?

Using the calculator Stat Mode 1,1 then 25, xy, 0, ENT, 25,xy, 0, ENT, RCL 4 and RCL 7 will give the calculation for the mean score for each batsman. Both have an average of 25 but Batsman A has a standard deviation of 0 and Batsman B has a Standard Deviation of 43.3.

What is the difference between the Population and a Sample? How can I remember that on my calculator? Sample smaller than the population 5<6 and 8<9? OR “S” for sample

Back to our batsmen …. Batsman A has four innings and scores 25, 25, 25, 25. Batsman B scores 0, 0, 0, 100. What are their Standard Deviations? The answer depends on whether the four scores are a sample (perhaps there were 20 innings and we sampled 4) or the population (they had only batted 4 times, so these are the complete scores). Batsman A has a standard deviation of 0 whether it is a sample or not (RCL 5, RCL 6). Batsman B has a standard deviation of 50 if it was a sample (RCL 8) and 43.3 if it was the population (total data) (RCL 9).

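Long Hand calculation:
Batsman A (every deviation from the mean is 0)
Sample: (0^2 + 0^2 + 0^2 + 0^2) / 3 = 0
Population: (0^2 + 0^2 + 0^2 + 0^2) / 4 = 0
Batsman B (mean = 25)
Innings   Score   Deviation from mean   Squared deviation
1         0       -25                   625
2         0       -25                   625
3         0       -25                   625
4         100     +75                   5625
Total                                   7500
Sample: sum of squared deviations divided by 3 = 2500; now find the square root = 50
Population: sum of squared deviations divided by 4 = 1875; now find the square root = 43.30127
Answers given here for both population and sample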
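The long-hand calculation can be reproduced with a small Python sketch. The helper `std_dev` and its `sample` flag are illustrative names; `statistics.stdev` and `statistics.pstdev` from the Python standard library are used only as a cross-check.

```python
import math
import statistics

def std_dev(values, sample=True):
    """Long-hand standard deviation.

    Divides the sum of squared deviations by n - 1 for a sample
    and by n for a population, then takes the square root.
    """
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(sum_sq / divisor)

batsman_a = [25, 25, 25, 25]
batsman_b = [0, 0, 0, 100]

print(std_dev(batsman_a, sample=True))             # 0.0
print(std_dev(batsman_b, sample=True))             # 50.0
print(round(std_dev(batsman_b, sample=False), 5))  # 43.30127

# Cross-check against the standard library.
print(statistics.stdev(batsman_b))              # sample standard deviation
print(round(statistics.pstdev(batsman_b), 5))   # population standard deviation
```

Squaring the result gives the variance: 2500 for the sample case and 1875 for the population case, matching the table.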

Coefficient of variation This is a measure of relative variability. It is used to measure the changes that have taken place in a population over time, or to compare the variability of two populations that are expressed in different units of measurement. It is expressed as a percentage rather than in terms of the units of the particular data.

Another formula The formula for the coefficient of variation, denoted by V, is: V = (S / x̅) × 100%, that is, 100 multiplied by S and divided by x̅, where x̅ = the mean of the sample and S = the standard deviation of the sample.

This is the Standard Deviation divided by the mean – that is, the ratio of the standard deviation to the mean. The higher the figure, the greater the variability relative to the mean. Back to Batsman B: we would have a Coefficient of variation of 50 / 25 = 2, or 200% as a percentage – quite a significant variation.
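A minimal sketch of the coefficient of variation for Batsman B, assuming the four scores are treated as a sample (so S = 50). The function name `coefficient_of_variation` is an illustrative choice.

```python
import statistics

def coefficient_of_variation(values):
    """V = 100 * S / mean, expressed as a percentage.

    Uses the sample standard deviation (n - 1 divisor).
    Not defined when the mean is zero.
    """
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return 100 * s / mean

batsman_b = [0, 0, 0, 100]
print(coefficient_of_variation(batsman_b))  # 200.0, i.e. 50 / 25 = 2 as a ratio
```

Because the result has no units, it can be compared across data sets measured on different scales.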

Using the calculator for the Standard Deviation – Mode 1,0 , then 10, ENT, 15, ENT………. Then RCL 5 since the question said it was a sample ( not RCL 6) Answer is 4.1231

Using Calc – Mode, 1,0 (2nd f , Alpha,0,0 – to clear just in case) 36, xy, 3, ENT, 37, xy, 3, ENT ………. Then RCL 4 for the mean and RCL 5 for sample deviation = 1.70

Note we will get the calculator to calculate the standard deviation – just demo long hand calculation here – also shouldn’t be asked for the Mean Deviation in a class test.

Suggested Questions from Textbook…… Select a range of questions from the Problems in this chapter – enough so that you feel comfortable with this topic