1
Measures of dispersion
2
Measure of Dispersion
A measure of central tendency gives a single value that represents the whole data set; however, central tendency alone cannot describe the observations fully. A measure of dispersion helps us study the variability of the items. With measures of dispersion we measure both the variation of the items among themselves and their variation around the average.
3
Measure of Dispersion
With a measure of dispersion you can determine how reliable the average is: the smaller the dispersion, the better the average represents the data.
4
Understanding dispersion
Consider the soil carbon study. TC_g.kg: µ = 33.90, median = …; Log(TC_g.kg): µ = 2.73, median = 2.46.
5
Describing dispersion - range
The range is the difference between the largest and the smallest values. It is the simplest measure of dispersion.
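As a minimal sketch, the range can be computed directly from the largest and smallest values. The function name `data_range` and the sample values are hypothetical, chosen only for illustration.

```python
def data_range(values):
    """Return the range: the largest value minus the smallest value."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Hypothetical sample values (illustration only, not from the soil study)
sample = [4, 8, 15, 16, 23, 42]
print(data_range(sample))  # 42 - 4 = 38
```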
6
Describing dispersion - range
TC_g.kg: Min. = 1.327; Max. = …
Log(TC_g.kg): Min. = 0.283; Max. = 6.260
[Summary table (Min., 1st Qu., Median, Mean, 3rd Qu., Max.) for TC_g.kg and Log(TC_g.kg)]
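Only the log-scale range can be computed from the values shown, since the maximum of the raw data is not given: 6.260 − 0.283 = 5.977. The sketch below repeats that arithmetic and, assuming Log(TC_g.kg) denotes the natural logarithm (an assumption, consistent with ln(1.327) ≈ 0.283), checks that the log-scale minimum corresponds to the raw minimum.

```python
import math

# Values reported on the slide for Log(TC_g.kg)
log_min, log_max = 0.283, 6.260
log_range = log_max - log_min
print(round(log_range, 3))  # 5.977

# Assumption: "Log" means the natural logarithm.
# ln(1.327) rounds to 0.283, matching the reported log-scale minimum.
raw_min = 1.327
print(round(math.log(raw_min), 3))  # 0.283
```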
7
Describing dispersion - quartiles
Quartiles are the values that divide the ordered list of numbers into quarters.
First quartile (Q1): about 25% of the values in the data set lie below Q1 and about 75% lie above it.
Third quartile (Q3): about 75% of the values in the data set lie below Q3 and about 25% lie above it.
The difference between the upper and lower quartiles, Q3 − Q1, is called the interquartile range (IQR). Half of it, (Q3 − Q1)/2, is the quartile deviation (semi-interquartile range).
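Software packages compute quartiles with slightly different interpolation rules, so small differences between tools are expected. A minimal sketch using Python's standard `statistics.quantiles` on hypothetical values; `method="inclusive"` is chosen here because, to our understanding, it uses the same linear-interpolation rule as R's default (type 7) and Excel's QUARTILE.INC.

```python
import statistics

# Hypothetical sample values (illustration only)
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# n=4 returns the three cut points: Q1, Q2 (the median) and Q3.
# "inclusive" interpolates linearly between the ordered values,
# treating the minimum and maximum as the 0th and 100th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # 4.0 7.0 10.0
```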
8
Describing dispersion - quartiles
TC_g.kg: Min. = 1.327; Max. = …; Q1 = 7.800; Q3 = …
Log(TC_g.kg): Min. = 0.283; Max. = 6.260; Q1 = 2.054; Q3 = 3.125
The interval from Q1 to Q3 encompasses the central 50% of all the data (the 50% around the median, which, remember, is the midpoint of the data). This central 50% is not influenced by any of the extreme values, so it gives a better estimate of the variability you may expect than the range does.
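The interquartile range for Log(TC_g.kg) follows from the quartiles above: 3.125 − 2.054 = 1.071. The sketch below repeats that arithmetic and then, on hypothetical values, shows why the central 50% resists extreme values: replacing the largest value with an extreme one changes the range a lot but leaves the IQR unchanged. The helper name `iqr` is hypothetical.

```python
import statistics


def iqr(values):
    """Interquartile range Q3 - Q1, using inclusive (type-7 style) quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Quartiles reported on the slide for Log(TC_g.kg)
print(round(3.125 - 2.054, 3))  # 1.071

# Hypothetical data, then the same data with one extreme value
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]
with_outlier = [2, 4, 4, 5, 7, 9, 10, 12, 150]

print(max(data) - min(data), max(with_outlier) - min(with_outlier))  # 13 148
print(iqr(data), iqr(with_outlier))  # 6.0 6.0
```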
9
Quartiles
[Figure: Q1, Q2 and the interval Q1 – Q2 marked on the data]
10
The Variance
The variance is an objective measure of how closely the data cluster around the mean:
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
The denominator, n − 1, is the degrees of freedom. It accounts for the estimate made during the calculation: here $\bar{x}$ is used as an estimate of $\mu$. One issue with the variance is its unit of measurement: if the data are in minutes (min), the variance is in min².
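A minimal sketch of the formula above, computed by hand and compared with Python's standard `statistics.variance`, which also divides by n − 1. The function name `sample_variance` and the data (treated as times in minutes, so the result is in min²) are hypothetical.

```python
import statistics


def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(values) / n  # sample mean, the estimate of mu
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical measurements in minutes (illustration only)
times_min = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(times_min))      # 32/7, about 4.571 min^2
print(statistics.variance(times_min))  # library version, same n - 1 denominator
```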
11
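Standard deviation
$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
Now the answer has the same units as the measurements. For normally distributed data, about 68% of the values lie within 1s of the mean, about 95% within 2s, and about 99.7% within 3s.
[Figure: normal curve with $\bar{x}$ = 0, s = 1]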
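A short sketch with hypothetical data showing that the sample SD is the square root of the sample variance, and, using Python's `statistics.NormalDist`, where the 68%, 95% and 99.7% figures come from for a normal distribution.

```python
import math
import statistics

times_min = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, in minutes

s = statistics.stdev(times_min)  # sample SD (n - 1 denominator), in minutes
print(s, math.sqrt(statistics.variance(times_min)))  # equal up to rounding

# Proportion of a standard normal distribution within k SDs of the mean
z = statistics.NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)
    print(f"within {k}s: {share:.1%}")  # about 68.3%, 95.4%, 99.7%
```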
12
Standard deviation vs Variance
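Both are calculated from the deviations of the data from the mean.
Variance (VAR): $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
Standard deviation (SD): $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$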
13
Standard deviation vs Variance
The variance measures, roughly, the average squared distance of each point from the mean. The SD is simply the square root of the variance. Why does the calculation of the variance use squares?
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
14
Standard deviation vs Variance
The variance uses squares because squaring weights outliers more heavily than data near the mean. Example: 1, 2, 5.
With squares: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
Without squares: $\frac{\sum_{i=1}^{n}(x_i - \bar{x})}{n-1}$
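A small sketch of the weighting effect, assuming (our reading of the slide) that 1, 2 and 5 are the distances of three points from the mean. Squaring turns them into 1, 4 and 25, so the share contributed by the most distant point grows from 5/8 to 25/30.

```python
# Assumption: 1, 2 and 5 are the distances of three points from the mean.
deviations = [1, 2, 5]
largest = max(deviations)

abs_total = sum(abs(d) for d in deviations)  # 1 + 2 + 5 = 8
sq_total = sum(d ** 2 for d in deviations)   # 1 + 4 + 25 = 30

# Share of the total contributed by the most distant point
print(f"unsquared: {largest / abs_total:.1%}")      # 62.5%
print(f"squared:   {largest ** 2 / sq_total:.1%}")  # 83.3%
```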
15
Standard deviation vs Variance
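Squaring also prevents deviations above the mean from cancelling out those below. Without squaring they would always cancel, because the deviations from the mean sum to zero, which would give a "variance" of zero. For example, deviations of 1, 2 and −3 sum to 0.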
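A minimal sketch with hypothetical values 6, 7 and 2, chosen so that their deviations from the mean of 5 are the 1, 2 and −3 above: the signed deviations cancel, while the squared deviations do not.

```python
# Hypothetical values chosen so the deviations match 1, 2 and -3
data = [6, 7, 2]
x_bar = sum(data) / len(data)           # 5.0
deviations = [x - x_bar for x in data]  # [1.0, 2.0, -3.0]

print(sum(deviations))                  # 0.0: signed deviations cancel
print(sum(d ** 2 for d in deviations))  # 14.0: squared deviations do not
```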
16
Standard deviation vs Variance
However, because of this squaring, the variance is no longer in the same unit of measurement as the original data. For data measured in meters:
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$, with units $\frac{\sum (\text{m} - \text{m})^2}{n-1} = \text{m}^2$
17
Standard deviation vs Variance
Taking the square root of the variance restores the SD to the original unit of measurement:
$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$, with units $\sqrt{\frac{\sum (\text{m} - \text{m})^2}{n-1}} = \text{m}$
18
Comparing distributions
[Figure: two distributions, both with $\bar{x}$ = 10; black: s = 1; blue: s = 1.7]
Which distribution has more variability in the data?
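Assuming both curves are normal (an assumption; the slide gives only $\bar{x}$ and s), a quick sketch with Python's `statistics.NormalDist` compares how much of each distribution falls within one unit of the common mean of 10. The larger the SD, the smaller the share close to the mean.

```python
from statistics import NormalDist

# Assumption: both curves are normal with mean 10; only the SD differs
black = NormalDist(mu=10, sigma=1)
blue = NormalDist(mu=10, sigma=1.7)

# Share of values expected between 9 and 11 for each curve
for name, dist in (("black", black), ("blue", blue)):
    share = dist.cdf(11) - dist.cdf(9)
    print(f"{name}: {share:.1%}")
```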
19
Population statistics
Parameter            Sample       Population
Mean                 $\bar{x}$    $\mu$
Variance             $s^2$        $\sigma^2$
Standard deviation   $s$          $\sigma$
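The sample/population distinction maps onto Python's standard `statistics` module: `variance` and `stdev` divide by n − 1 (sample, $s^2$ and $s$), while `pvariance` and `pstdev` divide by n (population, $\sigma^2$ and $\sigma$). A minimal sketch with hypothetical data:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Sample statistics (n - 1 denominator): estimates of the population parameters
print(statistics.variance(values), statistics.stdev(values))    # s^2, s
# Population parameters (n denominator): use when the values ARE the whole population
print(statistics.pvariance(values), statistics.pstdev(values))  # sigma^2, sigma
```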
20
Exercise
1. Provide the range and the 1st and 3rd quartiles for your two data sets.
2. Manually calculate in Excel the standard deviation for your sample of soil values.
3. Using the STDEV formula in Excel, calculate the standard deviations for all the sample sets.
4. How variable are the standard deviations?
5. Which of the data sets comes closest to the population standard deviation (70.27 g kg⁻¹)?