Variability Introduction to Statistics Chapter 4 Jan 22, 2009 Class #4.

1 Variability Introduction to Statistics Chapter 4 Jan 22, 2009 Class #4

2 Describing Variability
Variability describes, as an exact quantitative measure, how spread out or clustered together the scores are. It is usually defined in terms of distance: how far apart the scores are from each other, how far the scores are from the mean, and how representative a single score is of the data set as a whole.

3 Describing Variability: the Range
The range is the simplest and most obvious way of describing variability: Range = highest score - lowest score (using real limits). The range takes into account only the two extreme scores and ignores every value in between. To counter this, the distribution is divided into quarters (quartiles): Q1 = 25%, Q2 = 50%, Q3 = 75%.
The interquartile range is the distance between the first and third quartiles (Q3 - Q1).
The semi-interquartile range is one half of the interquartile range.

4 Interquartile range (IQR)
The most common percentiles are quartiles. Quartiles divide a data set into fourths, or four equal parts. The 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%, so the 1st quartile is equivalent to the 25th percentile. The 2nd quartile divides the bottom 50% of the data from the top 50%, so the 2nd quartile is equivalent to the 50th percentile, which is the median. The 3rd quartile divides the bottom 75% of the data from the top 25%, so the 3rd quartile is equivalent to the 75th percentile.

5 The interquartile range (IQR) is the distance between the 75th percentile and the 25th percentile. The IQR is essentially the range of the middle 50% of the data. Because it uses only the middle 50%, the IQR is not affected by outliers (extreme values).

6 Interquartile range (IQR)
Example: compute the interquartile range for the sorted data 18, 33, 58, 67, 73, 93, 147. With n = 7 scores, the 25th and 75th percentiles fall at positions 0.25 × (7 + 1) = 2 and 0.75 × (7 + 1) = 6, that is, the 2nd and 6th observations (33 and 93). IQR = 93 - 33 = 60.
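
The position rule in this example can be sketched in Python. This is a minimal illustration, not the only way to compute quartiles: the helper name `value_at_position` is hypothetical, and the linear interpolation for fractional positions is an assumption that the slide's example (which lands on whole positions) never exercises.

```python
def value_at_position(sorted_data, p):
    """Value at the p-th percentile using the p * (n + 1) position rule."""
    n = len(sorted_data)
    pos = p * (n + 1)      # 1-based position in the sorted list
    lower = int(pos)       # whole part of the position
    frac = pos - lower     # fractional part (0 when the position is whole)
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    # Assumption: interpolate linearly between neighbours for fractional positions.
    return below + frac * (above - below)


scores = [18, 33, 58, 67, 73, 93, 147]
q1 = value_at_position(scores, 0.25)   # 2nd observation
q3 = value_at_position(scores, 0.75)   # 6th observation
iqr = q3 - q1
semi_iqr = iqr / 2
print(q1, q3, iqr, semi_iqr)
```

Other quartile conventions exist, so software may report slightly different quartiles for the same data.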

7 Describing Variability: Deviation in a Population
A more sophisticated measure of variability is one that shows how scores cluster around the mean. Deviation is the distance of a score from the mean, $X - \mu$; e.g. 11 - 6.35 = 4.65 and 3 - 6.35 = -3.35. A measure representative of the variability of all the scores would be the mean of the deviation scores: add all the deviations and divide by N, $\frac{\sum (X - \mu)}{N}$. However, the deviation scores always add up to zero (the mean serves as the balance point for the scores), so this average is always zero.
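
A short Python check makes the "deviations add up to zero" point concrete. The scores below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
scores = [2, 4, 4, 6, 9]                    # hypothetical scores, mean = 5
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]     # distance of each score from the mean
total_deviation = sum(deviations)           # positive and negative deviations cancel
mean_deviation = total_deviation / len(scores)
print(deviations, total_deviation, mean_deviation)
```

Because the total is zero for any data set, the mean deviation carries no information about spread, which is why the deviations need to be squared (or otherwise made positive) first.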

8 Describing Variability: Variance in a Population
To remove the +/- signs, we square each deviation before finding the average. This is called the variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{106.55}{20} = 5.33$. The numerator is referred to as the sum of squares (SS), because it is the sum of the squared deviations around the mean. SS is a basic component of variability.

9 Variability: Variance in a Population
Let X = [3, 4, 5, 6, 7]; Mean = 5
$(X - \text{Mean}) = [-2, -1, 0, 1, 2]$ (subtract the mean from each number in X)
$(X - \text{Mean})^2 = [4, 1, 0, 1, 4]$ (squared deviations from the mean)
$\sum (X - \text{Mean})^2 = 10$ (sum of squared deviations from the mean, SS)
$\sum (X - \text{Mean})^2 / N = 10/5 = 2$ (average squared deviation from the mean)

10 Variability: Variance in a Population
Let X = [1, 3, 5, 7, 9]; Mean = 5
$(X - \text{Mean}) = [-4, -2, 0, 2, 4]$ (subtract the mean from each number in X)
$(X - \text{Mean})^2 = [16, 4, 0, 4, 16]$ (squared deviations from the mean)
$\sum (X - \text{Mean})^2 = 40$ (sum of squared deviations from the mean, SS)
$\sum (X - \text{Mean})^2 / N = 40/5 = 8$ (average squared deviation from the mean)
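
The same step-by-step calculation can be sketched as a small Python function. The function name `population_variance` is hypothetical; the steps follow the two worked examples on these slides.

```python
def population_variance(scores):
    """Return (SS, variance) for scores treated as a whole population."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # subtract the mean from each score
    squared = [d ** 2 for d in deviations]    # square each deviation
    ss = sum(squared)                         # sum of squares (SS)
    return ss, ss / n                         # variance = SS / N


ss_a, var_a = population_variance([3, 4, 5, 6, 7])
ss_b, var_b = population_variance([1, 3, 5, 7, 9])
print(ss_a, var_a)
print(ss_b, var_b)
```

Both data sets have the same mean, but the second is more spread out, and its variance is correspondingly larger.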

11 Variability: Variance in a Population
The population variance can be calculated as the sum of squares (SS) divided by N: $\sigma^2 = \frac{SS}{N}$.

12 Variability: Variance in a Sample
Variance in a sample: $s^2 = \frac{SS}{n - 1}$, where the denominator n - 1 is the number of scores minus 1 and SS is the sum of squared deviations from the mean. So the sample variance ($s^2$) is approximately the average squared deviation from the mean.

13 Describing Variability: Population and Sample Variance
Population variance is designated by $\sigma^2$: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{SS}{N}$.
Sample variance is designated by $s^2$. Samples tend to be less variable than the populations they are drawn from, so they give biased estimates of population variability unless a correction is made.
Degrees of freedom (df): the number of independent (free to vary) scores. In a sample, the sample mean must be known before the variance can be calculated, so the final score is determined by the earlier scores: df = n - 1.
$s^2 = \frac{\sum (X - M)^2}{n - 1} = \frac{SS}{n - 1} = \frac{106.55}{20 - 1} = 5.61$
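
The difference between dividing by N and dividing by n - 1 can be checked in Python. The first part reuses the slide's figures (SS = 106.55 with 20 scores); the second part uses a small illustrative data set, on the assumption that the standard-library `statistics` module is available (`pvariance` divides by N, `variance` divides by n - 1).

```python
import statistics

# Slide figures: SS = 106.55 from 20 scores.
ss = 106.55
n = 20
population_var = ss / n        # treats the 20 scores as the whole population
sample_var = ss / (n - 1)      # treats them as a sample, df = n - 1
print(round(population_var, 2), round(sample_var, 2))

# Illustrative data: the same two definitions via the statistics module.
data = [1, 3, 5, 7, 9]
print(statistics.pvariance(data))   # SS / N
print(statistics.variance(data))    # SS / (n - 1)
```

The sample formula always gives the larger value, which compensates for the tendency of samples to understate population variability.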

14 Describing Variability: the Standard Deviation
Variance is a measure based on squared distances, so it is not in the original units of the scores. To get around this, we take the square root of the variance, which gives us the standard deviation.
Population ($\sigma$) and sample ($s$) standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$, $s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$
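
As a rough sketch, both standard deviations can be computed directly from the formulas and compared against Python's `statistics` module. The data set is illustrative, and the variable names are not from the slides.

```python
import math
import statistics

data = [1, 3, 5, 7, 9]
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations

sigma = math.sqrt(ss / n)                 # population standard deviation
s = math.sqrt(ss / (n - 1))               # sample standard deviation
print(round(sigma, 2), round(s, 2))

# Cross-check against the standard library.
print(statistics.pstdev(data), statistics.stdev(data))
```

Unlike the variance, both results are in the same units as the original scores.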

15 Variability: Standard Deviation of a Sample
The square root of the variance is called the standard deviation.
Variance: $s^2 = \frac{SS}{n - 1}$
Standard deviation: $s = \sqrt{\frac{SS}{n - 1}}$

16 Variability: Standard Deviation
“The Standard Deviation tells us approximately how far the scores vary from the mean on average.” It is approximately the average deviation of scores from the mean.

17 The Standard Deviation and the Normal Distribution
There are known percentages of scores above or below any given point on a normal curve. About 34% of scores fall between the mean and 1 SD above the mean, and another 34% between the mean and 1 SD below it. An additional 14% of scores fall between 1 and 2 SDs on each side of the mean. Thus, about 96% of all scores are within 2 SDs of the mean (34% + 34% + 14% + 14% = 96%; the unrounded figure is closer to 95%). Note: the 34% and 14% figures can be useful to remember.
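
The 34% and 14% figures can be checked against the standard normal curve. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()                        # standard normal: mean 0, SD 1

mean_to_1sd = z.cdf(1) - z.cdf(0)       # area between the mean and +1 SD
one_to_2sd = z.cdf(2) - z.cdf(1)        # area between +1 SD and +2 SD
within_2sd = z.cdf(2) - z.cdf(-2)       # area within 2 SDs of the mean

print(round(mean_to_1sd, 4), round(one_to_2sd, 4), round(within_2sd, 4))
```

The areas round to 34% and 14%, while the exact area within 2 SDs is a little over 95%; the 96% on the slide comes from adding the rounded figures.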

18 Describing Variability
The standard deviation is the most common measure of variability, but the others can be used. A good measure of variability must be stable and reliable, that is, not greatly affected by little details in the data. Things to consider are extreme scores, multiple sampling from the same population, and open-ended distributions. Both the variance and the SD are related to other statistical techniques.
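
To illustrate the effect of extreme scores, the sketch below compares three measures before and after one score is replaced by an outlier. The base scores are the ones from the IQR example; the value 1000 is a hypothetical extreme score, and `spread_summary` is an illustrative helper. It assumes Python 3.8 or later for `statistics.quantiles`, whose default method uses the same (n + 1) position rule as the IQR example.

```python
import statistics


def spread_summary(scores):
    """Range, IQR and sample SD for a list of scores."""
    ordered = sorted(scores)
    q1, _median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "sd": statistics.stdev(ordered),
    }


original = [18, 33, 58, 67, 73, 93, 147]
with_outlier = [18, 33, 58, 67, 73, 93, 1000]   # hypothetical extreme score

before = spread_summary(original)
after = spread_summary(with_outlier)
print(before)
print(after)
```

In this example the range and the standard deviation both jump when the extreme score is added, while the IQR is unchanged.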

19 SS Computational Formula Note this formula on page 93. In later chapters, we will be using this alternate SS formula.
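
The formula itself did not survive in this transcript. Assuming the slide refers to the standard computational form, $SS = \sum X^2 - \frac{(\sum X)^2}{N}$, a brief Python sketch shows that it agrees with the definitional sum of squared deviations. Both function names are hypothetical.

```python
def ss_definitional(scores):
    """SS as the sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """SS from raw scores: sum of X squared minus (sum of X) squared over N.

    Assumption: this is the standard computational formula; the slide's own
    version is not preserved in the transcript.
    """
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(x * x for x in scores)
    return sum_x_squared - sum_x ** 2 / n


data = [1, 3, 5, 7, 9]
print(ss_definitional(data), ss_computational(data))
```

The computational form avoids calculating each deviation, which is convenient when the mean is not a whole number.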

20 Credits http://www.le.ac.uk/pc/sk219/introtostats1.ppt#259,4,Plotting Data: describing spread of data http://math.usask.ca/~miket/Sullivan_PP/Chapter_3/sec3_4.ppt#24

