Chapter 4 – Statistics II

In this chapter you have learned about:
Measures of centre (mean, mode and median)
Sampling variability
Measures of variation
Measures of relative standing
The normal distribution and the empirical rule
Distributions
Measures of Centre

Definitions
We will look at three measures of centre: the mean, the mode and the median.
The mean of a set of values is the sum of all the values divided by the number of values.
The mode of a set of values is the value that occurs most often (the value with the greatest frequency).
The median of a set of values is the middle value when the values are arranged in order. If there is an even number of values, the median is the mean of the two middle values.

Deciding Which Average to Use

Mode
When to use: usually for categorical data.
Advantages: easy to find; not affected by extreme values.
Disadvantage: there is not always a mode.

Median
When to use: for numerical data, especially when there are extreme values.
Advantages: easy to calculate; not affected by extreme values.

Mean
When to use: for numerical data with no extreme values.
Advantage: it uses all the data.
Disadvantage: it is affected by extreme values.
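The short Python sketch below shows the three averages side by side on a small data set. The data set is hypothetical and includes an extreme value, so you can see why the mean is affected and the median is not. It uses the standard library's `statistics.mean` and `statistics.median`. The helper `modes` is written for illustration. It assumes that a data set where every value occurs only once has no mode. This matches the "there is not always a mode" point above, but it is only one convention.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return the most frequent value(s).

    Assumed convention (hypothetical helper): if every value occurs only
    once, the data set has no mode, so an empty list is returned.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]


# Hypothetical data set in which 40 is an extreme value.
scores = [3, 4, 4, 5, 6, 40]

print(mean(scores))               # pulled upwards by the extreme value 40
print(median(scores))             # 4.5
print(median([3, 4, 4, 5, 6, 7]))  # still 4.5: the median ignores how large 40 was
print(modes(scores))              # [4]
print(modes([1, 2, 3]))           # []: no mode
```

Changing 40 to 7 leaves the median at 4∙5. The mean, however, drops from about 10∙33 to 4∙83.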
Measures of Variation

Range and Interquartile Range
Definitions
The range is the difference between the maximum value and the minimum value.
Q1: the lower quartile of a ranked set of data is a value such that one-quarter of the values are less than or equal to it.
Q2: the second quartile is the median of the data.
Q3: the upper quartile of a ranked set of data is a value such that three-quarters of the values are less than or equal to it.
The interquartile range = Q3 − Q1.
Outliers are extreme values that are not typical of the other values in a data set.

Standard Deviation
Definition
The standard deviation measures how spread out the values in a set are about their mean. It is a kind of average distance of the values from the mean.
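The next sketch computes the range, the three quartiles and the interquartile range. The chapter does not say how to split the data into quartiles, so the sketch assumes one common convention: Q1 and Q3 are the medians of the lower and upper halves of the ranked data, and the middle value is left out of both halves when there is an odd number of values. Calculators and other methods can give slightly different quartiles. The function names and the data set are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'median of each half' convention.

    Assumption: when n is odd, the middle value is excluded from both halves.
    Needs at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


def spread(values):
    """Range and interquartile range (IQR = Q3 - Q1) of a data set."""
    q1, q2, q3 = quartiles(values)
    return {
        "range": max(values) - min(values),
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical data set of 11 values.
data = [2, 3, 5, 7, 8, 9, 11, 12, 15, 18, 30]
print(spread(data))  # range 28, Q1 = 5, Q2 = 9, Q3 = 15, IQR = 10
```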
Formulae (F and T: p. 33)
Standard deviation of a set of values: $\sigma = \sqrt{\dfrac{\sum (x-\mu)^2}{n}}$
Standard deviation of a frequency table: $\sigma = \sqrt{\dfrac{\sum f(x-\mu)^2}{\sum f}}$
where
σ is the standard deviation
Σ means 'sum of'
x is the variable
μ (also written x̄) is the mean
n is the number of values
f is the frequency

Example
Find (i) the range and (ii) the standard deviation of {1, 5, 9, 14, 21}.
Why is the standard deviation a better measure of spread than the range?

(i) The range = 21 − 1 = 20.
(ii) $\mu = \dfrac{1+5+9+14+21}{5} = 10$
Note: d = x − μ.

x | 1 | 5 | 9 | 14 | 21
d | −9 | −5 | −1 | 4 | 11
d² | 81 | 25 | 1 | 16 | 121
Σd² = 244

$\sigma = \sqrt{\dfrac{\sum d^2}{n}} = \sqrt{\dfrac{244}{5}} \approx 6{\cdot}99$

The standard deviation is a better measure because it uses every value in the set. The range uses only the largest and smallest values.
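The formula σ = √(Σ(x − μ)²/n) can be checked in Python on the example set {1, 5, 9, 14, 21}. The hypothetical function `population_sd` follows the formula step by step. The standard library's `statistics.pstdev` uses the same divide-by-n definition and is shown for comparison. Note that `statistics.stdev` divides by n − 1 instead, so it would not match the formula used here.

```python
from math import sqrt
from statistics import pstdev


def population_sd(values):
    """Standard deviation with divisor n: sqrt(sum((x - mu)^2) / n)."""
    n = len(values)
    mu = sum(values) / n
    squared_deviations = [(x - mu) ** 2 for x in values]  # d^2 for each x
    return sqrt(sum(squared_deviations) / n)


data = [1, 5, 9, 14, 21]
print(round(population_sd(data), 2))  # 6.99
print(round(pstdev(data), 2))         # 6.99, the same definition
```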
Example
One hundred students are given a maths problem to solve. The times taken to solve the problem are as follows:

Time | 10–14 | 14–18 | 18–22 | 22–26 | 26–30
Number of students | 13 | 28 | 26 | 21 | 12

Note: 14–18 means 14 is included and 18 is not.

Using mid-interval values, estimate the mean of the distribution, and hence estimate the standard deviation from the mean. Give your answers to two decimal places.

M.I.V. (x) | f | fx | d = x − μ | d² | fd²
12 | 13 | 156 | −7∙64 | 58∙3696 | 758∙8048
16 | 28 | 448 | −3∙64 | 13∙2496 | 370∙9888
20 | 26 | 520 | 0∙36 | 0∙1296 | 3∙3696
24 | 21 | 504 | 4∙36 | 19∙0096 | 399∙2016
28 | 12 | 336 | 8∙36 | 69∙8896 | 838∙6752
Total | 100 | 1,964 | | | 2,371∙04

Estimated mean: $\mu = \dfrac{\sum fx}{\sum f} = \dfrac{1964}{100} = 19{\cdot}64$
Standard deviation: $\sigma = \sqrt{\dfrac{\sum fd^2}{\sum f}} = \sqrt{\dfrac{2371{\cdot}04}{100}} \approx 4{\cdot}87$
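The grouped-data working above can be sketched in a few lines of Python. The code takes the mid-interval values and frequencies from the example and applies μ = Σfx / Σf and σ = √(Σf(x − μ)² / Σf). The function name `grouped_mean_sd` is hypothetical. As in the worked solution, each mid-interval value stands in for every time in its class, so both results are estimates.

```python
from math import sqrt


def grouped_mean_sd(midpoints, frequencies):
    """Estimate the mean and standard deviation of a grouped frequency table."""
    total_f = sum(frequencies)
    est_mean = sum(f * x for x, f in zip(midpoints, frequencies)) / total_f
    # Sum of f * d^2, where d = x - mean for each mid-interval value x.
    sum_fd2 = sum(f * (x - est_mean) ** 2 for x, f in zip(midpoints, frequencies))
    return est_mean, sqrt(sum_fd2 / total_f)


mivs = [12, 16, 20, 24, 28]     # mid-interval values
freqs = [13, 28, 26, 21, 12]    # number of students in each class
est_mean, est_sd = grouped_mean_sd(mivs, freqs)
print(f"mean = {est_mean:.2f}, sd = {est_sd:.2f}")  # mean = 19.64, sd = 4.87
```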
Measures of Relative Standing

Measures of relative standing are used to compare values within a data set or to compare values from different data sets.

Percentiles
In a class of 30 students, John scored 85% in a test. Twenty-four students scored lower than John. What is John's relative standing in the group for this test?
John's percentile ranking = $\dfrac{24}{30} \times 100 = 80$, so John is at the 80th percentile: 80% of the class scored lower than John in the test.

z-Scores
A z-score gives the number of standard deviations that a value lies above or below the mean: $z = \dfrac{x - \mu}{\sigma}$.

Dublin's maximum February temperatures average 6∙5°C with a standard deviation of 0∙75°C, while in July the mean maximum temperature is 18°C with a standard deviation of 1∙5°C. In which month is it more unusual to have a maximum temperature of 10°C?
February: $z = \dfrac{10 - 6{\cdot}5}{0{\cdot}75} = 4\tfrac{2}{3}$
July: $z = \dfrac{10 - 18}{1{\cdot}5} = -5\tfrac{1}{3}$
It is more unusual to have a maximum temperature of 10°C in July, because $\left|-5\tfrac{1}{3}\right| = 5\tfrac{1}{3} > 4\tfrac{2}{3}$. In July, 10°C lies more standard deviations away from the mean.
Note that both z-scores are unusual values, as they lie outside −2 ≤ z ≤ 2.
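Both measures of relative standing are simple ratios, so they are easy to check in code. This sketch uses the figures from the two examples above. The function names are hypothetical. Months are compared by the size of their z-scores, ignoring the sign, because a value far below the mean is just as unusual as one far above it.

```python
def percentile_rank(number_below, group_size):
    """Percentage of the group that scored lower than the given value."""
    return 100 * number_below / group_size


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(percentile_rank(24, 30))  # 80.0, i.e. the 80th percentile

feb = z_score(10, 6.5, 0.75)
jul = z_score(10, 18, 1.5)
print(round(feb, 2), round(jul, 2))  # 4.67 -5.33

# The larger |z| is the more unusual value.
more_unusual = "February" if abs(feb) > abs(jul) else "July"
print(more_unusual)  # July
```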
The Normal Distribution and the Empirical Rule

The Empirical Rule
In any normal distribution:
Approximately 68% of the population lies within one standard deviation of the mean.
Approximately 95% of the population lies within two standard deviations of the mean.
Approximately 99∙7% of the population lies within three standard deviations of the mean.

[Figure: normal curve marking approximately 68%, 95% and 99∙7% of the distribution within one, two and three standard deviations of the mean]
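The empirical rule percentages can be checked against the normal distribution itself. This sketch uses the standard library's `statistics.NormalDist` (Python 3.8 or later) to find the proportion between μ − kσ and μ + kσ. The hypothetical mean of 50 and standard deviation of 10 are there to show that the result is the same for any normal distribution.

```python
from statistics import NormalDist


def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    # Same proportions for the standard normal and for a hypothetical N(50, 10^2).
    print(k, f"{proportion_within(k):.1%}", f"{proportion_within(k, 50, 10):.1%}")
```

The printed values (about 68∙3%, 95∙4% and 99∙7%) show why the rule uses the word "approximately".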