Measures of Position
● The standard deviation is a measure of dispersion that uses the same units as the data (remember the empirical rule)
● The distance of a data value from the mean, measured in standard deviations, is a useful measurement
● This distance is called the z-score
● If the mean is 20 and the standard deviation is 6
   The value 26 has a z-score of 1.0 (1.0 standard deviation higher than the mean)
   The value 14 has a z-score of –1.0 (1.0 standard deviation lower than the mean)
   The value 17 has a z-score of –0.5 (0.5 standard deviations lower than the mean)
   The value 20 has a z-score of 0.0
● The population z-score is calculated using the population mean and population standard deviation
● The sample z-score is calculated using the sample mean and sample standard deviation
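Written as formulas, the two definitions above look like this. The symbols are the usual textbook notation and are an assumption here, since the slides do not name them: x is the data value, μ and σ are the population mean and standard deviation, and x̄ and s are the sample mean and standard deviation.

```latex
\[
z = \frac{x - \mu}{\sigma} \quad \text{(population)}
\qquad\qquad
z = \frac{x - \bar{x}}{s} \quad \text{(sample)}
\]
```

For the example with mean 20 and standard deviation 6, the value 17 gives z = (17 – 20) / 6 = –0.5.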
● z-scores can be used to compare the relative positions of data values in different samples
   Pat received a grade of 82 on her statistics exam, where the mean grade was 74 and the standard deviation was 12
   Pat received a grade of 72 on her biology exam, where the mean grade was 65 and the standard deviation was 10
   Pat received a grade of 91 on her kayaking exam, where the mean grade was 88 and the standard deviation was 6
   Calculate each z-score and see which class has the highest RELATIVE grade
● Statistics: grade of 82, z-score of (82 – 74) / 12 = 0.67
● Biology: grade of 72, z-score of (72 – 65) / 10 = 0.70
● Kayaking: grade of 91, z-score of (91 – 88) / 6 = 0.50
● Biology was the highest relative grade
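The comparison of Pat's three grades can be checked with a few lines of Python. This is only a sketch: the helper name z_score and the exams dictionary are made up for illustration, and the numbers are the ones given in the example.

```python
def z_score(value, mean, std_dev):
    """Distance of value from the mean, in units of standard deviations."""
    return (value - mean) / std_dev

# (grade, class mean, class standard deviation) for each exam
exams = {
    "Statistics": (82, 74, 12),
    "Biology": (72, 65, 10),
    "Kayaking": (91, 88, 6),
}

z_scores = {course: z_score(*numbers) for course, numbers in exams.items()}

for course, z in z_scores.items():
    print(f"{course}: z = {z:.2f}")

# The course with the largest z-score is the highest relative grade
best = max(z_scores, key=z_scores.get)
print("Highest relative grade:", best)
```

Note that the highest raw grade (kayaking, 91) has the lowest z-score, which is why the comparison is made on z-scores rather than on the grades themselves.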
● The median divides the lower 50% of the data from the upper 50%
● The median is the 50th percentile
● If a number divides the lower 34% of the data from the upper 66%, that number is the 34th percentile
● The quartiles are the 25th, 50th, and 75th percentiles
   Q1 = 25th percentile
   Q2 = 50th percentile = median
   Q3 = 75th percentile
● Quartiles are the most commonly used percentiles
● The 50th percentile and the second quartile Q2 are both other ways of defining the median
● Quartiles divide the data set into four equal parts
● The top quarter are the values between Q3 and the maximum
● The bottom quarter are the values between the minimum and Q1
● The interquartile range (IQR) is the difference between the third and first quartiles
   IQR = Q3 – Q1
● The IQR is a resistant measure of dispersion (extreme values have little effect on it)
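To see what "resistant" means in practice, the sketch below replaces the largest value of a small data set with a much more extreme one and compares the IQR with the standard deviation. The replacement value 200 is invented for illustration. The quartiles come from Python's statistics.quantiles with its default method, which interpolates between data values, so the quartile values themselves may differ slightly from what a calculator reports.

```python
from statistics import quantiles, stdev

def iqr(values):
    """IQR = Q3 - Q1, using statistics.quantiles with its default method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 8, 10, 15, 20]
stretched = data[:-1] + [200]   # hypothetical: replace the maximum with an extreme value

print("IQR:    ", iqr(data), "->", iqr(stretched))
print("Std dev:", round(stdev(data), 2), "->", round(stdev(stretched), 2))
```

The IQR is the same for both lists, while the standard deviation grows considerably once the extreme value is introduced.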
Can we find the quartiles with a calculator?
Data: 1, 2, 3, 4, 5, 6, 8, 10, 15, 20
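The same quartiles can also be found in software. The sketch below assumes the "median of each half" convention that many textbooks and graphing calculators use: Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half, leaving out the middle value when the count is odd. Other conventions exist and can give slightly different answers. The function name quartiles is made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    Assumes at least two data values.
    """
    s = sorted(values)
    half = len(s) // 2               # size of each half; the middle value is skipped when n is odd
    q1 = median(s[:half])            # median of the lower half
    q2 = median(s)                   # overall median
    q3 = median(s[len(s) - half:])   # median of the upper half
    return q1, q2, q3

data = [1, 2, 3, 4, 5, 6, 8, 10, 15, 20]
q1, q2, q3 = quartiles(data)
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
print("IQR =", q3 - q1)
```

For this data set the lower half is 1, 2, 3, 4, 5 and the upper half is 6, 8, 10, 15, 20, so Q1 = 3, Q2 = 5.5, Q3 = 10, and IQR = 7.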
● Extreme observations in the data are referred to as outliers
● Outliers should be investigated
● Outliers could be
   Chance occurrences
   Measurement errors
   Data entry errors
   Sampling errors
● Outliers are not necessarily invalid data
● One way to check for outliers uses the quartiles
● Outliers can be detected as values that are much too high or too low, based on the known spread
● The fences used to identify outliers are
   Lower fence = LF = Q1 – 1.5 × IQR
   Upper fence = UF = Q3 + 1.5 × IQR
● Values less than the lower fence or more than the upper fence could be considered outliers
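● Are there any outliers?
   1, 3, 4, 7, 8, 15, 16, 19, 23, 24, 27, 31, 33, 54
● Calculations (you can use your calculator to find these!)
   Q1 = 7
   Q3 = 27
   IQR = 20
   Lower fence = Q1 – 1.5 × IQR = 7 – 1.5 × 20 = –23
   Upper fence = Q3 + 1.5 × IQR = 27 + 1.5 × 20 = 57
● All values lie between –23 and 57, so there are no outliers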
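The fence check for this data set can be sketched in Python as follows. The quartiles Q1 = 7 and Q3 = 27 are taken from the example rather than recomputed, and the fences use the usual 1.5 × IQR rule (lower fence Q1 – 1.5 × IQR, upper fence Q3 + 1.5 × IQR). The variable names are chosen for illustration.

```python
data = [1, 3, 4, 7, 8, 15, 16, 19, 23, 24, 27, 31, 33, 54]
q1, q3 = 7, 27                      # quartiles as given in the example

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A value is flagged only if it falls outside the fences
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("IQR:", iqr)
print("Fences:", lower_fence, "to", upper_fence)
print("Outliers:", outliers)
```

The largest value, 54, looks far from the rest of the data but is still below the upper fence, so the list of outliers is empty.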
● z-scores
   Measure the distance from the mean in units of standard deviations
   Can compare relative positions in different samples
● Percentiles and quartiles
   Divide the data so that a certain percent is lower and a certain percent is higher
● Outliers
   Extreme values of the variable
   Can be identified using the upper and lower fences