1
We have been looking at:
Univariate data: collecting sets of data and analysing them to see the results
Ungrouped data set: a list of individual results (3, 4, 4, 2, 6, 5, 8, 5)
Grouped data set: data are put into intervals rather than listed as individual results (e.g. 0-10, 11-20)
Analysing measures of central tendency:
- Mean (average)
- Median (middle score when the scores are in order)
- Mode (most common score)
Analysing measures of spread:
- Range
- Interquartile range (IQR)
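As a quick check of these measures, here is a minimal Python sketch applied to the ungrouped data set listed above. It uses only the standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median, multimode

# The ungrouped data set from the slide
scores = [3, 4, 4, 2, 6, 5, 8, 5]

average = mean(scores)                 # sum of the scores divided by how many there are
middle = median(scores)                # middle score once the scores are in order
most_common = multimode(scores)        # every score that occurs most often
spread = max(scores) - min(scores)     # range = highest score - lowest score

print(average)      # 4.625
print(middle)       # 4.5
print(most_common)  # [4, 5]
print(spread)       # 6
```

This data set has two modes (4 and 5 each occur twice), which is why the sketch uses `multimode` rather than `mode`.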
2
13C – box and whisker plots
Range and interquartile range
3
5 point summaries
A 5 point summary is a set of 5 numbers which represent (in order):
- The lowest score (Xmin)
- The lower quartile (Q1 or QL)
- The median
- The upper quartile (Q3 or QU)
- The highest score (Xmax)
4
5 point summary: worked example
From the following five-point summary: 29, 37, 39, 44, 48, find the median, the interquartile range (IQR) and the range.
In a 5 point summary, the 5 numbers are arranged in the order Xmin, Q1, median, Q3, Xmax:
Xmin   Q1   Median   Q3   Xmax
29     37   39       44   48
The median = 39
The IQR = Q3 - Q1 = 44 - 37 = 7
The range = Xmax - Xmin = 48 - 29 = 19
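The same arithmetic can be sketched in Python. The `FivePointSummary` name and its field names are hypothetical, chosen only to mirror the labels used on the slide.

```python
from collections import namedtuple

# Hypothetical container: the field order follows the slide (Xmin, Q1, median, Q3, Xmax)
FivePointSummary = namedtuple("FivePointSummary", "xmin q1 median q3 xmax")

summary = FivePointSummary(29, 37, 39, 44, 48)

iqr = summary.q3 - summary.q1            # interquartile range = Q3 - Q1
data_range = summary.xmax - summary.xmin  # range = Xmax - Xmin

print(summary.median, iqr, data_range)  # 39 7 19
```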
5
Box and whisker plots
A box and whisker plot (aka boxplot) is a diagram which illustrates the 5 points from a 5 point summary:
- The interquartile range is represented by the box
- The median is a vertical line within the box
- The whiskers (the lines running out from each end of the box) represent the range
Note: box and whisker plots are always drawn to scale (see two examples below), and can be vertical or horizontal
6
Extreme values: outliers
Extreme values (outliers) are data scores which appear out of step with the rest of the data: they are either MUCH lower or MUCH higher than the other scores.
These often make the whiskers appear longer than they should and give the appearance that the data are spread over a much greater range than they really are.
If an extreme value or outlier occurs in a set of data, it can be shown by a small cross on the box-and-whisker plot. The whisker is then shortened to the next largest (or smallest) figure.
For example, the box-and-whisker plot below shows that the lowest score was 5. This was an extreme value, as the rest of the scores were located within the range 15 to 42.
What values might be possible outliers in the following data set?
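The slides identify outliers by eye. One common convention, which is an assumption here and is not stated in the slides, flags any score more than 1.5 × IQR below Q1 or above Q3. The sketch below applies that rule to made-up scores invented to resemble the example described above (lowest score 5, the rest between 15 and 42); the function names and data are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cut-offs under the assumed 1.5 x IQR convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(scores, q1, q3, k=1.5):
    """Separate scores into (outliers, kept) using the fences above."""
    low, high = outlier_fences(q1, q3, k)
    outliers = [x for x in scores if x < low or x > high]
    kept = [x for x in scores if low <= x <= high]
    return outliers, kept


# Made-up scores for illustration only; Q1 = 27 and Q3 = 36 are the medians
# of the bottom and top halves of this list.
scores = [5, 15, 27, 28, 29, 30, 32, 35, 36, 38, 42]
outliers, kept = split_outliers(scores, q1=27, q3=36)

print(outliers)              # [5]  -> marked with a cross
print(min(kept), max(kept))  # 15 42 -> where the shortened whiskers end
```

Other cut-off rules exist, so treat the factor of 1.5 as a rule of thumb rather than a definition.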
7
Box and whisker plot: worked example
The following stem-and-leaf plot gives the speed of 25 cars caught by a roadside speed camera 1. Prepare a five-point summary of the data. 2. Draw a box-and-whisker plot of the data. (Identify any extreme values.) 3. Describe the distribution of the data.
8
1. Five point summary
In order to do the 5 point summary, we need to identify the median, Q1 and Q3, and Xmin and Xmax.
Median: the middle score
- How many scores are there? 25
- The position of the median is at (n + 1) ÷ 2 = (25 + 1) ÷ 2 = 26 ÷ 2 = 13
- Therefore the median is the 13th score, which is 89
Q1 and Q3 are the medians of the bottom and top halves of the data
- There are 12 scores in each half of the data
- Q1 will be located at (n + 1) ÷ 2 = (12 + 1) ÷ 2 = 13 ÷ 2 = 6.5
- Therefore Q1 is between the 6th and 7th scores. The 6th score is 84 and the 7th score is 85, therefore Q1 is 84.5
- Q3 will be between the 6th and 7th scores of the top half of the data, which are 94 and 95, therefore Q3 is 94.5
Xmin and Xmax are the lowest and highest scores
- Xmin = 82
- Xmax = 114
Now we can write the 5 point summary: 82, 84.5, 89, 94.5, 114
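The method above (the median first, then the medians of the bottom and top halves, leaving the middle score out of both halves when the number of scores is odd) can be sketched in Python. The function names are hypothetical. Because the stem-and-leaf plot is not reproduced here, the usage example runs on a small made-up data set rather than the 25 car speeds.

```python
from statistics import median


def median_position(n):
    """Position of the median among n ordered scores: (n + 1) / 2."""
    return (n + 1) / 2


def five_point_summary(scores):
    """Return (Xmin, Q1, median, Q3, Xmax); assumes at least two scores."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd number of scores, the middle score belongs to neither half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


print(median_position(25))  # 13.0 -> the 13th score
print(median_position(12))  # 6.5  -> halfway between the 6th and 7th scores

# Made-up data for illustration only (9 scores, so the 5th is the median)
print(five_point_summary([2, 4, 5, 7, 8, 10, 11, 13, 15]))  # (2, 4.5, 8, 12.0, 15)
```

Statistical software often uses slightly different quartile definitions, so its output may not match this hand method exactly.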
9
2. Draw a box and whisker plot of the data
Use the information from the previous question: 82, 84.5, 89, 94.5, 114
Xmin   Q1     Median   Q3     Xmax
82     84.5   89       94.5   114
- Start by drawing a scale (like a ruler). Don't forget to add in the units (km/h)
- Then, underneath the scale, draw a dot at each of the 5 points from above
- Are there any outliers? If so, mark these with a cross and draw a dot at the next lowest/highest score
- Draw a box around the IQR (from Q1 to Q3)
- Draw lines (the whiskers) from the ends of the box out to Xmin and Xmax
- Draw a vertical line at the median
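To see why drawing to scale matters, here is a rough text-only sketch of the plot in Python. The function name and the characters used for the box and whiskers are hypothetical choices, it assumes Xmax is greater than Xmin, and it does not treat outliers separately.

```python
def ascii_boxplot(xmin, q1, median, q3, xmax, width=65):
    """Return a one-line text box plot, drawn to scale across `width` columns."""
    span = xmax - xmin  # assumed to be greater than zero

    def col(value):
        # Map a score to a column so that equal distances get equal widths
        return round((value - xmin) / span * (width - 1))

    line = [" "] * width
    for i in range(col(xmin), col(q1)):
        line[i] = "-"                     # lower whisker
    for i in range(col(q3) + 1, col(xmax) + 1):
        line[i] = "-"                     # upper whisker
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="                     # the box covers the IQR
    line[col(xmin)] = "|"
    line[col(xmax)] = "|"
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(median)] = "M"               # the median sits inside the box
    return "".join(line)


# Five-point summary of the car speeds (km/h) from the worked example
plot = ascii_boxplot(82, 84.5, 89, 94.5, 114)
print(plot)
```

With a width of 65 columns, each km/h takes up two columns, so the long upper whisker is immediately visible.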
10
3. Describe the distribution of the data
Simply write a sentence describing how the data are spread out:
- Is there a big spread before/after the median?
- Are the data clustered around the median (i.e. a small box around the median)?
- Or are the data spread out evenly?
- Are there any outliers?
For this, we could say that: even when the outlier is excluded, the higher values appear to be spread over a much greater range, whereas the lower end of the data is more clustered together.
11
Questions
Exercise 13C, page 447: Questions 1, 3, 5, 6, 7, 8, 9, 10