1
REPRESENTATION OF DATA
2
Histograms
A histogram is a graphical representation of the distribution of data. It consists of adjacent rectangles, each with an area equal to the frequency of the observations in the interval. The height of a rectangle is equal to the frequency density of the interval:
Frequency density = Frequency ÷ Class width
The rectangles of a histogram are drawn so that they touch each other (i.e. with no gaps, unlike a bar chart) to indicate that the original variable is continuous.
3
Example 1: The histogram below shows the speed, in miles per hour, of cars on a motorway.
[Histogram: frequency density (axis marked 1 to 6) against speed in m.p.h. (axis marked 50 to 90)]
Complete the frequency table.
Speed x (m.p.h.)   50-55   55-60   60-65   65-75   75-90
Frequency          12      20      30
a) Estimate the number of cars with a speed of between 70 m.p.h. and 85 m.p.h.
b) Find an estimate of the mean speed of the cars.
4
[Histogram: frequency density against speed (m.p.h.)]
Frequency density = Frequency ÷ Class width, so f.d. × c.w. = frequency.
For 65-75: frequency = 3 × 10 = 30
For 75-90: frequency = 1 × 15 = 15
Speed x (m.p.h.)   50-55   55-60   60-65   65-75   75-90
Frequency          12      20      30      30      15
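The relationship between frequency, class width and frequency density can be checked with a short Python sketch. The function names are illustrative only; the numbers are the heights and class widths used for Example 1.

```python
def frequency_density(frequency, class_width):
    """Height of the histogram rectangle for one class."""
    return frequency / class_width


def frequency_from_density(density, class_width):
    """Area of the histogram rectangle, i.e. the class frequency."""
    return density * class_width


# Heights read off the histogram for the two classes with unknown frequency.
print(frequency_from_density(3, 10))  # class 65-75
print(frequency_from_density(1, 15))  # class 75-90

# Going the other way for a class whose frequency is given in the table.
print(frequency_density(12, 5))  # class 50-55
```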
5
a) For the number of cars with a speed of between 70 m.p.h. and 85 m.p.h., we want to find the number of cars represented by the shaded region.
[Histogram: frequency density against speed (m.p.h.), with the region from 70 to 85 m.p.h. shaded]
Speed x (m.p.h.)   50-55   55-60   60-65   65-75   75-90
Frequency          12      20      30      30      15
Half of the cars in the 65 – 75 group have a speed of 70 m.p.h. or more: ½ × 30 = 15.
Two thirds of the cars in the 75 – 90 group have a speed of 85 m.p.h. or less: ⅔ × 15 = 10.
So an estimated 15 + 10 = 25 cars have speeds between 70 m.p.h. and 85 m.p.h.
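The same estimate can be sketched in Python by taking the share of each rectangle's area that lies in the range of interest. This is a minimal sketch that assumes speeds are spread evenly within each class; the function name and the tuple layout are illustrative choices.

```python
def estimate_count(classes, low, high):
    """Estimate how many observations fall between low and high.

    classes is a list of (lower, upper, frequency) tuples. Values are
    assumed to be spread evenly within each class.
    """
    total = 0.0
    for lower, upper, frequency in classes:
        # Length of the part of this class that lies inside [low, high].
        overlap = min(upper, high) - max(lower, low)
        if overlap > 0:
            total += frequency * overlap / (upper - lower)
    return total


# Completed frequency table from Example 1.
speeds = [(50, 55, 12), (55, 60, 20), (60, 65, 30), (65, 75, 30), (75, 90, 15)]
print(estimate_count(speeds, 70, 85))
```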
6
b) For the mean, the mid-points of x are needed.
Speed      Mid-point x   f     fx
50 – 55    52.5          12    630
55 – 60    57.5          20    1150
60 – 65    62.5          30    1875
65 – 75    70            30    2100
75 – 90    82.5          15    1237.5
Totals:                  107   6992.5
Mean = 6992.5 ÷ 107 = 65.35
The mean speed of the cars is 65.4 m.p.h. (3 sig. figs)
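As a check on the table, the grouped mean can be computed in Python from the class boundaries and frequencies. This is a sketch only: the function name is illustrative, and each class is represented by its mid-point, as in the working above.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper, frequency) tuples."""
    total_f = sum(f for _, _, f in classes)
    # Each class contributes frequency × mid-point to the sum of fx.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


# Completed frequency table from Example 1.
speeds = [(50, 55, 12), (55, 60, 20), (60, 65, 30), (65, 75, 30), (75, 90, 15)]
mean_speed = grouped_mean(speeds)
print(round(mean_speed, 1))
```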
7
Example 2: In a fitness centre survey, a random sample of 100 men were asked how many hours, to the nearest hour, they spent jogging in the last week. The results are summarised below.
Number of hours   Frequency
0 – 2             17
3 – 5             24
6 – 10            29
11 – 15           30
A histogram was drawn and the group (3 – 5) hours was represented by a rectangle that was 1.5 cm wide and 12 cm high. Calculate the width and height of the rectangle representing the group (11 – 15) hours.
The height of each rectangle is proportional to the frequency density.
Frequency density = Frequency ÷ Class width
8
Number of hours   Boundaries     Frequency   Frequency density
0 – 2                            17
3 – 5             2.5 – 5.5      24          8
6 – 10                           29
11 – 15           10.5 – 15.5    30          6
[Diagram: the rectangle from 2.5 to 5.5 with f.d. 8 is 1.5 cm wide and 12 cm high; the rectangle from 10.5 to 15.5 with f.d. 6 has width w and height h]
For the (3 – 5) group, the class width of 3 is represented by 1.5 cm, so each hour is represented by 0.5 cm.
For the (11 – 15) group, the class width of 5 is represented by 2.5 cm.
For the (3 – 5) group, the f.d. of 8 is represented by 12 cm, so each unit of f.d. is represented by 1.5 cm.
For the (11 – 15) group, the f.d. of 6 is represented by 9 cm.
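The scaling argument can be sketched in Python: work out the two drawing scales from the known rectangle, then apply them to the other class. The function and variable names are illustrative; the numbers are those of Example 2.

```python
def rectangle_size(frequency, class_width, cm_per_hour, cm_per_density):
    """Return (width, height) in cm of the rectangle for one class."""
    density = frequency / class_width
    return class_width * cm_per_hour, density * cm_per_density


# Reference rectangle: group 3-5 (boundaries 2.5 to 5.5), frequency 24,
# drawn 1.5 cm wide and 12 cm high.
ref_width = 5.5 - 2.5
cm_per_hour = 1.5 / ref_width
cm_per_density = 12 / (24 / ref_width)

# Group 11-15 (boundaries 10.5 to 15.5), frequency 30.
width_cm, height_cm = rectangle_size(30, 15.5 - 10.5, cm_per_hour, cm_per_density)
print(width_cm, height_cm)
```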
9
Stem and leaf diagrams
A stem and leaf diagram is a way of displaying numerical data and shows the shape of the data (the distribution).
A simple stem and leaf diagram contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves.
To draw a stem and leaf diagram, the data is sorted in ascending order.
Box Plots
A box plot (or box and whisker diagram) is based on five key values for a set of data: the smallest value, the largest value and the three quartiles (the lower quartile, the median and the upper quartile).
Box plots also show outliers (extreme values).
10
Example 3: In a study of how students use their mobile phones, the usage of a random sample of 19 students was examined for a particular day. The lengths of the calls for the 19 students are shown in the stem and leaf diagram.
[Stem and leaf diagram of call lengths. Key: 1 | 6 means a time of 16 minutes]
a) Find the median and quartiles for these data.
A value that is greater than Q3 + 1.5 × (Q3 – Q1) or smaller than Q1 – 1.5 × (Q3 – Q1) is defined as an outlier.
b) Show that 75 is the only outlier.
c) Draw a box plot for these data.
11
a) For non-grouped data the median is the (n + 1)/2 th value = (19 + 1)/2 th value = 10th value = 37.
This leaves 9 values above and 9 values below the median. The lower quartile and upper quartile are the middle values of these sets,
i.e. Q1 = (9 + 1)/2 th value = 5th value = 26
Q3 = 5th value from the largest value = 44
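The positional method above (take the median, then the middle value of each half) can be sketched in Python. The function name is illustrative and the seven-value sample is made up purely for demonstration; it is not the data from Example 3.

```python
import statistics


def median_and_quartiles(values):
    """Return (Q1, median, Q3), with quartiles as medians of each half.

    For an odd number of values the median itself is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return (
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
    )


# Made-up sample, not the call lengths from the slide.
sample = [14, 3, 8, 12, 5, 13, 7]
print(median_and_quartiles(sample))

# With the positions 1 to 19 as data, the result shows which positions are used.
print(median_and_quartiles(range(1, 20)))
```

For 19 values the second call picks out the 5th, 10th and 15th positions, and the 15th value is the 5th from the largest.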
12
We now have: Q1 = 26, the median Q2 = 37, Q3 = 44.
Q3 + 1.5 × (Q3 – Q1) = 44 + 1.5 × (44 – 26) = 71
Q1 – 1.5 × (Q3 – Q1) = 26 – 1.5 × (44 – 26) = –1
75 is greater than 71, and no other value lies above 71 or below –1. Hence 75 is the only outlier.
We also have: The smallest value is 15. The largest value is 60 (excluding the outlier).
[Box plot: axis "Time taken (minutes)", marked from 10 to 80]
(Note: The line on the box plot here can also be placed at 71.)
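The outlier limits can be reproduced with a few lines of Python. This is a sketch using the quartiles found in Example 3; the function names are illustrative.

```python
def outlier_limits(q1, q3, k=1.5):
    """Return (lower, upper) limits: Q1 - k × IQR and Q3 + k × IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3):
    """True if the value lies outside the limits."""
    lower, upper = outlier_limits(q1, q3)
    return value < lower or value > upper


lower_limit, upper_limit = outlier_limits(26, 44)
print(lower_limit, upper_limit)

# The largest value, the next largest value and the smallest value.
print(is_outlier(75, 26, 44), is_outlier(60, 26, 44), is_outlier(15, 26, 44))
```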
13
Skewness
Skewness is a measure of the asymmetry of a set of data.
A distribution which is symmetrical has zero skewness.
A distribution which has a longer tail on the right is positively skewed: mean > median and Q3 – Q2 > Q2 – Q1.
A distribution which has a longer tail on the left is negatively skewed: mean < median and Q3 – Q2 < Q2 – Q1.
14
In Example 3, we found: Q1 = 26, the median Q2 = 37, Q3 = 44.
The mean is: …… ÷ 19 = 36.3
So the mean < median. The data is negatively skewed.
Also, Q3 – Q2 = 44 – 37 = 7 and Q2 – Q1 = 37 – 26 = 11.
So Q3 – Q2 < Q2 – Q1. The data is negatively skewed.
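The quartile comparison can be written as a small Python check. This is a sketch of the rule stated for skewness; the function name and the returned labels are illustrative.

```python
def quartile_skew(q1, q2, q3):
    """Classify skewness by comparing Q3 - Q2 with Q2 - Q1."""
    upper_gap = q3 - q2
    lower_gap = q2 - q1
    if upper_gap > lower_gap:
        return "positive"
    if upper_gap < lower_gap:
        return "negative"
    return "symmetric"


# Quartiles from Example 3.
print(quartile_skew(26, 37, 44))
```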
15
Measures of Average
There are three main measures of an average or typical value for a set of data:
The mean: the arithmetic average
The median: the middle value
The mode: the most common value
Measures of Spread
There are several ways to measure the spread of a set of data:
The range: the largest value minus the smallest
The interquartile range: the range of the middle half of the data, IQR = Q3 – Q1
We shall also look at the standard deviation and variance later.
16
Again in Example 3, we found: Q1 = 26, the median Q2 = 37, Q3 = 44.
The range = 75 – 15 = 60
The interquartile range = 44 – 26 = 18
The modes are 26 and 33. Because there are two modes, this is known as a bimodal distribution.
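These measures are simple to compute in Python. In this sketch the range and interquartile range use the values quoted for Example 3, while the short list used to show two modes is made up for illustration and is not the full data set. `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Values quoted for Example 3.
smallest, largest = 15, 75
q1, q3 = 26, 44

data_range = largest - smallest
iqr = q3 - q1
print(data_range, iqr)

# Made-up list with two equally common values, to illustrate a bimodal set.
illustrative = [15, 26, 26, 33, 33, 40]
modes = statistics.multimode(illustrative)
print(modes)
```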
17
Summary of key points:
Histograms
A histogram consists of adjacent rectangles, each with an area equal to the frequency of the observations in the interval. The height of a rectangle is equal to the frequency density of the interval.
Frequency density = Frequency ÷ Class width
Stem and leaf diagrams
A simple stem and leaf diagram contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves.
Box Plots
A box plot is based on five key values for a set of data: the smallest value, the largest value and the three quartiles (the lower quartile, the median and the upper quartile).
This PowerPoint produced by R. Collins; updated Feb. 2014