D5 Frequency diagrams for continuous data


KS4 Mathematics D5 Frequency diagrams for continuous data

Contents
D5.1 Grouping continuous data
D5.2 Frequency diagrams
D5.3 Frequency polygons
D5.4 Histograms
D5.5 Frequency density

D5.1 Grouping continuous data

Analysing data
Tom is a sixteen-year-old who regularly takes part in downhill cycle races. He records the competitors’ race times on a spreadsheet. His best time is 101.6 seconds. How accurately has he measured this time? Is the data continuous or discrete?

The real data used in this presentation was collected by Tom Pye at a race at Aston Hill, Bucks. Encourage pupils to refer back to previous work on averages, graphs and types of data. Emphasise the size of the data set; guide them towards grouping the data. Remind pupils that the larger the number, the slower the speed.

Analysing data
Here are some race times in seconds from a downhill racing event.

88.4 91.5 92.1 93.3 93.9 94.7 95.0 95.3 95.5 95.6
95.6 96.3 96.5 96.9 97.0 97.0 97.0 97.3 97.4 97.4
97.7 97.8 98.0 98.2 98.2 98.4 98.4 98.5 98.9 99.0
99.1 99.6 99.6 99.8 100.0 100.6 100.6 101.1 101.4 101.4
101.5 101.6 101.6 101.8 101.9 102.1 102.5 102.6 102.7 103.1
103.1 103.1 104.1 105.0 105.2 105.6 105.6 105.7 105.8 105.9

If you wanted to analyze the boys’ performance, what could you do with the data? How easy is the format of the data to analyze at the moment? Can you draw any conclusions?

It is advisable to print out slides with large sets of data for ease of use by pupils. Encourage pupils to refer back to previous work on averages and graphs if appropriate. Emphasise the size of the data set; guide them towards grouping the data. Remind pupils that the larger the number, the slower the speed.

Choosing the right graph
In a piece of GCSE coursework, a student used a spreadsheet program to produce a graph of the race data. This is the graph he printed.

[Graph: one bar per rider; the unlabelled vertical scale is marked from 80.0 to 110.0]

What labels could be added to the axes? What does the graph show? Is it an appropriate graph?

Pupils should realize that each bar represents one data item. Since the data is in ascending order, the bars gradually increase in height. The horizontal axis represents the riders 1 – 60; the vertical axis the race times. The graph should illustrate the problems of using a spreadsheet without full understanding. Suggest that grouping the data might help to see general trends and an overall picture. Ask what the class intervals should be. Point out that when the data is grouped, the times will go on the horizontal axis and frequency on the vertical axis.

Grouping data
A list of results is called a data set. It is often easier to analyze a large data set if we put the data into groups. These are called class intervals. A frequency diagram or histogram can then be drawn. You will need to decide on the size of the class interval so that there are roughly between 5 and 10 class intervals.

What is the best size for the class intervals for the race times data?

On the next page the results are shown again to aid discussion. Ask pupils what the fastest and slowest times are. It is appropriate for the interval to be a multiple of 5 or 10 if possible. The best interval here would be 5 seconds.

Class intervals

88.4 91.5 92.1 93.3 93.9 94.7 95.0 95.3 95.5 95.6
95.6 96.3 96.5 96.9 97.0 97.0 97.0 97.3 97.4 97.4
97.7 97.8 98.0 98.2 98.2 98.4 98.4 98.5 98.9 99.0
99.1 99.6 99.6 99.8 100.0 100.6 100.6 101.1 101.4 101.4
101.5 101.6 101.6 101.8 101.9 102.1 102.5 102.6 102.7 103.1
103.1 103.1 104.1 105.0 105.2 105.6 105.6 105.7 105.8 105.9

The times roughly range from 85 to 110 seconds.
110 – 85 = 25 seconds.
Suppose we decide to use class intervals with a width of 5 seconds.
25 ÷ 5 = 5 class intervals
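
The arithmetic on this slide can be checked with a short Python sketch. The function name `number_of_intervals` is a hypothetical helper, and the bounds 85 and 110 are the rounded-out limits quoted on the slide rather than the exact fastest and slowest times.

```python
import math

def number_of_intervals(lowest, highest, width):
    """Return how many class intervals of the given width are needed
    to cover everything from `lowest` up to `highest`."""
    return math.ceil((highest - lowest) / width)

# Rounded-out limits from the slide: times range roughly from 85 to 110 seconds.
for width in (2.5, 5, 10):
    print(f"width {width}: {number_of_intervals(85, 110, width)} class intervals")
```

Trying a few widths in this way shows which ones give a number of class intervals in the suggested range of roughly 5 to 10.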

Notation for class intervals
How should the class intervals be written down?

Times in seconds    Frequency
85 – 90
90 – 95
95 – 100
100 – 105
105 – 110

What is wrong with this table?

Discuss where 90 and 105 should go. The table is ambiguous. For discrete data it would be possible to edit the table to say 86 – 90, 91 – 95, 96 – 100, 101 – 105 etc. (or 85 – 89, 90 – 94 etc.), but for continuous data this would not work.

Notation for class intervals
Can you explain what the symbols in the middle column mean?

Times in seconds                        Symbols          Frequency
85 – 90 but not including 90            85 ≤ t < 90
90 – 95 but not including 95            90 ≤ t < 95
95 – 100 but not including 100          95 ≤ t < 100
100 – 105 but not including 105         100 ≤ t < 105
105 – 110 but not including 110         105 ≤ t < 110

Discuss where 90, 95 etc. would go in this table. Represent the inequality on a number line. Ask pupils where numbers such as 99.9999999 would go. Note that the data has been rounded off to 1 d.p., so it could be argued that we could write the intervals as 85.0 – 89.9, 90.0 – 94.9 etc., but this would imply that the data is discrete, so this would be incorrect.

Notation for class intervals
85 ≤ t < 90 means “times larger than or equal to 85 seconds and less than 90 seconds”.
Another way to say this is “from 85 up to but not including 90”.

Can you say these in both ways?
1) 90 ≤ t < 95: “times larger than or equal to 90 seconds and less than 95 seconds” or “from 90 up to but not including 95”.
2) 105 ≤ t < 110: “times larger than or equal to 105 seconds and less than 110 seconds” or “from 105 up to but not including 110”.

Pupils could work together in pairs to practise the correct use of vocabulary.

Notation for class intervals This activity involves pupils deciding which class interval a number belongs in. The numbers should be dragged into the right interval.
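
The sorting activity can be mimicked in a few lines of Python. This is only a sketch: the function name `class_interval` and its default arguments (intervals starting at 85 with a width of 5) are assumptions chosen to match the table used in this section.

```python
def class_interval(t, start=85, width=5):
    """Return (lower, upper) for the class interval with lower <= t < upper."""
    index = int((t - start) // width)   # how many whole widths t lies above start
    lower = start + index * width
    return lower, lower + width

# Boundary values belong to the interval that starts at them, not the one that ends there.
for t in (89.9, 90, 99.9999999, 105):
    lower, upper = class_interval(t)
    print(f"{t} belongs in {lower} <= t < {upper}")
```

Values on a boundary, such as 90, are the ones worth testing, because they are exactly the cases the ambiguous "85 – 90, 90 – 95" notation fails to settle.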

Class intervals
Use the data to fill in the table.

88.4 91.5 92.1 93.3 93.9 94.7 95.0 95.3 95.5 95.6
95.6 96.3 96.5 96.9 97.0 97.0 97.0 97.3 97.4 97.4
97.7 97.8 98.0 98.2 98.2 98.4 98.4 98.5 98.9 99.0
99.1 99.6 99.6 99.8 100.0 100.6 100.6 101.1 101.4 101.4
101.5 101.6 101.6 101.8 101.9 102.1 102.5 102.6 102.7 103.1
103.1 103.1 104.1 105.0 105.2 105.6 105.6 105.7 105.8 105.9

Times in seconds    Frequency
85 ≤ t < 90         1
90 ≤ t < 95         5
95 ≤ t < 100        28
100 ≤ t < 105       19
105 ≤ t < 110       7

Pupils would benefit from having a print out of the data to avoid missing out data items. A tally could be used if required. They should check they have 60 items in total in the frequency column.
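
As a check on the tally, the table can be reproduced with a short Python sketch. The data is copied from the slide; the names `times`, `intervals` and `tally` are illustrative choices.

```python
# Race times in seconds, copied from the slide (60 values).
times = [
    88.4, 91.5, 92.1, 93.3, 93.9, 94.7, 95.0, 95.3, 95.5, 95.6,
    95.6, 96.3, 96.5, 96.9, 97.0, 97.0, 97.0, 97.3, 97.4, 97.4,
    97.7, 97.8, 98.0, 98.2, 98.2, 98.4, 98.4, 98.5, 98.9, 99.0,
    99.1, 99.6, 99.6, 99.8, 100.0, 100.6, 100.6, 101.1, 101.4, 101.4,
    101.5, 101.6, 101.6, 101.8, 101.9, 102.1, 102.5, 102.6, 102.7, 103.1,
    103.1, 103.1, 104.1, 105.0, 105.2, 105.6, 105.6, 105.7, 105.8, 105.9,
]

# Each pair (lower, upper) stands for the class interval lower <= t < upper.
intervals = [(85, 90), (90, 95), (95, 100), (100, 105), (105, 110)]

def tally(data, classes):
    """Count how many values fall in each half-open class interval."""
    return [sum(1 for t in data if lower <= t < upper) for lower, upper in classes]

frequencies = tally(times, intervals)
for (lower, upper), f in zip(intervals, frequencies):
    print(f"{lower} <= t < {upper}: {f}")

# The check suggested in the notes: the frequencies should account for every item.
print("total:", sum(frequencies))
```

The condition `lower <= t < upper` mirrors the inequality notation directly, so a time of exactly 95.0 is counted in 95 ≤ t < 100 and nowhere else.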

D5.2 Frequency diagrams

Frequency diagrams
Frequency diagrams can be used to display grouped continuous data. For example, this frequency diagram shows the distribution of heights for a group of students:

[Frequency diagram: "Heights of students"; horizontal axis Height (cm), marked from 150 to 185; vertical axis Frequency, marked from 5 to 35]

This type of frequency diagram is often called a histogram.

Stress that the difference between this graph and a bar graph is that the bars are touching. Bar graphs can only be used to display qualitative data or discrete numerical data, whereas histograms are used to show continuous data. Strictly speaking, for a histogram we plot frequency density rather than frequency along the vertical axis. However, this does not make any difference when the class intervals are equal, as in this example.

Drawing frequency diagrams
When drawing a frequency diagram for grouped continuous data, remember the following points:
The time intervals go on the horizontal axis.
The frequencies go on the vertical axis.
The highest and lowest times in each interval go at either end of the bar (for example, 85 and 90 at the two ends of the bar for 85 ≤ t < 90).
The bars must be joined together, to indicate that the data is continuous.

Frequency diagram of cycling data
The cycling data we looked at earlier can be displayed in the following frequency diagram:

[Frequency diagram: horizontal axis Times in seconds, marked from 80 to 105; vertical axis Frequency, marked from 5 to 30]

What conclusions can you draw from the graph?

Changing the class interval
When the class intervals are changed the same data produces the following graph:

[Frequency diagram: horizontal axis Times in seconds, marked from 85 to 107.5 in steps of 2.5; vertical axis Frequency, marked from 5 to 20]

What size class intervals have been used? What additional information is available from this graph? Which graph is more useful?

The more class intervals there are, the more information you gain about the breakdown of the race times. However, this can disguise general trends and be too complex for immediate analysis. Discuss different purposes the graph might be used for, such as sponsors targeting certain racers or planning strategies to enable racers to improve; evaluating different bicycles, saddles, tyres etc.; comparing age groups.
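
The effect of changing the class width can be explored with the sketch below, which groups the same race times with widths of 5 and 2.5 seconds. The helper name `grouped_frequencies` is hypothetical; it uses the standard-library `bisect_right` function to find the interval for each time.

```python
from bisect import bisect_right

# Race times in seconds, copied from the earlier slide (60 values).
times = [
    88.4, 91.5, 92.1, 93.3, 93.9, 94.7, 95.0, 95.3, 95.5, 95.6,
    95.6, 96.3, 96.5, 96.9, 97.0, 97.0, 97.0, 97.3, 97.4, 97.4,
    97.7, 97.8, 98.0, 98.2, 98.2, 98.4, 98.4, 98.5, 98.9, 99.0,
    99.1, 99.6, 99.6, 99.8, 100.0, 100.6, 100.6, 101.1, 101.4, 101.4,
    101.5, 101.6, 101.6, 101.8, 101.9, 102.1, 102.5, 102.6, 102.7, 103.1,
    103.1, 103.1, 104.1, 105.0, 105.2, 105.6, 105.6, 105.7, 105.8, 105.9,
]

def grouped_frequencies(data, start, width, count):
    """Return the frequencies for `count` equal class intervals of the
    given `width`, the first of which begins at `start`."""
    edges = [start + i * width for i in range(count + 1)]
    frequencies = [0] * count
    for t in data:
        # bisect_right places a value equal to an edge in the interval that starts there.
        index = bisect_right(edges, t) - 1
        if 0 <= index < count:
            frequencies[index] += 1
    return frequencies

wide = grouped_frequencies(times, 85, 5, 5)       # 5-second intervals
narrow = grouped_frequencies(times, 85, 2.5, 10)  # 2.5-second intervals
print("width 5:  ", wide)
print("width 2.5:", narrow)
```

Each pair of neighbouring 2.5-second frequencies adds up to one of the 5-second frequencies, which is one way to see that the narrower grouping carries extra detail about the same data.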

D5.3 Frequency polygons

Midpoints
As well as a frequency diagram, it might also be appropriate to construct a frequency polygon. This plots the midpoints of each bar and joins them together.

What are the midpoints of each class interval for the race times data?

To find the midpoint of two numbers, add them together and divide by 2.

Times in seconds    Midpoint
85 ≤ t < 90         87.5
90 ≤ t < 95         92.5
95 ≤ t < 100        97.5
100 ≤ t < 105       102.5
105 ≤ t < 110       107.5

Pupils may recall the midpoint of grouped data from previous work on estimating the mean from grouped data.
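
The midpoint rule (add the two ends together and divide by 2) can be written as a one-line Python function. The name `midpoint` is an illustrative choice.

```python
def midpoint(lower, upper):
    """Midpoint of a class interval: add the two ends together and divide by 2."""
    return (lower + upper) / 2

intervals = [(85, 90), (90, 95), (95, 100), (100, 105), (105, 110)]
midpoints = [midpoint(lower, upper) for lower, upper in intervals]
print(midpoints)
```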

Midpoints
In this activity, pupils should add the two numbers together and divide by 2 to find the midpoint in preparation for drawing a frequency polygon.

Line graph of midpoints
If we join together the midpoints for each bar the following graph is produced:

[Frequency diagram with the midpoints of the bars joined by straight lines: horizontal axis Times in seconds, marked from 75 to 110; vertical axis Frequency, marked from 5 to 30]

This graph is an illustration of how to produce a frequency polygon from a histogram or frequency diagram. Notice that the empty class intervals at each end have been included.

Frequency polygon of cycling data
Removing the bars leaves a frequency polygon.

[Two graphs, with and without the bars: horizontal axis Times in seconds, marked from 75 to 110; vertical axis Frequency, marked from 5 to 30]

Discuss with pupils the different information available on the two types of graphs. Often frequency polygon graphs are used to compare two sets of data as the shape of the graph can be more easily analyzed than bars. Note that frequency polygons can be left open at either end, or joined to the horizontal axis like this one.
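
The points of the frequency polygon can be listed with a short Python sketch. The frequencies are those from the 5-second grouping of the race times; closing the polygon at the midpoints of the empty class intervals on either side (82.5 and 112.5) is one reading of "joined to the horizontal axis", stated here as an assumption.

```python
intervals = [(85, 90), (90, 95), (95, 100), (100, 105), (105, 110)]
frequencies = [1, 5, 28, 19, 7]
width = 5

# One point per bar: (midpoint of the class interval, frequency).
points = [((lower + upper) / 2, f) for (lower, upper), f in zip(intervals, frequencies)]

# To join the polygon to the horizontal axis, add a zero-frequency point at the
# midpoint of the empty class interval on each side.
closed = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

for x, y in closed:
    print(f"({x}, {y})")
```

Leaving out the two zero-frequency points gives the open form of the polygon mentioned in the notes.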

Comparing frequency polygons
Here are the race times for two age categories. Juniors are aged from 17 to 18 and seniors are aged from 19 to 30.

[Frequency polygon: Junior category; horizontal axis marked from 85 to 135 in steps of 5; vertical axis marked from 2 to 10]
[Frequency polygon: Senior category; horizontal axis marked from 85 to 135 in steps of 5; vertical axis marked from 5 to 20]

For each category, find
the size of the class intervals
the modal class interval
the range.
Compare the performances in the two categories.

A print out of this page would be useful for pupils to analyze. Discuss how to work backwards from the midpoint to the class intervals. Pupils will need to review the definition of “modal class interval” from previous work. Discuss the difference between mode and modal class interval. Draw pupils’ attention to the fact that since we only have the grouped data frequencies, we cannot know the exact range. We must therefore use the lowest number in the lowest class interval (for example, 95 for the Juniors) and the highest in the highest class interval (135) rather than the midpoints. Encourage pupils to discuss the similarities and differences between the two groups. Provide pupils with model answers that refer to the range and an average. Discuss whether there is too much information with 10 intervals, and compare with the graphs on the next slide.

Comparing frequency polygons
The same data has been used in these graphs.

[Frequency polygons: Junior category and Senior category; horizontal axes marked from 85 to 135 in steps of 10]

For each category, find
the size of the class intervals
the number of class intervals
the modal class interval.
Compare these graphs with the previous ones. Which do you find more useful for analyzing the race times and why?

It would be useful to print this page for pupils.

Comparing sets of data
The range of times for the Junior category is smaller than for the Senior category. This suggests the Seniors are less consistent.
Using the first set of graphs, the modal class interval for the Juniors is 95 ≤ t < 100, whereas the modal class interval for the Seniors is 110 ≤ t < 115.
Using the second set of graphs, the modal class interval for the Juniors is 95 ≤ t < 105, whereas the modal class interval for the Seniors is 105 ≤ t < 115.
This means that on average Juniors are faster than Seniors.

The wider modal class intervals in the second set of graphs illustrate the loss of information when the number of intervals is reduced. Emphasise the importance of using the phrase “on average” in the final sentence. It might be interesting to discuss why younger bikers are faster than older ones!

D5.4 Histograms

Histograms
This frequency diagram represents the race times for the Youth category, which is 14 to 16 year olds.

[Frequency diagram: horizontal axis Time in seconds, marked from 95 to 135 in steps of 5; vertical axis Frequency, marked from 2 to 12]

There are __ times as many people in the 105 ≤ t < 110 interval as there are in the 95 ≤ t < 100 interval. (Answer: 3)
Is the bar three times as big?
How many people are represented by each square on the grid?

Ask pupils how many people are represented by each square (2). This will help introduce the idea that frequency is proportional to area in a histogram.

Histograms
Discuss this statement. Do you agree or disagree?
“If a bar is twice as high as another, the area will be twice as big and so the frequency will be twice the size.”

[Frequency diagram: horizontal axis Time in seconds, marked from 95 to 135 in steps of 5; vertical axis Frequency, marked from 2 to 12]

The area of the bars is proportional to the frequency. Because all the bars are the same width, we only need to look at the height of the bars to work out the frequency.

Combining intervals
Some of the intervals have very small frequencies, which makes any conclusions about them unreliable.

[Frequency diagram: horizontal axis Time in seconds, marked from 95 to 135 in steps of 5; vertical axis Frequency, marked from 2 to 12]

It is sometimes sensible to combine intervals together. Which intervals would you combine?

Encourage pupils to combine the first two intervals and the last three.

Histograms with bars of unequal width
This graph represents the same data as the previous one. What has changed?

[Histogram with the first two and last three bars combined: horizontal axis Time in seconds, marked from 95 to 135 in steps of 5; vertical axis still labelled Frequency, marked from 2 to 12]

The first two intervals both had a frequency of 2. The first bar now represents an interval twice as big. How many people are in this interval? How many people does one square represent? Do the numbers along the vertical axis still represent frequency?

Establish that the first two bars and the last three bars have been combined. There are 4 people represented by the first bar, 2 in each square. The area still represents the frequency, but the height of the bar does not. Discuss the end bar as well. The original frequencies were 2, 3 and 1. Ask what the new frequency should be (6). Again, the area of the bar is 3 squares, which is 6 people, but the height is not.
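
Combining intervals amounts to adding frequencies and widening the interval, as the Python sketch below shows. The frequencies for the outer intervals (2, 2 and 2, 3, 1) are quoted on this slide; the middle three (6, 7 and 11) are taken from the frequency table given later in the presentation. The helper name `merge` is hypothetical.

```python
# Youth category as (lower, upper, frequency), in 5-second class intervals.
original = [
    (95, 100, 2), (100, 105, 2), (105, 110, 6), (110, 115, 7),
    (115, 120, 11), (120, 125, 2), (125, 130, 3), (130, 135, 1),
]

def merge(classes):
    """Merge adjacent class intervals into one, adding their frequencies."""
    return (classes[0][0], classes[-1][1], sum(f for _, _, f in classes))

# Combine the first two intervals and the last three, as suggested in the notes.
combined = [merge(original[:2])] + original[2:5] + [merge(original[5:])]

for lower, upper, f in combined:
    print(f"{lower} <= t < {upper}: frequency {f}, width {upper - lower}")
```

The total number of riders is unchanged by merging; only the widths differ, which is why the bar heights can no longer be read as frequencies.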

Histograms with bars of unequal width
In the original histogram, the frequency was proportional to the area. Is this still true in the new histogram?

[Histogram with bars of unequal width: horizontal axis Time in seconds, marked from 95 to 135 in steps of 5; vertical axis labelled Frequency]

The frequency for 105 ≤ t < 110 is the same as the frequency for ___________. (Answer: 120 ≤ t < 135)
Are the areas of the bars the same?
In a histogram, the frequency is equal to the area of the bar.

The answer to the first question is yes. The other questions on this slide should lead towards this conclusion. These discussion questions should lead towards an appreciation that the vertical axis needs to create a relationship between the frequency and the area. Each square still represents two people.

D5.5 Frequency density

Frequency density
In a histogram, the frequency is given by the area of each bar.

[Histogram bar for 95 ≤ t < 105, representing 4 people; vertical axis labelled frequency density]

It follows that the height of the bar × the width of the bar must be the area. Therefore, the height must equal the area ÷ the width. The area of the bar gives the frequency and so we can write,

Height of the bar = frequency ÷ width of interval

This height is called the frequency density.

The frequency density also represents the constant of proportionality connecting the frequency to the length of the interval. If the width of the interval were 1, then the frequency and the frequency density would be equal, since frequency ÷ 1 = frequency density. Thus, for a given frequency, the wider the interval, the smaller the frequency density.
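
The remark that a wider interval gives a smaller frequency density can be illustrated with a short Python sketch. The frequency of 6 and the widths of 5, 10 and 15 are illustrative values chosen for this example, and `frequency_density` is a hypothetical helper name.

```python
def frequency_density(frequency, width):
    """Height of a histogram bar: frequency divided by the width of the interval."""
    return frequency / width

# The same 6 people spread over wider and wider intervals (illustrative values).
widths = [5, 10, 15]
densities = [frequency_density(6, w) for w in widths]

for w, d in zip(widths, densities):
    # Height x width recovers the frequency, so the area of the bar is the same each time.
    print(f"width {w}: frequency density {d:.1f}, area {d * w:.0f}")
```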

Frequency density
Frequency density = frequency ÷ width of interval

In our example, each square represents 2 people.

[Histogram bar for 95 ≤ t < 105, representing 4 people, drawn to a height of 0.4]

What scale do we need for the vertical axis?
Width of interval = 10
Area = 4
Height = 4 ÷ 10 = 0.4
Frequency density = 0.4

Remind pupils that the area of the bar represents the frequency.

Calculating the frequency
We can use the formula,

Frequency = frequency density × width of interval

to check this scale for the other bars in the graph.

[Histogram: horizontal axis Time in seconds, marked from 95 to 135 in steps of 5; vertical axis Frequency density, marked from 0.4 to 2.4 in steps of 0.4]

Time in seconds    Frequency density × width    Area (frequency)
95 ≤ t < 105       0.4 × 10                     4
105 ≤ t < 110      1.2 × 5                      6
110 ≤ t < 115      1.4 × 5                      7
115 ≤ t < 120      2.2 × 5                      11
120 ≤ t < 135      0.4 × 15                     6

The results obtained should be compared with the original graph.
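
The table can be checked with a short Python sketch that multiplies each frequency density by the width of its interval. The variable names are illustrative; `round` is used because decimal values such as 2.2 are not stored exactly as floating-point numbers.

```python
# (lower, upper, frequency density) read from the histogram.
bars = [
    (95, 105, 0.4),
    (105, 110, 1.2),
    (110, 115, 1.4),
    (115, 120, 2.2),
    (120, 135, 0.4),
]

# Frequency = frequency density x width of interval (the area of the bar).
frequencies = [round(density * (upper - lower)) for lower, upper, density in bars]

for (lower, upper, density), f in zip(bars, frequencies):
    print(f"{lower} <= t < {upper}: {density} x {upper - lower} = {f}")
```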

Calculating the frequency density
Frequency density = frequency ÷ width of interval

Complete the table for this data and draw a histogram.

Time in seconds    Frequency    Frequency ÷ width of interval    Frequency density
95 ≤ t < 100       8            8 ÷ 5                            1.6
100 ≤ t < 105      5            5 ÷ 5                            1.0
105 ≤ t < 115      8            8 ÷ 10                           0.8
115 ≤ t < 130      12           12 ÷ 15                          0.8
130 ≤ t < 150      2            2 ÷ 20                           0.1

Discuss how it is possible for two bars to have the same frequency density. Once the graph has been drawn, ask pupils to write down some questions that could be asked from it, both analytical questions and exam type questions. Ask them how they could use the frequency density and the intervals to work out the frequencies from their graph if they had not been given them.
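
The completed table can be reproduced with the following Python sketch, which divides each frequency by the width of its class interval. The variable names are illustrative.

```python
# (lower, upper, frequency) for each class interval in the table.
table = [
    (95, 100, 8),
    (100, 105, 5),
    (105, 115, 8),
    (115, 130, 12),
    (130, 150, 2),
]

# Frequency density = frequency / width of interval.
densities = [frequency / (upper - lower) for lower, upper, frequency in table]

for (lower, upper, frequency), density in zip(table, densities):
    print(f"{lower} <= t < {upper}: {frequency} / {upper - lower} = {density}")
```

Two bars share the frequency density 0.8 because the interval with the larger frequency (12) is also proportionally wider (15 rather than 10).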

Calculating the frequency density
Your histogram should look like this:

[Histogram: horizontal axis Time in seconds, marked from 95 to 150 in steps of 5; vertical axis Frequency density, marked from 0.2 to 1.6 in steps of 0.2]

Verify that the area of each bar is proportional to the frequency it represents. In this example, each square represents 1 person. However, pupils’ graphs will vary from the one shown depending on the scale they have used.

Calculating the class intervals
This is a histogram of race times from a longer race.

[Histogram: horizontal axis Time in seconds, with only the first value, 100, marked; vertical axis Frequency density, marked from 2 to 8 in steps of 2]

The first bar represents 40 people. The lowest time was 100 seconds. Work out the scale along the bottom and the frequencies for each interval.

Ask pupils to make a copy of this histogram. As the scale along the bottom is calculated on the next slide, pupils can fill it in on their graph.

Calculating the frequency density
We can use the formula,

Frequency = frequency density × width of interval

to complete the following table for the data in the histogram.

Class interval     Length of interval    Frequency density    Frequency
100 ≤ t < 140      40 ÷ 1 = 40           1                    40
140 ≤ t < 180      40                    4                    4 × 40 = 160
180 ≤ t < 260      80                    6                    80 × 6 = 480
260 ≤ t < 280      20                    5                    20 × 5 = 100
280 ≤ t < 300      20                    3                    20 × 3 = 60

Discuss how it is possible for two bars to have the same frequency density. Once the graph has been drawn, ask pupils to write down some questions that could be asked from it, both analytical questions and exam type questions. Ask them how they could use the frequency density and the intervals to work out the frequencies from their graph if they had not been given them.
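
The working in this table can be sketched in Python. The width of the first bar comes from frequency ÷ frequency density (40 ÷ 1); the remaining widths are those listed in the table. The function name `rebuild` is hypothetical.

```python
lowest_time = 100

# The first bar represents 40 people at a frequency density of 1,
# so its width is frequency / frequency density.
first_width = 40 / 1

# (width of interval, frequency density) for each bar, left to right.
bars = [(first_width, 1), (40, 4), (80, 6), (20, 5), (20, 3)]

def rebuild(lowest, bars):
    """Return (lower, upper, frequency) for each bar, working along the
    horizontal axis from the lowest time."""
    rows = []
    lower = lowest
    for width, density in bars:
        rows.append((lower, lower + width, density * width))
        lower += width
    return rows

for lower, upper, frequency in rebuild(lowest_time, bars):
    print(f"{lower:.0f} <= t < {upper:.0f}: frequency {frequency:.0f}")
```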

Review
Write a definition of each word below and then design a mind map outlining the key facts you have learnt.

data set
class interval
midpoint
range
axes
frequency diagram
frequency
frequency polygon
modal class interval
histogram
frequency density

Include methods for calculating and drawing; possible mistakes to avoid …