Representation of Data

Stem and Leaf Diagrams (4A)

You will have seen stem and leaf diagrams at GCSE. They also appear at A-level, where you will be asked more questions on them.

Data: 20, 9, 17, 12, 28, 31, 22, 24, 17, 25, 24, 24, 26

Unordered:
Stem | Leaf
  0  | 9
  1  | 7 2 7
  2  | 0 8 2 4 5 4 4 6
  3  | 1

Ordered:
Stem | Leaf
  0  | 9
  1  | 2 7 7
  2  | 0 2 4 4 4 5 6 8
  3  | 1

Key: 1 | 7 means 17

The leaf is usually the last digit and the stem is the rest of the number. Make sure the data is in order!
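The ordering rule above can be sketched in Python. This is a minimal illustration, assuming non-negative whole numbers where the stem is the tens part (value // 10) and the leaf is the units digit; `stem_and_leaf` and `print_diagram` are hypothetical helper names, not part of the notes.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Ordered stem and leaf diagram for non-negative integers, as {stem: [leaves]}.

    The leaf is the last digit and the stem is the rest of the number.
    """
    diagram = defaultdict(list)
    for value in sorted(data):            # put the data in order first
        stem, leaf = divmod(value, 10)    # e.g. 17 -> stem 1, leaf 7
        diagram[stem].append(leaf)
    # Include every stem between the smallest and largest, even if it has no leaves.
    return {s: diagram.get(s, []) for s in range(min(diagram), max(diagram) + 1)}


def print_diagram(diagram):
    for stem, leaves in diagram.items():
        print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")


data = [20, 9, 17, 12, 28, 31, 22, 24, 17, 25, 24, 24, 26]
print_diagram(stem_and_leaf(data))
# 0 | 9
# 1 | 2 7 7
# 2 | 0 2 4 4 4 5 6 8
# 3 | 1
```

Sorting before splitting into stems is what guarantees each row's leaves come out in order, which is the "make sure the data is in order" step.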

Twin Stem and Leaf Diagrams (4A)

Sometimes you will have two sets of data on one diagram. The following numbers are flower widths (cm) for two different plants of the same species.

Plant 1: 2.5, 2.1, 3.0, 3.2, 1.9, 1.5, 2.2, 2.4
Plant 2: 3.1, 2.6, 2.9, 3.3, 3.5, 4.0, 3.7, 2.7

Plant 2 | Stem | Plant 1
        |  1   | 5 9
  9 7 6 |  2   | 1 2 4 5
7 5 3 1 |  3   | 0 2
      0 |  4   |

Key: 6 | 2 | 1 means 2.1 for plant 1 and 2.6 for plant 2

Plant 2's leaves are written outwards from the stem, so they increase from right to left.

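Median and Inter-quartile Range (4A)

Calculate the median and inter-quartile range for the following stem and leaf diagram (13 numbers).

Stem | Leaf
  2  | 3 6
  3  | 1 5 7 7 8 9
  4  | 3 3 4
  5  | 2

When n/2 (or n/4, 3n/4) is not a whole number, round up to find the term; when it is a whole number, use the midpoint of that term and the next.

Q2: n/2 = 13/2 = 6.5, so take the 7th term: Q2 = 38
Q1: n/4 = 13/4 = 3.25, so take the 4th term: Q1 = 35
Q3: 3n/4 = 39/4 = 9.75, so take the 10th term: Q3 = 43
IQR = Q3 − Q1 = 43 − 35 = 8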

Median and Inter-quartile Range (4A)

Calculate the median and inter-quartile range for the following stem and leaf diagram (14 numbers).

Stem | Leaf
  6  | 1 2 5 5 8
  7  | 1 2 3 6 7
  8  | 1 4
  9  | 0

Q2: n/2 = 14/2 = 7, a whole number, so use the 7.5th term (midway between the 7th and 8th terms): Q2 = 71.5
Q1: n/4 = 14/4 = 3.5, so take the 4th term: Q1 = 65
Q3: 3n/4 = 42/4 = 10.5, so take the 11th term: Q3 = 77
IQR = Q3 − Q1 = 77 − 65 = 12
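The position rule used in these examples (round a non-whole position up; for a whole-number position, average that term and the next) can be sketched as a short Python function. This is a sketch under that assumption only: other textbooks and software, for example interpolation-based methods, can give slightly different quartiles. `quartile` is a hypothetical helper name, and the example reuses the 13 values from the first stem and leaf diagram.

```python
def quartile(data, q):
    """q-th quartile (q = 1, 2 or 3) using the position rule from these notes.

    Position = q * n / 4. If it is not a whole number, round up and take that term.
    If it is a whole number, take the mean of that term and the next one.
    """
    ordered = sorted(data)
    n = len(ordered)
    if (q * n) % 4 == 0:                         # whole-number position k
        k = q * n // 4
        return (ordered[k - 1] + ordered[k]) / 2  # mean of k-th and (k+1)-th terms
    return ordered[q * n // 4]                   # rounded-up position, as a 0-based index


data = [20, 9, 17, 12, 28, 31, 22, 24, 17, 25, 24, 24, 26]   # n = 13
q1, q2, q3 = (quartile(data, q) for q in (1, 2, 3))
print(q1, q2, q3, q3 - q1)   # 17 24 25 8
```

Working with the integer q * n avoids floating-point checks for "is this a whole number?".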

Outliers (4B)

An outlier is an extreme value that lies outside the overall pattern of the data. Using the standard rule, a value is an outlier if it is:
- bigger than Q3 + 1.5 × IQR (upper quartile + 1.5 × inter-quartile range), or
- smaller than Q1 − 1.5 × IQR (lower quartile − 1.5 × inter-quartile range).

So work out 1.5 × IQR, then add it to the upper quartile and subtract it from the lower quartile: values between these two limits are acceptable. This rule is standard, but you may be given a different rule to apply in the exam.
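The outlier rule can be sketched as two small Python functions. This assumes the standard multiplier of 1.5, exposed as a parameter `k` in case a question gives a different rule; `outlier_limits` and `find_outliers` are hypothetical names. The example uses the blood glucose summary that appears later in these notes.

```python
def outlier_limits(q1, q3, k=1.5):
    """Return (lowest acceptable, highest acceptable): Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the standard rule; an exam question may give a different one.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Values lying strictly outside the acceptable range."""
    low, high = outlier_limits(q1, q3, k)
    return [x for x in values if x < low or x > high]


# Blood glucose summary used later in these notes: Q1 = 3.6, Q3 = 4.7
low, high = outlier_limits(3.6, 4.7)
print(round(low, 2), round(high, 2))          # 1.95 6.35
print(find_outliers([1.4, 5.2], 3.6, 4.7))    # [1.4]
```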

Outliers: Finding the Quartiles (4B)

For the stem and leaf diagram below, calculate the quartiles and find any outliers.

Key: 3 | 1 means 3.1 (30 numbers)
Stem | Leaf
  2  | 2 2 3 3 5 7
  3  | 1 2 6 7 7 7 8 8 8 8 9 9 9
  4  | 0 0 0 0 4 5 6 7 8
  5  | 1 5

Q2: n/2 = 30/2 = 15, a whole number, so use the 15.5th term: Q2 = 3.8
Q1: n/4 = 30/4 = 7.5, so take the 8th term: Q1 = 3.2
Q3: 3n/4 = 90/4 = 22.5, so take the 23rd term: Q3 = 4.0
IQR = Q3 − Q1 = 4.0 − 3.2 = 0.8

Outliers: Applying the Rule (4B)

For the stem and leaf diagram above, Q1 = 3.2, Q2 = 3.8, Q3 = 4.0 and IQR = 0.8.

Lowest acceptable value: Q1 − 1.5(IQR) = 3.2 − 1.5(0.8) = 2.0
Highest acceptable value: Q3 + 1.5(IQR) = 4.0 + 1.5(0.8) = 5.2

The smallest value, 2.2, is above 2.0, so there are no low outliers. Only 5.5 is above 5.2, so 5.5 is the only outlier.

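Box Plots and Comparing Data (4C/4D)

A box plot marks the smallest value, lower quartile, median, upper quartile and largest value, so each 'section' contains 25% of the observations in the sample. Any outliers are plotted as crosses outside the main plot, and the whiskers then end at the smallest and largest values that are not outliers.

[Figure: labelled box plot showing the smallest value, lower quartile, median, upper quartile, largest value and an outlier, with 25% of the observations in each section]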

Drawing the Box Plot (4C/4D)

For the stem and leaf diagram above (Key: 3 | 1 means 3.1): Q1 = 3.2, Q2 = 3.8, Q3 = 4.0 and IQR = 0.8. The acceptable values lie between 3.2 − 1.5(0.8) = 2.0 and 4.0 + 1.5(0.8) = 5.2, so 5.5 is the only outlier.

The box runs from Q1 = 3.2 to Q3 = 4.0, with a line at the median, 3.8. The lower whisker goes to the smallest value, 2.2; the upper whisker goes to 5.1, the largest value that is not an outlier; and 5.5 is plotted as a cross.
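The steps for drawing a box plot from raw data can be sketched in Python: find the quartiles, the outlier limits, then the most extreme values that are not outliers for the whiskers. This is a sketch assuming the quartile position rule and 1.5 × IQR rule from these notes; `quartile` and `box_plot_summary` are hypothetical names. The values are read from the stem and leaf diagram, and the four 4.0 entries are an assumption inferred from n = 30 and Q3 = 4.0.

```python
def quartile(ordered, q):
    """q-th quartile (q = 1, 2, 3) of already sorted data, using the rule in these notes."""
    n = len(ordered)
    if (q * n) % 4 == 0:                         # whole-number position: mean with next term
        k = q * n // 4
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[q * n // 4]                   # otherwise round the position up


def box_plot_summary(data, k=1.5):
    """Values needed to draw a box plot, with whiskers ending at the most extreme non-outliers."""
    ordered = sorted(data)
    q1, q2, q3 = (quartile(ordered, q) for q in (1, 2, 3))
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in ordered if low <= x <= high]
    return {
        "lower whisker": inside[0],
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "upper whisker": inside[-1],
        "outliers": [x for x in ordered if x < low or x > high],
    }


# The 30 values from the stem and leaf diagram (Key: 3 | 1 means 3.1).
# Assumption: the four 4.0 values are inferred from n = 30 and Q3 = 4.0.
data = [2.2, 2.2, 2.3, 2.3, 2.5, 2.7,
        3.1, 3.2, 3.6, 3.7, 3.7, 3.7, 3.8, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9,
        4.0, 4.0, 4.0, 4.0, 4.4, 4.5, 4.6, 4.7, 4.8,
        5.1, 5.5]
print(box_plot_summary(data))
```

For these values the summary gives Q1 = 3.2, median 3.8, Q3 = 4.0, whiskers at 2.2 and 5.1, and 5.5 as the only outlier, matching the working above.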

Drawing a Box Plot from a Summary (4C/4D)

The blood glucose level of 30 males is recorded. A summary of the results is:

Lowest value = 1.4, Lower quartile = 3.6, Median = 4, Upper quartile = 4.7, Highest value = 5.2

Given that there was only one outlier, draw a box plot for the data.

IQR = 4.7 − 3.6 = 1.1
Upper limit = 4.7 + 1.5(1.1) = 6.35
Lower limit = 3.6 − 1.5(1.1) = 1.95

So 1.4 is the outlier, and 5.2 is not. As we do not know the smallest value that is not an outlier, the lower whisker is drawn to the lower limit (1.95).

Comparing Box Plots (4C/4D)

When you compare two box plots you should always comment on the median and the inter-quartile range: the median is a measure of location (average) and the inter-quartile range is a measure of spread.

[Figure: box plots of glucose level (1 to 6) for females and males]

The median is higher for males, and they also have a larger inter-quartile range. This indicates that males have a higher blood glucose level on average, and that their levels are more spread out.

Histograms (4E)

A histogram is similar to a bar chart, but there are two major differences:
- there are no gaps between the bars (the data is continuous)
- the area of a bar is proportional to the frequency

When drawing a histogram, plot frequency density rather than frequency:

Frequency density = Frequency ÷ Class width

You may also need the following formula when interpreting a histogram:

Area of bar = k × Frequency

Usually the area of a bar equals the frequency (k = 1), but all the areas may have been halved (k = 0.5) to make the diagram smaller.

Drawing a Histogram (4E)

The following table shows how long a sample of 200 students took to complete their homework. Draw a histogram to represent the data.

Time (mins) | Frequency | Frequency density
25-30       | 55        | 11   (55 ÷ 5)
30-35       | 39        | 7.8  (39 ÷ 5)
35-40       | 68        | 13.6 (68 ÷ 5)
40-50       | 32        | 3.2  (32 ÷ 10)
50-80       | 6         | 0.2  (6 ÷ 30)

[Figure: histogram of frequency density (0 to 14) against time (mins)]
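The frequency density column can be reproduced in Python. This is a minimal sketch, assuming each class is stored as a (lower bound, upper bound, frequency) tuple; `frequency_densities` is a hypothetical helper name.

```python
def frequency_densities(classes):
    """classes: list of (lower bound, upper bound, frequency).

    Returns frequency density = frequency / class width for each class.
    """
    return [freq / (upper - lower) for lower, upper, freq in classes]


# Homework times (mins) for 200 students
homework = [(25, 30, 55), (30, 35, 39), (35, 40, 68), (40, 50, 32), (50, 80, 6)]
print(sum(f for _, _, f in homework))     # 200
print(frequency_densities(homework))      # [11.0, 7.8, 13.6, 3.2, 0.2]
```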

Estimating from a Histogram (4E)

Use the histogram to estimate the number of students whose times were between 36 and 45 minutes.

As area represents frequency, calculate the area of each part of a bar that lies in this interval (this assumes the times are spread evenly within each class):

Rectangle 1 (36 to 40): 4 × 13.6 = 54.4 students
Rectangle 2 (40 to 45): 5 × 3.2 = 16 students

Overall the estimate is 54.4 + 16 = 70.4, so about 70 students took between 36 and 45 minutes.
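The part-bar area calculation can be sketched in Python for any interval. This assumes, as the area method does, that values are spread evenly within each class; `estimate_between` is a hypothetical helper name.

```python
def estimate_between(classes, a, b):
    """Estimate how many observations lie between a and b.

    classes: list of (lower bound, upper bound, frequency). Assumes values are spread
    evenly within each class, so each part-bar contributes frequency density x width.
    """
    total = 0.0
    for lower, upper, freq in classes:
        overlap = min(upper, b) - max(lower, a)   # width of this class inside [a, b]
        if overlap > 0:
            total += freq / (upper - lower) * overlap
    return total


homework = [(25, 30, 55), (30, 35, 39), (35, 40, 68), (40, 50, 32), (50, 80, 6)]
estimate = estimate_between(homework, 36, 45)
print(round(estimate, 1))   # 70.4 (about 70 students)
```

Classes that do not overlap the interval give a zero or negative overlap and are skipped, so only the 35-40 and 40-50 bars contribute here.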

Interpreting a Histogram (4E)

The histogram shows the time taken (s) for a group of children to complete a puzzle.

[Figure: histogram of time (s), from 14 to 32]

Why has a histogram been used? Time is continuous data.
What is the underlying feature of each bar? Its area is proportional to the frequency of its group.

Bar A represents 78 children. What area represents 1 child?

As area represents frequency:
Area of bar A = 2 × 27.3 = 54.6 cm²
78 children = 54.6 cm², so 1 child = 54.6 ÷ 78 = 0.7 cm²

If the total area of the bars is 210 cm², how many children were surveyed?

1 child = 0.7 cm², so the number of children = 210 ÷ 0.7 = 300 children.
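The two area steps (area per child, then total children) can be checked in Python. This is a small sketch using the bar A figures from the notes; `area_per_observation` is a hypothetical helper name.

```python
def area_per_observation(bar_area, bar_frequency):
    """Area that represents a single observation, from one bar whose frequency is known."""
    return bar_area / bar_frequency


bar_a_area = 2 * 27.3                               # bar A: 2 x 27.3 = 54.6 cm²
per_child = area_per_observation(bar_a_area, 78)    # 78 children
total_children = 210 / per_child                    # total area of all bars is 210 cm²
print(round(per_child, 2), round(total_children))   # 0.7 300
```

Rounding is needed because decimals such as 27.3 are not stored exactly as floating-point numbers.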

Teachings for Exercise 4F

Skewness and Comparisons (4F)

The skewness of data can be described using diagrams, measures of location and measures of spread.
- Data which is spread evenly: symmetrical
- Data which is mostly at the lower values: positive skew
- Data which is mostly at the higher values: negative skew

[Figure: example distributions showing symmetrical, positive skew and negative skew]

There are several ways of measuring skewness. Sometimes you will be told which to use, and sometimes you will have to choose one depending on the data you have available.

You can see the shape of the data from a box plot. You can also compare the quartiles:
- Symmetrical: Q2 − Q1 = Q3 − Q2
- Positive skew: Q2 − Q1 < Q3 − Q2
- Negative skew: Q2 − Q1 > Q3 − Q2
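The quartile test can be sketched in Python. This assumes the quartiles are already known; `skew_from_quartiles` is a hypothetical name, and `math.isclose` is used for the "equal gaps" case because decimal quartiles may not compare exactly. The example uses the test-mark quartiles from later in these notes.

```python
import math


def skew_from_quartiles(q1, q2, q3):
    """Compare Q2 - Q1 with Q3 - Q2 to describe the skew."""
    lower_gap, upper_gap = q2 - q1, q3 - q2
    if math.isclose(lower_gap, upper_gap):
        return "symmetrical"
    return "positive skew" if lower_gap < upper_gap else "negative skew"


print(skew_from_quartiles(46, 60, 69))   # negative skew (14 > 9)
```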

Another test uses the measures of location:
- Symmetrical: mean = median = mode
- Positive skew: mean > median > mode
- Negative skew: mean < median < mode

A low mode means lots of low values, i.e. positive skew. A high mode means lots of high values, i.e. negative skew.
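The measures-of-location test can be sketched with Python's standard `statistics` module. `skew_from_location` is a hypothetical name, and the two example lists are small made-up values chosen only to illustrate each case; real data may give an order that fits none of the three patterns, which the sketch reports as inconclusive.

```python
from statistics import mean, median, mode


def skew_from_location(data):
    """Compare mean, median and mode to describe the skew."""
    m, med, mo = mean(data), median(data), mode(data)
    if m > med > mo:
        return "positive skew"
    if m < med < mo:
        return "negative skew"
    if m == med == mo:
        return "symmetrical"
    return "inconclusive"   # the three measures are not in one of the orders above


print(skew_from_location([1, 1, 1, 2, 3, 10]))   # positive skew
print(skew_from_location([0, 7, 8, 9, 9, 9]))    # negative skew
```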

The final test is a formula:

Skewness = 3(mean − median) ÷ standard deviation

- A value of 0 implies that mean = median: symmetrical data
- A positive value implies that median < mean: positive skew
- A negative value implies that median > mean: negative skew

The further a value is from 0, the more skewed the data is.

[Figure: scale showing negative skew, symmetrical (0) and positive skew]
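The formula translates directly into Python. This measure is often called Pearson's second skewness coefficient; `skewness` is a hypothetical helper name, and the example uses the test-mark figures worked through later in these notes.

```python
def skewness(mean, median, sd):
    """Skewness measure from these notes: 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


value = skewness(57.46, 60, 15.67)   # test-mark figures used later in these notes
print(round(value, 3))               # -0.486
```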

Skewness: Worked Example (4F)

Find the three quartiles for this data on test marks for 50 students.

Key: 6 | 1 means 61
Stem | Leaf
  2  | 1 2 8
  3  | 3 4 7 8 9
  4  | 1 2 3 5 6 7 9
  5  | 0 2 3 3 5 5 6 8 9 9
  6  | 1 2 2 3 4 4 5 6 6 8 8 8 9 9
  7  | 0 2 3 4 5 7 8 9
  8  | 0 1 4

Q2: n/2 = 50/2 = 25, a whole number, so use the 25.5th term: Q2 = 60
Q1: n/4 = 50/4 = 12.5, so take the 13th term: Q1 = 46
Q3: 3n/4 = 150/4 = 37.5, so take the 38th term: Q3 = 69

Calculate the mean and standard deviation of the test marks (Q1 = 46, Q2 = 60, Q3 = 69).

For the 50 marks, Σx = 2873 and Σx² = 177353, so:
Mean = Σx ÷ n = 2873 ÷ 50 = 57.46
Standard deviation = √(Σx² ÷ n − mean²) = √(177353 ÷ 50 − 57.46²) = 15.67 (2 d.p.)
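The mean and standard deviation can be checked in Python with the standard `statistics` module. `pstdev` divides by n, matching the formula above. The marks are read from the stem and leaf diagram; the leaves of 0 (giving 50, 70 and 80) are an assumption, inferred because they are the values consistent with n = 50, the quartiles and the mean of 57.46.

```python
from statistics import mean, pstdev

# Test marks for 50 students (Key: 6 | 1 means 61).
# Assumption: the leaves of 0 in the 5, 7 and 8 rows are inferred, not read directly.
stems = {
    2: [1, 2, 8],
    3: [3, 4, 7, 8, 9],
    4: [1, 2, 3, 5, 6, 7, 9],
    5: [0, 2, 3, 3, 5, 5, 6, 8, 9, 9],
    6: [1, 2, 2, 3, 4, 4, 5, 6, 6, 8, 8, 8, 9, 9],
    7: [0, 2, 3, 4, 5, 7, 8, 9],
    8: [0, 1, 4],
}
marks = [10 * stem + leaf for stem, leaves in stems.items() for leaf in leaves]

print(len(marks), sum(marks), sum(x * x for x in marks))   # 50 2873 177353
print(round(mean(marks), 2), round(pstdev(marks), 2))      # 57.46 15.67
```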

Use the formula below to calculate the skewness of the data.

Skewness = 3(mean − median) ÷ standard deviation
= 3(57.46 − 60) ÷ 15.67
= −7.62 ÷ 15.67
= −0.486

So the data is negatively skewed.

Q1 = 46, Q2 = 60, Q3 = 69, mean = 57.46, standard deviation = 15.67, mode = 68

Use two other methods to show that the data is negatively skewed.

1) Quartiles: Q2 − Q1 = 60 − 46 = 14 and Q3 − Q2 = 69 − 60 = 9. Since Q2 − Q1 > Q3 − Q2, the data is negatively skewed.
2) Measures of location: mean < median < mode, since 57.46 < 60 < 68. A high mode implies many higher values, so the data is negatively skewed.

Comparing Spread (4F)

A company runs two manufacturing lines, A and B, which both make 2 cm rods in different ways. Samples are taken from both lines and the data summarised in the following table. Which manufacturing line is best in this situation?

Line | Mean (cm) | Standard deviation (cm)
A    | 2         | 0.015
B    |           | 0.05

The rods need to be accurate, and standard deviation measures spread. The rods from line A have a lower standard deviation, so their lengths vary less and line A is more reliable.

This table shows data on pupils taking a Statistics paper and a Mechanics paper. For which paper will it be easier to set fair grade boundaries?

Paper      | Mean | Standard deviation
Statistics | 55   | 16
Mechanics  |      | 4

A higher standard deviation means the marks are more spread out. The Statistics marks are more spread out, so its grade boundaries can be set further apart, which makes them fairer.

Summary
- We have looked at using stem and leaf diagrams and histograms to represent data.
- We have compared data using these, as well as box plots.
- We have learnt what outliers are.
- We have learnt what skewness is and used several measures to test it.