REPRESENTATION OF DATA.


Histograms

A histogram is a graphical representation of the distribution of data. It consists of adjacent rectangles, each with an area equal to the frequency of the observations in the interval. The height of a rectangle is equal to the frequency density of the interval.

Frequency density = Frequency ÷ Class width

The rectangles of a histogram are drawn so that they touch each other (i.e. with no gaps, unlike a bar chart) to indicate that the original variable is continuous.

Example 1: The histogram below shows the speed, in miles per hour, of cars on a motorway.

[Histogram: Frequency Density (1 to 6) against Speed (m.p.h.) (50 to 90)]

Complete the frequency table.

Speed x      50-55   55-60   60-65   65-75   75-90
Frequency    12      20      30

a) Estimate the number of cars with a speed of between 70 m.p.h. and 85 m.p.h.
b) Find an estimate of the mean speed of the cars.

Frequency density = Frequency ÷ Class width, so f.d. × c.w. = frequency.

For 65-75: frequency = 3 × 10 = 30
For 75-90: frequency = 1 × 15 = 15

Speed x      50-55   55-60   60-65   65-75   75-90
Frequency    12      20      30      30      15
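The rearrangement used here, frequency = frequency density × class width, can be checked with a short Python sketch. The heights 3 and 1 are the values read off the histogram in the working above; the function name is only illustrative.

```python
def frequency_from_density(density, lower, upper):
    """Frequency = frequency density x class width."""
    return density * (upper - lower)

# Bar heights read from the histogram: 3 for 65-75 and 1 for 75-90.
freq_65_75 = frequency_from_density(3, 65, 75)
freq_75_90 = frequency_from_density(1, 75, 90)
print(freq_65_75, freq_75_90)
```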

a) We want to find the number of cars represented by the region of the histogram between 70 m.p.h. and 85 m.p.h.

Speed x      50-55   55-60   60-65   65-75   75-90
Frequency    12      20      30      30      15

Half of the cars in the 65 – 75 group have a speed of 70 m.p.h. or more: ½ × 30 = 15 cars.
Two thirds of the cars in the 75 – 90 group have a speed of 85 m.p.h. or less: ⅔ × 15 = 10 cars.

15 + 10 = 25 cars have speeds between 70 m.p.h. and 85 m.p.h.
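The same estimate can be sketched in Python by taking, for each class, the fraction of its width that lies inside the range of interest. This assumes the speeds are spread evenly within each class, which is the assumption behind reading areas off a histogram; the function name is illustrative.

```python
# (lower boundary, upper boundary, frequency) for each class in Example 1
classes = [(50, 55, 12), (55, 60, 20), (60, 65, 30), (65, 75, 30), (75, 90, 15)]

def estimate_count(classes, low, high):
    """Estimate how many observations lie between low and high."""
    total = 0.0
    for lower, upper, freq in classes:
        # Width of the part of this class that falls inside [low, high]
        overlap = max(0, min(upper, high) - max(lower, low))
        total += freq * overlap / (upper - lower)
    return total

cars_70_to_85 = estimate_count(classes, 70, 85)
print(cars_70_to_85)
```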

b) For the mean, the mid-points of x are needed.

Speed       Mid-point x    f      fx
50 – 55     52.5           12     630
55 – 60     57.5           20     1150
60 – 65     62.5           30     1875
65 – 75     70             30     2100
75 – 90     82.5           15     1237.5
Totals:                    107    6992.5

Mean = 6992.5 ÷ 107 = 65.35

The mean speed of the cars is 65.4 m.p.h. (3 sig. figs)
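As a check on the arithmetic, the grouped mean can be computed in a few lines of Python. This is a minimal sketch using the class boundaries and frequencies from the completed table; the variable names are illustrative.

```python
# (lower boundary, upper boundary, frequency) for each class in Example 1
classes = [(50, 55, 12), (55, 60, 20), (60, 65, 30), (65, 75, 30), (75, 90, 15)]

total_f = sum(freq for _, _, freq in classes)
# Each class is represented by its mid-point, (lower + upper) / 2
total_fx = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)
mean_speed = total_fx / total_f

print(total_f, total_fx, round(mean_speed, 1))
```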

Example 2: In a fitness centre survey a random sample of 100 men were asked how many hours, to the nearest hour, they spent jogging in the last week. The results are summarised below.

Number of hours    Frequency
0 – 2              17
3 – 5              24
6 – 10             29
11 – 15            30

A histogram was drawn and the group (3 – 5) hours was represented by a rectangle that was 1.5 cm wide and 12 cm high. Calculate the width and height of the rectangle representing the group (11 – 15) hours.

The height of each rectangle is proportional to the frequency density.

Frequency density = Frequency ÷ Class width

Number of hours    Boundaries     Frequency    Frequency density
0 – 2                             17
3 – 5              2.5 – 5.5      24           8
6 – 10                            29
11 – 15            10.5 – 15.5    30           6

Width: For the (3 – 5) group, the class width of 3 is represented by 1.5 cm. For the (11 – 15) group, the class width of 5 is therefore represented by 2.5 cm.

Height: For the (3 – 5) group, the f.d. of 8 is represented by 12 cm, so each unit of f.d. is represented by 1.5 cm. For the (11 – 15) group, the f.d. of 6 is therefore represented by 9 cm.
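The scaling argument can be written out as a short Python sketch. It assumes, as in the working above, that hours recorded to the nearest hour give class boundaries half an hour either side of the stated limits; the helper name is illustrative.

```python
def class_width(lower, upper):
    """Width of a class recorded to the nearest hour (boundaries +/- 0.5)."""
    return (upper + 0.5) - (lower - 0.5)

# Reference rectangle: group 3-5, frequency 24, drawn 1.5 cm wide and 12 cm high
ref_width = class_width(3, 5)          # class width in hours
ref_density = 24 / ref_width           # frequency density
cm_per_hour = 1.5 / ref_width          # horizontal scale
cm_per_density = 12 / ref_density      # vertical scale

# Target rectangle: group 11-15, frequency 30
target_width = class_width(11, 15)
target_density = 30 / target_width
width_cm = target_width * cm_per_hour
height_cm = target_density * cm_per_density

print(width_cm, height_cm)
```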

Stem and leaf diagrams

A stem and leaf diagram is a way of displaying numerical data that shows the shape of the data (the distribution). A simple stem and leaf diagram contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves. To draw a stem and leaf diagram, the data is first sorted in ascending order.

Box Plots

A box plot (or box and whisker diagram) is based on five key values for a set of data: the smallest value, the largest value and the three quartiles (the lower quartile, the median and the upper quartile). Box plots also show outliers (extreme values).

Example 3: In a study of how students use their mobile phones, the usage of a random sample of 19 students was examined for a particular day. The lengths of the calls for the 19 students are shown in the stem and leaf diagram.

1 | 5 6            (2)
2 | 0 2 6 6        (4)
3 | 1 3 3 7 8 9    (6)
4 | 0 1 4 5 8      (5)
5 |                (0)
6 | 0              (1)
7 | 5              (1)

Key: 1 | 6 means a time of 16 minutes

a) Find the median and quartiles for these data.

A value that is greater than Q3 + 1.5 × (Q3 – Q1) or smaller than Q1 – 1.5 × (Q3 – Q1) is defined as an outlier.

b) Show that 75 is the only outlier.
c) Draw a box plot for these data.

a) For non-grouped data the median is the (n + 1)/2 th value = (19 + 1)/2 th value = 10th value = 37.

This leaves 9 values above and 9 values below the median. The lower quartile and upper quartile are the middle values of these sets, i.e. the (9 + 1)/2 th = 5th value of each set.

Q1 = 5th value = 26
Q3 = 5th value from the largest value = 44
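The positions used above can be checked with a minimal Python sketch. It follows the method in this example (the middle value of the whole set, then the middle value of each half with the median excluded) and assumes each set has an odd number of values; other quartile conventions exist and can give different answers for other data sets.

```python
# Call lengths in minutes, read from the stem and leaf diagram in Example 3
times = [15, 16, 20, 22, 26, 26, 31, 33, 33, 37,
         38, 39, 40, 41, 44, 45, 48, 60, 75]

def middle_value(sorted_values):
    """Return the ((n + 1) / 2)th value; assumes n is odd."""
    n = len(sorted_values)
    return sorted_values[(n + 1) // 2 - 1]

values = sorted(times)
median = middle_value(values)
half = len(values) // 2                 # 9 values on each side of the median
q1 = middle_value(values[:half])        # middle of the lower 9 values
q3 = middle_value(values[half + 1:])    # middle of the upper 9 values

print(q1, median, q3)
```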

b) We now have: Q1 = 26, the median Q2 = 37 and Q3 = 44.

Q3 + 1.5 × (Q3 – Q1) = 44 + 1.5 × (44 – 26) = 71
Q1 – 1.5 × (Q3 – Q1) = 26 – 1.5 × (44 – 26) = –1

75 is greater than 71, and no other value is greater than 71 or smaller than –1. Hence 75 is the only outlier.

c) We also have: The smallest value is 15. The largest value is 60 (excluding the outlier).

[Box plot: box from 26 to 44 with the median at 37, whiskers to 15 and 60, outlier marked at 75; axis: Time taken (minutes), 10 to 80]

(Note: The upper whisker on the box plot can also be drawn to 71.)
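The outlier test and the whisker ends can be checked with a short Python sketch. It uses the quartiles found in part a) and the 1.5 × IQR rule stated in the question; the variable names are illustrative.

```python
# Call lengths in minutes, read from the stem and leaf diagram in Example 3
times = [15, 16, 20, 22, 26, 26, 31, 33, 33, 37,
         38, 39, 40, 41, 44, 45, 48, 60, 75]
q1, q3 = 26, 44                      # quartiles from part a)

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [t for t in times if t < lower_fence or t > upper_fence]
inliers = [t for t in times if lower_fence <= t <= upper_fence]

# Whiskers run to the smallest and largest values that are not outliers
whisker_low, whisker_high = min(inliers), max(inliers)

print(lower_fence, upper_fence, outliers, whisker_low, whisker_high)
```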

Skewness

Skewness is a measure of the asymmetry of a set of data.

A distribution which is symmetrical has zero skewness.

A distribution which has a longer tail on the right is positively skewed. The mean > median and Q3 – Q2 > Q2 – Q1.

A distribution which has a longer tail on the left is negatively skewed. The mean < median and Q3 – Q2 < Q2 – Q1.

In Example 3, we found: Q1 = 26, the median Q2 = 37 and Q3 = 44.

The mean is: (15 + 16 + 20 + … + 60 + 75) ÷ 19 = 36.3

So the mean < median. The data is negatively skewed.

Also, Q3 – Q2 = 44 – 37 = 7 and Q2 – Q1 = 37 – 26 = 11.

So Q3 – Q2 < Q2 – Q1. The data is negatively skewed.
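Both comparisons can be checked with a minimal Python sketch, using the 19 call lengths from the stem and leaf diagram and the quartiles found in Example 3. The two flags are illustrative names for the two rules of thumb given above.

```python
# Call lengths in minutes, read from the stem and leaf diagram in Example 3
times = [15, 16, 20, 22, 26, 26, 31, 33, 33, 37,
         38, 39, 40, 41, 44, 45, 48, 60, 75]
q1, q2, q3 = 26, 37, 44              # quartiles from Example 3

mean = sum(times) / len(times)

mean_below_median = mean < q2                  # first rule: mean < median
upper_gap_smaller = (q3 - q2) < (q2 - q1)      # second rule: Q3 - Q2 < Q2 - Q1

print(round(mean, 1), mean_below_median, upper_gap_smaller)
```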

Measures of Average

There are three main measures of an average or typical value for a set of data:
The mean – the arithmetic average
The median – the middle value
The mode – the most common value

Measures of Spread

There are several ways to measure the spread of a set of data:
The range: the largest value minus the smallest value
The interquartile range: the range of the middle half of the data, IQR = Q3 – Q1

We shall also look at the standard deviation and variance later.

Again in Example 3, we found: Q1 = 26, the median Q2 = 37 and Q3 = 44.

The range = 75 – 15 = 60
The interquartile range = 44 – 26 = 18
The mode = 26 and 33

In this case there are two modes; this is known as a bimodal distribution.
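These measures can be checked with a short Python sketch. It uses `statistics.multimode` from the standard library (Python 3.8 or later), which returns every value that ties for the highest count; the quartiles are taken from Example 3 rather than recomputed.

```python
from statistics import multimode

# Call lengths in minutes, read from the stem and leaf diagram in Example 3
times = [15, 16, 20, 22, 26, 26, 31, 33, 33, 37,
         38, 39, 40, 41, 44, 45, 48, 60, 75]
q1, q3 = 26, 44                      # quartiles from Example 3

data_range = max(times) - min(times)
iqr = q3 - q1
modes = multimode(times)             # all values sharing the highest count

print(data_range, iqr, modes)
```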

Summary of key points:

Histograms

A histogram consists of adjacent rectangles with an area equal to the frequency of the observations in the interval. The height of a rectangle is equal to the frequency density of the interval.

Frequency density = Frequency ÷ Class width

Stem and leaf diagrams

A simple stem and leaf diagram contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves.

Box Plots

A box plot is based on five key values for a set of data: the smallest value, the largest value and the three quartiles (the lower quartile, the median and the upper quartile).

This PowerPoint produced by R. Collins; updated Feb. 2014