Chapter 13 – Univariate data


What is this topic about?
In this topic, we look at data sets (i.e. groups of numbers) and apply mathematical tests to the data to learn about it.
A data set is a group of numbers that we find from research, e.g. survey results or observations of the world.
'Univariate' means that there is one ('uni') variable ('variate').
A variable is something that varies or changes.
We measure this variable in order to learn about whatever we are researching.

13A: Measures of central tendency
Measures of central tendency are methods that we use to look at the middle or centre point of the data we have collected through research.
There are three different ways of doing this:
Mean: the average of all observations in a set of data ($\bar{x}$)
Median: the middle observation in a set of data that has been put in order
Mode: the most frequent/common observation in a data set

Grouped vs ungrouped data sets
Ungrouped means each individual data observation is looked at within the data set.
Grouped means that the data has been put into different groups or intervals, rather than looking at each data observation separately.

Symbols and tables
Frequency table: used to count the number of times something is observed.
'Frequency' just means the number of observations.
$\sum$ means 'sum' or 'total'.
$\bar{x}$ is called 'x bar' and is the symbol for 'mean' or 'average'.

Observation    Frequency
Red cars       2
Blue cars      5
Yellow cars    3

The following applies to ungrouped data sets

Mean ($\bar{x}$)
To find the mean (average) of the data set:
1. Add all the observations/scores in the data set together (they do not have to be in order).
2. Divide by the number of observations/scores.
We can write this as: mean = (sum of all scores) ÷ (number of scores)
Or, as: $\bar{x} = \frac{\sum x}{n}$, where x stands for the scores and n is the number of scores.

Finding the mean: worked example
Find the mean of the data set: 6, 2, 4, 3, 4, 5, 4, 5
1. Add the observations/scores together (in other words, find $\sum x$, which is the total/sum of the scores):
   $\sum x = 6 + 2 + 4 + 3 + 4 + 5 + 4 + 5 = 33$
2. Divide by the number of scores (n). There are 8 scores in this data set (n = 8):
   $\bar{x} = \frac{\sum x}{n} = \frac{33}{8} = 4.125$
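The two steps above can be checked with a short Python sketch. Python is not part of the original slides, and the function name `mean` is chosen here for illustration only.

```python
def mean(scores):
    """Mean of an ungrouped data set: sum of the scores divided by how many there are."""
    total = sum(scores)   # step 1: add all the scores together (order does not matter)
    n = len(scores)       # the number of scores
    return total / n      # step 2: divide the total by n


scores = [6, 2, 4, 3, 4, 5, 4, 5]
print(sum(scores))   # 33
print(len(scores))   # 8
print(mean(scores))  # 4.125
```

The sketch assumes the list is not empty; an empty list would cause a division by zero.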

Median
To find the median (middle/centre score) of the data set:
1. Arrange the scores in numerical order (smallest to biggest is the easiest way).
2. Put one finger on the smallest score and one finger on the biggest score, and move your fingers inward one number at a time until they meet at the middle score.
If there is an odd number of scores, the median is the middle score.
If there is an even number of scores, find the mean/average ($\bar{x}$) of the two middle scores.

Finding the median: worked example
Find the median of the data set: 6, 2, 4, 3, 4, 5, 4, 5, 3
1. Put the scores in numerical order: 2, 3, 3, 4, 4, 4, 5, 5, 6
2. Working inwards from the smallest and biggest numbers, we find that the middle score (the 5th of the 9 scores) is 4.
Therefore, the median of this data set is 4.
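A minimal Python sketch of the same method, covering both the odd and the even case. The function name `median` is an illustrative choice, not something from the slides, and the list is assumed to be non-empty.

```python
def median(scores):
    """Median of an ungrouped data set: the middle score once the scores are in order."""
    ordered = sorted(scores)   # arrange the scores from smallest to biggest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd number of scores: the middle score is the median
        return ordered[mid]
    # even number of scores: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([6, 2, 4, 3, 4, 5, 4, 5, 3]))  # 4   (nine scores, middle one is 4)
print(median([6, 2, 4, 3, 4, 5, 4, 5]))     # 4.0 (eight scores, middle two are 4 and 4)
```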

Mode
To find the mode (most frequent/common score) of the data set:
1. Work through the data set and record how many times each score appears (it might be easier to put them in order first to ensure you don't miss any).
2. Whichever score appears most frequently/commonly is the mode.
Note:
Sometimes there is no mode: each score appears once only.
Sometimes there is one clear mode: one number that appears most frequently/commonly.
Sometimes there is more than one mode.

Finding the mode: worked example
Find the mode of the data set: 6, 2, 4, 3, 4, 5, 4, 5, 3
1. (Optional, but useful) Put the scores in numerical order: 2, 3, 3, 4, 4, 4, 5, 5, 6
2. Determine which number (or numbers) appears most commonly.
In this case, the mode is 4 (it appears three times in this data set).
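The tallying step can be sketched in Python with `collections.Counter`. The function name `modes` is an illustrative choice; it returns a list so that the three cases in the note (no mode, one mode, more than one mode) can all be represented.

```python
from collections import Counter


def modes(scores):
    """Return a sorted list of the most frequent scores (empty if there is no mode)."""
    if not scores:
        return []
    counts = Counter(scores)        # how many times each score appears
    highest = max(counts.values())  # the largest tally
    if highest == 1:
        # every score appears once only, so there is no mode
        return []
    return sorted(score for score, tally in counts.items() if tally == highest)


print(modes([6, 2, 4, 3, 4, 5, 4, 5, 3]))  # [4]
print(modes([1, 2, 3]))                    # []
print(modes([1, 1, 2, 2, 3]))              # [1, 2]
```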

Calculating the mean, median and mode from a frequency table
First, we draw up a table with four columns: Score (x), Frequency (f), Frequency × score (fx), Cumulative frequency (cf).
We find the MEAN using this formula: $\bar{x} = \frac{\sum fx}{n}$, where f = frequency, x = the scores and n = the total of the frequencies.
We find the MEDIAN by finding the position of each score in the cumulative frequency column. We then use the formula median position = $\frac{n + 1}{2}$ to find where (at what position) the median will appear, and read this score off the cf column.
We find the MODE by looking for the score with the highest frequency.

Worked example: frequency table
This is what the question might look like: Find the mean, median and mode of the data set below.

Score (x)    Frequency (f)
4            1
5            2
6            5
7            4
8            3
Total        n

If you were to write this data out as a list, it would be: 4, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8 (i.e. one 4, two 5s, five 6s, four 7s, three 8s).

Worked example: frequency table
Draw up this table, but add these two extra columns: Frequency × score (fx) and Cumulative frequency (cf).

Score (x)    Frequency (f)    Frequency × score (fx)    Cumulative frequency (cf)
4            1
5            2
6            5
7            4
8            3
Total        n                $\sum fx$

Worked example: frequency table
Fill in all the data.
In the fx column, multiply the score by the frequency.
In the cf column, add the frequencies together from one row to the next (the first number will always be the first frequency).

Score (x)    Frequency (f)                 Frequency × score (fx)                    Cumulative frequency (cf)
4            1                             4 × 1 = 4                                 1
5            2                             5 × 2 = 10                                1 + 2 = 3
6            5                             6 × 5 = 30                                3 + 5 = 8
7            4                             7 × 4 = 28                                8 + 4 = 12
8            3                             8 × 3 = 24                                12 + 3 = 15
Total        n = 1 + 2 + 5 + 4 + 3 = 15    $\sum fx$ = 4 + 10 + 30 + 28 + 24 = 96    (not needed)

Worked example: frequency table
Use the data to find the mean, median and mode.

Score (x)    Frequency (f)    Frequency × score (fx)    Cumulative frequency (cf)
4            1                4                         1
5            2                10                        3
6            5                30                        8
7            4                28                        12
8            3                24                        15
Total        n = 15           $\sum fx$ = 96

MEAN
Use the formula: $\bar{x} = \frac{\sum fx}{n} = \frac{96}{15} = 6.4$
MEDIAN
Locate the position of the median using median position = $\frac{n + 1}{2} = \frac{15 + 1}{2} = 8$, which means that the median is the 8th score.
Use the cf column to find the 8th score: the cf reaches 3 after the 5s and 8 after the 6s, so the 4th to 8th scores are all 6. The median is 6.
MODE
The score with the highest frequency is 6, therefore 6 is the mode.
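The whole frequency-table calculation can be reproduced with a short Python sketch. The variable names and the helper `score_at` are illustrative choices, not part of the slides. For an even n the median position is a half value (e.g. 8.5), so the sketch averages the scores on either side of it, matching the rule for ungrouped data.

```python
from itertools import accumulate
from math import ceil, floor

# Frequency table from the worked example: score -> frequency
table = {4: 1, 5: 2, 6: 5, 7: 4, 8: 3}

scores = sorted(table)
freqs = [table[x] for x in scores]
fx = [x * f for x, f in zip(scores, freqs)]  # frequency x score column
cf = list(accumulate(freqs))                 # cumulative frequency column

n = sum(freqs)
mean = sum(fx) / n


def score_at(position):
    """Return the score at a 1-based position in the ordered data, using the cf column."""
    for x, running_total in zip(scores, cf):
        if running_total >= position:
            return x
    raise ValueError("position is beyond the end of the data")


median_position = (n + 1) / 2
median = (score_at(floor(median_position)) + score_at(ceil(median_position))) / 2

# max() returns the first score with the highest frequency if there is a tie
mode = max(scores, key=lambda x: table[x])

print(fx)                  # [4, 10, 30, 28, 24]
print(cf)                  # [1, 3, 8, 12, 15]
print(mean, median, mode)  # 6.4 6.0 6
```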

Questions (ungrouped data)
Exercise 13A page 435 questions:
1acd
2 (stem and leaf plot, see the note below)
3 (frequency tables)
13abcd
Note: the stem is the first digit, and the leaves are the second digit, so for Science, 8 7 3 | 3 becomes 38, 37 and 33. For Maths, 4 | 0 6 8 becomes 40, 46 and 48.

The following applies to grouped data sets
When data is grouped, we lose the original values because, instead of having individual numbers, we are given an interval or group (e.g. 0-10).
Therefore, we need to estimate the mean, median and mode using different methods.

Mean
With class intervals, the individual values are lost, so we use the midpoints of the intervals into which these values fall. For example, when measuring heights of students in a class, if we found that 4 students had a height between 180 and 185 cm, we have to assume that each of those 4 students is 182.5 cm tall.
The formula used for calculating the mean is the same as for data presented in a frequency table: $\bar{x} = \frac{\sum fx}{n}$. Here x represents the midpoint (or class centre) of each class interval, f is the corresponding frequency and n is the total number of observations in the set.
Median
The median is found by drawing a cumulative frequency polygon (ogive) of the data and estimating the median from the 50th percentile.
Modal class
We do not find a mode because exact scores are lost. We can, however, find a modal class. This is the class interval that has the highest frequency.

Worked example: grouped data

Step 1
Draw up this table but add in three columns: 'midpoint', 'midpoint × frequency' and 'cumulative frequency' (on the slide, the blue is the stuff I have added).

Class interval    Midpoint (x)    Frequency (f)    Midpoint × frequency (fx)    Cumulative frequency (cf)
60-<70                            5
70-<80                            7
80-<90                            10
90-<100                           12
100-<110                          8
110-<120                          3
Total             (not needed)    45 (n)           $\sum fx$

Step 2
Fill in the data. 'Midpoint' means the mid point of the class interval (i.e. the middle number between 60 and 70 is 65 etc.)

Class interval    Midpoint (x)    Frequency (f)    Midpoint × frequency (fx)    Cumulative frequency (cf)
60-<70            65              5                65 × 5 = 325                 5
70-<80            75              7                75 × 7 = 525                 5 + 7 = 12
80-<90            85              10               85 × 10 = 850                12 + 10 = 22
90-<100           95              12               95 × 12 = 1140               22 + 12 = 34
100-<110          105             8                105 × 8 = 840                34 + 8 = 42
110-<120          115             3                115 × 3 = 345                42 + 3 = 45
Total             (not needed)    45 (n)           $\sum fx$ = 4025

Step 3
Use the data to find the mean, modal class and median.

Class interval    Midpoint (x)    Frequency (f)    Midpoint × frequency (fx)    Cumulative frequency (cf)
60-<70            65              5                325                          5
70-<80            75              7                525                          12
80-<90            85              10               850                          22
90-<100           95              12               1140                         34
100-<110          105             8                840                          42
110-<120          115             3                345                          45
Total                             45 (n)           $\sum fx$ = 4025

MEAN
Use the formula: $\bar{x} = \frac{\sum fx}{n} = \frac{4025}{45} \approx 89.4$
Therefore, we can say that the mean is ≈ 89.4 (use a wavy equals sign to show that it is approximate, as we had to use intervals rather than individual data).
MODAL CLASS
The interval with the highest frequency is 90-<100, which is the modal class.
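The grouped mean and the modal class can be checked with a short Python sketch. The tuple layout `(lower, upper, frequency)` and the variable names are illustrative choices, not part of the slides.

```python
# Each class interval as (lower bound, upper bound, frequency), from the worked example
classes = [
    (60, 70, 5),
    (70, 80, 7),
    (80, 90, 10),
    (90, 100, 12),
    (100, 110, 8),
    (110, 120, 3),
]

# The midpoint (class centre) stands in for every value inside the interval
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

n = sum(f for _, _, f in classes)
sum_fx = sum(x * f for x, (_, _, f) in zip(midpoints, classes))
mean = sum_fx / n  # an estimate only, because the original values are lost

# Modal class: the interval with the highest frequency
modal_lower, modal_upper, _ = max(classes, key=lambda c: c[2])

print(midpoints)                 # [65.0, 75.0, 85.0, 95.0, 105.0, 115.0]
print(n, sum_fx)                 # 45 4025.0
print(round(mean, 1))            # 89.4
print(modal_lower, modal_upper)  # 90 100
```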

MEDIAN
1. Draw a combined cumulative frequency histogram (bar graph). The mid points for each interval go along the bottom (x) axis, and the cumulative frequency (cf) goes up the y axis.
2. Draw a dot on each corner where the bars meet, and connect the dots with a line (this is called the ogive).
3. Find the middle of the cf axis (which is the last cf value divided by 2: 45 ÷ 2 = 22.5).
4. Draw a horizontal line at this point and see where it meets the ogive.
5. From that point, draw a vertical line down to meet the data (x) axis.
6. This is the approximate median, so we say that the median ≈ 90 (again, use the wavy equals sign).
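Reading the median off the ogive can be imitated numerically. The Python sketch below is not the graphical method from the slides: it swaps in straight-line (linear) interpolation between the ogive's dots. It assumes the dots sit at the upper end of each class interval and that the ogive starts at a cumulative frequency of 0 at the lower end of the first interval. The function name `read_off_ogive` is hypothetical.

```python
# Class boundaries and frequencies from the worked example
boundaries = [60, 70, 80, 90, 100, 110, 120]
frequencies = [5, 7, 10, 12, 8, 3]

# Ogive heights: 0 at the first boundary, then the running total at each upper boundary
cf = [0]
for f in frequencies:
    cf.append(cf[-1] + f)


def read_off_ogive(target):
    """Find the x value where the ogive reaches the given cumulative frequency."""
    for i in range(1, len(cf)):
        if cf[i] >= target:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cf[i - 1], cf[i]
            # straight line between the two dots on either side of the target
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target is above the last cumulative frequency")


half_way = cf[-1] / 2  # 45 / 2 = 22.5
median = read_off_ogive(half_way)

print(cf)             # [0, 5, 12, 22, 34, 42, 45]
print(half_way)       # 22.5
print(round(median))  # 90
```

The interpolated value is a little above 90 (about 90.4), which agrees with the estimate of median ≈ 90 read from the graph.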

Questions (grouped data) Exercise 13A page 435 questions: 5 8 (multiple choice abcd)