Chapter 13 – Univariate data

1 Chapter 13 – Univariate data

2 What is THIS topic about?
In this topic, we look at data sets (i.e. groups of numbers) and apply mathematical tests to the data to learn about it.
A data set is a group of numbers that we find from research, e.g. survey results or observations of the world.
‘Univariate’ means that there is one (‘uni’) variable (‘variate’).
A variable is something that varies or changes.
We measure this variable in order to learn about whatever we are researching.

3 13A: Measures of central tendency
Measures of central tendency are methods that we use to look at the middle or centre point of the data we have collected through research.
There are three different ways of doing this:
Mean: the average of all observations in a set of data ($\bar{x}$)
Median: the middle observation in a set of data that is put in order
Mode: the most frequent/common observation in a data set
Grouped vs ungrouped data sets
Ungrouped means each individual data observation is looked at within the data set.
Grouped means that the data has been put into different groups or intervals, rather than looking at each data observation separately.

4 Symbols and tables
Frequency table: used to count the number of times something is observed.
‘Frequency’ just means the number of observations.
$\sum$ means ‘sum’ or ‘total’.
$\bar{x}$ is called ‘x bar’ and is the symbol for ‘mean’ or ‘average’.

Observation    Frequency
Red cars       2
Blue cars      5
Yellow cars    3

5 The following applies to ungrouped data sets

6 Mean ($\bar{x}$)
To find the mean (average) of the data set:
1. Add all the observations/scores in the data set together (they do not have to be in order).
2. Divide by the number of observations/scores.
We can write this as: $\bar{x} = \dfrac{\text{sum of all scores}}{\text{number of scores}}$
Or, as: $\bar{x} = \dfrac{\sum x}{n}$, where x is the scores and n is the number of scores.

7 Finding the mean: worked example
Find the mean of the data set: 6, 2, 4, 3, 4, 5, 4, 5
1. Add the observations/scores together (in other words, find $\sum x$, which is the total/sum of the scores):
$\sum x = 6 + 2 + 4 + 3 + 4 + 5 + 4 + 5 = 33$
2. Divide by the number of scores (n). There are 8 scores in this data set (n = 8):
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{33}{8} = 4.125$
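The two steps of the worked example can also be checked with a few lines of Python. This is only a minimal sketch: the variable names (scores, total, n, mean) are chosen here for illustration and are not part of the exercise.

```python
# Scores from the worked example (order does not matter for the mean).
scores = [6, 2, 4, 3, 4, 5, 4, 5]

total = sum(scores)   # step 1: add all the scores together (sum of x)
n = len(scores)       # the number of scores
mean = total / n      # step 2: x-bar = (sum of x) / n

print(total, n, mean)  # 33 8 4.125
```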

8 Median
To find the median (middle/centre score) of the data set:
1. Arrange the scores in numerical order (smallest to biggest is the easiest way).
2. Put one finger on the smallest score and one finger on the biggest score, and move your fingers inward one number at a time until they meet at the middle score.
If there is an odd number of scores, the median is the middle score.
If there is an even number of scores, find the mean/average ($\bar{x}$) of the two middle scores.

9 Finding the MEDIAN: worked example
Find the median of the data set: 6, 2, 4, 3, 4, 5, 4, 5, 3
Put the scores in numerical order: 2, 3, 3, 4, 4, 4, 5, 5, 6
Working inwards from the smallest and biggest numbers, we find that the middle score is 4.
Therefore, the median of this data set is 4.
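The same method can be sketched in Python. The function name median is our own choice for illustration, and the sketch assumes the data set is not empty. It sorts the scores, then takes the middle score (odd number of scores) or the average of the two middle scores (even number of scores).

```python
def median(scores):
    """Return the median of a non-empty list of scores."""
    ordered = sorted(scores)   # put the scores in numerical order
    n = len(ordered)
    middle = n // 2            # index of the middle (or upper-middle) score
    if n % 2 == 1:
        # Odd number of scores: the median is the middle score.
        return ordered[middle]
    # Even number of scores: average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([6, 2, 4, 3, 4, 5, 4, 5, 3]))  # 4 (nine scores)
print(median([6, 2, 4, 3, 4, 5, 4, 5]))     # 4.0 (eight scores)
```

The second call uses the eight-score data set from the mean example, where the two middle scores are both 4.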

10 Mode
To find the mode (most frequent/common score) of the data set:
1. Work through the data set and record how many times each score appears (it might be easier to put them in order first to ensure you don’t miss any).
2. Whichever score appears most frequently/commonly is the mode.
Note:
Sometimes there is no mode: each score appears once only.
Sometimes there is one clear mode: one number that appears most frequently/commonly.
Sometimes there is more than one mode.

11 Finding the Mode: worked example
Find the mode of the data set: 6, 2, 4, 3, 4, 5, 4, 5, 3
1. (Optional, but useful) Put the scores in numerical order: 2, 3, 3, 4, 4, 4, 5, 5, 6
2. Determine which number (or numbers) appears most commonly.
In this case, the mode is 4 (it appears three times in this data set).
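Counting how often each score appears is easy to sketch in Python with collections.Counter. The function name modes is our own, and the sketch assumes a non-empty data set. It returns a list so that it can show all three cases: no mode, one mode, or more than one mode. The second and third data sets below are made up purely to show those cases.

```python
from collections import Counter


def modes(scores):
    """Return a sorted list of the modes of a non-empty data set.

    An empty list means there is no mode (every score appears once only).
    """
    counts = Counter(scores)         # how many times each score appears
    highest = max(counts.values())   # the highest frequency
    if highest == 1:
        return []                    # each score appears once: no mode
    return sorted(score for score, count in counts.items() if count == highest)


print(modes([6, 2, 4, 3, 4, 5, 4, 5, 3]))  # [4]
print(modes([1, 2, 2, 3, 3]))              # [2, 3]
print(modes([1, 2, 3]))                    # []
```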

12 Calculating the mean, median and mode from a frequency table
First, we draw up a table with four columns: Score (x), Frequency (f), Frequency × score (fx), Cumulative frequency (cf).
We find the MEAN using this formula: $\bar{x} = \dfrac{\sum fx}{n}$, where f = frequency, x = the scores and n = the total number of scores (the sum of the frequencies).
We find the MEDIAN by finding the position of each score in the cumulative frequency column. We then use the formula $\dfrac{n + 1}{2}$ to find where (at what position) the median will appear, and read this score off the cf column.
We find the MODE by looking for the score with the highest frequency.

13 Worked example: frequency table
This is what the question might look like: Find the mean, median and mode of the data set below.

Score (x)    Frequency (f)
4            1
5            2
6            5
7            4
8            3
Total        n

If you were to write this data out as a list, it would be: 4, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8 (i.e. one 4, two 5s, five 6s, four 7s, three 8s).

14 Worked example: frequency table
Draw up this table, but add these two extra columns: Frequency × score (fx) and Cumulative frequency (cf).

Score (x)    Frequency (f)    Frequency × score (fx)    Cumulative frequency (cf)
4            1
5            2
6            5
7            4
8            3
Total        n                $\sum fx$

15 Worked example: frequency table
Fill in all the data.
In the fx column, multiply the score by the frequency.
In the cf column, add the frequencies together from one row to the next (the first number will always be the first frequency).

Score (x)    Frequency (f)    Frequency × score (fx)    Cumulative frequency (cf)
4            1                4 × 1 = 4                 1
5            2                5 × 2 = 10                1 + 2 = 3
6            5                6 × 5 = 30                3 + 5 = 8
7            4                7 × 4 = 28                8 + 4 = 12
8            3                8 × 3 = 24                12 + 3 = 15
Total        n = 15           $\sum fx$ = 96            (not needed)

16 Worked example: frequency table
Use the data to find the mean, median and mode.

Score (x)    Frequency (f)    Frequency × score (fx)    Cumulative frequency (cf)
4            1                4                         1
5            2                10                        3
6            5                30                        8
7            4                28                        12
8            3                24                        15
Total        n = 15           $\sum fx$ = 96

MEAN
Use the formula: $\bar{x} = \dfrac{\sum fx}{n} = \dfrac{96}{15} = 6.4$
MEDIAN
Locate the position of the median using $\dfrac{n + 1}{2}$: Median position = $\dfrac{15 + 1}{2}$ = 8, which means that the median is the 8th score.
Use the cf column to find the 8th score, which is 6 (the cf column shows that the 4th to 8th scores are all 6).
MODE
The score with the highest frequency is 6, therefore 6 is the mode.
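The whole frequency-table method can be sketched in Python as a check on the working above. All names here (freq_table, score_at and so on) are our own choices for illustration. The helper score_at walks down the cumulative frequency column to find the score at a given position. For an even n the median position ends in .5, so the sketch averages the scores on either side, which matches the rule for ungrouped data.

```python
import math

# (score x, frequency f) pairs from the worked example.
freq_table = [(4, 1), (5, 2), (6, 5), (7, 4), (8, 3)]

n = sum(f for _, f in freq_table)           # total of the f column
sum_fx = sum(x * f for x, f in freq_table)  # total of the fx column
mean = sum_fx / n                           # x-bar = (sum of fx) / n


def score_at(position):
    """Return the score at a 1-based position, using cumulative frequency."""
    cumulative = 0
    for x, f in freq_table:
        cumulative += f          # this is the cf column, row by row
        if position <= cumulative:
            return x
    raise ValueError("position is larger than n")


median_position = (n + 1) / 2
median = (score_at(math.floor(median_position))
          + score_at(math.ceil(median_position))) / 2

# Mode: the score with the highest frequency
# (if two scores tie, this picks the first one only).
mode = max(freq_table, key=lambda row: row[1])[0]

print(n, sum_fx, mean)          # 15 96 6.4
print(median_position, median)  # 8.0 6.0
print(mode)                     # 6
```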

17 Questions (ungrouped data)
Exercise 13A page 435 questions:
1acd
2 (stem and leaf plot – see )
3 (frequency tables)
13abcd
The stem is the first digit and the leaves are the second digit, so for Science, 8 7 3 | 3 becomes 38, 37 and 33. For Maths, 4 | 0 6 8 becomes 40, 46 and 48.

18 The following applies to Grouped data sets
When data is grouped, we lose the original values because, instead of having individual numbers, we are given an interval or group (e.g. 0-10).
Therefore, we need to estimate the mean, median and mode using different methods.

19 Mean
With class intervals, the individual values are lost, so we use the midpoints of the intervals into which these values fall. For example, when measuring heights of students in a class, if we found that 4 students had a height between 180 and 185 cm, we have to assume that each of those 4 students is 182.5 cm tall.
The formula used for calculating the mean is the same as for data presented in a frequency table: $\bar{x} = \dfrac{\sum fx}{n}$
Here x represents the midpoint (or class centre) of each class interval, f is the corresponding frequency and n is the total number of observations in a set.
Median
The median is found by drawing a cumulative frequency polygon (ogive) of the data and estimating the median from the 50th percentile.
Modal class
We do not find a mode because exact scores are lost. We can, however, find a modal class. This is the class interval that has the highest frequency.

20 Worked example: grouped data

21 Step 1
Draw up this table but add in three columns: ‘midpoint’, ‘midpoint × frequency’ and ‘cumulative frequency’ (the blue is the stuff I have added).

Class interval    Midpoint (x)    Frequency (f)    Midpoint × frequency (fx)    Cumulative frequency (cf)
60-<70                            5
70-<80                            7
80-<90                            10
90-<100                           12
100-<110                          8
110-<120                          3
Total             (not needed)    45 (n)           $\sum fx$

22 Step 2
Fill in the data. ‘Midpoint’ means the mid point of the class interval (i.e. the middle number between 60 and 70 is 65, etc.).

Class interval    Midpoint (x)    Frequency (f)    Midpoint × frequency (fx)    Cumulative frequency (cf)
60-<70            65              5                65 × 5 = 325                 5
70-<80            75              7                75 × 7 = 525                 5 + 7 = 12
80-<90            85              10               85 × 10 = 850                12 + 10 = 22
90-<100           95              12               95 × 12 = 1140               22 + 12 = 34
100-<110          105             8                105 × 8 = 840                34 + 8 = 42
110-<120          115             3                115 × 3 = 345                42 + 3 = 45
Total             (not needed)    45 (n)           $\sum fx$ = 4025

23 Step 3
Use the data to find the mean, modal class and median.

Class interval    Midpoint (x)    Frequency (f)    Midpoint × frequency (fx)    Cumulative frequency (cf)
60-<70            65              5                325                          5
70-<80            75              7                525                          12
80-<90            85              10               850                          22
90-<100           95              12               1140                         34
100-<110          105             8                840                          42
110-<120          115             3                345                          45
Total                             45 (n)           $\sum fx$ = 4025

MEAN
Use the formula: $\bar{x} = \dfrac{\sum fx}{n} = \dfrac{4025}{45} \approx 89.4$
Therefore, we can say that the mean is ≈ 89.4 (use a wavy equals sign to show that it is approximate, as we had to use intervals rather than individual data).
MODAL CLASS
The interval with the highest frequency is 90-<100, which is the modal class.
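The estimated mean and the modal class can be checked with a short Python sketch. The names used (classes, sum_fx, modal_class) are our own. Each class is stored as (lower limit, upper limit, frequency), and the midpoint is taken as the average of the two limits, as in the table.

```python
# (lower limit, upper limit, frequency) for each class interval, e.g. 60-<70.
classes = [
    (60, 70, 5),
    (70, 80, 7),
    (80, 90, 10),
    (90, 100, 12),
    (100, 110, 8),
    (110, 120, 3),
]

n = sum(f for _, _, f in classes)  # total number of observations

# Midpoint x of each class times its frequency f, summed over the table.
sum_fx = sum(((lower + upper) / 2) * f for lower, upper, f in classes)

mean = sum_fx / n  # an estimate only, because the exact scores are lost

# Modal class: the interval with the highest frequency.
modal_class = max(classes, key=lambda c: c[2])

print(n, sum_fx)          # 45 4025.0
print(round(mean, 1))     # 89.4
print(modal_class[:2])    # (90, 100)
```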

24 MEDIAN
1. Draw a combined cumulative frequency histogram (bar graph). The mid points for each interval go along the bottom (x) axis, and the cumulative frequency (cf) goes up along the y axis.
2. Draw a dot on each corner where the bars meet, and connect the dots with a line (this is called the ogive).
3. Find the middle of the cf axis (which is the last cf value divided by 2: 45 ÷ 2 = 22.5).
4. Draw a horizontal line at this point and see where it meets the ogive.
5. Draw a vertical line down to meet the data (x) axis.
6. This is the approximate median, so we say that the median ≈ 90 (again, use the wavy equals sign).

Class interval    Midpoint (x)    Frequency (f)    Midpoint × frequency (fx)    Cumulative frequency (cf)
60-<70            65              5                325                          5
70-<80            75              7                525                          12
80-<90            85              10               850                          22
90-<100           95              12               1140                         34
100-<110          105             8                840                          42
110-<120          115             3                345                          45
Total                             45 (n)           $\sum fx$ = 4025
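Reading the median off the ogive can be imitated with straight-line interpolation. The sketch below is not the graphical method itself: it assumes that the ogive's dots sit at the upper limit of each class interval and are joined by straight lines, and the function name estimate_median is our own. It finds the class where the cumulative frequency first reaches n ÷ 2, then works out how far along that class the horizontal line meets the ogive.

```python
def estimate_median(classes):
    """Estimate the median of grouped data by interpolating along the ogive.

    classes is a list of (lower limit, upper limit, frequency) tuples.
    Assumes the ogive joins (upper limit, cumulative frequency) points
    with straight lines.
    """
    n = sum(f for _, _, f in classes)
    target = n / 2                   # the middle of the cf axis
    cumulative = 0                   # cf at the start of the current class
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            # Fraction of the way through this class where cf hits the target.
            fraction = (target - cumulative) / f
            return lower + fraction * (upper - lower)
        cumulative += f
    raise ValueError("no observations in the table")


classes = [
    (60, 70, 5),
    (70, 80, 7),
    (80, 90, 10),
    (90, 100, 12),
    (100, 110, 8),
    (110, 120, 3),
]

print(round(estimate_median(classes), 1))  # 90.4
```

This agrees with the value of about 90 read from the graph: the cumulative frequency is already 22 at the start of the 90-<100 class, so the line at 22.5 meets the ogive only a little way into that class.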

25 Questions (grouped data)
Exercise 13A page 435 questions: 5 8 (multiple choice abcd)

