Statistics 1
How long is a name? To answer this question, we might collect some data on the length of a name.
How long is a name? First we need to establish our target population. Let’s say the students in this mathematics class.
How long is a name? What names should we use? Names as listed on the roll.
Data
Averaging An average is a measure of central tendency. There are 3 measures we can use: mean, median and mode.
Mean Usually when we say average, we are referring to the mean. To find the mean, we add up all the numbers and divide by how many there are.
Example Find the mean of 4, 0, 2, 1, 6
In Excel we can use the formula =AVERAGE(range), highlighting the cells that hold the data.
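The same calculation can be sketched in Python. This is only an illustration of "add up all the numbers and divide by how many there are"; the function name `mean` is our own choice, not something from the slides.

```python
def mean(values):
    """Add up all the numbers and divide by how many there are."""
    return sum(values) / len(values)


data = [4, 0, 2, 1, 6]  # the numbers from the example
print(mean(data))       # 13 / 5
```

For the example, the total is 13 and there are 5 numbers, so the mean is 13 / 5 = 2.6.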
Data on names
Median The median is the middle value when the data is put in order. If there is an odd number of values, there is a single middle value. If there is an even number of values, we average the two middle values.
Example Find the median of 4, 8, 2, 9, 1 First put them in order 1, 2, 4, 8, 9 The middle number is ‘4’
Example Find the median of 4, 8, 2, 9, 1, 6 First put them in order 1, 2, 4, 6, 8, 9 The middle numbers are ‘4’ and ‘6’. Averaging them gives a median of 5.
In Excel, sort the data or use the formula =MEDIAN(data)
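A minimal Python sketch of the median rule, covering both the odd and the even case. The function name `median` is our own; it follows the two steps described above (put the data in order, then take the middle).

```python
def median(values):
    """Return the middle value of the data once it is put in order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([4, 8, 2, 9, 1]))     # middle of 1, 2, 4, 8, 9
print(median([4, 8, 2, 9, 1, 6]))  # average of 4 and 6
```

The two calls reproduce the examples: the first gives 4 and the second gives 5.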
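Mode The mode is the most common number. A data set can have 2 modes (it is then called bimodal). Strictly, it can have more than 2, although some textbooks then say the data has no mode.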
Example Find the mode of 6, 4, 3, 7, 8, 6, 7, 2 There are two modes: 6 and 7
Using Excel The formula is =MODE(data). You must be careful, as Excel will only give one mode even when there are two.
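Because a single-answer formula can hide a second mode, it can help to count every value and list all the ones that tie for most common. The sketch below does this in Python; the function name `modes` is our own.

```python
from collections import Counter


def modes(values):
    """Return every value that ties for the highest count, in order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([6, 4, 3, 7, 8, 6, 7, 2]))  # 6 and 7 each appear twice
```

For the example this lists both modes, 6 and 7.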
Which average is the best? Generally we use the mean as it includes all the data but if we have extreme values, the median is a better measure as it is not affected by extreme values.
Example These are the incomes of a group of university students. $2400, $1500, $2000, $1800, $ Find the best ‘average’.
Example $2400, $1500, $2000, $1800, $ The mean is not representative whereas the median is.
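A short Python sketch of why an extreme value matters. The first four incomes are the ones listed; the fifth value is missing from these notes, so the 50000 below is a purely hypothetical stand-in chosen only to show the effect.

```python
from statistics import mean, median

known_incomes = [2400, 1500, 2000, 1800]  # the four values given in the example
extreme_income = 50000                    # HYPOTHETICAL stand-in, not from the slide

incomes = known_incomes + [extreme_income]

print(mean(incomes))    # pulled far above what most students earn
print(median(incomes))  # stays among the typical incomes
```

With this made-up extreme value the mean is 11540, which no student is close to, while the median is 2000.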
Frequency tables
Length   Tally            Frequency
3        ll               2
4        llll             5
5        llll llll llll   14
6        llll ll          7
7        llll             5
8        ll               2
Mode is 5
Median is also 5
Mean is 5.4
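The mode and median can be checked from the frequency table with a few lines of Python. This sketch expands the table back into a list of individual name lengths; the variable names are our own.

```python
from statistics import median, mode

# Frequency table: name length -> number of students
frequency = {3: 2, 4: 5, 5: 14, 6: 7, 7: 5, 8: 2}

# Expand the table back into one entry per student
lengths = [length for length, count in frequency.items() for _ in range(count)]

print(len(lengths))     # number of students in the table
print(mode(lengths))    # the length with the highest frequency
print(median(lengths))  # the 18th of the 35 ordered values
```

There are 35 names, so the median is the 18th value in order, which falls in the group of length 5. The mode is also 5 because that row has the highest frequency.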
Calculating the mean by hand
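One way to set out the hand calculation, sketched in Python: multiply each length by its frequency, add the results, and divide by the total frequency. The variable names are our own.

```python
# Frequency table: name length -> number of students
frequency = {3: 2, 4: 5, 5: 14, 6: 7, 7: 5, 8: 2}

# Length x frequency for each row, then add: 6 + 20 + 70 + 42 + 35 + 16
total_letters = sum(length * count for length, count in frequency.items())

# Add the frequency column to get the number of names
total_names = sum(frequency.values())

print(total_letters, total_names)   # 189 35
print(total_letters / total_names)  # 189 / 35
```

This gives 189 / 35 = 5.4, matching the mean quoted for the table.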
Using the calculator In STAT mode, place the data in List 1 and the frequencies in List 2. Select CALC, then SET, and set 1Var XList to List1 and 1Var Freq to List2. Press EXE, then select 1Var.
Measures of spread It is not enough to just give the ‘average’. The mean, median and mode are the same for all 3 sets of data, but the data sets are quite different.
Measures of spread Range is (highest number) - (lowest number). For our data set the first names have a range of 8 - 3 = 5
Measures of spread Again, if there are extreme values, the range can distort the true spread of the data.
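A small Python sketch of the range and of how one extreme value changes it. The lengths 3 to 8 come from the frequency table; the 15-letter name is a hypothetical value added only for illustration, and the function name `data_range` is our own.

```python
def data_range(values):
    """Highest number minus lowest number."""
    return max(values) - min(values)


name_lengths = [3, 4, 5, 6, 7, 8]      # the lengths that appear in the frequency table
print(data_range(name_lengths))        # 8 - 3

with_long_name = name_lengths + [15]   # HYPOTHETICAL 15-letter name
print(data_range(with_long_name))      # 15 - 3
```

One unusually long name would move the range from 5 to 12, even though almost all the names are unchanged.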
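5-number summary We often summarise the data with a 5-number summary: lowest, lower quartile, median, upper quartile, highest. These split the data into 4 groups, each holding about a quarter of the values.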
Example numbers
Example Lowest is 1. Lower quartile is 35. Median is 49. Upper quartile is 82. Highest is 95.
Example The 5-number summary is 1, 35, 49, 82, 95
For first names in our class The 5-number summary is 3, 4, 5, 6, 8. The lower quartile is 4 and the upper quartile is 6. The interquartile range is the difference between the quartiles: 6 - 4 = 2
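A sketch of a 5-number summary in Python. It assumes one common school method: the quartiles are the medians of the lower and upper halves of the ordered data, leaving out the middle value when the count is odd. Calculators and Excel may use slightly different quartile rules, so their answers can differ. The data list is hypothetical, and the function name `five_number_summary` is our own.

```python
from statistics import median


def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]   # values below the middle
    upper_half = ordered[-half:]  # values above the middle
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


data = [2, 4, 5, 7, 8, 9, 11, 12, 14, 15, 18]  # HYPOTHETICAL data, 11 values
summary = five_number_summary(data)
print(summary)

# Interquartile range: upper quartile minus lower quartile
print(summary[3] - summary[1])
```

For this made-up data the summary is 2, 5, 9, 14, 18 and the interquartile range is 14 - 5 = 9.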
Statistics so far Central tendencies: Mean = 5.4, Median = 5, Mode = 5. Because the mean and median are about the same, we wouldn’t expect extreme values.
Statistics so far Measures of spread: Range = 5 Interquartile range = 2
Statistics so far 5-number summary: 3, 4, 5, 6, 8