Ch. 18 - Descriptive Statistics
Continuous data can take any value within a range, so it has infinitely many possible values, e.g. heights of students or speeds of cars. Discrete data, by contrast, takes only distinct, countable values, such as the number of basketball shots made.
We often organize continuous data using a histogram to show frequency.
[Histogram: frequency on the vertical axis, range of values on the horizontal axis]
Measures of Central Tendency
The mean of a data set is the arithmetic average:
$\bar{x} = \frac{\sum x_i}{n}$
where
$x_i$ is a data value
$n$ is the number of data values in the sample
$\bar{x}$ is the mean of a sample
$\mu$ is the mean of a population
The median is the middle value of an ordered set. If there is an odd number of data values, the median is one of the data values. If there is an even number of data values, the median is the average of the two middle values.
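The two definitions above can be checked with a short Python sketch. It borrows the 17-value data set from Ex. 5 of these notes for the odd case; the four-value list is a made-up illustration of the even case, not data from the notes.

```python
from statistics import mean, median

# Data set from Ex. 5 (17 values, so the median is a single data value).
data = [6, 4, 7, 5, 3, 4, 2, 6, 5, 7, 5, 3, 8, 9, 3, 6, 5]

sample_mean = mean(data)      # sum of the values divided by n
sample_median = median(data)  # middle value of the ordered data

# Hypothetical even-sized set: the median is the average of the two middle values.
even_data = [1, 2, 3, 4]
even_median = median(even_data)

print(sample_mean, sample_median, even_median)
```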
The mode is the most frequently occurring value in a set. If two values share the highest frequency, the set is bimodal.
Ex. 1 The table shows the number of aces served by tennis players in their first set of a tournament. Determine the mean number of aces in the first set.
Number of Aces: 1 2 3 4 5 6
Frequency: 11 18 13 7
Ex. 2 In a class of 20 students, the results of a spelling test out of 10 are shown in the table. Calculate the mean, median, and mode.
Score    Number of Students
5        1
6        2
7        4
8
9
10
Total    20
Mean = $\frac{\sum fx}{\sum f}$, where $\sum f = 20$
Since there are 20 scores, the median is the average of the 10th and 11th scores. The 10th and 11th students both have a score of 8, so the median is 8.
The score with the highest frequency is 8, so the mode is also 8.
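A frequency table can be processed the same way in code. The sketch below is one possible approach; the function name `frequency_table_stats` and the small table are hypothetical and are not the data from Ex. 1 or Ex. 2.

```python
from statistics import median, multimode

def frequency_table_stats(table):
    """Return (mean, median, modes) for a {value: frequency} table."""
    n = sum(table.values())
    mean = sum(x * f for x, f in table.items()) / n
    # Expand the table into an ordered list so the middle value(s) can be found.
    expanded = [x for x, f in sorted(table.items()) for _ in range(f)]
    return mean, median(expanded), multimode(expanded)

# Hypothetical table: value -> frequency (10 data values in total).
example = {1: 2, 2: 5, 3: 2, 4: 1}
mean_value, median_value, modes = frequency_table_stats(example)
print(mean_value, median_value, modes)
```

`multimode` returns a list, so a bimodal set would show up as two entries.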
Grouped Data
When data has been gathered into categories, we can use the midpoint of the category to represent all scores within that interval. We are assuming that the scores within each class are evenly distributed throughout that interval. The mean calculated will therefore be an approximation to the true value:
$\bar{x} \approx \frac{\sum fx}{\sum f}$, where $x$ is the midpoint of each class and $f$ is its frequency.
Ex. 3 Find the approximate mean of the ages of bus drivers data, to the nearest year.
age         21-25  26-30  31-35  36-40  41-45  46-50  51-55
midpoint    23     28     33     38     43     48     53
frequency   11     14     32     27     29     17     7
Mean = $\frac{\sum fx}{\sum f} = \frac{5161}{137} \approx 37.7$, so the mean age is approximately 38 years.
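The arithmetic for Ex. 3 can be reproduced with a few lines of Python, using the midpoints and frequencies from the table. The variable names are only illustrative.

```python
# Class midpoints and frequencies from the bus driver table in Ex. 3.
midpoints = [23, 28, 33, 38, 43, 48, 53]
frequencies = [11, 14, 32, 27, 29, 17, 7]

total_drivers = sum(frequencies)
# Each midpoint stands in for every age in its class.
weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
approx_mean = weighted_sum / total_drivers

print(total_drivers, weighted_sum, round(approx_mean))
```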
Ex. 4 The data shown gives the weights of 120 male footballers.
a) Construct a cumulative frequency distribution table.
b) Represent the data on an ogive.
c) Use your graph to estimate the
i) median weight
ii) number of men weighing less than 73 kg
iii) number of men weighing more than 92 kg
a)
Weight (in kg):
Frequency:             2  3  12  14  19  37  22   8    2    1
Cumulative frequency:  2  5  17  31  50  87  109  117  119  120
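b) [Ogive: cumulative frequency plotted against weight (in kg)]
c)
i) Since there are 120 football players, the median is the average of the 60th and 61st players' weights, which is read from the ogive at a cumulative frequency of about 60.
ii) From the ogive, you can see there are approximately 25 players who weigh less than 73 kg.
iii) From the ogive, about 112 players weigh 92 kg or less, so approximately 120 - 112 = 8 players weigh more than 92 kg.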
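The cumulative frequency column in Ex. 4 is just a running total, which can be sketched as follows. The ninth frequency is taken as 2 here, an inference from the step between the cumulative values 117 and 119. The weight classes are not reproduced, so the sketch works with class positions only.

```python
from itertools import accumulate

# Frequencies from Ex. 4; the ninth entry (2) is inferred from 119 - 117.
frequencies = [2, 3, 12, 14, 19, 37, 22, 8, 2, 1]

# Running total of the frequencies gives the cumulative frequency column.
cumulative = list(accumulate(frequencies))

# Position (0-based) of the first class whose cumulative frequency reaches
# the 60th player, which is where the median region lies.
median_class_index = next(i for i, c in enumerate(cumulative) if c >= 60)

print(cumulative, median_class_index)
```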
Measuring the Spread of Data
The range for a given set of data is the difference between the maximum and the minimum data values.
The median divides the ordered data into two halves. Each of these halves can be divided in half again, giving quartiles.
The middle value of the lower half is called the lower quartile (Q1). So 25% of the data have a value less than or equal to the lower quartile.
The middle value of the upper half is called the upper quartile (Q3). So 25% of the data have a value greater than or equal to the upper quartile.
The interquartile range is the range of the middle half of the data.
IQR = Q3 - Q1
Ex. 5 For the data set: 6, 4, 7, 5, 3, 4, 2, 6, 5, 7, 5, 3, 8, 9, 3, 6, 5 find the: a) median, b) lower quartile, c) upper quartile, and d) interquartile range.
Put the data set in order first: 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9
There are 17 values, so the median is the 9th value. So the median is 5.
Lower half: 2, 3, 3, 3, 4, 4, 5, 5
Upper half: 5, 6, 6, 6, 7, 7, 8, 9
Q1 = median of lower half = (3 + 4)/2 = 3.5
Q3 = median of upper half = (6 + 7)/2 = 6.5
IQR = Q3 - Q1 = 6.5 - 3.5 = 3
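The median-of-halves method used in Ex. 5 can be written as a small function. This is a sketch of that particular method (the median is left out of both halves when n is odd); other software may use a different quartile convention and give slightly different values. The function name `quartiles` is illustrative, and it needs at least two data values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

data = [6, 4, 7, 5, 3, 4, 2, 6, 5, 7, 5, 3, 8, 9, 3, 6, 5]
q1, med, q3 = quartiles(data)
iqr = q3 - q1
data_range = max(data) - min(data)

print(q1, med, q3, iqr, data_range)
```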
Box-and-Whisker Plots
A box-and-whisker plot displays five values: the minimum, the lower quartile (Q1), the median, the upper quartile (Q3), and the maximum.
The box represents the "middle" half of the data set.
The lower whisker represents the 25% of the data with the smallest values.
The upper whisker represents the 25% of the data with the greatest values.
A percentile is the score below which a certain percentage of the data lies.
The lower quartile is the 25th percentile.
The median is the 50th percentile.
The upper quartile is the 75th percentile.
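To make the definition concrete, the sketch below counts what percentage of a data set lies below a given score. The helper `percent_below` and the list of scores are hypothetical, chosen only so the percentages come out evenly.

```python
def percent_below(data, score):
    """Percentage of the data values that are strictly below the given score."""
    below = sum(1 for x in data if x < score)
    return 100 * below / len(data)

# Hypothetical scores 1, 2, ..., 20.
scores = list(range(1, 21))

print(percent_below(scores, 6))   # share of scores below 6
print(percent_below(scores, 11))  # share of scores below 11
```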
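The deviation of a data value $x$ from the mean $\bar{x}$ is given by: $x - \bar{x}$
For a Sample:
The variance is: $s^2 = \frac{\sum (x - \bar{x})^2}{n}$
The standard deviation is: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
where $n$ is the sample size.
For a Population:
Use $\bar{x}$ to estimate the value of $\mu$.
Use $s_{n-1} = \sqrt{\frac{n}{n-1}}\, s$ to estimate the value of $\sigma$.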
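The steps from deviation to variance to standard deviation can be traced in code. This sketch assumes the sample variance uses divisor n, with the n/(n-1) factor applied only when estimating the population value. The sample is hypothetical, picked so the numbers stay simple.

```python
from statistics import mean, pvariance, variance

# Hypothetical sample with mean 5.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
x_bar = mean(sample)

deviations = [x - x_bar for x in sample]
variance_n = sum(d ** 2 for d in deviations) / n   # divisor n
sd_n = variance_n ** 0.5

# n/(n-1) correction used when estimating the population variance.
estimated_population_variance = n / (n - 1) * variance_n

# Cross-check against the standard library: pvariance divides by n,
# variance divides by n - 1.
print(variance_n, pvariance(sample))
print(estimated_population_variance, variance(sample))
```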
Ex. 6 A greengrocer chain is to purchase apples from two different suppliers. They take six random samples of 50 apples from each supplier to examine them for blemishes. The counts for the number of blemished apples are:
Wholesaler Redapp    5   17  15  3   9   11
Wholesaler Pureapp   10  13  12  11  12  11
Find the means and standard deviations for each apple sample. What do these statistics tell us?
For Wholesaler Redapp:
$\bar{x} = \frac{\text{total number of blemished apples}}{\text{number of samples}} = \frac{60}{6} = 10$ blemishes per sample
$x$     $x - \bar{x}$   $(x - \bar{x})^2$
5       -5              25
17      7               49
15      5               25
3       -7              49
9       -1              1
11      1               1
60      Total           150
$s = \sqrt{\frac{150}{6}} = 5$
For Wholesaler Pureapp:
$\bar{x} = \frac{\text{total number of blemished apples}}{\text{number of samples}} = \frac{69}{6} = 11.5$ blemishes per sample
$x$     $x - \bar{x}$   $(x - \bar{x})^2$
10      -1.5            2.25
13      1.5             2.25
12      0.5             0.25
11      -0.5            0.25
12      0.5             0.25
11      -0.5            0.25
69      Total           5.5
$s = \sqrt{\frac{5.5}{6}} \approx 0.957$
On average, Redapp supplied apples with fewer blemishes; however, the number of blemishes from Pureapp had less variability.
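The comparison in Ex. 6 can be reproduced in Python. The sketch assumes the divisor-n standard deviation (`pstdev` in the standard library). The last three Pureapp counts (11, 12, 11) are inferred from the stated total of 69 and the squared-deviation total of 5.5, so treat them as a reconstruction.

```python
from statistics import mean, pstdev

redapp = [5, 17, 15, 3, 9, 11]
# Last three values inferred from the totals in the worked table.
pureapp = [10, 13, 12, 11, 12, 11]

redapp_mean, redapp_sd = mean(redapp), pstdev(redapp)
pureapp_mean, pureapp_sd = mean(pureapp), pstdev(pureapp)

# Lower mean: fewer blemishes on average. Lower sd: more consistent quality.
print(redapp_mean, redapp_sd)
print(pureapp_mean, round(pureapp_sd, 3))
```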
Ex. 7 A random sample of 48 sheep was taken from a flock of over 2000 sheep. The sample mean of their weights is 23.6 kg with variance 4.34 kg².
a) Find the standard deviation of the sample.
b) Find an unbiased estimate of the mean weight of sheep in the flock.
c) Find an unbiased estimate of the standard deviation of the population from which the sample was taken. (Old IB Standards)
a) $s = \sqrt{4.34} \approx 2.08$ kg
b) Use the sample mean of $\bar{x} = 23.6$ kg as the estimate for $\mu$.
c) The estimate for $\sigma$ is $\sqrt{\frac{n}{n-1}}\, s = \sqrt{\frac{48}{47} \times 4.34} \approx 2.11$ kg
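The calculations for Ex. 7 are short enough to check directly. This sketch assumes the stated variance 4.34 was computed with divisor n, so the n/(n-1) factor is applied for the population estimate.

```python
from math import sqrt

n = 48                  # sample size
sample_variance = 4.34  # stated variance of the sample weights

# a) standard deviation of the sample
sample_sd = sqrt(sample_variance)

# c) estimate of the population standard deviation, using the n/(n-1) factor
estimated_population_sd = sqrt(n / (n - 1) * sample_variance)

print(round(sample_sd, 2), round(estimated_population_sd, 2))
```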
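For grouped data: $s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$
where $s$ is the standard deviation, $x$ is any score, $\bar{x}$ is the mean, and $f$ is the frequency of each score.
(This is the formula for standard deviation that's on your reference sheet.)
Ex. 8 Find the standard deviation of the distribution:
Score       1  2  3  4  5
Frequency   1  2  4  2  1
$x$     $f$    $fx$   $x - \bar{x}$   $(x - \bar{x})^2$   $f(x - \bar{x})^2$
1       1      1      -2              4                   4
2       2      4      -1              1                   2
3       4      12     0               0                   0
4       2      8      1               1                   2
5       1      5      2               4                   4
Total   10     30                                         12
$\bar{x} = \frac{30}{10} = 3$
$s = \sqrt{\frac{12}{10}} \approx 1.10$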
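The frequency-weighted formula can be sketched in Python as below. The frequencies 1, 2, 4, 2, 1 for scores 1 to 5 are an assumption: they are the values consistent with the totals and deviations that survive in the worked table for Ex. 8.

```python
from math import sqrt

scores = [1, 2, 3, 4, 5]
frequencies = [1, 2, 4, 2, 1]  # assumed; consistent with the table totals

n = sum(frequencies)
mean = sum(f * x for x, f in zip(scores, frequencies)) / n

# Sum of f * (x - mean)^2 over all scores.
sum_squared_deviations = sum(f * (x - mean) ** 2 for x, f in zip(scores, frequencies))
standard_deviation = sqrt(sum_squared_deviations / n)

print(n, mean, sum_squared_deviations, round(standard_deviation, 2))
```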
Normal bell-curve distribution
Notice the curve has inflection points at $x = \mu - \sigma$ and $x = \mu + \sigma$.
Ex. 9 A sample of 200 cans of peaches was taken from a warehouse and the contents of each can measured for net weight. The sample mean was 486 g with standard deviation 6.2 g. What proportion of the cans might lie within:
a) 1 standard deviation from the mean
b) 3 standard deviations from the mean
a) Approximately 68% of the cans would have contents within 1 standard deviation of the mean. So about 68% lie within: 486 ± 6.2 g, i.e. between 479.8 g and 492.2 g.
b) Approximately 99.7% of the cans would have contents within 3 standard deviations of the mean. So nearly all lie within: 486 ± 3(6.2) g, i.e. between 467.4 g and 504.6 g.
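The 68% and 99.7% figures, and the intervals for Ex. 9, can be checked with Python's `statistics.NormalDist`. This sketch assumes the can weights follow a normal distribution with the sample mean and standard deviation as its parameters. The helper `interval_within` is hypothetical.

```python
from statistics import NormalDist

# Assumed model: normal distribution with mean 486 g and sd 6.2 g.
cans = NormalDist(mu=486, sigma=6.2)

def interval_within(k):
    """Return (low, high, proportion) for k standard deviations about the mean."""
    low = cans.mean - k * cans.stdev
    high = cans.mean + k * cans.stdev
    return low, high, cans.cdf(high) - cans.cdf(low)

low1, high1, p1 = interval_within(1)
low3, high3, p3 = interval_within(3)

print(round(low1, 1), round(high1, 1), round(p1, 3))
print(round(low3, 1), round(high3, 1), round(p3, 3))
```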
Using your calculator, enter the following data into L1: 5, 2, 3, 3, 6, 4, 5, 3, 7, 5, 7, 1, 8, 9, 5
Use the STAT menu (1-Var Stats) to get the mean, median, standard deviation, etc.
Use the STAT PLOT menu to graph the box-and-whisker plot for this data.
Be sure you know how to use your calculator to do these!
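As a way to cross-check calculator results, the same data can be run through Python's standard library. This is only a sketch for comparison. It assumes the calculator follows the common convention of reporting both a divisor-n standard deviation (matched here by `pstdev`) and a divisor-(n-1) one (matched by `stdev`).

```python
from statistics import mean, median, multimode, pstdev, stdev

# The data entered into L1 above.
data = [5, 2, 3, 3, 6, 4, 5, 3, 7, 5, 7, 1, 8, 9, 5]

data_mean = mean(data)
data_median = median(data)
data_modes = multimode(data)   # list of the most frequent value(s)
sd_divisor_n = pstdev(data)    # divides by n
sd_divisor_n_minus_1 = stdev(data)  # divides by n - 1

print(data_mean, data_median, data_modes)
print(sd_divisor_n, sd_divisor_n_minus_1)
```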