Lesson: Describing Distributions with Numbers (parts from Mr. Molesky’s Statmonkey website)
Knowledge Objectives What is meant by a resistant measure? Two reasons why we use squared deviations rather than just average deviations from the mean What is meant by “degrees of freedom”?
Construction Objectives Identify situations in which the mean is the most appropriate measure of center and situations in which the median is the most appropriate measure Given a data set: –Find the quartiles –Find the five-number summary –Compute the mean and median as measures of center –Compute the interquartile range (IQR) –Use the 1.5 IQR rule to identify outliers –Compute the standard deviation and variance as measures of spread
Construction Objectives cont Identify situations in which the standard deviation is the most appropriate measure of spread and situations in which the interquartile range is the most appropriate measure Explain the effect of a linear transformation of a data set on the mean, median, and standard deviation of the set Use numerical and graphical techniques to compare two or more data sets
Vocabulary
Mean – the average value
Median – the middle value (in an ordered list)
Resistant measure – a measure (statistic or parameter) that is not sensitive to the influence of extreme observations
Mode – the most frequent data value
Range – the difference between the largest and smallest observations
Pth percentile – the number such that p percent of the observations (in an ordered list) fall at or below it
Quartile – multiples of the 25th percentile (Q1 – 25th; Q2 – 50th or median; Q3 – 75th)
Five-number summary – the minimum, Q1, median, Q3, maximum
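The quartile and five-number-summary definitions can be sketched in Python. This is only an illustration: the scores are made-up data, `five_number_summary` is a hypothetical helper name, and it assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered list (other quartile conventions give slightly different values).

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the ordered data; when n is odd, the overall median is left
    out of both halves. Needs at least two observations.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


scores = [12, 15, 17, 19, 20, 24, 30]  # hypothetical data
print(five_number_summary(scores))
```

With these seven scores the lower half is 12, 15, 17 and the upper half is 20, 24, 30, so the summary is 12, 15, 19, 24, 30.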
Vocabulary cont
Boxplot – graphs the five-number summary and any outliers
Interquartile range (IQR) – IQR = Q3 – Q1
Outlier – a data value that lies outside the interval [Q1 – 1.5 IQR, Q3 + 1.5 IQR]
Variance – the average of the squares of the deviations from the mean
Standard deviation – the square root of the variance
Degrees of freedom – the number of independent pieces of information that are included in your measurement
Linear transformation – changes the data in the form x_new = a + bx
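A minimal Python sketch of the 1.5 IQR rule, the variance, and a linear transformation, under stated assumptions: the data set and the constants `a` and `b` are invented for illustration, `quartiles` and `iqr_outliers` are hypothetical helper names, and the quartiles are taken as medians of the lower and upper halves of the ordered data.

```python
from math import isclose
from statistics import mean, median, pvariance, stdev, variance


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    data = sorted(values)
    half = len(data) // 2
    upper_start = half + 1 if len(data) % 2 else half
    return median(data[:half]), median(data[upper_start:])


def iqr_outliers(values):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]


data = [10, 12, 13, 14, 15, 16, 18, 40]  # hypothetical data
print(iqr_outliers(data))

# pvariance divides the squared deviations by n; variance divides by n - 1,
# the degrees of freedom of a sample.
print(pvariance(data), variance(data))

# Linear transformation x_new = a + b*x (a and b chosen arbitrarily here).
a, b = 32, 1.8
new = [a + b * x for x in data]
print(isclose(mean(new), a + b * mean(data)))      # center shifts and scales
print(isclose(median(new), a + b * median(data)))  # center shifts and scales
print(isclose(stdev(new), abs(b) * stdev(data)))   # spread only scales by |b|
```

Here Q1 = 12.5 and Q3 = 17, so the fences are 5.75 and 23.75 and only the value 40 is flagged. Adding `a` moves the mean and median but leaves the standard deviation alone; multiplying by `b` rescales all three.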
Measures of Center Numerical descriptions of distributions begin with a measure of their “center”. If you could summarize the data with one number, what would it be? Mean: the “average” value of a dataset. Median: the “middle” value of an ordered dataset. To find it, arrange the observations in order from min to max and locate the middle observation, averaging the two middle values if needed.
Mean vs Median The mean and the median are the most common measures of center If a distribution is perfectly symmetric, the mean and the median are the same The mean is not resistant to outliers The mode, the data value that occurs the most often, is a common measure of center for categorical data You must decide which number is the most appropriate description of the center... Mean Median Applet meanmedian.html Use the mean on symmetric data and the median on skewed data or data with outliers
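To see what "not resistant" means in practice, here is a small Python sketch. The test scores are hypothetical; the point is only to compare how far each measure moves when one extreme value is added.

```python
from statistics import mean, median

scores = [70, 72, 75, 78, 80]   # hypothetical, roughly symmetric scores
with_outlier = scores + [5]     # the same scores plus one extreme low value

print("without outlier:", mean(scores), median(scores))
print("with outlier:   ", mean(with_outlier), median(with_outlier))

# How far each measure of center moved because of the single outlier.
mean_shift = abs(mean(with_outlier) - mean(scores))
median_shift = abs(median(with_outlier) - median(scores))
print("mean moved by", round(mean_shift, 2), "and median moved by", median_shift)
```

The single low score pulls the mean from 75 down to about 63.3, while the median only moves from 75 to 73.5.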
Distributions Parameters Skewed Left (tail to the left): Mean substantially smaller than median (tail pulls mean toward it). Mean < Median < Mode
Distributions Parameters Symmetric: Mean roughly equal to median. Mean ≈ Median ≈ Mode
Distributions Parameters Skewed Right (tail to the right): Mean substantially greater than median (tail pulls mean toward it). Mean > Median > Mode
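The ordering of mean, median, and mode in skewed data can be checked on a small example. This sketch uses an invented right-skewed data set and its mirror image; it illustrates the typical pattern and is not a proof that every skewed data set behaves this way.

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]  # hypothetical, long right tail
left_skewed = [16 - x for x in right_skewed]     # mirror image, long left tail

for name, data in [("skewed right", right_skewed), ("skewed left", left_skewed)]:
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

In the right-skewed set the mean (4.6) sits above the median (3), which sits above the mode (2); mirroring the data reverses the order.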
Central Measures Comparisons
Mean. Computation: μ = (∑x_i) / N for a population, x̄ = (∑x_i) / n for a sample. Interpretation: center of gravity. When to use: data are quantitative and the frequency distribution is roughly symmetric.
Median. Computation: arrange the data in ascending order and divide the data set in half. Interpretation: divides the data into the bottom 50% and the top 50%. When to use: data are quantitative and the frequency distribution is skewed.
Mode. Computation: tally the data to determine the most frequent observation. Interpretation: most frequent observation. When to use: data are categorical, or the most frequent observation is the desired measure of central tendency.
Example 1 Which of the following measures of central tendency are resistant? 1. Mean 2. Median 3. Mode Answers: the mean is not resistant; the median is resistant.
Example 2 Given the following set of data: 70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51, 56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52 What is the mean? What is the median? What is the mode? What is the shape of the distribution? Mean: 51.125 Median: 51 Mode: 48, 51, 56 Shape: symmetric (tri-modal)
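The Example 2 values can be checked with Python's standard `statistics` module. This is one way to verify the arithmetic, not necessarily how the lesson expects it to be done (a calculator works just as well); `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

data = [70, 56, 48, 48, 53, 52, 66, 48, 36, 49,
        28, 35, 58, 62, 45, 60, 38, 73, 45, 51,
        56, 51, 46, 39, 56, 32, 44, 60, 51, 44,
        63, 50, 46, 69, 53, 70, 33, 54, 55, 52]

print("n      =", len(data))
print("mean   =", mean(data))
print("median =", median(data))
# multimode returns every value tied for most frequent; sort for readability.
print("modes  =", sorted(multimode(data)))
```

The 40 values sum to 2045, so the mean is 51.125; the two middle values are both 51; and 48, 51, and 56 each occur three times.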
Example 3 Given the following types of data and sample sizes (a sample of 50 and a sample of 200), list the measure of central tendency you would use and explain why. Data types: Hair color, Height, Weight, Parent’s Income, Number of Siblings, Age. Answers: mode, mean, median, mean. Does sample size affect your decision? Not in this case, but a larger sample size might allow us to use the mean rather than the median.
Sample Data Consider the following test scores for a small class: Plot the data and describe the SOCS (Shape, Outliers, Center, Spread): What number best describes the “center”? What number best describes the “spread”?
Day 1 Summary and Homework Summary –Three characteristics must be used to describe distributions (from histograms or similar charts) Shape (uniform, symmetric, bi-modal, etc) Center (mean, median, mode measures) Spread (variance – next lesson) –Median is resistant to outliers; mean is not! –Use Mean for symmetric data –Use Median for skewed data (or data with outliers) –Use Mode for categorical data Homework –pg 74 – 75: problems 27-31