Data Distributions
Essential Question: How can you use shape, center, and spread to characterize a data distribution?
Data Distribution: A set of numerical data that you can graph using a data display that involves a number line, such as a line plot, histogram, or box plot. The graph reveals the shape of the distribution.
Seeing the Shape of a Data Distribution
Data table for 20 babies, with columns: Baby, Birth Month, Birth Weight (kg), Mother's Age.
Make a line plot for the distribution of birth months.
Make a line plot for the distribution of birth weights.
Make a line plot for the distribution of mothers' ages.
Reflect 1a. Describe the shape of the distribution of birth months.
Reflect 1b. Describe the shape of the distribution of birth weights.
Reflect 1c. Describe the shape of the distribution of mothers' ages.
Understanding Shape, Center, and Spread: Data distributions can have various shapes, and these shapes have names in statistics.
Uniform Distribution: The shape is basically level. It looks like a rectangle.
Normal Distribution: A mound in the middle with symmetric tails at each end. It looks bell-shaped.
Skewed Distribution: Mounded but not symmetric, because one "tail" is much longer than the other.
Other (Mixed)
Distribution Center and Spread Mean
Distribution Center and Spread Median
Distribution Center and Spread Standard Deviation
Distribution Center and Spread Interquartile Range (IQR): tells you how spread out the middle 50% of the data are.
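The four measures named above can be computed with Python's standard `statistics` module. The following is a minimal sketch, not the lesson's own method: the `weights` list is a made-up sample (not the lesson's table), `summarize` is a hypothetical helper name, and the "inclusive" quartile method is only one of several conventions, so a calculator or spreadsheet may report a slightly different IQR.

```python
import statistics

# Hypothetical sample of birth weights in kg (NOT the lesson's data table).
weights = [3.2, 3.3, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8]


def summarize(data):
    """Return the mean, median, sample standard deviation, and IQR of data."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "iqr": q3 - q1,                   # spread of the middle 50%
    }


summary = summarize(weights)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The mean pairs naturally with the standard deviation because both use every data value, while the median and IQR are both based on position in the ordered data.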
Reflect 2a. Describe the shape of each distribution that you made in the Example, using the vocabulary you just learned.
Reflect 2b. When the center and the spread of a distribution are reported, they are generally given either as the mean and standard deviation or as the median and IQR. Why do these pairings make sense?
Relating Center and Spread to Shape
Data table for 20 babies, with columns: Baby, Birth Month, Birth Weight (kg), Mother's Age.
Calculate the following for both birth weights and mothers' ages: mean, median, standard deviation, IQR.
Reflect 3a. What do you notice about the mean and median for the symmetric distribution (baby weights) as compared with the mean and median for the skewed distribution (mothers’ ages)? Explain why this happens.
Reflect 3b. One way to compare the spread of two distributions is to find the ratio (expressed as a percent) of the standard deviation to the mean for each distribution. Another way is to find the ratio (expressed as a percent) of the IQR to the median. Calculate these ratios, rounding each to the nearest percent if necessary, for the symmetric and the skewed distributions. What do you observe when you compare the corresponding ratios? Why does this make sense?
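The ratio described in Reflect 3b is a single division. A minimal sketch follows, using a hypothetical helper name `spread_ratio_percent`. The only numbers plugged in are the birth-weight standard deviation (0.14) and mean (3.5) that the lesson quotes in Reflect 4a; the other ratios are left for you to compute from your own results.

```python
def spread_ratio_percent(spread, center):
    """Ratio of a spread measure to a center measure, as a whole percent."""
    return round(100 * spread / center)


# Birth weights: s = 0.14 and mean = 3.5, as quoted in Reflect 4a.
weight_ratio = spread_ratio_percent(0.14, 3.5)
print(f"standard deviation / mean = {weight_ratio}%")
```

The same function works for the IQR-to-median ratio: pass the IQR as `spread` and the median as `center`.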
Reflect 3c. Which measures of center and spread would you report for the symmetric distribution? For the skewed distribution? Explain your reasoning.
Making and Analyzing a Histogram We will use Excel to create a histogram using the data of baby weights.
Reflect 4a. By examining the histogram, determine the percent of the data that fall within 1 standard deviation ($s = 0.14$) of the mean ($\bar{x} = 3.5$). That is, determine the percent of the data in the interval $3.5 - 0.14 < x < 3.5 + 0.14$, or $3.36 < x < 3.64$. Explain your reasoning.
Reflect 4b. Suppose one of the baby weights is chosen at random. By examining the histogram, determine the probability that the weight is more than 1 standard deviation ($s = 0.14$) above the mean ($\bar{x} = 3.5$). That is, determine the probability that the weight is in the interval $x > 3.5 + 0.14$, or $x > 3.64$. Explain your reasoning.
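Counting from a histogram amounts to counting the data values that land in each interval. The sketch below shows that count for the two intervals in Reflect 4a and 4b. It is an illustration only: the `weights` list is a made-up sample rather than the lesson's table, and the mean and standard deviation are the values quoted in the lesson, not recomputed from the made-up sample.

```python
# Hypothetical sample of birth weights in kg (NOT the lesson's data table).
weights = [3.2, 3.3, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.8]

# Mean and standard deviation as quoted in Reflect 4a.
mean, s = 3.5, 0.14
low, high = mean - s, mean + s  # the interval 3.36 < x < 3.64

# Count values strictly inside the interval, and values above its upper end.
within = sum(1 for w in weights if low < w < high)
above = sum(1 for w in weights if w > high)

percent_within = 100 * within / len(weights)
prob_above = above / len(weights)

print(f"{within} of {len(weights)} values within 1 SD ({percent_within:.0f}%)")
print(f"P(weight > mean + 1 SD) = {above}/{len(weights)} = {prob_above:.2f}")
```

With the real data, the counts come from adding the heights of the histogram bars that fall inside each interval.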
Making and Analyzing a Box Plot
Reflect 5a. How does the box plot show that the distribution is skewed right?
Reflect 5b. Suppose one of the mothers' ages is chosen at random. Based on the box plot and not the original set of data, what can you say is the approximate probability that the age falls between the median, 30.5, and the third quartile, 32.5? Explain your reasoning.
Making and Analyzing a Box Plot with outliers
Reflect 5c. A data value $x$ is considered to be an outlier if $x < Q_1 - 1.5(\mathrm{IQR})$ or $x > Q_3 + 1.5(\mathrm{IQR})$. Explain why a mother's age of 39 is an outlier for this data set. Redraw the box plot using the option for showing outliers. How does the box plot change?
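The outlier rule in Reflect 5c can be checked numerically once both quartiles are known. The sketch below is an illustration under a stated assumption: the third quartile, 32.5, comes from Reflect 5b, but the first quartile is not given in this text, so `q1_assumed = 29.0` is a placeholder to replace with the value from your own box plot. The function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True if x lies below the lower cutoff or above the upper cutoff."""
    lower, upper = outlier_fences(q1, q3)
    return x < lower or x > upper


q1_assumed = 29.0  # ASSUMPTION: placeholder, not a value stated in the lesson
q3 = 32.5          # third quartile quoted in Reflect 5b

lower, upper = outlier_fences(q1_assumed, q3)
print(f"cutoffs: below {lower} or above {upper}")
print("39 is an outlier:", is_outlier(39, q1_assumed, q3))
```

A value beyond either cutoff is drawn as a separate point, and the whisker on that side stops at the most extreme value that is not an outlier.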