1.3 Describing Quantitative Data with Numbers
Section 1.3 After this section, you should be able to: Measure center with mean and median Measure spread with standard deviation and interquartile range Identify outliers Construct a boxplot using the five number summary Calculate numerical summaries with technology
How long do people spend travelling to work? The answer depends on where they live. Below are the travel times in minutes for 15 workers in North Carolina chosen by random by the Census Bureau: 30 20 10 40 25 60 15 5 12 5 1 000025 2 005 3 00 4 6 What is the mean? What is the median? Mean = 22.5 Median = 20
Measuring Center: The Mean To find the mean, π₯ (pronouned x-bar) of the set of observations: Add their values Divide by the number of observations Suppose we have n observations that are π₯ 1 , π₯ 2 , π₯ 3 , β¦, π₯ π , their mean is: π₯ = π π’π ππ πππ πππ£ππ‘ππππ π = π₯ 1 + π₯ 2 , +,+β¦+ π₯ π π Compact Notation: π₯ = π₯ π π
Measuring Center: The Median The median M is the midpoint of the distribution To find the median of the distribution: Arrange all observations from smallest to largest If the number of observations is n odd, the median M is the center observation in the ordered list. If the number of observations is n even, the median M is the average of the two center observations in the ordered list.
Comparing the Mean and the Median The mean and median measure center in different ways, and both are useful Mean: βaverageβ value Median: βtypicalβ value Relationship between mean and median The mean and median of a roughly symmetric distribution are close together If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out in the long tail than is the median.
Example: Use the data below to calculate the mean and median of commuting time (in minutes) of 20 randomly selected New York workers. 10 30 5 25 40 20 15 85 65 60 45 5 1 005555 2 0005 3 00 4 005 6 7 8 Mean = 31.25 Median = 22.5 π₯ = 10+30+5+25+β¦+45 20 = 625 20 =31.25
Measuring Spread: Range and Interquartile Range (IQR) Measures variability Single number that represents distance between maximum and the minimum values Max β Min = range βThe data values range from 5 mins to 85 mins.β β NO, do no write this! Instead, βThe data values vary from 5 mins to 85 mins and the range is 80 mins.β
Interquartile Range (IQR)
Interquartile Range (IQR) How to calculate quartiles: Arrange the observations in increasing order and locate median M. The first quartile πΈ π , is the median of the observations located to the left of median in ordered list. The third quartile πΈ π , is the median of the observations located to the right of median in ordered list. The interquartile range (IQR) is defined as: πΌππ = π 3 β π 1
Identifying Outliers 1.5 x IQR Rule for Outliers Interquartile range is a rule of thumb for identifying outliers. 1.5 x IQR Rule for Outliers Call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
North Carolina Workers Example 5 10 10 10 10 12 15 20 20 25 30 30 40 40 60 π 1 π 3 Median πΌππ =π 3 β π 1 =30β10=20 1.5 ΓπΌππ =1.5 20 =30 Any value greater than this is an outlier π 3 +1.5(πΌππ )=30+30=60 Any value less than this is an outlier. π 1 β1.5(πΌππ )=10β30=β20