Basic Practice of Statistics - 3rd Edition Chapter 2 Describing Distributions with Numbers BPS - 3rd Ed. Chapter 2 Chapter 2
Numerical Summaries. Center of distribution: mean, median. Spread of distribution: five-point summary (and interquartile range), standard deviation (and variance).
Mean (Arithmetic Average): Traditional measure of center. Notation (“xbar”): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Sum the values and divide by the sample size (n).
Mean Illustrative Example: “Metabolic Rate”. Data: metabolic rates, 7 men (cal/day): 1792 1666 1362 1614 1460 1867 1439. Mean: $\bar{x} = (1792 + 1666 + \dots + 1439)/7 = 11{,}200/7 = 1600$ cal/day.
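The mean of the metabolic-rate data can be checked with a few lines of Python. This is a minimal sketch; the names `rates` and `mean` are illustrative choices, not part of the slides.

```python
# Metabolic rates (cal/day) for the 7 men in the example.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]


def mean(values):
    """Sum the values and divide by the sample size n."""
    return sum(values) / len(values)


xbar = mean(rates)
print(xbar)  # 1600.0
```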
Median (M): Half of the ordered values are less than or equal to the median, and half are greater than or equal to it. If n is odd, the median is the middle ordered value. If n is even, the median is the average of the two middle ordered values.
Median. Example 1 data: 2 4 6; Median (M) = 4. Example 2 data: 2 4 6 8; Median = 5 (average of 4 and 6). Example 3 data: 6 2 4; Median ≠ 2 (order the values first: 2 4 6, so Median = 4).
Location of the Median L(M): L(M) = (n + 1) ÷ 2, where n = sample size. Example: if 25 data values are recorded, the median is located at position (25 + 1)/2 = 13 in the ordered array.
Median Illustrative Example. Data: metabolic rates, n = 7: 1792 1666 1362 1614 1460 1867 1439. L(M) = (7 + 1)/2 = 4. Ordered array: 1362 1439 1460 1614 1666 1792 1867. The 4th ordered value is the median: M = 1614.
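The location rule L(M) = (n + 1)/2 translates directly into code. The sketch below is one possible way to write it; the function name `median` and the variable names are illustrative.

```python
def median(values):
    """Median via the location rule L(M) = (n + 1) / 2 (1-based position)."""
    ordered = sorted(values)          # the rule applies to the ordered array
    n = len(ordered)
    if n % 2 == 1:
        loc = (n + 1) // 2            # a whole-number position when n is odd
        return ordered[loc - 1]       # convert the 1-based position to a 0-based index
    # n even: L(M) falls halfway between two positions, so average them
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
print(median(rates))         # 1614
print(median([2, 4, 6, 8]))  # 5.0
print(median([6, 2, 4]))     # 4
```

Sorting inside the function guards against the mistake shown in the earlier 6 2 4 example, where the middle value of the unordered list is read off as the median.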
Comparing the Mean & Median. Mean = median when the data are symmetrical. Mean ≠ median when the data are skewed or contain an outlier: the mean is ‘pulled’ toward the tail, while the median is more resistant. If we switch this: 1362 1439 1460 1614 1666 1792 1867 to this: 1362 1439 1460 1614 1666 1792 9867, the median is still 1614 but the mean goes from 1600 to 2742.9.
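The resistance of the median can be reproduced with Python's standard `statistics` module. This is a small sketch; the list names `original` and `with_outlier` are illustrative.

```python
from statistics import mean, median

original = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]  # largest value replaced

mean_before = mean(original)
mean_after = round(mean(with_outlier), 1)  # rounded to one decimal, as on the slide

median_before = median(original)
median_after = median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

Only one value changed, yet the mean moves by more than 1100 cal/day while the median does not move at all.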
Question: The average salary at a high-tech company is $250K/year. The median salary is $60K. How can this be? Answer: There are some very highly paid executives, but most of the workers make modest salaries.
Spread = Variability. Variability is the amount by which values spread above and below the center. It can be measured in several ways: range (rarely used); 5-point summary and interquartile range; variance and standard deviation.
Range: Based on the smallest (minimum) and largest (maximum) values in the data set: Range = max − min. The range is not a reliable measure of spread (affected by outliers, biased).
Quartiles: Three numbers that divide the ordered data into four equal-sized groups. Q1 has 25% of the data below it. Q2 (the median) has 50% of the data below it. Q3 has 75% of the data below it.
Obtaining the Quartiles: Order the data. Find the median; this is Q2. Look at the lower half of the data (those below the median); the “median” of this lower half is Q1. Look at the upper half of the data; the “median” of this upper half is Q3.
Illustrative example: 10 ages. AGE (years) values, ordered array (n = 10): 05 11 21 24 27 | 28 30 42 50 52. Q1 = 21 (median of the lower half), Q2 = average of 27 and 28 = 27.5, Q3 = 42 (median of the upper half).
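The median-of-halves procedure for quartiles can be sketched as follows. The helper names `median_of_ordered` and `quartiles` are illustrative. One assumption is labeled in the code: when n is odd, the median itself is left out of both halves, which matches how the n = 53 weight example counts 26 values in each half.

```python
def median_of_ordered(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # values below the median position
    # Assumption: for odd n the median is excluded from both halves.
    upper = ordered[n - half:]   # values above the median position
    return median_of_ordered(lower), median_of_ordered(ordered), median_of_ordered(upper)


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3)  # 21 27.5 42
```

Statistical software often uses interpolation rules that can give slightly different quartiles for the same data, so results from a package may not match this hand method exactly.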
Weight Data: Sorted, n = 53. Median: L(M) = (53 + 1)/2 = 27, placing it at 165. L(Q1) = (26 + 1)/2 = 13.5, placing it between 127 and 128 (127.5). L(Q3) = 13.5 from the top, placing it between 185 and 185 (185). Q1 = 127.5, Q2 = 165, Q3 = 185.
Weight Data: Quartiles. Stemplot of the 53 weights (stem = tens, leaf = ones):
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
Q1 = 127.5, Q2 = 165, Q3 = 185
Five-Number Summary: minimum = 100, Q1 = 127.5, M = 165, Q3 = 185, maximum = 260. Interquartile Range (IQR) = Q3 − Q1 = 185 − 127.5 = 57.5. The IQR gives the spread of the middle 50% of the data.
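The five-number summary and IQR for the weight data can be recomputed from the stemplot. The sketch below decodes each stem and leaf into a weight (stem × 10 + leaf) and applies the median-of-halves rule; the names `stemplot`, `weights`, and `five_number_summary` are illustrative.

```python
from statistics import median

# Stem-and-leaf display of the weight data: stem = tens, leaf = ones.
stemplot = {
    10: "0166", 11: "009", 12: "0034578", 13: "00359", 14: "08",
    15: "00257", 16: "555", 17: "000255", 18: "000055567", 19: "245",
    20: "3", 21: "025", 22: "0", 23: "", 24: "", 25: "", 26: "0",
}
weights = [stem * 10 + int(leaf) for stem, leaves in stemplot.items() for leaf in leaves]


def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]      # values below the median position
    upper = ordered[n - half:]  # values above the median position
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


minimum, q1, m, q3, maximum = five_number_summary(weights)
iqr = q3 - q1
print(len(weights), minimum, q1, m, q3, maximum, iqr)
```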
Boxplot: The central box spans Q1 and Q3. A line in the box marks the median M. Lines extend from the box out to the minimum and maximum.
Weight Data: Boxplot. [Figure: boxplot of the weight data with min, Q1, M, Q3, and max marked; horizontal axis: Weight, 100 to 275]
Quartile extrapolation: Quartiles divide the data set into four segments: bottom, bottom middle, top middle, and upper. With small data sets, extrapolate the values. Illustrative data: 2, 4, 6, 8 (2 | 4 | 6 | 8). Q1 = average of 2 and 4, which is 3. Q2 = average of 4 and 6, which is 5. Q3 = average of 6 and 8, which is 7.
Boxplots are useful for comparing two groups (text p. 39).
Variances & Standard Deviation: The most common measures of spread. Based on deviations around the mean. Each data value has a deviation, defined as $x_i - \bar{x}$.
Fig 2.3: Metabolic rate for 7 men, with their mean (*) and two deviations shown.
Variance: Find the mean. Find the deviation of each value. Square the deviations. Sum the squared deviations; we call this the sum of squares, or SS. Divide the SS by n − 1 (this gives the typical squared deviation from the mean).
Variance Formula: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{SS}{n-1}$
Standard Deviation: the square root of the variance, $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.
Variance and Standard Deviation Illustrative Example. Data: metabolic rates, 7 men (cal/day): 1792 1666 1362 1614 1460 1867 1439
Variance and Standard Deviation Illustrative Example (cont.)
Observations   Deviations             Squared deviations
1792           1792 − 1600 = 192      (192)² = 36,864
1666           1666 − 1600 = 66       (66)² = 4,356
1362           1362 − 1600 = −238     (−238)² = 56,644
1614           1614 − 1600 = 14       (14)² = 196
1460           1460 − 1600 = −140     (−140)² = 19,600
1867           1867 − 1600 = 267      (267)² = 71,289
1439           1439 − 1600 = −161     (−161)² = 25,921
               sum = 0                SS = 214,870
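The deviation table can be reproduced step by step in Python. This is a minimal sketch that follows the hand procedure (mean, deviations, squares, SS, divide by n − 1, square root); the variable names are illustrative.

```python
from math import sqrt

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # cal/day

n = len(rates)
xbar = sum(rates) / n                   # step 1: the mean
deviations = [x - xbar for x in rates]  # step 2: deviation of each value
squared = [d ** 2 for d in deviations]  # step 3: squared deviations
ss = sum(squared)                       # step 4: sum of squares (SS)
variance = ss / (n - 1)                 # step 5: divide SS by n - 1
s = sqrt(variance)                      # standard deviation

for x, d, d2 in zip(rates, deviations, squared):
    print(f"{x:5d} {d:8.0f} {d2:10,.0f}")
print(f"sum of deviations = {sum(deviations):.0f}, SS = {ss:,.0f}")
print(f"variance = {variance:,.2f}, s = {s:.2f}")
```

The deviations always sum to zero, which is why they are squared before being added.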
Variance and Standard Deviation Illustrative Example (cont.). Variance: $s^2 = SS/(n-1) = 214{,}870/6 \approx 35{,}811.67$. Standard deviation: $s = \sqrt{35{,}811.67} \approx 189.24$ cal/day. Notes: (1) Use the standard deviation s for descriptive purposes. (2) In practice, the variance and standard deviation are calculated by calculator or computer.
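As one example of letting the computer do the work, Python's standard `statistics` module provides `variance` and `stdev`, both of which use the n − 1 divisor. The sketch below assumes nothing beyond the standard library; the variable names are illustrative.

```python
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # cal/day

s2 = statistics.variance(rates)  # sample variance (divides by n - 1)
s = statistics.stdev(rates)      # sample standard deviation

print(round(s2, 2), round(s, 2))
```

The module also has `pvariance` and `pstdev`, which divide by n instead; those are population formulas and would give smaller values than the ones used in this chapter.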
Summary Statistics: Two main measures of central location: mean ($\bar{x}$) and median (M). Two main measures of spread: standard deviation (s) and 5-point summary (interquartile range).
Choosing Summary Statistics: Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers. Use the median and IQR (or 5-point summary) when data are skewed or when outliers are present.
Example: Number of Books Read (n = 52). L(M) = (52 + 1)/2 = 26.5, so the median M lies midway between the 26th and 27th ordered values.
Illustrative example: “Books read”. 5-point summary: 0, 1, 3, 5.5, 99. Note the highly asymmetric distribution. [Figure: plot of the books-read data; horizontal axis: Number of books, 0 to 100] “xbar” = 7.06, s = 14.43. The mean and standard deviation give a false impression with asymmetric data.