1
Summary Statistics
HS 167, 9/23/2018
Last week we used stemplots and histograms to describe the shape, location, and spread of a distribution. This week we use numerical summaries of location and spread.
2
Main Summary Statistics by Type
Central location
  Mean
  Median
  Mode
Spread
  Variance and standard deviation
  Quartiles and interquartile range (IQR)
Shape
  Statistical measures of shape (e.g., skewness and kurtosis) are available but are seldom used in practice (not covered)
3
Notation
n = sample size
X = the variable
$x_i$ = value of individual i
$\sum$ = sum all values (capital sigma)
Illustrative example (sample.sav), data: n = 10, X = age
$x_1 = 21$, $x_2 = 42$, …, $x_{10} = 52$
$\sum x_i = 21 + 42 + \dots + 52 = 290$
4
Sample Mean
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Illustrative example: n = 10 (data and intermediate calculations on the prior slide)
$\bar{x} = \frac{290}{10} = 29.0$
5
Population Mean
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
Same operation as the sample mean, but based on the entire population (N = population size)
Not available in practice, but important conceptually
6
Interpretation of xbar
The sample mean is used to predict:
  an observation drawn at random from the sample
  an observation drawn at random from the population
  the population mean
It is also the gravitational center (balance point) of the data
7
Median – a different kind of average
“Middle value” (covered last week)
Order the data
Depth of median is (n + 1) / 2
  When n is odd: the middle value
  When n is even: the average of the two middle values
Illustrative example, n = 10
  Median has depth (10 + 1) / 2 = 5.5
  Median = average of 27 and 28 = 27.5
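The depth rule can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function name `median_by_depth` and both small data sets are assumptions made up for the example.

```python
def median_by_depth(values):
    """Return the median using the depth rule: depth = (n + 1) / 2."""
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    depth = (n + 1) / 2               # step 2: depth of the median
    if depth.is_integer():            # n odd: take the middle value
        return ordered[int(depth) - 1]
    lower = ordered[int(depth) - 1]   # n even: average the two middle values
    upper = ordered[int(depth)]
    return (lower + upper) / 2


# Hypothetical data, for illustration only
odd_sample = [9, 4, 7, 8, 7]          # n = 5, depth = 3
even_sample = [30, 24, 27, 28]        # n = 4, depth = 2.5
print(median_by_depth(odd_sample))
print(median_by_depth(even_sample))
```

With an even n the depth ends in .5, which is the signal to average the two values on either side of that position.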
8
Median is “robust”
Robust = resistant to skews and outliers
This data set has a mean (xbar) of 1600: (data shown on the slide)
This data set has an outlier and a mean of 2743: (data shown on the slide, outlier marked)
The median is 1614 in both instances. The median was not influenced by the outlier.
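A short Python sketch can make the robustness claim concrete. It uses the seven metabolic rates tabulated later in the deck (slide 17). The outlier value 9867 is an assumption: the slide's second data set is not reproduced in this transcript, so a hypothetical value was picked that replaces the largest observation and lands the mean near the 2743 quoted above.

```python
from statistics import mean, median

# Metabolic rates (cal/day) from the deck's metabolic example, n = 7
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

# ASSUMPTION: 9867 is a hypothetical outlier replacing the largest
# observation; it is not taken from the slide.
with_outlier = [9867 if x == max(rates) else x for x in rates]

for label, data in (("original", rates), ("with outlier", with_outlier)):
    print(f"{label:>12}: mean = {mean(data):.0f}, median = {median(data)}")
```

The mean moves by more than a thousand cal/day while the median stays put, because the median depends only on the ordering of the middle values.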
9
Mode
Mode = value with greatest frequency
e.g., {4, 7, 7, 7, 8, 8, 9} has mode = 7
Used only in very large data sets
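As a sketch, the mode can be found by counting frequencies. The helper name `modes` is an assumption for this example; it returns a list because a data set can have more than one value tied for the greatest frequency.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the greatest frequency."""
    counts = Counter(values)              # value -> frequency
    top = max(counts.values())            # greatest frequency observed
    return sorted(v for v, c in counts.items() if c == top)


print(modes([4, 7, 7, 7, 8, 8, 9]))       # the slide's data set
```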
10
Mean, Median, Mode
Symmetrical data: mean = median
Positive skew: mean > median [mean gets “pulled” by the tail]
Negative skew: mean < median
11
Spread = Variability
Variability = the amount values spread above and below the average
Measures of spread
  Range and interquartile range
  Standard deviation and variance (this week)
12
Range = max – min
The range is rarely used in practice because it tends to underestimate the population range and is not robust
13
Standard deviation
Most common descriptive measure of spread
Deviation = $x_i - \bar{x}$
Sum of squared deviations = $SS = \sum (x_i - \bar{x})^2$
Sample variance = $s^2 = \frac{SS}{n - 1}$
Sample standard deviation = $s = \sqrt{s^2}$
14
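Standard deviation (formula)
$s = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}$
The sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$; the sample standard deviation s is used to estimate the population standard deviation $\sigma$.
The population standard deviation is rarely known in practice.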
15
New data set (“Metabolic Rates”)
This example is not in your lecture notes
Metabolic rates (cal/day), n = 7: 1792, 1666, 1362, 1614, 1460, 1867, 1439
16
Metabolic rates showing mean (*) and deviations of first two observations
17
Standard Deviation Calculation
metabolic.sav – introduced slide 15
Observations   Deviations              Squared deviations
1792           1792 – 1600 = 192       (192)^2 = 36,864
1666           1666 – 1600 = 66        (66)^2 = 4,356
1362           1362 – 1600 = -238      (-238)^2 = 56,644
1614           1614 – 1600 = 14        (14)^2 = 196
1460           1460 – 1600 = -140      (-140)^2 = 19,600
1867           1867 – 1600 = 267       (267)^2 = 71,289
1439           1439 – 1600 = -161      (-161)^2 = 25,921
SUMS           0*                      SS = 214,870
* Sum of deviations will always equal zero
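The table can be reproduced with a few lines of Python. This is only a sketch of the hand calculation; the variable names are assumptions for the example, and the data are the seven observations listed in the table.

```python
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]   # cal/day, n = 7

xbar = sum(rates) / len(rates)                 # sample mean
deviations = [x - xbar for x in rates]         # x_i - xbar
squared = [d ** 2 for d in deviations]         # (x_i - xbar)^2

print(f"{'obs':>6} {'dev':>6} {'dev^2':>8}")
for x, d, sq in zip(rates, deviations, squared):
    print(f"{x:>6} {d:>6.0f} {sq:>8,.0f}")

sum_dev = sum(deviations)   # zero, up to floating-point error in general
ss = sum(squared)           # sum of squares, SS
print(f"sum of deviations = {sum_dev:.0f}, SS = {ss:,.0f}")
```

The deviations cancel because the mean is the balance point of the data, which is why they must be squared before they are summed.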
18
Standard Deviation
Metabolic data (cont.)
Variance: $s^2 = \frac{SS}{n - 1} = \frac{214{,}870}{7 - 1} = 35{,}811.667$
Standard deviation: $s = \sqrt{s^2} = \sqrt{35{,}811.667} = 189.24$ cal/day
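As a sketch, the last two steps can be checked in Python against the standard library. The value SS = 214,870 is taken from the calculation table on the preceding slide; the variable names are assumptions for the example.

```python
import math
from statistics import stdev, variance

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]   # cal/day, n = 7
n = len(rates)
ss = 214_870                      # sum of squares from the preceding slide

s2 = ss / (n - 1)                 # sample variance
s = math.sqrt(s2)                 # sample standard deviation
print(f"s^2 = {s2:.3f}, s = {s:.2f}")

# Cross-check: statistics.variance and statistics.stdev also divide by n - 1
assert math.isclose(s2, variance(rates))
assert math.isclose(s, stdev(rates))
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead and would give slightly smaller values.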
19
General rule for rounding means and standard deviations
Report the mean to one more decimal place than the data
To achieve accuracy, intermediate calculations should carry at least one further decimal place
Illustrative example
  Suppose data are recorded to one decimal place (i.e., xx.x)
  Report the mean to two decimal places (i.e., xx.xx)
  Carry all intermediate calculations to at least three decimal places (i.e., xx.xxx)
Even more important: always use common sense and judgment.
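One way to apply the rule in software is to keep full precision throughout and round only when reporting. The sketch below assumes a hypothetical helper `report_mean` and made-up measurements recorded to one decimal place.

```python
from statistics import fmean


def report_mean(values, data_decimals):
    """Format the mean with one more decimal place than the data."""
    # Intermediate arithmetic stays at full floating-point precision;
    # rounding happens only in the final formatting step.
    return f"{fmean(values):.{data_decimals + 1}f}"


# Hypothetical measurements recorded to one decimal place (xx.x)
weights = [12.3, 14.1, 13.9]
print(report_mean(weights, data_decimals=1))
```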
20
TI-30XIIS – about $12
In practice, we often use software or a calculator to check our standard deviation
21
Interpretation of Standard Deviation
Larger standard deviation = greater variability
  e.g., s1 = 15 and s2 = 10: group 1 has more variability
68–95–99.7 rule – Normal data only
  68% of data within 1 SD of the mean, 95% within 2 SD of the mean, and 99.7% within 3 SD of the mean
  e.g., if mean = 30 and SD = 10, then 95% of individuals are in the range 30 ± (2)(10) = 30 ± 20 = (10 to 50)
Chebychev’s rule – All data
  At least 75% of data within 2 SD of the mean
  e.g., if mean = 30 and SD = 10, then at least 75% of individuals are in the range 30 ± (2)(10) = (10 to 50)
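Both rules can be checked numerically. The sketch below is an illustration with assumed function names: it computes the mean ± k SD interval from the slide's example, the Normal coverage for k = 1, 2, 3 using `statistics.NormalDist` (Python 3.8+), and the general Chebychev lower bound 1 − 1/k², which gives the 75% figure at k = 2.

```python
from statistics import NormalDist


def interval(mean, sd, k):
    """Return the range (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)


def normal_coverage(k):
    """Share of a Normal distribution within k SDs of its mean."""
    z = NormalDist()                      # standard Normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


def chebyshev_bound(k):
    """Minimum share of ANY distribution within k SDs (useful for k > 1)."""
    return 1 - 1 / k ** 2


print(interval(30, 10, 2))                # slide example: mean 30, SD 10
for k in (1, 2, 3):
    print(k, f"{normal_coverage(k):.3f}")
print(chebyshev_bound(2))
```

The Chebychev bound is much weaker than the Normal figure at the same k because it has to hold for every possible shape of distribution.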
22
Quartiles and IQR
Quartiles divide the ordered data into four equally sized groups
Q0 = minimum
Q1 = 25th percentile
Q2 = 50th percentile (median)
Q3 = 75th percentile
Q4 = maximum
23
Rule for quartiles
Find the median: Q2
Middle of the lower half of the data set: Q1
Middle of the upper half of the data set: Q3
[Figure: ordered data split into bottom half | top half, with Q1, Q2, and Q3 marked]
IQR = Q3 – Q1 = 42 – 21 = 21
The IQR gives the spread of the middle 50% of the data
24
5-Point Summary (sample.sav)
Q0 = 5 (minimum)
Q1 = 21 (lower hinge)
Q2 = 27.5 (median)
Q3 = 42 (upper hinge)
Q4 = 52 (maximum)
Best descriptive statistics for skewed data
25
Illustrative example (metabolic.sav)
Ordered data: 1362, 1439, 1460, 1614, 1666, 1792, 1867 (median = 1614)
Bottom half (1362, 1439, 1460, 1614): Q1 = (1439 + 1460) / 2 = 1449.5
Top half (1614, 1666, 1792, 1867): Q3 = (1666 + 1792) / 2 = 1729
5-point summary: 1362, 1449.5, 1614, 1729, 1867
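The quartile rule can be sketched in Python. One assumption is made explicit here: when n is odd, the median is counted in both halves. That convention is inferred from this example's Q3 of 1729 and is not stated on the slides; other software may use a different quartile rule and report slightly different values. The function name is hypothetical.

```python
from statistics import median


def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) using hinge-style quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    # ASSUMPTION: with an odd n, the median belongs to both halves.
    half = (n + 1) // 2
    bottom = ordered[:half]          # lower half of the ordered data
    top = ordered[n - half:]         # upper half of the ordered data
    return (ordered[0], median(bottom), median(ordered), median(top), ordered[-1])


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]   # cal/day, n = 7
print(five_point_summary(rates))
```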
26
Box-and-whiskers plot (boxplot)
5-point summary + “outside values”
Procedure
  Determine the 5-point summary
  Draw a box from Q1 to Q3
  Draw Q2
  Calculate IQR = Q3 – Q1
  Calculate fences
    FLower = Q1 – 1.5(IQR)
    FUpper = Q3 + 1.5(IQR)
  Determine whether there are any outside values; if so, plot them separately
  Determine the inside values and draw whiskers from the box to the inside values
27
Boxplot example
5-point: 5, 21, 27.5, 42, 52
IQR = 42 – 21 = 21
FU = 42 + (1.5)(21) = 73.5
  No values above (outside)
  Upper inside value = 52
FL = 21 – (1.5)(21) = –10.5
  No values below (outside)
  Lower inside value = 5
[Figure: boxplot marking lower inside = 5, Q1 = 21, Q2 = 27.5, Q3 = 42, upper inside = 52]
28
Boxplot example 2
5-point: 3, 22, 25.5, 29, 51
IQR = 29 – 22 = 7
FU = 29 + (1.5)(7) = 39.5
  One outside value (51)
  Upper inside value = 31
FL = 22 – (1.5)(7) = 11.5
  One outside value (3)
  Lower inside value = 21
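The fence calculation can be sketched in Python. The deck does not list the raw data behind this example, so the `ages` values below are hypothetical, constructed only so that their summary matches the numbers on the slide. The function name and the returned dictionary keys are likewise assumptions.

```python
def boxplot_parts(values, q1, q3):
    """Return fences, outside values, and inside (whisker-end) values."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    ordered = sorted(values)
    outside = [x for x in ordered if x < lower_fence or x > upper_fence]
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        "outside": outside,                   # plotted as separate points
        "whiskers": (inside[0], inside[-1]),  # most extreme values within the fences
    }


# HYPOTHETICAL data chosen to reproduce the slide's summary values
ages = [3, 21, 22, 24, 25, 26, 28, 29, 31, 51]
parts = boxplot_parts(ages, q1=22, q3=29)
print(parts)
```

The whiskers stop at the inside values (21 and 31), not at the fences themselves; the fences are only cut-offs for deciding which points are plotted separately.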
29
Boxplot example 3 (metabolic.sav)
5-point: 1362, 1449.5, 1614, 1729, 1867 (slide 25)
IQR = 1729 – 1449.5 = 279.5
FU = 1729 + (1.5)(279.5) = 2148.25
  None outside
  Upper inside = 1867
FL = 1449.5 – (1.5)(279.5) = 1030.25
  None outside
  Lower inside = 1362
30
Interpretation of boxplots
Location
  Position of median
  Position of box
Spread
  Hinge-spread (box length) = IQR
  Whisker-to-whisker spread (range, or range minus the outside values)
Shape
  Symmetry of box
  Size of whiskers
  Outside values (potential outliers)
31
Side-by-side boxplots
Boxplots are especially useful for comparing groups.