Chapter 4. In Chapter 3 we used stemplots to look at shape, central location, and spread of a distribution. In this chapter we use numerical summaries.

1 Chapter 4

2 In Chapter 3 we used stemplots to look at the shape, central location, and spread of a distribution. In this chapter we use numerical summaries to look at central location and spread.

3 Summary Statistics
Central location statistics: mean, median, mode
Spread statistics: range, interquartile range (IQR), variance and standard deviation
Shape statistics exist but are seldom used in practice (not covered)

4 Notation
n ≡ sample size
X ≡ variable (e.g., ages of subjects)
x_i ≡ value for individual i
Σ ≡ sum all values ("capital sigma")
Example: Let X = AGE (n = 10): 21 42 5 11 30 50 28 27 24 52
x_1 = 21, x_2 = 42, …, x_10 = 52
Σx_i = x_1 + x_2 + … + x_10 = 21 + 42 + … + 52 = 290

5 Central Location: Sample Mean
The sample mean is the most common measure of central location: x-bar = (1/n) Σx_i
For the data on the previous slide: x-bar = 290 / 10 = 29.0
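As a minimal sketch, the sample mean of the ten ages from the notation slide can be checked in Python. The variable names are illustrative and do not come from the slides.

```python
# Ages from the notation slide (n = 10)
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

n = len(ages)        # sample size
total = sum(ages)    # sum of all x_i
x_bar = total / n    # sample mean

print(n, total, x_bar)  # 10 290 29.0
```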

6 Example: Sample Mean
The mean is the gravitational center of a batch of numbers

7 Gravitational Center
A skew tips the distribution, causing the mean to shift toward the tail

8 Uses of the Sample Mean
Predicts the value of an observation drawn at random from the sample
Predicts the value of an observation drawn at random from the population
Predicts the population mean µ

9 Population Mean
Same operation as the sample mean, applied to the entire population (N ≡ population size)
Not readily (if ever) available in practice, but conceptually important

10 Central Location: Median
The value with a depth of (n + 1) / 2 in the ordered data
When n is odd → the median is the single middle value
When n is even → average the two middle values
Example: 05 11 21 24 27 28 30 42 50 52
Depth (M) = (10 + 1) / 2 = 5.5, so average the adjacent values at depths 5 and 6
Median = Average (27 and 28) = 27.5

11 More Examples
Example A: 2 4 6 Median = 4
Example B: 2 4 6 8 Median = 5 (average of 4 and 6)
Example C: 6 2 4 Median ≠ 2 (values must be ordered first; the ordered data 2 4 6 have median 4)
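The depth rule can be sketched as a short Python function. The name median_by_depth is hypothetical, and the sketch assumes a non-empty list of numbers.

```python
def median_by_depth(values):
    """Median as the value at depth (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)          # values must be ordered first
    n = len(ordered)
    depth = (n + 1) / 2               # e.g. 5.5 when n = 10
    lower = ordered[int(depth) - 1]   # value at floor(depth), counting from 1
    if depth.is_integer():            # n odd: a single middle value
        return lower
    upper = ordered[int(depth)]       # n even: the next value up
    return (lower + upper) / 2        # average the two middle values

print(median_by_depth([21, 42, 5, 11, 30, 50, 28, 27, 24, 52]))  # 27.5
print(median_by_depth([2, 4, 6]))     # 4
print(median_by_depth([2, 4, 6, 8]))  # 5.0
print(median_by_depth([6, 2, 4]))     # 4
```

Sorting inside the function guards against the mistake shown in Example C.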

12 The Median is Robust
This data set has x-bar = 1600: 1362 1439 1460 1614 1666 1792 1867
Same data set with a data entry error in the last value: 1362 1439 1460 1614 1666 1792 9867
This data set has x-bar = 2743
The median is 1614 in both instances. The median is resistant to skews and outliers.
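A quick check of this comparison in Python, as a sketch. It uses the standard library's statistics.median, and the variable names are illustrative.

```python
from statistics import median

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_error = [1362, 1439, 1460, 1614, 1666, 1792, 9867]  # last value mistyped

def sample_mean(values):
    return sum(values) / len(values)

# The single bad value moves the mean a long way but leaves the median alone.
print(sample_mean(rates), median(rates))                   # 1600.0 1614
print(round(sample_mean(with_error)), median(with_error))  # 2743 1614
```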

13 Mode
Most frequent value in the data set
This data set has a mode of 7: {4, 7, 7, 7, 8, 8, 9}
This data set has no mode: {4, 6, 7, 8} (each point appears once)
The mode is useful only in large data sets with repeating values
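One way to find the mode in Python, as a sketch. The function name modes is hypothetical. It follows the slide's convention that a data set in which every value appears once has no mode, and it returns every tied value when there is more than one.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                 # every value appears once: no mode
        return []
    return [v for v, c in counts.items() if c == top]

print(modes([4, 7, 7, 7, 8, 8, 9]))  # [7]
print(modes([4, 6, 7, 8]))           # []
```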

14 Comparison of Mean, Median, Mode
The mean gets "pulled" by the tail:
mean = median → symmetrical
mean > median → positive skew
mean < median → negative skew

15 Spread ≡ extent to which data vary around the middle point
Back-to-back stemplot, particulates in air (μg/m³), stem ×10:
Site 1 |   | Site 2
    42 | 2 |
     8 | 2 |
     2 | 3 | 234
    86 | 3 | 6689
     2 | 4 | 0
       | 4 |
       | 5 |
       | 5 |
       | 6 |
     8 | 6 |
Sites have similar central locations: medians = 34 and 36, respectively
Site 1 exhibits much greater spread (visually)

16 Spread: Range
Range = maximum − minimum
Illustrative example (same back-to-back stemplot as on the previous slide):
Site 1 range = 68 − 22 = 46
Site 2 range = 40 − 32 = 8
The sample range is not a good measure of spread: it tends to underestimate the population range
Always supplement the range with at least one additional measure of spread
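The two ranges can be reproduced with a short Python sketch. The lists below are the values as read off the back-to-back stemplot (an assumption about how the flattened plot should be read), and sample_range is a hypothetical helper name.

```python
# Values read off the stemplot (stem x10), particulates in air
site1 = [22, 24, 28, 32, 36, 38, 42, 68]
site2 = [32, 33, 34, 36, 36, 38, 39, 40]

def sample_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

print(sample_range(site1))  # 46
print(sample_range(site2))  # 8
```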

17 Spread: Interquartile Range
Quartile 1 (Q1): marks the bottom quarter of the data = middle of the lower half of the data set
Quartile 3 (Q3): marks the top quarter of the data = middle of the top half of the data set
Interquartile Range (IQR) = Q3 − Q1; covers the middle 50% of the distribution
Data: 05 11 21 24 27 28 30 42 50 52
Q1 = 21, median = 27.5, Q3 = 42, and IQR = 42 − 21 = 21

18 Five-Point Summary
Q0 (the minimum)
Q1 (25th percentile)
Q2 (median)
Q3 (75th percentile)
Q4 (the maximum)
Data: 05 11 21 24 27 28 30 42 50 52
5-point summary = 5, 21, 27.5, 42, 52

19 Quartiles: Tukey's Hinges
Data: metabolic rates (cal/day), n = 7: 1362 1439 1460 1614 1666 1792 1867
Median = 1614
When n is odd, include the median in both "halves"
Bottom half: 1362 1439 1460 1614, so Q1 = 1449.5
Top half: 1614 1666 1792 1867, so Q3 = 1729
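A minimal Python sketch of the hinge method follows. The function names are hypothetical. Statistical software often uses other quartile definitions, so library results may differ slightly from these hinges.

```python
def median_of_sorted(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2:                                    # odd n: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle two

def five_point_summary(values):
    """Minimum, Q1, median, Q3, maximum, with quartiles as Tukey's hinges."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2     # when n is odd, the median falls in both halves
    q1 = median_of_sorted(ordered[:half])
    q3 = median_of_sorted(ordered[n - half:])
    return ordered[0], q1, median_of_sorted(ordered), q3, ordered[-1]

print(five_point_summary([1362, 1439, 1460, 1614, 1666, 1792, 1867]))
# (1362, 1449.5, 1614, 1729.0, 1867)
print(five_point_summary([5, 11, 21, 24, 27, 28, 30, 42, 50, 52]))
# (5, 21, 27.5, 42, 52)
```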

20 §4.6 Boxplots
1. Draw a box from Q1 to Q3
2. Draw a line for the median
3. Calculate fences: Fence Lower = Q1 − 1.5(IQR); Fence Upper = Q3 + 1.5(IQR)
4. Do not draw the fences
5. Any values outside the fences (outside values) are plotted separately
6. Determine the most extreme values still inside the fences (inside values)
7. Draw whiskers from the quartiles to the inside values

21 Example 1: Boxplot
Data: 05 11 21 24 27 28 30 42 50 52
1. 5-point summary: {5, 21, 27.5, 42, 52}; box from 21 to 42 with line at 27.5
2. IQR = 42 − 21 = 21; F_U = Q3 + 1.5(IQR) = 42 + (1.5)(21) = 73.5; F_L = Q1 − 1.5(IQR) = 21 − (1.5)(21) = −10.5
3. No values above the upper fence; no values below the lower fence
4. Upper inside value = 52; lower inside value = 5; draw whiskers
[Figure: boxplot on an axis from 0 to 60, marking lower inside = 5, Q1 = 21, Q2 = 27.5, Q3 = 42, upper inside = 52]

22 Example 2: Boxplot
Data: 3 21 22 24 25 26 28 29 31 51
1. 5-point summary: 3, 22, 25.5, 29, 51; hinges at 22 and 29
2. IQR = 29 − 22 = 7; F_U = Q3 + 1.5(IQR) = 29 + (1.5)(7) = 39.5; F_L = Q1 − 1.5(IQR) = 22 − (1.5)(7) = 11.5
3. One upper outside value (51); one lower outside value (3)
4. Upper inside value is 31; lower inside value is 21; draw whiskers

23 Example 3: Boxplot
Seven metabolic rates (cal/day): 1362 1439 1460 1614 1666 1792 1867
1. 5-point summary: 1362, 1449.5, 1614, 1729, 1867
2. IQR = 1729 − 1449.5 = 279.5; F_U = Q3 + 1.5(IQR) = 1729 + 1.5(279.5) = 2148.25; F_L = Q1 − 1.5(IQR) = 1449.5 − 1.5(279.5) = 1030.25
3. No outside values
4. Inside values: 1867 and 1362
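The fence calculations in these examples can be sketched in Python. The function boxplot_parts and its dictionary keys are hypothetical names. The quartiles are passed in as the hinges already worked out by hand, so the sketch covers only the fences, outside values, and whisker ends.

```python
def boxplot_parts(values, q1, q3):
    """Fences, outside values, and whisker ends for a boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outside = [x for x in values if x < lower_fence or x > upper_fence]
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        "outside": outside,                       # plotted separately
        "whiskers": (min(inside), max(inside)),   # most extreme inside values
    }

# Example 2 data, with hinges Q1 = 22 and Q3 = 29
data = [3, 21, 22, 24, 25, 26, 28, 29, 31, 51]
parts = boxplot_parts(data, q1=22, q3=29)
print(parts["fences"])    # (11.5, 39.5)
print(parts["outside"])   # [3, 51]
print(parts["whiskers"])  # (21, 31)
```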

24 Boxplots: Interpretation
Central location: position of the median and the box (IQR)
Spread: hinge-spread (IQR), whisker spread, range
Shape: symmetry of the median within the box and of the box within the whiskers, tail length (kurtosis), outside values

25 Spread: Standard Deviation
The standard deviation is the most popular measure of spread
σ ≡ population standard deviation
s ≡ sample standard deviation
Based on deviations around the mean

26 Deviations
Deviation = distance from the mean = x_i − x-bar
This example (mean = 36) shows a deviation of −3 for the data point 33
It shows a deviation of 4 for the data point 40

27 "Sum of squares"
Observation   Deviation       Squared deviation
36            36 − 36 = 0     0² = 0
38            38 − 36 = 2     2² = 4
39            39 − 36 = 3     3² = 9
40            40 − 36 = 4     4² = 16
36            36 − 36 = 0     0² = 0
34            34 − 36 = −2    (−2)² = 4
33            33 − 36 = −3    (−3)² = 9
32            32 − 36 = −4    (−4)² = 16
SUMS          0               SS = 58

28 Sum of Squares (SS), Variance (s²), Standard Deviation (s)
SS = Σ(x_i − x-bar)²
s² = SS / (n − 1)
s = √(s²)

29 Variance & Standard Deviation
Sample variance: s² = Σ(x_i − x-bar)² / (n − 1)
Standard deviation: s = √(s²)
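As a sketch, the sum of squares, sample variance, and standard deviation for the eight observations in the "Sum of squares" slide can be computed step by step in Python. It uses the usual sample formula with n − 1 in the denominator, and the variable names are illustrative.

```python
from math import sqrt

data = [36, 38, 39, 40, 36, 34, 33, 32]   # observations from the SS slide

n = len(data)
x_bar = sum(data) / n                      # sample mean
deviations = [x - x_bar for x in data]     # distance of each value from the mean
ss = sum(d ** 2 for d in deviations)       # sum of squared deviations
variance = ss / (n - 1)                    # sample variance s^2
sd = sqrt(variance)                        # sample standard deviation s

print(x_bar, sum(deviations), ss)          # 36.0 0.0 58.0
print(round(variance, 3), round(sd, 3))    # 8.286 2.878
```

The deviations always sum to zero, which is why they are squared before being added.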

30 Interpretation of Sample Standard Deviation s
Measure of spread
Estimator of the population standard deviation σ
68-95-99.7 rule (Normal distributions)
Chebychev's rule (all distributions)

31 68-95-99.7 Rule
Applies to Normal distributions only!
68% of values within μ ± σ
95% within μ ± 2σ
99.7% within μ ± 3σ
Example: Normal distribution with μ = 30 and σ = 10:
68% of values in 30 ± 10 = 20 to 40
95% in 30 ± (2)(10) = 10 to 50
99.7% in 30 ± (3)(10) = 0 to 60
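The rule can be checked numerically with Python's standard library, as a sketch. statistics.NormalDist requires Python 3.8 or later, and the dictionary name coverage is illustrative.

```python
from statistics import NormalDist

dist = NormalDist(mu=30, sigma=10)   # example values from the slide

coverage = {}
for k in (1, 2, 3):
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    # Proportion of the distribution between low and high
    coverage[k] = dist.cdf(high) - dist.cdf(low)
    print(f"{low:.0f} to {high:.0f}: {coverage[k]:.1%}")
```

The printed proportions are slightly more precise than the rounded 68, 95, and 99.7 percent figures in the rule.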

32 Chebychev's Rule
Applies to all distributions
At least 75% of the values lie within μ ± 2σ
Example: a distribution with μ = 30 and σ = 10 has at least 75% of the values within 30 ± (2)(10) = 30 ± 20 = 10 to 50
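The 75% figure is the k = 2 case of the general Chebychev bound 1 − 1/k², which the slide does not state. A minimal Python sketch, with a hypothetical function name:

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

mu, sigma, k = 30, 10, 2   # example values from the slide
print(mu - k * sigma, mu + k * sigma, chebyshev_bound(k))  # 10 50 0.75
```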

33 Rounding
There is no single rule for rounding
The number of significant digits should reflect the precision of the measurement
Use judgment and "be kind to your reader"
Rough guide: carry at least four significant digits during calculations and round as the final step

34 Choosing Summary Statistics
Always report a measure of central location, a measure of spread, and the sample size
Symmetrical distributions → report the mean and standard deviation
Asymmetrical distributions → report the 5-point summary (or median and IQR)

35 Software and Calculators
Use 'em

