Published by Ami Wilkins. Modified over 9 years ago.
1
M07-Numerical Summaries 1. Department of ISM, University of Alabama, 1995-2003. Lesson Objectives: Learn when each measure of a “typical value” is appropriate (also called “central tendency” or “location”). Learn when each measure of “variation” is appropriate (also called “scatter” or “dispersion”). See how these measures relate to statistical inference, which will be covered later in the course.
2
Statistics is the science of collecting, organizing, summarizing, and interpreting DATA for making decisions.
3
Organize / Summarize Data: Graphical, Numerical.
4
Key Features of Data Distributions: Shape, Typical Value, Spread, Outliers. This section covers two of these: typical value and spread.
5
Measures of Location give “middle” or “typical” values, or “central tendency.” Measures of Variation describe “spread,” “scatter,” or “dispersion” in the data.
6
Measures of Location. 1. Mean: the “center of gravity” of the data (histogram).
7
Formula for the mean. Sample Mean = sum of observations divided by sample size: $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum X_i}{n}$
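As a minimal sketch of the formula above, the sample mean can be computed in a few lines of Python. The function name `sample_mean` is a hypothetical helper, and the data are the seven values this deck uses in its median examples.

```python
def sample_mean(observations):
    """Sum of the observations divided by the sample size n."""
    n = len(observations)
    if n == 0:
        raise ValueError("sample mean is undefined for an empty sample")
    return sum(observations) / n


data = [14, 18, 20, 12, 24, 15, 14]
x_bar = sample_mean(data)
print(round(x_bar, 2))  # 117 / 7, about 16.71
```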
8
The mean is ________________ to extreme values (outliers).
9
2. Median: the midpoint of the distribution. At least half of the observations are at or below the median, and at least half are at or above the median.
10
Note: For n observations, the median is located at the $\frac{n+1}{2}$-th observation in the ordered sample.
11
Example 1. Data: 14, 18, 20, 12, 24, 15, 14 (n = 7, “odd”). Location of median: $\frac{7+1}{2} = 4$th. Median is the middle value of the “ordered” data: 12, 14, 14, 15, 18, 20, 24, so the median is 15. At least half the values are at or greater; at least half are at or lower.
12
Example 2 (median example). Data: 14, 18, 20, 12, 24, 15, 14 (n = 7, “odd”); outlier: 94. Original, $\bar{X}$ = ____; with outlier, $\bar{X}$ = ____. Median is still the middle value. Median is resistant to outliers.
13
Example 3. Data: 14, 18, 20, 12, 24, 15, 14, 214 (n = 8, “even,” with outlier). Location of median: $\frac{8+1}{2} = 4.5$th. Median is the average of the two middle values. Exactly half the values are greater, half lower.
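The effect of an outlier on the mean and on the median can be checked with Python's standard `statistics` module. This is a small sketch using the Example 1 data and the Example 3 data (the same values plus the outlier 214); the variable names are illustrative only.

```python
import statistics

original = [14, 18, 20, 12, 24, 15, 14]   # Example 1, n = 7
with_outlier = original + [214]           # Example 3, n = 8

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

# The mean jumps from about 16.7 to 41.375; the median only moves from 15 to 16.5.
print(mean_before, mean_after)
print(median_before, median_after)
```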
14
Summary for finding the median: 1. Order the data. 2. For odd n, the median is the center observation. 3. For even n, the median is the average of the two center observations.
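The three steps above can be sketched directly in Python. The function name `median_by_position` is hypothetical; it follows the (n + 1) / 2 location rule from the earlier note rather than calling a library routine.

```python
def median_by_position(observations):
    """Median via the (n + 1) / 2 location in the ordered sample."""
    ordered = sorted(observations)            # Step 1: order the data.
    n = len(ordered)
    if n == 0:
        raise ValueError("median is undefined for an empty sample")
    if n % 2 == 1:
        # Step 2: odd n, the location (n + 1) / 2 is a whole number (1-based).
        location = (n + 1) // 2
        return ordered[location - 1]
    # Step 3: even n, the location ends in .5, so average the two neighbors.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([14, 18, 20, 12, 24, 15, 14]))        # 15
print(median_by_position([14, 18, 20, 12, 24, 15, 14, 214]))   # 16.5
```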
15
3. Mode: the most frequently occurring number. In a histogram, the modal class is the one having the largest frequency, i.e., the highest bar.
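A short sketch of finding the mode by counting occurrences. The helper name `modes` is hypothetical, and the color list is made-up categorical data used only for illustration; ties are all reported.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


numeric_modes = modes([14, 18, 20, 12, 24, 15, 14])
color_modes = modes(["red", "blue", "red", "green", "blue"])
print(numeric_modes)  # [14]
print(color_modes)    # ['blue', 'red']
```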
16
When should each estimator be used? What type of variable is it? If categorical, use the mode; “average” is meaningless, so look at “percentages” of occurrences. If the variable is quantitative, first look at a graph. Skewed or outliers? Use the median. More or less symmetric? Use the mean.
17
Numerical Summary. Location: Mean, Median, Mode. Variation: Range, Std. Deviation, IQR.
18
Why does variation matter? Mountain climbing rope: two suppliers; sample and test three ropes from each for “Snap Breaking Strength.”
19
Measures of Variation: 1. Range 2. Variance & Standard Deviation 3. Mean Absolute Deviation (MAD) 4. Interquartile Range (IQR)
20
1. Range: highest minus lowest value in the sample.
21
Example 4: 3, 4, 1, 7, 4, 5 (dot plot on an axis from 1 to 7). Example 5: 1, 1, 1, 7, 7, 7 (dot plot on an axis from 1 to 7). Range =
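A quick check of the range for the two example data sets, as a sketch; `sample_range` is a hypothetical helper name. It shows that two samples whose values are spread very differently can still share the same range.

```python
def sample_range(observations):
    """Highest minus lowest value in the sample."""
    return max(observations) - min(observations)


example_4 = [3, 4, 1, 7, 4, 5]
example_5 = [1, 1, 1, 7, 7, 7]

range_4 = sample_range(example_4)
range_5 = sample_range(example_5)
print(range_4, range_5)  # 6 6
```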
22
Range. Advantage: _________ _________________. Disadvantage: _______ most of the data. ______________ to outliers.
23
2. Variance & Standard Deviation. How far are the data from the middle, on average? Notation: Sample Variance = $s^2$; Sample Std. Dev. = $s$; Population Variance = $\sigma^2$; Population Std. Dev. = $\sigma$.
24
Example 4: 3, 4, 1, 7, 4, 5 (dot plot on an axis from 1 to 7).
25
Note: The average of the deviations from the mean will always be zero. We need to keep the negatives from canceling the positives. We can do this by 1. _____________, ______ 2. _____________, _____
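The cancellation described in this note can be seen numerically. The sketch below uses the Example 4 data; taking absolute values and squaring are shown as two common ways to stop negative and positive deviations from canceling, chosen here for illustration.

```python
data = [3, 4, 1, 7, 4, 5]          # Example 4
x_bar = sum(data) / len(data)      # 4.0

deviations = [x - x_bar for x in data]
sum_deviations = sum(deviations)                  # negatives cancel positives
sum_absolute = sum(abs(d) for d in deviations)    # absolute values cannot cancel
sum_squared = sum(d ** 2 for d in deviations)     # squares cannot cancel

print(deviations)       # [-1.0, 0.0, -3.0, 3.0, 0.0, 1.0]
print(sum_deviations)   # 0.0
print(sum_absolute)     # 8.0
print(sum_squared)      # 20.0
```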
26
Equation for Variance (see page 88). For a population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$. For a sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$.
27
Equation for Variance (see page 88): $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$. Example 4 data: $s^2 = \frac{(3-4)^2 + (4-4)^2 + (1-4)^2 + (7-4)^2 + (4-4)^2 + (5-4)^2}{6 - 1}$ = ____ = ____ (units?)
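The sample-variance formula can be sketched in Python and cross-checked against the standard library's `statistics.variance`, which also divides by n - 1. The function name `sample_variance` is a hypothetical helper.

```python
import statistics


def sample_variance(observations):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(observations)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(observations) / n
    return sum((x - x_bar) ** 2 for x in observations) / (n - 1)


example_4 = [3, 4, 1, 7, 4, 5]
s_squared = sample_variance(example_4)
print(s_squared)                       # 4.0
print(statistics.variance(example_4))  # library cross-check
```

The result is in squared units of the data, which is the point of the "units?" prompt on the slide.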
28
Equations for Variance: 1. (see page 88) $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$ 2. (see page 90) $s^2 = \frac{\sum X_i^2 - n\bar{X}^2}{n - 1}$ 3. $s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1}$
29
Example 4: 3, 4, 1, 7, 4, 5
X: 3, 4, 1, 7, 4, 5; $\sum X = 24$
X²: 9, 16, 1, 49, 16, 25; $\sum X^2 = 116$
30
For the Example 4 data ($\sum X_i = 24$, $\sum X_i^2 = 116$, n = 6): $s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1} = \frac{116 - \frac{24^2}{6}}{6 - 1} = 4.0$
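The computational ("shortcut") form can be sketched the same way. It needs only the sum of the observations and the sum of their squares; `variance_shortcut` is a hypothetical helper name.

```python
def variance_shortcut(observations):
    """Sample variance from sum(X) and sum(X^2), without computing deviations."""
    n = len(observations)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    sum_x = sum(observations)                      # 24 for Example 4
    sum_x_sq = sum(x * x for x in observations)    # 116 for Example 4
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)


s_squared_shortcut = variance_shortcut([3, 4, 1, 7, 4, 5])
print(s_squared_shortcut)  # (116 - 576 / 6) / 5 = 4.0
```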
31
Comments: Both equations should give the same answer. The first is easier when the data and the mean are integers. The second is easier for larger data sets or non-integer data. There is more chance of round-off error with the first equation.
32
Variance. Advantage: ________________; ________________. Disadvantages: Units are _________. ____ resistant to outliers.
33
Standard Deviation: “the square root of the variance.” $S = \sqrt{S^2} = \sqrt{4.0} = 2.0$. Advantage: easier to interpret than variance; units are the same as the data.
34
3. Mean Absolute Deviation, MAD. For population data: $\text{MAD} = \frac{\sum |x_i - \mu|}{N}$. For sample data: $\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$ (see page 87). This will be used extensively in OM 300.
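A minimal sketch of the sample MAD, dividing by n as in the formula above. The function name `mean_absolute_deviation` is hypothetical, and the Example 4 data are reused for illustration.

```python
def mean_absolute_deviation(observations):
    """Average absolute distance of the observations from their mean."""
    n = len(observations)
    if n == 0:
        raise ValueError("MAD is undefined for an empty sample")
    center = sum(observations) / n
    return sum(abs(x - center) for x in observations) / n


mad = mean_absolute_deviation([3, 4, 1, 7, 4, 5])
print(round(mad, 3))  # 8 / 6, about 1.333
```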
35
4. Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$. IQR is the range of the middle 50% of the data. Observations more than 1.5 IQRs beyond the quartiles are considered outliers. More on this later.
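The 1.5 IQR rule can be sketched with `statistics.quantiles` (Python 3.8+). Software packages use slightly different quartile conventions, so the quartiles here are those of Python's default "exclusive" method and may differ from a textbook's hand method. The function name `iqr_outliers` is hypothetical, and the data are the Example 3 values from this deck.

```python
import statistics


def iqr_outliers(observations):
    """Return Q1, Q3, the IQR, and observations beyond 1.5 IQRs of the quartiles."""
    q1, _median, q3 = statistics.quantiles(observations, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [x for x in observations if x < lower_fence or x > upper_fence]
    return q1, q3, iqr, flagged


q1, q3, iqr, flagged = iqr_outliers([14, 18, 20, 12, 24, 15, 14, 214])
print(q1, q3, iqr, flagged)  # 14.0 23.0 9.0 [214]
```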
36
Statistical Inference: generalizing from a sample to a population, by using a statistic to estimate a parameter.
37
A statistic (from the sample) estimates a parameter (from the entire population). Mean: $\bar{X}$ estimates $\mu$. Standard deviation: $s$ estimates $\sigma$. Proportion: $\hat{p}$ estimates $p$.
38
Descriptive Statistics: Numerical, Graphical.
39
Example 5: Estimate the true mean net weight of 16 oz. bags of Golden Flake Potato Chips with a 95% confidence interval. Is the filling machine doing what it should be doing? Measured weights in ounces (not real data): 16.05 16.01 15.92 15.68 16.10 16.01 15.72 15.80 16.21 15.70 15.95 16.24 16.02 15.90 16.07 16.05 16.18 15.45 16.04 16.05. Use Minitab.
40
Minitab screen (labels): Session window; Data window; name of worksheet file; most commonly used features.
41
Menu path: “Stat” > “Basic Statistics” > “Display descriptive statistics”
43
“Session Window” results
Results for: c07 Weight of chips.MTW
Descriptive Statistics: Weights
Variable   N    Mean     Median   TrMean   StDev   SE Mean
Weights    20   15.958   16.015   15.970   0.199   0.045
Variable   Minimum   Maximum   Q1       Q3
Weights    15.450    16.240    15.825   16.065
Executing from file: C:\Program Files\MTBWIN\MACROS\Describe.MAC
Descriptive Statistics Graph: Weights
“Five number” summary: Minimum, Q1, Median, Q3, Maximum.
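For readers without Minitab, most of the summary above can be approximated with Python's standard `statistics` module. This is a sketch, not Minitab's own procedure: the trimmed mean is omitted, and the quartiles rely on the default "exclusive" method of `statistics.quantiles`, which happens to agree with the reported Q1 and Q3 for these data. The variable names are illustrative.

```python
import math
import statistics

# Chip weights in ounces, as listed for the Golden Flake example (not real data).
weights = [
    16.05, 16.01, 15.92, 15.68, 16.10, 16.01, 15.72, 15.80, 16.21, 15.70,
    15.95, 16.24, 16.02, 15.90, 16.07, 16.05, 16.18, 15.45, 16.04, 16.05,
]

n = len(weights)
mean = statistics.mean(weights)
median = statistics.median(weights)
st_dev = statistics.stdev(weights)       # sample standard deviation (n - 1)
se_mean = st_dev / math.sqrt(n)          # standard error of the mean
q1, _q2, q3 = statistics.quantiles(weights, n=4)

print(f"N={n} Mean={mean:.4f} Median={median:.3f} "
      f"StDev={st_dev:.3f} SE Mean={se_mean:.3f}")
print(f"Min={min(weights):.3f} Max={max(weights):.3f} Q1={q1:.3f} Q3={q3:.3f}")
```

The mean is printed to four decimals because the exact value, 15.9575, sits on a rounding boundary at three decimals.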
44
Figure: histogram with Normal distribution curve superimposed; box plot; “95% Confidence Interval” for the population mean.
45
“95% Confidence Interval” for the population mean. A confidence interval gives the limits of the plausible values of the true population mean, $\mu$. Our sample mean was 15.957 oz. This is less than 16.000. Should we be concerned? ____, because 16.000 is a plausible value for the true population mean.
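One way to check whether 16.000 is plausible is to rebuild the interval from the reported summary statistics. The sketch below assumes a t-based interval, $\bar{X} \pm t \cdot s / \sqrt{n}$, and hard-codes the critical value 2.093 (the 97.5th percentile of the t distribution with 19 degrees of freedom, taken from a standard t table). The deck does not state which method produced its interval, so treat this as an illustration rather than a reproduction of the Minitab output.

```python
import math

# Summary statistics as reported in the Minitab session output.
n = 20
x_bar = 15.958
s = 0.199

# Assumption: t critical value for 95% confidence and n - 1 = 19 degrees
# of freedom, read from a t table.
t_critical = 2.093

margin = t_critical * s / math.sqrt(n)
lower = x_bar - margin
upper = x_bar + margin

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("16.000 inside interval:", lower <= 16.000 <= upper)
```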