Chapter 2 ~ Descriptive Analysis & Presentation of Single-Variable Data

1 ES Chapter 2 ~ Descriptive Analysis & Presentation of Single-Variable Data
[Figure: histogram titled "Black Bears"; horizontal axis: Length in Inches; vertical axis: Frequency]
Mean: 60.07 inches
Median: 62.50 inches
Range: 42 inches
Variance: 117.681
Standard deviation: 10.85 inches
Minimum: 36 inches
Maximum: 78 inches
First quartile: 51.63 inches
Third quartile: 67.38 inches
Count: 58 bears
Sum: 3438.1 inches

2 ES Chapter Goals
Learn how to present and describe sets of data.
Learn measures of central tendency, measures of dispersion (spread), measures of position, and types of distributions.
Learn how to interpret findings so that we know what the data is telling us about the sampled population.

3 ES Measures of Central Tendency
Numerical values used to locate the middle of a set of data, or where the data is clustered.
The term average is often associated with all measures of central tendency.

4 ES Mean
Mean: The type of average with which you are probably most familiar. The mean is the sum of all the values divided by the total number of values, n:
$\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$
The population mean, $\mu$ (lowercase mu, Greek alphabet), is the mean of all x values for the entire population.
Notes:
We usually cannot measure $\mu$ but would like to estimate its value.
A physical representation: the mean is the value that balances the weights on the number line.

5 ES Example
Example: The following data represents the number of accidents in each of the last 8 years at a dangerous intersection. Find the mean number of accidents: 8, 9, 3, 5, 2, 6, 4, 5.
Solution: $\bar{x} = \frac{1}{8}(8 + 9 + 3 + 5 + 2 + 6 + 4 + 5) = 5.25$
In the data above, change 6 to 26.
Solution: $\bar{x} = \frac{1}{8}(8 + 9 + 3 + 5 + 2 + 26 + 4 + 5) = 7.75$
Note: The mean can be greatly influenced by outliers.
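
The two means in this example can be checked with a short Python sketch. The variable names are illustrative only, and the standard-library `statistics.mean` stands in for the hand calculation.

```python
from statistics import mean

# Accident counts from the example (eight observations)
accidents = [8, 9, 3, 5, 2, 6, 4, 5]
mean_original = mean(accidents)

# Change the value 6 to 26 to introduce an outlier
with_outlier = [26 if x == 6 else x for x in accidents]
mean_outlier = mean(with_outlier)

print(mean_original, mean_outlier)
```

A single changed value moves the mean by 2.5 accidents, which is the sensitivity to outliers that the note describes.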

6 ES Median
Median: The value of the data that occupies the middle position when the data are ranked in order according to size.
Notes:
Denoted by “x tilde”: $\tilde{x}$
The population median, $M$ (uppercase mu, Greek alphabet), is the data value in the middle position of the entire population.
To find the median:
1. Rank the data
2. Determine the depth of the median: $d(\tilde{x}) = \frac{n + 1}{2}$
3. Determine the value of the median

7 ES Example
Example: Find the median for the set of data: {4, 8, 3, 8, 2, 9, 2, 11, 3}
Solution:
1. Rank the data: 2, 2, 3, 3, 4, 8, 8, 9, 11
2. Find the depth: $d(\tilde{x}) = \frac{9 + 1}{2} = 5$
3. The median is the fifth number from either end in the ranked data: $\tilde{x} = 4$
Suppose the data set is {4, 8, 3, 8, 2, 9, 2, 11, 3, 15}:
1. Rank the data: 2, 2, 3, 3, 4, 8, 8, 9, 11, 15
2. Find the depth: $d(\tilde{x}) = \frac{10 + 1}{2} = 5.5$
3. The median is halfway between the fifth and sixth observations: $\tilde{x} = \frac{4 + 8}{2} = 6$
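
The rank, depth, value procedure can be sketched in Python as follows. The function name `median_by_depth` is hypothetical, and the depth is assumed to be (n + 1)/2, which matches the fifth position for nine values and the point halfway between the fifth and sixth for ten values.

```python
def median_by_depth(data):
    """Return (depth, median) using the rank / depth / value steps."""
    ranked = sorted(data)              # step 1: rank the data
    n = len(ranked)
    depth = (n + 1) / 2                # step 2: depth of the median
    if depth.is_integer():
        # odd n: the median is the value at that position (1-based)
        return depth, ranked[int(depth) - 1]
    # even n: halfway between the two middle observations
    lower = ranked[int(depth) - 1]
    upper = ranked[int(depth)]
    return depth, (lower + upper) / 2

print(median_by_depth([4, 8, 3, 8, 2, 9, 2, 11, 3]))
print(median_by_depth([4, 8, 3, 8, 2, 9, 2, 11, 3, 15]))
```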

8 ES Mode & Midrange
Mode: The mode is the value of x that occurs most frequently.
Note: If two or more values in a sample are tied for the highest frequency (number of occurrences), there is no mode.
Midrange: The number exactly midway between the lowest data value L and the highest data value H. It is found by averaging the low and the high values:
$\text{midrange} = \frac{L + H}{2}$

9 ES Example
Example: Consider the data set {12.7, 27.1, 35.6, 44.2, 18.0}.
$\text{Midrange} = \frac{L + H}{2} = \frac{12.7 + 44.2}{2} = 28.45$
Notes:
When rounding off an answer, a common rule-of-thumb is to keep one more decimal place in the answer than was present in the original data.
To avoid round-off buildup, round off only the final answer, not intermediate steps.
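
A minimal Python sketch of the mode and midrange, assuming the convention stated on the previous slide that a tie for the highest frequency means there is no mode. The helper names `mode_or_none` and `midrange` are hypothetical, and the list `[1, 2, 2, 3]` is made-up data used only to show a case with a single mode.

```python
from statistics import multimode

def mode_or_none(data):
    """Return the single most frequent value, or None when values tie."""
    modes = multimode(data)            # every value tied for the top frequency
    return modes[0] if len(modes) == 1 else None

def midrange(data):
    """Average of the lowest value L and the highest value H."""
    return (min(data) + max(data)) / 2

sample = [12.7, 27.1, 35.6, 44.2, 18.0]
print(midrange(sample))
print(mode_or_none(sample))            # every value occurs once, so no mode
print(mode_or_none([1, 2, 2, 3]))      # hypothetical data with one mode
```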

10 ES Measures of Dispersion
Measures of central tendency alone cannot completely characterize a set of data. Two very different data sets may have similar measures of central tendency.
Measures of dispersion are used to describe the spread, or variability, of a distribution.
Common measures of dispersion: range, variance, and standard deviation.

11 ES Range
Range: The difference in value between the highest-valued (H) and the lowest-valued (L) pieces of data:
$\text{range} = H - L$
Other measures of dispersion are based on the following quantity.
Deviation from the Mean: A deviation from the mean, $x - \bar{x}$, is the difference between the value of x and the mean.

12 ES Example
Example: Consider the sample {12, 23, 17, 15, 18}. Find 1) the range and 2) each deviation from the mean.
Solutions:
1) $\text{range} = H - L = 23 - 12 = 11$
2) $\bar{x} = \frac{1}{5}(12 + 23 + 17 + 15 + 18) = 17$

Data    Deviation from Mean
_________________________
12      -5
23       6
17       0
15      -2
18       1
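
The range and the deviations for this sample can be reproduced with a few lines of Python. This is only a sketch with illustrative variable names. It also shows that the deviations sum to zero, which is why the next measures square them first.

```python
from statistics import mean

sample = [12, 23, 17, 15, 18]

data_range = max(sample) - min(sample)      # H - L
x_bar = mean(sample)                        # sample mean
deviations = [x - x_bar for x in sample]    # x - x_bar for each value

print(data_range, x_bar, deviations, sum(deviations))
```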

13 ES Sample Variance & Standard Deviation
Sample Variance: The sample variance, $s^2$, is the mean of the squared deviations, calculated using $n - 1$ as the divisor:
$s^2 = \frac{1}{n - 1}\sum (x - \bar{x})^2$, where n is the sample size
Note: The numerator for the sample variance is called the sum of squares for x, denoted SS(x):
$s^2 = \frac{SS(x)}{n - 1}$, where $SS(x) = \sum (x - \bar{x})^2 = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2$
Standard Deviation: The standard deviation of a sample, s, is the positive square root of the variance:
$s = \sqrt{s^2}$

14 ES Example
Example: Find the 1) variance and 2) standard deviation for the data {5, 7, 1, 3, 8}.
Solutions:
First: $\bar{x} = \frac{1}{5}(5 + 7 + 1 + 3 + 8) = 4.8$

x       x - x̄     (x - x̄)^2
5        0.2       0.04
7        2.2       4.84
1       -3.8      14.44
3       -1.8       3.24
8        3.2      10.24
Sum: 24  0        32.80

1) $s^2 = \frac{1}{4}(32.8) = 8.2$
2) $s = \sqrt{8.2} = 2.86$
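
The calculation above can be mirrored step by step in Python. The sketch below follows the definition (squared deviations divided by n - 1) and then compares the result with the standard-library `statistics.variance` and `statistics.stdev`, which also use the n - 1 divisor. Variable names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

data = [5, 7, 1, 3, 8]
n = len(data)

x_bar = sum(data) / n                          # sample mean
ss_x = sum((x - x_bar) ** 2 for x in data)     # sum of squares SS(x)
s2 = ss_x / (n - 1)                            # sample variance
s = sqrt(s2)                                   # sample standard deviation

print(round(s2, 2), round(s, 2))
print(round(variance(data), 2), round(stdev(data), 2))  # library cross-check
```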

15 ES Notes
The shortcut formula for the sample variance:
$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1}$
The unit of measure for the standard deviation is the same as the unit of measure for the data.
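
As an illustration, the shortcut form of the sum of squares, SS(x) = Σx² - (Σx)²/n, can be applied to the data {5, 7, 1, 3, 8} from the previous example. The sketch uses `fractions.Fraction` only to keep the arithmetic exact; that choice is an assumption of this example, not part of the slides.

```python
from fractions import Fraction

data = [5, 7, 1, 3, 8]
n = len(data)

sum_x = sum(data)                           # sum of x
sum_x2 = sum(x * x for x in data)           # sum of x squared
ss_x = sum_x2 - Fraction(sum_x ** 2, n)     # shortcut sum of squares
s2 = ss_x / (n - 1)                         # sample variance

print(sum_x, sum_x2, float(ss_x), float(s2))
```

The shortcut needs only the two running totals Σx and Σx², so the mean and the individual deviations never have to be computed.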

16 ES Measures of Position
Measures of position are used to describe the relative location of an observation.
Quartiles and percentiles are two of the most popular measures of position.
An additional measure of central tendency, the midquartile, is defined using quartiles.
Quartiles are part of the 5-number summary.

17 ES Quartiles
Quartiles: Values of the variable that divide the ranked data into quarters; each set of data has three quartiles.
1. The first quartile, $Q_1$, is a number such that at most 25% of the data are smaller in value than $Q_1$ and at most 75% are larger
2. The second quartile, $Q_2$, is the median
3. The third quartile, $Q_3$, is a number such that at most 75% of the data are smaller in value than $Q_3$ and at most 25% are larger
[Figure: ranked data, increasing order, divided into quarters by the quartiles]

18 ES Percentiles
Percentiles: Values of the variable that divide a set of ranked data into 100 equal subsets; each set of data has 99 percentiles. The kth percentile, $P_k$, is a value such that at most k% of the data is smaller in value than $P_k$ and at most (100 - k)% of the data is larger.
Notes:
The 1st quartile and the 25th percentile are the same: $Q_1 = P_{25}$
The median, the 2nd quartile, and the 50th percentile are all the same: $\tilde{x} = Q_2 = P_{50}$

19 ES Finding $P_k$ (and Quartiles)
Procedure for finding $P_k$ (and quartiles):
1. Rank the n observations, lowest to highest
2. Compute A = (nk)/100
3. If A is an integer:
   – $d(P_k)$ = A.5 (depth)
   – $P_k$ is halfway between the value of the data in the Ath position and the value of the next data
   If A is a fraction:
   – $d(P_k)$ = B, the next larger integer
   – $P_k$ is the value of the data in the Bth position
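
The procedure above can be sketched in Python as follows. The function name `percentile` is hypothetical, and k is assumed to be a whole number from 1 to 99 so that the integer test on A = nk/100 can be done exactly with `divmod`. The demonstration reuses the ten-value data set from the median example.

```python
def percentile(data, k):
    """k-th percentile using the depth procedure (k a whole number, 1..99)."""
    ranked = sorted(data)                  # step 1: rank the observations
    n = len(ranked)
    whole, remainder = divmod(n * k, 100)  # step 2: A = nk/100
    if remainder == 0:
        # A is an integer: halfway between the A-th value and the next one
        return (ranked[whole - 1] + ranked[whole]) / 2
    # A is a fraction: take the value at the next larger integer position B
    b = whole + 1
    return ranked[b - 1]

data = [4, 8, 3, 8, 2, 9, 2, 11, 3, 15]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

Note that other software often uses interpolation rules that differ from this procedure, so library quantile functions may return slightly different values for the same data.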

20 ES Example
Example: The following data represents the pH levels of a random sample of swimming pools in a California town. Find: 1) the first quartile, 2) the third quartile, and 3) the 37th percentile:
[Table: 20 pH values; entries not recovered]
Solutions:
1) k = 25: (20)(25)/100 = 5, depth = 5.5, $Q_1$ = 6
2) k = 75: (20)(75)/100 = 15, depth = 15.5, $Q_3$ = 6.95
3) k = 37: (20)(37)/100 = 7.4, depth = 8, $P_{37}$ = 6.2

21 ES 5-Number Summary
5-Number Summary: The 5-number summary is composed of:
1. L, the smallest value in the data set
2. $Q_1$, the first quartile (also $P_{25}$)
3. $\tilde{x}$, the median (also $P_{50}$ and 2nd quartile)
4. $Q_3$, the third quartile (also $P_{75}$)
5. H, the largest value in the data set
Notes:
The 5-number summary indicates how much the data is spread out in each quarter.
The interquartile range is the difference between the first and third quartiles. It is the range of the middle 50% of the data.

22 ES Box-and-Whisker Display
Box-and-Whisker Display: A graphic representation of the 5-number summary:
The five numerical values (smallest, first quartile, median, third quartile, and largest) are located on a scale, either vertical or horizontal.
The box is used to depict the middle half of the data that lies between the two quartiles.
The whiskers are line segments used to depict the other half of the data.
One line segment represents the quarter of the data that is smaller in value than the first quartile.
The second line segment represents the quarter of the data that is larger in value than the third quartile.

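23 ES Example
Example: A random sample of students in a sixth grade class was selected. Their weights are given in the table below. Find the 5-number summary for this data and construct a boxplot:
63 64 76 76 81 83 85 86 88 89 90 91 92 93 93 93 94 97 99 99 99 101 108 109 112
Solution: With n = 25, the depths are $d(Q_1) = 7$, $d(\tilde{x}) = 13$, and $d(Q_3) = 19$, so the 5-number summary is L = 63, $Q_1$ = 85, $\tilde{x}$ = 92, $Q_3$ = 99, H = 112.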
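
The 5-number summary for these weights can be computed with a short Python sketch. It assumes the depth procedure for $P_k$ from the earlier slide (A = nk/100, with A.5 or the next larger integer as the depth); the function names are hypothetical.

```python
def percentile(data, k):
    """k-th percentile by the depth procedure (k a whole number, 1..99)."""
    ranked = sorted(data)
    whole, remainder = divmod(len(ranked) * k, 100)   # A = nk/100
    if remainder == 0:
        return (ranked[whole - 1] + ranked[whole]) / 2
    return ranked[whole]          # position B = whole + 1, zero-based index

def five_number_summary(data):
    """Return (L, Q1, median, Q3, H)."""
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))

weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
           93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112]

summary = five_number_summary(weights)
iqr = summary[3] - summary[1]     # interquartile range, Q3 - Q1
print(summary, iqr)
```

The five values are the positions drawn in a box-and-whisker display: the box runs from $Q_1$ to $Q_3$ and the whiskers extend to L and H.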

24 ES Boxplot for Weight Data
[Figure: boxplot titled "Weights from Sixth Grade Class"; axis: Weight, scale from 60 to 110]

25 ES z-Score
z-Score: The position a particular value of x has relative to the mean, measured in standard deviations. The z-score is found by the formula:
$z = \frac{x - \bar{x}}{s}$
Notes:
Typically, the calculated value of z is rounded to the nearest hundredth.
The z-score measures the number of standard deviations above/below, or away from, the mean.
z-scores typically range from -3.00 to +3.00.
z-scores may be used to make comparisons of raw scores.

26 ES Example
Example: A certain data set has mean 35.6 and standard deviation 7.1. Find the z-scores for 46 and 33.
Solutions:
$z = \frac{x - \bar{x}}{s} = \frac{46 - 35.6}{7.1} = 1.46$
46 is 1.46 standard deviations above the mean.
$z = \frac{x - \bar{x}}{s} = \frac{33 - 35.6}{7.1} = -0.37$
33 is 0.37 standard deviations below the mean.
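
A minimal Python sketch of the same calculation, with a hypothetical helper `z_score` and the result rounded to the nearest hundredth as the notes on the z-score slide suggest:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

x_bar, s = 35.6, 7.1
z_46 = round(z_score(46, x_bar, s), 2)
z_33 = round(z_score(33, x_bar, s), 2)
print(z_46, z_33)
```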

27 ES Interpreting & Understanding Standard Deviation
Standard deviation is a measure of variability, or spread.
Two rules for describing data rely on the standard deviation:
– Empirical rule: applies to a variable that is normally distributed
– Chebyshev’s theorem: applies to any distribution

28 ES Empirical Rule
Empirical Rule: If a variable is normally distributed, then:
1. Approximately 68% of the observations lie within 1 standard deviation of the mean
2. Approximately 95% of the observations lie within 2 standard deviations of the mean
3. Approximately 99.7% of the observations lie within 3 standard deviations of the mean
Notes:
The empirical rule is more informative than Chebyshev’s theorem since we know more about the distribution (normally distributed).
Also applies to populations.
Can be used to determine if a distribution is normally distributed.

29 ES Illustration of the Empirical Rule
[Figure: normal curve with the central regions containing 68%, 95%, and 99.7% of the observations marked]

30 ES Example
Example: A random sample of plum tomatoes was selected from a local grocery store and their weights recorded. The mean weight was 6.5 ounces with a standard deviation of 0.4 ounces. If the weights are normally distributed:
1) What percentage of weights fall between 5.7 and 7.3?
2) What percentage of weights fall above 7.7?
Solutions:
1) $(\bar{x} - 2s, \bar{x} + 2s) = (6.5 - 2(0.4), 6.5 + 2(0.4)) = (5.7, 7.3)$
Approximately 95% of the weights fall between 5.7 and 7.3.
2) $(\bar{x} - 3s, \bar{x} + 3s) = (6.5 - 3(0.4), 6.5 + 3(0.4)) = (5.3, 7.7)$
Approximately 99.7% of the weights fall between 5.3 and 7.7.
Approximately 0.3% of the weights fall outside (5.3, 7.7).
Approximately (0.3/2) = 0.15% of the weights fall above 7.7.
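
The intervals in this example can be reproduced with a small Python sketch. The helper `interval` is hypothetical, and the split of the area outside three standard deviations into two equal tails relies on the symmetry of the normal distribution.

```python
def interval(mean, sd, k):
    """Interval from k standard deviations below to k above the mean."""
    return (mean - k * sd, mean + k * sd)

x_bar, s = 6.5, 0.4
low2, high2 = interval(x_bar, s, 2)    # about 95% of the weights
low3, high3 = interval(x_bar, s, 3)    # about 99.7% of the weights

outside_3sd = 100 - 99.7               # percent outside the 3-sd interval
above_upper = outside_3sd / 2          # symmetric, so half lies above

print(round(low2, 2), round(high2, 2))
print(round(low3, 2), round(high3, 2))
print(round(above_upper, 2))
```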

31 ES A Note about the Empirical Rule
Note: The empirical rule may be used to determine whether or not a set of data is approximately normally distributed:
1. Find the mean and standard deviation for the data
2. Compute the actual proportion of data within 1, 2, and 3 standard deviations from the mean
3. Compare these actual proportions with those given by the empirical rule
4. If the proportions found are reasonably close to those of the empirical rule, then the data is approximately normally distributed
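
The four steps can be sketched in Python as below. The function name `proportions_within` is hypothetical, and the sixth-grade weights from the boxplot example are reused purely as illustration data; the slides do not apply this check to them. Deciding what counts as "reasonably close" in step 4 is left to the reader.

```python
from statistics import mean, stdev

def proportions_within(data, max_k=3):
    """Proportion of the data within 1..max_k standard deviations of the mean."""
    x_bar = mean(data)                     # step 1: mean
    s = stdev(data)                        # step 1: sample standard deviation
    result = {}
    for k in range(1, max_k + 1):
        low, high = x_bar - k * s, x_bar + k * s
        inside = sum(1 for x in data if low <= x <= high)
        result[k] = inside / len(data)     # step 2: actual proportion
    return result

weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
           93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112]

observed = proportions_within(weights)
empirical = {1: 0.68, 2: 0.95, 3: 0.997}
for k in (1, 2, 3):                        # step 3: compare side by side
    print(f"within {k} sd: observed {observed[k]:.2f}, empirical rule {empirical[k]}")
```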

32 ES Chebyshev’s Theorem
Chebyshev’s Theorem: The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any positive number larger than 1. This theorem applies to all distributions of data.
[Figure: illustration of Chebyshev’s theorem]

33 ES Important Reminders!
Chebyshev’s theorem is very conservative and holds for any distribution of data.
Chebyshev’s theorem also applies to any population.
The two most common values used to describe a distribution of data are k = 2, 3.
The table below lists some values for k and $1 - \frac{1}{k^2}$:
[Table: values of k and $1 - \frac{1}{k^2}$; entries not recovered]
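
Values of the bound 1 - (1/k²) can be generated with a short Python sketch. The k values chosen here (1.5, 2, 2.5, 3) are an assumption for illustration and are not necessarily the ones in the original table; the function name `chebyshev_bound` is hypothetical.

```python
def chebyshev_bound(k):
    """Minimum proportion of any distribution within k standard deviations."""
    if k <= 1:
        raise ValueError("k must be larger than 1")
    return 1 - 1 / k**2

for k in (1.5, 2, 2.5, 3):    # illustrative k values
    print(f"k = {k}: at least {chebyshev_bound(k):.2f}")
```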

34 ES Example
Example: At the close of trading, a random sample of 35 technology stocks was selected. The mean selling price was 67.75 and the standard deviation was 12.3. Use Chebyshev’s theorem (with k = 2, 3) to describe the distribution.
Solutions:
Using k = 2: At least 75% of the observations lie within 2 standard deviations of the mean:
$(\bar{x} - 2s, \bar{x} + 2s) = (67.75 - 2(12.3), 67.75 + 2(12.3)) = (43.15, 92.35)$
Using k = 3: At least 89% of the observations lie within 3 standard deviations of the mean:
$(\bar{x} - 3s, \bar{x} + 3s) = (67.75 - 3(12.3), 67.75 + 3(12.3)) = (30.85, 104.65)$
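
A minimal Python sketch of this example, pairing the Chebyshev bound 1 - (1/k²) with the interval of k standard deviations around the mean. The function name `chebyshev_summary` is hypothetical.

```python
def chebyshev_summary(mean, sd, k):
    """Return (minimum proportion, (low, high)) for k standard deviations."""
    bound = 1 - 1 / k**2
    return bound, (mean - k * sd, mean + k * sd)

x_bar, s = 67.75, 12.3
bound2, (low2, high2) = chebyshev_summary(x_bar, s, 2)
bound3, (low3, high3) = chebyshev_summary(x_bar, s, 3)

print(f"at least {bound2:.0%} within ({low2:.2f}, {high2:.2f})")
print(f"at least {bound3:.0%} within ({low3:.2f}, {high3:.2f})")
```

Because the theorem makes no assumption about the shape of the distribution, these percentages are lower bounds; the actual proportions in the sample may be considerably higher.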
