Presentation is loading. Please wait.

Presentation is loading. Please wait.

Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Numerically Summarizing Data 3.

Similar presentations


Presentation on theme: "Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Numerically Summarizing Data 3."— Presentation transcript:

1 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Numerically Summarizing Data 3

2

3 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Measures of Central Tendency 3.1

4 3-4 The arithmetic mean of a variable is computed by adding all the values in the data set and then dividing by the number of observations: “N” or “n”.

5 3-5 The population mean, μ (mew), is computed using all the individuals in a population. The population mean is a parameter. The sample mean, (x-bar), is computed using sample data. It is a statistic.

6 3-6 If x 1, x 2, …, x N are the N observations of a variable from a population, then the population mean, µ, is:

7 3-7 If x 1, x 2, …, x n are the n observations of a variable from a sample, then the sample mean,, is

8 3-8 EXAMPLE Computing a Population Mean and a Sample Mean The following data represent the travel times (min) to work for all seven employees of a company. 23, 36, 23, 18, 5, 26, 43 (a)Compute the population mean of this data. (b)Then, take a simple random sample of n = 3 employees. Compute the sample mean. (c) Then, take a second simple random sample of n = 3 employees. Again compute the sample mean.

9 3-9 EXAMPLEComputing a Population Mean (a)

10 3-10 EXAMPLEComputing a Sample Mean (b) Obtain a simple random sample of size n = 3 from the population of seven employees. Use this simple random sample to determine a sample mean. Find a second simple random sample and determine that sample mean. 1 2 3 4 5 6 7 23, 36, 23, 18, 5, 26, 43 Hint: Use Calc RandInt Recall, pop mean = 24.9 min

11 3-11 The median of a variable is the value that lies in the middle of the data when arranged (sorted) in ascending order (Calc:Stat:SortA) It is the value for which there are an equal number of actual data pieces above and below it. The median may/may not be an actual piece of data..

12 3-12 Finding the Median of a Data Set 1Enter the data into the Calc as a List (Stat) 2 Sort the data in ascending order. Assign each data piece a Rank starting at Min. 3Determine the number of observations, n. 4Determine the observation in the middle of the data set: Rank: (n+1)/2

13 3-13 Steps in Finding the Median of a Data Set If n is odd, then the median is the actual data value in the middle of the data set. If n is even, then the median is the mean of the two middle observations.

14 3-14 EXAMPLEComputing a Median of a Data Set The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company. 23, 36, 23, 18, 5, 26, 43 Determine the median of this data. Step 1: Sort (A): 5, 18, 23, 23, 26, 36, 43 Step 2: So, the med is Rank = 4 5, 18, 23, 23, 26, 36, 43

15 3-15 EXAMPLEComputing a Median of a Data Set Suppose the start-up company hires a new employee. The travel time of the new employee is 70 minutes. Determine the median of the “new” data set. 23, 36, 23, 18, 5, 26, 43, 70 5, 18, 23, 23, 26, 36, 43, 70 med is Rank 4 ½ piece of data

16 3-16 EXAMPLEComputing a Median of a Data Set The following data represent the travel times (in minutes) to work for all seven employees of a start-up company. 23, 36, 23, 18, 5, 26, 43 Suppose a new employee is hired who has a 130 min commute. How does this impact the value of the mean and median? Mean before new hire: 24.9 minutes Mean after new hire: 38 minutes Median before new hire: 23 minutes Median after new hire: 24.5 minutes

17 3-17 A numerical summary of data (mean, median, etc) is said to be resistant if “extreme” data values (very large or very small) do not affect its value significantly.

18 3-18

19 3-19 EXAMPLE Describing the Shape of the Distribution The following data represent the asking price ($) of homes for sale in Lincoln, NE. Source: http://www.homeseekers.com 79,995128,950149,900189,900 99,899130,950151,350203,950 105,200131,800154,900217,500 111,000132,300159,900260,000 120,000134,950163,300284,900 121,700135,500165,000299,900 125,950138,500174,850309,900 126,900147,500180,000349,900

20 3-20 The mean asking price is $168,320 and the median asking price is $148,700. Therefore, we would conjecture (estimate) that the distribution is skewed right.

21 3-21

22 3-22 The mode of a variable is the most frequently occurring observation of the variable (if there is one.) If no data piece occurs more than once, we say the data have no mode. A set of data can have no mode, one mode, or more than one mode.

23 3-23 EXAMPLE Finding the Mode of a Data Set The data on the next slide represent the Vice Presidents of the United States and their state of birth. Find the mode, if there is one.

24 3-24 Joe Biden Pennsylvani a

25 3-25 The mode is New York.

26 3-26 Tally data to determine most frequent observation

27

28 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Measures of Dispersion (Variation) 3.2

29 3-29 The range, R, of a variable is the difference between the largest data value and the smallest data value. Range = R = max - min value

30 3-30 EXAMPLEFinding the Range of a Set of Data The following data represent the travel times (min) to work for all seven employees of a start- up company: 23, 36, 23, 18, 5, 26, 43 Find the range of the data. Range = (max – min) 43 – 5 = 38 minutes

31 3-31 The population standard deviation of a variable is the square root of the mean of the squared deviations from the population mean. The population standard deviation is symbolically represented by “σ” (lowercase Greek letter sigma).

32 3-32 Computing Population Standard Deviation The following data represent the travel times (min) to work for all seven employees of a start-up company. 23, 36, 23, 18, 5, 26, 43 Compute the population standard deviation of this data. Hint: First, put the data into a TI-84 List, then find the mean = 24.85714 min

33 3-33 xixi μ x i – μ(x i – μ) 2 2324.85714-1.857143.44898 3624.8571411.14286124.1633 2324.85714-1.857143.44898 1824.85714-6.8571447.02041 524.85714-19.8571394.3061 2624.857141.1428571.306122 4324.8571418.14286329.1633 902.8571

34 3-34 The sample standard deviation, “s” of a variable uses the same computation except we divide by (n – 1) instead of N. “n” is the sample size and “N” is the population size of the data set.

35 3-35 EXAMPLEComputing a Sample Standard Deviation Here are the results of a random sample of three times taken from the travel times (min) to work for all seven employees of a start-up company: 5, 26, 36 Find the sample standard deviation “s”.

36 3-36 Recall, the Pop std dev (using all 7 times) was 11.36 min.

37 If you have final exam grades for two different math classes (A & B), and if the data is more dispersed (larger Range) for one class (A) than the other, then the standard deviation of that class (A) will be larger than the other.

38 3-38 The variance of a variable is the square of the standard deviation. The math symbol for the population variance is σ 2, and for the sample variance is s 2

39 3-39 EXAMPLE Computing a Population Variance The following data represent the travel times (min) to work for all seven employees of a start-up web development company. 23, 36, 23, 18, 5, 26, 43 Compute the population and sample variance of this data.

40 3-40 EXAMPLE Computing a Population Variance Recall from earlier that the pop standard deviation was σ = 11.36 min, so the pop variance is σ 2 = 129.05 From before, the sample std dev was s = 15.82 min, so the sample variance is s 2 = 250.27

41 TI-84 The calculator will give two std devs: “σ ” for a Pop “σ ” for a Pop and “s” for a sample. It will not give any variance or range statistics.

42 3-42 The Empirical Rule 68 – 95 - 99.7 If a distribution is roughly bell shaped, then: 1. Approx 68% of the data will lie within 1 standard deviation of the mean. 2. Approx 95% of the data will lie within 2 standard deviations of the mean. 3. Approx 99.7% of the data will lie between 3 std dev’s of the mean.

43 3-43

44 3-44 EXAMPLE Using the Empirical Rule The following data represent the blood HDL cholesterol levels of all 54 female patients of Dr. Dracula. 414843383537444444 627577588239855554 676969706572747474 606060616263646464 545455565656575859 454747484850525253

45 3-45 Using a TI-84 we find:

46 3-46 22.3 34.0 45.7 57.4 69.1 80.8 92.5 Actually, 45 of the 54, or 83.3% of his patients have HDL between 34.0 and 69.1. According to the Empirical Rule, 99.7% of the patients will have HDL within 3 standard deviations of the mean. 13.5% + 34% + 34% = 81.5% of all patients will have HDL between 34.0 and 69.1 according to the Empirical Rule.

47 3-47 Chebychev’s Theorem For any type distribution (Normal or not), at least of the data lie within “k” std devs of the mean, (k >1).

48 3-48 EXAMPLE Using Chebychev’s Theorem Using the data from the HDL blood example, use Chebychev’s Theorem to determine the percentage of patients that have HDL levels within 3 std dev (SD) of the mean. at least (b) the actual percentage of his patients that had HDL between 34 and 80.8 (within 3 SD of mean). 52/54 ≈ 0.96 ≈ 96%

49 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Measures of Central Tendency /Dispersion from Grouped Data 3.3

50 3-50 We have discussed computing statistics from raw data, but often the only available data have already been summarized into frequency distributions called “grouped data”. We cannot find exact values of the mean/std dev without raw data, but we can approximate these measures using the following techniques….

51 3-51 1. Approximate the Mean of a Variable from a Frequency Distribution where x i is the midpoint or value of the i th class f i is the frequency of the i th class n is the number of classes

52 3-52 Hours01-56-1011-1516-2021-2526-3031-35 Freq01302502301801006050 The National Survey of Student Engagement is a 2007 survey asking freshman college students how much time they spend preparing for class each week. Approximate the mean number of hours spent preparing for class each week. EXAMPLEApproximating the Mean from a Frequency Distribution

53 3-53 Time (hr)FrequencyMP = x i x i f i 0000 1 - 51303390 6 - 1025082000 11 - 15230132990 16 - 20180183240 21 - 25100232300 26 – 3060281680 31 – 3550331650

54 3-54 2. The weighted mean, of a variable is found by multiplying each value of the variable by its corresponding weight, adding these products, and dividing by the sum of the weights. where w is the weight of the i th observation x i is the value of the i th observation

55 3-55 EXAMPLE Computed a Weighted Mean Kayla goes to the Nut store and creates her own snack mix. She combines 1 pound of raisins, 2 pounds of chocolate-covered peanuts, and 1.5 pounds of cashews. The raisins cost $1.25 per pound, the chocolate covered peanuts cost $3.25 per pound, and the cashews cost $5.40 per pound. What is the mean cost per pound of this mix?

56 3-56 Approximate the Standard Deviation of a Variable from a Frequency Distribution Sample Standard Deviation Population Standard Deviation where x i is the midpoint of the i th class f i is the frequency of the i th class

57 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Measures of Position and Outliers 3.4

58 3-58 A z-score describes how many std dev’s a data piece is from the mean (above or below). There is both a population z-score and a sample z-score: Sample z-score Population z-score The z-score has mean of 0 and standard deviation of 1.

59 3-59 EXAMPLE Using Z-Scores The mean height of adult males is 69.1 inches with a standard deviation of 2.8 inches. The mean height of adult females is 63.7 inches with a standard deviation of 2.7 inches. Who is relatively taller: a man whose height is 83 inches, or a woman whose height is 76 inches? (In other words, which one is further from the mean of their gender?)

60 3-60 The man’s height is 4.96 standard deviations above the male mean. The woman’s height is 4.56 standard deviations above the female mean. The man is relatively taller because his height is further above the mean height of men than hers is above that of women.

61 3-61 The kth percentile, denoted, P k, of a set of data is a value such that “k” percent of the observations are less than or equal to the value.

62 3-62 EXAMPLE Interpret a Percentile The Graduate Record Examination (GRE) is an exam required for admission to many U.S. graduate schools. The University of Pittsburgh Graduate School of Public Health requires a GRE score no less than the 70th percentile for admission into their Human Genetics Master of Science program. Interpret this P 70 admissions requirement.

63 3-63 EXAMPLE Interpret a Percentile The 70 th percentile is the score such that 70% of the individuals who took the exam scored worse, and 30% of the individuals scored the same or better. In order to be admitted to this program, an applicant must score higher than 70% of the people who take the GRE. Or, the individual’s score must be in the top 30%.

64 3-64 EXAMPLE Percentile The following are scores on a Stats exam: 42,50,59,62,68,73,76,81,86,90,94,100 What is the percentile value of the 81 score? 7/12 = 0.58 … and 94 is the score

65 3-65 “Quartiles” divide data into fourths, or four equal parts. The 1 st quartile, Q 1, divides the bottom 25% the data from the top 75%. The 1 st quartile is equivalent to the 25 th percentile. divides the bottom 50% of the data from the top 50%. It is equivalent to the 50 th percentile, which is equivalent to the median. divides the bottom 75% of the data from the top 25%. It is equivalent to the 75 th percentile.

66 3-66 Finding Quartiles Step 1Arrange the data in ascending order. Step 2Determine the median, M, or second quartile, Q 2. Step 3Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, Q 1, is the median of the bottom half, and the third quartile, Q 3, is the median of the top half.

67 3-67 A group of Brigham Young University students collected data on the speed of vehicles traveling through a construction zone on a state highway, where the posted speed was 25 mph. The recorded speed of 14 randomly selected vehicles is given below: 20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40 Find and interpret the quartiles for speed in the construction zone. EXAMPLE Finding and Interpreting Quartiles

68 3-68 EXAMPLE Finding and Interpreting Quartiles n = 14 observations Median = = 32.5 mph The median of the bottom half of the data is Q 1 20, 24, 27, 28, 29, 30, 32 The median of these seven observations is 28. Therefore, Q 1 = 28. The median of the top half of the data is the third quartile, Q 3 and Q 3 = 38.

69 3-69 Interpretation: 25% of the speeds are less than or equal to 28 mph and 75% of the speeds are greater than 28 mph. 50% of the speeds are less than or equal to the median, 32.5 mph, and 50% of the speeds are greater. 75% of the speeds are less than or equal to 38 mph, and 25% of the speeds are greater.

70 3-70 The Interquartile range, IQR, is the range of the middle 50% of the data observations. The IQR is the difference between the third and first quartiles: IQR = Q 3 – Q 1 In the vehicle speed problem, the IQR = 38-28 = 10 mph, and 50% of the observed speeds lie between 28 and 38 mph.

71 3-71 EXAMPLE Determining and Interpreting the Interquartile Range Determine and interpret the interquartile range of the speed data. Q 1 = 28 Q 3 = 38 The range of the middle 50% of car speeds traveling through the construction zone is 10 miles per hour.

72 3-72 Suppose a 15 th car travels through the construction zone at 100 mph. How does this value impact the mean, median, standard deviation, and interquartile range? Without 15 th carWith 15 th car Mean32.1 mph36.7 mph Median32.5 mph33 mph Standard deviation6.2 mph18.5 mph IQR10 mph11 mph

73 3-73 Checking for “Outliers” by Using Quartiles 1. Compute the interquartile range. 2. Determine the fences. Fences serve as cutoff points for determining outliers. Lower Fence = Q 1 – 1.5(IQR): 28-15 = 13 mph Upper Fence = Q 3 + 1.5(IQR): 38+15 = 53 mph 3. Any data value outside the fence is called an “outlier” (asterisked) and does not qualify as a min/max value.

74 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section The 5-Number Summary and Boxplots 3.5

75 3-75 The Five-number summary of a data set consists of the Min data value, Q1, the Median, Q3, and the Max data value as follows:

76 3-76 EXAMPLEObtaining the Five-Number Summary Every six months, the US Federal Reserve Board conducts a survey of credit card bank plans in the U.S. The following data are the interest rates charged by 10 randomly selected banks who issue credit cards for the July 2005 survey. Determine the Five-number summary of the data.

77 3-77 EXAMPLEObtaining the Five-Number Summary InstitutionRate Pulaski Bank and Trust Company6.5% Rainier Pacific Savings Bank12.0% Wells Fargo Bank NA14.4% Firstbank of Colorado14.4% Lafayette Ambassador Bank14.3% Infibank13.0% United Bank, Inc.13.3% First National Bank of The Mid-Cities13.9% Bank of Louisiana9.9% Bar Harbor Bank and Trust Company14.5% Source: http://www.federalreserve.gov/pubs/SHOP/survey.htm

78 3-78 EXAMPLEObtaining the Five-Number Summary Enter the % data into a TI-84 List: 12.0, 13.3, 13.9, 14.3,14.4, 14.5, 9.9, 6.5, 13.0,14.4 1-Var Stats will give the 5-Number Summary as follows: Five-number Summary: 6.5% 12.0% 13.6% 14.4% 14.5%

79 3-79 The interquartile range (IQR) is 14.4% - 12% = 2.4% The lower and upper fences are: Lower Fence = Q 1 – 1.5(IQR) = 12 – 1.5(2.4) = 8.4% Upper Fence = Q 3 + 1.5(IQR) = 14.4 + 1.5(2.4) = 18.0% 5-N: 6.5% 12.0% 13.6% 14.4% 14.5% [ ] *

80 3-80 The bank credit card rate boxplot indicates that the distribution is skewed left. Use a boxplot and quartiles to describe the shape of a distribution.

81 END CHAP 3 CHAP 3 Summarizing Numerical Data

82


Download ppt "Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Numerically Summarizing Data 3."

Similar presentations


Ads by Google