Chapter 2: Descriptive Statistics Lesson 2.1: Frequency Distributions and their Graphs (Part 1)
Frequency Distributions
A frequency distribution is a table that shows the classes or intervals of data entries with the number of entries in each class.
Example: 20, 21, 23, 25, 25, 27, 27, 27, 31, 33, 35, 35, 35, 38, 39, 39, 39, 40, 42, 42
All the classes MUST have the same class width. Here the class width is 5.
Class    Frequency
20-24    3
25-29    5
30-34    2
35-39    7
40-44    3
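The counting step behind the table can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_distribution` and its parameters are made up for this example and are not part of the lesson.

```python
# Data from the slide above
data = [20, 21, 23, 25, 25, 27, 27, 27, 31, 33,
        35, 35, 35, 38, 39, 39, 39, 40, 42, 42]

def frequency_distribution(values, start, width, num_classes):
    """Count how many values fall in each class [lower, lower + width - 1].

    Hypothetical helper; assumes integer data and equal class widths.
    """
    table = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width - 1
        freq = sum(1 for v in values if lower <= v <= upper)
        table.append((lower, upper, freq))
    return table

freq_table = frequency_distribution(data, start=20, width=5, num_classes=5)
for lower, upper, freq in freq_table:
    print(f"{lower}-{upper}: {freq}")
```

Because every class has the same width, each lower limit is just the previous one plus 5.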
Relative Frequency
Relative frequency: frequency of the class divided by the total frequency (or sample size)
Class    Frequency    Relative frequency
20-24    3            3/20 = 0.15
25-29    5            5/20 = 0.25
30-34    2            2/20 = 0.1
35-39    7            7/20 = 0.35
40-44    3            3/20 = 0.15
Cumulative Frequency
Cumulative frequency: the sum of the frequencies of that class and all previous classes
Class    Frequency    Relative frequency    Cumulative frequency
20-24    3            3/20 = 0.15           3
25-29    5            5/20 = 0.25           3+5 = 8
30-34    2            2/20 = 0.1            8+2 = 10
35-39    7            7/20 = 0.35           10+7 = 17
40-44    3            3/20 = 0.15           17+3 = 20
Class Boundaries
Class boundaries: the numbers that separate classes without forming gaps between them. If the class values are integers, subtract 0.5 from the lower class value and add 0.5 to the upper class value.
Class    Frequency    Relative frequency    Cumulative frequency    Class Boundaries
20-24    3            3/20 = 0.15           3                       19.5 – 24.5
25-29    5            5/20 = 0.25           3+5 = 8                 24.5 – 29.5
30-34    2            2/20 = 0.1            8+2 = 10                29.5 – 34.5
35-39    7            7/20 = 0.35           10+7 = 17               34.5 – 39.5
40-44    3            3/20 = 0.15           17+3 = 20               39.5 – 44.5
Class Midpoint
Class midpoint: the middle of the class. This can be found by averaging the lower and upper class values.
Class    Frequency    Relative frequency    Cumulative frequency    Class Boundaries    Class Midpoint
20-24    3            3/20 = 0.15           3                       19.5 – 24.5         22
25-29    5            5/20 = 0.25           3+5 = 8                 24.5 – 29.5         27
30-34    2            2/20 = 0.1            8+2 = 10                29.5 – 34.5         32
35-39    7            7/20 = 0.35           10+7 = 17               34.5 – 39.5         37
40-44    3            3/20 = 0.15           17+3 = 20               39.5 – 44.5         42
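As a rough check on the completed table, the derived columns can be computed from the class limits and frequencies alone. The sketch below assumes integer class limits (so the boundary adjustment is 0.5) and takes the frequency of the 40-44 class to be 3, as implied by the "17+3 = 20" entry. The variable names are illustrative only.

```python
# Class limits and frequencies from the slides (40-44 frequency assumed to be 3)
classes = [(20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
frequencies = [3, 5, 2, 7, 3]

n = sum(frequencies)          # total frequency (sample size)
rows = []
running_total = 0
for (lower, upper), freq in zip(classes, frequencies):
    running_total += freq     # cumulative frequency so far
    rows.append({
        "class": f"{lower}-{upper}",
        "frequency": freq,
        "relative": freq / n,
        "cumulative": running_total,
        "boundaries": (lower - 0.5, upper + 0.5),  # integer data: adjust by 0.5
        "midpoint": (lower + upper) / 2,
    })

for row in rows:
    print(row)
```

The last cumulative frequency should always equal the sample size, which is a quick way to catch a counting mistake.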
Chapter 2: Descriptive Statistics Lesson 2.1: Frequency Distributions and their Graphs (Part 2)
Frequency Histograms
Class      f    Boundaries
67-78      3    66.5-78.5
79-90      5    78.5-90.5
91-102     8    90.5-102.5
103-114    9    102.5-114.5
115-126    5    114.5-126.5
Class      f    Midpoint
67-78      3    72.5
79-90      5    84.5
91-102     8    96.5
103-114    9    108.5
115-126    5    120.5
Relative Frequency Histograms
Class      Frequency    Relative Frequency    Cumulative Frequency
67-78      3            0.10                  3
79-90      5            0.17                  8
91-102     8            0.27                  16
103-114    9            0.30                  25
115-126    5            0.17                  30
[Figure: relative frequency histogram, "Time on Phone"; horizontal axis in minutes, vertical axis relative frequency]
Frequency Polygons
Class      f    Midpoint
67-78      3    72.5
79-90      5    84.5
91-102     8    96.5
103-114    9    108.5
115-126    5    120.5
How to construct a frequency polygon:
Plot the midpoint and frequency.
Connect consecutive midpoints.
Extend the graph to the axis by one class.
[Figure: frequency polygon, "Time on Phone"; horizontal axis in minutes marked at 60.5, 72.5, 84.5, 96.5, 108.5, 120.5, 132.5; vertical axis frequency from 1 to 9]
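Ogives (Cumulative Frequency Graphs)
Class      Frequency    Cumulative Frequency    Cumulative Rel. Frequency
67-78      3            3                       0.100
79-90      5            8                       0.267
91-102     8            16                      0.533
103-114    9            25                      0.833
115-126    5            30                      1.000
Cumulative relative frequency = cumulative frequency divided by the total frequency (30). Divide first and round last; adding already-rounded relative frequencies gives a final value slightly above 1.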
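A short sketch of how the ogive values can be computed, assuming the frequency of the last class is 5 (so that the total is 30). Dividing each cumulative frequency by the total, rather than adding rounded relative frequencies, keeps the last value at exactly 1.

```python
from itertools import accumulate

classes = ["67-78", "79-90", "91-102", "103-114", "115-126"]
frequencies = [3, 5, 8, 9, 5]  # last frequency assumed to be 5 so the total is 30

# Running totals of the frequencies
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# Divide first, round last, to avoid accumulating rounding error
cumulative_relative = [round(c / n, 3) for c in cumulative]

for label, c, cr in zip(classes, cumulative, cumulative_relative):
    print(f"{label:>8}  {c:>3}  {cr:.3f}")
```

To draw the ogive, these cumulative values would be plotted at the upper class boundaries (78.5, 90.5, and so on).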
Classwork
Using the data on page 37, fill in the following table:
Class      Frequency    Relative F.    Cumulative F.    Midpoints    Boundaries
30 – 39    ___          ___            ___              ___          ___
40 – 49    ___          ___            ___              ___          ___
50 – 59    ___          ___            ___              ___          ___
60 – 69    ___          ___            ___              ___          ___
70 – 79    ___          ___            ___              ___          ___
80 – 89    ___          ___            ___              ___          ___
On a sheet of graph paper construct the following:
Histogram
Frequency Polygon
Ogive
Chapter 2: Descriptive Statistics Lesson 2.2: More Graphs and Displays
Constructing Stem & Leaf Plots
Example: 102, 124, 108, 86, 103, 82, 71, 104, 112, 118, 87, 95, 103, 116, 85, 122, 87, 100, 105, 97, 107, 67, 78, 125, 109, 99, 105, 99, 101, 92
Two rows per stem (leaves 0-4, then leaves 5-9):
Stem | Leaf
   6 | 7
   7 | 1
   7 | 8
   8 | 2
   8 | 5 6 7 7
   9 | 2
   9 | 5 7 9 9
  10 | 0 1 2 3 3 4
  10 | 5 5 7 8 9
  11 | 2
  11 | 6 8
  12 | 2 4
  12 | 5
One row per stem:
Stem | Leaf
   6 | 7
   7 | 1 8
   8 | 2 5 6 7 7
   9 | 2 5 7 9 9
  10 | 0 1 2 3 3 4 5 5 7 8 9
  11 | 2 6 8
  12 | 2 4 5
Key: 6 | 7 means 67
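The one-row-per-stem plot can be reproduced with a small Python sketch. The helper name `stem_and_leaf` is hypothetical, and the code assumes whole-number data where the last digit is the leaf.

```python
from collections import defaultdict

data = [102, 124, 108, 86, 103, 82, 71, 104, 112, 118,
        87, 95, 103, 116, 85, 122, 87, 100, 105, 97,
        107, 67, 78, 125, 109, 99, 105, 99, 101, 92]

def stem_and_leaf(values):
    """Group sorted values by stem (all digits but the last) and leaf (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

plot = stem_and_leaf(data)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}")
print("Key: 6 | 7 means 67")
```

Sorting before grouping is what puts the leaves in increasing order within each stem.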
Interpreting Stem & Leaf Plots
Suppose the following stem-and-leaf plot represents the scores on a statistics quiz (out of 50).
a) How many students took the quiz?
b) How many students scored a perfect score of 50?
c) What is the lowest score on the quiz?
d) What score occurs the most frequently (the “mode”)?
e) What advantage(s) does a stem-and-leaf plot have compared to a histogram?
Side-by-side Stem & Leaf Plots
What is the highest grade a man earned on the quiz?
How many women were in the class?
What percent of the class is male?
If a passing grade is 35 or higher, what percent of the females passed?
Overall, do you think women or men did better on this quiz?
Dotplots
Data: 102 124 108 86 103 82 71 104 112 118 87 95 103 116 85 122 87 100 105 97 107 67 78 125 109 99 105 99 101 92
Duplicates: 87, 99, 103, 105
[Figure: dotplot of the data; horizontal axis in minutes marked at 66, 76, 86, 96, 106, 116, 126]
Interpreting Dotplots
The following is a dotplot for a different group of students taking the 50-point statistics quiz.
a) How many students took the quiz?
b) How many students scored a perfect score of 50?
c) What is the lowest score on the quiz?
d) What score represents the mode?
Bar Graphs and Pareto Charts
A bar graph is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency (or relative frequency) on the other axis.
A Pareto chart is a bar graph whose bars are drawn in decreasing order of height, from the most frequent category to the least frequent.
Pie Charts
A pie chart is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.
What percentage of the M&M’s are green?
What color was the most common M&M color?
Suppose 17% of the M&M’s are blue. If you have a bag with 400 M&M’s, how many of them would be blue?
Constructing Pie Charts
Pie charts help visualize the relative proportion of each category. Find the relative frequency for each category and multiply it by 360 degrees to find the central angle.
Central angle for each segment = (number in category / total number) x 360 degrees
Category              Budget    Degrees
Human Space Flight    5.7       143
Technology            5.9       149
Mission Support       2.7       68
Total:                14.3      360
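The central angles in the table can be checked with a few lines of Python. This is a minimal sketch using the budget figures from the slide. Rounding each angle to the nearest degree is an assumption that happens to reproduce the table.

```python
# Budget figures from the slide (units as given there)
budget = {
    "Human Space Flight": 5.7,
    "Technology": 5.9,
    "Mission Support": 2.7,
}

total = sum(budget.values())

# Central angle = relative frequency x 360 degrees, rounded to whole degrees
angles = {category: round(amount / total * 360)
          for category, amount in budget.items()}

for category, angle in angles.items():
    print(f"{category}: {angle} degrees")
```

Rounded angles do not always add up to exactly 360. Here they do, but in general one sector may need a one-degree adjustment.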
Chapter 2: Descriptive Statistics Lesson 2.3: Measures of Central Tendency
Practice Mean, Median, & Mode
Find the mean, median and mode:
20 20 20 20 20 20 21 21 21 21 22 22 22 23 23 23 23 24 24 65
Mean:
Median:
Mode:
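One way to check hand calculations for this practice set is Python's standard `statistics` module. The sketch below also recomputes the mean without the value 65, which is not asked for on the slide but shows how much a single extreme value can move the mean.

```python
import statistics

ages = [20, 20, 20, 20, 20, 20, 21, 21, 21, 21,
        22, 22, 22, 23, 23, 23, 23, 24, 24, 65]

mean = statistics.mean(ages)      # sum of the values divided by how many there are
median = statistics.median(ages)  # middle value (average of the two middle values here)
mode = statistics.mode(ages)      # most frequent value

# Extra comparison (not part of the slide): the mean without the extreme value 65
mean_without_65 = statistics.mean(ages[:-1])

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("mean without 65:", mean_without_65)
```

The median and mode barely react to the 65, while the mean shifts noticeably.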
Shapes of distributions
When describing the center of a data set, use the mean for symmetrical distributions and the median for skewed distributions.
Bell-shaped (symmetric): Mean = Median
Uniform (symmetric): Mean = Median
Skewed left: Mean < Median
Skewed right: Mean > Median
Weighted Mean 1
Example: You are taking AP Biology and your grade is determined by five sources: 50% tests, 15% midterm, 20% final exam, 10% lab work, and 5% homework. Your scores are 86 (tests), 96 (midterm), 82 (final), 98 (lab work), and 100 (homework). What is your weighted mean?
Source      X      W      XW
Tests       ___    ___    ___
Midterm     ___    ___    ___
Final       ___    ___    ___
Labs        ___    ___    ___
Homework    ___    ___    ___
Weighted Mean 2
The average starting salaries (by degree attained) for 20 employees at a company are given below:
2 with a doctorate: $105,000
7 with a master's degree: $63,000
11 with a bachelor's degree: $41,000
What is the mean starting salary for these employees?
Source    X    W    XW
Σw =
Σxw =
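Both weighted-mean exercises follow the same pattern: multiply each value x by its weight w, add the products, and divide by the sum of the weights. A minimal sketch follows; the function name `weighted_mean` is chosen for this example.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w)."""
    total_xw = sum(x * w for x, w in zip(values, weights))
    total_w = sum(weights)
    return total_xw / total_w

# Weighted Mean 1: course grade, weights are the percentages as decimals
scores = [86, 96, 82, 98, 100]            # tests, midterm, final, labs, homework
weights = [0.50, 0.15, 0.20, 0.10, 0.05]
course_grade = weighted_mean(scores, weights)

# Weighted Mean 2: salaries, weights are the number of employees per degree
salaries = [105_000, 63_000, 41_000]      # doctorate, master's, bachelor's
counts = [2, 7, 11]
mean_salary = weighted_mean(salaries, counts)

print(f"course grade: {course_grade:.1f}")
print(f"mean starting salary: ${mean_salary:,.0f}")
```

In the first example the weights add up to 1, so dividing by Σw changes nothing. In the second the weights are counts, so the division is essential.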
Mean of a frequency distribution
Class      x       f     xf
7 – 18     12.5    6     ___
19 – 30    24.5    10    ___
31 – 42    36.5    13    ___
43 – 54    48.5    8     ___
55 – 66    60.5    5     ___
67 – 78    72.5    6     ___
79 – 90    84.5    2     ___
n = 50
*If you are given a class, use the midpoint for x.
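The mean of a frequency distribution is Σxf divided by n, with the class midpoint standing in for x. The sketch below uses the table above and assumes the frequency of the 67 – 78 class is 6, which is the value needed for the frequencies to total n = 50.

```python
# Midpoints (x) and frequencies (f) from the table
# Assumption: f = 6 for the 67-78 class, so that the frequencies sum to 50
midpoints = [12.5, 24.5, 36.5, 48.5, 60.5, 72.5, 84.5]
frequencies = [6, 10, 13, 8, 5, 6, 2]

n = sum(frequencies)
xf = [x * f for x, f in zip(midpoints, frequencies)]  # the xf column
mean = sum(xf) / n

for x, f, product in zip(midpoints, frequencies, xf):
    print(f"x = {x:>5}  f = {f:>2}  xf = {product}")
print("n =", n)
print("mean =", mean)
```

The result is only an approximation of the true mean, because every value in a class is treated as if it sat at the midpoint.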
Practice
Example: Use the frequency distribution to approximate the mean age of the residents of Bow, Wyoming.
Age        x      f      xf
0 – 9      ___    57     ___
10 – 19    ___    68     ___
20 – 29    ___    36     ___
30 – 39    ___    55     ___
40 – 49    ___    71     ___
50 – 59    ___    44     ___
60 – 69    ___    ___    ___
70 – 79    ___    14     ___
80 – 89    ___    8      ___
n =
Σxf =
Chapter 2: Descriptive Statistics Lesson 2.4: Measures of Variation (Part 1)
Compare Team 1 & Team 2
Team 1    Team 2
72        67
73        72
76        76
76        76
78        84
Measures of Central Tendency (the same for both teams): Mean: 75, Median: 76, Mode: 76
Measures of Variation: Range, Variance, Standard Deviation
Range
Simplest measure of variation
Range = maximum – minimum
Team 1 Range: 6 inches
Team 2 Range: 17 inches
Disadvantages of the Range
Ignores the ____________________________
Only uses _____________________________
Sensitive to __________________
Deviations
To calculate measures of variation that use every value in the data set, you first need to know about deviations.
The deviation for each value x is the difference between the value of x and the mean of the data set.
In a population, the deviation for each value x is: x - µ
In a sample, the deviation for each value x is: x - x̄
Variance
Population Variance: σ^2 = Σ(x - µ)^2 / N
Sample Variance: s^2 = Σ(x - x̄)^2 / (n - 1)
If team 1 were a population what is the variance?
Team 1    (x - µ)^2
72        (72 – 75)^2 =
73        (73 – 75)^2 =
76        (76 – 75)^2 =
76        (76 – 75)^2 =
78        (78 – 75)^2 =
Variance:
Standard Deviation
Population Standard deviation: σ = SQRT( Σ(x - µ)^2 / N )
Sample Standard deviation: s = SQRT( Σ(x - x̄)^2 / (n - 1) )
Therefore team 1 has a standard deviation of: SQRT(24/5) = 2.19 inches
What is the standard deviation of team 2?
Team 2    (x - µ)^2
67
72
76
76
84
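The variance and standard deviation steps can be sketched in Python as a check on the hand calculation. The data lists below assume each team has five heights with 76 appearing twice, which is what the stated mean of 75, the mode of 76, and the SQRT(24/5) result require. The function names are illustrative.

```python
import math

# Assumed full data sets (five heights per team, 76 appears twice in each)
team1 = [72, 73, 76, 76, 78]
team2 = [67, 72, 76, 76, 84]

def population_variance(values):
    """Average of the squared deviations from the mean: sum((x - mu)^2) / N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

for name, team in (("Team 1", team1), ("Team 2", team2)):
    print(f"{name}: variance = {population_variance(team):.2f}, "
          f"standard deviation = {population_std(team):.2f} inches")
```

For a sample instead of a population, the only change would be dividing by n - 1 instead of N.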
Using the Calculator
Enter data into a list: “STAT” -> Edit menu (for example, Team 2: 67, 72, 76, 76, 84)
Calculate values: “STAT” -> Calc menu -> 1:1-Var Stats
OUTPUT (the values shown match Team 1: 72, 73, 76, 76, 78)
x̄ = 75        mean
Sx = 2.449    sample standard deviation
σx = 2.19     population standard deviation
n = 5         sample size
minX = 72     minimum value in the data set
Q1 = 72.5     quartile 1
Med = 76      median = quartile 2
Q3 = 77       quartile 3
maxX = 78     maximum value in the data set
Empirical Rule About 68% of the data lies within __ standard deviation of the mean About 95% of the data lies within __ standard deviations of the mean About 99.7% of the data lies within __ standard deviations of the mean
Chapter 2: Descriptive Statistics Lesson 2.4: Measures of Variation (Part 2)
Measures of Variation for Grouped Data
Population Variance: σ^2 = Σ(x - µ)^2 f / N
Sample Variance: s^2 = Σ(x - x̄)^2 f / (n - 1)
Take the square root of variance to get standard deviation.
Sample Standard Deviation
x     f    xf     (x - x̄)^2 f
10    1    ___    ___
13    2    ___    ___
8     3    ___    ___
5     4    ___    ___
1. First find the ______. Do this by ____________________________________________.
2. Then create a column for the ______________________________________________.
3. Find variance by dividing the value in step 2 by _______.
4. Lastly get standard deviation by taking the __________ of variance.
Mean = ∑xf/∑f =
Variance =
Standard Deviation =
Population Standard Deviation
Class    x       f    xf      (x - µ)^2 f
0-9      4.5     3    13.5    675
10-19    14.5    8    116     200
20-29    24.5    6    147     150
30-39    34.5    2    69      450
40-49    44.5    1    44.5    625
1. First find the ______. Do this by ____________________________________________.
2. Then create a column for the ______________________________________________.
3. Find variance by dividing the value in step 2 by _______.
4. Lastly get standard deviation by taking the __________ of variance.
Mean = ∑xf/∑f =
Variance =
Standard Deviation =
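The grouped-data procedure can be sketched as follows, using the midpoints and frequencies from the population table above. This is an illustrative check rather than a prescribed method. The xf values of 69 and 44.5 for the last two classes come from multiplying x by f.

```python
import math

# Midpoints (x) and frequencies (f) from the population table
midpoints = [4.5, 14.5, 24.5, 34.5, 44.5]
frequencies = [3, 8, 6, 2, 1]

N = sum(frequencies)

# Step 1: mean = sum(x * f) / sum(f)
xf = [x * f for x, f in zip(midpoints, frequencies)]
mu = sum(xf) / N

# Step 2: column of (x - mu)^2 * f
squared_dev_f = [(x - mu) ** 2 * f for x, f in zip(midpoints, frequencies)]

# Step 3: population variance divides by N (a sample would divide by n - 1)
variance = sum(squared_dev_f) / N

# Step 4: standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print("xf column:", xf)
print("(x - mu)^2 f column:", squared_dev_f)
print(f"mean = {mu}, variance = {variance}, standard deviation = {std_dev:.2f}")
```

The printed (x - µ)^2 f column should match the last column of the table, which is a useful check on the mean.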
Classwork
In studying the behavior of Old Faithful geyser in Yellowstone National Park, geologists collect data for the time (in minutes) between eruptions. The table below summarizes actual data that were obtained.
(a) What is the mean time between eruptions?
(b) What is the standard deviation for the time between eruptions?
Time between eruptions (min)    Frequency
40-49                           8
50-59                           44
60-69                           23
70-79                           6
80-89                           107
90-99                           11
100-109                         1
Chapter 2: Descriptive Statistics Lesson 2.5: Measures of Position
Fractiles
Fractiles are numbers that divide an ordered data set into equal parts. Common fractiles are:
Quartiles: Divide a data set into 4 equal parts (Q1, Q2, Q3)
Deciles: Divide a data set into 10 equal parts (D1, D2, D3, …, D9)
Percentiles: Divide a data set into 100 equal parts (P1, P2, P3, …, P99)
D3 means ___% of the data falls _____________.
Finding Quartiles
You are managing a store. The average sale for each of 27 randomly selected days in the last year is given. Find Q1, Q2, and Q3.
The data in ranked order are:
17 19 20 23 27 28 30 33 35 37 37 38 39 42 42 43 43 44 45 45 45 46 47 48 48 51 55
The median = Q2 = ___
Q1 is ___ and Q3 is ___
The Interquartile range is Q3 – Q1 = _____________
[Figure: the ranked data split at the median into a lower half and an upper half]
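The "split at the median" approach can be sketched in Python. The helper below is hypothetical and follows one common convention: when the number of values is odd, the median itself is left out of both halves. Calculators and software sometimes use other conventions, so results can differ slightly on other data sets.

```python
from statistics import median

sales = [17, 19, 20, 23, 27, 28, 30, 33, 35, 37, 37, 38, 39, 42,
         42, 43, 43, 44, 45, 45, 45, 46, 47, 48, 48, 51, 55]

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves.

    Assumed convention: for an odd count, the median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles(sales)
iqr = q3 - q1
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

Together with the minimum and maximum, these three values make up the five-number summary used for a box and whisker plot.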
Box and Whisker Plots
A box and whisker plot uses 5 key values to describe a set of data: the minimum value, Q1, Q2, Q3, and the maximum value.
5 Number Summary
Minimum value
First Quartile Q1
The median Q2
Third Quartile Q3
Maximum value
Box and Whisker Plot Practice
Construct a box and whisker plot using the following data:
51 67 72 73 73 75 83 85 85 88 89 90 91 92 96 97 98 100
5 Number Summary
Minimum value:
First Quartile Q1:
The median Q2:
Third Quartile Q3:
Maximum value:
Interquartile Range =____________
[Figure: blank number line marked at 50, 60, 70, 80, 90, 100 for drawing the plot]
The Standard Score (z-score)
The z-score represents the number of standard deviations a value x falls from the mean.
z-scores are calculated using the formula: z = (x - µ) / σ
z-scores can be positive, negative, or 0.
A ________ z-score means the data value falls above the mean
A ________ z-score means the data value falls below the mean
A z-score of _____ means the data value equals the mean
z-scores that fall between ________ are considered usual
z-score practice
The average height for males is 70 inches with a standard deviation of 1.5 inches.
The average height for females is 64 inches with a standard deviation of 2 inches.
Would it be unusual to find a man who is 72 inches tall?
Mrs. Hallbach is 69 inches tall. Is this unusual?
Relative to their own gender, who is “taller”: a man who is 72 inches tall or a woman who is 69 inches tall?
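Comparing values from two different distributions is exactly what z-scores are for. The sketch below computes both z-scores from the means and standard deviations given above. The function name `z_score` is chosen for this example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Values from the practice slide
z_man = z_score(72, mean=70, std_dev=1.5)    # man who is 72 inches tall
z_woman = z_score(69, mean=64, std_dev=2)    # woman who is 69 inches tall

print(f"man:   z = {z_man:.2f}")
print(f"woman: z = {z_woman:.2f}")

# The larger z-score is farther above its own group's mean
relatively_taller = "woman" if z_woman > z_man else "man"
print("relatively taller:", relatively_taller)
```

Whether either height counts as unusual depends on the range of usual z-scores filled in on the previous slide.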