Published by Cornelius Blankenship. Modified over 8 years ago.
1
Module 8 Test Review
2
Now is a chance to review all of the great stuff you have been learning in Module 8! Topics: Statistical Questioning, Measurement of Data, Dot Plots and Histograms, Box Plots, Summarizing Data, and Data with Outliers.
3
Collecting Data Data: Collected information on a given topic. Data can help someone learn more about a particular topic and draw conclusions. Ways to collect data: surveys, research, and interviews.
4
Statistical Process Statistical process: A four-step process that helps someone collect, organize, display, and analyze data on a given topic. Steps of the Statistical Process 1. Form a question that can be answered by data. 2. Design and implement a plan that collects the appropriate data. 3. Analyze the data using graphical and numerical methods. 4. Interpret and compare the data.
5
Statistical Process Step 1 of the statistical process: Form a question that can be answered by data Statistical question: A question that guides research by defining the population of a given topic and that anticipates variability in the collected data. **3 Characteristics of a Statistical Question** Allows for clear responses Includes a specific population Anticipates variability of the data
6
Characteristics of a Statistical Question Allows for clear responses There are many different types of data that can be collected from statistical questions. Two of these types are categorical data and numerical data. In the following examples, notice participants had to select an answer from a list, allowing for clear responses
7
Characteristics of a Statistical Question Includes a specific population A statistical question must include a population. A population defines the group being studied. When you collect data, you can either survey the entire population or a sample of the population. Bad example: How many hours of chores were done last night? Survey example: How many hours of chores did you do last night? Best example: How many hours of chores did students at Timber Middle School do last night?
8
Characteristics of a Statistical Question Anticipates variability of the data A statistical question must expect variability in data. This means it should result in a variety of responses, not just one or two responses. This is most important. Yes or no questions do not provide variability. Bad example: How many letters are in Mr. Smith’s last name? Bad example: Do you know how many letters are in your teacher’s last name? Good example: On average, how many letters are in sixth-grade teachers’ last names?
9
Statistical Questions - Examples Let’s take a look at a few questions to determine whether they are statistical questions. This question is not a statistical question because it has a single answer. Either Sarah or Tim is taller. Because there is only one response, this question does not have variability. This is a statistical question. The population is all the students in the class. You can collect data on the height of each person in the class. Because each person could be a different height, the question anticipates variability.
10
How Does Data Look? Whenever data are collected to answer a statistical question, the data have a distribution. The distribution of data simply shows how often each value in the data set occurs. The type of question, who you ask, and how you ask it can all affect the distribution of the data collected. A distribution is described by its center, spread, and shape.
11
Center Center: When describing a distribution, the average or middle of a data set. The center of the data is around the values of 1 and 2. Remember, the data center is not always located in the middle of the table.
12
Spread Spread: When describing a distribution, the variability of a data set. The data are spread between the values of 0 and 8.
13
Shape Shape: When describing a distribution, the visual look of a data set. The shape of these data rises at the lower values and falls around the higher values.
14
Measurements of Center Center refers to a data set's mean or median. Mean: A measure of center found by dividing the sum of the data set by the size of the data set. Median: A measure of center found by determining the middle number in a data set arranged in numerical order.
15
Finding the Mean and Median To calculate the mean of a data set: Add all the numbers in the data set. Determine the size of the data set by counting how many numbers there are. Divide the sum of the data by the size. To find the median of a data set: Write the numbers in order from least to greatest. Determine whether the data set contains an odd number count or an even number count: For a data set with an odd number count, the median is the middle number. For a data set with an even number count, the median is the mean of the two middle numbers.
16
Finding the Mean and Median Mean First, add all of the values in the data set. 7 + 7 + 8 + 8 + 9 + 9 + 9 + 9 + 10 + 11 + 12 = 99 Next, determine the size of the data set: There are 11 total values in the data set. Lastly, calculate the mean by dividing the sum by the size: 99 ÷ 11 = 9
17
Finding the Mean and Median Median
18
Center of the Data Set In this data set, the mean and median are the same value. This makes 9 a great representation of the center of the data. So, you can conclude that the team’s total scores for the season level out to 9 for each of their games. You can also say that at least half the games played resulted in a score greater than or equal to 9 points.
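The mean and median steps can be sketched in a few lines of Python. This is only an illustration of the two procedures, using the eleven scores from the worked mean example. The function names are chosen here and do not come from the slides.

```python
def mean(values):
    """Add all the values, then divide the sum by the size of the data set."""
    return sum(values) / len(values)


def median(values):
    """Order the values, then take the middle one.

    With an even count there is no single middle value, so the median is
    the mean of the two middle values.
    """
    ordered = sorted(values)
    size = len(ordered)
    middle = size // 2
    if size % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Scores from the worked example: the sum is 99 and the size is 11.
scores = [7, 7, 8, 8, 9, 9, 9, 9, 10, 11, 12]

print("mean:", mean(scores))
print("median:", median(scores))
```

Both calls give 9 for this data set, which matches the slide's conclusion that the mean and median are the same value.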
19
Try It!
20
Check your work
21
Measurements of Spread Spread refers to a data set's range or mean absolute deviation. Range: A measure of variability found by calculating the difference between the greatest and least values in a data set. Mean absolute deviation: A measure of variability found by calculating the mean of the distances of each data point from the mean of a data set.
22
Range and Mean Absolute Deviation To calculate the Range: The range of a data set is calculated by subtracting the smallest value in the data set from the largest. To calculate the mean absolute deviation of a data set: Calculate the mean of the data set. Find the distance each point is from the mean. Calculate the mean of the distances.
23
Range - Example Locate the least value and the greatest value of the data set. Because the numbers are ordered, this task is easier. Least value: 7 Greatest value: 12 Subtract the least from the greatest: 12 − 7 = 5
24
Mean Absolute Deviation - Example
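The slide's own worked example is not part of this transcript. As a stand-in, here is a minimal Python sketch of the range and mean absolute deviation steps. It assumes the same eleven scores used in the mean and range examples, and the function names are illustrative only.

```python
def data_range(values):
    """Subtract the least value from the greatest value."""
    return max(values) - min(values)


def mean_absolute_deviation(values):
    """Mean of the distances between each data point and the mean."""
    center = sum(values) / len(values)                # step 1: the mean
    distances = [abs(v - center) for v in values]     # step 2: distance of each point
    return sum(distances) / len(distances)            # step 3: mean of the distances


# Assumed data: the scores from the mean and range examples.
scores = [7, 7, 8, 8, 9, 9, 9, 9, 10, 11, 12]

print("range:", data_range(scores))
print("mean absolute deviation:", round(mean_absolute_deviation(scores), 2))
```

For these scores the distances from the mean of 9 add up to 12, so the mean absolute deviation is 12 ÷ 11, or about 1.09.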
25
Dot Plots Dot plot: A graphical display of data using dots to show the frequency of each data value. To create a dot plot: Draw a horizontal line using an appropriate range (or category). Place a dot over the data value for each frequency in the data set. Label the horizontal line and title the graph.
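To see the three dot plot steps in action, the sketch below builds a plain-text dot plot in Python. It is an illustration, not a required method. The data are the scores from the mean example, and the title and axis label are made up for the demonstration.

```python
from collections import Counter


def dot_plot(values, title, axis_label):
    """Build a text dot plot: one '*' per occurrence, stacked over each value."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)  # horizontal line covering the data
    rows = [title]
    # Draw from the tallest stack down so the dots rest on the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(" * " if counts[v] >= level else "   " for v in axis)
        rows.append(row.rstrip())
    rows.append("".join(f"{v:^3}" for v in axis).rstrip())  # number line
    rows.append(axis_label)
    return "\n".join(rows)


scores = [7, 7, 8, 8, 9, 9, 9, 9, 10, 11, 12]
# Title and label are hypothetical.
print(dot_plot(scores, "Team scores by game", "Points scored"))
```

Each column of stars is the frequency of that value, so the tallest column marks the value that occurs most often.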
26
Example of a Numerical Dot Plot Notice that we placed a dot above each number on the number line. More than one dot over a number shows how many times that number appeared.
27
Example of a Categorical Dot Plot
28
Histograms Histogram: A graph that uses vertical columns, or bars, to show the frequency of data, or intervals of data. Usually used with a large data set To create a histogram: Group the data values into appropriate bin intervals and determine the frequency. Draw and label the horizontal and vertical axes. Draw bars to a certain height based on the frequency of each bin. Do not put spaces between the bars. Remember to title the graph.
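The first histogram step, grouping values into bins and counting the frequency of each bin, can be sketched in Python as follows. The heights are hypothetical values invented for this illustration, and the bin width of three inches is an assumption.

```python
def bin_frequencies(values, start, width, bin_count):
    """Count how many values fall in each bin of equal width.

    Bin 0 covers start through start + width - 1, bin 1 covers the next
    `width` whole numbers, and so on. Values outside every bin are ignored.
    """
    counts = [0] * bin_count
    for v in values:
        index = (v - start) // width
        if 0 <= index < bin_count:
            counts[index] += 1
    return counts


# Hypothetical heights in inches (not taken from the slides).
heights = [58, 59, 61, 62, 62, 64, 64, 65, 65, 66, 66, 67, 68, 68, 70, 71]
counts = bin_frequencies(heights, start=58, width=3, bin_count=5)

# One text bar per bin; bars share edges, so there are no gaps between bins.
for i, count in enumerate(counts):
    low = 58 + 3 * i
    print(f"{low}-{low + 2}: {'#' * count}")
```

Adding the bar heights gives the size of the data set, even though the individual values can no longer be read from the bars.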
29
Trends A trend is the general drift or tendency in a set of data. Trends can help determine whether the data are symmetric or asymmetric. They can also reveal clusters, peaks, and gaps within the distribution of the data. Peak: The value in the data set that occurs most often. Cluster: A group of data points gathered around a specific value. Gap: A large space between data points. Symmetric: A distribution that can be divided at the center so each half is a mirror image of the other. Asymmetric: A distribution whose values occur at various frequencies, so the two halves are not mirror images of each other.
30
Examples with trends This dot plot is symmetric with a peak at 10. Because the data are symmetric and the peak is toward the center, they create a bell-shaped curve. This is known as “normal distribution of a data set.”
31
Examples with trends This data set is asymmetric and has a peak at 9. It also contains two clusters, one at each end, where the majority of the data values are grouped. This forms a gap between the two clusters where no data points are located.
32
Examples with trends In this graph, the data set is spread equally across the range of distribution. There are no unique peaks, gaps, or clusters. This type of distribution is called a “uniform distribution.” Uniform: When each value in the distribution occurs the same number of times.
33
Box Plot Box plot: Graph showing minimum, maximum, and quartile values for a data set. There are five pieces of information shown on a box plot: Minimum: the smallest value in the data set Lower Quartile (Q1): the middle value in the lower half of the data set Median (Q2): the middle value in the data set Upper Quartile (Q3): the middle value of the upper half of the data set Maximum: the largest value in the data set
34
Box Plot Minimum: Smallest number in a data set Lower quartile (Q1): Middle value in the lower half of a data set. Median: Middle number of the data set Upper Quartile (Q3): Middle value of the upper half of a data set. Maximum: Biggest number in a data set
35
Box Plots Box plots also show what is called interquartile range, or IQR. Interquartile range (IQR): The difference between Q3 and Q1; about half the numbers in a data set fall in the interquartile range.
36
Five-number summary: To create a box plot, you need the five-number summary for your data set. Five-number summary: A summary of the values in a data set; made up of the minimum, lower quartile, median, upper quartile, and maximum.
37
Five-Number Summary Minimum: 12 Lower quartile: 16 Median: 21 Upper quartile: 28 Maximum: 34
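The data behind the summary above are not in this transcript. To show how a five-number summary is computed, the Python sketch below uses the scores from the mean example instead. It follows the definitions given in this review: each quartile is the middle value of one half of the ordered data, and the median itself is left out of both halves when the count is odd. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, and maximum."""
    if len(values) < 2:
        raise ValueError("need at least two values to split the data into halves")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # values below the middle position
    upper_half = ordered[-half:]   # values above the middle position
    return {
        "minimum": ordered[0],
        "lower quartile": median(lower_half),
        "median": median(ordered),
        "upper quartile": median(upper_half),
        "maximum": ordered[-1],
    }


scores = [7, 7, 8, 8, 9, 9, 9, 9, 10, 11, 12]
summary = five_number_summary(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
print("interquartile range:", summary["upper quartile"] - summary["lower quartile"])
```

For these scores the summary is 7, 8, 9, 10, 12, so the interquartile range is 10 − 8 = 2.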
38
Reading a Box Plot You know that box plots can be created from a five-number summary of data. Well, did you know that you can do the opposite and create a five-number summary by reading a box plot? Let’s use the box plot above to find the five-number summary.
39
Reading a Box Plot - example First you need the minimum, which is the smallest value. Look at where the left whisker ends. In the plot, this value is twelve. To find the maximum value, look at the right whisker, which ends at thirty-four.
40
Reading a Box Plot - example The lower quartile is found at the left side of the box. For the old fitness scores, the lower quartile is fourteen. The right side of the box ends at thirty, which is the upper quartile.
41
Reading a Box Plot - example The vertical line inside the box marks the median. The middle number for the old fitness scores is twenty-one. From this, you can also see that the interquartile range is sixteen. The middle fifty percent of values in the data set are between fourteen and thirty.
42
Reading a Box Plot - example
43
Summarizing Data – Dot Plot A dot plot is a graphical display of data that uses dots to show the frequency of each data value. This graph is best used to display small data sets. Because all data points in the data set are shown, the mean, median, range, and mean absolute deviation can be calculated.
44
Summarizing Data – Dot Plot - example The title of this graph tells you the display is about a class activity. The unit of measure is the number of dandelions. This is the unit for each number shown on the number line. There are 11 dots on this graph, and they represent the total number of observations. An observation is another way to describe a recorded data point or occurrence. The data values run from 8 to 13. Also, 10 is the peak, with four occurrences. All this information allows you to summarize that, for this class activity, the most common observation was 10 dandelions, with four occurrences.
45
Summarizing Data – Histogram A histogram is a graph that uses vertical columns or bars to show the frequency of data or intervals of data. This visual is helpful for larger data sets. Because the actual data points are not displayed, only general conclusions can be made from a histogram. No exact measurement of data can be calculated. However, you can determine the size of the data set.
46
Summarizing Data – Histogram - example The title of this graph tells you the display is about various heights of sixth-grade students. The unit of measurement is height, in inches. Each bin contains intervals of three. The frequency allows you to determine there are 28 total observations in the data set. Note that you can also combine columns. For example, eight students have a height between 58 inches and 63 inches, but you don’t know the exact values. The data appear symmetric with a peak near the center, at 64 to 66 inches, creating a bell-shaped curve. All this information allows you to summarize that, for sixth-grade students, the majority are between 64 inches and 66 inches tall.
47
Summarizing Data – Box Plot A box plot is a type of display in which the data are divided into four parts, called "quartiles." The middle 50% of the data is found in the "box" and the lower and upper 25% of the data is found in the "whiskers." This graph is used for larger data sets and to analyze the distribution of data as a whole. In a box plot, no individual data points are seen. Therefore, only the ranges, median, and quartiles can be determined.
48
Summarizing Data – Box Plot - example Remember, each quartile represents 25%, or one-fourth, of the data. So, if the total number of participants is 32, then there are eight data values per quartile. This means 16 data values are located in the interquartile range. The unit of measure for the data is the points of the score. You can state that all students scored at least 12 points on the fitness test, but no more than 34 points. This is the spread of the data. The median of the data set is 21. Therefore, at least half the scores are 21 points or fewer.
49
Making Calculations Based on your previous work, you can calculate various measurements of data from the displays. Here is a helpful chart to help review how to find the measurements of mean, median, range and mean absolute deviation
50
Summarizing Data We can summarize data by using measurements When summarizing data keep the following questions in mind: How many observations are reported in the graph? What is the spread of the data set? What is the center of the data set? What is the shape of the data set?
51
Summarizing Data – Dot Plot Observations The title tells us the observations are about the amount of time spent studying for a math test. The unit of measure is hours. There are 15 dots in the plot, which means 15 students were surveyed. The data set from this graph is 1, 2, 2, 2, 3, 3, 3, 3, 6, 7, 7, 7, 7, 8, 8.
52
Summarizing Data – Dot Plot Center Because you are able to write out the entire data set from the graph, you can calculate the center. The mean of the data set is 4.6 hours. The median is 3 hours. Although these data are not seen easily in the plot, they can be determined. Notice where the mean and median are in relation to each other on the graph.
53
Summarizing Data – Dot Plot Spread The range of the data is 7 hours and the interquartile range is 5 hours. The mean absolute deviation of the data is about 2.4 hours. For this data set, there is a large spread. The interquartile range is almost as large as the actual range. Shape The graph is not symmetric. It contains two clusters and a gap. The gap between the clusters tells you students studied either less than 4 hours or more than 5 hours.
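The numbers quoted for the study-time dot plot can be checked with a short Python sketch. It uses the data set read from the graph. The quartiles follow the middle-of-each-half definition from this review, which is one of several conventions.

```python
from statistics import mean, median

# Data set read from the dot plot: hours spent studying for a math test.
hours = [1, 2, 2, 2, 3, 3, 3, 3, 6, 7, 7, 7, 7, 8, 8]
ordered = sorted(hours)
half = len(ordered) // 2

observations = len(ordered)                  # how many observations?
center_mean = mean(ordered)                  # center
center_median = median(ordered)
data_range = ordered[-1] - ordered[0]        # spread
iqr = median(ordered[-half:]) - median(ordered[:half])
mad = mean([abs(h - center_mean) for h in ordered])
# Shape: whole-hour values inside the range that nobody reported form the gap.
gap = [h for h in range(ordered[0], ordered[-1] + 1) if h not in ordered]

print("observations:", observations)
print("mean:", round(center_mean, 1), "median:", center_median)
print("range:", data_range, "IQR:", iqr, "MAD:", round(mad, 1))
print("gap values:", gap)
```

The results agree with the slides: 15 observations, a mean of 4.6, a median of 3, a range of 7, an interquartile range of 5, and a mean absolute deviation of about 2.4. The gap falls at 4 and 5 hours.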
54
Summarizing Data – Box Plot Observations The title tells you there were 36 participants in the memory game. The unit of measure is the number of objects recalled during the game. Because the box represents 50% of the observations, there are 18 observations located in the interquartile range. You can also state that nine students recalled eight objects or fewer, because eight is the value of the first quartile. This is 25% of the total.
55
Summarizing Data – Box Plot Center Remember, the only center value shown in a box plot is the median. The median of the data is nine. This means half the students recalled nine objects or more. Spread The minimum value is seven and the maximum value is 15. So, there is a range of eight objects recalled. This means the participant who recalled the most objects remembered eight more objects than the participant who recalled the fewest. The interquartile range is four objects. The middle 50% varied by four objects recalled. Shape The left side of the box is smaller than the right side. Because the median does not divide the box equally, the middle half of the data is not symmetric. You can also see the box itself is located closer to the minimum value.
56
Maintaining the Average You already know how to calculate the mean by dividing the sum of the data values by the size of the data set: Let's learn how to use this equation to determine a missing number from a data set when given the mean.
57
Maintaining the Average Each school newspaper is rated on a 50-point scale. Tiffany’s goal is for the club to average a score of 40 points this year. The first three editions have scored 38 points, 42 points, and 32 points. She wants to know what score is needed for the last edition to earn this average. Determine the sum To determine a missing value in a data set when given the mean, you must first calculate the desired sum of the data set. The desired mean is 40. This indicates that if the total points are distributed equally among the 4 observations, each observation is 40 points, so the desired sum is 40 × 4 = 160.
58
Maintaining the Average Determine the missing value Next, you must calculate the sum of the three known values: 38 + 42 + 32 = 112. This indicates the club has one hundred twelve points out of the one hundred sixty that are needed. Since we know the total points needed is 160, we can find the difference to determine the missing value: 160 − 112 = 48. The club needs 48 points on the last rating to have a mean of 40.
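The same two steps, finding the desired sum and then subtracting the known values, can be written as a small Python function. The function and parameter names are illustrative only.

```python
def score_needed(known_scores, target_mean, total_count):
    """Missing value that brings the data set to the target mean."""
    desired_sum = target_mean * total_count   # step 1: the sum the full data set must reach
    return desired_sum - sum(known_scores)    # step 2: subtract what is already earned


# Three editions scored so far; four editions in total; goal is a mean of 40.
needed = score_needed([38, 42, 32], target_mean=40, total_count=4)
print("score needed on the last edition:", needed)

# The rating scale tops out at 50, so check that the goal is still reachable.
print("reachable on a 50-point scale:", needed <= 50)
```

Here the desired sum is 160 and the known sum is 112, so the function returns 48, which is still within the 50-point scale.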
59
Key Terms Outlier : A data value that is much larger (or smaller) than the rest of the data values in the set. Skew : When a data set has a long tail in its shape on one side of the distribution.
60
Examples of Outliers Notice the numbers that lie “outside” the rest of the data
61
Examples of Skew Notice how the data “tails” off to one side
62
Effects of Outliers It is important to identify outliers. They can affect the measurements of center and spread because of their pull. Does an outlier affect all measurements? Let’s see exactly how outliers affect measurements of data. This will help you know the best measurement of center and spread to use when outliers are present.
63
Effects of Outliers - example Determine the effects of the outlier on the measures of center and spread in the data set shown here: 26, 5, 1, 2, 5, 4, 6, 3, 6, 4, 4 MEAN The outlier is the value 26. Here is the mean of the data: With Outlier Sum of the data = 66 Data size = 11 Mean = 6 Without Outlier Sum of the data = 40 Data size = 10 Mean = 4 In this case, with the outlier, the mean is six. Without the outlier, the mean is only four. The outlier pulls the mean to the right, toward the outlier. The mean value including the outlier gives a misleading picture of the center of the data set.
64
Effects of Outliers - example Determine the effects of the outlier on the measures of center and spread in the data set shown here: 26, 5, 1, 2, 5, 4, 6, 3, 6, 4, 4 MEDIAN In this case, the median is four with and without the outlier With an outlier, the median still gives the same idea about the center of the data Median is a better measure of center than the mean when there are outliers present.
65
Effects of Outliers - example Determine the effects of the outlier on the measures of center and spread in the data set shown here: 26, 5, 1, 2, 5, 4, 6, 3, 6, 4, 4 Mean Absolute Deviation The mean absolute deviation of a data set gives you an idea of how far the values in a data set are from the mean of that data set. An outlier has two opportunities to distort this calculation. First, you already know an outlier changes the mean of a data set. Second, when you calculate the distance from the outlier to the new mean, you create an outlier in the data set of distances from the mean. The mean absolute deviation for this particular data set with an outlier is about three times as large as the mean absolute deviation for the data set without an outlier. Just like with the mean, if a data set has an outlier, the mean absolute deviation is not a good summary of the spread.
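To check these comparisons, the Python sketch below computes the mean, median, and mean absolute deviation of the data set twice, once with the outlier 26 and once without it. The helper name is chosen for this illustration.

```python
from statistics import mean, median


def mean_absolute_deviation(values):
    """Mean of the distances between each value and the mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])


with_outlier = [26, 5, 1, 2, 5, 4, 6, 3, 6, 4, 4]
without_outlier = [v for v in with_outlier if v != 26]  # drop the outlier

for label, data in (("with outlier", with_outlier),
                    ("without outlier", without_outlier)):
    print(label)
    print("  mean:  ", mean(data))
    print("  median:", median(data))
    print("  MAD:   ", round(mean_absolute_deviation(data), 2))
```

The mean moves from 4 to 6 when the outlier is included, while the median stays at 4. The mean absolute deviation rises from 1.2 to about 3.64, which is roughly three times as large.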
66
Effects of Outliers - example Determine the effects of the outlier on the measures of center and spread in the data set shown here: 26, 5, 1, 2, 5, 4, 6, 3, 6, 4, 4 Interquartile Range With Outlier [image: the median is circled between the two values 26 and 27; the values 24 and 32 are also circled] With the outlier, the lower quartile is 24 and the upper quartile is 32. The interquartile range is 8. Without Outlier [image: the median is circled at 27; the pairs 24 and 25 and 32 and 33 are also circled] When working with the interquartile range, the presence of an outlier does not make as much of a difference as it does with the mean absolute deviation.
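The quartile values quoted on this slide (24 and 32) appear to come from the slide's images, which show different numbers than the data set listed. As a separate check, the Python sketch below computes the interquartile range for the listed data set, with and without the outlier 26. It uses the middle-of-each-half definition of quartiles from this review, which is one convention among several.

```python
from statistics import median


def interquartile_range(values):
    """Upper quartile minus lower quartile, using the median of each half.

    Assumes at least two values. With an odd count the overall median is
    left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_quartile = median(ordered[:half])
    upper_quartile = median(ordered[-half:])
    return upper_quartile - lower_quartile


with_outlier = [26, 5, 1, 2, 5, 4, 6, 3, 6, 4, 4]
without_outlier = [v for v in with_outlier if v != 26]

print("IQR with outlier:   ", interquartile_range(with_outlier))
print("IQR without outlier:", interquartile_range(without_outlier))
```

Under this convention the interquartile range is 3 with the outlier and 2 without it. That is a much smaller change than the one seen in the mean absolute deviation.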
67
You have now had a chance to review all of the great stuff you learned in Module 8! Topics: Statistical Questioning, Measurement of Data, Dot Plots and Histograms, Box Plots, Summarizing Data, and Data with Outliers. Have you completed all assessments in Module 8? Have you completed your Module 8 DBA? Now you are ready to move forward and complete your Module 8 test. Please make sure you are ready to complete your test before you enter the test session.