Published by Julius Holland. Modified over 9 years ago.
Check the roster below the chat area for your name to be sure you get credit! Audio will start at class time. Previously requested topics will be covered first. Feel free to put topics in the chat area; I will try to get to them before the end of seminar. Can't type? Press F11. Can't hear? Check speakers and volume, or re-enter the seminar. Put ? in front of questions so they are easier to see.
Percentiles
Percentiles are normally used with large data sets. Dividing the number of data values by 100 tells us how many data values fall in each percent. The following example has the weekly grocery bills for 300 families. There will be 3 data values in each percent, or 30 values in each 10%.
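The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the generated bills are hypothetical stand-ins (not the seminar's grocery data), and the percentile here uses the "percent of values below" convention, which is one of several in use.

```python
def values_per_percent(n_values: int) -> float:
    """Number of data values that fall in each 1% slice of the data."""
    return n_values / 100


def percentile_of(value: float, data: list[float]) -> float:
    """Percent of data values strictly below `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Hypothetical stand-in for 300 weekly grocery bills (dollars): 50.0, 50.5, ..., 199.5
bills = [50 + 0.5 * i for i in range(300)]

print(values_per_percent(len(bills)))       # values in each percent
print(10 * values_per_percent(len(bills)))  # values in each 10%
print(percentile_of(bills[30], bills))      # 30 of the 300 bills lie below this one
```

With 300 values, the 31st smallest bill has 30 bills below it, so it sits at the 10th percentile under this convention.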
The Central Limit Theorem
Suppose we take many random samples of size n for a variable with any distribution (not necessarily a normal distribution) and record the distribution of the means of each sample. Then,
1. The distribution of means will be approximately a normal distribution for large sample sizes.
2. The mean of the distribution of means approaches the population mean, µ, for large sample sizes.
3. The standard deviation of the distribution of means approaches σ/√n for large sample sizes, where σ is the standard deviation of the population.
Page 217
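One way to see the three statements above in action is a small simulation. The sketch below is not from the textbook: it assumes a right-skewed population (exponential with rate 1, so µ = 1 and σ = 1), and the sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 5000  # how many samples we draw

# Assumed population: exponential with rate 1 (clearly not normal), mu = 1, sigma = 1.
mu, sigma = 1.0, 1.0

# Draw many samples of size n and record the mean of each one.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {sigma / n ** 0.5:.3f})")
```

The mean of the sample means should land close to µ, and their standard deviation close to σ/√n, even though the individual values are far from normally distributed.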
Figure 5.26: As the sample size increases (n = 5, 10, 30), the distribution of sample means approaches a normal distribution, regardless of the shape of the original distribution. The larger the sample size, the smaller the standard deviation of the distribution of sample means.
EXAMPLE 1 Predicting Test Scores
You are a middle school principal and your 100 eighth-graders are about to take a national standardized test. The test is designed so that the mean score is µ = 400 with a standard deviation of σ = 70. Assume the scores are normally distributed.
a. What is the likelihood that one of your eighth-graders, selected at random, will score below 375 on the exam?
Solution: a. In dealing with an individual score, we use the method of standard scores discussed in Section 5.2. Given the mean of 400 and standard deviation of 70, a score of 375 has a standard score of
z = (data value - mean) / (standard deviation) = (375 - 400) / 70 = -0.36
EXAMPLE 1 Predicting Test Scores
Solution: (cont.) According to Table 5.1, a standard score of -0.36 corresponds to about the 36th percentile; that is, 36% of all students can be expected to score below 375. Thus, there is about a 0.36 chance that a randomly selected student will score below 375. Notice that we need to know that the scores have a normal distribution in order to make this calculation, because the table of standard scores applies only to normal distributions.
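The table lookup for part (a) can be cross-checked with Python's standard library. This is a sketch, not the textbook's method: `NormalDist` stands in for Table 5.1, and the variable names are just for illustration.

```python
from statistics import NormalDist

mean, sd = 400, 70
score = 375

z = (score - mean) / sd                    # standard score for one student
p_below = NormalDist(mean, sd).cdf(score)  # share of the normal curve below 375

print(f"z = {z:.2f}")
print(f"P(score < 375) = {p_below:.2f}")
```

The computed probability agrees with the roughly 36th percentile read from the table.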
EXAMPLE 1 Predicting Test Scores
You are a middle school principal and your 100 eighth-graders are about to take a national standardized test. The test is designed so that the mean score is µ = 400 with a standard deviation of σ = 70. Assume the scores are normally distributed.
b. Your performance as a principal depends on how well your entire group of eighth-graders scores on the exam. What is the likelihood that your group of 100 eighth-graders will have a mean score below 375?
Solution: b. The question about the mean of a group of students must be handled with the Central Limit Theorem. According to this theorem, if we take random samples of size n = 100 students and compute the mean test score of each group, the distribution of means is approximately normal.
EXAMPLE 1 Predicting Test Scores
Solution: (cont.) Moreover, the mean of this distribution is µ = 400 and its standard deviation is σ/√n = 70/√100 = 7. With these values for the mean and standard deviation, the standard score for a mean test score of 375 is
z = (data value - mean) / (standard deviation) = (375 - 400) / 7 = -3.57
Table 5.1 shows that a standard score of -3.5 corresponds to the 0.02th percentile, and the standard score in this case is even lower. In other words, fewer than 0.02% of all random samples of 100 students will have a mean score of less than 375.
EXAMPLE 1 Predicting Test Scores
Solution: (cont.) Therefore, the chance that a randomly selected group of 100 students will have a mean score below 375 is less than 0.0002, or about 1 in 5,000. Notice that this calculation regarding the group mean did not depend on the individual scores having a normal distribution.
This example has an important lesson. The likelihood of an individual scoring below 375 is more than 1 in 3 (36%), but the likelihood of a group of 100 students having a mean score below 375 is less than 1 in 5,000 (0.02%). In other words, there is much more variation in the scores of individuals than in the means of groups of individuals.
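To compare the individual and group results side by side, here is a short Python sketch. It treats the distribution of sample means as exactly normal with standard deviation σ/√n, which is the Central Limit Theorem approximation; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400, 70, 100
cutoff = 375

sd_of_means = sigma / sqrt(n)          # 70 / sqrt(100) = 7
z_group = (cutoff - mu) / sd_of_means  # standard score for the group mean

p_individual = NormalDist(mu, sigma).cdf(cutoff)  # one student below 375
p_group = NormalDist(mu, sd_of_means).cdf(cutoff) # mean of 100 students below 375

print(f"sd of sample means = {sd_of_means}")
print(f"z for the group mean = {z_group:.2f}")
print(f"P(individual < 375) = {p_individual:.4f}")
print(f"P(group mean < 375) = {p_group:.6f}")
```

The group probability comes out below 0.0002, consistent with the "fewer than 0.02%" bound read from the table.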
Some topics from Unit 5/Chapter 7: Correlation & Causality
Types of Correlation
Figure 7.3: Types of correlation seen on scatter diagrams. Page 289
Linear Correlation Coefficient Page 294
The line of best fit (also called the regression line or the least-squares line) is the line that best fits the data, i.e., it lies closer to the data points than any other line. This line can be written as y = mx + b, where
Slope: m = r(s_y / s_x), where s_y is the standard deviation of the y values and s_x is the standard deviation of the x values.
Y-intercept: b = ȳ - (m * x̄), where ȳ is the mean of the y values and x̄ is the mean of the x values.
(Again, StatCrunch or another program is handy.) Page 313
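If StatCrunch is not at hand, the slope and intercept formulas above can be tried out in Python. This is a minimal sketch: `best_fit_line` is a hypothetical helper name, the data are made up for illustration, and r is computed with the usual sample formula (sum of products of deviations divided by (n - 1)·s_x·s_y), which is assumed here rather than quoted from the slides.

```python
from statistics import mean, stdev


def best_fit_line(xs, ys):
    """Return (m, b, r) for the best-fit line y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Linear correlation coefficient r
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    m = r * (s_y / s_x)    # slope
    b = y_bar - m * x_bar  # y-intercept
    return m, b, r


# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

m, b, r = best_fit_line(hours, scores)
print(f"r = {r:.3f}, slope m = {m:.2f}, intercept b = {b:.2f}")
```

A useful sanity check is that the line always passes through the point (x̄, ȳ), since b is defined as ȳ - m·x̄.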
EXAMPLE 1 Valid Predictions?
State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.
You've found a best-fit line for a correlation between the number of hours per day that people exercise and the number of calories they consume each day. You've used this correlation to predict that a person who exercises 18 hours per day would consume 15,000 calories per day.
Solution: No one exercises 18 hours per day on an ongoing basis, so this much exercise must be beyond the bounds of any data collected. Therefore, a prediction about someone who exercises 18 hours per day should not be trusted.
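The "beyond the bounds of any data collected" check can be made explicit in code. The sketch below is only illustrative: `predict_within_range` is a hypothetical helper, and the observed hours, slope, and intercept are invented numbers, not values from the textbook.

```python
def predict_within_range(x, xs, m, b):
    """Return m*x + b only when x lies inside the observed range of xs."""
    lo, hi = min(xs), max(xs)
    if not lo <= x <= hi:
        raise ValueError(f"x = {x} is outside the observed range [{lo}, {hi}]")
    return m * x + b


# Hypothetical observations and best-fit line, for illustration only
observed_hours = [0, 0.5, 1, 1.5, 2, 3]  # hours of exercise per day in the data
m, b = 600, 2000                         # made-up slope and intercept (calories)

print(predict_within_range(1, observed_hours, m, b))  # inside the data: allowed

try:
    predict_within_range(18, observed_hours, m, b)    # far outside the data
except ValueError as err:
    print("Not trusted:", err)
```

Refusing to extrapolate is a design choice here; a gentler variant could return the prediction together with a warning flag.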
EXAMPLE 1 Valid Predictions?
State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.
Historical data have shown a strong negative correlation between national birth rates and affluence. That is, countries with greater affluence tend to have lower birth rates. These data predict a high birth rate in Russia.
Solution: We cannot automatically assume that the historical data still apply today. In fact, Russia currently has a very low birth rate, despite also having a low level of affluence.
EXAMPLE 1 Valid Predictions?
State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.
A study in China has discovered correlations that are useful in designing museum exhibits that Chinese children enjoy. A curator suggests using this information to design a new museum exhibit for Atlanta-area school children.
Solution: The suggestion to use information from the Chinese study for an Atlanta exhibit assumes that predictions made from correlations in China also apply to Atlanta. However, given the cultural differences between China and Atlanta, the curator's suggestion should not be considered without more information to back it up.
EXAMPLE 1 Valid Predictions?
State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.
Scientific studies have shown a very strong correlation between children's ingesting of lead and mental retardation. Based on this correlation, paints containing lead were banned.
Solution: Given the strength of the correlation and the severity of the consequences, this prediction and the ban that followed seem quite reasonable. In fact, later studies established lead as an actual cause of mental retardation, making the rationale behind the ban even stronger.
EXAMPLE 1 Valid Predictions?
State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.
Based on a large data set, you've made a scatter diagram for salsa consumption (per person) versus years of education. The diagram shows no significant correlation, but you've drawn a best-fit line anyway. The line predicts that someone who consumes a pint of salsa per week has at least 13 years of education.
Solution: Because there is no significant correlation, the best-fit line and any predictions made from it are meaningless.
The square of the correlation coefficient, or r², is the proportion of the variation in a variable that is accounted for by the best-fit line.
The use of multiple regression allows the calculation of a best-fit equation that represents the best fit between one variable (such as price) and a combination of two or more other variables (such as weight and color).
The coefficient of determination, R², tells us the proportion of the scatter in the data accounted for by the best-fit equation.
EXAMPLE 4 Voter Turnout and Unemployment
Political scientists are interested in knowing what factors affect voter turnout in elections. One such factor is the unemployment rate. Data collected in presidential election years since 1964 show a very weak negative correlation between voter turnout and the unemployment rate, with a correlation coefficient of about r = -0.1. Based on this correlation, should we use the unemployment rate to predict voter turnout in the next presidential election? Note that there is a scatter diagram of the voter turnout data on page 312.
Solution: The square of the correlation coefficient is r² = (-0.1)² = 0.01, which means that only about 1% of the variation in the data is accounted for by the best-fit line. Nearly all of the variation in the data must therefore be explained by other factors. We conclude that unemployment is not a reliable predictor of voter turnout.
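The same arithmetic, written out as a tiny Python check (the variable names are illustrative, and only the value r = -0.1 comes from the example):

```python
r = -0.1                    # correlation coefficient from the example
r_squared = r ** 2          # proportion of variation accounted for by the line
unexplained = 1 - r_squared # proportion left to other factors

print(f"r^2 = {r_squared:.2f}")
print(f"accounted for by the best-fit line: {r_squared:.0%}")
print(f"left to other factors: {unexplained:.0%}")
```

Because r is squared, the sign of the correlation does not matter here: r = 0.1 and r = -0.1 both account for about 1% of the variation.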
Some StatCrunch Videos
Some of mine are at:
http://www.screencast.com/users/TamaraEyster/folders/MM207%20Videos
http://www.ramshillfarm.com/Math/Math207/index.html
Some videos made by other instructors:
– Use StatCrunch to find correlation between two variables: http://screencast.com/t/rAbGVY5We8
– Find a Confidence Interval for a population mean using StatCrunch: http://screencast.com/t/rTYiEKGo3ww
– Find a Confidence Interval for a population proportion using StatCrunch: http://screencast.com/t/eISO0FRlQu