Check roster below the chat area for your name to be sure you get credit! Audio will start at class time. Previously requested topics will be gone over.

Presentation transcript:

Check the roster below the chat area for your name to be sure you get credit! Audio will start at class time. Previously requested topics will be gone over first. Feel free to put topics in the chat area; I will try to get to them before the end of seminar. Can't type? Press F11. Can't hear? Check your speakers and volume, or re-enter the seminar. Put ? in front of questions so they are easier to see.

Percentiles
Percentiles are normally used with large data sets. Dividing the number of data values by 100 tells us how many data values fall in each percent. The following example uses one week of grocery bills for 300 families, so there are 3 data values in each percent, or 30 values in each 10%.
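The counting rule above can be sketched in a few lines of Python. This is only an illustration: the function name `values_per_percent` and the list `bills` are made up here, and the bill amounts are placeholders, not the grocery data from the seminar.

```python
def values_per_percent(n_values: int) -> float:
    """Number of data values that fall in each 1% slice of the sorted data."""
    return n_values / 100


n_families = 300  # from the slide: grocery bills for 300 families
per_percent = values_per_percent(n_families)  # values in each percent
per_decile = 10 * per_percent                 # values in each 10%

# Placeholder bills (NOT real data): 300 made-up amounts, sorted ascending.
bills = sorted(50 + 0.5 * i for i in range(n_families))

# The lowest 10% of the sorted bills is simply the first `per_decile` values.
lowest_ten_percent = bills[: int(per_decile)]

print(per_percent, per_decile, len(lowest_ten_percent))
```

With 300 values the slices come out evenly; for a count that is not a multiple of 100, a percentile boundary falls between data values and some rounding convention is needed.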

The Central Limit Theorem (Page 217)
Suppose we take many random samples of size n for a variable with any distribution (not necessarily a normal distribution) and record the distribution of the means of each sample. Then,
1. The distribution of means will be approximately a normal distribution for large sample sizes.
2. The mean of the distribution of means approaches the population mean, µ, for large sample sizes.
3. The standard deviation of the distribution of means approaches σ/√n for large sample sizes, where σ is the standard deviation of the population.
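A small simulation can make the three statements concrete. The sketch below is an assumption-laden illustration, not part of the seminar: it draws from an exponential population (chosen only because it is clearly not normal, with mean 1 and standard deviation 1), and the sample size and number of samples are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 5000  # how many samples we draw

# Assumed population: exponential with rate 1 (skewed; mean 1, sd 1).
pop_mean, pop_sd = 1.0, 1.0

# Record the mean of each random sample.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = pop_sd / math.sqrt(n)  # sigma / sqrt(n) from statement 3

print(f"mean of sample means: {mean_of_means:.3f} (population mean {pop_mean})")
print(f"sd of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
```

The two printed pairs should be close to each other; a histogram of `sample_means` would also look roughly bell-shaped even though the population is skewed.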

Figure 5.26: As the sample size increases (n = 5, 10, 30), the distribution of sample means approaches a normal distribution, regardless of the shape of the original distribution. The larger the sample size, the smaller the standard deviation of the distribution of sample means.

EXAMPLE 1 Predicting Test Scores
You are a middle school principal and your 100 eighth-graders are about to take a national standardized test. The test is designed so that the mean score is µ = 400 with a standard deviation of σ = 70. Assume the scores are normally distributed.
a. What is the likelihood that one of your eighth-graders, selected at random, will score below 375 on the exam?
Solution:
a. In dealing with an individual score, we use the method of standard scores discussed in Section 5.2. Given the mean of 400 and standard deviation of 70, a score of 375 has a standard score of
$$z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{375 - 400}{70} \approx -0.36$$


EXAMPLE 1 Predicting Test Scores, Solution (cont.)
According to Table 5.1, a standard score of -0.36 corresponds to about the 36th percentile; that is, 36% of all students can be expected to score below 375. Thus, there is about a 0.36 chance that a randomly selected student will score below 375. Notice that we need to know that the scores have a normal distribution in order to make this calculation, because the table of standard scores applies only to normal distributions.
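For anyone who would rather check part (a) with software than with Table 5.1, here is a minimal sketch using `NormalDist` from Python's standard library. It assumes, as the example does, that individual scores are normally distributed; the variable names are just for illustration.

```python
from statistics import NormalDist

mu, sigma = 400, 70  # test design values from the example
score = 375

# Standard score of a single student's result.
z = (score - mu) / sigma

# Probability that one randomly chosen student scores below 375,
# assuming normally distributed scores.
p_below = NormalDist(mu, sigma).cdf(score)

print(f"z = {z:.2f}, P(score < {score}) = {p_below:.2f}")
```

The result should agree with the table lookup: a standard score near -0.36 and a probability of about 0.36.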

EXAMPLE 1 Predicting Test Scores
You are a middle school principal and your 100 eighth-graders are about to take a national standardized test. The test is designed so that the mean score is µ = 400 with a standard deviation of σ = 70. Assume the scores are normally distributed.
b. Your performance as a principal depends on how well your entire group of eighth-graders scores on the exam. What is the likelihood that your group of 100 eighth-graders will have a mean score below 375?
Solution:
b. The question about the mean of a group of students must be handled with the Central Limit Theorem. According to this theorem, if we take random samples of size n = 100 students and compute the mean test score of each group, the distribution of means is approximately normal.

EXAMPLE 1 Predicting Test Scores, Solution (cont.)
Moreover, the mean of this distribution is µ = 400 and its standard deviation is σ/√n = 70/√100 = 7. With these values for the mean and standard deviation, the standard score for a mean test score of 375 is
$$z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{375 - 400}{7} \approx -3.57$$
Table 5.1 shows that a standard score of -3.5 corresponds to the 0.02th percentile, and the standard score in this case is even lower. In other words, fewer than 0.02% of all random samples of 100 students will have a mean score of less than 375.


EXAMPLE 1 Predicting Test Scores, Solution (cont.)
Therefore, the chance that a randomly selected group of 100 students will have a mean score below 375 is less than 0.0002, or about 1 in 5,000. Notice that this calculation regarding the group mean did not depend on the individual scores' having a normal distribution.
This example has an important lesson. The likelihood of an individual scoring below 375 is more than 1 in 3 (36%), but the likelihood of a group of 100 students having a mean score below 375 is less than 1 in 5,000 (0.02%). In other words, there is much more variation in the scores of individuals than in the means of groups of individuals.
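The contrast between one student and a group of 100 can also be checked numerically. The sketch below again uses `NormalDist` from the Python standard library; treating the distribution of group means as exactly normal is the Central Limit Theorem approximation, and the variable names are only illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 400, 70, 100
threshold = 375

# Standard deviation of the distribution of sample means: sigma / sqrt(n).
sd_of_means = sigma / math.sqrt(n)

# Standard score of a group mean of 375.
z_group = (threshold - mu) / sd_of_means

# One student below 375 versus the mean of 100 students below 375.
p_individual = NormalDist(mu, sigma).cdf(threshold)
p_group = NormalDist(mu, sd_of_means).cdf(threshold)

print(f"sd of means = {sd_of_means}, z for the group = {z_group:.2f}")
print(f"P(individual < 375) = {p_individual:.2f}")
print(f"P(group mean < 375) = {p_group:.5f}")
```

The group probability should come out below 0.0002, in line with the "less than 1 in 5,000" statement, while the individual probability stays above 1 in 3.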

Some topics from Unit 5/Chapter 7: Correlation & Causality

Types of Correlation
Figure 7.3: Types of correlation seen on scatter diagrams.

Linear Correlation Coefficient Page 294

The line of best fit (also called the regression line or the least squares line) is the line that fits the data best: its sum of squared vertical distances to the data points is smaller than that of any other line. This line can be written as y = mx + b, where
Slope: $m = r \, (s_y / s_x)$, where $s_y$ is the standard deviation of y and $s_x$ is the standard deviation of x.
Y-intercept: $b = \bar{y} - m\,\bar{x}$, where $\bar{y}$ is the mean of the y's and $\bar{x}$ is the mean of the x's.
(Again, StatCrunch or another program is handy.) Page 313
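If StatCrunch is not at hand, the slope and intercept formulas above can be tried out in Python. This is a sketch under stated assumptions: the helper name `best_fit_line` is made up, the five data points are invented for illustration, and r is computed with the usual Pearson formula since the seminar slide for it did not survive as text.

```python
import statistics


def best_fit_line(xs, ys):
    """Return (r, m, b) for the least-squares line y = m*x + b."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)

    # Pearson correlation coefficient r (sample version).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / ((len(xs) - 1) * s_x * s_y)

    m = r * (s_y / s_x)    # slope
    b = y_bar - m * x_bar  # y-intercept
    return r, m, b


# Made-up illustrative points, not data from the seminar.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, m, b = best_fit_line(xs, ys)
print(f"r = {r:.4f}, best-fit line: y = {m:.2f}x + {b:.2f}")
```

For these points the line works out to y = 0.60x + 2.20, which can be confirmed by hand from the two formulas.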

EXAMPLE 1 Valid Predictions?
State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.
You've found a best-fit line for a correlation between the number of hours per day that people exercise and the number of calories they consume each day. You've used this correlation to predict that a person who exercises 18 hours per day would consume 15,000 calories per day.
Solution: No one exercises 18 hours per day on an ongoing basis, so this much exercise must be beyond the bounds of any data collected. Therefore, a prediction about someone who exercises 18 hours per day should not be trusted.
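One practical way to apply this rule is to check whether a value lies inside the range of the data before predicting from the best-fit line. The sketch below is only an illustration: the function name is made up and the exercise hours are hypothetical, not data from the textbook.

```python
def is_within_observed_range(x_new, observed_xs):
    """True if x_new lies inside the range of x values used to fit the line."""
    return min(observed_xs) <= x_new <= max(observed_xs)


# Hypothetical exercise times (hours per day) from an imagined survey.
observed_hours = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]

ok_inside = is_within_observed_range(1.2, observed_hours)
ok_outside = is_within_observed_range(18.0, observed_hours)

print(ok_inside)   # 1.2 hours is inside the data, so a prediction is plausible
print(ok_outside)  # 18 hours is far outside the data, so do not trust it
```

Passing this check does not make a prediction trustworthy on its own; it only rules out the kind of extrapolation described in this case.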

EXAMPLE 1 Valid Predictions?
Historical data have shown a strong negative correlation between national birth rates and affluence. That is, countries with greater affluence tend to have lower birth rates. These data predict a high birth rate in Russia.
Solution: We cannot automatically assume that the historical data still apply today. In fact, Russia currently has a very low birth rate, despite also having a low level of affluence.

EXAMPLE 1 Valid Predictions?
A study in China has discovered correlations that are useful in designing museum exhibits that Chinese children enjoy. A curator suggests using this information to design a new museum exhibit for Atlanta-area school children.
Solution: The suggestion to use information from the Chinese study for an Atlanta exhibit assumes that predictions made from correlations in China also apply to Atlanta. However, given the cultural differences between China and Atlanta, the curator's suggestion should not be considered without more information to back it up.

EXAMPLE 1 Valid Predictions?
Scientific studies have shown a very strong correlation between children's ingesting of lead and mental retardation. Based on this correlation, paints containing lead were banned.
Solution: Given the strength of the correlation and the severity of the consequences, this prediction and the ban that followed seem quite reasonable. In fact, later studies established lead as an actual cause of mental retardation, making the rationale behind the ban even stronger.

EXAMPLE 1 Valid Predictions?
Based on a large data set, you've made a scatter diagram for salsa consumption (per person) versus years of education. The diagram shows no significant correlation, but you've drawn a best-fit line anyway. The line predicts that someone who consumes a pint of salsa per week has at least 13 years of education.
Solution: Because there is no significant correlation, the best-fit line and any predictions made from it are meaningless.

The square of the correlation coefficient, or $r^2$, is the proportion of the variation in a variable that is accounted for by the best-fit line.
The use of multiple regression allows the calculation of a best-fit equation that represents the best fit between one variable (such as price) and a combination of two or more other variables (such as weight and color).
The coefficient of determination, $R^2$, tells us the proportion of the scatter in the data accounted for by the best-fit equation.
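To see why $r^2$ can be read as a proportion of variation, it helps to compute it two ways on a small data set. The sketch below uses invented points (not seminar data) and compares $r^2$ with one minus the leftover variation around the best-fit line divided by the total variation in y. The slope is computed here as sxy/sxx, which is algebraically the same as r times the ratio of the standard deviations.

```python
import statistics

# Made-up points for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)  # variation in x
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / (sxx * syy) ** 0.5  # correlation coefficient
m = sxy / sxx                 # slope of the best-fit line
b = y_bar - m * x_bar         # y-intercept

# Variation left over after fitting the line (sum of squared residuals).
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - sse / syy

print(f"r^2 = {r ** 2:.2f}, fraction of variation explained = {explained_fraction:.2f}")
```

Both numbers come out to 0.60 for these points, meaning the line accounts for about 60% of the variation in y and the remaining 40% is scatter around the line.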

EXAMPLE 4 Voter Turnout and Unemployment
Political scientists are interested in knowing what factors affect voter turnout in elections. One such factor is the unemployment rate. Data collected in presidential election years since 1964 show a very weak negative correlation between voter turnout and the unemployment rate, with a correlation coefficient of about r = -0.1. Based on this correlation, should we use the unemployment rate to predict voter turnout in the next presidential election? Note that there is a scatter diagram of the voter turnout data on page 312.
Solution: The square of the correlation coefficient is $r^2 = (-0.1)^2 = 0.01$, which means that only about 1% of the variation in the data is accounted for by the best-fit line. Nearly all of the variation in the data must therefore be explained by other factors. We conclude that unemployment is not a reliable predictor of voter turnout.

Some StatCrunch Videos
Some of mine are at: %20Videos %20Videos
Some videos made by other instructors:
– Use StatCrunch to find correlation between two variables
– Find a Confidence Interval for a population mean using StatCrunch
– Find a Confidence Interval for a population proportion using StatCrunch