Inference for Regression BPS chapter 24 © 2006 W.H. Freeman and Company.


Linear regression Which point represents “a” in our least-squares regression equation? a) Point Q b) Point S c) Point R d) Point T

Correlation If two quantitative variables, X and Y, have a correlation coefficient r = 0.80, which graph could be a scatterplot of the two variables? a) Plot A b) Plot B c) Plot C

Correlation Which of the following statements is true? a) r Plot A > r Plot B b) r Plot C > r Plot A c) r Plot C > r Plot B d) The correlation coefficient is the same in all plots.

Residual The following scatterplot shows the number of gold medals earned by countries in 1992 versus the number earned in 1996. Which of the points would have the smallest residual? a) Point A b) Point B c) Point C d) Point D
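
As a rough illustration of what a residual measures, the sketch below fits a least-squares line and computes residual = observed y minus predicted y for each point. The data and the helper name `least_squares` are made up for illustration and are not the medal counts from the slide. Reading "smallest residual" as smallest in absolute value (the point closest to the line) is also an assumption.

```python
# Hypothetical (x, y) pairs, NOT the medal counts from the slide.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def least_squares(xs, ys):
    """Return intercept a and slope b of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(xs, ys)

# Residual = observed y minus the value predicted by the fitted line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Index of the point lying closest to the line (smallest |residual|).
closest = min(range(len(xs)), key=lambda i: abs(residuals[i]))

for x, y, e in zip(xs, ys, residuals):
    print(f"x={x:.0f}  y={y:.1f}  residual={e:+.3f}")
print("point closest to the line: index", closest)
```

On a scatterplot with the fitted line drawn, the same comparison can be made by eye: the residual is the vertical distance from a point to the line.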

Regression line In the previous question about gold medals, the least-squares regression equation is: Where x is the number of medals earned in 1992 and ŷ is the predicted number of medals earned in 1996. What is the best interpretation of b in this example? a) Countries that earned ten medals in the 1992 Olympics are predicted to earn an average of nine medals in 1996. b) For all countries participating in the 1992 Olympics, 89% earned medals in 1996. c) If a country earned zero medals in 1992, they would have an 89% chance of earning one in 1996. d) All countries who earned medals in 1992 had an 89% probability of earning a medal in 1996.

Appropriate analysis Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. If he wanted to investigate if there was a linear relationship between the distance and the velocity, what type of analysis did he perform? a) Two-sample t-test on means b) χ² analysis on proportions c) Linear regression analysis d) Matched pairs experiment

Linear regression Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. He used the following model: μ_y = α + βx, where x represents the distance the galaxy is from the earth (in megaparsecs) and μ_y represents the mean velocity (in km/sec) for all galaxies at that distance. What does β represent in this problem? a) The average velocity for a galaxy that is extremely close to earth. b) The average change in velocity for a one-megaparsec increase in distance for those galaxies in the sample. c) The average velocity for all galaxies in the universe. d) The average change in velocity for a one-megaparsec increase in distance of all galaxies.

Linear regression Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. Summarizing his data with a scatterplot and generating the least-squares regression line gave the following table: Based on the information in the table, what is the correct equation for the least-squares regression line? a) b) c) d) e)

Residuals Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. By looking at the following residual plot and histogram of the residuals, what conclusion should be made about the conditions for performing the linear regression? a) Because the residual plot shows no pattern and the histogram is approximately bell-shaped, the conditions are met. b) The residual plot implies that the data violate the assumption of normality. c) The histogram of the residuals shows that the data are extremely right-skewed. d) Neither plot tells us anything about the assumptions for doing inference for regression. e) The residual plot implies that the data violate the assumption of linearity.

Linear relationship Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. If the researchers want to test whether there is a positive linear relationship between the distance and velocity, what hypotheses could be used? a) b) c) d)

Linear regression Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. For a confidence interval for β we use the general form for a confidence interval: estimate ± (table value) × (SE of the estimate). According to the printout above, what value should we use for the standard error of the estimate? a) b) c) d)
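
The general form above, estimate ± (table value) × (SE of the estimate), can be sketched for the slope as follows. This is a minimal illustration under stated assumptions: the data are invented (they are not Hubble's measurements), and the table value is hard-coded from a standard t table for 3 degrees of freedom, so it only fits this sample size.

```python
import math

# Hypothetical data, NOT Hubble's measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx            # estimated slope
a = y_bar - b * x_bar    # estimated intercept

# Regression standard error s, using n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope estimate b.
se_b = s / math.sqrt(sxx)

# Table value t* for 95% confidence with n - 2 = 3 degrees of freedom
# (assumption: copied from a standard t table, valid only for this n).
T_STAR_95_DF3 = 3.182

lower = b - T_STAR_95_DF3 * se_b
upper = b + T_STAR_95_DF3 * se_b

# If 0 is outside the 95% interval, a two-sided test of
# H0: slope = 0 rejects at the 5% significance level.
zero_excluded = not (lower <= 0.0 <= upper)

print(f"b = {b:.3f}, SE_b = {se_b:.4f}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
print("0 excluded from interval:", zero_excluded)
```

In regression printouts the value `se_b` computed here usually appears in the standard error column of the row for the explanatory variable, not the intercept row.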

Confidence interval Edwin Hubble collected data on the distance a galaxy is from the earth and the velocity with which it appears to be receding. If a 95% confidence interval for β is (298.12, ), what conclusion could be made about β at a significance level of α = 0.05? a) We have sufficient evidence to conclude that there is no linear relationship between velocity and distance. b) We have sufficient evidence to conclude that there is a linear relationship between velocity and distance. c) There is insufficient evidence to conclude that there is a linear relationship between velocity and distance. d) The confidence interval does not give us enough information to answer this question.

Prediction intervals For a house of size 1500 ft², the 95% prediction interval for its selling price will be _________ the 95% confidence interval for the average selling price of all homes that are 1500 ft²? a) Wider than b) The same as c) Narrower than d) Not comparable with

Prediction intervals True or false: If we give a prediction interval for one home whose size is 1500 ft², this interval estimates the mean selling price for all homes whose size is 1500 ft². a) True b) False

Prediction intervals True or false: If we compute a prediction interval for one home whose size is 1100 ft² and a 95% confidence interval for the mean selling price of all homes whose size is 1100 ft², the centers of the intervals will be the same. a) True b) False
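
To make the distinction in the prediction-interval questions concrete, the sketch below computes both a 95% confidence interval for the mean response and a 95% prediction interval for a single new observation at the same x*. The house sizes and prices are invented for illustration, and the t table value is hard-coded for 4 degrees of freedom, so it only applies to this sample size.

```python
import math

# Hypothetical house sizes (ft^2) and selling prices (thousands of dollars).
sizes = [1000.0, 1200.0, 1400.0, 1600.0, 1800.0, 2000.0]
prices = [150.0, 172.0, 195.0, 214.0, 241.0, 258.0]
n = len(sizes)

x_bar = sum(sizes) / n
y_bar = sum(prices) / n
sxx = sum((x - x_bar) ** 2 for x in sizes)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
b = sxy / sxx
a = y_bar - b * x_bar

# Regression standard error with n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(sizes, prices))
s = math.sqrt(sse / (n - 2))

x_star = 1500.0
y_hat = a + b * x_star  # both intervals are centered at this value

# Standard error for estimating the MEAN price at x_star.
se_mean = s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
# Standard error for predicting ONE new price at x_star
# (extra "1 +" accounts for the variation of an individual home).
se_pred = s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)

# Table value t* for 95% confidence with n - 2 = 4 degrees of freedom
# (assumption: copied from a standard t table).
T_STAR_95_DF4 = 2.776

conf_int = (y_hat - T_STAR_95_DF4 * se_mean, y_hat + T_STAR_95_DF4 * se_mean)
pred_int = (y_hat - T_STAR_95_DF4 * se_pred, y_hat + T_STAR_95_DF4 * se_pred)

print(f"95% CI for mean price:     ({conf_int[0]:.1f}, {conf_int[1]:.1f})")
print(f"95% PI for one home price: ({pred_int[0]:.1f}, {pred_int[1]:.1f})")
```

The two standard errors differ only by the extra 1 under the square root, which is why the two intervals share a center while the prediction interval comes out wider.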

Hypothesis tests Researchers at The Ohio State University wanted to know if they could use the number of beers consumed by a student to predict the student’s blood alcohol content (BAC). The following scatterplot shows the data. In order to know if the number of beers consumed was a good predictor of BAC, they tested. From the following table, what is the test statistic for performing this test? a) b) c) d) e)
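
For a test about the slope, the usual test statistic is t = b / SE_b with n − 2 degrees of freedom, where the null hypothesis is that the population slope is 0. The sketch below works this out on invented numbers. These are not the Ohio State data, and the critical value is hard-coded from a standard t table for 4 degrees of freedom.

```python
import math

# Hypothetical beers consumed and BAC values, NOT the study's data.
beers = [1.0, 2.0, 3.0, 5.0, 7.0, 9.0]
bac = [0.01, 0.03, 0.04, 0.07, 0.10, 0.12]
n = len(beers)

x_bar = sum(beers) / n
y_bar = sum(bac) / n
sxx = sum((x - x_bar) ** 2 for x in beers)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac))
b = sxy / sxx
a = y_bar - b * x_bar

# Regression standard error and standard error of the slope.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(beers, bac))
s = math.sqrt(sse / (n - 2))
se_b = s / math.sqrt(sxx)

# Test statistic for H0: slope = 0, with n - 2 degrees of freedom.
t_stat = b / se_b

# Two-sided 5% critical value for 4 degrees of freedom
# (assumption: copied from a standard t table).
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4

print(f"b = {b:.5f}, SE_b = {se_b:.5f}, t = {t_stat:.2f}")
print("significant at the 5% level:", significant)
```

In a software regression table, this statistic is typically the slope estimate divided by the standard error listed in the same row.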

Hypothesis tests Researchers at The Ohio State University wanted to know if they could use the number of beers consumed by a student to predict the student’s blood alcohol content (BAC). In order to know if the number of beers consumed was a good predictor of BAC, they tested. What can we conclude from the following table? a) Because the P-value is , there is a significant linear relationship between the number of beers consumed and BAC. b) Because the P-value is , there is a significant linear relationship between the number of beers consumed and BAC. c) Because the P-value is , there is no significant linear relationship between the number of beers consumed and BAC. d) Because the P-value is , there is no significant linear relationship between the number of beers consumed and BAC.

Prediction Researchers at The Ohio State University wanted to know if they could use the number of beers consumed by a student to predict the student’s blood alcohol content (BAC). We want to predict the mean BAC for students who have had seven beers. Should we use the 95% confidence interval for μ_y, which is (0.0976, ), or the 95% prediction interval for Y for X = x*, which is (0.0667, )? a) Confidence interval b) Prediction interval

Conclusions The following scatterplot shows a linear regression analysis of the relationship between the time (in seconds), y, to run a marathon versus the year the marathon was run, x. A statistics student used the regression equation y = 337,047 – x to predict how fast the marathon would be run in 2004. She got an answer of 5022 seconds, or about 1 hour and 24 minutes. This conclusion is: a) Believable because the results came from the regression equation. b) Believable because looking at the graph you can see that the time to run a marathon is indeed decreasing. c) Unbelievable because no one will ever be able to run a marathon that quickly. d) Unbelievable because using 2004 to predict the running time would be considered extrapolation.
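
One simple safeguard against extrapolation is to record the range of x values used to fit the line and flag any prediction made outside it. The sketch below is only an illustration: the function name `predict_with_range_check`, the coefficients, and the range are all hypothetical and are not taken from the marathon data.

```python
def predict_with_range_check(a, b, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y-hat = a + b*x.

    x_min and x_max are the smallest and largest x values in the data
    that were used to fit the line.
    """
    y_hat = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


# Hypothetical fitted line and observed x range.
A, B = 100.0, -2.0
X_MIN, X_MAX = 0.0, 20.0

inside = predict_with_range_check(A, B, 10.0, X_MIN, X_MAX)
outside = predict_with_range_check(A, B, 40.0, X_MIN, X_MAX)

print("x = 10:", inside)    # within the observed range
print("x = 40:", outside)   # outside the observed range, so flagged
```

The line still returns a number for any x, which is the point of the check: the arithmetic never signals that a prediction has left the range where the linear pattern was observed.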

Conclusions An article in a newspaper said that students who major in subjects that have higher expected incomes after graduation are more likely to be married. This conclusion is: a) Correct because the data were collected in a scientific way. b) Incorrect because the results are likely biased due to lurking variables. c) Not reliable because it does not sound plausible.

Relationships The following plot shows a person’s score on a sobriety test versus their blood alcohol content. Which statement is NOT true about this plot? a) An outlier is present in the dataset. b) A relationship exists between BAC and the test score. c) The relationship could be modeled with a straight line. d) There is a positive relationship between the two variables.

Conclusions The average height of people in the United States has been increasing for decades. Similarly there is evidence that the number of plant species is decreasing over these decades. An appropriate conclusion to draw from these observations would be that a) Even though they appear to be associated, we could not conclude association. b) Growing adults are causing the number of plant species to decrease. c) There is a positive relationship between the two variables.
