Inference for Regression

Presentation transcript:

Inference for Regression BPS 7e Chapter 26 © 2015 W. H. Freeman and Company

Linear Relationship
True or False: Because the scatterplot shows a roughly linear (straight-line) pattern, the correlation describes the direction and strength of the relationship.
- True
- False

Linear Relationship (answer)
Answer: True. Correlation measures the direction and strength of a linear relationship.

Conditions for Inference
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. He assumed μ_y = α + βx, where x is the distance from Earth (megaparsecs) and μ_y is the mean velocity (km/sec) for all galaxies at that distance. What does β represent?
- the average velocity for a galaxy that is extremely close to Earth
- the change in mean velocity for a one-megaparsec increase in distance for Hubble’s sample of galaxies
- the slope of the least-squares regression line for Hubble’s sample
- the mean velocity for all galaxies in the universe
- the change in mean velocity for a one-megaparsec increase in distance for all galaxies in the universe

Conditions for Inference (answer)
Answer: the change in mean velocity for a one-megaparsec increase in distance for all galaxies in the universe. β is the slope of the population regression line, not of the line fitted to Hubble’s sample.

Conditions for Inference
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. He assumed μ_y = α + βx, where x is the distance from Earth (megaparsecs) and μ_y is the mean velocity (km/sec) for all galaxies at that distance. To use regression inference, he also had to assume that the galaxy velocities were:
- exactly equal to μ_y.
- Normally distributed around μ_y with a different standard deviation for each value of x.
- Normally distributed around μ_y with the same standard deviation for all values of x.
- Normally distributed around μ_y with very small standard deviation.

Conditions for Inference (answer)
Answer: Normally distributed around μ_y with the same standard deviation for all values of x.

Conditions for Inference
True or False: The standard deviation σ determines whether the points fall close to the population regression line (small σ) or are widely scattered (large σ).
- True
- False

Conditions for Inference (answer)
Answer: True. σ is the standard deviation of the responses about the population regression line.

Conditions for Inference
In an experiment, 20 batches of product were randomly assigned to four processing temperatures. It is believed that the mean yield has a straight-line relationship to temperature. Should regression inference be used to draw conclusions about the relationship between temperature and yield?
- no, because the responses are not independent
- no, because the slope and intercept are unknown
- no, because one of the variables is categorical
- yes, if yields are Normally distributed with the same standard deviation at each temperature
- yes, because it’s a designed experiment

Conditions for Inference (answer)
Answer: yes, if yields are Normally distributed with the same standard deviation at each temperature.

Conditions for Inference
For an SRS of college students, eye color and height were determined. Should regression inference be used to draw conclusions about the relationship between eye color and height for this data set?
- no, because the responses are not independent
- no, because the slope and intercept are unknown
- no, because one of the variables is categorical
- yes, if eye colors are Normally distributed with the same standard deviation at each height
- yes, if mean eye color has a straight-line relationship to height

Conditions for Inference (answer)
Answer: no, because one of the variables is categorical. Regression inference requires a quantitative explanatory variable and a quantitative response.

Conditions for Inference
In an observational study, height and weight were measured for 30 men from 15 families (two brothers per family). Should regression inference be used to draw conclusions about the relationship between height and weight for this data set?
- no, because the responses are not independent
- no, because the slope and intercept are unknown
- no, because it’s not a designed experiment
- yes, if heights are Normally distributed with the same standard deviation at each weight
- yes, if mean height has a straight-line relationship to weight

Conditions for Inference (answer)
Answer: no, because the responses are not independent. Measurements on two brothers from the same family are likely to be related.

Estimating the Parameters
Consider this scatterplot with its associated least-squares regression line. What is the best guess for s?
- 2
- 10
- 25
- 40

Estimating the Parameters (answer)
Consider this scatterplot with its associated least-squares regression line. What is the best guess for s?
- 2
- 10
- 25
- 40

Estimating the Parameters
If s = 10, is it also true that σ = 10?
- yes, because s is an unbiased estimate of σ
- yes, because s is the standard error
- no, because s cannot be a whole number
- no, but it’s a good guess of σ

Estimating the Parameters (answer)
Answer: no, but it’s a good guess of σ. The regression standard error s is computed from the sample and only estimates the population parameter σ.
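
The difference between s and σ can be sketched with a small simulation. The population values (α = 5, β = 2, σ = 10), the sample size of 200, and the helper name `fit_least_squares` are assumptions chosen for illustration and do not come from the slides. The code uses the usual formula s = sqrt(Σ residual² / (n − 2)).

```python
import math
import random


def fit_least_squares(xs, ys):
    """Return (a, b, s): intercept, slope, and regression standard error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # sample slope, estimates beta
    a = y_bar - b * x_bar              # sample intercept, estimates alpha
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Divide by n - 2 because two parameters (a and b) were estimated.
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return a, b, s


# Hypothetical population (assumed values, not from the slides):
# mu_y = ALPHA + BETA * x, with responses Normal around mu_y with sd SIGMA.
ALPHA, BETA, SIGMA = 5.0, 2.0, 10.0

random.seed(0)
xs = [random.uniform(0, 50) for _ in range(200)]
ys = [ALPHA + BETA * x + random.gauss(0, SIGMA) for x in xs]

a, b, s = fit_least_squares(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}, s = {s:.2f} (true sigma = {SIGMA})")
```

In a run like this, s should land near σ without matching it exactly, which is the point of the question above.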

Estimating the Parameters
Consider this scatterplot with its associated least-squares regression line. What is the best description of the slope of this line?
- the increase in mean blood pressure per year of increase in age for all middle-aged men
- the increase in mean blood pressure per year of increase in age for a sample of middle-aged men
- the mean blood pressure for all middle-aged men

Estimating the Parameters (answer)
Answer: the increase in mean blood pressure per year of increase in age for a sample of middle-aged men. The least-squares line is computed from the sample, so its slope b is a statistic that estimates the population slope β.

Estimating the Parameters
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What is the equation for the least-squares regression line?
- ŷ = 454.16 + 75.24 x
- ŷ = −40.78 + 83.44 x
- ŷ = 83.44 + 454.16 x
- ŷ = −40.78 + 454.16 x
- ŷ = 83.44 + 75.24 x

Estimating the Parameters (answer)
Answer: ŷ = −40.78 + 454.16 x

Estimating the Parameters
The following is the least-squares regression line that captures the relationship between smoking (number of packs a day) and number of days missed from work annually for factory workers: ŷ = 6 + 1.5 x. The P-value is p < 0.001. For every one cigarette pack a person smokes a day, the model predicts that, on average, a factory worker will miss ___ day(s) from work.
- 6
- 1.5
- 7.2
- 4.8

Estimating the Parameters (answer)
Answer: 1.5. The slope is the predicted change in days missed for each additional pack smoked per day.

Estimating the Parameters
The following is the least-squares regression line that captures the relationship between smoking (number of packs a day) and number of days missed from work annually for factory workers: ŷ = 6 + 1.5 x. The P-value is p < 0.001. A factory worker who smokes two packs a day is predicted to miss ____ day(s) from work annually.
- 6
- 1.5
- 7.2
- 9

Estimating the Parameters (answer)
Answer: 9, because ŷ = 6 + 1.5 × 2 = 9.
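
The two smoking questions use the fitted line ŷ = 6 + 1.5 x in two ways: reading off the slope, and plugging in a value of x. A minimal sketch follows; the function name `predicted_days_missed` is made up for illustration, and only the coefficients 6 and 1.5 come from the slides.

```python
def predicted_days_missed(packs_per_day, intercept=6.0, slope=1.5):
    """Predicted annual days missed from the fitted line y-hat = 6 + 1.5 x."""
    return intercept + slope * packs_per_day


# Prediction for a worker who smokes two packs a day.
two_packs = predicted_days_missed(2)

# The slope is the change in the prediction per additional pack.
per_extra_pack = predicted_days_missed(3) - predicted_days_missed(2)

print(two_packs)        # 9.0
print(per_extra_pack)   # 1.5
```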

Testing the Hypothesis
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. To test whether there is a positive relationship between distance and velocity, what hypotheses are used?
- H0: β = 0; Ha: β < 0
- H0: α = 0; Ha: α ≠ 0
- H0: β = 0; Ha: β > 0

Testing the Hypothesis (answer)
Answer: H0: β = 0; Ha: β > 0. A positive relationship corresponds to a positive population slope.

Testing the Hypothesis
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What do you conclude regarding H0: β = 0 vs. Ha: β > 0?
- There is strong evidence that mean velocity increases as distance from Earth increases.
- There is not strong evidence that mean velocity increases as distance from Earth increases.
- There is strong evidence that mean velocity decreases as distance from Earth increases.
- There is strong evidence that mean velocity does not change as distance from Earth increases.

Testing the Hypothesis (answer)
Answer: There is strong evidence that mean velocity increases as distance from Earth increases.

Lack of Correlation
The population correlation is 0 when:
- β = 1.
- β = 0.
- β = b.
- α = 0 and β = 1.

Lack of Correlation (answer)
Answer: β = 0.
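
One way to see why a zero slope and a zero correlation go together is the standard identity linking the least-squares slope to the correlation. The second line is the population analogue, written under the assumption that both variables are random with positive standard deviations; the symbols ρ, σ_X, and σ_Y (marginal standard deviations, not the σ about the regression line) are introduced here for illustration and do not appear in the slides.

```latex
% Sample: least-squares slope b, correlation r, standard deviations s_x and s_y
b = r \, \frac{s_y}{s_x}

% Population analogue (assumed notation): slope \beta, correlation \rho
\beta = \rho \, \frac{\sigma_Y}{\sigma_X}
\quad\Longrightarrow\quad
\rho = 0 \iff \beta = 0 \qquad (\sigma_X > 0,\ \sigma_Y > 0)
```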

Confidence Interval for the Slope
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. He collected data on 24 galaxies. The formula for a confidence interval for the population slope β is b ± t* SE_b. What are the degrees of freedom for the critical value t* in this example?
- 20
- 21
- 22
- 23
- 24

Confidence Interval for the Slope (answer)
Answer: 22, because the degrees of freedom are n − 2 = 24 − 2.

Confidence Interval for the Slope
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What is the standard error of b?
- 83.4389
- 75.2371
- −40.7836
- 454.1584

Confidence Interval for the Slope (answer)
Answer: 75.2371

Confidence Interval for the Slope
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What is a 95% confidence interval for β?
- 454.1584 ± 75.2371
- 454.1584 ± (2.074) (75.2371)
- 454.1584 ± (6.0364) (75.2371)
- 454.1584 ± (75.2371 / 22)

Confidence Interval for the Slope (answer)
Answer: 454.1584 ± (2.074) (75.2371), that is, b ± t* SE_b with t* = 2.074 for 22 degrees of freedom.
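
The slope calculations in these slides can be checked with a few lines of arithmetic. The sketch below takes b = 454.1584, SE_b = 75.2371, n = 24, and t* = 2.074 from the answer choices; t* is hard-coded here rather than looked up in a t table, and the variable names are illustrative.

```python
# Values taken from the slides' answer choices.
b = 454.1584        # estimated slope (km/sec per megaparsec)
se_b = 75.2371      # standard error of the slope
n = 24              # number of galaxies
t_star = 2.074      # 95% critical value quoted in the slides for df = 22

df = n - 2                      # degrees of freedom for regression inference
t_stat = b / se_b               # test statistic for H0: beta = 0
margin = t_star * se_b          # margin of error for the 95% interval
lower, upper = b - margin, b + margin

print(f"df = {df}")
print(f"t = {t_stat:.4f}")
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

The resulting interval agrees with the (298.12, 610.20) quoted in the interpretation question, and the t statistic matches the 6.0364 that appears as a distractor in the confidence-interval choices.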

Confidence Interval for the Slope
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A 95% confidence interval for β is (298.12, 610.20). What is the correct interpretation of this confidence interval?
- The change in recession velocity per unit distance from Earth is between 298.12 and 610.20 km/sec/megaparsec.
- The change in average recession velocity per unit distance from Earth is between 298.12 and 610.20 km/sec/megaparsec.
- With 95% confidence, the change in average recession velocity per unit distance from Earth is between 298.12 and 610.20 km/sec/megaparsec.
- The change in recession velocity per unit distance from Earth is between 298.12 and 610.20 km/sec/megaparsec for 95% of all galaxies.

Confidence Interval for the Slope (answer)
Answer: With 95% confidence, the change in average recession velocity per unit distance from Earth is between 298.12 and 610.20 km/sec/megaparsec.

Inference About Prediction
Here is regression output for the relationship between systolic blood pressure and age for a sample of middle-aged men:

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    59.0916     12.8163    4.611    <.0001
age             1.6045      0.2387    6.721

Therefore, 131.2941 = 59.0916 + 1.6045 × 45 is the:
- predicted mean systolic blood pressure for all 45-year-old men.
- predicted systolic blood pressure of a specific 45-year-old individual man.
- Both of the above

Inference About Prediction (answer)
Answer: Both of the above. The same value ŷ estimates the mean response at x = 45 and predicts an individual response at x = 45; only the margins of error differ.

Inference About Prediction
Here is regression output for the relationship between systolic blood pressure and age for a sample of middle-aged men:

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    59.0916     12.8163    4.611    <.0001
age             1.6045      0.2387    6.721

One of the following is the 95% confidence interval for μ_y at 45 years old, and the other is the 95% prediction interval for y of a specific 45-year-old individual. Which is the confidence interval?
- 131.2941 ± 2.042 × 2.5592
- 131.2941 ± 2.042 × 9.5927

Inference About Prediction (answer)
Answer: 131.2941 ± 2.042 × 2.5592. The confidence interval for the mean response uses the smaller standard error; the prediction interval must also allow for the variation of an individual around the mean.
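
The two intervals can be compared numerically. This sketch uses only numbers that appear in the slides (the coefficients, t* = 2.042, and the two standard errors 2.5592 and 9.5927). Assigning the smaller standard error to the mean response and the larger one to the individual prediction follows the standard result that a prediction interval is wider; the variable names are illustrative.

```python
# Coefficients from the regression output in the slides.
intercept, slope = 59.0916, 1.6045
age = 45

y_hat = intercept + slope * age   # point estimate, about 131.2941

t_star = 2.042      # critical value quoted in the slides
se_mean = 2.5592    # standard error for estimating the mean response
se_pred = 9.5927    # standard error for predicting one individual

# Same center, different margins of error.
conf_int = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)
pred_int = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)

conf_width = conf_int[1] - conf_int[0]
pred_width = pred_int[1] - pred_int[0]

print(f"y-hat at age {age}: {y_hat:.4f}")
print(f"95% confidence interval for the mean: ({conf_int[0]:.2f}, {conf_int[1]:.2f})")
print(f"95% prediction interval for one man:  ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
print(f"prediction interval is wider: {pred_width > conf_width}")
```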

Inference About Prediction
Linear regression of selling price on house size was carried out using data for a sample of recent sales. For a house of size 1500 ft², the 95% prediction interval for its selling price is _________ the 95% confidence interval for the average selling price of all homes that are 1500 ft².
- wider than
- the same as
- narrower than
- not comparable with

Inference About Prediction (answer)
Answer: wider than. Predicting one house’s price is less certain than estimating the average price of all houses of that size.

Checking the Conditions
The following plot shows a person’s score on a sobriety test versus their blood alcohol content. Which statement is NOT true about this plot?
- An outlier is present in the dataset.
- A relationship exists between BAC and the test score.
- The relationship could be modeled with a straight line.
- There is a positive relationship between the two variables.

Checking the Conditions (answer)
The following plot shows a person’s score on a sobriety test versus their blood alcohol content. Which statement is NOT true about this plot?
- An outlier is present in the dataset.
- A relationship exists between BAC and the test score.
- The relationship could be modeled with a straight line.
- There is a positive relationship between the two variables.

Checking the Conditions
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. Based on the following residual plot and histogram of the residuals, what conclusion should be made about the conditions for regression inference?
- Because the residual plot shows no pattern and the histogram is approximately bell-shaped, the conditions are met.
- The residual plot implies that the data violate the assumption of Normality.
- The histogram of the residuals shows that the data are extremely right-skewed.
- Neither plot tells us anything about the assumptions for doing inference for regression.
- The residual plot implies that the data violate the assumption of linearity.

Checking the Conditions (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. Based on the following residual plot and histogram of the residuals, what conclusion should be made about the conditions for regression inference?
- Because the residual plot shows no pattern and the histogram is approximately bell-shaped, the conditions are met.
- The residual plot implies that the data violate the assumption of Normality.
- The histogram of the residuals shows that the data are extremely right-skewed.
- Neither plot tells us anything about the assumptions for doing inference for regression.
- The residual plot implies that the data violate the assumption of linearity.