1
Inference for Regression
BPS 7e Chapter 26 © 2015 W. H. Freeman and Company
2
Linear Relationship
True or False: Because the scatterplot shows a roughly linear (straight-line) pattern, the correlation describes the direction and strength of the relationship.
True
False
3
Linear Relationship (answer)
True or False: Because the scatterplot shows a roughly linear (straight-line) pattern, the correlation describes the direction and strength of the relationship.
True
False
Answer: True
4
Conditions for Inference
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. He assumed μ_y = α + βx, where x is the distance from Earth (megaparsecs) and μ_y is the mean velocity (km/sec) for all galaxies at that distance. What does β represent?
a) the average velocity for a galaxy that is extremely close to Earth
b) the change in mean velocity for a one-megaparsec increase in distance for Hubble’s sample of galaxies
c) the slope of the least-squares regression line for Hubble’s sample
d) the mean velocity for all galaxies in the universe
e) the change in mean velocity for a one-megaparsec increase in distance for all galaxies in the universe
5
Conditions for Inference (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. He assumed μ_y = α + βx, where x is the distance from Earth (megaparsecs) and μ_y is the mean velocity (km/sec) for all galaxies at that distance. What does β represent?
a) the average velocity for a galaxy that is extremely close to Earth
b) the change in mean velocity for a one-megaparsec increase in distance for Hubble’s sample of galaxies
c) the slope of the least-squares regression line for Hubble’s sample
d) the mean velocity for all galaxies in the universe
e) the change in mean velocity for a one-megaparsec increase in distance for all galaxies in the universe
Answer: e) β is the slope of the population regression line, so it describes all galaxies, not only Hubble’s sample.
6
Conditions for Inference
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. He assumed μ_y = α + βx, where x is the distance from Earth (megaparsecs) and μ_y is the mean velocity (km/sec) for all galaxies at that distance. To use regression inference, he also had to assume that the galaxy velocities were:
a) exactly equal to μ_y.
b) Normally distributed around μ_y with a different standard deviation for each value of x.
c) Normally distributed around μ_y with the same standard deviation for all values of x.
d) Normally distributed around μ_y with very small standard deviation.
7
Conditions for Inference (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. He assumed μ_y = α + βx, where x is the distance from Earth (megaparsecs) and μ_y is the mean velocity (km/sec) for all galaxies at that distance. To use regression inference, he also had to assume that the galaxy velocities were:
a) exactly equal to μ_y.
b) Normally distributed around μ_y with a different standard deviation for each value of x.
c) Normally distributed around μ_y with the same standard deviation for all values of x.
d) Normally distributed around μ_y with very small standard deviation.
Answer: c) Regression inference assumes that responses vary Normally about μ_y with a single standard deviation σ that is the same for every x.
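The conditions on this slide can be illustrated by simulation. The sketch below draws responses that are Normal around μ_y = α + βx with one common σ at every x. The function name and all parameter values are invented for illustration; they are not Hubble's estimates.

```python
import random
import statistics

def simulate_velocities(distance, alpha, beta, sigma, n, rng):
    """Draw n responses at one x: Normal with mean alpha + beta*x and SD sigma."""
    mean_y = alpha + beta * distance
    return [rng.gauss(mean_y, sigma) for _ in range(n)]

rng = random.Random(0)
alpha, beta, sigma = 0.0, 450.0, 200.0  # hypothetical values, not Hubble's
for x in (0.5, 1.0, 2.0):
    ys = simulate_velocities(x, alpha, beta, sigma, 5000, rng)
    # The sample mean tracks alpha + beta*x; the sample SD stays near sigma at every x.
    print(x, round(statistics.mean(ys), 1), round(statistics.stdev(ys), 1))
```

At each distance the simulated mean moves along the line while the spread stays roughly constant, which is what option c describes.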
8
Conditions for Inference
True or False: The standard deviation σ determines whether the points fall close to the population regression line (small σ) or are widely scattered (large σ).
True
False
9
Conditions for Inference (answer)
True or False: The standard deviation σ determines whether the points fall close to the population regression line (small σ) or are widely scattered (large σ).
True
False
Answer: True
10
Conditions for Inference
In an experiment, 20 batches of product were randomly assigned to four processing temperatures. It is believed that the mean yield has a straight-line relationship to temperature. Should regression inference be used to draw conclusions about the relationship between temperature and yield?
a) no, because the responses are not independent
b) no, because the slope and intercept are unknown
c) no, because one of the variables is categorical
d) yes, if yields are Normally distributed with the same standard deviation at each temperature
e) yes, because it’s a designed experiment
11
Conditions for Inference (answer)
In an experiment, 20 batches of product were randomly assigned to four processing temperatures. It is believed that the mean yield has a straight-line relationship to temperature. Should regression inference be used to draw conclusions about the relationship between temperature and yield?
a) no, because the responses are not independent
b) no, because the slope and intercept are unknown
c) no, because one of the variables is categorical
d) yes, if yields are Normally distributed with the same standard deviation at each temperature
e) yes, because it’s a designed experiment
Answer: d) Random assignment makes the responses independent and both variables are quantitative, so the remaining conditions are Normality and a common standard deviation.
12
Conditions for Inference
For an SRS of college students, eye color and height were determined. Should regression inference be used to draw conclusions about the relationship between eye color and height for this data set?
a) no, because the responses are not independent
b) no, because the slope and intercept are unknown
c) no, because one of the variables is categorical
d) yes, if eye colors are Normally distributed with the same standard deviation at each height
e) yes, if mean eye color has a straight-line relationship to height
13
Conditions for Inference (answer)
For an SRS of college students, eye color and height were determined. Should regression inference be used to draw conclusions about the relationship between eye color and height for this data set?
a) no, because the responses are not independent
b) no, because the slope and intercept are unknown
c) no, because one of the variables is categorical
d) yes, if eye colors are Normally distributed with the same standard deviation at each height
e) yes, if mean eye color has a straight-line relationship to height
Answer: c) Eye color is categorical, and regression requires two quantitative variables.
14
Conditions for Inference
In an observational study, height and weight were measured for 30 men from 15 families (two brothers per family). Should regression inference be used to draw conclusions about the relationship between height and weight for this data set?
a) no, because the responses are not independent
b) no, because the slope and intercept are unknown
c) no, because it’s not a designed experiment
d) yes, if heights are Normally distributed with the same standard deviation at each weight
e) yes, if mean height has a straight-line relationship to weight
15
Conditions for Inference (answer)
In an observational study, height and weight were measured for 30 men from 15 families (two brothers per family). Should regression inference be used to draw conclusions about the relationship between height and weight for this data set?
a) no, because the responses are not independent
b) no, because the slope and intercept are unknown
c) no, because it’s not a designed experiment
d) yes, if heights are Normally distributed with the same standard deviation at each weight
e) yes, if mean height has a straight-line relationship to weight
Answer: a) Brothers from the same family are not independent observations.
16
Estimating the Parameters
Consider this scatterplot with its associated least-squares regression line. What is the best guess for s?
a) 2
b) 10
c) 25
d) 40
17
Estimating the Parameters (answer)
Consider this scatterplot with its associated least-squares regression line. What is the best guess for s?
a) 2
b) 10
c) 25
d) 40
18
Estimating the Parameters
If s = 10, is it also true that σ = 10?
a) yes, because s is an unbiased estimate of σ
b) yes, because s is the standard error
c) no, because σ cannot be a whole number
d) no, but it’s a good guess of σ
19
Estimating the Parameters (answer)
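If s = 10, is it also true that σ = 10?
a) yes, because s is an unbiased estimate of σ
b) yes, because s is the standard error
c) no, because σ cannot be a whole number
d) no, but it’s a good guess of σ
Answer: d) s is a sample estimate of σ, so σ is probably near 10 but need not equal it.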
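To make the distinction between s and σ concrete, the sketch below computes the regression standard error s from the residuals of a least-squares fit, dividing by n − 2. The five data points are invented for illustration.

```python
import math

def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def regression_standard_error(xs, ys):
    """s = sqrt(sum of squared residuals / (n - 2)), the estimate of sigma."""
    a, b = least_squares(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in residuals) / (len(xs) - 2))

xs = [1, 2, 3, 4, 5]                # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(xs, ys)
s = regression_standard_error(xs, ys)
print(round(a, 3), round(b, 3), round(s, 4))
```

A different sample from the same population would give a different s, which is why s = 10 supports "σ is about 10" rather than "σ = 10".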
20
Estimating the Parameters
Consider this scatterplot with its associated least-squares regression line. What is the best description of the slope of this line?
a) the increase in mean blood pressure per year of increase in age for all middle-aged men
b) the increase in mean blood pressure per year of increase in age for a sample of middle-aged men
c) the mean blood pressure for all middle-aged men
21
Estimating the Parameters (answer)
Consider this scatterplot with its associated least-squares regression line. What is the best description of the slope of this line?
a) the increase in mean blood pressure per year of increase in age for all middle-aged men
b) the increase in mean blood pressure per year of increase in age for a sample of middle-aged men
c) the mean blood pressure for all middle-aged men
Answer: b) The least-squares line is computed from the sample, so its slope describes the sample; it only estimates the population slope.
22
Estimating the Parameters
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What is the equation for the least-squares regression line? ŷ = x ŷ = − x ŷ = x ŷ = − x ŷ = x
23
Estimating the Parameters (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What is the equation for the least-squares regression line? ŷ = x ŷ = − x ŷ = x ŷ = − x ŷ = x
24
Estimating the Parameters
The following is the least-squares regression line that captures the relationship between smoking (number of packs a day) and number of days missed from work annually for factory workers: ŷ = x And the P-value is p < 0.001. For each additional pack of cigarettes smoked per day, the model predicts that, on average, a factory worker will miss ___ more day(s) from work annually.
a) 6
b) 1.5
c) 7.2
d) 4.8
25
Estimating the Parameters (answer)
The following is the least-squares regression line that captures the relationship between smoking (number of packs a day) and number of days missed from work annually for factory workers: ŷ = x And the P-value is p < 0.001. For each additional pack of cigarettes smoked per day, the model predicts that, on average, a factory worker will miss ___ more day(s) from work annually.
a) 6
b) 1.5
c) 7.2
d) 4.8
26
Estimating the Parameters
The following is the least-squares regression line that captures the relationship between smoking (number of packs a day) and number of days missed from work annually for factory workers: ŷ = x And the P-value is p < 0.001. A factory worker who smokes two packs a day is predicted to miss ____ day(s) from work annually.
a) 6
b) 1.5
c) 7.2
d) 9
27
Estimating the Parameters (answer)
The following is the least-squares regression line that captures the relationship between smoking (number of packs a day) and number of days missed from work annually for factory workers: ŷ = x And the P-value is p < 0.001. A factory worker who smokes two packs a day is predicted to miss ____ day(s) from work annually.
a) 6
b) 1.5
c) 7.2
d) 9
28
Testing the Hypothesis
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. To test whether there is a positive relationship between distance and velocity, what hypotheses are used?
a) H0: β = 0; Ha: β < 0
b) H0: α = 0; Ha: α ≠ 0
c) H0: β = 0; Ha: β > 0
29
Testing the Hypothesis (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. To test whether there is a positive relationship between distance and velocity, what hypotheses are used?
a) H0: β = 0; Ha: β < 0
b) H0: α = 0; Ha: α ≠ 0
c) H0: β = 0; Ha: β > 0
Answer: c) A positive relationship corresponds to a positive population slope, so the alternative is β > 0.
30
Testing the Hypothesis
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What do you conclude regarding H0: β = 0 vs. Ha: β > 0?
a) There is strong evidence that mean velocity increases as distance from Earth increases.
b) There is not strong evidence that mean velocity increases as distance from Earth increases.
c) There is strong evidence that mean velocity decreases as distance from Earth increases.
d) There is strong evidence that mean velocity does not change as distance from Earth increases.
31
Testing the Hypothesis (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What do you conclude regarding H0: β = 0 vs. Ha: β > 0?
a) There is strong evidence that mean velocity increases as distance from Earth increases.
b) There is not strong evidence that mean velocity increases as distance from Earth increases.
c) There is strong evidence that mean velocity decreases as distance from Earth increases.
d) There is strong evidence that mean velocity does not change as distance from Earth increases.
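The test of H0: β = 0 against Ha: β > 0 uses the statistic t = b / SE_b with n − 2 degrees of freedom. The sketch below computes the one-sided P-value with the standard library only, integrating the t density by Simpson's rule. The slope and standard error are hypothetical, because the regression output did not survive in this transcript; the helper names are also invented.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t), via Simpson's rule on [0, |t|] (steps must be even)."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def slope_t_test(b, se_b, n):
    """One-sided test of H0: beta = 0 against Ha: beta > 0."""
    df = n - 2
    t_stat = b / se_b
    return t_stat, df, t_upper_tail(t_stat, df)

# Hypothetical b and SE_b; n = 24 matches the galaxy count given in this deck.
t_stat, df, p_value = slope_t_test(b=2.0, se_b=0.5, n=24)
print(t_stat, df, p_value)
```

A small P-value here would count as strong evidence for Ha: β > 0; a large one would not.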
32
Lack of Correlation
The population correlation is 0 when:
a) β = 1.
b) β = 0.
c) b = β.
d) α = 0 and β = 1.
33
Lack of Correlation (answer)
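The population correlation is 0 when:
a) β = 1.
b) β = 0.
c) b = β.
d) α = 0 and β = 1.
Answer: b) A population slope of 0 means the mean response does not change with x, which is equivalent to a population correlation of 0.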
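The sample version of this fact is the identity b = r · (s_y / s_x): the least-squares slope is zero exactly when the sample correlation is zero. The sketch below checks the identity on two invented data sets; it illustrates the sample analogue rather than proving the population statement.

```python
import math

def slope_and_correlation(xs, ys):
    """Return (least-squares slope b, correlation r, ratio s_y / s_x)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return b, r, math.sqrt(syy / sxx)

xs = [1, 2, 3, 4, 5]
flat = [2, 1, 3, 1, 2]                 # made-up data with no linear trend
rising = [2.1, 3.9, 6.2, 7.8, 10.1]    # made-up data with a clear trend

for ys in (flat, rising):
    b, r, sd_ratio = slope_and_correlation(xs, ys)
    # b and r * (s_y / s_x) agree, so they are zero together.
    print(round(b, 4), round(r, 4), round(r * sd_ratio, 4))
```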
34
Confidence Interval for the Slope
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. He collected data on 24 galaxies. The formula for a confidence interval for the population slope β is b ± t* SE_b. What are the degrees of freedom for the critical value t* in this example?
a) 20
b) 21
c) 22
d) 23
e) 24
35
Confidence Interval for the Slope (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. He collected data on 24 galaxies. The formula for a confidence interval for the population slope β is b ± t* SE_b. What are the degrees of freedom for the critical value t* in this example?
a) 20
b) 21
c) 22
d) 23
e) 24
Answer: c) The degrees of freedom are n − 2 = 24 − 2 = 22.
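The interval b ± t* SE_b is simple to compute once the pieces are known. In the sketch below, n = 24 is taken from the slide, and t* = 2.074 is the 95% critical value for 22 degrees of freedom from a t table; the slope and its standard error are hypothetical placeholders because the regression output is not preserved here.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

n = 24                # galaxies in the sample
df = n - 2            # degrees of freedom for t*
t_star = 2.074        # 95% critical value for df = 22 (from a t table)
b, se_b = 2.0, 0.5    # hypothetical slope and standard error

lower, upper = slope_confidence_interval(b, se_b, t_star)
print(df, round(lower, 3), round(upper, 3))
```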
36
Confidence Interval for the Slope
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What is the standard error of b? −
37
Confidence Interval for the Slope (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What is the standard error of b? −
38
Confidence Interval for the Slope
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What is a 95% confidence interval for β?
a) ±
b) ± (2.074) ( )
c) ± (6.0364) ( )
d) ± ( / 22)
39
Confidence Interval for the Slope (answer)
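Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A regression program gave this output: What is a 95% confidence interval for β?
a) ±
b) ± (2.074) ( )
c) ± (6.0364) ( )
d) ± ( / 22)
Answer: b) This is the only option that multiplies the standard error by t* = 2.074, the 95% critical value for 22 degrees of freedom.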
40
Confidence Interval for the Slope
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A 95% confidence interval for β is (298.12, ). What is the correct interpretation of this confidence interval?
a) The change in recession velocity per unit distance from Earth is between and km/sec/megaparsec.
b) The change in average recession velocity per unit distance from Earth is between and km/sec/megaparsec.
c) With 95% confidence, the change in average recession velocity per unit distance from Earth is between and km/sec/megaparsec.
d) The change in recession velocity per unit distance from Earth is between and km/sec/megaparsec for 95% of all galaxies.
41
Confidence Interval for the Slope (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. A 95% confidence interval for β is (298.12, ). What is the correct interpretation of this confidence interval?
a) The change in recession velocity per unit distance from Earth is between and km/sec/megaparsec.
b) The change in average recession velocity per unit distance from Earth is between and km/sec/megaparsec.
c) With 95% confidence, the change in average recession velocity per unit distance from Earth is between and km/sec/megaparsec.
d) The change in recession velocity per unit distance from Earth is between and km/sec/megaparsec for 95% of all galaxies.
Answer: c) The interval is a statement, made with 95% confidence, about the slope of the mean velocity; it does not describe individual galaxies.
42
Inference About Prediction
Here is regression output for the relationship between systolic blood pressure and age for a sample of middle-aged men:

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                        4.611    <.0001
age          1.6045    0.2387      6.721

Therefore, ŷ = (intercept estimate) + 1.6045 × 45 is the:
a) predicted mean systolic blood pressure for all 45-year-old men.
b) predicted systolic blood pressure of a specific 45-year-old individual man.
c) Both of the above
43
Inference About Prediction (answer)
Here is regression output for the relationship between systolic blood pressure and age for a sample of middle-aged men:

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                        4.611    <.0001
age          1.6045    0.2387      6.721

Therefore, ŷ = (intercept estimate) + 1.6045 × 45 is the:
a) predicted mean systolic blood pressure for all 45-year-old men.
b) predicted systolic blood pressure of a specific 45-year-old individual man.
c) Both of the above
Answer: c) The same value ŷ estimates the mean response at x = 45 and predicts an individual response at x = 45; only the margins of error differ.
44
Inference About Prediction
Here is regression output for the relationship between systolic blood pressure and age for a sample of middle-aged men:

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                        4.611    <.0001
age          1.6045    0.2387      6.721

One of the following is the 95% confidence interval for μ_y at 45 years old, and the other is the 95% prediction interval for y of a specific 45-year-old individual. Which is the confidence interval?
a) ± ×
b) ± ×
45
Inference About Prediction (answer)
Here is regression output for the relationship between systolic blood pressure and age for a sample of middle-aged men:

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                        4.611    <.0001
age          1.6045    0.2387      6.721

One of the following is the 95% confidence interval for μ_y at 45 years old, and the other is the 95% prediction interval for y of a specific 45-year-old individual. Which is the confidence interval?
a) ± ×
b) ± ×
46
Inference About Prediction
Linear regression of selling price on house size was carried out using data for a sample of recent sales. For a house of size 1500 ft², the 95% prediction interval for its selling price is _________ the 95% confidence interval for the average selling price of all homes that are 1500 ft².
a) wider than
b) the same as
c) narrower than
d) not comparable with
47
Inference About Prediction (answer)
Linear regression of selling price on house size was carried out using data for a sample of recent sales. For a house of size 1500 ft², the 95% prediction interval for its selling price is _________ the 95% confidence interval for the average selling price of all homes that are 1500 ft².
a) wider than
b) the same as
c) narrower than
d) not comparable with
Answer: a) A prediction interval must also cover the variation of an individual house about the mean, so it is always wider.
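The difference in width comes from the two standard errors. With regression standard error s, sample size n, mean x̄, and S_xx = Σ(x − x̄)², the standard error for the mean response at x* is s·sqrt(1/n + (x* − x̄)²/S_xx), while the standard error for predicting one response adds 1 under the square root. The sketch below compares them; every numeric input is invented for illustration.

```python
import math

def se_mean_response(s, n, x_star, x_bar, sxx):
    """Standard error for estimating the mean response at x_star."""
    return s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)

def se_prediction(s, n, x_star, x_bar, sxx):
    """Standard error for predicting one individual response at x_star."""
    return s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)

# Hypothetical summary statistics for a house-price regression.
s, n, x_bar, sxx = 10.0, 30, 1400.0, 2_000_000.0
x_star = 1500.0

se_mu = se_mean_response(s, n, x_star, x_bar, sxx)
se_y = se_prediction(s, n, x_star, x_bar, sxx)
# Both intervals use the same t*, so the larger standard error gives the wider interval.
print(round(se_mu, 3), round(se_y, 3), se_y > se_mu)
```

The squared standard errors differ by exactly s², the variance of an individual response about the line.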
48
Checking the Conditions
The following plot shows a person’s score on a sobriety test versus their blood alcohol content. Which statement is NOT true about this plot?
a) An outlier is present in the dataset.
b) A relationship exists between BAC and the test score.
c) The relationship could be modeled with a straight line.
d) There is a positive relationship between the two variables.
49
Checking the Conditions (answer)
The following plot shows a person’s score on a sobriety test versus their blood alcohol content. Which statement is NOT true about this plot?
a) An outlier is present in the dataset.
b) A relationship exists between BAC and the test score.
c) The relationship could be modeled with a straight line.
d) There is a positive relationship between the two variables.
50
Checking the Conditions
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. Based on the following residual plot and histogram of the residuals, what conclusion should be made about the conditions for regression inference?
a) Because the residual plot shows no pattern and the histogram is approximately bell-shaped, the conditions are met.
b) The residual plot implies that the data violate the assumption of Normality.
c) The histogram of the residuals shows that the data are extremely right-skewed.
d) Neither plot tells us anything about the assumptions for doing inference for regression.
e) The residual plot implies that the data violate the assumption of linearity.
51
Checking the Conditions (answer)
Edwin Hubble collected data on the relationship between the distance a galaxy is from Earth and the velocity with which it is receding. Based on the following residual plot and histogram of the residuals, what conclusion should be made about the conditions for regression inference?
a) Because the residual plot shows no pattern and the histogram is approximately bell-shaped, the conditions are met.
b) The residual plot implies that the data violate the assumption of Normality.
c) The histogram of the residuals shows that the data are extremely right-skewed.
d) Neither plot tells us anything about the assumptions for doing inference for regression.
e) The residual plot implies that the data violate the assumption of linearity.