Inference in Linear Models


1 Inference in Linear Models
Section 13.1

2 Objectives
1. State the assumptions of the linear model
2. Check the assumptions of the linear model
3. Construct confidence intervals for the slope
4. Test hypotheses about the slope

3 State the assumptions of the linear model
Objective 1 State the assumptions of the linear model

4 Least-Squares Regression Line
The following table presents the number of calories and grams of fat per 100 grams of product for a sample of 18 candy products.
Product           Fat ($x$)   Calories ($y$)
3 Musketeers      12.75       436
Mr. Goodbar       33.21       538
Kit Kat           25.99       518
100 Grand         19.33       468
M&M Plain         21.13       492
Baby Ruth         21.60       459
M&M Peanut        26.13       515
Bit O'Honey        7.50       375
Milky Way         17.23       456
Butterfinger      18.90       [value missing]
Skittles           4.37       405
Oh Henry!         23.00       462
Snickers          23.85       491
Reese's Pieces    24.77       497
Starburst          8.36       408
Tootsie Roll       3.31       387
Twix              24.85       502
Twizzlers          2.32       350
We may find the least-squares regression line using technology: $\hat{y}$ = [value missing] + [value missing] $x$.
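As a minimal sketch of what "using technology" might look like, the Python below fits the least-squares line with the standard formulas $b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. The names `CANDY` and `least_squares` are illustrative and do not come from the slides. Butterfinger is left out because its calorie value is missing from this transcript, so the fit uses 17 products and its coefficients will differ somewhat from the 18-product line on the slide.

```python
# Candy data from slide 4: (fat in grams, calories) per 100 g of product.
# ASSUMPTION: Butterfinger is omitted because its calorie value is missing
# from this transcript, so n = 17 here rather than the 18 used on the slides.
CANDY = [
    (12.75, 436), (33.21, 538), (25.99, 518), (19.33, 468), (21.13, 492),
    (21.60, 459), (26.13, 515), (7.50, 375), (17.23, 456), (4.37, 405),
    (23.00, 462), (23.85, 491), (24.77, 497), (8.36, 408), (3.31, 387),
    (24.85, 502), (2.32, 350),
]


def least_squares(points):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept
    return b0, b1


b0, b1 = least_squares(CANDY)
print(f"y_hat = {b0:.4f} + {b1:.4f} x   (17 complete rows)")
```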

5 Linear Model
When the points on a scatterplot are a random sample from a population, we can imagine plotting every point in the population on a scatterplot. Then, if certain assumptions are met, we say that the population follows a linear model. The intercept $b_0$ and the slope $b_1$ of the least-squares regression line are then estimates of a population intercept $\beta_0$ and a population slope $\beta_1$. We cannot determine the exact values of $\beta_0$ and $\beta_1$, because we cannot observe the entire population. However, we can use the sample points to construct confidence intervals and test hypotheses about $\beta_0$ and $\beta_1$. We will focus on $\beta_1$.

6 Assumptions for the Linear Model
Imagine the population scatterplot divided into narrow vertical strips, each containing the points with approximately the same $x$-value. The linear model makes three assumptions:
1. The mean of the $y$-values within a strip is denoted $\mu_{y|x}$. As $x$ varies, the values $\mu_{y|x}$ follow a straight line: $\mu_{y|x} = \beta_0 + \beta_1 x$.
2. The amount of vertical spread is approximately the same in each strip, except perhaps near the ends.
3. The $y$-values within a strip are approximately normally distributed. This assumption is not necessary if the sample size is large ($n > 30$).

7 Linear Model Equation
When the assumptions of the linear model hold, the points $(x, y)$ satisfy the following linear model equation:
$y = \beta_0 + \beta_1 x + \varepsilon$
where
$\beta_0$ is the $y$-intercept,
$\beta_1$ is the slope of the line,
$\varepsilon$ is a random error.
The $y$-intercept $b_0$ and the slope $b_1$ of the least-squares line are estimates of $\beta_0$ and $\beta_1$.
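To make the distinction between the population values $\beta_0$, $\beta_1$ and the estimates $b_0$, $b_1$ concrete, here is a small simulation sketch. It draws points from $y = \beta_0 + \beta_1 x + \varepsilon$ with normally distributed errors and then fits the least-squares line. All numbers (population intercept 350, slope 5, error standard deviation 15, sample size 200, $x$ uniform on 0 to 30) are made-up assumptions chosen for illustration, and the function names are hypothetical.

```python
import random


def simulate_sample(beta0, beta1, sigma, n, rng):
    """Draw n points (x, y) with y = beta0 + beta1 * x + eps, eps ~ Normal(0, sigma)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 30)        # hypothetical range of x-values
        eps = rng.gauss(0, sigma)     # random error term
        points.append((x, beta0 + beta1 * x + eps))
    return points


def least_squares(points):
    """Return (b0, b1) for the least-squares line through the points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = simulate_sample(beta0=350.0, beta1=5.0, sigma=15.0, n=200, rng=rng)
b0, b1 = least_squares(sample)
print(f"population line: y = 350 + 5 x;  estimated line: y_hat = {b0:.2f} + {b1:.2f} x")
```

The estimates land near, but not exactly on, the population values, which is why the rest of the section attaches a standard error and a confidence interval to $b_1$.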

8 Check the assumptions of the linear model
Objective 2 Check the assumptions of the linear model

9 Checking Assumptions
In practice, we do not see the entire population, so we must use the sample to check that the assumptions are satisfied. This is done with a residual plot. Given a point $(x, y)$ and the least-squares regression line $\hat{y} = b_0 + b_1 x$, the residual for the point $(x, y)$ is the difference between the observed value $y$ and the predicted value $\hat{y}$:
Residual $= y - \hat{y}$
A residual plot is a plot in which the residuals are plotted against the values of the explanatory variable $x$. In other words, the points on the residual plot are $(x, y - \hat{y})$. Following is the residual plot for the candy products data.
[Figure: residual plot for the candy products data]
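Since the residual plot itself did not survive in this transcript, the sketch below shows one way the plotted points $(x, y - \hat{y})$ could be computed. It is an illustration under stated assumptions: the names `CANDY`, `least_squares`, and `residuals` are hypothetical, and Butterfinger is omitted because its calorie value is missing here, so the residuals come from a 17-product fit rather than the slide's 18-product fit.

```python
# Candy data from slide 4: (fat in grams, calories) per 100 g of product.
# ASSUMPTION: Butterfinger is omitted (calorie value missing in this transcript).
CANDY = [
    (12.75, 436), (33.21, 538), (25.99, 518), (19.33, 468), (21.13, 492),
    (21.60, 459), (26.13, 515), (7.50, 375), (17.23, 456), (4.37, 405),
    (23.00, 462), (23.85, 491), (24.77, 497), (8.36, 408), (3.31, 387),
    (24.85, 502), (2.32, 350),
]


def least_squares(points):
    """Return (b0, b1) for the least-squares line through the points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(points):
    """Return the residual-plot points (x, y - y_hat)."""
    b0, b1 = least_squares(points)
    return [(x, y - (b0 + b1 * x)) for x, y in points]


# Print the residual-plot points in order of increasing x.
for x, e in sorted(residuals(CANDY)):
    print(f"x = {x:5.2f}   residual = {e:7.2f}")
```

Feeding these pairs to any plotting tool gives the residual plot. A useful sanity check is that least-squares residuals always sum to zero, up to rounding.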

10 Conditions for the Residual Plots
The residual plot must satisfy the following conditions in order for the linear model assumptions to be satisfied:
1. The residual plot must not exhibit an obvious pattern.
2. The vertical spread of the points in the residual plot must be roughly the same across the plot.
3. There must be no outliers.
The residual plot of the candy products satisfies all three conditions. We may assume that the linear model is valid, and we may construct confidence intervals and test hypotheses.

11 Construct confidence intervals for the slope
Objective 3 Construct confidence intervals for the slope

12 Constructing Confidence Intervals for the Slope
The slope $b_1$ of the least-squares regression line is a point estimate of the population slope $\beta_1$. When the assumptions of the linear model are satisfied, we can construct a confidence interval for $\beta_1$. We need a point estimate, a standard error, and a critical value. To compute the standard error of $b_1$, we first compute a quantity called the residual standard deviation. The residual standard deviation, denoted $s_e$, measures the spread of the points on the scatterplot around the least-squares regression line. The formula for the residual standard deviation is
$s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
Once this is computed, we can find the standard error of $b_1$ using the formula
$s_b = \dfrac{s_e}{\sqrt{\sum (x - \bar{x})^2}}$
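The two formulas can be sketched in Python as follows. This is an illustration, not the slides' own calculation: the function name `slope_standard_error` is hypothetical, and the candy data omit Butterfinger (its calorie value is missing in this transcript), so $n = 17$ and the printed values will not match the slide's 18-product results.

```python
import math

# Candy data from slide 4: (fat in grams, calories) per 100 g of product.
# ASSUMPTION: Butterfinger is omitted (calorie value missing in this transcript).
CANDY = [
    (12.75, 436), (33.21, 538), (25.99, 518), (19.33, 468), (21.13, 492),
    (21.60, 459), (26.13, 515), (7.50, 375), (17.23, 456), (4.37, 405),
    (23.00, 462), (23.85, 491), (24.77, 497), (8.36, 408), (3.31, 387),
    (24.85, 502), (2.32, 350),
]


def slope_standard_error(points):
    """Return (s_e, s_b): residual standard deviation and standard error of b1."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals, sum of (y - y_hat)^2.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)
    s_e = math.sqrt(sse / (n - 2))
    s_b = s_e / math.sqrt(sxx)
    return s_e, s_b


s_e, s_b = slope_standard_error(CANDY)
print(f"s_e = {s_e:.4f}, s_b = {s_b:.4f}   (n = {len(CANDY)})")
```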

13 Example – Compute Residual Standard Deviation
Compute the residual standard deviation for the candy product data (see the table on slide 4).

14 Solution
We compute the residuals $y - \hat{y}$ and the sum of squared residuals $\sum (y - \hat{y})^2$ = [value missing]. The details are shown as follows:
[Table: residual calculations, not preserved in the transcript]

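15 Solution
Next, we substitute the number of points $n = 18$ and the value $\sum (y - \hat{y})^2$ = [value missing] into the formula for $s_e$:
$s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\dfrac{\text{[value missing]}}{18 - 2}}$ = [value missing]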

16 Example – Compute Standard Error
Compute the standard error for the candy product data (see the table on slide 4).

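17 Solution
We've calculated $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$ = [value missing]. Next, we find that $\sum (x - \bar{x})^2$ = [value missing]. The calculations are in the table:
[Table: calculations of $(x - \bar{x})^2$, not preserved in the transcript]
The standard error is
$s_b = \dfrac{s_e}{\sqrt{\sum (x - \bar{x})^2}}$ = [value missing]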

18 Confidence Interval for $\beta_1$
Under the assumptions of the linear model, the quantity $\dfrac{b_1 - \beta_1}{s_b}$ has a Student's $t$ distribution with $n - 2$ degrees of freedom. Therefore, the critical value for a level $100(1 - \alpha)\%$ confidence interval is the value $t_{\alpha/2}$ for which the area under the $t$ curve with $n - 2$ degrees of freedom between $-t_{\alpha/2}$ and $t_{\alpha/2}$ is $1 - \alpha$.
The margin of error for a level $100(1 - \alpha)\%$ confidence interval is
Margin of error $= t_{\alpha/2} \cdot s_b$
So a level $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is $b_1 \pm t_{\alpha/2} \cdot s_b$. In other words, the confidence interval is
$b_1 - t_{\alpha/2} \cdot s_b < \beta_1 < b_1 + t_{\alpha/2} \cdot s_b$
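The interval formula is simple enough to sketch as a single function. The function name and the inputs $b_1 = 5.0$ and $s_b = 0.5$ are hypothetical values chosen for illustration, not the candy results. The critical value 2.120 is the 95% value for 16 degrees of freedom used elsewhere in this section. The critical value is passed in rather than computed, since it is normally read from a $t$ table.

```python
def slope_confidence_interval(b1, s_b, t_crit):
    """Return (lower, upper) for the interval b1 +/- t_crit * s_b.

    t_crit is the critical value t_{alpha/2} with n - 2 degrees of freedom.
    """
    margin = t_crit * s_b          # margin of error
    return b1 - margin, b1 + margin


# Hypothetical inputs: b1 = 5.0 and s_b = 0.5 are made up for illustration.
lower, upper = slope_confidence_interval(b1=5.0, s_b=0.5, t_crit=2.120)
print(f"{lower:.3f} < beta_1 < {upper:.3f}")
```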

19 Example – Construct Confidence Interval
Compute a 95% confidence interval for the slope $\beta_1$ for the candy product data.
Solution: The assumptions have already been checked. The least-squares regression line is $\hat{y}$ = [value missing] + [value missing] $x$, so the point estimate is $b_1$ = [value missing]. The residual standard deviation was computed as $s_e$ = [value missing]. The standard error of $b_1$ was computed as $s_b$ = [value missing]. There are $n - 2 = 18 - 2 = 16$ degrees of freedom, and from Table A.3 the critical value for a 95% confidence interval is $t_{\alpha/2} = 2.120$. The margin of error is $t_{\alpha/2} \cdot s_b = 2.120 \cdot$ [value missing] = [value missing].
The 95% confidence interval is
Point estimate $\pm$ margin of error
[value missing] $\pm$ [value missing]
[value missing] $< \beta_1 <$ [value missing]
We are 95% confident that the mean difference in calories for items that differ by one gram in fat content is between [value missing] and [value missing].

20 Confidence Intervals on the TI-84 PLUS
The LinRegTInt command will construct a confidence interval for the slope of the least-squares regression line. This command is accessed by pressing STAT and highlighting the TESTS menu. We must first enter the $x$- and $y$-values into the data editor and indicate these locations in the Xlist and Ylist fields.

21 Example (TI-84 PLUS)
Construct a 95% confidence interval for the slope $\beta_1$ for the candy product data.
Solution: We enter the $x$-values into L1 and the $y$-values into L2. Press STAT, highlight the TESTS menu, and select LinRegTInt. Enter L1 in the Xlist field and L2 in the Ylist field. Enter 0.95 in the C-Level field. Select Calculate.
The confidence interval is (4.8182, [value missing]). We are 95% confident that the mean difference in calories for items that differ by one gram in fat content is between 4.8182 and [value missing].

22 Test hypotheses about the slope
Objective 4 Test hypotheses about the slope

23 Testing Hypotheses About the Slope
We can use the values of $b_1$ and $s_b$ to test hypotheses about the population slope $\beta_1$. If $\beta_1 = 0$, then there is no linear relationship between the explanatory variable $x$ and the outcome variable $y$. For this reason, the null hypothesis most often tested is $H_0: \beta_1 = 0$. If this null hypothesis is rejected, we conclude that there is a linear relationship between $x$ and $y$, and that the explanatory variable $x$ is useful in predicting the outcome variable $y$. Because the quantity $\dfrac{b_1 - \beta_1}{s_b}$ has a Student's $t$ distribution with $n - 2$ degrees of freedom, we can construct the test statistic for testing $H_0: \beta_1 = 0$ by setting $\beta_1 = 0$. The test statistic is
$t = \dfrac{b_1}{s_b}$

24 Procedure for Testing $H_0: \beta_1 = 0$
Step 1: Compute the least-squares regression line. Verify that the assumptions of the linear model are satisfied.
Step 2: State the null and alternate hypotheses.
Step 3: If making a decision, choose a significance level $\alpha$.
Step 4: Compute the standard error of the slope $s_b$.
Step 5: Compute the value of the test statistic $t = \dfrac{b_1}{s_b}$ and the number of degrees of freedom $n - 2$.
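Steps 1, 4, and 5 can be sketched in Python as shown here. The function name `slope_t_statistic` is hypothetical. The data omit Butterfinger because its calorie value is missing from this transcript, so the sketch has $n = 17$ and 15 degrees of freedom, and its test statistic will differ from the slide's 18-product value.

```python
import math

# Candy data from slide 4: (fat in grams, calories) per 100 g of product.
# ASSUMPTION: Butterfinger is omitted (calorie value missing in this transcript).
CANDY = [
    (12.75, 436), (33.21, 538), (25.99, 518), (19.33, 468), (21.13, 492),
    (21.60, 459), (26.13, 515), (7.50, 375), (17.23, 456), (4.37, 405),
    (23.00, 462), (23.85, 491), (24.77, 497), (8.36, 408), (3.31, 387),
    (24.85, 502), (2.32, 350),
]


def slope_t_statistic(points):
    """Return (t, df) for testing H0: beta_1 = 0, where t = b1 / s_b and df = n - 2."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b1 = sxy / sxx                                   # Step 1: slope of the fitted line
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)
    s_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # Step 4: standard error of b1
    return b1 / s_b, n - 2                           # Step 5: test statistic and df


t, df = slope_t_statistic(CANDY)
print(f"t = {t:.3f} with {df} degrees of freedom")
```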

25 The P-value Method
Before we can proceed further with the procedure for testing $H_0: \beta_1 = 0$, we must decide between the P-value method and the critical value method. If we select the P-value method, then steps 6 through 8 are as follows:
Step 6: Compute the P-value of the test statistic.
Step 7: Interpret the P-value. If making a decision, reject $H_0$ if the P-value is less than or equal to the significance level $\alpha$.
Step 8: State a conclusion.
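In practice the P-value in Step 6 comes from a calculator or statistics package. As a self-contained illustration of what that computation involves, the sketch below integrates the Student's $t$ density numerically with Simpson's rule. This numerical approach, and the names `t_pdf`, `right_tail_p`, and `p_value`, are assumptions of this sketch rather than anything the slides prescribe.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def right_tail_p(t, df, steps=2000):
    """P(T >= t): Simpson's rule on [0, |t|] plus the symmetry of the t curve.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3           # area under the curve between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area


def p_value(t, df, tail="right"):
    """P-value for a left-, right-, or two-tailed t test."""
    if tail == "right":
        return right_tail_p(t, df)
    if tail == "left":
        return right_tail_p(-t, df)   # P(T <= t) = P(T >= -t) by symmetry
    if tail == "two":
        return 2 * right_tail_p(abs(t), df)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Sanity check against a t table: with 16 degrees of freedom, the area to the
# right of 1.746 is approximately 0.05.
print(f"{p_value(1.746, 16):.4f}")
```

Step 7 then reduces to comparing the returned P-value with $\alpha$.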

26 The Critical Value Method
If we select the critical value method, steps 6 through 8 are as follows:
Step 6: Find the critical value.
Step 7: Determine whether to reject $H_0$, as follows:
Left-tailed: Reject if $t \le -t_{\alpha}$
Right-tailed: Reject if $t \ge t_{\alpha}$
Two-tailed: Reject if $t \ge t_{\alpha/2}$ or $t \le -t_{\alpha/2}$
Step 8: State a conclusion.
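The rejection rules in Step 7 translate directly into a small decision function. The name `reject_h0` and the test statistic 3.0 are hypothetical. The critical value 1.746 is the standard table value $t_{0.05}$ for 16 degrees of freedom. The caller is assumed to pass $t_{\alpha}$ for a one-tailed test and $t_{\alpha/2}$ for a two-tailed test.

```python
def reject_h0(t, t_crit, tail):
    """Apply the critical value method.

    t      : value of the test statistic
    t_crit : t_alpha for a one-tailed test, t_{alpha/2} for a two-tailed test
    tail   : "left", "right", or "two"
    """
    if tail == "left":
        return t <= -t_crit
    if tail == "right":
        return t >= t_crit
    if tail == "two":
        return t >= t_crit or t <= -t_crit
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Hypothetical right-tailed test: t = 3.0 against t_alpha = 1.746 (16 df, alpha = 0.05).
print(reject_h0(3.0, 1.746, "right"))
```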

27 Example: P-Value Method
Perform a test of $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 > 0$ on the candy product data using the P-value method. Use the $\alpha = 0.05$ level of significance.
Solution: The assumptions have already been checked. The least-squares regression line is $\hat{y}$ = [value missing] + [value missing] $x$, so $b_1$ = [value missing]. The standard error of $b_1$ was computed as $s_b$ = [value missing]. The test statistic is $t = \dfrac{b_1}{s_b}$ = [value missing]. We may use technology to compute the P-value to be $P$ = [value missing]. Because $P < 0.05$, we reject $H_0$ and conclude that $\beta_1 > 0$. There is a linear relationship between the amount of fat and the number of calories in candy products. Because $\beta_1 > 0$, products with more fat tend to have more calories.

28 Example: Critical Value Method
Perform a test of $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 > 0$ on the candy product data using the critical value method. Use the $\alpha = 0.05$ level of significance.
Solution: The assumptions have already been checked. The least-squares regression line is $\hat{y}$ = [value missing] + [value missing] $x$, so $b_1$ = [value missing]. The standard error of $b_1$ was computed as $s_b$ = [value missing]. The test statistic is $t = \dfrac{b_1}{s_b}$ = [value missing]. This is a right-tailed test, so the critical value is the value $t_{\alpha}$ for which the area to the right is $\alpha = 0.05$. We use Table A.3 with 16 degrees of freedom. The critical value is $t_{\alpha} = 1.746$. Because $t > t_{\alpha}$, we reject $H_0$ and conclude that $\beta_1 > 0$. There is a linear relationship between the amount of fat and the number of calories in candy products. Because $\beta_1 > 0$, products with more fat tend to have more calories.

29 Hypothesis Testing on the TI-84 PLUS
The LinRegTTest command will perform a hypothesis test about the slope of the least-squares regression line. This command is accessed by pressing STAT and highlighting the TESTS menu. We must first enter the $x$- and $y$-values into the data editor and indicate these locations in the Xlist and Ylist fields.

30 Example (TI-84 PLUS)
Perform a test of $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 > 0$ on the candy data. Use the $\alpha = 0.05$ level of significance.
Solution: We enter the $x$-values into L1 and the $y$-values into L2. Press STAT, highlight the TESTS menu, and select LinRegTTest. Enter L1 in the Xlist field and L2 in the Ylist field. Since this is a right-tailed test, select the >0 option. Select Calculate.
The P-value is $P$ = [value missing]. Since $P < 0.05$, we reject $H_0$. We conclude that $\beta_1 > 0$ and that there is a linear relationship between the amount of fat and the number of calories in candy products. Because $\beta_1 > 0$, products with more fat tend to have more calories.

31 Testing Correlation
Recall that a sample correlation coefficient, $r$, can be computed for a sample. If we knew the entire population and computed the correlation from it, we would obtain the population correlation, which is denoted by the Greek letter $\rho$ (rho). The correlation measures the strength of the linear relationship between two variables. The population correlation $\rho$ and the population slope $\beta_1$ always have the same sign. In particular, whenever one of them is equal to 0, the other is equal to 0 as well. For this reason, a test of the hypothesis $\beta_1 = 0$ is also a test of the hypothesis $\rho = 0$.
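The same link holds for the sample quantities, and it can be checked numerically. The sketch below computes $r$ and $b_1$ from the same sums and compares the slope statistic $t = b_1 / s_b$ with $t = r\sqrt{n - 2} / \sqrt{1 - r^2}$. That second formula is a standard identity that the slide does not state, so treat it as an assumption of this sketch. The variable names are illustrative, and Butterfinger is omitted because its calorie value is missing in this transcript ($n = 17$).

```python
import math

# Candy data from slide 4: (fat in grams, calories) per 100 g of product.
# ASSUMPTION: Butterfinger is omitted (calorie value missing in this transcript).
CANDY = [
    (12.75, 436), (33.21, 538), (25.99, 518), (19.33, 468), (21.13, 492),
    (21.60, 459), (26.13, 515), (7.50, 375), (17.23, 456), (4.37, 405),
    (23.00, 462), (23.85, 491), (24.77, 497), (8.36, 408), (3.31, 387),
    (24.85, 502), (2.32, 350),
]

n = len(CANDY)
x_bar = sum(x for x, _ in CANDY) / n
y_bar = sum(y for _, y in CANDY) / n
sxx = sum((x - x_bar) ** 2 for x, _ in CANDY)
syy = sum((y - y_bar) ** 2 for _, y in CANDY)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in CANDY)

r = sxy / math.sqrt(sxx * syy)     # sample correlation coefficient
b1 = sxy / sxx                     # least-squares slope

# For a least-squares fit, the sum of squared residuals equals syy - b1 * sxy.
s_e = math.sqrt((syy - b1 * sxy) / (n - 2))
s_b = s_e / math.sqrt(sxx)

t_from_slope = b1 / s_b
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(f"r = {r:.4f}, b1 = {b1:.4f}")
print(f"t from slope = {t_from_slope:.4f}, t from r = {t_from_r:.4f}")
```

Because $r$ and $b_1$ share the numerator $\sum (x - \bar{x})(y - \bar{y})$ and have positive denominators, they always have the same sign, and the two test statistics agree up to rounding.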

32 You Should Know...
The assumptions of the linear model
How to check the assumptions of the linear model using residual plots
How to construct confidence intervals for the slope
How to test hypotheses about the slope

