Published by Colleen Willis. Modified over 5 years ago.
1
Statistical inference for the slope and intercept in SLR
In this topic, we study the parameters β0 and β1 of the regression model and learn how to compute confidence intervals and perform hypothesis tests for them.
2
Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for $i = 1, 2, \ldots, n$
Simple Linear Regression Model Parameters: $\beta_0$ is the intercept. $\beta_1$ is the slope. The $\varepsilon_i$ are independent, normally distributed random errors with mean 0 and variance $\sigma^2$: $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.
Throughout this topic and the remaining Simple Linear Regression topics (unless otherwise stated), we assume that the normal error regression model is applicable. In this regression model, β0 and β1 are parameters that define a linear function, the Xi are known constants, and the random errors εi are independent and follow a normal distribution.
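As a minimal sketch of the normal error model (in Python with only the standard library, rather than the R used in this course), the function below draws responses as the line plus independent N(0, σ²) errors. The function name `simulate_slr`, the design points, and all parameter values are hypothetical illustrations, not taken from the slides.

```python
import random

def simulate_slr(x, beta0, beta1, sigma, rng):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i with eps_i iid N(0, sigma^2).

    The X values are treated as fixed, known constants; only the errors are random.
    """
    return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

rng = random.Random(1)                 # seeded so the draws are reproducible
x = [0.2, 0.3, 0.4, 0.5]               # hypothetical fixed design points
y = simulate_slr(x, beta0=1.0, beta1=2.0, sigma=0.5, rng=rng)

# With sigma = 0 the errors vanish and every point lies exactly on the line.
y_exact = simulate_slr([1.0, 2.0, 3.0], beta0=2.0, beta1=0.5, sigma=0.0, rng=rng)
print(y_exact)  # [2.5, 3.0, 3.5]
```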
3
Inference for the slope, β1
Recall that
$$b_1 = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2},$$
which we can rewrite as
$$b_1 = \sum c_i Y_i - \bar Y \sum c_i = \sum c_i Y_i \sim \text{Normal}, \qquad c_i = \frac{X_i - \bar X}{\sum (X_i - \bar X)^2},$$
where the last step uses $\sum c_i = 0$.
It can be proved that $E\{b_1\} = \beta_1$ and $\operatorname{Var}\{b_1\} = \sigma^2 \frac{1}{\sum (X_i - \bar X)^2}$ (denoted by $\sigma^2\{b_1\}$); therefore $b_1 \sim N(\beta_1, \sigma^2\{b_1\})$.
By replacing the parameter $\sigma^2$ with its unbiased estimator $MSE$, we obtain the unbiased estimator of $\sigma^2\{b_1\}$ and the standard error of $b_1$:
$$s^2\{b_1\} = \frac{MSE}{\sum (X_i - \bar X)^2}, \qquad s\{b_1\} = \sqrt{\frac{MSE}{\sum (X_i - \bar X)^2}}.$$
The point estimator b1 was given in Topic 1 and is shown here. b1 is a linear combination of the responses Yi, and the Yi are normally distributed, so b1 is also normally distributed. The sampling distribution of b1 describes the different values of b1 obtained under repeated sampling when the X values are held fixed. The mean of b1 is β1, and the variance of b1 is σ² divided by SSX = Σ(Xi − X̄)². Replacing σ² with MSE gives the estimated variance s²{b1} and the standard error s{b1}.
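A short Python sketch, assuming a small made-up data set (not from the slides), that computes b1 exactly as the weighted sum Σ c_i Y_i and the standard error s{b1} = sqrt(MSE/SSX); the helper name `slope_and_se` is hypothetical.

```python
import math

def slope_and_se(x, y):
    """Return (b1, s{b1}, c) for simple linear regression.

    b1 is computed as the linear combination sum(c_i * Y_i) with
    c_i = (X_i - Xbar) / SSX, and s{b1} = sqrt(MSE / SSX).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    c = [(xi - xbar) / ssx for xi in x]            # the weights c_i sum to 0
    b1 = sum(ci * yi for ci, yi in zip(c, y))       # b1 = sum c_i Y_i
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                             # unbiased estimator of sigma^2
    return b1, math.sqrt(mse / ssx), c

# Made-up data, easy to check by hand: SSX = 10, SSE = 2.4, MSE = 0.8.
b1, s_b1, c = slope_and_se([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b1, s_b1)   # b1 ≈ 0.6, s{b1} = sqrt(0.8 / 10) ≈ 0.2828
```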
4
Sampling distribution of $(b_1 - \beta_1)/s\{b_1\}$, denoted by $t^*$
$$\frac{b_1 - \beta_1}{\sigma\{b_1\}} \sim N(0, 1), \qquad t^* = \frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2)$$
There are n − 2 degrees of freedom because 2 parameters must be estimated to obtain the numerator of $s^2 = MSE$: $SSE = \sum [Y_i - (b_0 + b_1 X_i)]^2$.
Since b1 is normally distributed, the standardized statistic (b1 − β1)/σ{b1} is a standard normal variable. Ordinarily, we must estimate σ{b1} by s{b1}, the standard error, so we are interested in the distribution of (b1 − β1)/s{b1}. When a statistic is standardized but the denominator is an estimated standard deviation rather than the true one, it is called a studentized statistic, or t. Here t* follows a t distribution with n − 2 degrees of freedom. Two degrees of freedom are lost because two parameters (β0 and β1) must be estimated first.
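To see why dividing SSE by n − 2 (rather than n) is the right correction for the two estimated parameters, here is a small Monte Carlo sketch in Python under assumed true values (β0 = 1, β1 = 2, σ² = 4, n = 5); it is an illustration, not a proof, and `sse_of_fit` is a hypothetical helper.

```python
import math
import random

def sse_of_fit(x, y):
    """Least squares fit of y on x; return SSE = sum of squared residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

rng = random.Random(2024)
x = [1.0, 2.0, 3.0, 4.0, 5.0]          # n = 5, so n - 2 = 3 degrees of freedom
sigma2 = 4.0                            # true error variance (illustrative)
reps = 20000

total = 0.0
for _ in range(reps):
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, math.sqrt(sigma2)) for xi in x]
    total += sse_of_fit(x, y)
mean_sse = total / reps

print(mean_sse / (len(x) - 2))  # close to sigma2 = 4: dividing by n - 2 is unbiased
print(mean_sse / len(x))        # close to 4 * 3/5 = 2.4: dividing by n underestimates
```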
5
Confidence Interval for the slope β1
Since $t^* = \frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2)$, we can make the following probability statement:
$$P\left\{ t(\alpha/2; n-2) \le \frac{b_1 - \beta_1}{s\{b_1\}} \le t(1 - \alpha/2; n-2) \right\} = 1 - \alpha,$$
where $t(\alpha/2; n-2)$ denotes the $100(\alpha/2)$ percentile of the t distribution with n − 2 degrees of freedom. Because the t distribution is symmetric around its mean 0, the lower and upper percentiles have the same magnitude, one negative and the other positive:
$$t(\alpha/2; n-2) = -t(1 - \alpha/2; n-2).$$
Rearranging the inequalities in the probability statement gives the confidence interval for β1. In general, a confidence interval is
Point estimate ± Margin of error, where Margin of error (denoted by ME) = t × standard error.
Hence the 1 − α confidence interval for β1 is
$$b_1 \pm t(1 - \alpha/2; n-2)\, s\{b_1\}.$$
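The interval b1 ± t(1 − α/2; n − 2) s{b1} can be sketched in Python as below. The coverage check simulates from an assumed model (β0 = 1, β1 = 0.5, σ = 2, n = 10) and uses the standard t-table value t(0.975; 8) = 2.306; about 95% of the simulated intervals should contain the true β1. The helper `slope_ci` is a hypothetical name.

```python
import math
import random

def slope_ci(x, y, t_crit):
    """b1 +/- t_crit * s{b1}; t_crit should be t(1 - alpha/2; n - 2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    me = t_crit * math.sqrt(mse / ssx)              # margin of error
    return b1 - me, b1 + me

# Coverage check: n = 10, so df = 8 and t(0.975; 8) = 2.306.
rng = random.Random(7)
x = [float(i) for i in range(1, 11)]
beta0, beta1, sigma = 1.0, 0.5, 2.0             # illustrative true values
reps, hits = 4000, 0
for _ in range(reps):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    lo, hi = slope_ci(x, y, t_crit=2.306)
    hits += lo <= beta1 <= hi                   # True counts as 1
coverage = hits / reps
print(coverage)  # should be near 0.95
```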
6
Significance Tests for β1
$H_0: \beta_1 = \beta_1^*$ vs. $H_a: \beta_1 \ne \beta_1^*$
The test statistic is $t^* = (b_1 - \beta_1^*)/s\{b_1\} \sim t(n-2)$ under $H_0$.
For a two-sided test: reject H0 if $|t^*| \ge t_c$, where $t_c = t(1 - \alpha/2; n-2)$; or, equivalently, reject H0 if p-value ≤ α.
For a one-sided test: with $t_c = t(1 - \alpha; n-2)$, reject H0 if $t^* \ge t_c$ (for $H_a: \beta_1 > \beta_1^*$) or if $t^* \le -t_c$ (for $H_a: \beta_1 < \beta_1^*$); or, equivalently, reject H0 if p-value ≤ α.
Because the test statistic follows a t distribution, the test concerning β1 is a regular t test, which should have been covered in your previous statistics course. I assume everyone knows how to perform the test, so only the main rejection rules are presented here.
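The rejection rules above can be written as a small Python function. This is a sketch: `reject_h0` and `t_statistic` are hypothetical helpers, the critical values are passed in from a t table (t(0.975; 40) = 2.021 and t(0.95; 40) = 1.684), and the plugged-in numbers are the diamond-example values b1 = 3721 and s{b1} = 81.79 used later in this topic.

```python
def t_statistic(b1, beta1_star, s_b1):
    """t* = (b1 - beta1*) / s{b1}."""
    return (b1 - beta1_star) / s_b1

def reject_h0(t_star, t_crit, alternative="two-sided"):
    """Critical-value rule for H0: beta1 = beta1*.

    t_crit must match the alternative:
      two-sided -> t(1 - alpha/2; n - 2), reject if |t*| >= t_crit
      greater   -> t(1 - alpha;   n - 2), reject if t* >= t_crit
      less      -> t(1 - alpha;   n - 2), reject if t* <= -t_crit
    """
    if alternative == "two-sided":
        return abs(t_star) >= t_crit
    if alternative == "greater":
        return t_star >= t_crit
    if alternative == "less":
        return t_star <= -t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")

t_star = t_statistic(3721.0, 0.0, 81.79)
print(round(t_star, 1))                        # 45.5
print(reject_h0(t_star, 2.021, "two-sided"))   # True
print(reject_h0(t_star, 1.684, "greater"))     # True
print(reject_h0(t_star, 1.684, "less"))        # False
```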
7
Inference for the intercept, β0
$b_0 = \bar Y - b_1 \bar X$
It can be proved that $E\{b_0\} = \beta_0$ and $\operatorname{Var}\{b_0\} = \sigma^2\left[\frac{1}{n} + \frac{\bar X^2}{\sum (X_i - \bar X)^2}\right]$ (denoted by $\sigma^2\{b_0\}$). By replacing the parameter $\sigma^2$ with $MSE$, we obtain the unbiased estimator of $\sigma^2\{b_0\}$ and the standard error of $b_0$:
$$s^2\{b_0\} = MSE\left[\frac{1}{n} + \frac{\bar X^2}{\sum (X_i - \bar X)^2}\right], \qquad s\{b_0\} = \sqrt{MSE\left[\frac{1}{n} + \frac{\bar X^2}{\sum (X_i - \bar X)^2}\right]}.$$
Analogous to the result for $b_1$, $t^* = (b_0 - \beta_0)/s\{b_0\} \sim t(n-2)$.
Now let's switch to the intercept, β0. The point estimator b0 is the mean of Y minus b1 times the mean of X. Its sampling distribution describes the different values of b0 that would be obtained under repeated sampling with the X values held fixed. Its mean is the true β0, so, like the slope estimator b1, b0 is an unbiased estimator; its variance also depends on the error variance σ², as shown here. Replacing σ² with MSE gives the standard error, denoted s{b0}. Analogous to b1, we use the same kind of t procedure for inference about the intercept, β0.
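Reusing a small made-up data set as a hand-checkable example (not from the slides), this Python sketch computes b0 and s{b0} from the formula above; `intercept_and_se` is a hypothetical name.

```python
import math

def intercept_and_se(x, y):
    """Return (b0, s{b0}) with s^2{b0} = MSE * (1/n + Xbar^2 / SSX)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar                      # b0 = Ybar - b1 * Xbar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b0, math.sqrt(mse * (1 / n + xbar ** 2 / ssx))

b0, s_b0 = intercept_and_se([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, s_b0)   # b0 ≈ 2.2, s{b0} = sqrt(0.8 * (0.2 + 9/10)) ≈ 0.938
```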
8
Confidence Interval for β0
$b_0 \pm t(1 - \alpha/2; n-2)\, s\{b_0\}$
Significance Tests for β0
$H_0: \beta_0 = \beta_0^*$ vs. $H_a: \beta_0 \ne \beta_0^*$
The test statistic is $t^* = (b_0 - \beta_0^*)/s\{b_0\} \sim t(n-2)$ under $H_0$.
9
Comments on the inference assumptions
Both $b_1$ and $b_0$ follow a normal distribution because they are linear functions of the Yi ($b_1 = \sum c_i Y_i$, $b_0 = \bar Y - b_1 \bar X$), which are themselves independent and normally distributed.
As long as the Yi are close to normal (that is, Y has a symmetric distribution without outliers), inferences (CIs and hypothesis tests) based on the t distribution will be approximately correct, even with small sample sizes.
In general, the CLT ensures that $b_0$ and $b_1$ are asymptotically normal as long as the random errors are independently and identically distributed (iid), even when Y does not follow a normal distribution. Therefore, inferences based on the t distribution will be approximately correct as long as n is large enough.
Regarding the assumptions behind confidence intervals and hypothesis tests for β0 and β1: the sampling distributions of b0 and b1 are both normal because they are computed from Y. When the sample size is small, the t method will be approximately correct as long as Y is close to normal. Note that "close to normal" means Y need not be exactly normal, but it should at least have a symmetric distribution with no outliers. When the sample size is large, b0 and b1 are asymptotically normal as long as the random errors are iid. This means that when Y is not close to normal (for example, skewed or with outliers), you need a large sample for the t method to be appropriate. There is no fixed rule for how large; n of 25 to 40 is a good starting point in general for using the t test.
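A rough Monte Carlo sketch in Python of the large-sample claim, assuming strongly skewed errors (an exponential draw minus its mean of 1) and n = 40: the t-based slope interval, with the standard t-table value t(0.975; 38) = 2.024, should still cover the true slope roughly 95% of the time. The design points, true parameters, and the helper name `slope_ci` are arbitrary, hypothetical choices.

```python
import math
import random

def slope_ci(x, y, t_crit):
    """b1 +/- t_crit * s{b1} for a least squares fit of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    me = t_crit * math.sqrt(mse / ssx)
    return b1 - me, b1 + me

rng = random.Random(11)
n = 40
x = [i / 4 for i in range(n)]                    # hypothetical fixed design
beta0, beta1 = 1.0, 0.5
reps, hits = 3000, 0
for _ in range(reps):
    # Exponential(1) minus 1: mean 0, variance 1, strongly right-skewed errors.
    y = [beta0 + beta1 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
    lo, hi = slope_ci(x, y, t_crit=2.024)        # t(0.975; 38)
    hits += lo <= beta1 <= hi
coverage = hits / reps
print(coverage)   # typically close to the nominal 0.95
```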
10
Comments on the inference assumptions
Often, the value of the intercept is not of direct interest, so there is no need to calculate CIs or hypothesis tests for β0: it is just the mean of Y when X = 0 and is of little value for predicting Y at other X values. Caution again: the linear regression model might not be appropriate when the scope of the model is extended to X = 0.
Because $\sigma^2\{b_1\} = \sigma^2 \frac{1}{\sum (X_i - \bar X)^2}$, we can increase the precision of the estimator, i.e., reduce $\sigma\{b_1\}$, by increasing the spread in X, i.e., a bigger $\sum (X_i - \bar X)^2$. Collect X values over a wider and random range.
The precision of the estimators also depends on the difference between the sample size and the number of functional parameters (βs) to be estimated.
Now that you have the basic theory laid out, I want to mention some practical issues. In a linear model, the slope describes the change in Y when X changes. For example, in the diamond case, b1 = 3721: for every additional carat, the price goes up by 3721 dollars on average. The intercept, on the other hand, is usually of no direct interest since it is the mean of Y when X equals 0. Think about it: what is the price when carat = 0? The linear regression model might not be appropriate when its X scope is extended to 0.
Second, the variance of the slope estimator is σ² divided by SSX. One way to increase the precision of the estimator is to reduce the error variance, or to increase SSX. To do this, when we design an experiment, collect X values over a wider and random range. For example, if we could only study 100 diamond rings, try to collect rings of different weights rather than many rings of similar weight. The precision of the estimators also depends on other factors, such as the sample size and the number of parameters.
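The effect of spreading X can be checked in Python with σ{b1} = σ/sqrt(SSX). The two designs below are hypothetical sets of 12 ring weights with the same n and the same σ; only the spread differs, and `sd_b1` is a hypothetical helper.

```python
import math

def sd_b1(x, sigma):
    """True standard deviation of b1: sigma / sqrt(SSX)."""
    xbar = sum(x) / len(x)
    ssx = sum((xi - xbar) ** 2 for xi in x)
    return sigma / math.sqrt(ssx)

narrow = [0.4, 0.5, 0.6] * 4      # 12 hypothetical rings, weights bunched together
wide = [0.1, 0.5, 0.9] * 4        # 12 hypothetical rings, weights spread out
ratio = sd_b1(narrow, 1.0) / sd_b1(wide, 1.0)
print(round(ratio, 6))   # 4.0: four times the spread in X gives one quarter of the SD
```

Because σ cancels in the ratio, the conclusion does not depend on the assumed σ = 1.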
11
The diamond weight and price example: Confidence interval for the slope β1
$b_1 \pm t(1 - \alpha/2; n-2)\, s\{b_1\}$, where α = 0.05, n = 48
From lm output
Now compute the confidence interval and hypothesis test for the slope β1. We will show how to do it both in R and by hand. In the diamond example, b1 is estimated to be 3721 and the residual standard error s to be 31.84. Use a significance level of 0.05. As shown in the R code, the confint function computes the confidence interval in R. The first argument is the lm model; the second is the target parameter: here we want the confidence interval for the slope, i.e., weight (if this argument is omitted, confint shows the confidence intervals for both the intercept and the slope). The third argument is the confidence level, 1 − α.
From confint output
12
Where α = 0.05, n = 48, df = 46; round down to 40
Now let's see how to compute the interval by hand with the t table. When using a t table, the exact degrees of freedom (or p-value) we need are not always listed, so we have to approximate. For example, df = 46 is not in the table, so we round down to the closest listed value, 40. Using a smaller df gives a larger t value and a wider interval, which is conservative. In this case, the t value for 40 df is 2.021, and we can use it to compute the confidence interval. Or use R: the qt function gives the t value for 46 df, qt(0.975, 46) ≈ 2.013. You can use either value in the homework, but only use the t table in the exam.
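A quick Python arithmetic check that rounding the df down is conservative, using the standard error 81.79 from the lm output: the df = 40 table value gives a slightly larger margin of error than the exact df = 46 value.

```python
s_b1 = 81.79                 # standard error of b1 from the lm output
t_table_df40 = 2.021         # t(0.975; 40) from the t table (df rounded down)
t_exact_df46 = 2.013         # t(0.975; 46), as given by qt(0.975, 46) in R

me_table = t_table_df40 * s_b1
me_exact = t_exact_df46 * s_b1
print(round(me_table, 2), round(me_exact, 2))   # the df = 40 margin is slightly wider
```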
13
The diamond weight and price example: Confidence interval for the slope β1
$b_1 \pm t(1 - \alpha/2; n-2)\, s\{b_1\}$, where α = 0.05, n = 48
$= 3721 \pm 2.013\,(81.79) = (3556, 3886)$
From lm output: $MSE = s^2 = 31.84^2 = 1013.9$
$$s\{b_1\} = \sqrt{\frac{MSE}{\sum (X_i - \bar X)^2}} = \sqrt{\frac{MSE}{s_X^2 (n-1)}} \approx 81.79$$
We now know that b1 is 3721 and the t value is 2.013 (or 2.021 from the t table). The last thing we need is the standard error of b1. In the R output, the standard error is provided as 81.79. You can also compute it with the formula, using MSE ≈ 1014 and SSX. Note that I also show how to compute SSX from the standard deviation: SSX = s_X²(n − 1), where s_X is the usual sample standard deviation of X. This is a useful trick, especially for the exam; in case you have forgotten it from your past statistics course, I suggest you copy it onto your cheat sheet. Finally, the confidence interval is (3556, 3886): the average price increases by at least 3556 and at most 3886 dollars when the diamond is 1 carat heavier.
From confint output
Conclusion: we are 95% confident that the average price increases by at least 3556 and at most 3886 dollars when the weight increases by 1 carat.
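The hand calculation above can be reproduced in Python as follows; `ci` and `ssx_from_sd` are hypothetical helper names, and the SSX trick is verified on a toy set of X values rather than on the diamond data.

```python
import statistics

def ci(estimate, se, t_crit):
    """Point estimate +/- margin of error, with ME = t * standard error."""
    me = t_crit * se
    return estimate - me, estimate + me

def ssx_from_sd(x):
    """SSX = s_X^2 * (n - 1), with s_X the usual (n - 1) sample standard deviation."""
    return statistics.stdev(x) ** 2 * (len(x) - 1)

lo, hi = ci(3721.0, 81.79, 2.013)
print(round(lo), round(hi))          # 3556 3886
print(ssx_from_sd([1, 2, 3, 4, 5]))  # ~10.0, same as sum((x - xbar)^2)
```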
14
The diamond weight and price example: hypothesis test for the slope β1
$H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \ne 0$
Test statistic: $t_s = \frac{b_1 - 0}{s\{b_1\}} = \frac{3721 - 0}{81.79} = 45.5$
p-value $= 2P(T > 45.5) < 0.0001$, or $< 0.001$ using the t table.
Since the p-value < the significance level α = 0.05, reject H0.
Next, consider a two-sided test for β1. The test statistic t_s is 45.5; in the lm output in R, the last two columns give the t value and the p-value. When reporting the p-value, any value lower than 0.0001 can be reported as < 0.0001. Since the p-value is very small, we reject the null hypothesis and conclude that the slope is significantly different from 0. This result is consistent with the confidence interval computed on the previous slide, which does not include 0 (all values are positive).
From lm output
15
Estimating the p-value: two-sided test, df = 40, t_s = 45.5
$t_s = 45.5 > 3.551 \Rightarrow$ p-value < 0.001
Now estimate the p-value with the t table. For df = 40, the largest t value in the table is 3.551, and the corresponding two-sided P is 0.001. This means that the probability that the test statistic from this distribution is more extreme than ±3.551 (> 3.551 or < −3.551) is only 0.001. The probability gets smaller as the test statistic gets larger. Our test statistic, 45.5, is much larger than 3.551, so the p-value is estimated to be less than 0.001.
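The table-lookup reasoning can be sketched in Python as a function that brackets a two-sided p-value using one row of a t table. The row below holds standard t-table entries for df = 40 (a subset of the usual columns); the names `T_TABLE_DF40` and `bracket_p_two_sided` are hypothetical.

```python
# Two-sided tail areas and the matching t values in the df = 40 row of a
# standard t table (only a subset of the usual columns is included here).
T_TABLE_DF40 = [
    (0.10, 1.684),
    (0.05, 2.021),
    (0.02, 2.423),
    (0.01, 2.704),
    (0.001, 3.551),
]

def bracket_p_two_sided(t_s, row=T_TABLE_DF40):
    """Bound the two-sided p-value of |t_s| using one row of a t table.

    Returns (lower, upper) bounds; None means that side is not bounded by the table.
    """
    t_abs = abs(t_s)
    if t_abs > row[-1][1]:
        return (None, row[-1][0])            # beyond the last column: p < 0.001
    if t_abs < row[0][1]:
        return (row[0][0], None)             # before the first column: p > 0.10
    for (p_hi, t_lo), (p_lo, t_hi) in zip(row, row[1:]):
        if t_lo <= t_abs <= t_hi:
            return (p_lo, p_hi)

print(bracket_p_two_sided(45.5))   # (None, 0.001), i.e. p < 0.001
print(bracket_p_two_sided(2.5))    # (0.01, 0.02)
```

For a one-sided test in the direction of the alternative, halve both bounds, so t_s = 45.5 gives p < 0.0005.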
16
The diamond weight and price example: hypothesis test for the slope β1 (one-sided test)
Comment: R output is usually for the two-sided test, and can be adjusted for a one-sided test.
1. $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 > 0$
Test statistic: $t_s = \frac{b_1 - 0}{s\{b_1\}} = \frac{3721 - 0}{81.79} = 45.5$
p-value $= P(T > 45.5) < 0.0005$ using the t table.
Since the p-value < the significance level α = 0.05, reject H0.
Estimating the p-value: one-sided test, df = 40, t_s = 45.5: $t_s = 45.5 > 3.551 \Rightarrow$ p-value < 0.0005.
Suppose we want to do a one-sided test. The test statistic is the same as long as the data are the same: t_s is still 45.5. For the alternative β1 > 0, the bigger t_s is, the less likely the sample is under H0, and the more reason we have to reject H0. Hence the p-value is the area to the right. This p-value is also the smaller tail area, which can be read directly from the t table. Since t_s is greater than the largest table value, 3.551, it occurs with probability less than 0.0005, so the p-value is < 0.0005.
17
The diamond weight and price example: hypothesis test for the slope β1 (one-sided test)
Comment: R output is usually for the two-sided test, and can be adjusted for a one-sided test.
2. $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 < 0$
Test statistic: $t_s = \frac{b_1 - 0}{s\{b_1\}} = \frac{3721 - 0}{81.79} = 45.5$
p-value $= P(T < 45.5) > 1 - 0.0005 = 0.9995$ using the t table.
Since the p-value > the significance level α = 0.05, do not reject H0.
On the other hand, suppose the alternative is on the opposite side, in this case β1 < 0. The test statistic doesn't change since the sample data are the same, but the reasoning is reversed: the smaller t_s is, the less likely the sample is under H0, and the more reason we have to reject H0. Hence the p-value is the area to the left. Now the p-value is the left, larger area, which is 1 minus the smaller area. Recall that the t table only gives the smaller area, which is < 0.0005; subtracting from 1, we obtain p-value > 0.9995.
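When software is not at hand, exact tail areas can be approximated numerically. This Python sketch integrates the t density with Simpson's rule (a generic numerical method, not how R computes its t probabilities); the function names are hypothetical. It reproduces the t-table entry at 3.551 and shows that, with df = 46, the right-tail p-value for β1 > 0 is essentially 0 and the left-tail p-value for β1 < 0 is essentially 1.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)      # P(T > 0) = 0.5 by symmetry

def p_value(t_s, df, alternative="two-sided"):
    # right = P(T > t_s), using symmetry when t_s is negative
    right = upper_tail(t_s, df) if t_s >= 0 else 1.0 - upper_tail(-t_s, df)
    if alternative == "greater":
        return right                           # area to the right
    if alternative == "less":
        return 1.0 - right                     # area to the left
    return 2.0 * min(right, 1.0 - right)       # two-sided

print(p_value(3.551, 40))             # ≈ 0.001, matching the t-table column
print(p_value(45.5, 46, "greater"))   # essentially 0 (< 0.0005)
print(p_value(45.5, 46, "less"))      # essentially 1 (> 0.9995)
```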
18
The diamond weight and price example: hypothesis test for the intercept β0 (self exercise)
$H_0: \beta_0 = 0$ vs. $H_a: \beta_0 \ne 0$
Test statistic: $t_s = \frac{b_0 - 0}{s\{b_0\}} = \frac{-259.63 - 0}{17.32} = -14.99$
p-value $= 2P(T > 14.99) < 0.0001$
Since the p-value < the significance level α = 0.05, reject H0.
As a self-practice exercise, try finding a confidence interval and performing a hypothesis test for the intercept, β0.
19
The diamond weight and price example: Confidence interval for the intercept β0 (self exercise)
Answer: $b_0 \pm t\, s\{b_0\} = -259.63 \pm 2.013\,(17.32) \approx (-294.5, -224.8)$; the confint output gives (−294.487, −224.765).
So β0 < 0: what does it mean? The confidence interval for the intercept runs from about −294 to −225, entirely below 0. But the intercept is the mean of Y when X = 0, i.e., the price of a diamond ring with no diamond (X = 0), which does not exist. It means nothing. This is an example where we should consider the range of X (and hence Y) over which the model is actually meaningful: the region between the two extreme levels of the predictor, the "effective range" of the model.
After discussing the parameters β0 and β1 of the regression model, we will learn in the next topic how to use the model to estimate the mean of Y and to predict Y.
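The intercept exercise can be checked in Python with the same arithmetic, using b0 = −259.63 and s{b0} = 17.32 from the lm output and t(0.975; 46) = 2.013.

```python
b0, s_b0 = -259.63, 17.32      # intercept and its standard error from the lm output
t_crit = 2.013                 # t(0.975; 46)

t_s = (b0 - 0) / s_b0                          # test statistic for H0: beta0 = 0
lo, hi = b0 - t_crit * s_b0, b0 + t_crit * s_b0
print(round(t_s, 2))                 # -14.99
print(round(lo, 1), round(hi, 1))    # -294.5 -224.8
print(hi < 0)                        # True: the whole interval lies below 0
```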
© 2024 SlidePlayer.com. Inc.
All rights reserved.