Statistical inference for the slope and intercept in SLR

In this topic, we study the parameters β0 and β1 of the regression model and learn how to compute confidence intervals and perform hypothesis tests for them.

Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, for $i = 1, 2, \ldots, n$
Parameters: $\beta_0$ is the intercept and $\beta_1$ is the slope. The $\varepsilon_i$ are independent, normally distributed random errors with mean 0 and variance $\sigma^2$, i.e. $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.
Throughout this topic and the remaining Simple Linear Regression topics (unless otherwise stated), we assume that the normal error regression model applies. In this model, β0 and β1 are parameters that define a linear function, the Xi are known constants, and the random errors εi are independent and follow a normal distribution.

Inference for the slope, β1
Recall that
$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
which we can rewrite as
$b_1 = \sum c_i (Y_i - \bar{Y}) = \sum c_i Y_i - \bar{Y} \sum c_i = \sum c_i Y_i \sim \text{Normal}$, where $c_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$ and $\sum c_i = 0$.
It can be proved that $E\{b_1\} = \beta_1$ and $Var\{b_1\} = \sigma^2 \frac{1}{\sum (X_i - \bar{X})^2}$ (denoted by $\sigma^2\{b_1\}$).
Replacing the parameter $\sigma^2$ with $MSE$, the unbiased estimator of $\sigma^2$, we obtain the point estimators
$s^2\{b_1\} = \frac{MSE}{\sum (X_i - \bar{X})^2}$ and $s\{b_1\} = \sqrt{\frac{MSE}{\sum (X_i - \bar{X})^2}}$
The point estimator b1 was given in topic 1 and is shown here. b1 is a linear combination of the responses Yi, and each Yi is normally distributed, so b1 is also normally distributed. The sampling distribution of b1 describes the different values of b1 that would be obtained under repeated sampling with the X values held fixed. The mean of b1 is β1, and its variance is σ² divided by SSX, where SSX = $\sum (X_i - \bar{X})^2$. Replacing σ² with MSE gives the point estimates.
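As a rough illustration of these formulas, the sketch below computes b1 both from the usual ratio and as the linear combination Σ ci Yi, then MSE and s{b1}. The five (x, y) pairs are made up for illustration and are not the diamond data; all variable names are assumptions of this sketch.

```python
import math

# Hypothetical data, invented for illustration (not the diamond data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)  # SSX = sum of (X_i - X_bar)^2

# Slope and intercept from the usual least-squares formulas.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
b0 = y_bar - b1 * x_bar

# The same slope written as a linear combination of the Y_i.
c = [(xi - x_bar) / ssx for xi in x]
b1_linear = sum(ci * yi for ci, yi in zip(c, y))

# MSE = SSE / (n - 2), then the standard error of the slope.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / ssx)

print(f"b1 = {b1:.4f}, b1 as sum(c_i * Y_i) = {b1_linear:.4f}")
print(f"MSE = {mse:.5f}, s{{b1}} = {se_b1:.5f}")
```

The two slope values agree because the ci sum to zero, which is the step that removes the Ȳ term in the rewrite.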

Sampling distribution of (b1 − β1)/s{b1}, denoted by t*
$\frac{b_1 - \beta_1}{\sigma\{b_1\}} \sim N(0, 1)$ and $t^* = \frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2)$
There are n − 2 degrees of freedom because 2 parameters must be estimated to obtain the numerator of $s^2 = MSE$: $SSE = \sum [Y_i - (b_0 + b_1 X_i)]^2$.
Since b1 is normally distributed, the standardized statistic (b1 − β1)/σ{b1} is a standard normal variable. Ordinarily we have to estimate σ{b1} by s{b1}, the standard error, so we are interested in the distribution of (b1 − β1)/s{b1}. When a statistic is standardized with an estimated standard deviation in the denominator rather than the true standard deviation, it is called a studentized statistic. Here it follows a t distribution with n − 2 degrees of freedom. Two degrees of freedom are lost because two parameters (β0 and β1) have to be estimated first.
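The difference between dividing by the true σ{b1} and by the estimated s{b1} can be seen in a small simulation. This is only a sketch: the true parameters, the fixed X values, the number of repetitions and the 1.96 cut-off are all choices made here for illustration.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical setup: true parameters and fixed X values chosen for illustration.
beta0, beta1, sigma = 1.0, 2.0, 1.0
x = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(x)
x_bar = sum(x) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)
sigma_b1 = sigma / math.sqrt(ssx)  # true standard deviation of b1

reps = 20000
z_exceed = 0  # count of |standardized statistic| > 1.96
t_exceed = 0  # count of |studentized statistic| > 1.96
for _ in range(reps):
    # Repeated sampling: new errors each time, same X values.
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2) / ssx)
    if abs((b1 - beta1) / sigma_b1) > 1.96:
        z_exceed += 1
    if abs((b1 - beta1) / s_b1) > 1.96:
        t_exceed += 1

z_frac = z_exceed / reps
t_frac = t_exceed / reps
print(f"standardized with sigma{{b1}}: fraction beyond 1.96 = {z_frac:.3f}")
print(f"studentized with s{{b1}}:      fraction beyond 1.96 = {t_frac:.3f}")
```

With only n − 2 = 3 degrees of freedom, the studentized statistic lands beyond ±1.96 noticeably more often than the roughly 5% expected for a standard normal variable, which is why the t critical value is larger than 1.96.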

Confidence Interval for the slope β1
Since $t^* = \frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2)$, we can make the following probability statement:
$P\left( t(\alpha/2;\, n-2) \le \frac{b_1 - \beta_1}{s\{b_1\}} \le t(1-\alpha/2;\, n-2) \right) = 1 - \alpha$
where $t(\alpha/2;\, n-2)$ denotes the $(\alpha/2) \cdot 100$ percentile of the t distribution with $n-2$ degrees of freedom.
Because the t distribution is symmetric around its mean 0, the lower and upper percentiles have the same magnitude and opposite signs:
$t(\alpha/2;\, n-2) = -t(1-\alpha/2;\, n-2)$
Rearranging the inequalities in the probability statement gives the confidence interval for β1. Hence the $1-\alpha$ confidence interval for $\beta_1$ is
$b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\}$
In general, a confidence interval has the form point estimate ± margin of error, where the margin of error (denoted by ME) = t × standard error.
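The rearrangement mentioned above can be written out step by step. In this sketch, $t$ is shorthand (introduced here) for $t(1-\alpha/2;\, n-2)$, and the symmetry of the t distribution is used to write the lower percentile as $-t$.

```latex
\begin{align*}
1 - \alpha
  &= P\left( -t \le \frac{b_1 - \beta_1}{s\{b_1\}} \le t \right) \\
  &= P\left( -t\, s\{b_1\} \le b_1 - \beta_1 \le t\, s\{b_1\} \right)
     && \text{multiply through by } s\{b_1\} > 0 \\
  &= P\left( -b_1 - t\, s\{b_1\} \le -\beta_1 \le -b_1 + t\, s\{b_1\} \right)
     && \text{subtract } b_1 \\
  &= P\left( b_1 - t\, s\{b_1\} \le \beta_1 \le b_1 + t\, s\{b_1\} \right)
     && \text{multiply by } -1 \text{, which reverses the inequalities}
\end{align*}
```

The last line says that the random interval $b_1 \pm t\, s\{b_1\}$ covers $\beta_1$ with probability $1 - \alpha$.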

Significance Tests for β1
$H_0: \beta_1 = \beta_1^*$ vs $H_a: \beta_1 \ne \beta_1^*$
The test statistic is $t^* = (b_1 - \beta_1^*)/s\{b_1\} \sim t(n-2)$ under $H_0$.
For a two-sided test: reject H0 if $|t^*| \ge t_c$, where $t_c = t(1-\alpha/2;\, n-2)$. Or, reject H0 if p-value ≤ α.
For a one-sided test: use $t_c = t(1-\alpha;\, n-2)$. Reject H0 if $t^* \ge t_c$ when $H_a: \beta_1 > \beta_1^*$, or if $t^* \le -t_c$ when $H_a: \beta_1 < \beta_1^*$. Or, reject H0 if the one-sided p-value ≤ α.
Since the test statistic follows a t distribution, the test concerning β1 is a regular t test, which should have been covered in your previous statistics course. Here I assume everyone has a good understanding of how to perform the test, and just present the main rejection rules.
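The rejection rules can be collected in one small helper. This is a sketch, not part of the original slides: the function name and the "greater"/"less" labels are invented here, and the caller is assumed to supply the right critical value, t(1 − α/2; n − 2) for a two-sided test and t(1 − α; n − 2) for a one-sided test.

```python
def reject_h0(t_star, t_crit, alternative="two-sided"):
    """Return True if H0 is rejected for the given alternative.

    t_crit must be positive: t(1 - alpha/2; n - 2) for "two-sided",
    t(1 - alpha; n - 2) for "greater" (Ha: beta1 > beta1*) or
    "less" (Ha: beta1 < beta1*).
    """
    if alternative == "two-sided":
        return abs(t_star) >= t_crit
    if alternative == "greater":
        return t_star >= t_crit
    if alternative == "less":
        return t_star <= -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Example calls with made-up test statistics and table-style critical values.
print(reject_h0(2.5, 2.013))              # two-sided, |2.5| >= 2.013
print(reject_h0(2.5, 1.684, "greater"))   # upper-tail alternative
print(reject_h0(2.5, 1.684, "less"))      # lower-tail alternative
```

A large positive statistic rejects H0 for the two-sided and the upper-tail alternatives but not for the lower-tail one.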

Inference for the intercept, β0
$b_0 = \bar{Y} - b_1 \bar{X}$
It can be proved that $E\{b_0\} = \beta_0$ and $Var\{b_0\} = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]$ (denoted by $\sigma^2\{b_0\}$).
Replacing the parameter $\sigma^2$ with $MSE$, the unbiased estimator of $\sigma^2$, we obtain the point estimators
$s^2\{b_0\} = MSE \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]$ and $s\{b_0\} = \sqrt{MSE \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]}$
Analogous to the theorem for $b_1$, $t^* = (b_0 - \beta_0)/s\{b_0\} \sim t(n-2)$.
Now let's switch to the intercept, β0. The point estimate b0 is the average of Y minus b1 times the average of X. Its sampling distribution describes the different values of b0 that would be obtained under repeated sampling with the X values held fixed. Its mean is the true β0, so like the slope estimator b1, b0 is an unbiased estimator. Its variance is also tied to the error variance σ², as shown here. Replacing σ² with MSE gives the standard error, denoted by s{b0}. As with b1, we use a t test for inference about the intercept β0.
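A minimal sketch of the standard-error formula for the intercept, written as a function of summary statistics. The function name and the numbers passed to it are hypothetical and chosen only to show the arithmetic.

```python
import math


def intercept_se(mse, n, x_bar, ssx):
    """s{b0} = sqrt(MSE * (1/n + X_bar^2 / SSX))."""
    return math.sqrt(mse * (1.0 / n + x_bar ** 2 / ssx))


# Hypothetical summary statistics (not from the diamond data).
mse, n, ssx = 0.04, 5, 10.0

se_far = intercept_se(mse, n, x_bar=3.0, ssx=ssx)      # X values centred away from 0
se_centred = intercept_se(mse, n, x_bar=0.0, ssx=ssx)  # X values centred at 0

print(f"s{{b0}} with X_bar = 3: {se_far:.4f}")
print(f"s{{b0}} with X_bar = 0: {se_centred:.4f}")
```

The X̄² term shows why the intercept is estimated less precisely when the observed X values lie far from 0: with X̄ = 0 the formula reduces to sqrt(MSE/n).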

Confidence Interval for β0
$b_0 \pm t(1-\alpha/2;\, n-2)\, s\{b_0\}$

Significance Tests for β0
$H_0: \beta_0 = \beta_0^*$ vs $H_a: \beta_0 \ne \beta_0^*$
The test statistic is $t^* = (b_0 - \beta_0^*)/s\{b_0\} \sim t(n-2)$.

Comments on the inference assumptions
These comments concern the assumptions and limitations behind confidence intervals and hypothesis tests for the parameters β0 and β1.
Both b1 and b0 follow a normal distribution because they are computed from the Yi ($b_1 = \sum c_i Y_i$, $b_0 = \bar{Y} - b_1 \bar{X}$), which are themselves independent and normally distributed.
As long as the Yi are close to normal, inferences (CIs and hypothesis tests) based on the t distribution will be approximately correct, even with small sample sizes. Here "close to normal" means that Y need not be exactly normal, but it should at least have a symmetric distribution without outliers.
In general, the CLT ensures that b0 and b1 are asymptotically normal as long as the random errors are independent and identically distributed (iid), whatever the form of the distribution of Y. Therefore, inferences based on the t distribution will be approximately correct as long as n is large enough. This means that when Y is not close to normal, for example skewed or with outliers, you need a large sample size for the t method to be appropriate. There is no fixed rule for how large; n of 25 to 40 is a good starting point in general for using the t test.

Comments on the inference assumptions
Now that the basic theory is laid out, I want to mention some practical issues.
First, in a linear model the slope describes the change in Y when X changes. For example, in the diamond case b1 = 3721: for every additional carat, the price goes up by $3721 on average. The intercept, on the other hand, is often not of direct interest, so there is often no need to calculate CIs or hypothesis tests for β0. It is just the mean of Y at the single value X = 0 and is of little use for predicting other Y values. Think about "what is the price when carat = 0?" Caution again: the linear regression model might not be appropriate when its scope is extended to X = 0.
Second, because $\sigma^2\{b_1\} = \sigma^2 \frac{1}{\sum (X_i - \bar{X})^2}$, we can increase the precision of the estimator, i.e., reduce $\sigma\{b_1\}$, by reducing the error variance σ² or by increasing the spread in X, i.e., a bigger $\sum (X_i - \bar{X})^2$ (SSX). To do this when designing an experiment, collect X values over a wider and random range. For example, if we could study only 100 diamond rings, try to collect rings of different weights rather than many rings of similar weight.
The precision of the estimators also depends on the difference between the sample size and the number of functional parameters (βs) to be estimated.
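The effect of the spread in X on the precision of the slope can be checked numerically. In this sketch the error standard deviation and both sets of carat weights are hypothetical; only the formula σ{b1} = σ / sqrt(SSX) comes from the slides.

```python
import math


def slope_sd(sigma, x):
    """True standard deviation of b1: sigma / sqrt(SSX)."""
    x_bar = sum(x) / len(x)
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma / math.sqrt(ssx)


sigma = 30.0  # hypothetical error standard deviation

# Two hypothetical designs with the same sample size and the same mean weight.
narrow = [0.28, 0.29, 0.30, 0.31, 0.32]  # rings of similar weight
wide = [0.10, 0.20, 0.30, 0.40, 0.50]    # rings spread over a wider range

narrow_sd = slope_sd(sigma, narrow)
wide_sd = slope_sd(sigma, wide)

print(f"sigma{{b1}}, narrow design: {narrow_sd:.1f}")
print(f"sigma{{b1}}, wide design:   {wide_sd:.1f}")
print(f"ratio: {narrow_sd / wide_sd:.1f}")
```

Spreading the same five observations ten times wider multiplies SSX by 100 and therefore divides σ{b1} by 10.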

The diamond weight and price example: Confidence interval for the slope β1
$b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\}$, where α = 0.05 and n = 48
We now compute the confidence interval and carry out the hypothesis test for the slope β1, both in R and by hand. In the diamond example, the lm output gives b1 ≈ 3721 and a residual standard error s of 31.84. We use a significance level of 0.05.
In R, the confint function computes the confidence interval. Its first argument is the lm model. The second is the target parameter; here we want the confidence interval for the slope, i.e., weight. If it is omitted, confint shows the confidence intervals for both the intercept and the slope. The third argument is the confidence level, 1 − α.

Where α = 0.05, n = 48, df = 46 (round down to 40)
Now see how to compute the interval by hand with the t table. A t table does not list every degree of freedom or p-value, so sometimes we need to approximate. For example, df = 46 is not available, so we round down to the closest tabulated value, 40. The reason for using a smaller df is to get a larger t value and a wider interval, which errs on the conservative side. In this case, the t value for 40 df is 2.021, and we can use it to compute the confidence interval. In R, the qt function gives the t value for 46 df, 2.013. You can use both values in the homework, but use only the t table in the exam.

The diamond weight and price example: Confidence interval for the slope β1
$b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\}$, where α = 0.05 and n = 48
$= 3721 \pm 2.013\,(81.79) \approx (3556,\ 3886)$
From the lm output: $MSE = s^2 = 31.84^2 = 1013.8$
$s\{b_1\} = \sqrt{\frac{MSE}{\sum (X_i - \bar{X})^2}} = \sqrt{\frac{MSE}{s_X^2 (n-1)}} = 81.79$
We now know that b1 is 3721 and that the t value is 2.013 (or the t table value for 40 df). The last ingredient is the standard error of b1. The R output provides it as 81.79. You can also compute it from the formula, using the MSE of about 1014 and SSX. Note that here I also show how to compute SSX from the standard deviation: SSX = $s_X^2 (n-1)$, where $s_X$ is the usual sample standard deviation of X. This is a useful trick, especially for the exam. In case you forgot it from your past stat course, I suggest you copy it onto the cheat sheet.
Conclusion: we are 95% confident that the average price increases by at least 3556 and at most 3886 dollars when the weight increases by 1 carat.
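The hand calculation can be reproduced with a few lines of Python. The estimate, standard error and t value below are the rounded figures quoted in the slides; the short list of X values used to check the SSX shortcut is made up, since the raw diamond data are not part of this transcript.

```python
import statistics

# Rounded values quoted in the slides.
b1 = 3721.0     # slope estimate from the lm output
s_b1 = 81.79    # standard error of the slope from the lm output
t_crit = 2.013  # t(0.975; 46)

margin = t_crit * s_b1  # margin of error = t * standard error
lower, upper = b1 - margin, b1 + margin
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")

# SSX shortcut, checked on hypothetical X values (not the diamond weights).
x = [0.12, 0.15, 0.17, 0.18, 0.23, 0.25]
n = len(x)
x_bar = sum(x) / n
ssx_direct = sum((xi - x_bar) ** 2 for xi in x)
ssx_from_sd = statistics.variance(x) * (n - 1)  # s_X^2 * (n - 1)
print(f"SSX directly: {ssx_direct:.6f}, SSX from s_X: {ssx_from_sd:.6f}")
```

`statistics.variance` is the sample variance with the n − 1 divisor, so multiplying it by n − 1 recovers SSX exactly, which is the trick described above.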

The diamond weight and price example: hypothesis test for the slope β1
$H_0: \beta_1 = 0$ vs $H_a: \beta_1 \ne 0$
The test statistic: $t_s = \frac{b_1 - 0}{s\{b_1\}} = \frac{3721 - 0}{81.79} = 45.5$
The p-value $= 2P(T > 45.5) < 0.0001$, or $< 0.001$ using the t table.
Since the p-value is less than the significance level α = 0.05, reject H0.
Next consider a two-sided test for β1. The test statistic ts is 45.5; in the lm output in R, the last two columns give the t value and the p-value. When reporting the p-value, any value lower than 0.0001 can be reported as < 0.0001. Since the p-value is very small, we reject the null hypothesis and conclude that the slope is significantly different from 0. This result is consistent with the confidence interval for β1 computed on the previous page, which does not include 0 (all of its values are positive).

Estimate the p-value: two-sided test, df = 40, ts = 45.5
$t_s = 45.5 > 3.551$, so p-value < 0.001
Now estimate the p-value with the t table. With 40 degrees of freedom, the largest t value in the table is 3.551, and the corresponding two-sided P is 0.001. This means that the probability is only 0.001 that a test statistic from this distribution is more extreme than 3.551 (> 3.551 or < −3.551). The probability gets smaller as the test statistic gets larger in absolute value. Our test statistic is 45.5, which is much higher than 3.551, so the p-value is estimated to be less than 0.001.
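The table lookup can be sketched as a small function. The df = 40 row below is typed in from a standard t table as an assumption of this sketch; only the 3.551 entry appears in the slides. The function name is invented here.

```python
# One-tail areas and critical values for df = 40, as listed in a standard
# t table (assumed here; only 3.551 is quoted in the slides).
T_TABLE_DF40 = [
    (0.10, 1.303),
    (0.05, 1.684),
    (0.025, 2.021),
    (0.01, 2.423),
    (0.005, 2.704),
    (0.001, 3.307),
    (0.0005, 3.551),
]


def p_value_upper_bound(t_stat, two_sided=True):
    """Upper bound on the p-value from the df = 40 row of the table.

    Returns None when |t_stat| is below the first tabulated value, in
    which case the table only says the one-tail area exceeds 0.10.
    For a one-sided test the statistic is assumed to lie in the
    direction of the alternative.
    """
    bound = None
    for area, crit in T_TABLE_DF40:
        if abs(t_stat) > crit:
            bound = area  # keep the smallest area whose critical value is exceeded
    if bound is None:
        return None
    return 2 * bound if two_sided else bound


print(p_value_upper_bound(45.5))                   # two-sided bound
print(p_value_upper_bound(45.5, two_sided=False))  # one-sided bound
print(p_value_upper_bound(2.5))                    # a less extreme statistic
```

For ts = 45.5 the statistic exceeds every tabulated value, so the bounds are the smallest ones the table can give.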

The diamond weight and price example: hypothesis test for the slope β1 (one sided test)
Comment: R output is usually for the two-sided test, and can be adjusted for a one-sided test.
1. $H_0: \beta_1 = 0$ vs $H_a: \beta_1 > 0$
The test statistic: $t_s = \frac{b_1 - 0}{s\{b_1\}} = \frac{3721 - 0}{81.79} = 45.5$
The p-value $= P(T > 45.5) < 0.0005$ using the t table (one-sided test, df = 40, since $t_s = 45.5 > 3.551$).
Since the p-value is less than the significance level α = 0.05, reject H0.
Suppose we want to do a one-sided test. The test statistic stays the same as long as the data are the same, so ts is also 45.5. For the alternative β1 > 0, the larger ts is, the less consistent the sample is with H0 and the more reason there is to reject H0. Hence the p-value is the area to the right of ts. This is also the smaller tail area, which can be read directly from the t table. As shown, since ts is greater than the largest tabulated value, 3.551, such a value occurs with probability less than 0.0005. The p-value is < 0.0005.

The diamond weight and price example: hypothesis test for the slope β1 (one sided test)
Comment: R output is usually for the two-sided test, and can be adjusted for a one-sided test.
2. $H_0: \beta_1 = 0$ vs $H_a: \beta_1 < 0$
The test statistic: $t_s = \frac{b_1 - 0}{s\{b_1\}} = \frac{3721 - 0}{81.79} = 45.5$
The p-value $= P(T < 45.5) > 1 - 0.0005 = 0.9995$ using the t table.
Since the p-value is greater than the significance level α = 0.05, do not reject H0.
On the other hand, suppose the alternative is on the opposite side, in this case β1 < 0. The test statistic does not change, since the sample data are the same. The reasoning mirrors the previous example: the smaller ts is, the less consistent the sample is with H0 and the more reason there is to reject H0. Hence the p-value is the area to the left of ts. Here that is the bigger area, which equals 1 minus the smaller area. Recall that the t table only gives the smaller area, which is < 0.0005, so we obtain the p-value by subtracting it from 1. Hence the p-value > 0.9995.
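The adjustment from a two-sided p-value to a one-sided one can be written as a short helper. This is a sketch under the assumption that the two-sided p-value comes from a symmetric t distribution; the function name, the "greater"/"less" labels and the example numbers are all invented for illustration.

```python
def one_sided_p(two_sided_p, t_stat, alternative):
    """Convert a two-sided t-test p-value to a one-sided p-value.

    alternative is "greater" (Ha: beta1 > 0) or "less" (Ha: beta1 < 0).
    Half of the two-sided p-value is the smaller tail area; it is the
    one-sided p-value only when t_stat points in the direction of Ha.
    """
    half = two_sided_p / 2.0
    if alternative == "greater":
        return half if t_stat > 0 else 1.0 - half
    if alternative == "less":
        return half if t_stat < 0 else 1.0 - half
    raise ValueError("alternative must be 'greater' or 'less'")


# Hypothetical output: two-sided p-value 0.04 with a positive t statistic.
print(one_sided_p(0.04, 2.1, "greater"))  # smaller tail area
print(one_sided_p(0.04, 2.1, "less"))     # 1 minus the smaller tail area
```

This matches the two diamond cases: with a large positive ts, the upper-tail test gets the small tail area and the lower-tail test gets 1 minus it.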

The diamond weight and price example: hypothesis test for the intercept β0 (self exercise)
As a self practice, now try finding a confidence interval and carrying out a hypothesis test for the intercept, β0.
$H_0: \beta_0 = 0$ vs $H_a: \beta_0 \ne 0$
The test statistic: $t_s = \frac{b_0 - 0}{s\{b_0\}} = \frac{-259.63 - 0}{17.32} = -14.99$
The p-value $= 2P(T > 14.99) < 0.0001$
Since the p-value is less than the significance level α = 0.05, reject H0.

The diamond weight and price example: Confidence interval for the intercept β0 (self exercise)
Answer: $b_0 \pm t_c\, s\{b_0\} = -259.63 \pm 2.013\,(17.32) \approx (-294.49,\ -224.77)$
So β0 < 0. What does that mean? The confidence interval for the intercept runs from about −294 to about −225. But the intercept is the value of Y when X = 0, that is, the price of a diamond ring that does not exist (X = 0). It means nothing in practice. This is an example where we should consider what range of X (and hence Y) is actually meaningful for the model, between the two extreme levels of the predictor: the "effective range" of your model.
After discussing the parameters β0 and β1 in the regression model, we will learn in the next topic how to use the model to estimate the mean of Y or to predict Y.
