Statistical inference for the slope and intercept in SLR


1 Statistical inference for the slope and intercept in SLR
In this topic, we first study the parameters $\beta_0$ and $\beta_1$ in the regression model, and learn how to compute confidence intervals and perform hypothesis tests about them.

2 Simple Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \quad \text{for } i = 1, 2, \ldots, n$$
Parameters: $\beta_0$ is the intercept and $\beta_1$ is the slope. The $\varepsilon_i$ are independent, normally distributed random errors with mean 0 and variance $\sigma^2$, that is, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.
Throughout this topic and the remainder of the Simple Linear Regression topics (unless otherwise stated), we assume that the normal error regression model is applicable. In this regression model, $\beta_0$ and $\beta_1$ are parameters that define a linear function, the $X_i$ are known constants, and the random errors $\varepsilon_i$ are independent and follow a Normal distribution.

3 Inference for the slope, $\beta_1$
Recall that
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2},$$
which we can rewrite as
$$b_1 = \sum c_i (Y_i - \bar{Y}) = \sum c_i Y_i - \bar{Y} \sum c_i = \sum c_i Y_i \sim \text{Normal}, \quad \text{where } c_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} \text{ and } \sum c_i = 0.$$
It can be proved that
$$E\{b_1\} = \beta_1 \quad \text{and} \quad Var\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \quad (\text{denoted by } \sigma^2\{b_1\}).$$
Replacing the parameter $\sigma^2$ with $MSE$, its unbiased estimator, we obtain the point estimator of $\sigma^2\{b_1\}$:
$$s^2\{b_1\} = \frac{MSE}{\sum (X_i - \bar{X})^2}, \qquad s\{b_1\} = \sqrt{\frac{MSE}{\sum (X_i - \bar{X})^2}}$$
The point estimator $b_1$ was given in topic 1 and is shown here. $b_1$ is a linear combination of the responses $Y_i$, and each $Y_i$ is normally distributed, so $b_1$ is also normally distributed. This sampling distribution describes the different values of $b_1$ that would be obtained under repeated sampling with the X values held fixed. The mean of $b_1$ is $\beta_1$, and the variance of $b_1$ is $\sigma^2$ divided by $SSX = \sum (X_i - \bar{X})^2$. Replacing $\sigma^2$ with $MSE$ gives the point estimates.
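The formulas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `slope_inference` and the five-point toy data set are assumptions made up for the example.

```python
import math

def slope_inference(x, y):
    """Return (b0, b1, mse, se_b1) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)           # SSX = sum (Xi - Xbar)^2
    c = [(xi - x_bar) / ssx for xi in x]               # weights c_i
    b1 = sum(ci * yi for ci, yi in zip(c, y))          # b1 = sum c_i * Y_i
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                                # unbiased estimator of sigma^2
    se_b1 = math.sqrt(mse / ssx)                       # s{b1}
    return b0, b1, mse, se_b1

# Hypothetical toy data, used only to exercise the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, mse, se_b1 = slope_inference(x, y)
print(f"b1 = {b1:.3f}, MSE = {mse:.4f}, s{{b1}} = {se_b1:.4f}")
```

Writing $b_1$ as the weighted sum of the $Y_i$ mirrors the argument for normality: a fixed linear combination of independent normal responses is itself normal.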

4 Sampling distribution of $(b_1 - \beta_1)/s\{b_1\}$, denoted by $t^*$
$$\frac{b_1 - \beta_1}{\sigma\{b_1\}} \sim N(0, 1), \qquad t^* = \frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2)$$
There are $n - 2$ degrees of freedom because 2 parameters must be estimated to obtain the numerator of $s^2 = MSE$: $SSE = \sum [Y_i - (b_0 + b_1 X_i)]^2$.
Since $b_1$ is normally distributed, the standardized statistic $(b_1 - \beta_1)/\sigma\{b_1\}$ is a standard normal variable. Ordinarily we need to estimate $\sigma\{b_1\}$ by the standard error $s\{b_1\}$, and hence are interested in the distribution of $(b_1 - \beta_1)/s\{b_1\}$. When a statistic is standardized but the denominator is an estimated standard deviation rather than the true standard deviation, it is called a studentized statistic. This statistic follows a t distribution with $n - 2$ degrees of freedom. Two degrees of freedom are lost because two parameters ($\beta_0$ and $\beta_1$) need to be estimated first.
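One way to see where the t distribution comes from is sketched below. It relies on a standard result that these slides do not derive and that is assumed here: under the normal error model, $SSE/\sigma^2$ follows a $\chi^2(n-2)$ distribution and is independent of $b_1$.

```latex
\[
t^* = \frac{b_1 - \beta_1}{s\{b_1\}}
    = \frac{(b_1 - \beta_1)/\sigma\{b_1\}}{s\{b_1\}/\sigma\{b_1\}}
    = \frac{Z}{\sqrt{\dfrac{SSE/\sigma^2}{n-2}}}
    \sim t(n-2),
\qquad
\text{since } \frac{s^2\{b_1\}}{\sigma^2\{b_1\}} = \frac{MSE}{\sigma^2} = \frac{SSE/\sigma^2}{n-2}.
\]
```

A standard normal variable divided by the square root of an independent chi-square variable over its degrees of freedom is, by definition, a t variable with those degrees of freedom.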

5 Confidence Interval for the slope $\beta_1$
Since $t^* = (b_1 - \beta_1)/s\{b_1\} \sim t(n-2)$, we can make the following probability statement:
$$P\left\{ t(\alpha/2;\, n-2) \le \frac{b_1 - \beta_1}{s\{b_1\}} \le t(1-\alpha/2;\, n-2) \right\} = 1 - \alpha$$
where $t(\alpha/2;\, n-2)$ denotes the $(\alpha/2)100$ percentile of the t distribution with $n-2$ degrees of freedom. Because the t distribution is symmetric around its mean 0, the lower and upper percentiles have the same magnitude and opposite signs:
$$t(\alpha/2;\, n-2) = -t(1-\alpha/2;\, n-2)$$
Rearranging the inequalities in the probability statement gives the confidence interval for $\beta_1$. In general, a confidence interval has the form point estimate ± margin of error, where the margin of error (denoted by ME) = t × standard error. Hence the $1 - \alpha$ confidence limits for $\beta_1$ are:
$$b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\}$$
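The rearrangement step can be written out explicitly. In this sketch, $t_c$ is shorthand (introduced here, not on the slide) for $t(1-\alpha/2;\, n-2)$:

```latex
\[
-t_c \le \frac{b_1 - \beta_1}{s\{b_1\}} \le t_c
\;\Longleftrightarrow\;
-t_c\, s\{b_1\} \le b_1 - \beta_1 \le t_c\, s\{b_1\}
\;\Longleftrightarrow\;
b_1 - t_c\, s\{b_1\} \le \beta_1 \le b_1 + t_c\, s\{b_1\},
\]
\[
\text{so that } P\left\{ b_1 - t_c\, s\{b_1\} \le \beta_1 \le b_1 + t_c\, s\{b_1\} \right\} = 1 - \alpha .
\]
```

The random quantities in the final statement are the endpoints, not $\beta_1$: the interval varies from sample to sample and covers the fixed parameter with probability $1 - \alpha$.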

6 Significance Tests for $\beta_1$
$$H_0: \beta_1 = \beta_1^* \quad \text{vs} \quad H_a: \beta_1 \ne \beta_1^*$$
The test statistic is $t^* = (b_1 - \beta_1^*)/s\{b_1\} \sim t(n-2)$ under $H_0$.
For a two-sided test: reject $H_0$ if $|t^*| \ge t_c$, where $t_c = t(1-\alpha/2;\, n-2)$. Or, reject $H_0$ if p-value $\le \alpha$.
For a one-sided test: reject $H_0$ if $t^* \ge t_c$ (for $H_a: \beta_1 > \beta_1^*$) or if $t^* \le -t_c$ (for $H_a: \beta_1 < \beta_1^*$), where $t_c = t(1-\alpha;\, n-2)$. Or, reject $H_0$ if p-value $\le \alpha$.
Since the test statistic follows a t distribution, the test concerning $\beta_1$ is a regular t test, which should have been covered in your previous statistics course. I assume everyone has a good understanding of how to perform the test, and present only the main rejection rules here.
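The rejection rules can be captured in a small helper. This is an illustrative sketch: the function name `reject_h0` and the example test statistics are hypothetical, while the critical values 2.021 and 1.684 are the standard t-table entries for 40 degrees of freedom at cumulative probabilities 0.975 and 0.95.

```python
def reject_h0(t_star, t_crit, alternative="two-sided"):
    """Apply the t-test rejection rule for a regression coefficient.

    t_crit must be the positive critical value that matches the test:
    t(1 - alpha/2; n - 2) for a two-sided test,
    t(1 - alpha; n - 2) for a one-sided test.
    """
    if alternative == "two-sided":
        return abs(t_star) >= t_crit
    if alternative == "greater":      # Ha: beta1 > beta1*
        return t_star >= t_crit
    if alternative == "less":         # Ha: beta1 < beta1*
        return t_star <= -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

# Hypothetical test statistics with df = 40 and alpha = 0.05.
print(reject_h0(2.5, 2.021))              # two-sided
print(reject_h0(2.5, 1.684, "greater"))   # upper-tailed
print(reject_h0(2.5, 1.684, "less"))      # lower-tailed: wrong direction
```

Keeping the direction of the alternative explicit avoids the common slip of applying $|t^*|$ to a one-sided test, which would reject for large statistics in the wrong tail.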

7 Inference for the intercept, $\beta_0$
$$b_0 = \bar{Y} - b_1 \bar{X}$$
It can be proved that
$$E\{b_0\} = \beta_0 \quad \text{and} \quad Var\{b_0\} = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right] \quad (\text{denoted by } \sigma^2\{b_0\}).$$
Replacing the parameter $\sigma^2$ with $MSE$, its unbiased estimator, we obtain the point estimator of $\sigma^2\{b_0\}$:
$$s^2\{b_0\} = MSE \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right], \qquad s\{b_0\} = \sqrt{MSE \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]}$$
Analogous to the theorem for $b_1$,
$$t^* = \frac{b_0 - \beta_0}{s\{b_0\}} \sim t(n-2)$$
Now let's switch to the intercept, $\beta_0$. The point estimate $b_0$ is the average of Y minus $b_1$ times the average of X. Its sampling distribution describes the different values of $b_0$ that would be obtained under repeated sampling with the X values held fixed. The mean is the true $\beta_0$, and the variance is related to the error variance $\sigma^2$ as shown here. Like the slope estimator $b_1$, $b_0$ is an unbiased estimator. Replacing $\sigma^2$ with $MSE$ gives the standard error, denoted by $s\{b_0\}$. Analogous to the theorem for $b_1$, we use a similar t statistic for inference about the intercept.

8 Confidence Interval for $\beta_0$
$$b_0 \pm t(1-\alpha/2;\, n-2)\, s\{b_0\}$$
Significance Tests for $\beta_0$
$$H_0: \beta_0 = \beta_0^* \quad \text{vs} \quad H_a: \beta_0 \ne \beta_0^*$$
The test statistic is $t^* = (b_0 - \beta_0^*)/s\{b_0\} \sim t(n-2)$ under $H_0$.

9 Comments on the inference assumptions
Both $b_1 = \sum c_i Y_i$ and $b_0 = \bar{Y} - b_1 \bar{X}$ follow a Normal distribution because they are linear combinations of the $Y_i$, which are themselves independent and normally distributed.
As long as the $Y_i$ are close to normal, inferences (CIs and hypothesis tests) based on the t distribution will be approximately correct, even with small sample sizes. "Close to normal" means that Y need not be exactly normal, but it should at least have a symmetric distribution without outliers.
In general, the CLT ensures that $b_0$ and $b_1$ are asymptotically normal as long as the random errors are independently and identically distributed (iid), whatever the form of the distribution of Y. Therefore, inferences based on the t distribution will be approximately correct as long as n is large enough. This means that when Y is not close to Normal, for example skewed or with outliers, you need a large sample size for the t method to be appropriate. There is no fixed rule for how large it must be; an n of 25 to 40 is a good starting point in general for using the t test.

10 Comments on the inference assumptions
Now that the basic theory is laid out, I want to mention some practical issues.
First, in a linear model the slope describes the change in Y when X changes. For example, in the diamond case, $b_1 = 3721$: for every additional carat, the price goes up by $3721 on average. The intercept, on the other hand, is often of no direct interest, so there is frequently no need to calculate CIs or hypothesis tests on $\beta_0$. It is just the mean value of Y when X = 0 and is of little value for predicting other Y values. Think about "what is the price when carat = 0?" Caution again: the linear regression model might not be appropriate when the scope of the model is extended to X = 0.
Second, because $\sigma^2\{b_1\} = \sigma^2 / \sum (X_i - \bar{X})^2$, we can increase the precision of the estimator, i.e., reduce $\sigma\{b_1\}$, either by reducing the error variance $\sigma^2$ or by increasing the spread of X, i.e., making $SSX = \sum (X_i - \bar{X})^2$ bigger. To do this when designing an experiment, collect X values over a wider and random range. For example, if we could study only 100 diamond rings, try to collect rings of different weights rather than many rings of similar weight.
The precision of the estimators also depends on the difference between the sample size and the number of functional parameters ($\beta$s) to be estimated.
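The effect of the spread of X on the precision of the slope can be illustrated numerically. In this sketch the two designs, the weights in carats, and the error standard deviation of 30 are all hypothetical; only the formula $\sigma\{b_1\} = \sigma / \sqrt{SSX}$ comes from the slides.

```python
import math

def ssx(x):
    """Sum of squared deviations of x around its mean."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)

def slope_sd(sigma, x):
    """sigma{b1} = sigma / sqrt(SSX)."""
    return sigma / math.sqrt(ssx(x))

sigma = 30.0                                  # hypothetical error standard deviation
narrow = [0.18, 0.19, 0.20, 0.21, 0.22]       # similar weights
wide = [0.10, 0.15, 0.20, 0.25, 0.30]         # same n and mean, wider range

sd_narrow = slope_sd(sigma, narrow)
sd_wide = slope_sd(sigma, wide)
print(f"narrow design: {sd_narrow:.1f}, wide design: {sd_wide:.1f}")
```

With the same number of observations and the same error variance, spreading the X values five times wider makes the standard deviation of the slope estimator five times smaller.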

11 The diamond weight and price example: Confidence interval for the slope $\beta_1$
$$b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\}, \quad \text{where } \alpha = 0.05,\ n = 48$$
[Figures: lm output; confint output]
Now we compute the confidence interval and hypothesis test for the slope $\beta_1$. We will show how to do it both in R and by hand. In the diamond example, $b_1$ is estimated to be 3721 and the residual standard error s to be 31.84. We use a significance level of 0.05. In R, the confint function computes the confidence interval. Its first argument is the lm model; the second is the target parameter, here the slope, i.e., weight. If the second argument is omitted, confint shows the confidence intervals for both the intercept and the slope. The third argument is the confidence level, $1 - \alpha$.

12 Where $\alpha = 0.05$, $n = 48$, $df = 46$: round down to 40, or use R
Now see how to compute by hand with the t table. The t table does not list every degree of freedom or p-value, so we sometimes need to approximate. For example, df = 46 is not available, so we round down to the closest tabulated value, 40. The reason for using a smaller df is that it gives a larger t value and a wider interval, which is conservative. In this case, the t value for 40 df is 2.021, and we can use it to compute the confidence interval. In R, the qt function gives the t value for 46 df, 2.013. You can use both values in the homework, but use only the t table in the exam.
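The "round down to the nearest tabulated df" rule can be sketched as a lookup. The dictionary below is a small excerpt of a standard t table for cumulative probability 0.975; the names `T_TABLE_975` and `conservative_t_value` are made up for this example.

```python
# Excerpt of a standard t table: critical values t(0.975; df).
T_TABLE_975 = {30: 2.042, 40: 2.021, 60: 2.000, 120: 1.980}

def conservative_t_value(df, table=T_TABLE_975):
    """Return the critical value for the largest tabulated df not exceeding df.

    Rounding df down gives a slightly larger t value, so the resulting
    interval is a little wider than the exact one (conservative).
    """
    candidates = [d for d in table if d <= df]
    if not candidates:
        raise ValueError("df is below the smallest tabulated value")
    return table[max(candidates)]

print(conservative_t_value(46))   # uses the df = 40 row
```

For the diamond data this picks 2.021 from the df = 40 row, slightly larger than the exact 2.013 for df = 46 quoted on the slide.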

13 The diamond weight and price example: Confidence interval for the slope $\beta_1$
$$b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\} = 3721 \pm 2.013\,(81.79) \approx (3556,\ 3886), \quad \text{where } \alpha = 0.05,\ n = 48$$
From the lm output, $MSE = s^2 = 31.84^2 = 1013.8$, and
$$s\{b_1\} = \sqrt{\frac{MSE}{\sum (X_i - \bar{X})^2}} = \sqrt{\frac{MSE}{s_X^2\,(n-1)}} = 81.79$$
We now know that $b_1$ is 3721 and that the t value is 2.013 (from R) or 2.021 (from the t table). The last thing we need is the standard error of $b_1$. In the R output, the standard error is provided as 81.79. You can also compute it with the formula, using an MSE of 1014 and the SSX of the data. Note that SSX can be computed from the standard deviation: $SSX = s_X^2 (n-1)$, where $s_X$ is the usual sample standard deviation of X. This is a useful trick, especially for the exam. In case you forgot it from a past statistics course, I suggest you copy it onto your cheat sheet.
Conclusion (consistent with the confint output): we are 95% confident that the average price increases by at least 3556 and at most 3886 dollars when the weight increases by 1 carat.

14 The diamond weight and price example: hypothesis test for the slope $\beta_1$
$$H_0: \beta_1 = 0 \quad \text{vs} \quad H_a: \beta_1 \ne 0$$
The test statistic: $t_s = \dfrac{b_1 - 0}{s\{b_1\}} = \dfrac{3721 - 0}{81.79} = 45.5$
The p-value $= 2P(T > 45.5) < 0.0001$, or $< 0.001$ using the t table.
Since the p-value is less than the significance level $\alpha = 0.05$, reject $H_0$.
Next consider a two-sided test for $\beta_1$. The test statistic $t_s$ is 45.5. In the lm output in R, the last two columns give the t value and the p-value. When we report a p-value, any value lower than 0.0001 can be reported as < 0.0001. Since the p-value is very small, we reject the null hypothesis and conclude that the slope is significantly different from 0. The result is consistent with the confidence interval for $\beta_1$ computed on the previous page, which does not include 0 (both limits are positive).

15 Estimate p value, two sided test, $df = 40$, $t_s = 45.5$
$$t_s = 45.5 > 3.551 \quad \Rightarrow \quad \text{p-value} < 0.001$$
Now estimate the p-value with the t table. Using 40 degrees of freedom, the largest tabulated t value is 3.551, and the corresponding two-sided P is 0.001. This means that, in this distribution, the probability that the test statistic is more extreme than 3.551 (> 3.551 or < −3.551) is only 0.001. The probability gets smaller as the test statistic gets larger. Our test statistic is 45.5, which is much larger than 3.551, so the p-value is estimated to be less than 0.001.

16 The diamond weight and price example: hypothesis test for the slope $\beta_1$ (one sided test)
Comment: R output is usually for the two-sided test, and can be adjusted for a one-sided test.
1. $H_0: \beta_1 = 0$ vs $H_a: \beta_1 > 0$
The test statistic: $t_s = \dfrac{b_1 - 0}{s\{b_1\}} = \dfrac{3721 - 0}{81.79} = 45.5$
The p-value $= P(T > 45.5) < 0.0001$, or $< 0.0005$ using the t table.
Since the p-value is less than the significance level $\alpha = 0.05$, reject $H_0$.
Estimate p value, one sided test, $df = 40$, $t_s = 45.5$: $t_s = 45.5 > 3.551$, so p-value $< 0.0005$.
Suppose we want to do a one-sided test. The test statistic is the same as long as the data are the same, so $t_s$ is again 45.5. In a one-sided test with $H_a: \beta_1 > 0$, the larger $t_s$ is, the less likely such a sample is under $H_0$, and the more reason there is to reject $H_0$. Hence the p-value is the area to the right of $t_s$. This p-value is also the smaller tail area and can be read directly from the t table. As shown, since $t_s$ is greater than the largest tabulated value, the p-value is < 0.0005.

17 The diamond weight and price example: hypothesis test for the slope $\beta_1$ (one sided test)
Comment: R output is usually for the two-sided test, and can be adjusted for a one-sided test.
2. $H_0: \beta_1 = 0$ vs $H_a: \beta_1 < 0$
The test statistic: $t_s = \dfrac{b_1 - 0}{s\{b_1\}} = \dfrac{3721 - 0}{81.79} = 45.5$
The p-value $= P(T < 45.5) > 1 - 0.0001 = 0.9999$, or $> 1 - 0.0005 = 0.9995$ using the t table.
Since the p-value is greater than the significance level $\alpha = 0.05$, do not reject $H_0$.
On the other hand, suppose the alternative hypothesis is on the opposite side, in this case $\beta_1 < 0$. The test statistic doesn't change since the sample data are the same. The reasoning mirrors the previous example: the smaller (more negative) $t_s$ is, the less likely such a sample is under $H_0$, and the more reason there is to reject $H_0$. Hence the p-value is the area to the left of $t_s$. Here that is the bigger area, which equals 1 minus the smaller area. Recall that the t table only gives the smaller area, which is < 0.0005, so we obtain the p-value by subtracting it from 1. Hence the p-value > 1 − 0.0005, i.e., p-value > 0.9995.
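The adjustment from a two-sided p-value to a one-sided one can be sketched as below. The function name `one_sided_p` is hypothetical, and the two-sided value 0.001 is used only as the table bound from this example, so the results are bounds rather than exact p-values.

```python
def one_sided_p(p_two_sided, t_star, alternative):
    """Convert a two-sided p-value into a one-sided p-value.

    The two-sided p-value is split evenly between the two tails. The
    one-sided p-value is the small tail when the sign of t_star agrees
    with the alternative, and 1 minus the small tail otherwise.
    """
    small_tail = p_two_sided / 2
    if alternative == "greater":      # Ha: beta1 > 0
        return small_tail if t_star > 0 else 1 - small_tail
    if alternative == "less":         # Ha: beta1 < 0
        return small_tail if t_star < 0 else 1 - small_tail
    raise ValueError("alternative must be 'greater' or 'less'")

t_s = 45.5
p_greater = one_sided_p(0.001, t_s, "greater")   # bound for Ha: beta1 > 0
p_less = one_sided_p(0.001, t_s, "less")         # bound for Ha: beta1 < 0
print(p_greater, p_less)
```

Because the two-sided p-value is below 0.001, the upper-tailed p-value is below 0.0005 and the lower-tailed p-value is above 0.9995, matching the two slides.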

18 The diamond weight and price example: hypothesis test for the intercept $\beta_0$ (self exercise)
$$H_0: \beta_0 = 0 \quad \text{vs} \quad H_a: \beta_0 \ne 0$$
The test statistic: $t_s = \dfrac{b_0 - 0}{s\{b_0\}} = \dfrac{-259.63 - 0}{17.32} = -14.99$
The p-value $= 2P(T > 14.99) < 0.0001$
Since the p-value is less than the significance level $\alpha = 0.05$, reject $H_0$.
As a self practice, now try finding a confidence interval and performing a hypothesis test for the intercept, $\beta_0$.

19 The diamond weight and price example: Confidence interval for the intercept $\beta_0$ (self exercise)
Answer: $b_0 \pm t_c\, s\{b_0\} = -259.63 \pm 2.013\,(17.32) \approx (-294.487,\ -224.765)$
So $\beta_0 < 0$. What does it mean?
The confidence interval for the intercept runs from about −294 to −225. But the intercept is the value of Y when X = 0, that is, the price of a diamond ring that does not exist (X = 0). Taken literally it means nothing. This is an example where we should consider the range of X (and hence Y) over which the model is actually meaningful, between the two extreme levels of the predictor: the "effective range" of your model.
After discussing the parameters $\beta_0$ and $\beta_1$ in the regression model, we will learn how to use the model to estimate the mean of Y or predict Y in the next topic.

