Univariate Linear Regression Problem Model: $Y = \beta_0 + \beta_1 X + \varepsilon$. Test: $H_0: \beta_1 = 0$. Alternative: $H_1: \beta_1 > 0$. The distribution of Y is normal under both null and alternative. Under the null, $\mathrm{var}(Y) = \sigma_0^2$. Under the alternative, $\beta_1 > 0$ and $\mathrm{var}(Y) = \sigma_1^2$.
Step 1: Choose the test statistic and specify its null distribution Use conditions of the null to find:
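A sketch of the null distribution, assuming the test statistic is the OLS slope estimate $B_1$ (as in the example problems) and the $x_i$ are fixed design points:

```latex
B_1 \sim N\!\left(0,\ \frac{\sigma_0^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)
\quad \text{under } H_0:\ \beta_1 = 0 .
```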
Bringing sample size into regression design The sample size n is hidden in the regression results. That is, let:
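One way to make n explicit, assuming $\sigma_X^2$ denotes the average squared deviation of the design points (dividing by n matches the example in which the sum of squares equals 2,000n):

```latex
\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
\quad\Longrightarrow\quad
\sum_{i=1}^{n}(x_i-\bar{x})^2 = n\,\sigma_X^2 ,
\qquad
\mathrm{sd}(B_1) = \frac{\sigma}{\sigma_X\sqrt{n}} .
```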
Step 2: Define the critical value For the univariate linear regression test:
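A sketch of the critical value for the right-sided test, assuming $B_1$ is the test statistic, $E_0 = 0$ is its expected value under the null, and $|z_\alpha|$ is the upper-$\alpha$ point of the standard normal distribution:

```latex
c = E_0 + |z_\alpha|\,\frac{\sigma_0}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
  = |z_\alpha|\,\frac{\sigma_0}{\sigma_X\sqrt{n}} ,
\qquad
\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 .
```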
Step 3: Define the Rejection Rule Each test is a right-sided test, so the rule is to reject $H_0$ when the test statistic is greater than the critical value.
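The critical value and rejection rule can be sketched in code. This is a minimal illustration, assuming the test statistic is the OLS slope with known null standard deviation; the function names `critical_value` and `reject_h0` and the numbers used are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(alpha, sigma0, sxx, e0=0.0):
    """Right-sided critical value for the slope estimate B1 under H0.

    sxx is the sum of squared deviations of the x values (assumed known).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-alpha normal quantile
    return e0 + z_alpha * sigma0 / sqrt(sxx)


def reject_h0(b1, alpha, sigma0, sxx):
    """Reject H0 when the observed slope exceeds the critical value."""
    return b1 > critical_value(alpha, sigma0, sxx)


# Illustrative numbers only: sigma0 = 10, Sxx = 400, alpha = 0.05.
c = critical_value(0.05, 10.0, 400.0)
print(round(c, 3))
print(reject_h0(1.0, 0.05, 10.0, 400.0))
print(reject_h0(0.5, 0.05, 10.0, 400.0))
```

For a left-sided test the sign of the quantile term flips and the inequality reverses.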
Step 4: Specify the Distribution of Test Statistic under Alternative Use conditions of the alternative to find:
Step 5: Define a Type II Error For the univariate linear regression test:
Step 6: Find β For a univariate linear regression test:
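Steps 4 through 6 can be sketched for the right-sided regression test as follows, assuming $B_1$ is the test statistic, $c$ is the critical value from Step 2, $E_1 = \beta_1 > 0$ is the slope under the alternative, and $\Phi$ is the standard normal distribution function:

```latex
\begin{align*}
\text{Under } H_1:\quad & B_1 \sim N\!\left(E_1,\ \frac{\sigma_1^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right), \\
\text{Type II error:}\quad & \text{accept } H_0 \text{ (that is, } B_1 \le c\text{) when } H_1 \text{ is true}, \\
\beta = \Pr\nolimits_1\{B_1 \le c\}
  &= \Phi\!\left(\frac{c - E_1}{\sigma_1 \big/ \sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}}\right).
\end{align*}
```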
Basic Insight Notice that all three problems have the same basic structure. That is, if you understand the solution of the one sample test, then you can derive the answer to the other problems.
Step 7: Phrase requirement on β For example, we seek to “choose n so that β=0.01.” That is, “choose n so that $\Pr_1\{\text{Accept } H_0\} = \beta = 0.01$.”
Step 7: Phrase requirement on β For example, we seek to “choose n so that
Step 7: Phrase requirement on β Notice the parallel phrasing:
Step 7: Phrase requirement on β That is, choose n so that (note that $E_0 = 0$):
Step 7: Phrase requirement on β That is, choose n so that (after algebraic clearing out):
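The algebra can be sketched as follows for a right-sided test, assuming the test statistic has standard deviation $\sigma_0/\sqrt{n}$ under the null and $\sigma_1/\sqrt{n}$ under the alternative, with expected values $E_0$ and $E_1 > E_0$:

```latex
\begin{align*}
\Pr\nolimits_1\{\text{Accept } H_0\}
  &= \Phi\!\left(\frac{E_0 + |z_\alpha|\,\sigma_0/\sqrt{n} - E_1}{\sigma_1/\sqrt{n}}\right) = \beta \\
\Longrightarrow\quad
\frac{|z_\alpha|\,\sigma_0 - \sqrt{n}\,(E_1 - E_0)}{\sigma_1} &= -|z_\beta| \\
\Longrightarrow\quad
\sqrt{n} &= \frac{|z_\alpha|\,\sigma_0 + |z_\beta|\,\sigma_1}{|E_1 - E_0|} .
\end{align*}
```

Any larger n gives a Type II error probability below the target, so in practice n is rounded up.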
Step 8: State the conclusion The result for a left-sided test has to be worked through but is similar. You must remember to keep all entries positive, that is, use $|z_\alpha|$, $|z_\beta|$, and $|E_1 - E_0|$. This is reasonable if both α and β are constrained to be less than or equal to 0.5. The restriction is not a hardship in practice.
Univariate Linear Regression Note that the $\sigma_0$ factor is changed to $\sigma_0/\sigma_X$. There is a similar adjustment for the alternative standard deviation.
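With this adjustment the sample size requirement can be written as below. This is a sketch, assuming $\sigma_X^2$ is the average squared deviation of the $x_i$ and that the one-sample form of the requirement is $\sqrt{n} \ge (|z_\alpha|\sigma_0 + |z_\beta|\sigma_1)/|E_1 - E_0|$:

```latex
\sqrt{n} \;\ge\;
\frac{|z_\alpha|\,\dfrac{\sigma_0}{\sigma_X} + |z_\beta|\,\dfrac{\sigma_1}{\sigma_X}}{|E_1 - E_0|},
\qquad
\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 .
```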
Example Problem Group Two hundred values of an independent variable $x_i$ are chosen so that $\sum (x_i - \bar{x})^2$ is equal to 400,000. For each setting of $x_i$, the random variable $Y_i = \beta_0 + \beta_1 x_i + \sigma Z_i$ is observed. Here $\beta_0$ and $\beta_1$ are fixed but unknown parameters, $\sigma = 400$, and the $Z_i$ are independent standard normal random variables.
Example Problem Group The null hypothesis to be tested is $H_0: \beta_1 = 0$, with $\alpha = 0.01$, and the alternative is $H_1: \beta_1 < 0$. The random variable $B_1$ is the OLS estimate of $\beta_1$.
Example Question 1 When $H_0$ is true, what is the standard deviation of $B_1$, the OLS estimate of the slope? $\mathrm{Var}(B_1) = \sigma^2 / \sum (x_i - \bar{x})^2 = 400^2/400{,}000 = 0.4$. $\mathrm{sd}(B_1) = 0.632$.
Example Question 2 What is the probability of a Type II error in the test specified in the common section using $B_1$, the OLS estimator of the slope, as test statistic when $\beta_1 = -4$, $\alpha = 0.01$, $\sigma = 400$, and $\sum (x_i - \bar{x})^2$ is equal to 400,000?
Solution to Question 2 The critical value is $0 - 2.326(0.632) = -1.47$. A Type II error occurs when $B_1 > -1.47$. Under the alternative, $B_1$ is normal with expected value $-4$ and standard deviation (error) 0.632. $\Pr\{B_1 > -1.47\} = \Pr\{Z > (-1.47-(-4))/0.632\} = \Pr\{Z > 4.00\} \approx 0.00003$. The answer is approximately 0.00003.
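The arithmetic for Question 2 can be checked numerically. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the inputs are the values stated in the example problem group.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example problem group.
sigma = 400.0        # standard deviation of each observation
sxx = 400_000.0      # sum of squared deviations of the x values
alpha = 0.01         # significance level of the left-sided test
beta1_alt = -4.0     # slope under the alternative

sd_b1 = sigma / sqrt(sxx)                       # standard deviation of B1
z_alpha = NormalDist().inv_cdf(1 - alpha)       # |z_0.01|
critical = 0.0 - z_alpha * sd_b1                # left-sided critical value

# Type II error: B1 lands above the critical value although beta1 = -4.
type2 = 1 - NormalDist(mu=beta1_alt, sigma=sd_b1).cdf(critical)

print(round(sd_b1, 3), round(critical, 2))
print(f"{type2:.1e}")
```

The result is on the order of a few times $10^{-5}$, so a Type II error is very unlikely at this alternative.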
Example Question 3 How many observations n are necessary so that the probability of a Type II error in the test specified in the common section is 0.01 when $\beta_1 = -4$, $\alpha = 0.01$, $\sigma = 400$, and $\sum (x_i - \bar{x}_n)^2$ is equal to 2,000n?
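Outline of Solution to Problem 3 For the $\sigma_0$ term, use $(400^2/2000)^{0.5} = 8.94$. Use the same value for the $\sigma_1$ term. Use $|z_{0.01}| = 2.326$. Use $|E_1 - E_0| = |-4 - 0| = 4$. The square root of the sample size is $(2.326)(8.94)(2)/4 = 10.40$, so n must be at least $10.40^2 \approx 108.2$. Sample size is 109 or more.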
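The outline can be reproduced in a few lines. This is a sketch under one loudly stated assumption: the target Type II error probability is taken to be 0.01, a value that reproduces the answer of 109. The helper `required_n` is a hypothetical name, not something from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_n(alpha, beta, sigma0, sigma1, sigma_x, e0, e1):
    """Smallest n with sqrt(n) >= (|z_a| s0/sx + |z_b| s1/sx) / |e1 - e0|."""
    std_normal = NormalDist()
    z_alpha = abs(std_normal.inv_cdf(alpha))
    z_beta = abs(std_normal.inv_cdf(beta))
    root_n = (z_alpha * sigma0 / sigma_x + z_beta * sigma1 / sigma_x) / abs(e1 - e0)
    return root_n, ceil(root_n ** 2)


# sigma_x^2 = 2000 because the sum of squares is 2,000n.
# beta = 0.01 is an ASSUMED target, chosen because it reproduces n = 109.
root_n, n = required_n(alpha=0.01, beta=0.01, sigma0=400.0, sigma1=400.0,
                       sigma_x=sqrt(2000.0), e0=0.0, e1=-4.0)
print(round(root_n, 2), n)
```

Taking absolute values of the quantiles and of the difference in expected values lets the same function serve left-sided and right-sided tests.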
Chapter 21: Residual Analysis If the assumptions in regression are violated:
–Residuals are one way of checking the model: $R_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ is the fitted value at $x_i$.
Checking the Assumptions
–Check for normality (test of normality, histogram, q-q plots).
–Check whether the variance is the same for all values of the independent variable (plot residuals against predicted values).
–Check independence (plot residuals against a sequence variable).
–Check for linearity (plot the dependent variable against the independent variable).
Residual Plots Plot residuals against the independent variable.
–The plot should be flat, indicating the same variance.
–There should be no fanning-out pattern.
–Check for influential observations.
Plot residuals against the predicted variable.
–For univariate regression this shows the same pattern as the plot above, because the predicted value is a linear function of the independent variable. There should be no pattern.
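Residuals for such plots can be computed directly from the fitted line. This is a minimal sketch in plain Python; the data values and the helper name `ols_fit` are hypothetical and serve only to illustrate the calculation.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Pairs to plot: residual against x, and residual against fitted value.
for xi, fi, ri in zip(x, fitted, residuals):
    print(f"x={xi:4.1f}  fitted={fi:6.2f}  residual={ri:+.3f}")
```

OLS residuals always sum to zero and are uncorrelated with x, so what the plots reveal is pattern, such as curvature or fanning out, rather than an overall tilt.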
What to do if there is a problem? Look for transformations of the independent variable, the dependent variable, or both. With a computer this is easy: use the compute option from the menu bar.
Influential Points An easier way to look for points that have a large impact on the slope is to plot the change in slope against an arbitrary case sequence number.
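One way to obtain the change in slope for each case is to refit the line with that case left out. This is a sketch of that idea, not necessarily the statistic the course software reports; the data and the names `slope` and `slope_changes` are hypothetical.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_changes(x, y):
    """Full-data slope minus the slope with case i deleted, for each i."""
    full = slope(x, y)
    return [full - slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
            for i in range(len(x))]


# Made-up data: the last case lies far from the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 40.0]

changes = slope_changes(x, y)
for case, change in enumerate(changes, start=1):
    print(case, round(change, 3))  # plot change against case number

most_influential = max(range(len(x)), key=lambda i: abs(changes[i]))
print("largest change at case", most_influential + 1)
```

Plotting these changes against the case number makes a case with a large effect on the slope stand out as a spike.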
Example Data set on the web page.
–Aim: predict final exam score from midterm score.
–Dependent variable: final exam score.
–Independent variable: midterm score.
–Steps: model, check assumptions, predict.
Output Model: $Y = \beta_0 + \beta_1 X + \varepsilon$
–$R^2$ =
–F statistic = 60.91, significance = 0.0
–$\beta_1$ = , t statistic = 7.805, significance = 0.0
–$\beta_0$ = 238.95, t statistic = 8.329, significance = 0.0
Next Class Multiple Regression! Check web site for your data file