Topic 6: Estimation and Prediction of $Y_h$

Outline
Estimation and inference of $E(Y_h)$
Prediction of a new observation
Construction of a confidence band for the entire regression line

Estimation of $E(Y_h)$
$E(Y_h) = \mu_h = \beta_0 + \beta_1 X_h$ is the mean value of $Y$ for the subpopulation with $X = X_h$.
We will estimate $E(Y_h)$ by $\hat{\mu}_h = b_0 + b_1 X_h$.
KNNL use $\hat{Y}_h$ for this estimate; see equation (2.28) on p. 52.

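Theory for Estimation of $E(Y_h)$
$\hat{\mu}_h$ is Normal with mean $\mu_h$ and variance $\sigma^2(\hat{\mu}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$.
The Normality is a consequence of the fact that $b_0 + b_1 X_h$ is a linear combination of the $Y_i$'s.
See KNNL pp. 52-54 for details.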
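The variance of the estimated mean can be recovered in a few lines. The sketch below is not taken from the slides; it assumes the standard simple linear regression facts that $b_0 = \bar{Y} - b_1 \bar{X}$, that $\bar{Y}$ and $b_1$ are uncorrelated, and that $\mathrm{Var}(b_1) = \sigma^2 / \sum (X_i - \bar{X})^2$. KNNL pp. 52-54 give the full argument.

```latex
\begin{align*}
\hat{\mu}_h &= b_0 + b_1 X_h = \bar{Y} + b_1 (X_h - \bar{X}) \\
\sigma^2(\hat{\mu}_h) &= \mathrm{Var}(\bar{Y}) + (X_h - \bar{X})^2 \, \mathrm{Var}(b_1) \\
  &= \frac{\sigma^2}{n} + \frac{(X_h - \bar{X})^2 \, \sigma^2}{\sum (X_i - \bar{X})^2} \\
  &= \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]
\end{align*}
```

The first term is the uncertainty in the overall level of the line; the second is the uncertainty in the slope, scaled by the squared distance of $X_h$ from $\bar{X}$.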

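Application of the Theory
We estimate $\sigma^2(\hat{\mu}_h)$ by $s^2(\hat{\mu}_h) = s^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$, where $s^2$ is the MSE.
It then follows that $\frac{\hat{\mu}_h - E(Y_h)}{s(\hat{\mu}_h)} \sim t(n-2)$.
Confidence intervals and significance tests follow from this result.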

95% Confidence Interval for $E(Y_h)$
$\hat{\mu}_h \pm t_c \, s(\hat{\mu}_h)$, where $t_c = t(.975, n-2)$.
NOTE: significance tests can be constructed, but they are rarely used in practice.

Toluca Company Example (pg 19)
The company manufactures refrigeration equipment.
One replacement part is manufactured in lots of varying sizes.
The company wants to determine the optimum lot size.
To do this, the company first needs to describe the relationship between work hours and lot size.

Scatterplot w/ regr line

SAS CODE
***Generating the data set***;
data toluca;
  infile '../data/CH01TA01.txt';
  input lotsize hours;
* Two extra lot sizes with missing hours, so that proc reg reports estimates at X=65 and X=100;
data other;
  lotsize=65; output;
  lotsize=100; output;
data toluca1;
  set toluca other;
proc print data=toluca1;
run;

SAS CODE
***Generating the confidence intervals for all values of X in the data set***;
proc reg data=toluca1;
  model hours=lotsize/clm;
  id lotsize;
run;
The clm option generates confidence intervals for the mean.

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             62.36586         26.17743      2.38     0.0259
lotsize      1              3.57020          0.34697     10.29     <.0001

Output Statistics
Obs   lotsize   Dependent Variable   Predicted Value   Std Error Mean Predict   95% CL Mean
  1        80             399.0000          347.9820                  10.3628   326.5449   369.4191
 25        70             323.0000          312.2800                   9.7647   292.0803   332.4797
 26        65                    .          294.4290                   9.9176   273.9129   314.9451
 27       100                    .          419.3861                  14.2723   389.8615   448.9106
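As a check on the output, the confidence limits for the row with lotsize 65 can be reproduced by hand. The predicted value and standard error are read from the output; the critical value $t(.975; 23) \approx 2.0687$ comes from a t table and is not shown in the output.

```latex
\begin{align*}
\hat{\mu}_h &= 62.36586 + 3.57020 \, (65) = 294.4289 \\
\hat{\mu}_h \pm t_c \, s(\hat{\mu}_h) &= 294.4290 \pm 2.0687 \, (9.9176) \\
  &= 294.4290 \pm 20.52 \\
  &\Rightarrow \; (273.91, \; 314.95)
\end{align*}
```

This agrees with the 95% CL Mean columns (273.9129, 314.9451) up to rounding of the critical value.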

Notes
The standard error is affected by how far $X_h$ is from $\bar{X}$ (see Figure 2.6).
Recall the teeter-totter idea: a change in the slope has a bigger impact on $Y$ as you move away from $\bar{X}$.
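The effect of distance from $\bar{X}$ can be seen numerically in the Toluca data. The sketch below uses $n = 25$ together with $\bar{X} = 70$, $\sum (X_i - \bar{X})^2 = 19800$ and $\mathrm{MSE} = 2383.716$; the last three values are taken from the standard Toluca fit reported in KNNL and do not appear in the output shown here.

```latex
\begin{align*}
s^2(\hat{\mu}_h) &= \mathrm{MSE} \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right] \\[4pt]
X_h = 70:  \quad s(\hat{\mu}_h) &= \sqrt{2383.716 \left[ \tfrac{1}{25} + \tfrac{0^2}{19800} \right]}  \approx 9.7647 \\
X_h = 65:  \quad s(\hat{\mu}_h) &= \sqrt{2383.716 \left[ \tfrac{1}{25} + \tfrac{5^2}{19800} \right]}  \approx 9.9176 \\
X_h = 100: \quad s(\hat{\mu}_h) &= \sqrt{2383.716 \left[ \tfrac{1}{25} + \tfrac{30^2}{19800} \right]} \approx 14.2723
\end{align*}
```

These match the Std Error Mean Predict column: the standard error is smallest at $X_h = \bar{X}$ and grows as $X_h$ moves away from it.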

Prediction of $Y_{h(new)}$
We want to predict the value of a new observation at $X = X_h$.
Model: $Y_{h(new)} = \beta_0 + \beta_1 X_h + \varepsilon$
Since $E(\varepsilon) = 0$, the predicted value is the same as the estimate of $E(Y_h)$.
Note: the prediction interval, however, relies heavily on the assumption that the $\varepsilon$ are Normally distributed.

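Prediction of $Y_{h(new)}$
$\mathrm{Var}(Y_{h(new)} - \hat{\mu}_h) = \mathrm{Var}(\hat{\mu}_h) + \mathrm{Var}(\varepsilon)$
It then follows that $s^2(\text{pred}) = s^2 \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$ and $\frac{Y_{h(new)} - \hat{\mu}_h}{s(\text{pred})} \sim t(n-2)$.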

Notes
The procedure can be modified for the mean of m observations at $X = X_h$ (see 2.39a and 2.39b on page 60).
The standard error is affected by how far $X_h$ is from $\bar{X}$ (see Figure 2.6).
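For reference, the modification for the mean of $m$ new observations replaces the single-observation variance $s^2$ by $s^2/m$. The worked numbers below are an illustration rather than part of the slides: they assume $m = 3$ new lots at $X_h = 100$, a 90% interval with $t(.95; 23) = 1.71387$, and $\mathrm{MSE} = 2383.716$ from the standard Toluca fit in KNNL.

```latex
\begin{align*}
s^2(\text{predmean}) &= \frac{\mathrm{MSE}}{m} + s^2(\hat{\mu}_h)
  = \frac{2383.716}{3} + (14.2723)^2 \approx 794.57 + 203.70 = 998.27 \\
s(\text{predmean}) &\approx 31.60 \\
\hat{\mu}_h \pm t_c \, s(\text{predmean}) &= 419.3861 \pm 1.71387 \, (31.60)
  \;\Rightarrow\; (365.2, \; 473.5)
\end{align*}
```

With $m = 1$ the formula reduces to $s^2(\text{pred})$ for a single new observation.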

SAS CODE
***Generating the prediction intervals for all values of X in data set***;
proc reg data=toluca1;
  model hours=lotsize/cli;
  id lotsize;
run;
The cli option generates the prediction interval for a new observation.

Output Statistics
Obs   lotsize   Dependent Variable   Predicted Value   Std Error Mean Predict   95% CL Predict
  1        80             399.0000          347.9820                  10.3628   244.7333   451.2307
 25        70             323.0000          312.2800                   9.7647   209.2811   415.2789
 26        65                    .          294.4290                   9.9176   191.3676   397.4904
 27       100                    .          419.3861                  14.2723   314.1604   524.6117

The Std Error Mean Predict values are the wrong standard errors for prediction: they are the same as before and do not include the variability about the regression line.

Notes
The standard error (Std Error Mean Predict) given in this output is the standard error of $\hat{\mu}_h$, not s(pred).
The prediction interval is correct and wider than the previous confidence interval.

Notes
To get the correct standard error, add the variance about the regression line: $s^2(\text{pred}) = s^2 + s^2(\hat{\mu}_h)$, where $s^2$ is the MSE.
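A worked check for the row with lotsize 65 shows how the correct standard error reproduces the prediction limits. The value $\mathrm{MSE} = 2383.716$ is taken from the standard Toluca fit in KNNL and $t(.975; 23) \approx 2.0687$ from a t table; neither appears in the output shown here.

```latex
\begin{align*}
s^2(\text{pred}) &= \mathrm{MSE} + s^2(\hat{\mu}_h) = 2383.716 + (9.9176)^2 \approx 2482.07 \\
s(\text{pred}) &\approx 49.82 \\
\hat{\mu}_h \pm t_c \, s(\text{pred}) &= 294.4290 \pm 2.0687 \, (49.82)
  \;\Rightarrow\; (191.4, \; 397.5)
\end{align*}
```

This agrees with the 95% CL Predict columns (191.3676, 397.4904), whereas using 9.9176 alone would only reproduce the narrower confidence interval for the mean.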

Confidence band for regression line
$\hat{\mu}_h \pm W \, s(\hat{\mu}_h)$, where $W^2 = 2F(1-\alpha; 2, n-2)$.
This gives combined "confidence intervals" for all $X_h$.
The boundary values of the confidence band define a hyperbola.
The band will be wider at each $X_h$ than the single CI.

Confidence band for regression line
The theory comes from the joint confidence region for $(\beta_0, \beta_1)$, which is an ellipse (Stat 524).
We can find an alpha for $t_c$ that gives the same results: we find $W^2$ and then find the alpha for which $t_c = W$.

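SAS CODE
data a1;
  n=25; alpha=.10; dfn=2; dfd=n-2;
  * t multiplier for a single 90% CI;
  tsingle=tinv(1-alpha/2,dfd);
  * W multiplier for the 90% confidence band;
  w2=2*finv(1-alpha,dfn,dfd);
  w=sqrt(w2);
  * alpha for which the t multiplier equals W;
  alphat=2*(1-probt(w,dfd));
  t_c=tinv(1-alphat/2,dfd);
  output;
proc print data=a1;
run;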

SAS OUTPUT
 n   alpha   dfn   dfd   tsingle        w2         w     alphat       t_c
25     0.1     2    23   1.71387   5.09858   2.25800   0.033740   2.25800

tsingle is used for a single 90% CI; w (equivalently t_c) is used for the 90% confidence band.
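To see what the two multipliers mean in practice, the comparison below applies them at $X_h = 100$, using the predicted value 419.3861 and standard error 14.2723 from the earlier regression output. The choice of $X_h = 100$ is only an illustration.

```latex
\begin{align*}
\text{Single 90\% CI:} \quad & 419.3861 \pm 1.71387 \, (14.2723) = 419.3861 \pm 24.46
  \;\Rightarrow\; (394.9, \; 443.8) \\
\text{90\% confidence band:} \quad & 419.3861 \pm 2.25800 \, (14.2723) = 419.3861 \pm 32.23
  \;\Rightarrow\; (387.2, \; 451.6)
\end{align*}
```

The band is wider because it holds simultaneously for every $X_h$, not just for the one value being examined.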

SAS CODE
* 97 approximates 100*(1-alphat) = 96.6, so these limits trace the 90% confidence band;
symbol1 v=circle i=rlclm97;
proc gplot data=toluca;
  plot hours*lotsize;
run;

Estimation of $E(Y_h)$ and Prediction of $Y_h$
$\hat{Y}_h = \hat{\mu}_h = b_0 + b_1 X_h$

SAS CODE
* Confidence intervals;
symbol1 v=circle i=rlclm95;
proc gplot data=toluca;
  plot hours*lotsize;
run;
* Prediction intervals;
symbol1 v=circle i=rlcli95;
proc gplot data=toluca;
  plot hours*lotsize;
run;

Confidence band

Confidence intervals

Prediction intervals

Background Reading Program topic6.sas has the code for the various plots and calculations; Sections 2.7, 2.8, and 2.9
