732G21/732A35/732G28
Formal statement
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
◦ $Y_i$ is the $i$th response value
◦ $\beta_0$, $\beta_1$ are the model parameters (regression parameters: intercept, slope)
◦ $X_i$ is the $i$th predictor value
◦ $\varepsilon_i$ are i.i.d. normally distributed random variables with expectation zero and variance $\sigma^2$
Inference about regression coefficients and response:
◦ Interval estimates and tests concerning coefficients
◦ Confidence interval for Y
◦ Prediction interval for Y
◦ ANOVA table
After fitting the data, we obtain a regression line. Is the fitted slope significant, or does it arise only from random variation (hence, no linear dependence between Y and X)?
How to do?
◦ Use hypothesis testing (later)
◦ Derive a confidence interval for $\beta_1$. If 0 does not fall within this interval, there is dependence
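Estimated slope $b_1$ is a random variable (look at the formula): $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$
Properties of $b_1$
◦ Normally distributed (show)
◦ $E(b_1) = \beta_1$
◦ Variance: $\sigma^2\{b_1\} = \sigma^2 / \sum (X_i - \bar{X})^2$
Further: the test statistic $(b_1 - \beta_1) / s\{b_1\}$ is distributed as $t(n-2)$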
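The quantities on this slide can be computed directly. The following is a minimal sketch, assuming a small made-up dataset (it is not the Salary dataset) and assuming that the unknown $\sigma^2$ is replaced by $MSE = SSE/(n-2)$ to obtain the estimated standard error $s\{b_1\}$. All variable names are illustrative.

```python
import math

# Hypothetical toy data (not the Salary dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # sum (X_i - X_bar)^2
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # cross products

b1 = sxy / sxx              # estimated slope
b0 = y_bar - b1 * x_bar     # estimated intercept

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)         # estimate of sigma^2
s_b1 = math.sqrt(mse / sxx) # estimated standard error of b1

print(b0, b1, mse, s_b1)
```

Because $b_1$ is a weighted sum of the $Y_i$, a different sample would give a different slope; $s\{b_1\}$ measures that sample-to-sample variation.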
See Table B.2 (p. 1317)
Example: one-sided interval, $t(95\%)$, 15 observations: $t_{13} = 1.771$
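Confidence interval for $\beta_1$ (show…): $b_1 \pm t(1 - \alpha/2;\, n-2)\, s\{b_1\}$
If the variance in the data is unknown, estimate it with MSE: $s^2\{b_1\} = MSE / \sum (X_i - \bar{X})^2$
Example: compute a confidence interval for the slope, Salary dataset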
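As an illustration of the interval $b_1 \pm t\, s\{b_1\}$, here is a minimal sketch on made-up data (not the Salary dataset). The function name `slope_confidence_interval` is hypothetical, and the critical value is passed in as read from a t table rather than computed.

```python
import math

def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) for beta1, given t_crit = t(1 - alpha/2; n - 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s_b1 = math.sqrt(mse / sxx)  # estimated standard error of the slope
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1

# Hypothetical toy data; 3.182 is t(0.975; 3) from a t table (95% interval, n = 5).
lower, upper = slope_confidence_interval(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0], t_crit=3.182
)
print(lower, upper)
```

For these toy numbers the interval contains 0, so the data do not establish a linear dependence at the 5% level.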
Often, we have a sample and we test a hypothesis at some significance level $\alpha$
How to do?
◦ Step 1: Find and compute an appropriate test function $T = T(\text{sample}, \lambda_0)$
◦ Step 2: Plot the test function's distribution and mark a critical area that depends on $\alpha$
◦ If $T$ is in the critical area, reject $H_0$ (accept $H_1$); otherwise do not reject $H_0$
Test $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$
◦ Step 1: compute $t^* = b_1 / s\{b_1\}$
◦ Step 2: plot the distribution, mark the points $\pm t(1 - \alpha/2;\, n-2)$ and the critical area
◦ Step 3: locate $t^*$ and reject $H_0$ if it is in the critical area
Example: test the hypothesis for the Salary dataset
◦ Manually; compute also P-values
◦ By Minitab
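The three steps can be sketched in code. This is an illustration on made-up data (not the Salary dataset); the helper names `t_pdf` and `two_sided_p_value` are hypothetical. Since the Python standard library has no t distribution, the P-value is approximated here by integrating the t density with Simpson's rule, which is one possible approach and not necessarily what Minitab does.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_star, df, steps=2000):
    """P(|T| > |t*|), using Simpson's rule on [0, |t*|] (steps must be even)."""
    upper = abs(t_star)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1.0 - 2.0 * (h / 3.0) * total

# Hypothetical toy data (not the Salary dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

t_star = b1 / math.sqrt(mse / sxx)   # Step 1: test statistic for H0: beta1 = 0
t_crit = 3.182                       # Step 2: t(0.975; 3), read from a t table
reject_h0 = abs(t_star) > t_crit     # Step 3: is t* in the critical area?
p_value = two_sided_p_value(t_star, n - 2)
print(t_star, reject_h0, p_value)
```

The decision based on the critical area and the decision based on comparing the P-value with $\alpha$ always agree.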
Sometimes we need to know whether $\beta_0 = 0$. Do confidence intervals and hypothesis testing in the same way, using the formulas below.
Properties of $b_0$
◦ Normally distributed (show)
◦ $E(b_0) = \beta_0$
◦ Variance (show…): $\sigma^2\{b_0\} = \sigma^2 [1/n + \bar{X}^2 / \sum (X_i - \bar{X})^2]$
Further: the test statistic $(b_0 - \beta_0) / s\{b_0\}$ is distributed as $t(n-2)$
If the distribution is not normal: a slight departure is OK; otherwise the results hold only asymptotically
Spacing of the $X$ values affects the variance (larger spacing, smaller variance)
Example: test $\beta_0 = 0$ for the Salary data
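Estimate of the mean response at $X = X_h$ ($X_h$ any value): $\hat{Y}_h = b_0 + b_1 X_h$
Properties of $\hat{Y}_h$, the estimate of $E(Y_h)$
◦ Normally distributed (show)
◦ Variance: $\sigma^2\{\hat{Y}_h\} = \sigma^2 [1/n + (X_h - \bar{X})^2 / \sum (X_i - \bar{X})^2]$
Further: the test statistic $(\hat{Y}_h - E(Y_h)) / s\{\hat{Y}_h\}$ is distributed as $t(n-2)$
Confidence interval: $\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{\hat{Y}_h\}$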
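The following minimal sketch shows how the confidence interval for the mean response changes with $X_h$. It assumes made-up data (not the Salary dataset), a hypothetical helper `mean_response_interval`, and a critical value taken from a t table.

```python
import math

# Hypothetical toy data (not the Salary dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
t_crit = 3.182  # t(0.975; 3), read from a t table

def mean_response_interval(x_h):
    """Point estimate and confidence limits for E(Y_h) at X = x_h."""
    y_hat = b0 + b1 * x_h
    s_yhat = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / sxx))
    return y_hat, y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat

for x_h in (3.0, 4.0, 5.0):
    y_hat, lower, upper = mean_response_interval(x_h)
    print(x_h, round(y_hat, 3), round(lower, 3), round(upper, 3))
```

The interval is narrowest at $X_h = \bar{X}$ and widens as $X_h$ moves away from the mean, because of the $(X_h - \bar{X})^2$ term in the variance.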
Make a plot…
POINT ESTIMATE
CONFIDENCE INTERVAL: we estimate the position of the mean in the population with $X = X_h$
PREDICTION INTERVAL: we estimate the position of an individual observation in the population with $X = X_h$
When the parameters are unknown, the mean $E(Y_h)$ may have more than one possible location
New observation = mean + random error -> the prediction interval should be wider
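Further: the test statistic $(Y_{h(new)} - \hat{Y}_h) / s\{\text{pred}\}$ is distributed as $t(n-2)$
Prediction interval: $\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{\text{pred}\}$
How to estimate $s\{\text{pred}\}$? A new observation is any value within $b_0 + b_1 X_h + \varepsilon$. Hence $\sigma^2\{\text{pred}\} = \sigma^2 + \sigma^2\{\hat{Y}_h\}$
Standard error (show): $s^2\{\text{pred}\} = MSE\, [1 + 1/n + (X_h - \bar{X})^2 / \sum (X_i - \bar{X})^2]$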
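To see why the prediction interval is wider, the sketch below computes both standard errors at the same $X_h$. The data are made up (not the Salary dataset), the value `x_h = 4.0` is an arbitrary illustrative choice, and the critical value comes from a t table.

```python
import math

# Hypothetical toy data (not the Salary dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x_h = 4.0       # arbitrary point at which to predict
t_crit = 3.182  # t(0.975; 3), read from a t table

y_hat = b0 + b1 * x_h
s2_mean = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)  # variance of the estimated mean
s2_pred = mse + s2_mean                             # adds the new observation's error

ci = (y_hat - t_crit * math.sqrt(s2_mean), y_hat + t_crit * math.sqrt(s2_mean))
pi = (y_hat - t_crit * math.sqrt(s2_pred), y_hat + t_crit * math.sqrt(s2_pred))
print(ci, pi)
```

The extra `mse` term does not shrink as $n$ grows, so the prediction interval stays wide even when the mean is estimated very precisely.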
Example
◦ Calculate confidence and prediction intervals for a 35-year-old person
◦ Compare with the output in Minitab
Total sum of squares: $SSTO = \sum (Y_i - \bar{Y})^2$
Error sum of squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$
Regression sum of squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
Degrees of freedom
◦ SSTO has $n-1$ (the deviations $Y_i - \bar{Y}$ sum to zero)
◦ SSE has $n-2$ (2 model parameters)
◦ SSR has 1 (fitted values lie on the regression line = 2 degrees; the deviations sum to zero, leaving 1 degree)
◦ $n - 1 = (n - 2) + 1$
◦ $SSTO = SSE + SSR$
Important: MSxx = SSxx / degrees_of_freedom
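The decomposition and the mean squares can be checked numerically. A minimal sketch, assuming made-up data (not the Salary dataset):

```python
# Hypothetical toy data (not the Salary dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssto = sum((yi - y_bar) ** 2 for yi in y)               # total, n - 1 df
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error, n - 2 df
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression, 1 df

msr = ssr / 1        # mean square = sum of squares / degrees of freedom
mse = sse / (n - 2)

print(ssto, sse, ssr, msr, mse)
```

Up to floating-point rounding, `ssto` equals `sse + ssr`, mirroring the split of the degrees of freedom.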
ANOVA table
Source of variation | SS   | df    | MS
Regression          | SSR  | 1     | MSR = SSR / 1
Error               | SSE  | n - 2 | MSE = SSE / (n - 2)
Total               | SSTO | n - 1 |
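Expected mean squares: $E(MSE) = \sigma^2$, $E(MSR) = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
◦ $E(MSE)$ does not depend on the slope, even when the slope is zero
◦ $E(MSR) = E(MSE)$ when the slope is zero
-> If MSR is much larger than MSE, the slope is not zero; if they are approximately the same, the slope can be zero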
Test statistic $F^* = MSR / MSE$; use $F(1, n-2)$ (see p. 1320)
Decision rules:
◦ If $F^* > F(1-\alpha;\, 1, n-2)$, conclude $H_a$
◦ If $F^* \le F(1-\alpha;\, 1, n-2)$, conclude $H_0$
Note: the F test and the two-sided t test about $\beta_1$ are equivalent
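The equivalence of the two tests can be seen numerically: $F^*$ equals $(t^*)^2$. A minimal sketch, assuming made-up data (not the Salary dataset) and a critical value read from an F table:

```python
import math

# Hypothetical toy data (not the Salary dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

msr = sum((fi - y_bar) ** 2 for fi in fitted) / 1
mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - 2)

f_star = msr / mse
t_star = b1 / math.sqrt(mse / sxx)

f_crit = 10.13                 # F(0.95; 1, 3), read from an F table
conclude_ha = f_star > f_crit  # decision rule from the slide

print(f_star, t_star ** 2, conclude_ha)
```

The critical values match in the same way: $F(0.95;\, 1, 3) \approx 10.13$ is the square of $t(0.975;\, 3) \approx 3.182$.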
General approach
◦ Full model (linear): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
◦ Reduced model (constant): $Y_i = \beta_0 + \varepsilon_i$
It is known (why?..) that $SSE(F) \le SSE(R)$. A large difference suggests different models; with a small difference they can be the same
Test statistic: $F^* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]$
For the univariate linear model, this is equivalent to $F^* = MSR / MSE$
$F^*$ follows the $F(df_R - df_F,\, df_F)$ distribution (plot critical area..)
Test rule: if $F^* > F(1-\alpha;\, df_R - df_F,\, df_F)$, reject $H_0$
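A minimal sketch of the general linear test, assuming made-up data (not the Salary dataset). The function name `general_linear_test` is hypothetical. The reduced model is fitted as a constant (the sample mean) and the full model as a straight line.

```python
def general_linear_test(sse_r, df_r, sse_f, df_f):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

# Hypothetical toy data (not the Salary dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Reduced model Y_i = beta0 + eps_i: the least-squares fit is the mean of Y.
sse_r = sum((yi - y_bar) ** 2 for yi in y)
df_r = n - 1

# Full model Y_i = beta0 + beta1 * X_i + eps_i.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse_f = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
df_f = n - 2

f_star = general_linear_test(sse_r, df_r, sse_f, df_f)
print(sse_r, sse_f, f_star)
```

Here $SSE(R) = SSTO$ and $SSE(R) - SSE(F) = SSR$, which is why the statistic reduces to $MSR / MSE$ for the simple linear model.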
Example: for the Salary dataset
◦ Compose the ANOVA table and compare with MINITAB
◦ Perform the F-test and compare with MINITAB
Coefficient of determination: $R^2 = SSR / SSTO = 1 - SSE / SSTO$
Coefficient of correlation: $r = \pm\sqrt{R^2}$ (with the sign of $b_1$)
Limitations:
◦ A high R does not mean a good fit
◦ A low R does not mean that X and Y are not related
Example: for the Salary dataset, compute $R^2$ and compare with MINITAB
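The second limitation can be illustrated with a small sketch. Both datasets below are made up (neither is the Salary dataset), and `r_squared` is a hypothetical helper. In the second dataset Y is an exact function of X, yet $R^2$ is zero because the relation is not linear.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line, 1 - SSE/SSTO."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / ssto

# Hypothetical roughly linear data.
r2_linear = r_squared([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])

# Hypothetical data with an exact but non-linear relation, Y = X^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
r2_quadratic = r_squared(xs, [v * v for v in xs])

print(r2_linear, r2_quadratic)
```

A plot of the data next to the fitted line is the simplest way to catch cases like the second one.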
Chapter 2 up to page …