1 ENGR 610 Applied Statistics, Fall 2007 - Week 11
Marshall University CITE
Jack Smith

2 Overview for Today
- Review: Simple Linear Regression, Ch 12
- Go over problem 12.56
- Multiple Linear Regression, Ch 13 (1-5)
  - Multiple explanatory variables
  - Coefficient of multiple determination
  - Adjusted R²
  - Residual analysis
  - F test
  - t test and confidence interval for slope
  - Partial F tests for individual contributions
  - Coefficients of partial determination
- Homework assignment

3 Regression Modeling
- Analysis of variance to "fit" a predictive model for a response (dependent) variable to a set of one or more explanatory (independent) variables
- Minimize residual error with respect to the linear coefficients
- Interpolative over the relevant range - do not extrapolate
- Typically linear, but may be curvilinear or more complex (with respect to the independent variables)
- Related to correlation analysis - measuring the strength of association between variables
  - Regression is about variance in the response variable
  - Correlation is about co-variance - symmetric

4 Types of Regression Models
- Based on scatter plots of Y vs X (dependent vs independent)
- Linear models
  - Positive, negative, or no slope
  - Zero or non-zero intercept
- Curvilinear models
  - Positive, negative, or no "slope"
  - Positive, negative, or varied curvature
  - May be U-shaped, with extrema
  - May be asymptotically or piece-wise linear
  - May be polynomial, exponential, inverse, ...

5 Least-Squares Linear Regression
- Simple linear model (for the population): Yᵢ = β₀ + β₁Xᵢ + εᵢ
  - Xᵢ = value of the independent variable
  - Yᵢ = observed value of the dependent variable
  - β₀ = Y-intercept (Y at X = 0)
  - β₁ = slope (ΔY/ΔX)
  - εᵢ = random error for observation i
- Predicted value: Yᵢ' = b₀ + b₁Xᵢ
  - b₀ and b₁ are called the regression coefficients
- Residual: eᵢ = Yᵢ - Yᵢ'
- Minimize Σeᵢ² for the sample with respect to b₀ and b₁
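For concreteness, here is a minimal Python sketch of this fit (not from the slides; the data values below are made up):

```python
import numpy as np

# Made-up sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  b0 = ybar - b1*xbar
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

y_pred = b0 + b1 * x       # predicted values Y'
e = y - y_pred             # residuals e_i = Y_i - Y_i'
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SSE = {np.sum(e**2):.4f}")
```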

6 Partitioning of Variation
- Total variation (about the mean response Ȳ): SST = Σ(Yᵢ - Ȳ)²
- Regression variation: SSR = Σ(Yᵢ' - Ȳ)²
- Random variation: SSE = Σ(Yᵢ - Yᵢ')²
- SST = SSR + SSE
- Coefficient of determination: r² = SSR/SST
- Standard error of the estimate: s_YX = √(SSE/(n-2))
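A short sketch of the variance partition on the same kind of made-up data (np.polyfit is used just to get the fitted line):

```python
import numpy as np

# Made-up data and a least-squares fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, 1)             # slope, intercept
y_pred = b0 + b1 * x

n = len(y)
sst = np.sum((y - y.mean()) ** 2)        # total variation
ssr = np.sum((y_pred - y.mean()) ** 2)   # explained by the regression
sse = np.sum((y - y_pred) ** 2)          # residual (random) variation
r2 = ssr / sst                           # coefficient of determination
s_yx = np.sqrt(sse / (n - 2))            # standard error of the estimate
print(f"SST={sst:.3f} SSR={ssr:.3f} SSE={sse:.3f} r2={r2:.4f} s_YX={s_yx:.4f}")
```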

7 Partitioning of Variation - Graphically

8 Assumptions of Regression (and Correlation)
- Normality of error about the regression line
- Homoscedasticity (equal variance) along X
- Independence of errors with respect to X
  - No autocorrelation in time
- Analysis of residuals to test the assumptions
  - Histograms, box-and-whisker plots
  - Normal probability plots
  - Ordered plots (by X, by time, ...)
- See figures on pp 584-5
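A small sketch (made-up data, assumed plotting choices) of two residual plots used to check these assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data and fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8, 14.2, 15.9])
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, residuals)           # look for patterns or unequal spread
ax1.axhline(0, color="gray")
ax1.set(xlabel="X", ylabel="residual", title="Residuals vs X")
ax2.hist(residuals, bins=5)         # rough check of normality
ax2.set(title="Histogram of residuals")
plt.tight_layout()
plt.show()
```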

9 t Test for Slope
- H₀: β₁ = 0
- t = b₁/s_b₁, where s_b₁ = s_YX/√(Σ(Xᵢ - X̄)²)
- Critical t value based on the chosen level of significance, α, and n-2 degrees of freedom
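A hedged sketch of the slope t test (made-up data; scipy.stats supplies the critical value and p-value):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])        # made-up data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8])
n = len(y)
b1, b0 = np.polyfit(x, y, 1)
s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
s_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))  # std. error of the slope

t_stat = b1 / s_b1                                  # test H0: beta1 = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)        # two-tailed, alpha = 0.05
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"t = {t_stat:.3f}, critical = ±{t_crit:.3f}, p = {p_val:.4g}")
```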

10 F Test for Single Regression
- F = MSR/MSE
- Reject H₀ if F > F_U(α, 1, n-2) [or p < α]
- Note: t²(α, n-2) = F_U(α, 1, n-2)

One-Way ANOVA Summary
Source      df    SS   MS (variance)     F        p-value
Regression  1     SSR  MSR = SSR         MSR/MSE
Error       n-2   SSE  MSE = SSE/(n-2)
Total       n-1   SST
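The same made-up data run through the single-regression F test:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])        # made-up data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8])
n = len(y)
b1, b0 = np.polyfit(x, y, 1)
y_pred = b0 + b1 * x

msr = np.sum((y_pred - y.mean()) ** 2) / 1          # regression df = 1
mse = np.sum((y - y_pred) ** 2) / (n - 2)

f_stat = msr / mse
f_crit = stats.f.ppf(1 - 0.05, dfn=1, dfd=n - 2)    # F_U(alpha, 1, n-2)
p_val = stats.f.sf(f_stat, dfn=1, dfd=n - 2)
print(f"F = {f_stat:.3f}, F_crit = {f_crit:.3f}, p = {p_val:.4g}")
```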

11 Confidence and Prediction Intervals
- Confidence interval estimate for the slope: b₁ ± t(α/2, n-2) s_b₁
- Confidence interval estimate for the mean response
- Prediction interval estimate for an individual response
- See Fig 12.16, p 592
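A sketch of all three interval estimates under the standard simple-regression formulas (the data and the chosen x0 are made up):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])        # made-up data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8])
n = len(y)
b1, b0 = np.polyfit(x, y, 1)
s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
ssx = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)

# Confidence interval for the slope: b1 +/- t * s_YX / sqrt(SSX)
half = t_crit * s_yx / np.sqrt(ssx)
print(f"slope CI: [{b1 - half:.3f}, {b1 + half:.3f}]")

# At a given x0: CI for the mean response, PI for an individual response
x0 = 3.5
y0 = b0 + b1 * x0
h = 1 / n + (x0 - x.mean()) ** 2 / ssx
ci = t_crit * s_yx * np.sqrt(h)        # mean response
pi = t_crit * s_yx * np.sqrt(1 + h)    # individual response (wider)
print(f"mean CI: [{y0 - ci:.3f}, {y0 + ci:.3f}]")
print(f"prediction PI: [{y0 - pi:.3f}, {y0 + pi:.3f}]")
```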

12 Pitfalls
- Not testing the assumptions of least-squares regression by analyzing residuals, looking for
  - Patterns
  - Outliers
  - Non-uniform distribution about the mean
  - See Figs 12.18-19, pp 597-8
- Not being aware of alternatives to least-squares regression when assumptions are violated
- Not knowing the subject matter being modeled

13 Computing by Hand
- Slope: b₁ = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / Σ(Xᵢ - X̄)²
- Y-intercept: b₀ = Ȳ - b₁X̄

14 Computing by Hand
- Measures of variation: SST = Σ(Yᵢ - Ȳ)², SSR = Σ(Yᵢ' - Ȳ)², SSE = Σ(Yᵢ - Yᵢ')²

15 Coefficient of Correlation
- For a regression: r = ±√r² (taking the sign of b₁)
- For a correlation: r = cov(X,Y)/(s_X s_Y)
  - Covariance: cov(X,Y) = Σ(Xᵢ - X̄)(Yᵢ - Ȳ)/(n-1)
- Also called Pearson's product-moment correlation coefficient
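A quick cross-check, on made-up data, that the covariance definition matches numpy's built-in Pearson coefficient:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])        # made-up data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8])
n = len(x)

cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))        # Pearson's r

assert np.isclose(r, np.corrcoef(x, y)[0, 1])       # agrees with numpy
print(f"r = {r:.4f}")
```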

16 t Test for Correlation
- H₀: ρ = 0
- t = r√(n-2)/√(1-r²)
- Critical t value based on the chosen level of significance, α, and n-2 degrees of freedom
- Compared to F_U(α, 1, n-2) = t²(α, n-2)
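The correlation t test in a few lines (made-up data):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])        # made-up data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8])
n = len(x)
r = np.corrcoef(x, y)[0, 1]

# Test H0: rho = 0 with t = r * sqrt(n-2) / sqrt(1 - r^2)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"r = {r:.4f}, t = {t_stat:.3f}, p = {p_val:.4g}")
```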

17 Multiple Regression
- Linear model with multiple explanatory (independent) variables: Yᵢ = β₀ + β₁X₁ᵢ + … + βⱼXⱼᵢ + εᵢ
  - Xⱼᵢ = value of independent variable j for observation i
  - Yᵢ = observed value of the dependent variable
  - β₀ = Y-intercept (Y when all Xⱼ = 0)
  - βⱼ = slope with respect to Xⱼ (ΔY/ΔXⱼ)
  - εᵢ = random error for observation i
- Predicted value: Yᵢ' = b₀ + b₁X₁ᵢ + … + bⱼXⱼᵢ
  - The bⱼ's are called the regression coefficients
- Residual: eᵢ = Yᵢ - Yᵢ'
- Minimize Σeᵢ² for the sample with respect to all bⱼ
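A minimal multiple-regression fit via numpy's least-squares solver (two made-up explanatory variables, illustrative only):

```python
import numpy as np

# Made-up data: two explanatory variables, one response
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.2, 4.1, 8.0, 8.9, 13.1, 13.8])

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes sum of squared residuals
y_pred = X @ b
print(f"b0={b[0]:.3f}, b1={b[1]:.3f}, b2={b[2]:.3f}")
```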

18 Partitioning of Variation
- Total variation (about the mean response Ȳ): SST = Σ(Yᵢ - Ȳ)²
- Regression variation: SSR = Σ(Yᵢ' - Ȳ)²
- Random variation: SSE = Σ(Yᵢ - Yᵢ')²
- SST = SSR + SSE
- Coefficient of multiple determination: R²_Y.12…k = SSR/SST
- Standard error of the estimate: s_YX = √(SSE/(n-k-1))

19 Adjusted R²
- Accounts for sample size (n) and the number of independent variables (k), for model-comparison purposes:
- R²_adj = 1 - (1 - R²)(n-1)/(n-k-1)
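The adjustment as a one-line helper (the example numbers are hypothetical):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: R^2 = 0.90 from n = 20 observations and k = 3 predictors
print(f"{adjusted_r2(0.90, n=20, k=3):.4f}")   # smaller than 0.90, as expected
```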

20 Residual Analysis
- Plot residuals vs
  - Yᵢ' (predicted values)
  - X₁, X₂, …, Xₖ
  - Time (for autocorrelation)
- Check for
  - Patterns
  - Outliers
  - Non-uniform distribution about the mean
- See Figs 12.18-19, pp 597-8

21 F Test for Multiple Regression
- F = MSR/MSE
- Reject H₀ if F > F_U(α, k, n-k-1) [or p < α]
- k = number of independent variables

One-Way ANOVA Summary
Source      df      SS   MS (variance)       F        p-value
Regression  k       SSR  MSR = SSR/k         MSR/MSE
Error       n-k-1   SSE  MSE = SSE/(n-k-1)
Total       n-1     SST
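The overall F test for the two-variable made-up example shown earlier:

```python
import numpy as np
from scipy import stats

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])       # made-up data
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.2, 4.1, 8.0, 8.9, 13.1, 13.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_pred = X @ b

n, k = len(y), 2
msr = np.sum((y_pred - y.mean()) ** 2) / k
mse = np.sum((y - y_pred) ** 2) / (n - k - 1)
f_stat = msr / mse
p_val = stats.f.sf(f_stat, dfn=k, dfd=n - k - 1)
print(f"F = {f_stat:.3f}, p = {p_val:.4g}")
```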

22 Alternate F Test
- F = (R²/k) / ((1 - R²)/(n-k-1))
- Compared to F_U(α, k, n-k-1)
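The same statistic computed from R² alone (values are hypothetical):

```python
# Alternate form: F directly from R^2 (hypothetical values)
r2, n, k = 0.95, 6, 2
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(f"F = {f_stat:.3f}")
```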

23 t Test for Slope
- H₀: βⱼ = 0
- t = bⱼ/s_bⱼ
- Critical t value based on the chosen level of significance, α, and n-k-1 degrees of freedom
- See output from PHStat
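A sketch of the per-coefficient t tests, taking standard errors from the diagonal of MSE·(XᵀX)⁻¹; this mirrors what a package such as PHStat reports, but the data is made up:

```python
import numpy as np
from scipy import stats

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])       # made-up data
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.2, 4.1, 8.0, 8.9, 13.1, 13.8])
X = np.column_stack([np.ones_like(x1), x1, x2])
n, k = len(y), 2

b, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = np.sum((y - X @ b) ** 2) / (n - k - 1)

se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))  # std. errors of b_j
t_stats = b / se                                      # test H0: beta_j = 0
p_vals = 2 * stats.t.sf(np.abs(t_stats), df=n - k - 1)
for j in range(1, k + 1):
    print(f"b{j}: t = {t_stats[j]:.3f}, p = {p_vals[j]:.4g}")
```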

24 Confidence and Prediction Intervals
- Confidence interval estimate for the slope: bⱼ ± t(α/2, n-k-1) s_bⱼ
- Confidence interval estimates for the mean response and prediction intervals for an individual response are beyond the scope of this text

25 Partial F Tests
- Significance test for the contribution from an individual independent variable
- Measures the incremental improvement, with all other variables already taken into account
- Fⱼ = SSR(Xⱼ | {Xᵢ, i≠j}) / MSE
  - SSR(Xⱼ | {Xᵢ, i≠j}) = SSR - SSR({Xᵢ, i≠j})
- Reject H₀ if Fⱼ > F_U(α, 1, n-k-1) [or p < α]
- Note: t²(α, n-k-1) = F_U(α, 1, n-k-1)
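A sketch of one partial F test, fitting full and reduced models (made-up data; the helper ssr_sse is my own name):

```python
import numpy as np
from scipy import stats

def ssr_sse(X, y):
    """SSR and SSE for a least-squares fit on design matrix X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_pred = X @ b
    return np.sum((y_pred - y.mean()) ** 2), np.sum((y - y_pred) ** 2)

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])       # made-up data
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.2, 4.1, 8.0, 8.9, 13.1, 13.8])
ones = np.ones_like(x1)
n, k = len(y), 2

ssr_full, sse_full = ssr_sse(np.column_stack([ones, x1, x2]), y)
ssr_reduced, _ = ssr_sse(np.column_stack([ones, x1]), y)   # drop X2

# Incremental contribution of X2, given X1 already in the model
ssr_x2_given_x1 = ssr_full - ssr_reduced
f2 = ssr_x2_given_x1 / (sse_full / (n - k - 1))
p_val = stats.f.sf(f2, dfn=1, dfd=n - k - 1)
print(f"F(X2|X1) = {f2:.3f}, p = {p_val:.4g}")
```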

26 Coefficients of Partial Determination
- r²_Yj.(all others) = SSR(Xⱼ | {Xᵢ, i≠j}) / (SST - SSR + SSR(Xⱼ | {Xᵢ, i≠j}))
- See PHStat output in Fig 13.10, p 637
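The corresponding coefficient of partial determination, reusing the same full/reduced fits (made-up data):

```python
import numpy as np

def ssr_of(X, y):
    """SSR for a least-squares fit on design matrix X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((X @ b - y.mean()) ** 2)

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])       # made-up data
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.2, 4.1, 8.0, 8.9, 13.1, 13.8])
ones = np.ones_like(x1)

sst = np.sum((y - y.mean()) ** 2)
ssr_full = ssr_of(np.column_stack([ones, x1, x2]), y)
ssr_x2_given_x1 = ssr_full - ssr_of(np.column_stack([ones, x1]), y)

# Fraction of the variation not explained by X1 that X2 does explain
r2_partial = ssr_x2_given_x1 / (sst - ssr_full + ssr_x2_given_x1)
print(f"r2_Y2.1 = {r2_partial:.4f}")
```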

27 Homework
- Review "Multiple Regression", 13.1-5
- Work through Appendix 13.1
- Work and hand in Problem 13.62
- Read "Multiple Regression", 13.6-11
  - Quadratic model
  - Dummy-variable model
  - Using transformations
  - Collinearity (VIF)
  - Model building
  - Cₚ statistic and stepwise regression
- Preview problems 13.63-13.67

