ENGR 610 Applied Statistics Fall 2007 - Week 11 Marshall University CITE Jack Smith.

Overview for Today
- Review Simple Linear Regression, Ch 12
  - Go over problem 12.56
- Multiple Linear Regression, Ch 13 (1-5)
  - Multiple explanatory variables
  - Coefficient of multiple determination
  - Adjusted $R^2$
  - Residual analysis
  - F test
  - t test and confidence interval for slope
  - Partial F tests for individual contributions
  - Coefficients of partial determination
- Homework assignment

Regression Modeling
- Analysis of variance to “fit” a predictive model for a response (dependent) variable to a set of one or more explanatory (independent) variables
- Minimize residual error with respect to the linear coefficients
- Interpolative over the relevant range: do not extrapolate
- Typically linear, but may be curvilinear or more complex (w.r.t. independent variables)
- Related to Correlation Analysis, which measures the strength of association between variables
  - Regression is about variance in the response variable
  - Correlation is about covariance, and is symmetric in the two variables

Types of Regression Models
- Based on scatter plots
  - Y vs X
  - Dependent vs independent
- Linear models
  - Positive, negative or no slope
  - Zero or non-zero intercept
- Curvilinear models
  - Positive, negative or no “slope”
  - Positive, negative or varied curvature
  - May be U-shaped, with extrema
  - May be asymptotically or piecewise linear
  - May be polynomial, exponential, inverse, …

Least-Squares Linear Regression
- Simple linear model (for population): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  - $X_i$ = value of independent variable
  - $Y_i$ = observed value of dependent variable
  - $\beta_0$ = Y-intercept (Y at X = 0)
  - $\beta_1$ = slope ($\Delta Y / \Delta X$)
  - $\varepsilon_i$ = random error for observation i
- $Y_i' = b_0 + b_1 X_i$ (predicted value)
  - $b_0$ and $b_1$ are called regression coefficients
  - $e_i = Y_i - Y_i'$ (residual)
  - Minimize $\sum e_i^2$ for the sample with respect to $b_0$ and $b_1$
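
The least-squares fit above can be sketched in a few lines of plain Python. This is only an illustration: the data values are invented, and `fit_line` is a hypothetical helper name, not something from the course materials. It uses the standard closed-form solution for minimizing the sum of squared residuals.

```python
# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ssxy / ssx            # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = fit_line(X, Y)
predicted = [b0 + b1 * xi for xi in X]
residuals = [yi - pi for yi, pi in zip(Y, predicted)]

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sum of squared residuals = {sum(e ** 2 for e in residuals):.4f}")
```

A useful sanity check on any such fit is that the residuals sum to zero (up to rounding), which follows from including an intercept in the model.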

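Partitioning of Variation
- Total variation: $SST = \sum (Y_i - \bar{Y})^2$
- Regression variation: $SSR = \sum (Y_i' - \bar{Y})^2$
- Random variation: $SSE = \sum (Y_i - Y_i')^2$
- $\bar{Y}$ is the mean response
- SST = SSR + SSE
- Coefficient of determination: $r^2 = SSR/SST$
- Standard error of the estimate: $S_{YX} = \sqrt{SSE/(n-2)}$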
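
The partition SST = SSR + SSE can be checked numerically. The sketch below assumes the usual definitions of the three sums of squares (deviations from the mean response and from the fitted line) and reuses invented data; none of the numbers come from the course.

```python
import math

# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
ssx = sum((x - x_bar) ** 2 for x in X)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / ssx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in X]

sst = sum((y - y_bar) ** 2 for y in Y)                 # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the regression
sse = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))    # unexplained (residual)

r2 = ssr / sst                     # coefficient of determination
s_yx = math.sqrt(sse / (n - 2))    # standard error of the estimate

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"r^2 = {r2:.4f}, S_YX = {s_yx:.4f}")
```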

Partitioning of Variation - Graphically

Assumptions of Regression (and Correlation)
- Normality of error about the regression line
- Homoscedasticity (equal variance) along X
- Independence of errors with respect to X
  - No autocorrelation in time
- Analysis of residuals to test assumptions
  - Histogram, box-and-whisker plots
  - Normal probability plot
  - Ordered plots (by X, by time, …)
  - See figures on pp 584-5

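t Test for Slope
- $H_0: \beta_1 = 0$
- Test statistic: $t = b_1 / S_{b_1}$, where $S_{b_1} = S_{YX} / \sqrt{SSX}$ and $SSX = \sum (X_i - \bar{X})^2$
- Critical t value based on chosen level of significance, $\alpha$, and n-2 degrees of freedom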

F Test for Single Regression
- F = MSR / MSE
- Reject $H_0$ if $F > F_U(\alpha, 1, n-2)$ [or p < $\alpha$]
- Note: $t^2(\alpha, n-2) = F_U(\alpha, 1, n-2)$, with t the two-tailed critical value
- One-Way ANOVA Summary:
  Source      df     SS    MS (Variance)      F         p-value
  Regression  1      SSR   MSR = SSR/1        MSR/MSE
  Error       n-2    SSE   MSE = SSE/(n-2)
  Total       n-1    SST
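
The relation between the t test and the F test for a single regression can be illustrated numerically. The sketch below uses invented data; the two critical values are copied from standard t and F tables for $\alpha$ = 0.05 and 3 error degrees of freedom, since plain Python has no built-in distribution functions. The slope standard error is taken as $\sqrt{MSE/SSX}$, the usual textbook form.

```python
import math

# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
ssx = sum((x - x_bar) ** 2 for x in X)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / ssx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in X]
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))

msr = ssr / 1          # regression df = 1
mse = sse / (n - 2)    # error df = n - 2
f_stat = msr / mse

s_b1 = math.sqrt(mse / ssx)   # standard error of the slope
t_stat = b1 / s_b1

# Critical values for alpha = 0.05 and 3 error df, from standard tables.
F_CRIT = 10.13     # F_U(0.05, 1, 3)
T_CRIT = 3.1824    # two-tailed t(0.05, 3)

print(f"F = {f_stat:.1f}, t = {t_stat:.2f}, t^2 = {t_stat ** 2:.1f}")
print("reject H0 by F test:", f_stat > F_CRIT)
print("reject H0 by t test:", abs(t_stat) > T_CRIT)
```

Both tests always reach the same decision for a single explanatory variable, because the statistic and the critical value of one are the squares of the other.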

Confidence and Prediction Intervals
- Confidence interval estimate for the slope
- Confidence interval estimate for the mean response
- Prediction interval estimate for an individual response
- See Fig 12.16, p 592
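
The three interval estimates can be sketched as follows. The formulas used here are the standard textbook forms, not recovered from the slide: each interval is an estimate plus or minus a critical t value times a standard error, with $h = 1/n + (X_0 - \bar{X})^2 / SSX$ entering the mean-response interval and $1 + h$ the individual-response interval. The data are invented, `intervals` is a hypothetical helper, and the critical t value is copied from a standard table.

```python
import math

# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.1824   # two-tailed t for 95% confidence, n - 2 = 3 df (standard tables)

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
ssx = sum((x - x_bar) ** 2 for x in X)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / ssx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
s_yx = math.sqrt(sse / (n - 2))


def intervals(x0):
    """Return (fit, CI half-width for the mean, PI half-width for one response)."""
    h = 1 / n + (x0 - x_bar) ** 2 / ssx
    fit = b0 + b1 * x0
    return fit, T_CRIT * s_yx * math.sqrt(h), T_CRIT * s_yx * math.sqrt(1 + h)


slope_half = T_CRIT * s_yx / math.sqrt(ssx)
print(f"slope: {b1:.3f} +/- {slope_half:.3f}")
for x0 in (3.0, 5.0):
    fit, ci, pi = intervals(x0)
    print(f"X = {x0}: fit {fit:.2f}, mean +/- {ci:.3f}, individual +/- {pi:.3f}")
```

The prediction interval is always wider than the confidence interval for the mean, and both widen as $X_0$ moves away from $\bar{X}$.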

Pitfalls
- Not testing assumptions of least-squares regression by analyzing residuals, looking for
  - Patterns
  - Outliers
  - Non-uniform distribution about the mean
  - See Figs 12.18-19, pp 597-8
- Not being aware of alternatives to least-squares regression when assumptions are violated
- Not knowing the subject matter being modeled

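Computing by Hand
- Slope: $b_1 = SSXY / SSX$, with $SSXY = \sum (X_i - \bar{X})(Y_i - \bar{Y})$ and $SSX = \sum (X_i - \bar{X})^2$
- Y-intercept: $b_0 = \bar{Y} - b_1 \bar{X}$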

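Computing by Hand
- Measures of variation
  - $SST = \sum Y_i^2 - (\sum Y_i)^2 / n$
  - $SSR = b_0 \sum Y_i + b_1 \sum X_i Y_i - (\sum Y_i)^2 / n$
  - $SSE = \sum Y_i^2 - b_0 \sum Y_i - b_1 \sum X_i Y_i$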

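Coefficient of Correlation
- For a regression: $r = \pm\sqrt{r^2}$, taking the sign of $b_1$
- For a correlation: $r = \mathrm{cov}(X,Y) / (S_X S_Y)$
- Covariance: $\mathrm{cov}(X,Y) = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / (n-1)$
- Also called Pearson’s product-moment correlation coefficient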

t Test for Correlation
- $H_0: \rho = 0$
- Test statistic: $t = r \sqrt{n-2} / \sqrt{1-r^2}$
- Critical t value based on chosen level of significance, $\alpha$, and n-2 degrees of freedom
- Or use $F = t^2$, compared to $F_U(\alpha, 1, n-2) = t^2(\alpha, n-2)$
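
The correlation coefficient and its t statistic can be computed directly from the sample covariance. The sketch below assumes the standard form $t = r\sqrt{n-2}/\sqrt{1-r^2}$ and uses invented data; it also checks that this t equals the t statistic for the regression slope, which is why the two tests are interchangeable for a single explanatory variable.

```python
import math

# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
ssx = sum((x - x_bar) ** 2 for x in X)
ssy = sum((y - y_bar) ** 2 for y in Y)
ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

cov_xy = ssxy / (n - 1)            # sample covariance
s_x = math.sqrt(ssx / (n - 1))     # sample standard deviations
s_y = math.sqrt(ssy / (n - 1))
r = cov_xy / (s_x * s_y)           # Pearson correlation coefficient

t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# The same statistic from the regression side: t = b1 / S_b1.
b1 = ssxy / ssx
sse = ssy - b1 * ssxy              # SSE = SST - SSR for simple regression
t_slope = b1 / math.sqrt(sse / (n - 2) / ssx)

print(f"cov = {cov_xy:.3f}, r = {r:.4f}")
print(f"t (correlation) = {t_corr:.2f}, t (slope) = {t_slope:.2f}")
```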

Multiple Regression
- Linear model with multiple independent variables: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i$
  - $X_{ji}$ = value of independent variable j for observation i
  - $Y_i$ = observed value of dependent variable
  - $\beta_0$ = Y-intercept (Y when all $X_j$ = 0)
  - $\beta_j$ = slope ($\Delta Y / \Delta X_j$), with the other independent variables held constant
  - $\varepsilon_i$ = random error for observation i
- $Y_i' = b_0 + b_1 X_{1i} + \dots + b_k X_{ki}$ (predicted value)
  - The $b_j$’s are called the regression coefficients
  - $e_i = Y_i - Y_i'$ (residual)
  - Minimize $\sum e_i^2$ for the sample with respect to all $b_j$

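Partitioning of Variation
- Total variation: $SST = \sum (Y_i - \bar{Y})^2$
- Regression variation: $SSR = \sum (Y_i' - \bar{Y})^2$
- Random variation: $SSE = \sum (Y_i - Y_i')^2$
- $\bar{Y}$ is the mean response
- SST = SSR + SSE
- Coefficient of multiple determination: $R^2_{Y.12..k} = SSR/SST$
- Standard error of the estimate: $S_{YX} = \sqrt{SSE/(n-k-1)}$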

Adjusted $R^2$
- $R^2_{adj} = 1 - (1 - R^2_{Y.12..k}) \frac{n-1}{n-k-1}$
- Accounts for sample size (n) and number of independent variables (k) for comparison purposes
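
A short sketch of why the adjustment matters when comparing models. It assumes the standard form of adjusted $R^2$, $1 - (1 - R^2)(n-1)/(n-k-1)$; the function name and the $R^2$ values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical comparison: a fourth variable raises R^2 only slightly.
three_vars = adjusted_r2(0.800, n=20, k=3)
four_vars = adjusted_r2(0.805, n=20, k=4)

print(f"k = 3: adjusted R^2 = {three_vars:.4f}")
print(f"k = 4: adjusted R^2 = {four_vars:.4f}")
```

Plain $R^2$ never decreases when a variable is added, so it always favors the larger model. The adjusted value can fall, as it does here, when the extra variable explains too little to justify the lost degree of freedom.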

Residual Analysis
- Plot residuals vs
  - $Y_i'$ (predicted values)
  - $X_1, X_2, \dots, X_k$
  - Time (for autocorrelation)
- Check for
  - Patterns
  - Outliers
  - Non-uniform distribution about the mean
  - See Figs 12.18-19, pp 597-8

F Test for Multiple Regression
- F = MSR / MSE
- Reject $H_0$ if $F > F_U(\alpha, k, n-k-1)$ [or p < $\alpha$]
- k = number of independent variables
- One-Way ANOVA Summary:
  Source      df       SS    MS (Variance)        F         p-value
  Regression  k        SSR   MSR = SSR/k          MSR/MSE
  Error       n-k-1    SSE   MSE = SSE/(n-k-1)
  Total       n-1      SST
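
The ANOVA table above can be filled in for a small two-variable example. This is a sketch only: the data are invented, `solve` and `fit` are hypothetical helpers that solve the normal equations by Gaussian elimination (the course itself uses PHStat), and the last lines compute F a second way from $R^2$, assuming the standard identity $F = (R^2/k) / ((1-R^2)/(n-k-1))$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(columns, y):
    """Return ([b0, b1, ..., bk], predicted values) from the normal equations."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]   # leading 1 for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    return coef, [sum(c * v for c, v in zip(coef, r)) for r in rows]


# Hypothetical data, invented for illustration only.
X1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
Y = [5.1, 5.9, 10.2, 10.8, 15.1, 16.0, 19.9, 21.1]
n, k = len(Y), 2

b, y_hat = fit([X1, X2], Y)
y_bar = sum(Y) / n
sst = sum((y - y_bar) ** 2 for y in Y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

r2 = ssr / sst
f_alt = (r2 / k) / ((1 - r2) / (n - k - 1))   # same F, written in terms of R^2

print("coefficients:", [round(v, 3) for v in b])
print(f"R^2 = {r2:.4f}, F = {f_stat:.1f}, F from R^2 = {f_alt:.1f}")
```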

Alternate F-Test
- $F = \frac{R^2_{Y.12..k} / k}{(1 - R^2_{Y.12..k}) / (n-k-1)}$
- Compared to $F_U(\alpha, k, n-k-1)$

t Test for Slope
- $H_0: \beta_j = 0$
- Test statistic: $t = b_j / S_{b_j}$
- Critical t value based on chosen level of significance, $\alpha$, and n-k-1 degrees of freedom
- See output from PHStat

Confidence and Prediction Intervals
- Confidence interval estimate for the slope
- Confidence interval estimate for the mean and prediction interval estimate for an individual response
  - Beyond the scope of this text

Partial F Tests
- Significance test for the contribution from an individual independent variable
- Measure of incremental improvement
  - All others already taken into account
- $F_j = SSR(X_j \mid \{X_{i \ne j}\}) / MSE$
  - $SSR(X_j \mid \{X_{i \ne j}\}) = SSR - SSR(\{X_{i \ne j}\})$
- Reject $H_0$ if $F_j > F_U(\alpha, 1, n-k-1)$ [or p < $\alpha$]
- Note: $t^2(\alpha, n-k-1) = F_U(\alpha, 1, n-k-1)$

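Coefficients of Partial Determination
- $r^2_{Yj.\{i \ne j\}} = \frac{SSR(X_j \mid \{X_{i \ne j}\})}{SST - SSR + SSR(X_j \mid \{X_{i \ne j}\})}$
- See PHStat output in Fig 13.10, p 637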
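
Partial F statistics and coefficients of partial determination both come from refitting the model with one variable left out. The sketch below uses invented data and hypothetical helpers (`solve`, `ssr_of`). The partial-determination formula is the standard textbook form, each variable's incremental SSR divided by the variation left unexplained by the other variables. The critical value is copied from a standard F table.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ssr_of(columns, y):
    """Regression sum of squares for a model with an intercept plus the given columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    y_bar = sum(y) / len(y)
    return sum((sum(c * v for c, v in zip(coef, r)) - y_bar) ** 2 for r in rows)


# Hypothetical data, invented for illustration only.
X1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
Y = [5.1, 5.9, 10.2, 10.8, 15.1, 16.0, 19.9, 21.1]
columns = [X1, X2]
n, k = len(Y), len(columns)
F_CRIT = 6.61   # F_U(0.05, 1, n-k-1 = 5), from standard tables

y_bar = sum(Y) / n
sst = sum((y - y_bar) ** 2 for y in Y)
ssr_full = ssr_of(columns, Y)
mse = (sst - ssr_full) / (n - k - 1)   # MSE of the full model

partial_f, partial_r2 = [], []
for j in range(k):
    others = [c for i, c in enumerate(columns) if i != j]
    contrib = ssr_full - ssr_of(others, Y)        # SSR(X_j | all others)
    partial_f.append(contrib / mse)
    partial_r2.append(contrib / (sst - ssr_full + contrib))

for j in range(k):
    print(f"X{j + 1}: F = {partial_f[j]:.1f}, partial r^2 = {partial_r2[j]:.4f}, "
          f"significant: {partial_f[j] > F_CRIT}")
```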

Homework
- Review “Multiple Regression”, 13.1-5
- Work through Appendix 13.1
- Work and hand in Problem 13.62
- Read “Multiple Regression”, 13.6-11
  - Quadratic model
  - Dummy-variable model
  - Using transformations
  - Collinearity (VIF)
  - Model building
    - $C_p$ statistic and stepwise regression
- Preview problems 13.63-13.67
