Multiple Linear Regression.


Multiple Linear Regression
Concept and uses. Model and assumptions. Intrinsically linear models. Model development and validation.
Problem areas: non-normality, heterogeneous variance, correlated errors, influential points and outliers, model inadequacies, collinearity, errors in the X variables.
September 19, 2018 AGR206

Concept & Uses.
Description, restricted to the data set: did biomass increase with pH in the sample?
Prediction of Y: how much biomass do we expect to find under given soil conditions?
Extrapolation to new conditions: can we predict biomass in other estuaries?
Estimation and understanding: how much does biomass change per unit change in pH, controlling for other factors?
Control of the process (requires causality): can we create sites with a given biomass by changing the pH?

Body fat example in JMP. Three variables (X1, X2, X3) were measured to predict body fat % (Y) in a random sample of people. Y was measured by an expensive and very accurate method (assume it reveals the true %fat).
X1: thickness of triceps skinfold.
X2: thigh circumference.
X3: midarm circumference.
Data file: Bodyfat.jmp

H0’s or values “of interest”:
Does the thickness of the triceps skinfold contribute significantly to predicting fat content?
What is the CI for the fat content of a person whose X’s have been measured?
Do I have more or less fat than last summer? Do I have more fat than recommended?

Model and Assumptions. A linear, additive model relates Y to p independent variables. (Note: here p is the number of variables; some authors use p for the number of parameters, which is one more than the number of variables because of the intercept.)
Yi = β0 + β1 Xi1 + … + βp Xip + εi
where the εi are independent normal random variables with common variance σ².
In matrix notation the model and its solution are exactly the same as for SLR:
Y = Xβ + ε
b = (X′X)⁻¹ X′Y
All equations from SLR apply without change.
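The normal-equations solution can be illustrated numerically. A minimal sketch in Python/NumPy with simulated data (the coefficients and sample size are made up for illustration, not the body-fat data):

```python
import numpy as np

# Simulated data: n observations, 2 predictors, known coefficients.
rng = np.random.default_rng(0)
n = 20
X_raw = rng.normal(size=(n, 2))
y = 3.0 + 1.5 * X_raw[:, 0] - 2.0 * X_raw[:, 1] + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column, as in Y = Xb + e.
X = np.column_stack([np.ones(n), X_raw])

# Normal-equations solution b = (X'X)^(-1) X'Y.
# (np.linalg.lstsq is numerically safer in practice.)
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # approximately [3.0, 1.5, -2.0]
```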

Linear models. Linear and intrinsically linear models: linearity refers to the parameters. The model can involve any functions of the X’s, as long as those functions have no parameters that must themselves be estimated. A linear model does not always produce a hyperplane:
Yi = β0 + β1 f1(Xi1) + … + βp fp(Xip) + εi
Polynomial regression is a special case in which the functions are powers of X.
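A quadratic fit shows why "linear" refers to the parameters: the columns of the design matrix are fixed functions of X, and the usual least-squares machinery applies unchanged. A sketch with made-up, noiseless data:

```python
import numpy as np

# Hypothetical noiseless data generated from y = 1 + 2x - 0.5 x^2.
x = np.linspace(-2.0, 2.0, 25)
y = 1.0 + 2.0 * x - 0.5 * x**2

# Design matrix with columns [1, x, x^2]: linear in b0, b1, b2.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # recovers approximately [1.0, 2.0, -0.5]
```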

Matrix Equations

Extra Sum of Squares. Effect of the order of entry on the SS. The four types of SS. Partial correlation.
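The order-of-entry effect can be demonstrated numerically: with correlated predictors, the sequential (Type I) SS credited to a variable depends on when it enters the model. A sketch with hypothetical data (variable names invented):

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from regressing y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)   # x2 strongly correlated with x1
y = 1.0 + x1 + x2 + rng.normal(size=n)
one = np.ones((n, 1))

# Extra SS for x2 when it enters after x1 (reduction in SSE) ...
ss_x2_second = sse(np.column_stack([one, x1]), y) - sse(np.column_stack([one, x1, x2]), y)
# ... versus when x2 enters first.
ss_x2_first = sse(one, y) - sse(np.column_stack([one, x2]), y)

print(ss_x2_first, ss_x2_second)  # very different: order of entry matters
```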

Extra Sum of Squares: body fat example.

Response plane and error. The response surface in more than 3 dimensions is a hyperplane.
(The slide showed a 3-D sketch of the plane E{Yi} over the X1, X2 plane, with an observed Yi deviating from it.)

Model development. Which variables to include depends on the objective:
Descriptive: no need to reduce the number of variables.
Prediction and estimation of Yhat: OK to reduce the set for economical use.
Estimation of the β’s and understanding: sensitive to deletions, which may bias the MSE and the b’s. No real solution other than getting more data from a better experiment. (Sorry!)

Variable Selection. Effects of eliminating variables:
The MSE is positively biased unless the true β’s of the eliminated variables are 0.
bhat and Yhat are biased unless the previous condition holds or the eliminated variables are orthogonal to those retained.
The variance of the estimated parameters and predictions is usually lower.
There are conditions under which the MSE of the reduced model (variance plus bias²) is smaller.

Criteria for variable selection:
R² (coefficient of determination): R² = SSReg/SSTo.
MSE or MSRes (mean squared residual): with all X’s in the model it estimates σ².
R²adj (adjusted R²): R²adj = 1 − MSE/MSTo = 1 − [(n−1)/(n−p)](SSE/SSTo).
Mallows’ Cp: Cp = SSRes/MSEFull + 2p − n (p = number of parameters).
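These criteria are simple to compute from the residual sums of squares. A sketch in Python (function and variable names are mine, not from the course materials):

```python
import numpy as np

def selection_criteria(X_sub, X_full, y):
    """R^2, adjusted R^2 and Mallows' Cp for a candidate sub-model.
    Both design matrices include the intercept column; p = number of parameters."""
    n = len(y)

    def ssres(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(np.sum((y - X @ b) ** 2))

    ssto = float(np.sum((y - y.mean()) ** 2))
    sse = ssres(X_sub)
    p = X_sub.shape[1]
    mse_full = ssres(X_full) / (n - X_full.shape[1])

    r2 = 1 - sse / ssto
    r2_adj = 1 - (n - 1) / (n - p) * (sse / ssto)
    cp = sse / mse_full + 2 * p - n
    return r2, r2_adj, cp

# Hypothetical data: only the first of three predictors matters.
rng = np.random.default_rng(2)
n = 40
Z = rng.normal(size=(n, 3))
y = 2.0 + Z[:, 0] + rng.normal(size=n)
X_full = np.column_stack([np.ones(n), Z])
X_sub = X_full[:, :2]   # intercept + first predictor

print(selection_criteria(X_sub, X_full, y))
```

A quick sanity check: for the full model, Cp equals p exactly, since SSRes/MSEFull reduces to n − p.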

Example

Checking assumptions. Although we have many X’s, the errors are still in a single dimension, so residual analysis is performed as for SLR, sometimes repeated over the different X’s.
Normality: use the NORMAL option of PROC UNIVARIATE; transform if needed.
Homogeneity of variance: plot the errors vs. each X; transform, or use weighted least squares.
Independence of errors.
Adequacy of the model: plot the errors; lack-of-fit (LOF) tests.
Influence and outliers: use the INFLUENCE option of PROC REG.
Collinearity: use the COLLINOINT option of PROC REG.

code for PROC REG
data s00.spart2;
  set s00.spartina;
  colin = 2*ph + 0.5*acid + sal + rannor(23);  /* constructed collinear variable */
run;

proc reg data=s00.spart2;
  model bmss = colin h2s sal eh7 ph acid p k ca mg na mn zn cu nh4
        / r influence vif collinoint stb partial;
run;
  /* PROC REG is interactive: a second MODEL statement reuses the same data */
  model colin = ph sal acid;
run;

Spartina ANOVA output
Model: MODEL1
Dependent Variable: BMSS

Analysis of Variance
Source     DF   Sum of Squares   Mean Square    F Value   Prob>F
Model      15     16369583.2     1091305.552    11.297    0.0001
Error      29      2801379.9       96599.307
C Total    44     19170963.2

Root MSE    310.80429    R-square   0.8539
Dep Mean   1000.80000    Adj R-sq   0.7783
C.V.         31.05558
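The summary statistics follow directly from the ANOVA table, which allows a quick arithmetic check:

```python
# Recompute the summary statistics from the ANOVA table above.
ss_model, ss_error, ss_total = 16369583.2, 2801379.9, 19170963.2
df_error, df_total = 29, 44

mse = ss_error / df_error                   # error mean square, 96599.307
r_square = ss_model / ss_total              # SSReg / SSTo
root_mse = mse ** 0.5                       # sqrt of the error mean square
adj_r_sq = 1 - mse / (ss_total / df_total)  # 1 - MSE/MSTo

print(round(r_square, 4), round(root_mse, 5), round(adj_r_sq, 4))
# -> 0.8539 310.80429 0.7783, matching the printed SAS output
```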

Parameters and VIF

Parameter Estimates
                  Parameter    Standard    T for H0:              Standardized     Variance
Variable   DF      Estimate       Error  Parameter=0  Prob > |T|      Estimate    Inflation
INTERCEP    1   3809.233562    3038.081        1.254      0.2199    0.00000000   0.00000000
COLIN       1   -178.317065      58.718       -3.037      0.0050   -1.06227792  24.28364757
H2S         1      0.336242       2.656        0.127      0.9001    0.01563626   3.02785626
SAL         1    150.513276      61.960        2.429      0.0216    0.84818417  24.19556405
EH7         1      2.288694       1.785        1.282      0.2099    0.12813770   1.98216733
PH          1    486.417077     306.756        1.586      0.1237    0.91891994  66.64921013
ACID        1    -24.816449     109.856       -0.226      0.8229   -0.09422943  34.53131689
P           1      0.153015       2.417        0.063      0.9500    0.00639498   2.02507775
K           1     -0.733250       0.439       -1.668      0.1061   -0.33059243   7.79660017
CA          1     -0.137163       0.111       -1.230      0.2286   -0.35706572  16.72702792
MG          1     -0.318586       0.243       -1.308      0.2010   -0.45340287  23.82835726
NA          1     -0.005294       0.022       -0.239      0.8127   -0.05520175  10.57323219
MN          1     -4.279887       4.836       -0.885      0.3835   -0.15872971   6.38589662
ZN          1    -26.270852      19.452       -1.351      0.1873   -0.32953283  11.81574077
CU          1    346.606818      99.295        3.491      0.0016    0.54452366   4.82931410
NH4         1      0.539373       3.061        0.176      0.8614    0.03862822   9.53842459
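The VIF column can be reproduced from its definition, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. A sketch with hypothetical data mimicking the constructed COLIN variable:

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R_j^2), where R_j^2 is
    from regressing column j on the remaining columns (with an intercept)."""
    n, k = X.shape
    one = np.ones((n, 1))
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([one, np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        sse = float(np.sum((y - Z @ b) ** 2))
        ssto = float(np.sum((y - y.mean()) ** 2))
        out.append(ssto / sse)   # equals 1 / (1 - R_j^2)
    return np.array(out)

# Hypothetical predictors: x3 is almost a linear combination of x1 and x2,
# in the spirit of colin = 2*ph + 0.5*acid + sal + noise above.
rng = np.random.default_rng(3)
x1, x2 = rng.normal(size=50), rng.normal(size=50)
x3 = 2.0 * x1 + 0.5 * x2 + 0.1 * rng.normal(size=50)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))  # x3's VIF is very large
```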