Regression: (2) Multiple Linear Regression and Path Analysis Hal Whitehead BIOL4062/5062.

Multiple Linear Regression and Path Analysis
Multiple linear regression:
– assumptions
– parameter estimation
– hypothesis tests
– selecting independent variables
– collinearity
– polynomial regression
Path analysis

Regression
One dependent variable: Y
Independent variables: X1, X2, X3, ...

Purposes of Regression
1. Relationship between Y and the X's
2. Quantitative prediction of Y
3. Relationship between Y and X controlling for C
4. Which of the X's are most important?
5. Best mathematical model
6. Compare regression relationships: Y1 on X, Y2 on X
7. Assess interactive effects of the X's

Simple regression: one X
Multiple regression: two or more X's
Y = β0 + β1·X(1) + β2·X(2) + β3·X(3) + ... + βk·X(k) + E

Multiple linear regression: assumptions (1)
For any specific combination of X's, Y is a (univariate) random variable with a certain probability distribution having finite mean and variance (Existence)
Y values are statistically independent of one another (Independence)
The mean value of Y given the X's is a linear function of the X's (Linearity)

Multiple linear regression: assumptions (2)
The variance of Y is the same for any fixed combination of X's (Homoscedasticity)
For any fixed combination of X's, Y has a normal distribution (Normality)
There are no measurement errors in the X's (X's measured without error)

Multiple linear regression: parameter estimation
Y = β0 + β1·X(1) + β2·X(2) + β3·X(3) + ... + βk·X(k) + E
The β's are estimated by least squares
The sizes of the coefficients are not good indicators of the importance of the X variables
Number of data points in multiple regression:
– at least one more than the number of X's
– preferably at least 5 times the number of X's
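The least-squares estimation step above can be sketched in Python. This is a minimal illustration assuming NumPy is available; the data and coefficients are simulated, not from the brain-size example that follows.

```python
import numpy as np

# Simulated data: Y = 2.0 + 1.5*X1 - 0.8*X2 + noise
rng = np.random.default_rng(0)
n = 50
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 + 1.5 * X1 - 0.8 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix: a leading column of 1s carries the intercept beta_0
X = np.column_stack([np.ones(n), X1, X2])

# Least-squares estimates of the beta's
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta)  # should be close to the true values [2.0, 1.5, -0.8]
```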

Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
Multiple regression of Y [Log(CNS)] on:

X             β      SE(β)
Log(Mass)    -0.49   (0.70)
Log(Fat)     -0.07   (0.10)
Log(Muscle)   1.03   (0.54)
Log(Heart)    0.42   (0.22)
Log(Bone)    -0.07   (0.30)

N = 39

Multiple linear regression: hypothesis tests
Usually test:
H0: Y = β0 + β1·X(1) + β2·X(2) + ... + βj·X(j) + E
H1: Y = β0 + β1·X(1) + β2·X(2) + ... + βj·X(j) + ... + βk·X(k) + E
F-test with k-j and n-k-1 degrees of freedom ("partial F-test")
H0: variables X(j+1), ..., X(k) do not help explain variability in Y
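The partial F statistic can be computed directly from the residual sums of squares of the two nested models. A sketch on simulated data (NumPy assumed); here X2 is deliberately unrelated to Y, so the statistic should typically be small.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + rng.normal(size=n)   # X2 truly plays no role

def sse(design, y):
    """Residual sum of squares of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

ones = np.ones(n)
sse_red = sse(np.column_stack([ones, X1]), Y)        # reduced: j = 1 predictor
sse_full = sse(np.column_stack([ones, X1, X2]), Y)   # full: k = 2 predictors

# Partial F with (k - j, n - k - 1) degrees of freedom
k, j = 2, 1
F = ((sse_red - sse_full) / (k - j)) / (sse_full / (n - k - 1))
print(F)  # typically small here, since X2 adds nothing
```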

Multiple linear regression: hypothesis tests
e.g. test the significance of the overall multiple regression:
H0: Y = β0 + E
H1: Y = β0 + β1·X(1) + β2·X(2) + ... + βk·X(k) + E
Test the significance of:
– adding an independent variable
– deleting an independent variable

Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
Multiple regression of Y [Log(CNS)] on:

X             β      SE(β)   P
Log(Mass)    -0.49   (0.70)  0.49
Log(Fat)     -0.07   (0.10)  0.52
Log(Muscle)   1.03   (0.54)  0.07
Log(Heart)    0.42   (0.22)  0.06
Log(Bone)    -0.07   (0.30)  0.83

Each P tests whether removal of the variable reduces the fit

Multiple linear regression: selecting independent variables
Reasons for selecting a subset of the independent variables (X's):
– cost (financial and other)
– simplicity
– improved prediction
– improved explanation

Multiple linear regression: selecting independent variables
Partial F-test:
– predetermined forward selection
– forward selection based upon improvement in fit
– backward selection based upon improvement in fit
– stepwise (backward/forward)
Mallows' C(p)
AIC
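Forward selection based on improvement in fit can be sketched as follows. The data are simulated (NumPy assumed), and the F-to-enter threshold of 4.0 is an arbitrary illustrative choice, standing in for an α-to-enter level.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 80
Xs = {name: rng.normal(size=n) for name in ["X1", "X2", "X3"]}
Y = 2.0 * Xs["X1"] + 1.0 * Xs["X2"] + rng.normal(size=n)  # X3 is pure noise

def sse_of(names):
    """Residual sum of squares of Y regressed on the named predictors."""
    design = np.column_stack([np.ones(n)] + [Xs[m] for m in names])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return float(np.sum((Y - design @ beta) ** 2))

selected, remaining, f_to_enter = [], set(Xs), 4.0
while remaining:
    current_sse = sse_of(selected)
    # Candidate giving the largest drop in SSE
    best = min(remaining, key=lambda m: sse_of(selected + [m]))
    new_sse = sse_of(selected + [best])
    k = len(selected) + 1              # predictors if `best` is added
    F = (current_sse - new_sse) / (new_sse / (n - k - 1))
    if F < f_to_enter:                 # partial F too small: stop
        break
    selected.append(best)
    remaining.remove(best)

print(selected)  # X1 and X2 should enter; X3 usually should not
```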

Multiple linear regression: selecting independent variables
Partial F-test:
– predetermined forward selection: Mass, Bone, Heart, Muscle, Fat
– forward selection based upon improvement in fit
– backward selection based upon improvement in fit
– stepwise (backward/forward)

Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
Complete model (r² = 0.97)
Forward stepwise (α-to-enter = 0.15; α-to-remove = 0.15):
– 1. Constant (r² = 0.00)
– 2. Constant + Muscle (r² = 0.97)
– 3. Constant + Muscle + Heart (r² = 0.97)
– 4. Constant + Muscle + Heart + Mass (r² = 0.97)
Final model: Constant + Mass + Muscle + Heart (Muscle coefficient 1.24)

Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
Complete model (r² = 0.97)
Backward stepwise (α-to-enter = 0.15; α-to-remove = 0.15):
– 1. All variables (r² = 0.97)
– 2. Remove Bone (r² = 0.97)
– 3. Remove Fat (r² = 0.97)
Final model: Constant + Mass + Muscle + Heart (Muscle coefficient 1.24)

Comparing models
Mallows' C(p):
– C(p) = (k − p)·F(p) + (2p − k + 1)
– k parameters in the full model; p parameters in the restricted model
– F(p) is the F value comparing the fit of the restricted model with that of the full model
– lowest C(p) indicates the best model
Akaike Information Criterion (AIC):
– AIC = n·log(σ²) + 2p
– lowest AIC indicates the best model
– can compare models that are not nested within one another
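Comparing candidate models by AIC can be sketched as below, using the formula above with σ² estimated as SSE/n and p counting estimated coefficients. The data are simulated (NumPy assumed).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + 1.0 * X2 + rng.normal(size=n)  # X3 is pure noise

def aic(cols):
    """AIC = n*log(SSE/n) + 2p for Y regressed on the given columns."""
    design = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    sse = float(np.sum((Y - design @ beta) ** 2))
    p = design.shape[1]                 # number of estimated coefficients
    return n * np.log(sse / n) + 2 * p

models = {"X1": [X1], "X1+X2": [X1, X2], "X1+X2+X3": [X1, X2, X3]}
aics = {name: aic(cols) for name, cols in models.items()}
best = min(aics, key=aics.get)          # lowest AIC wins
print(best)  # X1+X2 should beat X1; X3 mostly just pays the 2p penalty
```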

Collinearity
If two (or more) X's are linearly related, they are collinear and the regression problem is indeterminate:
e.g. X(3) = 5·X(2) + 16, or X(2) = 4·X(1) + 16·X(4)
If they are nearly linearly related (near collinearity), coefficients and tests are very inaccurate
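One standard way to quantify near collinearity (not named on the slide, but a common diagnostic) is the variance inflation factor, VIF = 1/(1 − R²), where R² comes from regressing one X on the others. A sketch with simulated, nearly collinear predictors (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X1 = rng.normal(size=n)
X2 = 4.0 * X1 + rng.normal(scale=0.01, size=n)  # nearly X2 = 4*X1

# R^2 of X2 regressed on X1 (with intercept)
design = np.column_stack([np.ones(n), X1])
beta, *_ = np.linalg.lstsq(design, X2, rcond=None)
resid = X2 - design @ beta
r2 = 1.0 - float(resid @ resid) / float(np.sum((X2 - X2.mean()) ** 2))

vif = 1.0 / (1.0 - r2)
print(vif)  # enormous here, signalling severe near collinearity
```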

What to do about collinearity?
– Centering (mean = 0)
– Scaling (SD = 1)
– Regression on the first few principal components
– Ridge regression
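The first two remedies, centering and scaling, are straightforward; a minimal sketch with made-up numbers (NumPy assumed):

```python
import numpy as np

# Two predictors as columns; values are arbitrary illustrations
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 25.0],
              [4.0, 45.0]])

# Center each column to mean 0, then scale to sample SD 1
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(Z.mean(axis=0))          # ~ [0, 0]
print(Z.std(axis=0, ddof=1))   # [1, 1]
```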

Curvilinear (Polynomial) Regression
Y = β0 + β1·X + β2·X² + β3·X³ + ... + βk·X^k + E
Used to fit fairly complex curves to data
The β's are estimated using least squares
Use sequential partial F-tests, or AIC, to decide how many terms to use
– k > 3 is rare in biology
When possible, it is better to transform the data and use simple linear regression
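Polynomial regression is ordinary multiple regression on X, X², ..., so the same least-squares machinery applies. A sketch fitting a quadratic to simulated data (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-2, 2, 40)
y = 1.0 - 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.1, size=40)

# Design matrix with columns 1, X, X^2: polynomial terms as "extra X's"
design = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta)  # should be close to the true values [1.0, -0.5, 2.0]
```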

Curvilinear (Polynomial) Regression
[Figure: the same data fitted with linear (Y on X), quadratic (Y on X, X²), and cubic (Y on X, X², X³) polynomials. From Sokal and Rohlf]

Path Analysis

Models with causal structure
Represented by a path diagram
All variables quantitative
All path relationships assumed linear
– (transformations may help)
[Path diagram with variables A, B, C, D, E]

Path Analysis
All paths are one-way:
– e.g. A => C, but not also C => A
No loops
Some variables may not be directly observed:
– residual variables (U)
Some variables are not observed but known to exist:
– latent variables (D)
[Path diagram with variables A, B, C, D, E and residual U]

Path Analysis
Path coefficients and other statistics are calculated using multiple regressions
Variables are:
– centered (mean = 0), so there are no constants in the regressions
– often standardized (SD = 1), so path coefficients usually lie between -1 and +1
Paths with coefficients not significantly different from zero may be eliminated
[Path diagram with variables A, B, C, D, E and residual U]
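The calculation above can be sketched for a hypothetical two-cause diagram, A => C <= B, with simulated data (NumPy assumed). With all variables standardized and no intercept, the fitted regression coefficients are the path coefficients:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
A = rng.normal(size=n)
B = rng.normal(size=n)
C = 0.6 * A + 0.3 * B + rng.normal(scale=0.5, size=n)

def standardize(v):
    """Center to mean 0 and scale to sample SD 1."""
    return (v - v.mean()) / v.std(ddof=1)

As, Bs, Cs = standardize(A), standardize(B), standardize(C)

# No intercept column: centering has removed the constant
design = np.column_stack([As, Bs])
paths, *_ = np.linalg.lstsq(design, Cs, rcond=None)
print(paths)  # both path coefficients should lie between -1 and +1
```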

Path Analysis: an example Isaak and Hubert “Production of stream habitat gradients by montane watersheds: hypothesis tests based on spatially explicit path analyses” Can. J. Fish. Aquat. Sci.

[Path diagram legend:
- - -  predicted negative interaction
_____  predicted positive interaction]