Lecture 6: Multiple Linear Regression Adjusted Variable Plots BMTRY 701 Biostatistical Methods II.

Graphical Displays in MLR
- No more one simple scatterplot: we need to look at multiple pairs of variables (“pairs” in R).
- But pairwise plots cannot show the variables the way they enter the model together.
- Solution: the adjusted variable plot.

Adjusted Variable Plots
- Adjusted variable plots (also called added variable plots) are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model.
- With two covariates: shows the association between X and Y, adjusted for the other variable, Z.
- With more than two covariates: shows the association between X and Y, adjusted for the other covariates.
- In our example: the association between logLOS and number of nurses, adjusted for number of beds.

Approach
- Assume we want to look at the association of Y and X, adjusted for Z.
- Step 1: regress Y on Z and save the residuals (res.xy in the code that follows).
- Step 2: regress X on Z and save the residuals (res.xz).
- Step 3: plot res.xy versus res.xz.
- Optional step 4: regress res.xy on res.xz and compare the slope to the coefficient of X in the MLR of Y on X and Z.
- A generic sketch of these steps is shown below.
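A minimal R sketch of the four steps, not from the original slides (the helper name avp and its arguments are illustrative):

# Illustrative helper (not in the original slides): adjusted (added) variable
# plot of y versus x, adjusting for z; arguments are column names in 'data'.
avp <- function(y, x, z, data) {
  res.y <- residuals(lm(reformulate(z, response = y), data = data))  # step 1: Y ~ Z
  res.x <- residuals(lm(reformulate(z, response = x), data = data))  # step 2: X ~ Z
  plot(res.x, res.y, pch = 16,
       xlab = paste(x, "adjusted for", z),
       ylab = paste(y, "adjusted for", z))                           # step 3
  fit <- lm(res.y ~ res.x)                                           # step 4
  abline(fit, lwd = 2)
  invisible(coef(fit)[2])  # slope matches the MLR coefficient of x
}
# e.g., avp("logLOS", "INFRISK", "BEDS", data = data)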

SENIC

R
pairs(~ INFRISK + BEDS + logLOS, data=data, pch=16)

# adjusted variable plot approach
# look at the association between INFRISK and logLOS,
# adjusting for BEDS
reg.xy <- lm(logLOS ~ BEDS, data=data)
res.xy <- reg.xy$residuals
reg.xz <- lm(INFRISK ~ BEDS, data=data)
res.xz <- reg.xz$residuals
plot(res.xz, res.xy, pch=16)
reg.res <- lm(res.xy ~ res.xz)
abline(reg.res, lwd=2)
reg.infrisk.beds <- lm(logLOS ~ BEDS + INFRISK, data=data)
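A quick check of the optional step 4 (not on the original slide), using the objects created above:

# The slope from the residual-on-residual regression should equal the
# INFRISK coefficient from the multiple regression of logLOS on BEDS and INFRISK.
coef(reg.res)["res.xz"]
coef(reg.infrisk.beds)["INFRISK"]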

Why is this important or interesting?
- It shows us the ‘adjusted’ relationship.
- It can help us determine:
  - whether the variable is important (at all)
  - whether another form of X is more appropriate
  - whether the correlation is high vs. low after adjustment
  - whether we need to/want to adjust for this variable
- It also informs us about why a variable ‘loses’ significance.
- Most important: it is a check for non-linearity.
- Example: logLOS ~ NURSE.

What about BEDS and NURSE?
# why is NURSE not associated, after adjustment for BEDS?
reg.nurse <- lm(logLOS ~ NURSE, data=data)
reg.nurse.beds <- lm(logLOS ~ NURSE + BEDS, data=data)
reg.xy <- lm(logLOS ~ BEDS, data=data)
res.xy <- reg.xy$residuals
reg.xz <- lm(NURSE ~ BEDS, data=data)
res.xz <- reg.xz$residuals
plot(res.xz, res.xy, pch=16)
reg.res <- lm(res.xy ~ res.xz)
abline(reg.res, lwd=2)
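To see the loss of significance numerically, one can compare the NURSE coefficient before and after adjusting for BEDS (a sketch using the fits above; not on the original slide):

# NURSE coefficient, standard error, t value, and p-value
summary(reg.nurse)$coefficients["NURSE", ]       # unadjusted
summary(reg.nurse.beds)$coefficients["NURSE", ]  # adjusted for BEDS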

What about the other way around?
#######################
# what about the other way? why is BEDS still
# associated after adjustment for NURSE?
reg.xy <- lm(logLOS ~ NURSE, data=data)
res.xy <- reg.xy$residuals
reg.xz <- lm(BEDS ~ NURSE, data=data)
res.xz <- reg.xz$residuals
plot(res.xz, res.xy, pch=16)
reg.res <- lm(res.xy ~ res.xz)
abline(reg.res, lwd=2)
reg.nurse.beds <- lm(logLOS ~ NURSE + BEDS, data=data)

Interpretation in MLR
- “Adjusted for”
- “Controlled for”
- “Holding all else constant”
- In MLR, you need to include one of these phrases (or something like one of them) when interpreting a regression coefficient.

LOS ~ INFRISK + BEDS
> reg.infrisk.beds <- lm(LOS ~ BEDS + INFRISK, data=data)
> summary(reg.infrisk.beds)
Call: lm(formula = LOS ~ BEDS + INFRISK, data = data)
(The slide showed the full summary() output; the numeric estimates were lost in the transcript. The intercept, BEDS, and INFRISK were all statistically significant, with p-values < 2e-16, < 0.01, and on the order of 1e-07, respectively.)

Hard to interpret with so many decimal places!
> data$beds100 <- data$BEDS/100
> reg.infrisk.beds <- lm(LOS ~ beds100 + INFRISK, data=data)
> summary(reg.infrisk.beds)
Call: lm(formula = LOS ~ beds100 + INFRISK, data = data)
(The slide showed the summary() output for the rescaled model; the numeric estimates were lost in the transcript. beds100 and INFRISK remain statistically significant, with the same significance pattern as before.)

logLOS ~ INFRISK + BEDS
> reg.infrisk.beds <- lm(logLOS ~ BEDS + INFRISK, data=data)
> summary(reg.infrisk.beds)
Call: lm(formula = logLOS ~ BEDS + INFRISK, data = data)
(The slide showed the summary() output; the coefficient table was garbled in the transcript. Recoverable pieces: estimates with leading digits 1.926 (intercept), 2.407 (BEDS), and 6.048 (INFRISK), exponents lost, and all three terms statistically significant: intercept p < 2e-16, BEDS p < 0.01, INFRISK p on the order of 1e-07.)

Hard to interpret with so many decimal places!
> data$beds100 <- data$BEDS/100
> reg.infrisk.beds100 <- lm(logLOS ~ beds100 + INFRISK, data=data)
> summary(reg.infrisk.beds100)
Call: lm(formula = logLOS ~ beds100 + INFRISK, data = data)
(The slide showed the summary() output for the rescaled model; the coefficient estimates, residual standard error, and R-squared values were lost in the transcript. Recoverable pieces: beds100 significant at the 0.01 level, INFRISK with p on the order of 1e-07, 110 residual degrees of freedom, and F-statistic 31.1 on 2 and 110 DF with p-value 1.971e-11.)

How to interpret?
- Pick two values of BEDS:
  - e.g. 100 vs. 200
  - e.g. 400 vs. 500
- Estimate the difference in logLOS between the two values.
- What do we plug in for INFRISK?

How to interpret?
- Remember that our inferences are “holding all else constant”.
- To compare two hospitals with the same INFRISK, it doesn’t matter what value you plug in: the INFRISK term cancels out of the difference (see the sketch below).
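A small sketch (not on the original slides) making this concrete with the rescaled fit: the predicted difference in logLOS between 200-bed and 100-bed hospitals is the same whatever INFRISK value is held constant.

# Predicted logLOS difference for 200 vs. 100 beds at two arbitrary INFRISK values
new1 <- data.frame(beds100 = c(1, 2), INFRISK = 4)
new2 <- data.frame(beds100 = c(1, 2), INFRISK = 7)
diff(predict(reg.infrisk.beds100, newdata = new1))  # same as the next line...
diff(predict(reg.infrisk.beds100, newdata = new2))  # ...both equal the beds100 coefficient
coef(reg.infrisk.beds100)["beds100"]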

How to interpret? Comparing two hospitals whose number of beds differs by 100, and assuming the infection risk is the same in both, the ratio of average LOS in the two hospitals is about 1.02, with the hospital with more beds having the longer stay.
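Where the 1.02 comes from (a sketch; it assumes a beds100 coefficient of roughly 0.024, which is consistent with the 1.02 ratio quoted above):

# Ratio of average LOS for a 100-bed difference, holding INFRISK constant
exp(coef(reg.infrisk.beds100)["beds100"])  # roughly exp(0.024) = 1.02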

difference of 400 beds?
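A sketch of the answer (not on the slide): a 400-bed difference multiplies the per-100-beds logLOS coefficient by 4 before exponentiating.

# Ratio of average LOS for a 400-bed difference, holding INFRISK constant
exp(4 * coef(reg.infrisk.beds100)["beds100"])  # roughly exp(0.096) = 1.10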

When the outcome is log transformed
- Interpretation of coefficients can be made as RATIOS instead of DIFFERENCES.
- Need to exponentiate the coefficient.
- Its interpretation is then the ratio for a one-unit difference in the predictor.
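For example (a sketch using the rescaled fit above), exponentiating the coefficients turns them into multiplicative effects on LOS:

# Exponentiated coefficients of the logLOS model, read as ratios
exp(coef(reg.infrisk.beds100))
# beds100: ratio of average LOS per additional 100 beds (about 1.02)
# INFRISK: ratio of average LOS per one-unit difference in infection risk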