Biostatistics Case Studies 2015 Youngju Pak, PhD. Biostatistician Session 4: Regression Models and Multivariate Analyses.



What and Why?
- Multivariate analysis (MVA) techniques allow more than two variables to be analysed at once, in contrast to univariate or bivariate analyses.
- Richer data sets have become available as computational technology has advanced.
- Data reduction or classification, e.g., factor analysis, principal component analysis (PCA).
- Several variables may be correlated to some degree, which creates potential confounding and can bias the result; e.g., analysis of covariance (ANCOVA), multiple linear or generalized linear regression models.

What and Why?
- Many variables may all be interrelated, with multiple dependent and independent variables; e.g., multivariate analysis of variance (MANOVA), path models, structural equation models (SEM), partial least squares (PLS) models.
- This session focuses on multiple regression models.

Why regression models?
- To reduce "random noise" in the data: accounting for additional sources of variability in the dependent variable gives better variance estimates, e.g., ANCOVA.
- To determine an optimal set of predictors for predictive models, e.g., variable selection procedures for multiple regression models.
- To adjust for potential confounding effects, e.g., regression models with covariates.

Actual mathematical models
- ANOVA: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where $Y_{ij}$ is the $j$th observation ($j = 1, 2, \dots, n$) on the $i$th treatment ($i = 1, 2, \dots, l$ levels). The errors $\varepsilon_{ij}$ are assumed to be normally and independently distributed (NID) with mean zero and variance $\sigma^2$.
- ANCOVA with $k$ covariates: $Y_{ij} = \mu + \tau_i + \beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij} + \varepsilon_{ij}$
- MANOVA with $p$ outcome variables: $Y_{(n \times p)} = X_{(n \times [q+1])} \, B_{([q+1] \times p)} + E_{(n \times p)}$
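To make the one-way ANOVA model concrete, the sketch below splits a small balanced data set into the grand mean (an estimate of the mean term), the treatment effects, and the residuals. The group labels and numbers are made up for illustration and do not come from the slides.

```python
# One-way ANOVA decomposition for a balanced design:
# observation = grand mean + treatment effect + residual.
# The data are invented purely for illustration.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 15.0, 16.0],
    "C": [20.0, 19.0, 21.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)  # estimate of mu

group_means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

# Estimated treatment effect: group mean minus grand mean.
tau = {g: m - grand_mean for g, m in group_means.items()}

# Residual: observation minus its own group mean.
residuals = {g: [y - group_means[g] for y in ys] for g, ys in groups.items()}

print("grand mean:", round(grand_mean, 3))
print("treatment effects:", {g: round(t, 3) for g, t in tau.items()})
```

In a balanced design the estimated treatment effects sum to zero, and the residuals within each group sum to zero as well.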

Actual mathematical models
- Simple linear regression models (SLR): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\mu_Y = \beta_0 + \beta_1 X_i$ is the true mean value of $Y$ at $X_i$.
  - $\varepsilon$ = "error" (random noise due to random sampling error), assumed to follow a normal distribution with mean 0 and variance $\sigma^2$.
  - $\beta_0$ and $\beta_1$ = intercept and slope, often called regression (or beta) coefficients.
  - $Y$ = dependent variable (DV), $X$ = independent variable (IV); e.g., $Y$ = insulin sensitivity, $X$ = fatty acid in percentage.
- Multiple linear regression models (MLR)
- Simple logistic models (SL)
- Multiple logistic models (ML)
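The intercept and slope of the SLR model are usually estimated by ordinary least squares. A minimal sketch of the closed-form estimates follows; the function name `fit_slr` and the data are illustrative assumptions, not the insulin sensitivity data from the slides.

```python
def fit_slr(x, y):
    """Return ordinary least squares estimates (b0, b1) for Y = b0 + b1*X + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-products / sum of squares of X.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # The fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data lying exactly on the line y = 2 + 3x.
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
b0, b1 = fit_slr(x, y)
print(b0, b1)
```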

SLR: Example SPSS output
The two-sided p-value for the slope is below alpha = 0.05, so there is statistically significant evidence that the true slope is not zero: fatty acid (%) is significantly related to insulin sensitivity. Mean insulin sensitivity increases by the estimated slope for each one-percentage-point increase in fatty acid (%).

SLR with confidence interval (CI)

Checking the assumptions using a residual plot
The residual plot should look "random": no systematic pattern should appear if the assumptions are met.
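The points of a residual plot are the pairs (fitted value, residual). The sketch below computes them for a small made-up data set; plotting is left to whatever tool is at hand. For a least squares fit with an intercept, the residuals always sum to zero and are uncorrelated with X, so the visual check is about curvature or changing spread, not the average level.

```python
# Made-up data, roughly linear with some noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# These pairs are what a residual plot displays.
for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.2f}  residual={r:+.3f}")
```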

Actual mathematical models
- Multiple linear regression models (MLR): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$, where $\mu_Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$ is the true mean value of $Y$.
- The assumptions are the same as for SLR, with one addition: the Xs must not be highly correlated with one another. If they are, this is called "multicollinearity", which makes the model very unstable.
- Diagnosis for multicollinearity with the variance inflation factor (VIF):
  - VIF = 1: OK
  - VIF < 5: tolerable
  - VIF > 5: problematic; remove the variable with the high VIF or do PCA.
- Multiple linear regression models (MLR)
- Simple logistic models (SL)
- Multiple logistic models (ML)
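For predictor j, the VIF is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. With exactly two predictors, R_j^2 reduces to the squared Pearson correlation between them, which is the case sketched below. The function names, the data, and the handling of the VIF = 5 boundary (treated as tolerable) are assumptions for illustration.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


def classify_vif(vif):
    """Apply the slide's rule of thumb (boundary at 5 is an assumption)."""
    if vif > 5.0:
        return "problematic"
    if vif > 1.0 + 1e-12:
        return "tolerable"
    return "ok"


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x_uncorrelated = [1.0, -2.0, 2.0, -2.0, 1.0]   # correlation 0 with x1
x_moderate = [2.0, 1.0, 4.0, 3.0, 5.0]         # correlation 0.8 with x1
x_collinear = [1.1, 2.0, 2.9, 4.2, 5.0]        # nearly a copy of x1

for name, other in [("uncorrelated", x_uncorrelated),
                    ("moderate", x_moderate),
                    ("collinear", x_collinear)]:
    v = vif_two_predictors(x1, other)
    print(name, round(v, 2), classify_vif(v))
```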

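MLR: Example
Fitted model: $\hat{\mu}_Y = b_0 + 1.634 \times \text{Flexibility} + 0.249 \times \text{Strength}$, where $b_0$ is the estimated intercept and $Y$ is punt distance.
- 1.634 * Flexibility: for every 1 degree increase in flexibility, MEAN punt distance increases by 1.634 feet, adjusting for leg strength.
- 0.249 * Strength: for every 1 lb increase in strength, MEAN punt distance increases by 0.249 feet, adjusting for flexibility.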
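The "adjusting for" reading of each coefficient can be checked numerically: raise one predictor by one unit while holding the other fixed, and the predicted mean moves by exactly that coefficient. In this sketch the two slopes are the ones quoted on the slide; the intercept default of 0.0 and the input values are hypothetical placeholders, and the intercept cancels in the differences.

```python
B_FLEXIBILITY = 1.634  # feet per degree of flexibility (from the slide)
B_STRENGTH = 0.249     # feet per lb of leg strength (from the slide)


def predicted_mean_distance(flexibility, strength, intercept=0.0):
    """Predicted mean punt distance; the intercept value is a placeholder."""
    return intercept + B_FLEXIBILITY * flexibility + B_STRENGTH * strength


# Hypothetical punter: 90 degrees of flexibility, 150 lb leg strength.
base = predicted_mean_distance(90.0, 150.0)
more_flexible = predicted_mean_distance(91.0, 150.0)  # strength held fixed
stronger = predicted_mean_distance(90.0, 151.0)       # flexibility held fixed

print(round(more_flexible - base, 3))
print(round(stronger - base, 3))
```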

What do we mean by "adjusted for"?
- Example with a categorical covariate (gender):

  Mean muscle gain % (N)
            Exercise & Diet    Exercise only
  Male      20% (10)           15% (40)
  Female    10% (40)            5% (10)

- Mean % gain without adjustment for gender:
  - Exercise & Diet: (20% x 10 + 10% x 40) / 50 = 12%
  - Exercise only: (15% x 40 + 5% x 10) / 50 = 13%
- Mean % gain with adjustment for gender:
  - Exercise & Diet: male avg. x 0.5 + female avg. x 0.5 = 20% x 0.5 + 10% x 0.5 = 15%
  - Exercise only: male avg. x 0.5 + female avg. x 0.5 = 15% x 0.5 + 5% x 0.5 = 10%
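The arithmetic above can be reproduced in a few lines. The unadjusted mean weights each gender by its sample size within the group, while the adjusted mean gives both genders equal weight (0.5 each), as on the slide. The dictionary layout and function names are illustrative choices.

```python
# (mean % gain, N) for each treatment group and gender, taken from the table.
cells = {
    "Exercise & Diet": {"Male": (20.0, 10), "Female": (10.0, 40)},
    "Exercise only": {"Male": (15.0, 40), "Female": (5.0, 10)},
}


def unadjusted_mean(group):
    """Sample-size-weighted mean: ignores the gender imbalance."""
    total_n = sum(n for _, n in cells[group].values())
    return sum(mean * n for mean, n in cells[group].values()) / total_n


def adjusted_mean(group):
    """Equal-weight mean over genders: each gender counts 0.5."""
    means = [mean for mean, _ in cells[group].values()]
    return sum(means) / len(means)


for group in cells:
    print(group, unadjusted_mean(group), adjusted_mean(group))
```

Without adjustment the two groups look almost the same (12% vs. 13%); with equal gender weights the Exercise & Diet group is ahead by 5 percentage points (15% vs. 10%).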

Why different?
- The % gain for males is 10 percentage points higher than for females in both groups, so gender is a potential confounder.
- However, the two groups are unbalanced in terms of gender: 80% male in the exercise-only group versus 20% male in the diet & exercise group. This imbalance dilutes the "treatment effect".
- For a continuous covariate such as baseline age, a similar adjustment is performed based on the correlation between % gain and baseline age.

Graphical illustration: adjusting for a continuous covariate
* Changes in adiponectin (a glucose-regulating protein) between two groups

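Multiple Logistic Regression Models
The model: $\text{logit}(\pi) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$, where $\pi = \Pr(\text{event} = 1)$ and $\text{logit}(\pi) = \ln[\pi / (1 - \pi)]$. Equivalently, $\pi = e^{LP} / (1 + e^{LP})$, where $LP = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$.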
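The two forms of the model are inverses of each other: the logit maps a probability to the linear scale, and the expression in LP maps it back. A short sketch follows; the coefficients and covariate values are hypothetical, chosen only to show the round trip.

```python
import math


def linear_predictor(beta0, betas, xs):
    """LP = beta0 + beta1*x1 + ... + betak*xk."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def inv_logit(lp):
    """pi = e^LP / (1 + e^LP)."""
    return math.exp(lp) / (1.0 + math.exp(lp))


def logit(p):
    """logit(pi) = ln[pi / (1 - pi)]."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients and covariate values.
lp = linear_predictor(-1.0, [0.5, 0.2], [2.0, 1.5])
prob = inv_logit(lp)
print(round(lp, 3), round(prob, 3), round(logit(prob), 3))
```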

Interpretation of the coefficients in logistic regression models
- For a continuous predictor, the exponentiated coefficient ($e^{\beta}$) represents the multiplicative change in the odds of Y = 1 for a one-unit increase in X, i.e., the odds ratio for X + 1 versus X.
- Similarly, for a nominal predictor, the exponentiated coefficient represents the odds ratio for one group (X = 1) versus another (X = 0).
- Remember that a multiple logistic model has other covariates. Hence, each coefficient is interpreted with the other covariates adjusted for.
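A quick numerical check of the odds ratio interpretation: compute the predicted odds at X and at X + 1 and take their ratio, which equals e^beta whatever the starting value of X. The intercept and slope below are hypothetical, with age used as an example predictor.

```python
import math

BETA0 = -5.0     # hypothetical intercept
BETA_AGE = 0.1   # hypothetical coefficient for age (per year)


def prob_event(age):
    """Predicted probability from a one-predictor logistic model."""
    lp = BETA0 + BETA_AGE * age
    return math.exp(lp) / (1.0 + math.exp(lp))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Odds ratio for a one-year increase, evaluated at two different ages.
or_at_50 = odds(prob_event(51.0)) / odds(prob_event(50.0))
or_at_70 = odds(prob_event(71.0)) / odds(prob_event(70.0))

print(round(or_at_50, 4), round(or_at_70, 4), round(math.exp(BETA_AGE), 4))
```

The odds ratio is constant across ages, while the change in probability per year is not; this is why coefficients are reported on the odds scale.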

Estimated probability vs. age

Other Models
- Ordinal logistic regression for ordinal responses such as cancer stage I, II, III, IV: assumes a constant odds ratio across the category cut points (proportional odds).
- Poisson regression when responses are count data such as the number of pregnancies: overdispersion is common, and sometimes a negative binomial distribution is used instead.
- Mixed models: commonly used for repeated-measures ANOVA or ANCOVA, with time as a within-subject factor and a random factor. Mixed models are also used for nested designs.
- Cox proportional hazards models: multivariable models for survival data.

General Linear Model vs. Generalized Linear Model (GLM)
- A linear model: general linear model, e.g., ANOVA, ANCOVA, MANOVA, MANCOVA, linear regression, mixed model.
- A nonlinear model: generalized linear model, e.g., logistic, ordinal logistic, Poisson.
- All of these generalized models use a link function for the mean of the response variable (Y), such as a logit link (logistic) or a log link (Poisson).
- GEE (generalized estimating equation) models are an extension of GLM.

Variable Selection Procedures
- Forward: add the new predictor that has the lowest p-value, and repeat this step until no more predictors can be added at the 0.05 alpha level.
- Backward: start with a full model containing all predictors, eliminate the predictor with the highest p-value, and repeat this step until no more predictors can be eliminated at the 0.05 alpha level.
- Stepwise: a combination of forward and backward. Level of stay: 0.01 and level of entry: 0.05 are usually used.
- Many simulation studies suggest that backward selection is the most recommendable.
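The backward procedure can be sketched as a simple loop. Here `pvalues_for` is a hypothetical callable that refits the model on the current predictors and returns one p-value per predictor; in practice it would wrap a statistics package. The toy version below returns fixed, made-up p-values only so that the loop can be run, and keeping a predictor whose p-value equals alpha is an assumption.

```python
def backward_eliminate(predictors, pvalues_for, alpha=0.05):
    """Drop the predictor with the highest p-value until all are <= alpha.

    pvalues_for(current) must return {predictor: p-value} for the model
    refitted with the predictors in `current` (hypothetical interface).
    """
    current = list(predictors)
    while current:
        pvals = pvalues_for(current)
        worst = max(current, key=lambda name: pvals[name])
        if pvals[worst] <= alpha:
            break  # every remaining predictor is significant
        current.remove(worst)
    return current


# Toy stand-in for a model refit: fixed, made-up p-values.
TOY_PVALUES = {"age": 0.001, "bmi": 0.03, "height": 0.40, "shoe_size": 0.75}


def toy_pvalues_for(current):
    return {name: TOY_PVALUES[name] for name in current}


selected = backward_eliminate(list(TOY_PVALUES), toy_pvalues_for)
print(selected)
```

With a real model the p-values change after every refit, which is why the loop recomputes them on each pass instead of sorting once.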