Third training Module, EpiSouth: Multivariate analysis, 15th to 19th June 2009. Multivariate analysis: Introduction.

Presentation transcript:

Multivariate analysis: Introduction
Third training Module, EpiSouth: Multivariate analysis, Madrid, 15th to 19th June 2009
Dr D. Hannoun, National Institute of Public Health, Algeria

Introduction: Generality
Stratification allows us to:
- control confounding
- reveal effect modification
Limits of stratification:
- only a small number of confounders can be controlled simultaneously
- the joint effect of confounders cannot be analysed correctly (+++)
- classes have to be chosen for quantitative variables
→ Other tools: MULTIVARIATE ANALYSIS, to assess the reality of the effect of exposure on the disease

Introduction: Joint effect
Example: Hepatitis B, SEP
Potential confounders: age (children/adults), immunity (good/deficient)
→ Joint effect: the effect of two or more factors combined
→ Marginal effect: the effect of one confounder alone, without taking the other potential confounders into consideration
Table (crude effect: 2.0):
Control on    | Stratum 1 | Stratum 2 | Stratum 3 | Stratum 4 | Adjusted measure
              | F+        | F-        |           |           |
              | F1+/F2+   | F1-/F2-   | F1+/F2-   | F1-/F2+   |
Age (F1)      | 2.0       | 2.0       |           |           | 2.0
Immunity (F2) | 2.0       | 2.0       |           |           | 2.0
Factors 1 + 2 | 1.0       | 1, ,0 (remaining cells incomplete in the transcript)

Multivariate analysis: Definition
Definition: procedures, at the analysis phase, that
- simultaneously adjust for several variables
- simultaneously control for several potential confounders
Several models:
- multiple linear regression
- logistic regression
- Cox regression
- …
Vocabulary:
- disease Y = dependent variable
- risk factors = independent variables or predictors

Multivariate analysis: Definition
How: representation of the disease Y as a function of other variables (risk factors, potential confounders), by modelling the relationship studied
- Set of variables → statistical procedures (multivariate analysis) → the best subset of variables, which describes the relationship between the risk factors (RF) and the disease
- Measure of the relationship: parameters
- Aim: to describe the disease via an equation, with the model that best fits the data

Multivariate analysis: Definition
Writing the model: $E(Y \mid E, X_1, X_2, \dots, X_p) = f(E, X_1, X_2, \dots, X_p)$
- $Y$: a given disease
- $E$: exposure
- $X_1, X_2, \dots$: other variables
Example, $f$ = linear function: $E(Y \mid E, X_1, X_2, \dots, X_p) = \alpha + \beta E + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
$\beta, \beta_1, \beta_2, \dots$ measure the relation between the exposure $E$, the other risk factors $X_1, X_2, \dots$ and the disease $Y$, each controlled for the other variables
If $\beta = 0$ → no relationship between the exposure and the disease

Multivariate analysis: Definition
→ The adjusted measures of association obtained from multivariable analysis are direct effects, not total effects
→ For each variable in the model, we obtain the effect measure of the relationship between this variable and the disease, controlled for the other variables

Multivariate analysis: Advantages
Advantages/techniques:
- estimate effects while controlling for more than one confounder simultaneously
- study the joint effect of several risk factors and quantify the intensity of interaction
- possibility of using continuous risk factors
- study the dose-response relationship: of interest for causality and for the specific risk at intermediate levels
- study the trend in effect according to the level of the risk factor
- predict the disease

Multivariate analysis: Steps
Several steps:
1. Choose the appropriate model to summarise the data
2. Define the variable selection strategy
3. Estimate the model coefficients
   - method of least squares (LS) estimation
   - method of maximum likelihood (ML) estimation
4. Write and interpret the model
5. Assess the adequacy of the model

Multivariate analysis: Choice of the model
Depends on the form of the function f:
1. Nature of the outcome variable
   - continuous outcome → multiple linear regression
   - categorical outcome → logistic regression (LR)
   - outcome is time to an event → Cox regression
2. Nature of the joint effect
   - additive → multiple linear regression
   - multiplicative → logistic regression, Cox regression
3. Form of the variable distribution: normally distributed…
4. Assumptions

Multivariate analysis: Variable selection
The final model depends on which variables are selected:
At the study design:
- decide which variables to adjust or control for
- how each variable will be coded
- which interactions should be considered
At the analytical phase:
- which variables must be entered in the model
- which variables must be forced
- P value
E.g.: 7 variables coded 0/1 with all interaction terms → $2^7 = 128$ coefficients to estimate in the final model!
→ Necessity of a STRATEGY
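The count of 128 can be checked directly. The following is a minimal sketch, assuming a fully saturated model in which every subset of the 7 binary variables contributes one coefficient (the empty subset being the intercept). The function name is illustrative only and does not come from the slides.

```python
from math import comb


def saturated_coefficient_count(n_binary_vars: int) -> int:
    """Number of coefficients when all interaction terms are included.

    One coefficient per subset of variables: the empty subset is the
    intercept, single variables are main effects, and larger subsets
    are interaction terms.
    """
    return sum(comb(n_binary_vars, k) for k in range(n_binary_vars + 1))


count = saturated_coefficient_count(7)
print(count)  # 128
```

The sum of binomial coefficients equals 2 to the power of the number of variables, which is why the number of coefficients doubles with every binary variable added.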

Multivariate analysis: Parameter estimation
Purpose of multivariate analysis: to obtain a measure of effect that describes the exposure-outcome relationship, adjusted for relevant extraneous factors
Parameter estimation depends on the model used:
- in MLR → regression coefficients $\beta$
- in LR → odds ratio
- in Cox → hazard ratio
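In logistic and Cox regression the software estimates a coefficient, and the ratio is obtained by exponentiating it. A minimal sketch of that conversion follows; the function name and the coefficient value are hypothetical and chosen only for illustration.

```python
import math


def ratio_from_coefficient(beta: float) -> float:
    """Exponentiate a logistic (or Cox) regression coefficient.

    exp(beta) is the odds ratio in logistic regression and the hazard
    ratio in Cox regression, for a one-unit increase in the variable.
    """
    return math.exp(beta)


beta_exposure = math.log(2.0)  # hypothetical coefficient, not from the slides
print(ratio_from_coefficient(beta_exposure))
print(ratio_from_coefficient(0.0))  # beta = 0 corresponds to a ratio of 1 (no association)
```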

Multivariate analysis: Model adequacy
Verify the adequacy of the model: the capacity of the model to represent correctly the value of the disease, given the values of the subset of risk factors
Steps:
- adequacy of the model: graphical methods (+++), statistical tests
- interpreting the tests: beware of outliers
The best model is not necessarily the best statistical model: choose the model that gives the best understanding of the disease
→ The fitted model can be used for prediction

MLR: Introduction
MLR = multivariate model used with continuous data
Principle: describe one variable as a linear function of one or more other variables
Form: $E(Y) = f(E, X_1, X_2, \dots)$ → $f$ = linear function
- Simple linear regression model: $E(Y \mid X) = \alpha + \beta X$
- Multiple linear regression model: $E(Y \mid X_1, \dots, X_p) = \alpha + \beta_1 X_1 + \dots + \beta_p X_p$
Figure: the line $E(Y) = \alpha + \beta X$; axis label: Disease

MLR: Introduction
Figure: incidence rate of ARI plotted against atmospheric pollution (density of PM10)
Statistical model: $Y = \alpha + \beta X + \varepsilon$
- $\beta$ = slope of the straight line: estimates the change in $Y$ for a one-unit change in $X$. E.g. when atmospheric pollution increases by 1%, the incidence rate of ARI increases by 2 cases/person
- $\alpha$ = intercept, which corresponds to the value of the disease when the exposure equals 0 or, more generally, describes the baseline
- $\varepsilon$ = error term in the model
In simple linear regression, the fitted line is $\hat{Y} = \hat{\alpha} + \hat{\beta} X$

MLR: Introduction
In multiple linear regression:
Statistical model: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
E.g.: variation of the incidence rate of ARI with atmospheric pollution
Potential confounders: age and smoking
- $X_1$ = density of PM10
- $X_2$ = age of the person
- $X_3$ = smoking
ARI incidence rate $= \alpha + \beta_1 \cdot$ density of PM10 $+ \beta_2 \cdot$ age $+ \beta_3 \cdot$ smoking $+ \varepsilon$

MLR: Introduction
In multiple linear regression:
ARI incidence rate $= \alpha + \beta_1 \cdot$ density of PM10 $+ \beta_2 \cdot$ age $+ \beta_3 \cdot$ smoking $+ \varepsilon$
- $\beta_1$ = slope along the $X_1$ dimension: variation of ARI for a change of one unit of PM10 density, controlled for the other variables
- $\beta_2$ = slope along the $X_2$ dimension: variation of ARI for a change of one unit of age, controlled for the other variables
- $\beta_3$ = slope along the $X_3$ dimension: variation of ARI for a change of one unit of smoking (person/year), controlled for the other variables
- $\alpha$ = intercept: value of the disease when there is no risk factor…
- $\varepsilon$ = error term in the model
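To make the interpretation of the coefficients concrete, here is a minimal sketch that fits the three-variable ARI model by least squares. Everything numeric is an assumption: the six observations are synthetic, and the outcome is generated without noise from made-up coefficients so that the fit can be compared with the values that produced it. The solver is a plain Gauss-Jordan elimination on the normal equations; in practice a statistical package would be used.

```python
def fit_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    Each row of `rows` must already start with 1.0 for the intercept.
    """
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry onto the diagonal
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [v - factor * w for v, w in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Synthetic data (NOT from the slides): PM10 density, age, smoking (0/1)
pm10 = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
age = [25.0, 40.0, 31.0, 58.0, 47.0, 36.0]
smoking = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]

# Made-up "true" coefficients used to generate a noise-free outcome
true_alpha, true_b1, true_b2, true_b3 = 2.0, 0.5, 0.1, 3.0
ari_rate = [
    true_alpha + true_b1 * x1 + true_b2 * x2 + true_b3 * x3
    for x1, x2, x3 in zip(pm10, age, smoking)
]

rows = [[1.0, x1, x2, x3] for x1, x2, x3 in zip(pm10, age, smoking)]
alpha, b1, b2, b3 = fit_least_squares(rows, ari_rate)
print(round(alpha, 6), round(b1, 6), round(b2, 6), round(b3, 6))
```

Each fitted slope is the change in the outcome per unit of its own variable with the other two held fixed, which is what "controlled for the other variables" means in this model.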

MLR: Parameter estimation
Method used: least squares estimation
Principle: identify the straight line that minimises the sum of squared residuals
Figure: least squares line fit, showing for each $X_i$ the observed point $(X_i, Y_i)$ and the fitted point $(X_i, \hat{Y}_i)$
$SSR = \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i (Y_i - \alpha - \beta X_i)^2$
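For a single predictor, the line that minimises the sum of squared residuals has a closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the means. A minimal sketch with four made-up points (the data and function names are illustrative, not from the slides):

```python
def fit_simple_line(x, y):
    """Least squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def sum_squared_residuals(x, y, alpha, beta):
    """SSR = sum over i of (Y_i - alpha - beta * X_i)^2."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up observations, for illustration only
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 4.0]

alpha_hat, beta_hat = fit_simple_line(x, y)
ssr_best = sum_squared_residuals(x, y, alpha_hat, beta_hat)
# Any other line gives a larger SSR, e.g. a slightly steeper one
ssr_other = sum_squared_residuals(x, y, alpha_hat, beta_hat + 0.1)
print(round(alpha_hat, 6), round(beta_hat, 6), round(ssr_best, 6), round(ssr_other, 6))
```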

MLR: Variable selection
Decide which variables to control for, according to the purpose:
1. Prediction of the risk of the disease
   - we do not need to take all confounders into consideration, only the best group of predictors
   - importance in terms of public health (+++)
   - e.g.: incidence rate of ARI; exposure: atmospheric pollution; predictors: age and smoking
2. Estimation of the relation between exposure and disease
   - we have to take ALL confounders into consideration to control confounding
   - importance in terms of causal association
   - e.g.: incidence rate of ARI; exposure: atmospheric pollution; predictors: age, smoking, breastfeeding, ROR…

MLR: Variable selection
Which variables must be entered in the initial model: 2 situations
- some are obligatory in the model because they are recognised as risk factors, e.g. the exposure
- other variables → significant relationship between the variable and the disease in the bivariate analysis
→ Together these are the candidate variables for modelling

MLR: Variable selection
Which interactions should be considered:
- the problem of interaction must be approached in a manner which facilitates understanding of the nature of the causal effect
- statistical considerations should serve rather than determine our objectives
- addition of an interaction term → one more regression coefficient in the equation → the model is more difficult to interpret
- for a given interaction, you must ensure that the variables which appear in the interaction term are themselves contained in the model

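MLR: Variable selection
Example: incidence rate of ARI
1. Model WITH an interaction term
Interaction BETWEEN smoking and age: $\beta_{2,3} X_2 X_3$
- ARI incidence rate $= \alpha + \beta_1 \cdot$ density of PM10 $+ \beta_2 \cdot$ age $+ \beta_3 \cdot$ smoking $+ \beta_{2,3} \cdot$ age $\cdot$ smoking $+ \beta_4 \cdot$ breastfeeding $+ \beta_5 \cdot$ ROR $+ \varepsilon$
- ARI incidence rate $= \alpha + \beta_1 \cdot$ density of PM10 $+ \beta_2 \cdot$ age $+ \beta_{2,3} \cdot$ age $\cdot$ smoking $+ \varepsilon$ (this second model omits the smoking main effect, so it does not satisfy the rule that variables in an interaction term must also be in the model)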
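An interaction term is simply the product of the two variables, entered as an extra column next to both main effects. A minimal sketch, limited to the PM10, age and smoking terms of the model; the function names and the coefficient values in the usage lines are hypothetical.

```python
def design_row(pm10, age, smoking):
    """One row of the design matrix for a model with an age x smoking term.

    Columns: intercept, PM10, age, smoking, age * smoking. Both main
    effects are kept alongside their product.
    """
    return [1.0, pm10, age, smoking, age * smoking]


def age_slope(beta_age, beta_age_smoking, smoking):
    """Change in the outcome per one-unit increase in age, at a given smoking level.

    With the product term in the model the age effect is no longer a
    single number: it equals beta_age + beta_age_smoking * smoking.
    """
    return beta_age + beta_age_smoking * smoking


print(design_row(20.0, 40.0, 1.0))
# Hypothetical coefficients, for illustration only
print(age_slope(0.1, 0.05, 0.0))  # age effect among non-smokers
print(age_slope(0.1, 0.05, 1.0))  # age effect among smokers
```

This is why the interaction makes the model harder to interpret: the effect of age has to be reported separately for each level of smoking.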

MLR: Variable selection
Which variables must be entered in the initial model: 2 situations …
How the variables must be entered in the initial model: a strategy must be defined
- start with ALL variables → backward elimination
- start with NO variable → forward selection
- mix of the two previous methods → stepwise selection

MLR: Variable selection
Flow chart of the selection process:
1. At the study design: sex, age, pollution, ROR, smoking, breastfeeding, region, profession, age*smoking
2. First part of the analytical phase: bivariate analysis and stratification
   - significant variables: pollution, age, smoking, breastfeeding, ROR
   - variables that must be forced: pollution
   → candidate variables for modelling: the largest possible model
3. Second part of the analytical phase: multivariate analysis
   - define how the variables are entered in the model: backward, forward, stepwise
   - rules
   → final model: pollution, age, smoking

MLR: Backward strategy
Principle:
- begin with ALL candidate variables in the model → largest POSSIBLE model
- at each step, drop one variable; the choice of this variable is based on statistical rules → remove a variable which is not significant
- continue until no more variables can be dropped, meaning all remaining variables are relevant
Advantages: evaluates the joint confounding effects of all variables
Limits: with many risk factors, strata could provide no information

MLR: Forward strategy
Principle:
- begin with NO variable in the model → smallest POSSIBLE model
- at each step, keep one variable in the model; the choice of this variable is based on statistical rules
- start with the variable that has the biggest change-in-estimate impact when evaluated individually
- keep each variable which meaningfully changes the adjusted estimate
- continue until no other variable can be added
Advantages: avoids the initial sparse-cell problem of the backward approach
Limits: does not evaluate the joint confounding effects of many variables
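The forward strategy with a change-in-estimate rule can be sketched as follows. This is one possible reading of the principle, not the presenter's exact procedure: the 10% threshold is an assumption (the slides do not give a cut-off), the data are synthetic, and the exposure is always kept in the model. The sketch also assumes the exposure coefficient is never exactly zero, since the change is expressed relative to it.

```python
def fit_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(rows[0])
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [v - factor * w for v, w in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def exposure_coefficient(exposure, covariates, y):
    """Coefficient of the exposure in y ~ 1 + exposure + covariates."""
    rows = [[1.0, e] + [c[i] for c in covariates] for i, e in enumerate(exposure)]
    return fit_least_squares(rows, y)[1]


def forward_change_in_estimate(exposure, candidates, y, threshold=0.10):
    """Add, one at a time, the candidate that changes the exposure estimate most.

    Stops when no remaining candidate changes the estimate by at least
    `threshold` (relative change; 10% is an assumed cut-off).
    """
    selected = []
    current = exposure_coefficient(exposure, [], y)  # crude estimate
    remaining = dict(candidates)
    while remaining:
        changes = {}
        for name, column in remaining.items():
            covariates = [candidates[s] for s in selected] + [column]
            new = exposure_coefficient(exposure, covariates, y)
            changes[name] = (abs(new - current) / abs(current), new)
        best = max(changes, key=lambda name: changes[name][0])
        change, new = changes[best]
        if change < threshold:
            break
        selected.append(best)
        current = new
        del remaining[best]
    return selected, current


# Synthetic data: the outcome depends on the exposure and on one confounder only
exposure = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
candidates = {
    "confounder": [0.0, 1.0, 1.0, 2.0, 0.0, 1.0, 1.0, 2.0],   # associated with exposure
    "unrelated": [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0],  # independent of exposure
}
y = [1.0 + e + 2.0 * c for e, c in zip(exposure, candidates["confounder"])]

selected, adjusted = forward_change_in_estimate(exposure, candidates, y)
print(selected, round(adjusted, 6))
```

In this constructed example the crude exposure coefficient is 3, and adjusting for the confounder brings it back to the value 1 used to generate the data; the unrelated variable never changes the estimate and is left out.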

MLR: Conclusion
Goal of modelling: to obtain
- the smallest subset of relevant risk factors that describes the disease
- with the best understanding of the disease
As for stratification, you must:
- first, identify significant interaction terms (do not forget to verify that the variables in the interaction term are contained in the model) → statistical significance + biological considerations
- second, assess the confounding effect → no statistical test
Retain the significant risk factors, the confounders and the interaction terms that help us to understand and explain the occurrence of the disease

Conclusion
- Multivariate analysis allows us to control and adjust the effect of exposure for several extraneous factors simultaneously
- The adjusted measures of association are direct effects, not total effects
- Multivariate analysis is a useful tool, but it can be very dangerous if the strategy has not been defined beforehand:
  - purpose of the study
  - method of variable selection
  - assumptions
  - adequacy of the model…

Conclusion
- As with the stratification method, statistical considerations should serve rather than determine our objectives
- Multivariate analysis requires a computer to run the statistical programme
- The choice of the model depends on many factors: the outcome variable, the form of the relationship between exposure and disease…