Published by Ratna Tedjo. Modified over 5 years ago.
1
Correlation
Önder Ergönül, MD, MPH
Koç University, School of Medicine
Summer Course on Research Methodology and Ethics in Medical Sciences, June 10-21, 2017, Istanbul
14
Example
20
Multivariate Analysis
Önder Ergönül, MD, MPH Koç University, School of Medicine Summer Course on Research Methodology and Ethics in Medical Sciences June 17-28, 2019, Istanbul
21
Background
22
Percentages by period (only the 1989 column heading is preserved):
Descriptive only: 27, 12, 13
Statistics tables (contingency tables): 36, 53
Epidemiologic measures (relative risk, odds): 10, 22, 35
Survival analysis: 11, 32, 61
Multivariate analysis (regression): 5, 14, 51
Power analysis: 3, 39
Horton NJ, Switzer SS. NEJM 2005; 353.
23
Multivariate analyses
Regression: Dependent variable (outcome)
Linear: Continuous
Logistic: Dichotomous
Cox
Poisson
24
Confounder
Diagram: Variable A and Outcome, with Variable B acting as the confounder
25
Confounder
Diagram: Acinetobacter infection and fatality, with the severity index (APACHE score) acting as the confounder
26
Control of the confounders
Randomization
Stratification
Adjustment by multivariate analysis
27
Odds Ratio
p = probability (or proportion)
The lower bound is 0, and the upper bound is 1.
Probability of success: Pr(y = 1) = p
Probability of failure: Pr(y = 0) = 1 - p
28
What is the p of success or failure?
(1 - p) + p = 1
p = 0.75, 1 - p = 0.25
Odds = p/(1 - p) = 0.75/(1 - 0.75) = 0.75/0.25 = 3
Odds ratio = [pA/(1 - pA)] / [pB/(1 - pB)]
29
Relative Risk
           DVT   No DVT
Heparin      8       92
Placebo     18       82
Risk(heparin) = 8/100 = 0.08
Risk(placebo) = 18/100 = 0.18
Relative risk = Risk(placebo) / Risk(heparin) = 0.18/0.08 = 2.25
30
Odds Ratio
           DVT   No DVT
Heparin      8       92
Placebo     18       82
Odds(heparin) = 8/92 = 0.087
Odds(placebo) = 18/82 = 0.22
Odds ratio = Odds(placebo) / Odds(heparin) = 0.22/0.087 = 2.53 (2.52 if the odds are not rounded first)
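The two calculations can be checked side by side. The sketch below is only an illustration using the counts from the heparin/placebo table; the function and variable names are made up for this example.

```python
def risk(events, non_events):
    """Risk = events divided by everyone in the group."""
    return events / (events + non_events)


def odds(events, non_events):
    """Odds = events divided by non-events."""
    return events / non_events


# Counts from the slide: (DVT, no DVT) in each group.
heparin = (8, 92)
placebo = (18, 82)

relative_risk = risk(*placebo) / risk(*heparin)
odds_ratio = odds(*placebo) / odds(*heparin)

print(f"Relative risk: {relative_risk:.2f}")
print(f"Odds ratio:    {odds_ratio:.2f}")
```

Working from the raw counts gives an odds ratio of about 2.52; the 2.53 on the slide comes from rounding the two odds before dividing.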
31
Comparing risks and odds
Risk            Odds
0.05 or 5%      0.053
0.1 or 10%      0.11
0.2 or 20%      0.25
0.3 or 30%      0.43
0.4 or 40%      0.67
0.5 or 50%      1
0.6 or 60%      1.5
0.7 or 70%      2.3
0.8 or 80%      4
0.9 or 90%      9
0.95 or 95%     19
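The risk-to-odds comparison follows directly from odds = p/(1 - p). A minimal sketch that regenerates the same pairs (the helper names are hypothetical, chosen for this example):

```python
def risk_to_odds(p):
    """Convert a risk (probability) in [0, 1) to odds."""
    if not 0 <= p < 1:
        raise ValueError("risk must be in [0, 1)")
    return p / (1 - p)


def odds_to_risk(o):
    """Convert odds back to a risk (probability)."""
    return o / (1 + o)


risks = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
for p in risks:
    # Risk and odds are close when the risk is small and diverge as it grows.
    print(f"risk {p:.2f}  odds {risk_to_odds(p):.3g}")
```

For rare outcomes the two measures nearly coincide, which is why an odds ratio approximates a relative risk only when the risk is low.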
32
Confounder
Diagram: OC and MI, with smoking acting as the confounder
33
Oral contraceptives (OC) and myocardial infarction (MI)
Case-control study, unstratified data
Table: OC (Yes, No as reference, Total) against MI cases and controls, with OR (cell values not preserved)
34
Oral contraceptives (OC) and myocardial infarction (MI)
Case-control study, unstratified data
Table: Smoking (Yes, No as reference, Total) against MI cases and controls, with OR (cell values not preserved)
35
Odds ratio for OC adjusted for smoking = 4.5
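The slides do not say how the adjusted odds ratio was obtained. One standard way to adjust by stratification is the Mantel-Haenszel estimator, sketched below as an assumption rather than as the method used in the course. The stratum counts are hypothetical (the original tables were not preserved) and do not reproduce the value 4.5.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed cases/controls; c, d = unexposed."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR over a list of (a, b, c, d) strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts, one 2x2 table per level of the confounder.
non_smokers = (4, 10, 20, 100)
smokers = (60, 30, 20, 20)

# Crude table: add the strata cell by cell.
crude = tuple(x + y for x, y in zip(non_smokers, smokers))

print("Crude OR:   ", odds_ratio(*crude))
print("Adjusted OR:", mantel_haenszel_or([non_smokers, smokers]))
```

In this made-up example the odds ratio is 2 within each stratum, yet the crude odds ratio is 4.8, because exposure is much more common among smokers. That gap between crude and adjusted estimates is what signals confounding.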
36
Regression
37
From Correlation to Regression
Linear regression, logistic regression
The simplest case: Y depends linearly on X
E(Y) = β0 + β1x
β0, β1: parameters
38
Regression
X: predictor or independent variable
Y: outcome or dependent variable
Y = a + bx
All three candidate lines are possible. The best one summarizes the data by lying closest to the observed points.
How to define "closest": square the vertical distances between the points and the line, then select the line that minimizes the sum of squared distances.
The resulting a and b are the least squares estimates.
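The least squares idea can be written out in a few lines. This is a minimal sketch using the closed-form solution for a single predictor; the data points are hypothetical and the function names are chosen for this example only.

```python
def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared vertical distances to y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def sum_of_squares(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
print(f"fitted line: y = {a:.2f} + {b:.2f} x")
print("sum of squares at the fit:      ", sum_of_squares(a, b, xs, ys))
print("sum of squares for another line:", sum_of_squares(a + 0.5, b, xs, ys))
```

Any other line, such as the shifted one in the last print, gives a larger sum of squared distances than the fitted line.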
39
Scatterplot and Linear Regression
40
Linear Regression
E(Y) = β0 + β1x
Weight = -44.16 + 0.55 * height
Parameters:
β0: intercept
β1: slope, regression coefficient
41
Scatterplot: SBP (mm Hg) against age (years)
42
Simple linear regression
Relation between 2 continuous variables (SBP and age)
Regression coefficient b1
Measures association between y and x
Amount by which y changes on average when x changes by one unit
Least squares method
43
Logistic regression
Age and signs of coronary heart disease (CD)
44
Logistic function
Figure: probability of disease as a function of x
45
Why is it called “Logistic” regression?
It uses the logit transformation. The logit transformation can be interpreted as the logarithm of the odds of success vs. failure.
46
Transformation
logit of P(y|x)
α = log odds of disease in the unexposed
β = log odds ratio associated with being exposed
e^β = odds ratio
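The formula itself did not survive in the transcript. Written out in the standard form for a single exposure x (an assumption about what the slide displayed, using the same α and β), the transformation and the reason that e^β is an odds ratio are:

```latex
\begin{aligned}
\operatorname{logit} P(y \mid x) &= \ln\frac{P(y \mid x)}{1 - P(y \mid x)} = \alpha + \beta x \\
P(y \mid x) &= \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} \\
\text{odds}(x = 0) &= e^{\alpha}, \qquad \text{odds}(x = 1) = e^{\alpha + \beta} \\
\text{odds ratio} &= \frac{e^{\alpha + \beta}}{e^{\alpha}} = e^{\beta}
\end{aligned}
```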
47
Fitting equation to the data
Linear regression: least squares
Logistic regression: maximum likelihood
Likelihood function
Estimates parameters a and b
Practically easier to work with log-likelihood
48
Maximum likelihood
Iterative computing
Choice of an arbitrary value for the coefficients (usually 0)
Computing of log-likelihood
Variation of coefficients' values
Reiteration until maximisation (plateau)
Results
Maximum likelihood estimates (MLE) for α and β
Estimates of P(y) for a given value of x
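The iterative procedure can be sketched for one binary exposure. The slides do not name an algorithm; Newton-Raphson is used here as one common choice, which is an assumption of this example. The data are the heparin/placebo DVT counts from the earlier slides, coded with x = 1 for placebo.

```python
import math


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Fit logit P(y=1|x) = a + b*x by Newton-Raphson, starting from a = b = 0."""
    a, b = 0.0, 0.0
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        ll = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ll += y * math.log(p) + (1 - y) * math.log(1 - p)
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        if abs(ll - prev_ll) < tol:   # plateau reached
            break
        prev_ll = ll
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b, ll


# x = 0: heparin (8 DVT, 92 no DVT); x = 1: placebo (18 DVT, 82 no DVT).
xs = [0] * 100 + [1] * 100
ys = [1] * 8 + [0] * 92 + [1] * 18 + [0] * 82

alpha, beta, log_likelihood = fit_logistic(xs, ys)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, odds ratio = {math.exp(beta):.2f}")
```

With a single binary exposure the maximum likelihood estimate of e^β equals the odds ratio computed directly from the 2x2 table, so the fit can be checked by hand.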
49
Why Do We do Logistic Regression so much?
Predict the likelihood of discrete outcomes
Group membership
Binary outcome (disease/no disease)
Quite flexible statistical assumptions
No need for assumptions about the distributions of the predictor variables
Predictors do not have to be normally distributed
Predictors do not have to be linearly related
Predictors do not have to have equal variance within each group
Very good at giving odds ratios
50
Construction of Model
51
Construction of Model
1. Perform univariate statistics (outliers, distribution, gaps)
2. Perform univariate analysis
3. Transform nominal independent variables to dichotomous (dummy) variables
52
Occupation    Nominal    Dummy
Farmer
Nurse         1
Housewife     2
Physician     3
              4
Policeman     5
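Dummy coding replaces one nominal variable with a set of 0/1 variables, one per category except a reference category. The sketch below assumes "Farmer" as the reference, which the transcript does not confirm, and uses only the occupations whose names survived.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 column per level other than the reference."""
    # dict.fromkeys keeps the first-seen order and removes duplicates.
    levels = [level for level in dict.fromkeys(values) if level != reference]
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


occupations = ["Farmer", "Nurse", "Housewife", "Physician", "Policeman", "Nurse"]
levels, rows = dummy_code(occupations, reference="Farmer")

print("dummy variables:", levels)
for occupation, row in zip(occupations, rows):
    print(f"{occupation:10s} {row}")
```

The reference category is the row of all zeros, so each dummy coefficient in the model is read as a comparison against that category.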
53
Construction of Model Multicollinearity
4. Run a correlation matrix. If any pair of independent variables is correlated at > 0.90 (multicollinearity), decide which one to keep and which one to exclude. A more practical way is to consider the "biologic relation" (smoking, carrying matches, and cancer).
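The correlation screen can be sketched as a loop over every pair of independent variables. The variable names and values below are hypothetical, chosen to echo the smoking and matches example; the 0.90 cut-off is the one given on the slide.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.90):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical data for six patients.
columns = {
    "cigarettes_per_day": [0, 5, 10, 20, 30, 40],
    "matches_per_week": [0, 6, 11, 19, 31, 39],
    "age": [34, 61, 45, 52, 29, 48],
}

for name_a, name_b, r in collinear_pairs(columns):
    print(f"{name_a} and {name_b}: r = {r:.3f}")
```

A flagged pair only says the two variables carry nearly the same information; which one to keep is still a subject-matter decision, as the slide notes.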
54
Diagnosing confounder
Outcome: Fatality
Variable A: Acinetobacter infection
Variable B (confounder): APACHE score
55
How to choose variables?
Kitchen sink
Inclusion of significant variables
Forward selection
Backward selection
56
An example
Outcome: Deep Vein Thrombosis (DVT)
Independent variables: heparin, gender, coronary heart disease (CAD), aspirin use
Y = a + b1x1 + b2x2 + b3x3 + b4x4
DVT = a + b1(heparin) + b2(female) + b3(CAD) + b4(aspirin)
Approximate odds ratios: heparin 0.5, female 1.5, CAD 3, aspirin 0.6
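In a logistic model each coefficient is the log of an odds ratio, so the odds ratios of the terms that apply to a patient multiply. The sketch below uses the four rounded odds ratios from this example; the intercept is a hypothetical value introduced only so that a probability can be computed.

```python
import math

# Rounded odds ratios from the example; coefficients are their logarithms.
odds_ratios = {"heparin": 0.5, "female": 1.5, "cad": 3.0, "aspirin": 0.6}
coefficients = {name: math.log(value) for name, value in odds_ratios.items()}


def odds_multiplier(patient):
    """Factor by which the patient's odds differ from the baseline odds."""
    return math.exp(sum(coefficients[name] * patient[name] for name in coefficients))


def predicted_probability(intercept, patient):
    """P(DVT) from the logistic model, given an intercept on the log-odds scale."""
    log_odds = intercept + sum(coefficients[name] * patient[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-log_odds))


# A woman with CAD who receives heparin and takes no aspirin.
patient = {"heparin": 1, "female": 1, "cad": 1, "aspirin": 0}
hypothetical_intercept = math.log(0.1)   # assumed baseline odds of 0.1

print("odds multiplier:", round(odds_multiplier(patient), 2))   # 0.5 * 1.5 * 3
print("predicted probability:", round(predicted_probability(hypothetical_intercept, patient), 3))
```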
57
Logistic regression
Number of obs = 10
LR chi2(4) = 0.97
Prob > chi2 =
Log likelihood =
Pseudo R2 =
DVT | Odds Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
Heparin |
Kadin (female) |
KAH (CHD) |
aspirin |

Variable       OR      p-value   95% CI
Heparin use    0.5     0.003     0.15-0.72
Female         1.48    0.504
CHD            3.06    0.009
Aspirin use    0.58    0.622

Note: on the next slide, show that the OR changes in other regressions.
58
Assessment of the Model
59
Methods for measuring how well a model accounts for the outcome
Columns: multiple linear regression, multiple logistic regression, proportional hazard analysis
Model accounts for outcome better than chance: F test; likelihood ratio test (LR)
Quantitative/qualitative assessment of how well the model accounts for the outcome: R2; R2 (rarely used); comparison of estimated to observed value; Hosmer-Lemeshow; comparison of estimated to observed value
Prediction of outcome: NA; sensitivity, specificity, accuracy, c index
60
LR and p value of the test
For both logistic regression and proportional hazard analysis: If chi square of the LR is large, the p value will be small, and the null hypothesis can be rejected.
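The link between a large LR chi-square and a small p value can be made concrete. In this sketch the LR statistic is taken as twice the gain in log-likelihood of the fitted model over the null model, and the p value uses the closed form of the chi-square tail that holds only for an even number of degrees of freedom (a simplification chosen to avoid external libraries). The log-likelihood values are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j       # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(ll_null, ll_full, df):
    """Return (LR chi-square, p value) comparing a fitted model with the null model."""
    statistic = 2.0 * (ll_full - ll_null)
    return statistic, chi2_sf_even_df(statistic, df)


# Hypothetical log-likelihoods for a model with 4 independent variables.
statistic, p_value = likelihood_ratio_test(ll_null=-68.0, ll_full=-60.5, df=4)
print(f"LR chi2(4) = {statistic:.2f}, p = {p_value:.4f}")
```

For odd degrees of freedom a statistics library would be needed; the point here is only that the p value shrinks quickly as the LR statistic grows.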
61
R2
R2 is a quantitative measure of how well the independent variables account for the outcome.
When R2 is multiplied by 100, it can be thought of as the percentage of the variance in the dependent variable explained by the independent variables.
62
Goodness of Fit
r² is between 0 and 1
r² = 1: perfect fit
r² > 0.8: pretty good fit
r² ≈ 0.3: what we usually see!
r² ≈ 0: poor fit (x does not add any information about y)
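For a linear model, r² can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with hypothetical observed and predicted values:

```python
def r_squared(observed, predicted):
    """Share of the variance in the observed values explained by the predictions."""
    mean_y = sum(observed) / len(observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical outcome values and two sets of predictions.
observed = [2.0, 4.0, 6.0, 8.0]
good_predictions = [2.2, 3.9, 6.1, 7.8]
mean_only = [5.0, 5.0, 5.0, 5.0]   # a model in which x adds no information

print("good model:", round(r_squared(observed, good_predictions), 3))
print("mean only: ", r_squared(observed, mean_only))
```

Predicting the mean for everyone gives r² = 0, which is the "x does not add any information" end of the scale.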
64
Interactions How can interactions be taken into consideration?
Let X and Y be two explanatory variables, then include X·Y in the model. If X is a categorical variable with dummy variables (X1, ..., Xm-1), then X·Y = (X1·Y, ..., Xm-1·Y). Interactions change the multiplicative character of the OR.
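A short sketch of both points: building the product columns for a dummy-coded X, and seeing how an interaction coefficient changes the joint odds ratio. All coefficients and data values are hypothetical.

```python
import math


def interaction_columns(dummy_rows, y_values):
    """Multiply each dummy variable of X by Y, row by row: (X1*Y, ..., Xm-1*Y)."""
    return [[d * y for d in row] for row, y in zip(dummy_rows, y_values)]


def joint_odds_ratio(x, y, b_x, b_y, b_xy=0.0):
    """Odds ratio versus x = y = 0 in a model with terms b_x*x + b_y*y + b_xy*x*y."""
    return math.exp(b_x * x + b_y * y + b_xy * x * y)


# Hypothetical dummy rows for a three-level X and a binary Y.
dummy_rows = [[0, 0], [1, 0], [0, 1], [1, 0]]
y_values = [1, 1, 0, 0]
print(interaction_columns(dummy_rows, y_values))

# Hypothetical coefficients: OR 2 for X, OR 3 for Y, interaction factor 1.5.
b_x, b_y, b_xy = math.log(2), math.log(3), math.log(1.5)
print("without interaction:", round(joint_odds_ratio(1, 1, b_x, b_y), 2))
print("with interaction:   ", round(joint_odds_ratio(1, 1, b_x, b_y, b_xy), 2))
```

Without the product term the joint odds ratio is simply 2 x 3; with it, the joint odds ratio is no longer the product of the two separate odds ratios.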
65
SURVIVAL ANALYSIS
TIME TO EVENT
TIME TO DEATH
66
Exposure and outcome exposure NO exposure
67
Person-Time
Follow-up chart, January to June, with the time at risk per person (A: 3 months)
Total person-time = 3 + 6 + 2 = 11 months
68
The Reason for Survival Analysis
Censored cases
During follow-up, a case is censored when the expected outcome did not happen, or when the case was lost to follow-up or dropped out.
Patients do not all need to start together.
69
Survival Analysis
Life table
Kaplan-Meier curve
Cox regression
70
Kaplan-Meier Curve
Duration = time to event (time until an event occurs)
"Event" = fatality, survival, relapse...
Figure: number of patients (50, 49, 44, 42, 40, 39) against time in months (2, 4, 6, 8)
The log-rank test compares 2 curves statistically.
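The Kaplan-Meier estimate multiplies, at each event time, the proportion of patients at risk who survive that time; censored patients leave the risk set without counting as events. A minimal sketch with hypothetical follow-up data (not the patient counts from the figure):

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where at least one event occurs.

    events[i] is 1 if the event was observed at times[i], 0 if the case was censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)   # events plus censored at t
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months; 0 marks a censored case.
times = [2, 4, 4, 6, 8]
events = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(times, events):
    print(f"month {t}: estimated survival {s:.2f}")
```

The curve steps down only at months with an observed event, which is why censored cases shorten the risk set without producing a step.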
71
Kaplan-Meier Curve
Figure: number of patients (50, 49, 44, 42, 40, 39) by month (2, 4, 6, 8), treatment versus placebo
72
Cox Regression
Hazard ratio
Logistic regression
Number of obs = 10
LR chi2(4) = 0.97
Prob > chi2 =
Log likelihood =
Pseudo R2 =
DVT | Odds Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
Heparin |
Kadin (female) |
KAH (CHD) |
aspirin |
Cox regression reports a hazard ratio where logistic regression reports an odds ratio.