Önder Ergönül, MD, MPH Koç University, School of Medicine

1 Correlation
Önder Ergönül, MD, MPH Koç University, School of Medicine Summer Course on Research Methodology and Ethics in Medical Sciences June 10-21, 2017, Istanbul

14 Example

20 Multivariate Analysis
Önder Ergönül, MD, MPH Koç University, School of Medicine Summer Course on Research Methodology and Ethics in Medical Sciences June 17-28, 2019, Istanbul

21 Background

22 % 1989
Descriptive only 27 12 13
Statistics tables (contingency tables) 36 53
Epidemiologic measures (relative risk, odds) 10 22 35
Survival analysis 11 32 61
Multivariate analysis (regression) 5 14 51
Power analysis 3 39
Horton NJ, Switzer SS. NEJM 2005; 353.

23 Multivariate analyses
Regression: Dependent variable (outcome)
Linear: Continuous
Logistic: Dichotomous
Cox
Poisson

24 Confounder Variable A Outcome Variable B

25 Confounder
Exposure: Acinetobacter infection. Outcome: fatality. Confounder: severity index (APACHE score).

26 Control of the confounders
Randomization Stratification Adjustment by multivariate analysis

27 Odds Ratio p = probability (or proportion)
The lower bound is 0, and the upper bound is 1. Probability of success: Pr(y = 1) = p Probability of failure: Pr(y = 0) = 1 – p

28 What is the p of success or failure?
Failure: 1 - p = 0.25. Success: p = 0.75. Total: (1 - p) + p = 1.
Odds = p/(1 - p) = 0.75/(1 - 0.75) = 0.75/0.25 = 3
Odds ratio = [pA/(1 - pA)] / [pB/(1 - pB)]

29 Relative Risk
           DVT   No DVT
Heparin      8       92
Placebo     18       82
Risk (heparin) = 8/100 = 0.08
Risk (placebo) = 18/100 = 0.18
Relative risk = risk (placebo) / risk (heparin) = 0.18/0.08 = 2.25

30 Odds Ratio
           DVT   No DVT
Heparin      8       92
Placebo     18       82
Odds (heparin) = 8/92 = 0.087
Odds (placebo) = 18/82 = 0.22
Odds ratio = odds (placebo) / odds (heparin) = 0.22/0.087 = 2.53
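The two slides above reduce to a few lines of arithmetic. The following is a minimal sketch using the DVT counts from the slides; the function and variable names are illustrative only. Note that without rounding the intermediate odds, the odds ratio comes out at about 2.52 rather than the slide's 2.53.

```python
def risk(events, non_events):
    """Risk = events / everyone in the group."""
    return events / (events + non_events)


def odds(events, non_events):
    """Odds = events / non-events."""
    return events / non_events


# (DVT, no DVT) counts taken from the slides.
heparin = (8, 92)
placebo = (18, 82)

relative_risk = risk(*placebo) / risk(*heparin)
odds_ratio = odds(*placebo) / odds(*heparin)

print("Relative risk:", round(relative_risk, 2))
print("Odds ratio:", round(odds_ratio, 2))
```

The odds ratio is further from 1 than the relative risk, which is the usual pattern when the outcome is not rare.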

31 Comparing risks and odds
Risk            Odds
0.05 or 5%      0.053
0.1 or 10%      0.11
0.2 or 20%      0.25
0.3 or 30%      0.43
0.4 or 40%      0.67
0.5 or 50%      1
0.6 or 60%      1.5
0.7 or 70%      2.3
0.8 or 80%      4
0.9 or 90%      9
0.95 or 95%     19
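The risk-to-odds pairs on this slide follow from odds = p/(1 - p). A small sketch that regenerates them (the function names are illustrative):

```python
def risk_to_odds(p):
    """Convert a risk (probability) in [0, 1) to odds."""
    if not 0 <= p < 1:
        raise ValueError("risk must be in [0, 1)")
    return p / (1 - p)


def odds_to_risk(o):
    """Convert odds back to a risk (probability)."""
    return o / (1 + o)


for p in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
    print(f"risk {p:.2f} -> odds {risk_to_odds(p):.3f}")
```

For small risks the odds and the risk are nearly equal; they diverge quickly once the risk exceeds about 20%.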

32 Confounder OC MI Smoking

33 Oral contraceptives (OC) and myocardial infarction (MI)
Case-control study, unstratified data OC MI Controls OR Yes No Ref. Total

34 Oral contraceptives (OC) and myocardial infarction (MI)
Case-control study, unstratified data Smoking MI Controls OR Yes No Ref. Total

35 Odds ratio for OC adjusted for smoking = 4.5
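The slide does not say how the adjusted odds ratio was obtained, and the stratified counts are not preserved in this transcript. One common way to adjust for a stratifying variable such as smoking is the Mantel-Haenszel estimator. The sketch below assumes that method and uses hypothetical counts chosen for easy checking; it does not reproduce the slide's 4.5.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata.

    Each stratum is (a, b, c, d) =
    (exposed cases, exposed controls, unexposed cases, unexposed controls).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts, NOT the data behind the slide.
hypothetical_strata = [
    (10, 5, 20, 40),   # smokers
    (20, 10, 40, 80),  # non-smokers
]

adjusted_or = mantel_haenszel_or(hypothetical_strata)
print("Smoking-adjusted OR:", round(adjusted_or, 2))
```

With a single stratum the estimator reduces to the ordinary cross-product ratio ad/bc.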

36 Regression

37 From Correlation to Regression
Linear regression, logistic regression
The simplest case: Y depends linearly on X
E(Y) = β0 + β1x
β0, β1: PARAMETERS

38 Regression
X: predictor or independent variable
Y: outcome or dependent variable
Y = a + bx
Several lines are possible (the figure shows 3); the best one summarizes the data by lying closest to the observed points.
How to define "closest": take the squared vertical distances between the points and the line, then select the line that minimizes the sum of squared distances (the least squares estimates).
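The least squares idea on this slide can be sketched with the closed-form solution for a single predictor. The data points and the two competing lines are made up for illustration; only the method comes from the slide.

```python
def least_squares(xs, ys):
    """Return intercept a and slope b minimizing the sum of squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_distances(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
best = sum_squared_distances(xs, ys, a, b)

# Two arbitrary competing lines; neither should beat the least squares line.
for other_a, other_b in [(0.0, 2.5), (1.0, 1.5)]:
    assert sum_squared_distances(xs, ys, other_a, other_b) >= best

print(f"Y = {a:.2f} + {b:.2f} x")
```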

39 Scatterplot and Linear Regression

40 Linear Regression
E(Y) = β0 + β1x
Weight = -44.16 + 0.55 × height
Parameters: β0: INTERCEPT; β1: SLOPE, regression coefficient
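To make the intercept and slope concrete, the fitted equation from the slide can be evaluated directly. The slide does not state units; kilograms and centimetres are assumed here, and the function name is illustrative.

```python
INTERCEPT = -44.16  # β0 from the slide
SLOPE = 0.55        # β1 from the slide


def predicted_weight(height):
    """Expected weight for a given height (units assumed: kg and cm)."""
    return INTERCEPT + SLOPE * height


print(round(predicted_weight(170), 2))

# The slope is the average change in weight per one-unit change in height.
print(round(predicted_weight(171) - predicted_weight(170), 2))
```

The intercept has no direct interpretation here, since a height of 0 lies far outside any observed data.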

41 Scatterplot: SBP (mm Hg) against age (years)

42 Simple linear regression
Relation between 2 continuous variables (SBP and age)
Regression coefficient b1 (the slope): measures the association between y and x; the amount by which y changes on average when x changes by one unit
Least squares method

43 Logistic regression
Age and signs of coronary heart disease (CD)

44 Logistic function
Plot: probability of disease against x

45 Why is it called “Logistic” regression?
It uses the logit transformation. The logit transformation can be interpreted as the logarithm of the odds of success vs. failure.

46 Transformation
logit of P(y|x) = ln[ P(y|x) / (1 - P(y|x)) ] = α + βx
α = log odds of disease in unexposed
β = log odds ratio associated with being exposed
e^β = odds ratio
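The relationship between α, β and the odds ratio can be checked numerically. In this sketch the values of α and β are hypothetical (a baseline risk of 8% and an odds ratio of 2); only the logit and logistic functions themselves are standard.

```python
import math


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))


def logistic(z):
    """Inverse of the logit: maps any real number to a probability."""
    return 1 / (1 + math.exp(-z))


# Hypothetical parameters for illustration.
alpha = logit(0.08)   # log odds of disease in the unexposed
beta = math.log(2.0)  # log odds ratio for being exposed

p_unexposed = logistic(alpha)        # x = 0
p_exposed = logistic(alpha + beta)   # x = 1

odds_unexposed = p_unexposed / (1 - p_unexposed)
odds_exposed = p_exposed / (1 - p_exposed)

# The ratio of the two odds recovers e^beta.
print(round(odds_exposed / odds_unexposed, 6), round(math.exp(beta), 6))
```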

47 Fitting equation to the data
Linear regression: least squares
Logistic regression: maximum likelihood
The likelihood function is used to estimate parameters a and b
Practically easier to work with the log-likelihood

48 Maximum likelihood Iterative computing Results
Iterative computing:
Choice of an arbitrary value for the coefficients (usually 0)
Computing of log-likelihood
Variation of coefficients' values
Reiteration until maximisation (plateau)
Results:
Maximum Likelihood Estimates (MLE) for α and β
Estimates of P(y) for a given value of x
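The iterative procedure listed above can be sketched for a single binary predictor. The slide does not name an algorithm; Newton-Raphson is assumed here as one common choice. The data are the heparin/placebo DVT counts from the earlier odds ratio slide, coded x = 0 for heparin and x = 1 for placebo (a coding chosen for this example).

```python
import math

# Grouped data: (x, number with DVT, group size).
groups = [(0, 8, 100), (1, 18, 100)]


def prob(a, b, x):
    """P(y = 1 | x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def log_likelihood(a, b):
    total = 0.0
    for x, events, n in groups:
        p = prob(a, b, x)
        total += events * math.log(p) + (n - events) * math.log(1 - p)
    return total


def fit(max_iter=50, tol=1e-10):
    a, b = 0.0, 0.0                      # arbitrary starting values
    previous = log_likelihood(a, b)
    for _ in range(max_iter):
        g_a = g_b = 0.0                  # gradient of the log-likelihood
        i_aa = i_ab = i_bb = 0.0         # information matrix entries
        for x, events, n in groups:
            p = prob(a, b, x)
            resid = events - n * p
            w = n * p * (1 - p)
            g_a += resid
            g_b += x * resid
            i_aa += w
            i_ab += x * w
            i_bb += x * x * w
        det = i_aa * i_bb - i_ab * i_ab
        a += (i_bb * g_a - i_ab * g_b) / det   # vary the coefficients
        b += (i_aa * g_b - i_ab * g_a) / det
        current = log_likelihood(a, b)
        if abs(current - previous) < tol:      # plateau reached
            break
        previous = current
    return a, b


alpha_hat, beta_hat = fit()
print("Estimated odds ratio:", round(math.exp(beta_hat), 2))
```

With one binary predictor the fitted e^β matches the odds ratio computed directly from the 2×2 table, which makes this a convenient check on the procedure.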

49 Why Do We do Logistic Regression so much?
Predict the likelihood of discrete outcomes: group membership, binary outcome (disease/no disease)
Quite flexible statistical assumptions: no need for assumptions about the distributions of the predictor variables. Predictors do not have to be normally distributed, linearly related, or of equal variance within each group.
Very good at giving odds ratios

50 Construction of Model

51 Construction of Model
1. Perform univariate statistics (outliers, distribution, gaps)
2. Perform univariate analysis
3. Transform nominal independent variables to dichotomous (dummy) variables

52 Occupation Nominal Dummy Farmer Nurse 1 Housewife 2 Physician 3 4 Policeman 5
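Dummy coding of a nominal variable such as occupation can be sketched as below. The occupation names come from the slide; the choice of "Farmer" as the reference category and the `is_...` column names are assumptions made for illustration.

```python
OCCUPATIONS = ["Farmer", "Nurse", "Housewife", "Physician", "Policeman"]


def dummy_code(value, categories, reference):
    """Turn one nominal value into 0/1 indicators, one per non-reference category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return {
        f"is_{c.lower()}": int(value == c)
        for c in categories
        if c != reference
    }


# Reference category assumed to be "Farmer" (all indicators 0).
print(dummy_code("Nurse", OCCUPATIONS, reference="Farmer"))
print(dummy_code("Farmer", OCCUPATIONS, reference="Farmer"))
```

A nominal variable with m categories yields m - 1 dummy variables; the reference category is the one for which all of them are 0.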

53 Construction of Model Multicollinearity
4. Run a correlation matrix. If any pair of independent variables is correlated at > 0.90 (multicollinearity), decide which one to keep and which one to exclude. A more practical way is to consider the "biologic relation" between variables (smoking, carrying matches, and cancer).
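The correlation screen in step 4 could look like the following sketch. The variable names and values are hypothetical, and using the absolute value of the correlation (so that strong negative correlations are flagged too) is an assumption beyond the slide.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(variables, threshold=0.90):
    """Return (name1, name2, r) for every pair with |r| above the threshold."""
    names = list(variables)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(variables[names[i]], variables[names[j]])
            if abs(r) > threshold:
                flagged.append((names[i], names[j], r))
    return flagged


# Hypothetical data for six patients.
data = {
    "smoking_pack_years": [0, 5, 10, 20, 30, 40],
    "carries_matches": [0, 4, 11, 19, 31, 39],
    "age": [50, 30, 60, 40, 35, 55],
}

for name1, name2, r in collinear_pairs(data):
    print(f"{name1} and {name2}: r = {r:.3f}")
```

Which of a flagged pair to keep remains a subject-matter decision, as the slide notes.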

54 Diagnosing confounder
Outcome: fatality
Variable A: Acinetobacter infection
Variable B (confounder): APACHE score

55 How to choose variables?
Kitchen sink
Inclusion of significant variables
Forward selection
Backward selection

56 An example
Outcome: Deep Vein Thrombosis (DVT)
Independent variables: heparin, gender, Coronary Heart Disease, aspirin use
Y = a + b1x1 + b2x2 + b3x3 + b4x4
DVT = a + b1(heparin) + b2(female) + b3(CAD) + b4(aspirin)
Values shown under the terms: heparin 0.5, female 1.5, CAD 3, aspirin 0.6

57 Logistic regression
Number of obs = 10, LR chi2(4) = 0.97, Prob > chi2 =, Log likelihood =, Pseudo R2 =
DVT | Odds Ratio, Std. Err., z, P>|z|, [95% Conf. Interval] for Heparin, Female, CHD, Aspirin

Variable      OR     p-value   95% CI
Heparin use   0.5    0.003     0.15-0.72
Female        1.48   0.504
CHD           3.06   0.009
Aspirin use   0.58   0.622

Note: on the next slide, add that the ORs will change in other regressions.

58 Assessment of the Model

59 Methods for measuring how well a model accounts for the outcome
Methods compared: multiple linear regression, multiple logistic regression, proportional hazard analysis
Model accounts for outcome better than chance: F test; likelihood ratio test (LR)
Quantitative/qualitative assessment of how well model accounts for outcome: R2; R2 (rarely used); comparison of estimated to observed value; Hosmer-Lemeshow; comparison of estimated to observed value
Prediction of outcome: NA; sensitivity, specificity, accuracy, c index

60 LR and p value of the test
For both logistic regression and proportional hazard analysis: If chi square of the LR is large, the p value will be small, and the null hypothesis can be rejected.
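The link between a large LR chi-square and a small p value can be sketched in code. The likelihood ratio statistic is twice the difference between the log-likelihoods of the fitted and the null model. To stay dependency-free, the p value below uses the closed-form chi-square tail that holds only for even degrees of freedom; that restriction and the function names are choices made for this example.

```python
import math


def lr_statistic(log_lik_null, log_lik_model):
    """Likelihood ratio chi-square: 2 * (LL of fitted model - LL of null model)."""
    return 2 * (log_lik_model - log_lik_null)


def chi2_p_value_even_df(statistic, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = statistic / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# The larger the LR chi-square, the smaller the p value (here with df = 4).
for stat in (1.0, 5.0, 9.488, 20.0):
    print(stat, round(chi2_p_value_even_df(stat, 4), 4))
```

In practice a statistics package reports this p value directly, as in the "Prob > chi2" line of the regression output.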

61 R2
R2 is a quantitative measure of how well the independent variables account for the outcome. When R2 is multiplied by 100, it can be thought of as the percentage of the variance in the dependent variable explained by the independent variables.

62 Goodness of Fit
r² is between 0 and 1
r² = 1: perfect fit
r² > 0.8: pretty good fit
r² ≈ 0.3: what we usually see!
r² ≈ 0: poor fit (x does not add any information to y)
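The two ends of the r² scale can be illustrated with a short calculation, using the standard definition r² = 1 - (residual sum of squares / total sum of squares). The observed values are made up for illustration.

```python
def r_squared(observed, predicted):
    """Proportion of the variance in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_residual / ss_total


observed = [3.0, 5.0, 7.0, 9.0]  # illustrative values

# Perfect fit: predictions equal the observations.
print(r_squared(observed, observed))

# Poor fit: predicting the mean for everyone, so x adds no information.
mean_only = [sum(observed) / len(observed)] * len(observed)
print(r_squared(observed, mean_only))
```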

64 Interactions How can interactions be taken into consideration?
Let X and Y be two explanatory variables; then include X·Y in the model. If X is a categorical variable with dummy variables (X1,...,Xm-1), then X·Y = (X1·Y,…,Xm-1·Y). Interactions change the multiplicative character of the OR.
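A minimal sketch of what including X·Y means in practice: the product terms are added as extra columns, and the odds ratio for Y then depends on the value of X. The coefficient values and function names below are hypothetical.

```python
import math


def interaction_terms(x_dummies, y):
    """Product terms (X1*Y, ..., Xm-1*Y) for dummy-coded X and a second variable Y."""
    return [x * y for x in x_dummies]


def odds_ratio_for_y(b_y, b_xy, x):
    """OR for a one-unit increase in Y in a logistic model with an X*Y term.

    The log odds contain b_y*Y + b_xy*X*Y, so the OR for Y is exp(b_y + b_xy*x).
    """
    return math.exp(b_y + b_xy * x)


print(interaction_terms([1, 0, 0], 2.5))

# Hypothetical coefficients: main effect of Y and the X*Y interaction.
b_y, b_xy = math.log(2.0), math.log(1.5)
print(round(odds_ratio_for_y(b_y, b_xy, x=0), 2))  # OR for Y when X = 0
print(round(odds_ratio_for_y(b_y, b_xy, x=1), 2))  # OR for Y when X = 1
```

Without the interaction term (b_xy = 0) the OR for Y is the same at every level of X, which is the multiplicative behaviour the slide refers to.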

65 SURVIVAL ANALYSIS
TIME-TO-EVENT, TIME-TO-DEATH

66 Exposure and outcome exposure NO exposure

67 Person-Time
Follow-up table, January to June, with the total time at risk per person (A: 3 months)
Total person-time: 3 + 6 + 2 = 11 months

68 The Reason for Survival Analysis
Censored cases: during follow-up, the expected outcome did not happen for the case, or the case was lost to follow-up or dropped out.
Patients do not all need to start together.

69 Survival Analysis
Life table
Kaplan-Meier curve
Cox regression

70 Kaplan-Meier Curve
Duration = time to event (time until an event occurs)
"Event" = fatality, survival, relapse...
Plot: number of patients (50, 49, 44, 42, 40, 39) against months (2, 4, 6, 8)
Log-rank test compares 2 curves statistically

71 Kaplan-Meier Curve
Plot: number of patients (50, 49, 44, 42, 40, 39) against months (2, 4, 6, 8), treatment vs. placebo
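The Kaplan-Meier curves on these slides are built from a product of conditional survival probabilities: at each event time, the running survival estimate is multiplied by (1 - events / number at risk), and censored patients simply leave the risk set. A minimal sketch with hypothetical follow-up data (the slides' patient-level data are not available):

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if the event occurred at that time, 0 if the patient was censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        occurred = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - occurred / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: months of follow-up and event indicator (0 = censored).
months = [2, 4, 4, 6, 8]
event = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(months, event):
    print(f"month {t}: S(t) = {s:.2f}")
```

Comparing two such curves (for example treatment vs. placebo) is what the log-rank test does; that test is not implemented in this sketch.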

72 Cox Regression
Hazard ratio
Logistic regression: Number of obs = 10, LR chi2(4) = 0.97, Prob > chi2 =, Log likelihood =, Pseudo R2 =
DVT | Odds Ratio, Std. Err., z, P>|z|, [95% Conf. Interval] for Heparin, Female, CHD, Aspirin
Cox regression: hazard ratio

