Download presentation
Presentation is loading. Please wait.
1
Nemours Biomedical Research Statistics April 23, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility
2
Nemours Biomedical Research Odds The odds in favor of an event (e.g. occurrence of a disease) is the ratio of the probabilities of the event occurring to that of not occurring, i.e., p/(1-p) –where p is the probability of the event occurring If probability of an event is 0.6 (i.e., it is observed in 60% of the cases) then the probability of it not occurring is (1 - 0.6) = 0.4 The odds in favor of the event occurring is thus 0.6/0.4 = 1.5 The greater the odds of an event, the greater it’s probability
3
Nemours Biomedical Research Calculating odds for a 2x2 table Response Variable Predictor Variable CancerNo CancerTotal Smokersaba+b Non- Smokerscdc+d Totala+cb+dN=a+b+c+d The proportion of smokers with cancer: p = a/(a+b) (this is the likelihood of smokers having cancer) The proportion of smokers without cancer: 1-p = b/(a+b) (this is the likelihood of smokers not having cancer) The odds of smokers having cancer: p / (1 - p) = (a/(a+b))/(b/(a+b)) = a/b
4
Nemours Biomedical Research Calculating odds for a 2x2 table Response Variable Predictor Variable CancerNo CancerTotal Smokersaba+b Non-Smokerscdc+d Totala+cb+dN=a+b+c+d The proportion of non-smokers with cancer: p = c/(c+d) The proportion of non-smokers without cancer: 1-p = d/(d+d) The odds of cancer among non-smokers: (c/(c+d)) / (d/(c+d)) = c/d
5
Nemours Biomedical Research Odds Ratio The odds ratio of an event is the ratio of the odds of the event occurring in one group to the odds of it occurring in another group. Let p 1 be the probability of an event in group 1 and p 2 be the probability of the same event in group 2. Then the odds ratio (OR) of the event in these two groups is: The odds ratio is a measure of effect size
6
Nemours Biomedical Research Odds Ratio In the previous example, The odds of cancer among smokers is a/b The odds of cancer among non-smokers is c/d So, the odds ratio of cancer among smokers vs non-smokers
7
Nemours Biomedical Research Odds Ratio CancerNo Cancer Smokers12080200 Non- Smokers60140200 180220N=400 Estimating odds ratio in a 2x2 table The odds of cancer in smokers is 120/80 = 1.5 and the odds of cancer in non-smokers is 60/140 = 0.4286 and the ratio of two odds is, Interpretation: The risk of cancer is 3.5 times greater for smokers compared to non-smokers ( in this sample )
8
Nemours Biomedical Research Odds Ratio The odds ratio must be greater than or equal to zero. As the odds of the first group approaches to zero, the odds ratio approaches to zero. As the odds of the second group approaches to zero, the odds ratio approaches to positive infinity An odds ratio of 1 indicates that the condition or event under study is equally likely in both groups. In our example, that would mean no association between cancer and smoking was observed.
9
Nemours Biomedical Research Odds Ratio An odds ratio greater than 1 indicates that the condition or event is more likely in the first group. In the previous example, an odds ratio of 2 means that cancer is 2 times more likely in smokers compared to non-smokers An odds ratio less than 1 indicates that the condition or event is less likely in the first group. In our example, an odds ratio of 0.8 would mean that the cancer is 20% (i.e., 1 - 0.8) less likely in smokers compared to non-smokers
10
Nemours Biomedical Research R-demo: Odds ratio (from 2 x 2 contingency table) Install Package epitools for odds ratio Load epitools: >library(“epitools”) Input data in previous example: x<-matrix(c(120,60,80,140), 2, 2) oddsratio(x, method="wald")
11
Nemours Biomedical Research R-demo: Odds ratio - Output > oddsratio(x, method="wald") $data Outcome Predictor Disease1 Disease2 Total Exposed1 120 80 200 Exposed2 60 140 200 Total 180 220 400 $measure odds ratio with 95% C.I. Predictor estimate lower upper Exposed1 1.0 NA NA Exposed2 3.5 2.313230 5.295625 $p.value two-sided Predictor midp.exact fisher.exact chi.square Exposed1 NA NA NA Exposed2 1.467097e-09 2.298090e-09 1.637296e-09 $correction [1] FALSE attr(,"method") [1] "Unconditional MLE & normal approximation (Wald) CI"
12
Nemours Biomedical Research R-demo: Odds ratio (from raw data) Categorize the outcome variable PLUC.post in two groups (high and low PLUC groups). –Data management -> Manage variable in active data set -> Bin numeric variable ->From this window: pick variable to bin (e.g. PLUC.post ), Name the new variable (e.g. Newvar), Select number of bins (e.g. 2), select binning method (e.g. k- means clustering) oddsratio(data$Ped, data$Newvar, method="wald"), where Newvar is the categorized outcome variable and Ped is also a categorical variable with pediatrician 1 and 2.
13
Nemours Biomedical Research R-demo: Odds ratio - Output > oddsratio(data$Ped, data$Newvar, method="wald") $data Outcome Predictor 0 1 Total 0 12 18 30 1 16 14 30 Total 28 32 60 $measure odds ratio with 95% C.I. Predictor estimate lower upper 0 1.0000000 NA NA 1 0.5833333 0.2095645 1.623737 $p.value two-sided Predictor midp.exact fisher.exact chi.square 0 NA NA NA 1 0.3166258 0.4378954 0.300623 $correction [1] FALSE attr(,"method") [1] "Unconditional MLE & normal approximation (Wald) CI"
14
Nemours Biomedical Research Logit of p The logit of a number P between 0 and 1 is log(P/1-P). It is defined by logit(P) If P is the probability of an event, then P/1-P is odds of that event and logit(P) is the log(odds) = log(P/1-P). The difference between the logits of two probabilities is the log of the odds ratio (OR). log (OR) = The logit scale is linear and functions much like a z-score scale.
15
Nemours Biomedical Research Logit of p Logit is a continuous score in the range - to p = 0.50, then logit = 0.0 p = 0.70, then logit = 0.85 p = 0.30, then logit = -0.85 The standard deviation of logit is sqrt(1/a + 1/b + 1/c + 1/d).
16
Nemours Biomedical Research Logistic Regression Recall: For a categorical variable, we focus on number or proportion for each category. Proportion of a category simply says about how likely to happen that category Suppose, y is a variable that represent occurrence or not occurrence of cancer (two categories only). And y=1 indicates occurrence and y=0 indicates (not occurrence). Let, p=likelihood of the event (y=1), so 1-p = likelihood of the event (y=0). We want to relate p or (1-p) i.e. likelihood of happening or not happening, instead the response y itself), with an independent variable
17
Nemours Biomedical Research Logistic Regression Plot of two variables: An outcome variable (say cancer happening/not happening ) and an independent variable (say age) Could we use a linear regression?
18
Nemours Biomedical Research Logistic Regression Likelihood of cancer Age (in years) Likelihood of cancer increases with an increasing age Relationship: not linear Plot of proportions for different ages
19
Linear vs. Logistic Function
20
Nemours Biomedical Research Logistic Regression Recall: Simple linear regression: y = b 0 + b 1 x, where y is a continuous quantitative outcome variable, x is a quantitative/categorical variable. Like y, logit is a quantitative variable, and we can replace y by logit of p where p is the likelihood of an event That is, log(p/(1-p)) = log (odds) = b 0 + b 1 x, which is simple linear regression between log(p/(1-p)) =log(odds) and the independent variable x (say age). Association patterns with log(odds) are the same as the patterns with odds itself.
21
Nemours Biomedical Research Logistic Regression log(p/1-p) = log (odds of cancer) = b 0 + b 1 *age Interpretation of b 1 : change of log odds of cancer for 1 year change of age. Let us consider two persons of ages 55 and 56, then, log (odds of cancer at age 55) = b 0 + b 1 *55 log (odds of cancer at age 56) = b 0 + b 1 *56 The difference, log(odds at 56) – log (55) = b 1
22
Nemours Biomedical Research Logistic Regression b 1 =0 (or equivalently odds ratio exp(b 1 ) =1), indicates no association of log(Odds of cancer) with the variable age (X). b 1 >0, indicates a positive association of log(Odds of cancer) with the variable age (X). b 1 <0, indicates a negative association of log(Odds of cancer) with the variable age (X). If 95% confidence interval (CI) of b1 does not contains 0 (the null hypothesis), it indicates that the independent variable has an significant influence on the response variable at 5% level of significance. b 0 is the intercept
23
Nemours Biomedical Research Logistic Regression exp(b 1 )= 1, No association of response with predictor. For categorical predictor, an event is equally likely in both reference as well as comparative group. exp(b 1 )>1, indicates that an event is more likely to the comparative group compare to the reference group. exp(b 1 )<1, indicates that an event is less likely to the comparative group compare to the reference group. If 95% CI of odds ratio contains 1, it indicates that likelihood of an event two groups are significantly different at 5% level of significance.
24
Nemours Biomedical Research Rcmdr demo: Logistic Regression Statistics -> Fit models -> Generalized linear model - > from this window ( under the model formula, select dependent variable (e.g. Newvar) ~ independent (e.g. age), family – binomial, link – logit. Newvar is the categorical form of the variable PLUC.post (Details in the example for odds ratio)
25
Nemours Biomedical Research Simple Logistic Regression: Rcmdr output Call: glm(formula = Newvar ~ age, family = binomial(logit), data = data) Deviance Residuals: Min 1Q Median 3Q Max -1.246 -1.233 1.109 1.123 1.131 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.0818711 0.8222250 0.100 0.921 age 0.0005715 0.0086349 0.066 0.947 (Dispersion parameter for binomial family taken to be 1) Null deviance: 82.911 on 59 degrees of freedom Residual deviance: 82.906 on 58 degrees of freedom AIC: 86.906 95% CI for the coefficient of age is 0.0005715 2*0.0086349 = (-0.0166983, 0.0178413) In terms odds ratio: Odds ratio = exp (0.0005715) = 1.000572 95% CI for the odds ratio is (exp(-0.0080634), exp(0.0092064)) = (0.991969, 1.009249)
26
Nemours Biomedical Research Simple logistic regression: Rcmdr output Call: glm(formula = Newvar ~ Ped, family = binomial(logit), data = data) Deviance Residuals: Min 1Q Median 3Q Max -1.354 -1.121 1.011 1.011 1.235 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.4055 0.3727 1.088 0.277 Ped[T.1] -0.5390 0.5223 -1.032 0.302 (Dispersion parameter for binomial family taken to be 1) Null deviance: 82.911 on 59 degrees of freedom Residual deviance: 81.836 on 58 degrees of freedom AIC: 85.836
27
Nemours Biomedical Research Logistic Regression Outcome (response) variable is binary Independent variable (predictor) can be either categorical or quantitative Relationship of outcome variable and predictor (s) is not linear
28
Nemours Biomedical Research Multiple Logistic Regression More than one independent variables in the model i.e. log(p/1-p) = b 0 + bx 1 + bx 2 Interpretation: the same as it is for the simple logistic regression Response variable is binary
29
Nemours Biomedical Research Multinomial and Ordinal Logistic Regressions Multinomial: The response (outcome) variable is multicategorical (e.g. race- Caucasian, African American, Hispanic, Asian etc). Ordinal: Categories of the response (outcome) variable can be ranked or order (e.g. disease condition: mild, moderate, and severe)
30
Nemours Biomedical Research Thank you
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.