Probability and odds. Suppose we have a frequency distribution for the variable “TB status”. The probability of an individual having TB is 0.107.


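Probability and odds

Suppose we have a frequency distribution for the variable “TB status”, tabulated with a frequency and a relative frequency for each of the rows “Have TB”, “Don’t have TB” and “Total” (680 individuals). The probability of an individual having TB is the relative frequency of the “Have TB” row, 0.107.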

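Definition of odds

The odds of an event are the probability that the event occurs divided by the probability that it does not occur: odds = p / (1 − p). The odds of having TB is therefore 0.107 / (1 − 0.107) = 0.12.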

Interpreting odds

The odds of 0.12 can be interpreted as follows: for the community under consideration, we expect about one individual with TB for every eight without TB. Or, inverting the odds, an individual is about eight times as likely not to have TB as to have TB.
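The conversion between probability and odds can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the only number taken from the slides is the probability 0.107.

```python
def odds_from_probability(p: float) -> float:
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Convert odds back into a probability odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


tb_odds = odds_from_probability(0.107)  # probability of TB from the slides
print(round(tb_odds, 2))                # odds of having TB
print(round(1.0 / tb_odds, 1))          # inverted odds: not having TB vs having TB
```

Inverting the odds gives a value a little above 8, which is where the "about eight times" reading comes from.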

Association between a dependent variable and an independent variable

If an independent variable affects or is related to the dependent variable, it changes the odds of being in the key dependent-variable group (the group with the event of interest). For example, suppose we also have information on the HIV status of the individuals whose “TB status” we had earlier:

The impact of HIV on TB status can be measured using the odds ratio. The counts used in the calculations are:

            Have TB   Don’t have TB
HIV+ve         57          32
HIV-ve        106         505

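The odds ratio is equal to

odds ratio = (odds of having TB for HIV+ve individuals) / (odds of having TB for HIV-ve individuals)

The odds of having TB for HIV+ve individuals is 57/32 = 1.78.
The odds of having TB for HIV-ve individuals is 106/505 = 0.21.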

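Therefore the odds ratio is 1.78/0.21 ≈ 8.48. The odds ratio can also be calculated directly from the table as the cross-product ratio (57 × 505)/(32 × 106), which gives the same value up to rounding.

Interpretation: the odds of having TB are about 8.5 times as high for HIV+ve individuals as for HIV-ve individuals.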
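As a quick check of the arithmetic, the two routes to the odds ratio (ratio of odds, and cross-product of the 2 × 2 counts) can be compared in Python. The helper name and argument names are illustrative; the counts 57, 32, 106 and 505 are the ones used in the slides.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio computed as (odds in exposed group) / (odds in unexposed group)."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


# HIV+ve: 57 with TB, 32 without; HIV-ve: 106 with TB, 505 without
tb_or = odds_ratio(57, 32, 106, 505)

# Cross-product form of the same quantity
tb_or_cross = (57 * 505) / (32 * 106)

print(round(tb_or, 1), round(tb_or_cross, 1))
```

Both forms agree, since the cross-product is just the ratio of odds with the fractions cleared.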

Another example

The contingency table below gives the number of women in a study according to use of the contraceptive pill and presence or absence of myocardial infarction. The counts used in the calculations are:

                    Myocardial infarction
Contraceptive         Yes      No
Using pill             23      49
Not using pill         35     132

The odds of having an infarction for women using the pill is 23/49 = 0.469, and the odds for women not using the pill is 35/132 = 0.265. Thus the odds ratio of having an infarction for women using the pill compared to those not using the pill is 0.469/0.265 = 1.77.

Interpreting the odds ratio

The odds of having a myocardial infarction are 1.77 times as high for women using the pill as for women not using the pill. Equivalently, women using the pill have 77% [(1.77 − 1) × 100%] higher odds of myocardial infarction than women not using the pill.
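The step from an odds ratio to a percentage change in odds is (OR − 1) × 100%. A minimal sketch, using the pill counts from the slides (23, 49, 35, 132) and an illustrative function name:

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage by which the odds are higher (positive) or lower (negative)."""
    return (odds_ratio - 1.0) * 100.0


pill_or = (23 / 49) / (35 / 132)  # odds with pill / odds without pill
pill_pct = percent_change_in_odds(pill_or)

print(round(pill_or, 2), round(pill_pct))
```

An odds ratio below 1 yields a negative percentage, i.e. lower odds in the exposed group.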

Logistic model

The logistic model is a mathematical expression used to determine whether a relationship exists between a binary dependent variable and a set of independent variables. Logistic regression combines the independent variables to estimate the probability that a particular event will occur, i.e. that an individual will be a member of one of the groups defined by the binary dependent variable.

If we have only one independent variable, the model is

log(odds of event) = a + b × predictor

If we have two or more predictors, the model is

log(odds of event) = a + b_1 × predictor_1 + b_2 × predictor_2 + … + b_k × predictor_k

b_1, b_2, …, b_k are known as regression coefficients.
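To make the model concrete, the sketch below evaluates the log odds for one individual and converts it to odds and to a probability. The intercept, coefficients and predictor values are hypothetical numbers chosen only for illustration; they do not come from the slides.

```python
import math


def log_odds(a, coefficients, predictors):
    """Linear predictor: a + b_1 * x_1 + ... + b_k * x_k."""
    return a + sum(b * x for b, x in zip(coefficients, predictors))


def probability_of_event(a, coefficients, predictors):
    """Invert the log odds: p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds(a, coefficients, predictors)))


# Hypothetical model with two predictors (values are assumptions)
a = -2.0
b = [0.5, 0.03]
x = [1, 10]

z = log_odds(a, b, x)
p = probability_of_event(a, b, x)
print(round(z, 2), round(math.exp(z), 3), round(p, 3))
```

The odds exp(z) and the probability p describe the same quantity, related by odds = p / (1 − p).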

Measurements of independent variables

The independent variables can be either qualitative (categorical) or quantitative (continuous). They usually include exposure variables, potential confounders and potential effect modifiers.

Interpreting output of logistic regression

The transformed log value of a coefficient b is its exponential, exp(b). If a coefficient is positive, its transformed log value will be greater than one, meaning that the odds of the modeled event increase. If a coefficient is negative, its transformed log value will be less than one, and the odds of the event occurring decrease. A coefficient of zero has a transformed log value of 1.0, meaning that this coefficient does not change the odds of the event one way or the other.
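The three cases can be illustrated by exponentiating a positive, a zero and a negative coefficient. The coefficient values here are arbitrary examples, and the function name is made up for this sketch.

```python
import math


def describe_coefficient(b: float) -> str:
    """Classify the effect on the odds from the sign of a logistic coefficient."""
    transformed = math.exp(b)  # the odds ratio for a one-unit change
    if transformed > 1.0:
        return "odds increase"
    if transformed < 1.0:
        return "odds decrease"
    return "odds unchanged"


for b in (0.5, 0.0, -0.5):  # arbitrary example coefficients
    print(b, round(math.exp(b), 3), describe_coefficient(b))
```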

The transformed log value is an odds ratio. For a qualitative independent variable, one level of the variable is selected as a reference and the other levels are compared to it. For example, with gender as a variable and female chosen as the reference, the coefficient corresponding to this variable is interpreted using

OR = (odds of having disease for males) / (odds of having disease for females)

As another example, suppose age is categorized into five groups: 20 – 29, 30 – 39, 40 – 49, 50 – 59 and 60 – 69. If the 20 – 29 group is chosen as the reference group, we then have four odds ratios (OR) for this variable:

OR = (odds of having disease for age group 30 – 39) / (odds of having disease for age group 20 – 29)
OR = (odds of having disease for age group 40 – 49) / (odds of having disease for age group 20 – 29)
OR = (odds of having disease for age group 50 – 59) / (odds of having disease for age group 20 – 29)
OR = (odds of having disease for age group 60 – 69) / (odds of having disease for age group 20 – 29)

For a quantitative independent variable, we compare two groups that differ by one unit of measurement of the variable. For example, if blood pressure is an independent variable, then the odds ratio is

OR = (odds of having disease for those with (x + 1) mmHg) / (odds of having disease for those with x mmHg)

where x is, say, 120.
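Under the logistic model, this one-unit odds ratio equals exp(b) whatever starting value x is chosen, because the intercept and the b × x term cancel in the ratio. The sketch below checks this numerically; the intercept and slope are hypothetical values, not estimates from the slides.

```python
import math


def odds_at(a: float, b: float, x: float) -> float:
    """Odds of the event at predictor value x: exp(a + b * x)."""
    return math.exp(a + b * x)


a, b = -6.0, 0.02  # hypothetical intercept and blood-pressure coefficient

ratios = []
for x in (120, 140):
    ratio = odds_at(a, b, x + 1) / odds_at(a, b, x)
    ratios.append(ratio)
    print(x, round(ratio, 4), round(math.exp(b), 4))
```

The ratio printed for x = 120 and for x = 140 is the same, which is why a single odds ratio can summarise the effect of a quantitative variable.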

Each independent variable is interpreted adjusting for the others. When reporting the results, it is advisable to report both the unadjusted and the adjusted odds ratios.

Example

A study is designed to assess the association between obesity (defined as BMI > 30) and incident cardiovascular disease. Data were collected from participants who were between the ages of 35 and 65 and free of cardiovascular disease (CVD) at baseline. Each participant was followed for 10 years for the development of cardiovascular disease. A logistic regression model is fitted to assess the association between obesity (independent variable) and CVD (present = 1, absent = 0). For the independent variable, non-obese persons are the reference group.

The results of the fit were reported in a table with rows for the constant and obesity, and columns for the regression coefficient (b), z-value, p-value, exp(b) (odds ratio) and 95% confidence interval for the odds ratio.

The regression coefficient for obesity is 0.658, and exp(0.658) = 1.93 is the unadjusted odds ratio. The odds of developing CVD are 1.93 times as high among obese persons as among non-obese persons; that is, obese persons have 93% higher odds of developing CVD. The association between obesity and incident CVD is statistically significant (p = 0.0017).

When examining the association between obesity and CVD, age was identified as a confounder. To adjust for it, a logistic regression model is fitted with obesity and age as independent variables and CVD as the dependent variable. Age is categorized as less than 50 years of age and 50 years of age and older. For the analysis, the age group of less than 50 years is the reference group.

The fitted model was:

log(odds of developing CVD) = a + 0.415 × obesity + 0.655 × age

where a is the fitted constant. exp(0.415) = 1.52; the odds of developing CVD are 1.52 times as high among obese persons as among non-obese persons, adjusting for age. This is the adjusted odds ratio.

Example

Researchers examined the relationship between coronary heart disease (CHD) risk and the risk factors age (in years), cholesterol (mg/dL), systolic blood pressure (mmHg), body mass index (BMI) and smoking status. Using a logistic model, they obtained the results below:

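Results table: rows for the constant, age, cholesterol, sbp, bmi and smokes; columns for the regression coefficient (b), z-value, p-value, exp(b) (odds ratio) and 95% confidence interval for the odds ratio. The p-values for age, cholesterol, sbp and smokes are reported as upper bounds (“<”).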

Adjusting for cholesterol level, systolic blood pressure, body mass index and smoking status, the odds of having CHD are 1.06 times as high for every additional year of age. Alternatively, we can say that an individual has 6% higher odds of CHD for every additional year of age, adjusting for the other variables. The unadjusted odds ratio for age was 1.08. Age is a significant predictor since its p-value is small (< 0.001).
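Because the model is linear in the log odds, a per-year odds ratio can be scaled to a larger age difference by raising it to the number of years. The sketch below applies this to the adjusted odds ratio of 1.06 per year; the 10-year difference is an illustrative choice, and the calculation assumes the effect of age is the same across the whole age range.

```python
def odds_ratio_for_difference(or_per_unit: float, units: float) -> float:
    """Odds ratio for a difference of `units`, given the odds ratio per one unit."""
    return or_per_unit ** units


or_per_year = 1.06  # adjusted odds ratio for age reported in the slides
or_10_years = odds_ratio_for_difference(or_per_year, 10)

print(round(or_10_years, 2))
```

So a 6% increase in odds per year compounds to roughly 79% higher odds over ten years, not 60%.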