Introduction to Logistic Regression Rachid Salmi, Jean-Claude Desenclos, Alain Moren, Thomas Grein.



Oral contraceptives (OC) and myocardial infarction (MI)
Case-control study, unstratified data
OC       MI    Controls    OR
Yes
No                         Ref.
Total

Oral contraceptives (OC) and myocardial infarction (MI)
Case-control study, unstratified data
Smoking  MI    Controls    OR
Yes
No                         Ref.
Total

Odds ratio for OC adjusted for smoking = 4.5

Ebola
2 potential risk factors
–Contact with a case
–Contact with the hospital

Figure: Cases of gastroenteritis among residents of a nursing home, by date of onset, Pennsylvania, October 1986 (x-axis: days; y-axis: number of cases; legend: one case)

Cases of gastroenteritis among residents of a nursing home according to protein supplement consumption, Pa, 1986
Protein suppl.   Total   Cases   AR%   RR
YES
NO
Total

Sex-specific attack rates of gastroenteritis among residents of a nursing home, Pa, 1986
Sex      Total   Cases   AR (%)   RR & 95% CI
Male     22      5       23       Reference
Female                            ( )
Total

Attack rates of gastroenteritis among residents of a nursing home, by place of meal, Pa, 1986
Meal          Total   Cases   AR (%)   RR & 95% CI
Dining room                            Reference
Bedroom                                ( )
Total

Age-specific attack rates of gastroenteritis among residents of a nursing home, Pa, 1986
Age group   Total   Cases   AR (%)
Total

Attack rates of gastroenteritis among residents of a nursing home, by floor of residence, Pa, 1986
Floor   Total   Cases   AR (%)
One     12      3       25
Two
Three   30      7       23
Four
Total
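The attack rates (AR) in these tables are cases divided by residents at risk, expressed as a percentage. A minimal Python sketch, using only the counts that survive in this transcript (males: 5 of 22; floor one: 3 of 12; floor three: 7 of 30). The function name `attack_rate` is illustrative and not part of the original slides.

```python
def attack_rate(cases, total):
    """Attack rate in percent: cases per 100 persons at risk."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * cases / total

# Counts taken from the rows whose values survive in the transcript.
groups = {"Male": (5, 22), "Floor one": (3, 12), "Floor three": (7, 30)}
rates = {name: attack_rate(c, n) for name, (c, n) in groups.items()}

for name, ar in rates.items():
    print(f"{name}: AR = {ar:.0f}%")
```

Rounded to whole percentages, these reproduce the AR (%) values shown for those rows.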

Multivariate analysis
Multiple models
–Linear regression
–Logistic regression
–Cox model
–Poisson regression
–Loglinear model
–Discriminant analysis
Choice of the tool according to the objectives, the study, and the variables

Simple linear regression Table 1 Age and systolic blood pressure (SBP) among 33 adult women

Plot of SBP (mm Hg) against age (years), adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974

Simple linear regression
Relation between 2 continuous variables (SBP and age)
Regression coefficient $\beta_1$
–Measures association between y and x
–Amount by which y changes on average when x changes by one unit
–Least squares method
[Plot: y against x, with the slope marked]
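The least squares method has a closed form for one predictor: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x. A minimal sketch follows. The ages and SBP values are hypothetical, because the values of Table 1 are not reproduced in this transcript, and the function name `least_squares` is illustrative.

```python
def least_squares(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                       # change in y per unit change in x
    intercept = mean_y - slope * mean_x     # line passes through the means
    return intercept, slope

# Hypothetical ages (years) and SBP values (mm Hg), for illustration only.
age = [25, 35, 45, 55, 65]
sbp = [118, 124, 133, 141, 149]

b0, b1 = least_squares(age, sbp)
print(f"SBP = {b0:.2f} + {b1:.2f} * age")
```

Here the slope `b1` is the average change in SBP (mm Hg) for each additional year of age in the made-up data.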

Multiple linear regression
Relation between a continuous variable and a set of i continuous variables
Partial regression coefficients $\beta_i$
–Amount by which y changes on average when $x_i$ changes by one unit and all the other $x_i$s remain constant
–Measures association between $x_i$ and y adjusted for all other $x_i$
Example
–SBP versus age, weight, height, etc.

Multiple linear regression
Terms for the dependent variable: predicted, response variable, outcome variable
Terms for the independent variables: predictor variables, explanatory variables, covariables

Logistic regression (1) Table 2 Age and signs of coronary heart disease (CD)

How can we analyse these data? Compare mean age of diseased and non-diseased –Non-diseased: 38.6 years –Diseased: 58.7 years (p<0.0001) Linear regression?

Dot-plot: Data from Table 2

Logistic regression (2) Table 3 Prevalence (%) of signs of CD according to age group

Dot-plot: Data from Table 3 (y-axis: diseased %; x-axis: age group)

Logistic function (1)
[Plot: probability of disease (y-axis) against x]

Transformation
logit of P(y|x): $\ln\left[\frac{P(y|x)}{1-P(y|x)}\right] = \alpha + \beta x$
$\alpha$ = log odds of disease in unexposed
$\beta$ = log odds ratio associated with being exposed
$e^{\beta}$ = odds ratio
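The link between the logistic function, the logit, and the odds ratio can be checked numerically. In the sketch below, the coefficient values are hypothetical and chosen only for illustration; `logistic` and `logit` are illustrative helper names. With a binary exposure x (0 = unexposed, 1 = exposed), the log odds in the unexposed equal alpha, and the odds ratio equals e to the power beta.

```python
import math

def logistic(alpha, beta, x):
    """P(y=1 | x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, not estimated from any data in the slides.
alpha, beta = -2.0, 1.5

p_unexposed = logistic(alpha, beta, 0)   # x = 0
p_exposed = logistic(alpha, beta, 1)     # x = 1

odds_ratio = (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))
print(f"log odds in unexposed = {logit(p_unexposed):.4f}")
print(f"odds ratio = {odds_ratio:.4f}, exp(beta) = {math.exp(beta):.4f}")
```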

Fitting equation to the data
Linear regression: Least squares
Logistic regression: Maximum likelihood
Likelihood function
–Estimates parameters $\alpha$ and $\beta$
–Practically easier to work with log-likelihood

Maximum likelihood
Iterative computing
–Choice of an arbitrary value for the coefficients (usually 0)
–Computing of log-likelihood
–Variation of coefficients’ values
–Reiteration until maximisation (plateau)
Results
–Maximum Likelihood Estimates (MLE) for $\alpha$ and $\beta$
–Estimates of P(y) for a given value of x
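The iterative procedure can be sketched for a model with one predictor. The slides do not say which algorithm updates the coefficients; Newton-Raphson is used here as one common choice. The data are a hypothetical 2x2 table (exposed: 20 cases, 10 non-cases; unexposed: 10 cases, 30 non-cases), and the function names are illustrative.

```python
import math

def fit_logistic(x, y, max_iter=25, tol=1e-10):
    """Maximum likelihood fit of logit P(y=1|x) = alpha + beta*x (Newton-Raphson)."""
    alpha, beta = 0.0, 0.0                      # arbitrary starting values
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                        # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                            # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det    # solve the 2x2 Newton system
        step_b = (h00 * g1 - h01 * g0) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) < tol and abs(step_b) < tol:   # plateau reached
            break
    return alpha, beta

def log_likelihood(alpha, beta, x, y):
    """Log-likelihood of the data for given coefficient values."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll

# Hypothetical data: 30 exposed (20 cases), 40 unexposed (10 cases).
x = [1] * 30 + [0] * 40
y = [1] * 20 + [0] * 10 + [1] * 10 + [0] * 30

alpha_hat, beta_hat = fit_logistic(x, y)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}, OR = {math.exp(beta_hat):.2f}")
```

For a single binary exposure the estimates have a closed form, which gives a check on the iteration: alpha is the log odds in the unexposed, log(10/30), and beta is the log of the cross-product odds ratio, log(6).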

Multiple logistic regression
More than one independent variable
–Dichotomous, ordinal, nominal, continuous …
Interpretation of $\beta_i$
–Increase in log-odds for a one unit increase in $x_i$ with all the other $x_i$s constant
–Measures association between $x_i$ and log-odds adjusted for all other $x_i$

Statistical testing Question –Does model including given independent variable provide more information about dependent variable than model without this variable? Three tests –Likelihood ratio statistic (LRS) –Wald test –Score test

Likelihood ratio statistic
Compares two nested models
$\log(\text{odds}) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ (model 1)
$\log(\text{odds}) = \alpha + \beta_1 x_1 + \beta_2 x_2$ (model 2)
LR statistic
$-2 \log(\text{likelihood model 2} / \text{likelihood model 1}) = -2 \log(\text{likelihood model 2}) - [-2 \log(\text{likelihood model 1})]$
LR statistic is a $\chi^2$ with DF = number of extra parameters in the larger model (model 1)
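A minimal sketch of the likelihood ratio calculation for two nested models that differ by one parameter, as in the model 1 versus model 2 comparison. The log-likelihood values are hypothetical. The p-value helper covers only 1 degree of freedom, where the chi-square upper tail can be written with the complementary error function from Python's standard library; other degrees of freedom would need a statistics package.

```python
import math

def lr_statistic(loglik_reduced, loglik_full):
    """-2 log(L_reduced / L_full), computed from the two log-likelihoods."""
    return -2.0 * loglik_reduced - (-2.0 * loglik_full)

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods, for illustration only.
loglik_model1 = -57.5   # larger model: x1, x2, x3
loglik_model2 = -60.0   # reduced model: x1, x2

lrs = lr_statistic(loglik_model2, loglik_model1)
p_value = chi2_sf_1df(lrs)
print(f"LR statistic = {lrs:.2f}, p = {p_value:.4f}")
```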

Coding of variables (2)
Nominal variables or ordinal with unequal classes:
–Tobacco smoked: no=0, grey=1, brown=2, blond=3
–Model assumes that OR for blond tobacco = (OR for grey tobacco)$^3$
–Use indicator variables (dummy variables)

Indicator variables: Type of tobacco
Neutralises artificial hierarchy between classes in the variable "type of tobacco"
No assumptions made
3 variables (3 df) in model using same reference
OR for each type of tobacco adjusted for the others in reference to non-smoking
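A minimal sketch of indicator coding for the tobacco variable, with non-smokers as the reference category. It also shows the constraint that a single 0/1/2/3 score imposes. The coefficient value is hypothetical, and the names `CATEGORIES` and `indicator_coding` are illustrative.

```python
import math

CATEGORIES = ["grey", "brown", "blond"]   # "no" (non-smoker) is the reference

def indicator_coding(tobacco):
    """Return three 0/1 indicator variables for one tobacco category."""
    if tobacco != "no" and tobacco not in CATEGORIES:
        raise ValueError(f"unknown category: {tobacco}")
    return [1 if tobacco == c else 0 for c in CATEGORIES]

for t in ["no", "grey", "brown", "blond"]:
    print(t, indicator_coding(t))

# With a single 0/1/2/3 score and one coefficient, the implied odds ratios
# are forced to be powers of the grey-tobacco odds ratio.
beta = 0.4                        # hypothetical coefficient for the score
or_grey = math.exp(beta * 1)      # score 1 versus 0
or_blond = math.exp(beta * 3)     # score 3 versus 0
print(f"OR grey = {or_grey:.3f}, OR blond = {or_blond:.3f}")
```

With indicator variables, each tobacco type gets its own coefficient, so no such relation between the odds ratios is imposed.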

Reference
Hosmer DW, Lemeshow S. Applied Logistic Regression. New York: Wiley & Sons, 1989

Logistic regression Synthesis

Salmonella enteritidis
[Diagram: protein supplement, sex, floor, age, place of meal, and blended diet in relation to S. Enteritidis gastroenteritis]

Unconditional Logistic Regression
Term              Odds Ratio  95% C.I.          Coef.    S.E.    Z-Statistic  P-Value
AGG (2/1)         1,6795      0,2634  10,7082   0,5185   0,9452  0,5486       0,5833
AGG (3/1)         1,7570      0,3249  9,5022    0,5636   0,8612  0,6545       0,5128
Blended (Yes/No)  1,0345      0,3277  3,2660    0,0339   0,5866  0,0578       0,9539
Floor (2/1)       1,6126      0,2675  9,7220    0,4778   0,9166  0,5213       0,6022
Floor (3/1)       0,7291      0,0991  5,3668    -0,3159  1,0185  -0,3102      0,7564
Floor (4/1)       1,1137      0,1573  7,8870    0,1076   0,9988  0,1078       0,9142
Meal              1,5942      0,4953  5,1317    0,4664   0,5965  0,7819       0,4343
Protein (Yes/No)  9,0918      3,0219  27,3533   2,2074   0,5620  3,9278       0,0001
Sex               1,3024      0,2278  7,4468    0,2642   0,8896  0,2970       0,7665
CONSTANT          *           *       *         -3,0080  2,0559  -1,4631      0,1434
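The columns of this output are linked: the odds ratio is the exponential of the coefficient, the 95% confidence limits are the exponentials of the coefficient plus or minus 1.96 standard errors, and the Z-statistic is the coefficient divided by its standard error. A minimal sketch using the protein supplement row (coefficient 2,2074 and S.E. 0,5620, written with decimal points in the code). The function name `wald_summary` is illustrative, and 1.96 is assumed as the critical value.

```python
import math

def wald_summary(coef, se, z_crit=1.96):
    """Odds ratio, 95% confidence limits and Wald Z from a coefficient and its SE."""
    return {
        "or": math.exp(coef),
        "lower": math.exp(coef - z_crit * se),
        "upper": math.exp(coef + z_crit * se),
        "z": coef / se,
    }

# Protein (Yes/No) row of the output above.
protein = wald_summary(2.2074, 0.5620)
print({k: round(v, 4) for k, v in protein.items()})
```

The results agree with the tabulated row up to rounding of the published coefficient and standard error.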

Unconditional Logistic Regression
Term              Odds Ratio  95% C.I.          Coefficient  S.E.    Z-Statistic  P-Value
Age               1,0234      0,9660  1,0842    0,0231       0,0294  0,7848       0,4326
Blended (Yes/No)  1,0184      0,3220  3,2207    0,0183       0,5874  0,0311       0,9752
Floor (2/1)       1,6440      0,2745  9,8468    0,4971       0,9133  0,5443       0,5862
Floor (3/1)       0,7132      0,0972  5,2321    -0,3379      1,0167  -0,3324      0,7396
Floor (4/1)       1,0708      0,1522  7,5322    0,0684       0,9953  0,0687       0,9452
Meal              1,6561      0,5236  5,2379    0,5045       0,5875  0,8587       0,3905
Protein (Yes/No)  8,7678      2,9521  26,0403   2,1711       0,5554  3,9091       0,0001
Sex               1,1957      0,2135  6,6981    0,1787       0,8791  0,2033       0,8389
CONSTANT          *           *       *         -4,2896      2,8908  -1,4839      0,1378

Logistic Regression Model
Summary Statistics
                       Value     DF   p-value
Deviance               107,
Likelihood ratio test  34,8068   8    <
Parameter Estimates
                                                         95% C.I.
Terms          Coefficient  Std.Error  p-value  OR       Lower    Upper
%GM            -1,8857      1,0420     0,0703   0,1517   0,0197   1,1695
SEX ='2'       0,2139       0,8812     0,8082   1,2385   0,2202   6,9662
FLOOR ='2'     0,4987       0,9083     0,5829   1,6466   0,2776   9,7659
FLOOR ='3'     -0,3235      1,0150     0,7500   0,7236   0,0990   5,2909
FLOOR ='4'     0,1088       0,9839     0,9119   1,1150   0,1621   7,6698
MEAL ='2'      0,5308       0,5613     0,3443   1,7002   0,5659   5,1081
Protein ='1'   2,1809       0,5303     <        ,8541    3,1316   25,034
TWOAGG ='2'    0,1904       0,5162     0,7122   1,2098   0,4399   3,3272
Termwise Wald Test
Term    Wald Stat.  DF  p-value
FLOOR   1,0812      3   0,7816

Poisson Regression Model
Summary Statistics
                       Value     DF   p-value
Deviance               60,
Likelihood ratio test  67,7378   8    <
Parameter Estimates
                                                         95% C.I.
Terms          Coefficient  Std.Error  p-value  RR       Lower    Upper
%GM            -1,8213      0,8446     0,0310   0,1618   0,0309   0,8471
SEX ='2'       0,1295       0,7106     0,8554   1,1383   0,2827   4,5828
FLOOR ='2'     0,2503       0,6867     0,7154   1,2844   0,3344   4,9343
FLOOR ='3'     -0,1422      0,8032     0,8595   0,8674   0,1797   4,1877
FLOOR ='4'     0,1368       0,7263     0,8506   1,1466   0,2761   4,7608
MEAL ='2'      0,2373       0,3854     0,5381   1,2678   0,5956   2,6987
Protein ='1'   1,0658       0,3413     0,0018   2,9032   1,4871   5,6679
TWOAGG ='2'    0,0645       0,3682     0,8611   1,0666   0,5182   2,1951
Termwise Wald Test
Term    Wald Stat.  DF  p-value
FLOOR   0,4178      3   0,9365

Cox Proportional Hazards
Term             Hazard Ratio  95% C.I.         Coefficient  S.E.    Z-Statistic  P-Value
_AGG (2/1)       1,0666        0,5183  2,195    0,0645       0,3682  0,175        0,8611
Floor(2/1)       1,2844        0,3344  4,9342   0,2503       0,6867  0,3646       0,7154
Floor(3/1)       0,8674        0,1797  4,1876   -0,1422      0,8032  -0,177       0,8595
Floor(4/1)       1,1466        0,2761  4,7607   0,1368       0,7263  0,1883       0,8506
Meal (2/1)       1,2678        0,5957  2,6986   0,2373       0,3854  0,6157       0,5381
Protein(Yes/No)  2,9032        1,4871  5,6678   1,0658       0,3413  3,1225       0,0018
Sex (2/1)        1,1383        0,2827  4,5827   0,1295       0,7106  0,1822       0,8554
Convergence: Converged
Iterations: 5
-2 * Log-Likelihood: 346,0200
Test              Statistic  D.F.  P-Value
Score             17,1727    7     0,0163
Likelihood Ratio  15,4889    7     0,0302