Applied Epidemiologic Analysis - P8400 Fall 2002
Labs 6 & 7: Case-Control Analysis, Logistic Regression
Henian Chen, M.D., Ph.D.
Data Files

Today we will use the case-control study data on esophageal cancer. If you use an INFILE statement to read the 'case-control978.dat' file, please make sure that you have corrected the miscoded values and the two abnormally high values for alcohol. I have corrected case-control978.dbf, case-control978.wk3, and case-control978.txt; you are welcome to use any one of them.

proc import datafile='a:case-control978.txt' out=case_control978 dbms=tab replace;
  getnames=yes;
run;

proc import datafile='a:case-control978.wk3' out=case_control978 dbms=wk3 replace;
  getnames=yes;
run;

proc import datafile='a:case-control978.dbf' out=case_control978 dbms=dbf replace;
run;
Logistic Regression Model

A regression model in which the dependent variable is binary (yes, no).

A form of the generalized linear model in which the link function is the logit, and each regression parameter is the change in the log odds (a log odds ratio) associated with a one-unit increase in the corresponding predictor.

For ordinal response outcomes (no pain, slight pain, substantial pain), we can model the cumulative logits by performing ordered logistic regression using the proportional odds model.

For nominal outcomes (Democrats, Republicans, Independents), we can model the generalized logits by performing logistic analysis using the log-linear model.
Logistic Regression for Intercept Only

SAS Program
proc logistic data=case_control978 descending;
  model status=;
run;

Note: the DESCENDING option is used to get the probability and OR for dependent variable = 1.

SAS Output

The LOGISTIC Procedure
Model Information
Data Set                     WORK.CASE_CONTROL978
Response Variable            status
Number of Response Levels    2
Number of Observations       978
Model                        binary logit
Optimization Technique       Fisher's scoring
Logistic Regression for Intercept Only

SAS Output

Response Profile
Ordered Value    status    Total Frequency

Probability modeled is status=1.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

-2 Log L =

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept                                                           <.0001
Logistic Regression for Intercept Only

1. Calculate the log odds
   In our model, the intercept (α) = , which is the log odds of cancer for the total sample.
2. Take the antilog to get the odds
   Odds = exp( ) =
3. Divide the odds by (1 + odds) to get P (P is a probability in a cohort or population; in a case-control study P is a proportion)
   P = /( ) = = 200/( )

P is related to α in the logistic model.
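The three steps above can be checked numerically. What follows is a minimal Python sketch (Python rather than SAS, used only for the arithmetic). It assumes 200 cases among the 978 observations, which is what the numerator 200 and the SAS observation count suggest; the variable names are illustrative and not part of the lab.

```python
import math

# Assumption: 200 cases among the 978 observations in the data set.
cases, total = 200, 978
controls = total - cases

# Step 1: in an intercept-only logistic model, the intercept equals the sample log odds.
alpha = math.log(cases / controls)

# Step 2: the antilog of the log odds gives the odds.
odds = math.exp(alpha)

# Step 3: odds / (1 + odds) gives the proportion of cases.
p = odds / (1 + odds)

print(f"alpha = {alpha:.4f}, odds = {odds:.4f}, P = {p:.4f}")
```

Under this assumption P reduces to cases/total, which is why P in a case-control study reflects the sampling design rather than a population risk.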
Logistic Regression for Dichotomous Predictor

Alcohol consumption (alcgrp): 0 = 0-39 gm/day; 1 = 40+ gm/day

SAS Program
proc logistic data=case_control978 descending;
  model status=alcgrp;
run;

SAS Output

Model Fit Statistics
Criterion    Intercept Only    Intercept and Covariates
-2 Log L

Likelihood Ratio Test
G = (-2 Log L, intercept only) – (-2 Log L, intercept and covariates)
G = – = , df = 1

The model with the variable 'alcgrp' fits significantly better than the intercept-only model.
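The likelihood ratio test above can be sketched as follows. This is a minimal Python illustration, not the SAS computation: the function name is hypothetical, and the two -2 Log L inputs are made-up placeholders because the actual values are not shown. For 1 degree of freedom, the chi-square upper-tail probability can be written with the complementary error function, so no statistics library is needed.

```python
import math

def likelihood_ratio_test_df1(neg2logl_reduced, neg2logl_full):
    """Return (G, p-value) for a 1-df likelihood ratio test of nested models."""
    g = neg2logl_reduced - neg2logl_full
    # For 1 df, the chi-square upper tail P(X > g) equals erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value

# Hypothetical -2 Log L values, for illustration only (not the lab's output).
g, p = likelihood_ratio_test_df1(1000.0, 990.0)
print(f"G = {g:.2f}, df = 1, p = {p:.4f}")
```

The sketch assumes the larger model nests the smaller one, so G is non-negative.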
Logistic Regression for Dichotomous Predictor

SAS Output

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept                                                           <.0001
alcgrp                                                              <.0001

Odds Ratio Estimates
Effect    Point Estimate    95% Wald Confidence Limits
alcgrp

OR = exp(β) = exp(1.7641) ≈ 5.84

The odds of cancer among heavy drinkers (alcgrp=1) are about 6 times the odds among light drinkers (alcgrp=0).

The OR is not related to α in the logistic model.
Logistic Regression for Dichotomous Predictor

1. Calculate the log odds
   Light drinkers (alcgrp=0): log odds = α = -2.591
   Heavy drinkers (alcgrp=1): log odds = α + β = -2.591 + 1.764 = -0.827
2. Take the antilog to get the odds
   Light drinkers: Odds = exp(-2.591) = 0.0749
   Heavy drinkers: Odds = exp(-0.827) = 0.4374
3. Divide the odds by (1 + odds) to get P(x)
   Light drinkers: P(x) = 0.0749/(1 + 0.0749) = 0.0697
   Heavy drinkers: P(x) = 0.4374/(1 + 0.4374) = 0.3043
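The same calculation can be sketched in Python as a check on the arithmetic. The coefficient 1.7641 comes from the SAS output; the intercept is an inferred value, taken here as -0.827 - 1.7641 = -2.5911, and the helper name is illustrative.

```python
import math

alpha = -2.5911  # assumed intercept, inferred as -0.827 - 1.7641
beta = 1.7641    # coefficient for alcgrp from the SAS output

def prob_from_log_odds(log_odds):
    """Convert log odds to a proportion via odds / (1 + odds)."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)

for alcgrp, label in [(0, "light drinkers"), (1, "heavy drinkers")]:
    log_odds = alpha + beta * alcgrp
    print(f"{label}: log odds = {log_odds:.4f}, "
          f"odds = {math.exp(log_odds):.4f}, "
          f"P(x) = {prob_from_log_odds(log_odds):.4f}")

odds_light = math.exp(alpha)
odds_heavy = math.exp(alpha + beta)
# The intercept cancels in the ratio, so the OR depends only on beta.
print(f"OR = {odds_heavy / odds_light:.3f}")
```

The final line illustrates why the OR is unrelated to α: the ratio of the two odds equals exp(β) whatever the intercept is.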
Logistic Regression for Ordinal Predictor

Alcohol consumption (alcgrp4): 0 = 0-39 gm/day; 1 = 40-79 gm/day; 2 = 80-119 gm/day; 3 = 120+ gm/day

SAS Program
proc logistic data=case_control978 descending;
  model status=alcgrp4;
run;

SAS Output

Model Fit Statistics
Criterion    Intercept Only    Intercept and Covariates
-2 Log L

Likelihood Ratio Test
G = (-2 Log L, intercept only) – (-2 Log L, intercept and covariates)
G = – = , df = 1

The model with the variable 'alcgrp4' fits significantly better than the intercept-only model.
Logistic Regression for Ordinal Predictor

SAS Output

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept                                                           <.0001
alcgrp4                                                             <.0001

Odds Ratio Estimates
Effect     Point Estimate    95% Wald Confidence Limits
alcgrp4

OR = exp(1.0453) ≈ 2.84
The odds of cancer for men with alcgrp4=1 are about 3 times the odds for men with alcgrp4=0. The same OR applies to alcgrp4=2 vs. alcgrp4=1 and to alcgrp4=3 vs. alcgrp4=2.

OR = exp[(3-1)*1.0453] = exp(2.0906) ≈ 8.09 for alcgrp4=3 vs. alcgrp4=1
OR = exp[(3-0)*1.0453] = exp(3.1359) ≈ 23.01 for alcgrp4=3 vs. alcgrp4=0
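The odds ratios for these category contrasts can be reproduced with a short Python sketch. The coefficient 1.0453 is taken from the slide; the function name is illustrative.

```python
import math

beta = 1.0453  # coefficient for alcgrp4 from the SAS output

def odds_ratio(level_high, level_low, beta):
    """OR comparing two levels of a predictor entered as a single linear term."""
    return math.exp((level_high - level_low) * beta)

for high, low in [(1, 0), (3, 1), (3, 0)]:
    print(f"alcgrp4={high} vs. alcgrp4={low}: OR = {odds_ratio(high, low, beta):.2f}")
```

Because alcgrp4 enters the model as one linear term, the OR between any two adjacent categories is forced to be the same; that is a modeling assumption, not something the data guarantee.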
OR = exp(β_x) is a special case that holds when:
1. X is a binary variable
2. There are no interactions between X and other variables

If X is not a binary variable:
OR = exp[β_x (X* - X**)]

If X is not a binary variable and there is an interaction between X and W:
OR = exp[(X* - X**)(β_x + β_xw W)]

Here X* and X** are the two values of X being compared.
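These formulas can be collected into one function. The sketch below is a minimal Python illustration; the function name, argument names, and the coefficient values are hypothetical and chosen only to exercise the three cases.

```python
import math

def odds_ratio(x_star, x_ref, beta_x, beta_xw=0.0, w=0.0):
    """OR for X = x_star (X*) versus X = x_ref (X**) at a fixed value w of W."""
    return math.exp((x_star - x_ref) * (beta_x + beta_xw * w))

# Hypothetical coefficients, chosen only to illustrate the formula.
beta_x, beta_xw = 0.5, -0.2

print(odds_ratio(1, 0, beta_x))                # binary X, no interaction: exp(beta_x)
print(odds_ratio(3, 1, beta_x))                # two-unit difference in X, no interaction
print(odds_ratio(3, 1, beta_x, beta_xw, w=1))  # same contrast evaluated at W = 1
```

With an interaction term there is no single OR for X: the value depends on the level of W at which the contrast is evaluated.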
Logistic Regression for Continuous Predictor

Alcohol consumption (alcohol): daily consumption in grams

SAS Program
proc logistic data=case_control978 descending;
  model status=alcohol;
run;

SAS Output

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept                                                           <.0001
alcohol                                                             <.0001

Odds Ratio Estimates
Effect     Point Estimate    95% Wald Confidence Limits
alcohol
Logistic Regression for Continuous Predictor

OR = exp(0.0261) ≈ 1.026
The odds of cancer increase by a factor of about 1.026 for each one-gram increase in daily alcohol consumption.

OR = exp[40*(0.0261)] = exp(1.044) ≈ 2.84 for a 40-gram increase in alcohol consumption per day.

OR = exp[120*(0.0261)] = exp(3.132) ≈ 22.92 for a man who drinks 160 grams per day compared with a man who is similar in other respects but drinks 40 grams per day.
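As a check on these three odds ratios, here is a minimal Python sketch using the slide's coefficient of 0.0261 per gram; the function name is illustrative.

```python
import math

beta = 0.0261  # change in log odds per gram of alcohol per day (from the slide)

def odds_ratio_for_increase(grams, beta):
    """OR associated with an increase of `grams` in daily alcohol consumption."""
    return math.exp(grams * beta)

for grams in (1, 40, 120):
    print(f"+{grams} g/day: OR = {odds_ratio_for_increase(grams, beta):.3f}")
```

Only the difference in consumption matters under this model, so the comparison of 160 with 40 grams per day uses the same 120-gram contrast as any other pair of values 120 grams apart.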
Interaction in Logistic Regression

model status = α + β_1 alcgrp + β_2 tobgrp
β_1: the effect of alcohol on cancer, controlling for tobacco (i.e., the same OR across levels of tobacco)
β_2: the effect of tobacco on cancer, controlling for alcohol (i.e., the same OR across levels of alcohol)

model status = α + β_1 alcgrp + β_2 tobgrp + β_3 alcgrp*tobgrp
β_1: the effect of alcohol on cancer among non-smokers (tobgrp=0)
β_2: the effect of tobacco on cancer among light drinkers (alcgrp=0)
β_3: the interaction between alcohol and tobacco, i.e., how much the log OR for one exposure changes when the other exposure is present
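Interaction in Logistic Regression

model status = α + 2.28 (alcgrp) + 1.38 (tobgrp) – 0.98 (alcgrp*tobgrp)

Log odds and odds relative to group A (the intercept is omitted because it cancels in every ratio):

Group                      Log odds                              Odds
A: alcgrp=0 & tobgrp=0     2.28*0 + 1.38*0 – 0.98*0*0 = 0        1.00
B: alcgrp=1 & tobgrp=0     2.28*1 + 1.38*0 – 0.98*1*0 = 2.28     9.78
C: alcgrp=0 & tobgrp=1     2.28*0 + 1.38*1 – 0.98*0*1 = 1.38     3.97
D: alcgrp=1 & tobgrp=1     2.28*1 + 1.38*1 – 0.98*1*1 = 2.68     14.59

Odds Ratio
B vs. A    9.78 = 9.78/1.00      (alcohol, among tobgrp=0)
C vs. A    3.97 = 3.97/1.00      (tobacco, among alcgrp=0)
D vs. A    14.59 = 14.59/1.00    (both exposures vs. neither)
D vs. B    1.49 = 14.59/9.78     (tobacco, among alcgrp=1)
D vs. C    3.68 = 14.59/3.97     (alcohol, among alcgrp=1)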
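The odds and odds ratios for the four exposure groups can be recomputed from the coefficients. The Python sketch below is only an arithmetic check: 2.28 and -0.98 appear on the slide, while the tobgrp coefficient of 1.38 is an assumption recovered from the reported odds of 3.97 for group C (ln 3.97 ≈ 1.38). The function and variable names are illustrative.

```python
import math

# 2.28 and -0.98 appear in the text; 1.38 for tobgrp is an assumption
# recovered from the reported odds of 3.97 for group C.
b_alc, b_tob, b_int = 2.28, 1.38, -0.98

def relative_odds(alcgrp, tobgrp):
    """Odds relative to group A (alcgrp=0, tobgrp=0); the intercept cancels."""
    return math.exp(b_alc * alcgrp + b_tob * tobgrp + b_int * alcgrp * tobgrp)

odds = {
    "A": relative_odds(0, 0),
    "B": relative_odds(1, 0),
    "C": relative_odds(0, 1),
    "D": relative_odds(1, 1),
}

for num, den in [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B"), ("D", "C")]:
    print(f"{num} vs. {den}: OR = {odds[num] / odds[den]:.2f}")
```

Ratios computed from unrounded odds can differ slightly in the second decimal from ratios of the rounded odds shown on the slide.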