Applied Epidemiologic Analysis - P8400 Fall 2002 Labs 6 & 7 Case-Control Analysis ----Logistic Regression Henian Chen, M.D., Ph.D.

Data Files
Today we will use the case-control study data on esophageal cancer. If you use an INFILE statement to read the 'case-control978.dat' file, please make sure that you correct the miscoded values and the two abnormally high values for alcohol. I have already corrected case-control978.dbf, case-control978.wk3, and case-control978.txt, and you are welcome to use any one of them:

    /* Tab-delimited text file; variable names are read from the first row */
    proc import datafile='a:case-control978.txt' out=case_control978
                dbms=tab replace;
       getnames=yes;
    run;

    /* Lotus 1-2-3 (WK3) spreadsheet; variable names are read from the first row */
    proc import datafile='a:case-control978.wk3' out=case_control978
                dbms=wk3 replace;
       getnames=yes;
    run;

    /* dBASE file; variable names come from the DBF field names */
    proc import datafile='a:case-control978.dbf' out=case_control978
                dbms=dbf replace;
    run;

Logistic Regression Model
A regression model in which the dependent variable is binary (yes, no). It is a form of the generalized linear model in which the link function is the logit, and the regression parameters are expressed as log odds ratios associated with a one-unit increase in the predictors.
For ordinal response outcomes (no pain, slight pain, substantial pain), we can model the cumulative logits by performing ordered logistic regression using the proportional odds model.
For nominal outcomes (Democrats, Republicans, Independents), we can model the generalized logits by performing logistic analysis using the log-linear model.
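The following is a minimal Python sketch of the logit link and its inverse. It shows why a coefficient is a log odds ratio: for a binary predictor, the ratio of the two fitted odds is exp(β) whatever the intercept is. The values of α and β used here are hypothetical and are not taken from the lab data.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability (or proportion) p, where 0 < p < 1."""
    return math.log(p / (1.0 - p))


def expit(log_odds: float) -> float:
    """Inverse of the logit: convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio_binary(alpha: float, beta: float) -> float:
    """Odds at x=1 divided by odds at x=0 when log odds = alpha + beta*x."""
    odds_x1 = math.exp(alpha + beta * 1)
    odds_x0 = math.exp(alpha + beta * 0)
    return odds_x1 / odds_x0


# Hypothetical coefficients, for illustration only (not from the lab data).
beta = 0.5
for alpha in (-2.0, 0.0, 1.0):
    # The ratio of odds equals exp(beta) no matter which alpha is used.
    print(alpha, round(odds_ratio_binary(alpha, beta), 4), round(math.exp(beta), 4))
```

The last two columns print the same value on every row, because α cancels when the two odds are divided.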

Logistic Regression for Intercept only
SAS Program

    * DESCENDING: model the probability (and OR) for dependent variable = 1;
    proc logistic data=case_control978 descending;
       model status= ;
    run;

SAS Output
The LOGISTIC Procedure
Model Information
    Data Set                    WORK.CASE_CONTROL978
    Response Variable           status
    Number of Response Levels   2
    Number of Observations      978
    Model                       binary logit
    Optimization Technique      Fisher's scoring

Logistic Regression for Intercept only
SAS Output
Response Profile
    Ordered Value   status   Total Frequency
    Probability modeled is status=1.
Model Convergence Status
    Convergence criterion (GCONV=1E-8) satisfied.
    -2 Log L
Analysis of Maximum Likelihood Estimates
    Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
    Intercept                                                      <.0001

Logistic Regression for Intercept only
1. Calculate the log odds. In our model, the intercept (α) is the log odds of cancer for the total sample.
2. Take the antilog to get the odds: odds = exp(α).
3. Divide the odds by (1 + odds) to get P: P = odds/(1 + odds) = 200/978, the observed proportion of cases. (In a cohort study or a population, P is a probability; in a case-control study, P is a proportion.)
P is related to α in the logistic model.
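The three steps above can be sketched in Python. This assumes that the data contain 200 cases among 978 subjects, as the slide's 200/978 implies. For an intercept-only logistic model, the maximum likelihood estimate of α is the log of cases/controls, so step 3 reproduces the sample proportion exactly.

```python
import math

# Assumption: 200 cases (status=1) among the 978 subjects in case_control978.
cases, total = 200, 978
controls = total - cases

alpha = math.log(cases / controls)   # intercept-only MLE: log odds of being a case
odds = math.exp(alpha)               # step 2: antilog of the log odds
p = odds / (1.0 + odds)              # step 3: odds / (1 + odds)

print(f"alpha = {alpha:.4f}, odds = {odds:.4f}, P = {p:.4f}, 200/978 = {cases / total:.4f}")
```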

Logistic Regression for Dichotomous Predictor
Alcohol consumption (alcgrp): 0 = 0-39 gm/day; 1 = 40+ gm/day
SAS Program

    proc logistic data=case_control978 descending;
       model status=alcgrp;
    run;

SAS Output
Model Fit Statistics
    Criterion   Intercept Only   Intercept and Covariates
    -2 Log L
Likelihood ratio test: G = (-2 Log L, intercept only) - (-2 Log L, intercept and covariates), df = 1
The model with the variable alcgrp fits significantly better than the intercept-only model.
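The likelihood ratio test can be sketched as a small Python helper. G is the drop in -2 Log L, and for 1 df its p-value is the upper tail of a chi-square distribution. The -2 Log L inputs below are hypothetical placeholders, not the lab's values. The 1-df tail uses the identity P(χ²₁ > G) = erfc(√(G/2)), so only the standard library is needed.

```python
import math


def lr_test_df1(neg2logl_null: float, neg2logl_model: float) -> tuple[float, float]:
    """Likelihood ratio test for a model that adds one parameter (df = 1).

    G is the drop in -2 Log L. For 1 df, P(chi-square > G) = erfc(sqrt(G / 2)).
    """
    g = neg2logl_null - neg2logl_model
    if g < 0:
        raise ValueError("-2 Log L should not increase when a term is added")
    return g, math.erfc(math.sqrt(g / 2.0))


# Hypothetical -2 Log L values, chosen so that G sits at the 5% critical value.
g, p = lr_test_df1(1000.0, 996.159)
print(f"G = {g:.3f}, df = 1, p = {p:.4f}")
```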

Logistic Regression for Dichotomous Predictor
SAS Output
Analysis of Maximum Likelihood Estimates
    Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
    Intercept                                                      <.0001
    alcgrp           1.7641                                        <.0001
Odds Ratio Estimates
    Effect   Point Estimate   95% Wald Confidence Limits
    alcgrp
OR = exp(β) = exp(1.7641) ≈ 5.84. Heavy drinkers' (alcgrp=1) odds of cancer are about 6 times those of light drinkers (alcgrp=0).
The OR does not depend on α in the logistic model.

Logistic Regression for Dichotomous Predictor
1. Calculate the log odds.
    Light drinkers (alcgrp=0): log odds = α = -0.827 - 1.7641 ≈ -2.591
    Heavy drinkers (alcgrp=1): log odds = α + β = -0.827
2. Take the antilog to get the odds.
    Light drinkers: odds = exp(α) = 0.0749
    Heavy drinkers: odds = exp(-0.827) = 0.4374
3. Divide the odds by (1 + odds) to get P(x).
    Light drinkers: P(x) = 0.0749/(1 + 0.0749) ≈ 0.0697
    Heavy drinkers: P(x) = 0.4374/(1 + 0.4374) = 0.3043
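The two dichotomous-predictor slides can be checked with a short Python sketch. It uses β = 1.7641 and the heavy-drinker log odds of -0.827 from the slides, and derives α by subtraction. For that reason the light-drinker figures may differ from SAS's printed intercept in the last digit. The ratio of the two odds reproduces exp(β), which confirms that the OR does not involve α.

```python
import math

beta = 1.7641                  # alcgrp coefficient reported on the slide
log_odds_heavy = -0.827        # alpha + beta for heavy drinkers, from the slide
alpha = log_odds_heavy - beta  # derived log odds for light drinkers (alcgrp=0)


def prob_from_log_odds(log_odds: float) -> float:
    """Antilog to get the odds, then odds / (1 + odds)."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


odds_light = math.exp(alpha)
odds_heavy = math.exp(log_odds_heavy)
print(f"light: log odds={alpha:.4f}, odds={odds_light:.4f}, P={prob_from_log_odds(alpha):.4f}")
print(f"heavy: log odds={log_odds_heavy:.4f}, odds={odds_heavy:.4f}, P={prob_from_log_odds(log_odds_heavy):.4f}")
print(f"OR = {odds_heavy / odds_light:.3f}; exp(beta) = {math.exp(beta):.3f}")
```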

Logistic Regression for Ordinal Predictor
Alcohol consumption (alcgrp4): 0 = 0-39 gm/day; 1 = 40-79 gm/day; 2 = 80-119 gm/day; 3 = 120+ gm/day
SAS Program

    proc logistic data=case_control978 descending;
       model status=alcgrp4;
    run;

SAS Output
Model Fit Statistics
    Criterion   Intercept Only   Intercept and Covariates
    -2 Log L
Likelihood ratio test: G = (-2 Log L, intercept only) - (-2 Log L, intercept and covariates), df = 1
The model with the variable alcgrp4 fits significantly better than the intercept-only model.

Logistic Regression for Ordinal Predictor
SAS Output
Analysis of Maximum Likelihood Estimates
    Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
    Intercept                                                      <.0001
    alcgrp4          1.0453                                        <.0001
Odds Ratio Estimates
    Effect    Point Estimate   95% Wald Confidence Limits
    alcgrp4
OR = exp(1.0453) ≈ 2.84. The odds of cancer for men with alcgrp4=1 are about 3 times those of men with alcgrp4=0.
Because alcgrp4 enters the model as a single numeric term, the same OR applies to any one-category step: alcgrp4=1 vs. alcgrp4=2, or alcgrp4=2 vs. alcgrp4=3.
OR = exp[(3-1)*1.0453] = exp(2.0906) ≈ 8.09 for alcgrp4=1 vs. alcgrp4=3 (odds at alcgrp4=3 relative to alcgrp4=1)
OR = exp[(3-0)*1.0453] = exp(3.1359) ≈ 23.01 for alcgrp4=0 vs. alcgrp4=3 (odds at alcgrp4=3 relative to alcgrp4=0)
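The category comparisons above can be reproduced with a Python sketch. It assumes, as the model does, that alcgrp4 enters the logit linearly with the slide's coefficient of 1.0453. The OR between any two levels then depends only on how many categories apart they are.

```python
import math

beta_alcgrp4 = 1.0453  # per-category coefficient reported on the slide


def or_between_levels(ref_level: int, other_level: int, beta: float = beta_alcgrp4) -> float:
    """OR for other_level relative to ref_level when the ordinal code enters the model linearly."""
    return math.exp((other_level - ref_level) * beta)


for ref_level, other_level in [(0, 1), (1, 2), (2, 3), (1, 3), (0, 3)]:
    ratio = or_between_levels(ref_level, other_level)
    print(f"alcgrp4={other_level} relative to alcgrp4={ref_level}: OR = {ratio:.2f}")
```

If the one-step ORs were not expected to be equal, alcgrp4 would instead be entered as a set of indicator variables. That choice trades this single-trend assumption for more parameters.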

OR = exp(β_X) is a special case that holds when
1. X is a binary variable, and
2. there are no interactions between X and other variables.
If X is not a binary variable, the OR comparing X = X* with X = X** is
    OR = exp[β_X (X* - X**)]
If X is not a binary variable and there is an interaction between X and W (with product-term coefficient β_XW), then
    OR = exp[(X* - X**)(β_X + β_XW W)]
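The general formula can be written as a Python sketch and checked by brute force against the full linear predictor, log odds = α + β_X X + β_W W + β_XW X W. It uses the alcgrp and alcgrp*tobgrp coefficients (2.28 and -0.98) from the interaction slide. The α and β_W values are arbitrary placeholders, chosen only to show that they cancel.

```python
import math


def odds_ratio(x_star: float, x_2star: float, beta_x: float,
               beta_xw: float = 0.0, w: float = 0.0) -> float:
    """OR comparing X = x_star with X = x_2star, holding W fixed at w.

    With log odds = alpha + beta_x*X + beta_w*W + beta_xw*X*W, the alpha and
    beta_w*W terms cancel, leaving exp[(x_star - x_2star)(beta_x + beta_xw*W)].
    """
    return math.exp((x_star - x_2star) * (beta_x + beta_xw * w))


def log_odds(x, w, alpha, beta_x, beta_w, beta_xw):
    """Full linear predictor, used to check the shortcut formula by brute force."""
    return alpha + beta_x * x + beta_w * w + beta_xw * x * w


# Coefficients 2.28 and -0.98 from the interaction slide; alpha=-1.0 and
# beta_w=0.5 are arbitrary placeholders because they cancel.
for w in (0, 1):
    shortcut = odds_ratio(1, 0, 2.28, -0.98, w)
    brute = math.exp(log_odds(1, w, -1.0, 2.28, 0.5, -0.98)
                     - log_odds(0, w, -1.0, 2.28, 0.5, -0.98))
    print(f"W={w}: OR for X = {shortcut:.2f} (brute force {brute:.2f})")
```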

Logistic Regression for Continuous Predictor
Alcohol consumption (alcohol): daily consumption in grams
SAS Program

    proc logistic data=case_control978 descending;
       model status=alcohol;
    run;

SAS Output
Analysis of Maximum Likelihood Estimates
    Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
    Intercept                                                      <.0001
    alcohol          0.0261                                        <.0001
Odds Ratio Estimates
    Effect    Point Estimate   95% Wald Confidence Limits
    alcohol

Logistic Regression for Continuous Predictor
OR = exp(0.0261) ≈ 1.026: the odds of cancer increase by a factor of about 1.026 for each additional gram of alcohol consumed per day.
OR = exp[40*(0.0261)] = exp(1.044) ≈ 2.84 for a 40-gram increase in daily alcohol consumption.
OR = exp[120*(0.0261)] = exp(3.132) ≈ 22.9 for a man who drinks 160 grams per day compared with a man who is similar in other respects but drinks 40 grams per day.
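The same arithmetic can be sketched in Python. It assumes, as the model does, that the log odds are linear in grams per day, using the slide's coefficient of 0.0261. The OR for a difference of Δ grams is then exp(Δ·β), which equals the per-gram OR raised to the power Δ.

```python
import math

beta_alcohol = 0.0261  # log OR per gram/day, reported on the slide


def or_for_increase(delta_grams: float, beta: float = beta_alcohol) -> float:
    """OR for a delta_grams/day difference in alcohol, assuming a linear logit."""
    return math.exp(delta_grams * beta)


for label, delta in [("1 gram", 1), ("40 grams", 40), ("160 vs. 40 grams", 160 - 40)]:
    print(f"{label}: OR = {or_for_increase(delta):.3f}")
```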

Interaction in Logistic Regression
Model without interaction: log odds of cancer = α + β_1 alcgrp + β_2 tobgrp
    β_1: the effect of alcohol on cancer, controlling for tobacco (i.e., the same OR across levels of tobacco)
    β_2: the effect of tobacco on cancer, controlling for alcohol (i.e., the same OR across levels of alcohol)
Model with interaction: log odds of cancer = α + β_1 alcgrp + β_2 tobgrp + β_3 alcgrp*tobgrp
    β_1: the effect of alcohol on cancer among non-smokers (tobgrp=0)
    β_2: the effect of tobacco on cancer among non-drinkers (alcgrp=0)
    β_3: the interaction between drinking and smoking; the log OR for alcohol among smokers (tobgrp=1) is β_1 + β_3

Interaction in Logistic Regression
Fitted model: log odds of cancer = α + 2.28(alcgrp) + β_2(tobgrp) - 0.98(alcgrp*tobgrp)

    Group                    Log odds                                               Odds relative to A
    A: alcgrp=0 & tobgrp=0   α + 2.28*0 + β_2*0 - 0.98*0*0 = α                      1.00
    B: alcgrp=1 & tobgrp=0   α + 2.28*1 + β_2*0 - 0.98*1*0 = α + 2.28               9.78
    C: alcgrp=0 & tobgrp=1   α + 2.28*0 + β_2*1 - 0.98*0*1 = α + β_2                3.97
    D: alcgrp=1 & tobgrp=1   α + 2.28*1 + β_2*1 - 0.98*1*1 = α + 2.28 + β_2 - 0.98  14.59

    Comparison   Odds Ratio
    A vs. B      9.78  = 9.78/1.00
    A vs. C      3.97  = 3.97/1.00
    A vs. D      14.59 = 14.59/1.00
    B vs. D      1.49  = 14.59/9.78
    C vs. D      3.68  = 14.59/3.97
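The odds-ratio table can be recomputed with a Python sketch from the slide's group odds relative to A (1.00, 9.78, 3.97, 14.59). It also checks the interaction term. Because OR_D = exp(β_1 + β_2 + β_3), OR_B = exp(β_1), and OR_C = exp(β_2), the ratio OR_D/(OR_B × OR_C) equals exp(β_3) up to rounding of the listed ORs. Its log should therefore come out close to -0.98.

```python
import math

# Odds of each group relative to group A (alcgrp=0, tobgrp=0), as listed on the slide.
rel_odds = {"A": 1.00, "B": 9.78, "C": 3.97, "D": 14.59}


def or_vs(ref: str, other: str) -> float:
    """Odds ratio for group `other` relative to reference group `ref`."""
    return rel_odds[other] / rel_odds[ref]


for ref, other in [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]:
    print(f"{ref} vs. {other}: OR = {or_vs(ref, other):.2f}")

# On the multiplicative scale, OR_D / (OR_B * OR_C) equals exp(beta_3) up to rounding.
interaction_ratio = or_vs("A", "D") / (or_vs("A", "B") * or_vs("A", "C"))
print(f"OR_D/(OR_B*OR_C) = {interaction_ratio:.3f}, log = {math.log(interaction_ratio):.2f}")
```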