Higher Order Contingency Tables and Logistic Regression Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

Predicting an Outcome A major goal of epidemiology is quantifying the relationship between sets of disease predictors and binary outcomes such as diseased vs. disease free. The first step in describing the relationship between your predictor(s) and your outcome is to do univariate analyses; that is, test for an association between each of your predictors and the outcome.

Predicting an Outcome After you assess the univariate relationships between your predictors and your outcome, you will want to look for effect modification (what everyone else calls interactions) and confounding. You may want to look for higher order interactions. You should look for all interactions that could plausibly be there based on subject matter knowledge. Do not test for interactions that you cannot explain (in your native language)! I can think about three-way interactions, but I cannot get my brain around four-way interactions.

Predicting an Outcome Prior to doing any analyses, write out on a spreadsheet all of the effects that interest you. You then will test for those, and only those, effects.

Univariate Analysis: Is There an Association? The way you assess the relationship between your predictors and the outcome depends on your data. If you have a 2x2 table, you just look at the confidence interval for the odds ratio (OR). Otherwise:

Row Variable   Column Variable   Proc Freq Switch   Statistic
Nominal        Nominal           chisq              Pearson chi-square
Nominal        Ordinal           cmh2               Mean Score
Ordinal        Ordinal           cmh1               Mantel-Haenszel chi-square
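As a rough sketch of how these switches look in practice (the data set mystudy and all of the variable names below are hypothetical):

proc freq data=mystudy;                  /* mystudy and the variables are made-up names */
  tables row_nom*col_nom / chisq;        /* nominal by nominal: Pearson chi-square */
  tables row_nom*col_ord / cmh2;         /* nominal by ordinal: mean score statistic */
  tables row_ord*col_ord / cmh1;         /* ordinal by ordinal: Mantel-Haenszel chi-square */
run;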

Univariate Analysis (2) Strength of an Association If you are looking at a 2x2 table, you can assess the strength of the association with the odds ratio. Otherwise, use the measures below. You can get them from the /measures switch on the tables statement of proc freq.

Row Variable   Column Variable      Statistic
Nominal        Nominal or Ordinal   Uncertainty Coefficient C|R
Ordinal        Ordinal              Spearman Correlation

Sets of Univariate Statistics You can request all the univariate measures like this (list all of your predictors inside the parentheses and put your outcome after the asterisk):

proc freq data=blah;
  tables (sex pih_total)*pre_term_l / chisq cmh measures;
run;

Sets of 2x2 Tables (1) Cochran Mantel-Haenszel You will need to do analyses where the relationship between the predictor and disease is (at least partially) influenced by a third factor. This third factor can be a stratification factor from your study design or a confounder that you did not block for (i.e., group on). Regardless of the source, you can use the Cochran Mantel-Haenszel method to neutralize the third variable.

Sets of 2x2 Tables (2) Confounding & Interaction Invoking the CMH technique is simple. You add the extra variable (the potential confounder) to the left side of the tables statement and add /cmh to the end of the line:

tables school * exposure * spots / cmh;

This will cause SAS to print out a contingency table for each level of the confounder along with the common OR/RR. It does not print out the OR and RR for the subtables; to get them, add the measures switch along with cmh.

(2x)2x2 Ignoring Strata

data spots2;
  input school exposure $ health $ count;
  datalines;
1 Exposed diseased 38
1 Exposed healthy 4
1 NotExp diseased 10
1 NotExp healthy 21
2 Exposed diseased 20
2 Exposed healthy 57
2 NotExp diseased 10
2 NotExp healthy 17
;
run;

proc freq data = spots2;
  weight count;
  tables exposure*health / norow nocol chisq cmh;
run;

In the resulting output (not shown), the odds ratio for exposure by health is the crude odds ratio, because the strata are ignored.

Sets of 2x2 Tables (3) Results: 2x2x2 Using Strata

proc freq data = spots2;
  weight count;
  tables school*exposure*health / norow nocol chisq cmh measures;
run;

In this output (not shown), all is not well! Do not use the summary table. The common odds ratio reported with the CMH statistics is the adjusted odds ratio.

Simpson’s Paradox It is possible to have significant (but opposed) effects in the levels of the covariate, and the overall CMH statistic will indicate NO effects. The moral is to always look at your partial tables.

Exact Tests By default, SAS gives you approximate tests and p-values for almost all statistics in proc freq. You can request exact measures:

proc freq data = spots2;
  weight count;
  exact or;
  tables exposure*health / norow nocol chisq cmh;
run;

Exact Tests (2) Exact tests take time and computer power but run them if you can.

Which CMH Summary? If you have a 2x2 table, then all of the CMH values will be the same:
  tables treat*response / chisq cmh;
If you have a 2xN table, then use "Nonzero Correlation" or "Row Mean Scores Differ":
  tables treat*response / cmh cmh2;
If you have an Nx2 table, then use the "Nonzero Correlation":
  tables treat*response / cmh cmh1;

Test for Trend Looking for a dose response in your predictor is important. If you would like to test for an increasing or decreasing trend in the binomial proportions across the levels of your ordinal variable, you can tell SAS to do a Cochran-Armitage test for trend. To do this, just include the keyword trend on the tables line:
  tables expLevel*hasCancer / cmh chisq measures trend;
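For context, a complete call might look like the sketch below; the data set name dose_study and the count variable are assumptions (the weight statement is only needed if the data are already tallied), and expLevel must be coded so its sort order matches the dose ordering.

proc freq data=dose_study;                               /* dose_study and count are hypothetical */
  weight count;                                          /* only if rows are pre-tallied counts */
  tables expLevel*hasCancer / cmh chisq measures trend;  /* trend adds the Cochran-Armitage test */
run;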

Beyond Contingency Tables SAS provides you with powerful ways of analyzing contingency table data. Proc freq gives you all the tools you need to analyze 2x2 tables, but it becomes more and more awkward as your table sizes increase. Instead, you will use multiple/redundant modeling techniques.

Predicting Outcomes In other disciplines where outcomes are not dichotomous (e.g., alive or dead) or ordinal (e.g., high, medium, or low risk), predictions are regularly done using linear regression techniques:
  Outcome = base level + some relationship of the predictors to the outcome.

Problems with Regression Ordinary (least squares) linear regression is not well suited to predicting a binary outcome, frequency counts, or percentiles:
- values outside of the possible range
- non-integer values
- issues with variance
Instead, epidemiologists typically use two other types of regression: logistic or Poisson.

When to Use Logistic Regression You use LR when you want to predict a binary outcome, say diseased vs. not diseased, and you know that you have numeric covariates (confounding variables) that you want to account for. It is analogous to ANCOVA for continuous outcomes. You choose one outcome and call it the ‘event.’ Most people have a variable for each ‘bad thing’ in their data sets and code the event as a 1.

Age and Wisdom (1) Continuous Outcome Let’s say you have a complex measure of ‘wisdom’ and you want to predict it with age.

Age and Wisdom (2) Continuous Prediction Conceptually, you can see that a line predicts these data nicely (plot not shown): Percent wise = 1.63 + age*0.96
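As a sketch, the line above could come from an ordinary least squares fit; the data set wisdom and the variables wise and age are hypothetical names.

proc reg data=wisdom;    /* wisdom, wise, and age are made-up names */
  model wise = age;      /* least squares line: wise = intercept + slope*age */
run;
quit;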

Age and Wisdom (3) Categorical Outcome If it is scored as a binary measure, no matter how well you place a line, your predictions are going to be way off.

Age and Wisdom (4) Categorical Prediction Ideally, you want some function that is close to a step function.

Logistic Fit With logistic regression you get the probability of going into the event group (which is the wise group in this case) expressed in terms of odds. Complete separation of groups is actually a problem… more on that later.

Odds and Probabilities I have a hard time thinking in terms of odds. Fortunately, it is easy to convert back and forth between probabilities and odds:
  prob = odds/(odds + 1);
  odds = prob/(1 - prob);
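A tiny worked sketch in a data step (the starting value is arbitrary) shows the round trip:

data convert;                 /* illustration only */
  prob = 0.75;                /* a probability of 0.75 ... */
  odds = prob/(1 - prob);     /* ... corresponds to odds of 3 */
  back = odds/(odds + 1);     /* and converting back recovers 0.75 */
run;

proc print data=convert;
run;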

Why Odds Anyway? Odds are used to counteract the fact that linear regression produces probability values outside the range of 0 and 1. Working with odds takes care of the upper bound on the probability; the lower bound is handled by taking the natural log of the odds.

Why Odds Anyway? (2) So whereas from ordinary linear regression you get:
  Probability = baseline + (predictor * weight value)
  wise = 1.63 + age*0.96
in logistic regression you calculate:
  LN(probability/(1 - probability)) = baseline + (predictor * weight value)
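To see how this keeps predictions between 0 and 1, you can undo the logit by hand; the intercept and weight below are made up for illustration, not taken from any example in these slides.

data backtransform;                       /* b0, b1, and age are hypothetical values */
  b0  = -4;                               /* made-up baseline (intercept) */
  b1  = 0.1;                              /* made-up weight for age */
  age = 40;
  logodds = b0 + b1*age;                  /* the model predicts this log odds */
  p = exp(logodds)/(1 + exp(logodds));    /* back-transforming always gives a p between 0 and 1 */
run;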

What Values Do You Want? With LS regression you get beta weights (parameter estimates) that tell you how much the outcome changes with each unit of the predictor:
  wise = 1.63 + age*0.96
Here every unit of age increases your wisdom by about 1. With LR your parameter estimates are in log-odds terms, which no one can understand, but if you exponentiate them (raise e to the power of each estimate), the values make some sense:
  odds of being wise = e^(baseline + age*weight)
Here e^weight is the amount by which every unit of age multiplies your odds of being wise.
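A sketch of that arithmetic, using a made-up estimate and standard error rather than output from any model shown here:

data or_sketch;                      /* beta and se are hypothetical values */
  beta = 0.18;                       /* log odds change per unit of the predictor */
  se   = 0.05;
  or_est   = exp(beta);              /* point estimate: odds multiply by this per unit */
  or_lower = exp(beta - 1.96*se);    /* approximate 95% confidence limits */
  or_upper = exp(beta + 1.96*se);
run;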

Enough! How Do I Do It? SAS provides you with five procedures that all do logistic regression:
- logistic – quick and friendly
- genmod – much more powerful
- probit – this is the only time I’ll mention it…
- catmod – more than binary outcomes
- phreg – conditional logistic for matched case-control data

Fitting a Model Fitting a logistic model is easy with the logistic procedure. But there is one trick. For some (stupid) reason SAS wants to predict group membership into the lowest category (i.e., it wants events to be 0 and non-events to be 1). Typically people use the descending (abbreviated desc) option to make SAS call the events “1” and non-events “0.”

proc logistic data = blah descending;
  model outcome = predict1 predict2;
run;

A Real Example The goal here is to predict who would get severe eclampsia using two of the mothers’ blood chemistries. The primary hypothesis for the study says that these two factors are related to eclampsia. Later I will show you how to choose a good set of predictors from a large set. Notice the abbreviation of descending (desc).

proc logistic data = ana_temp desc;
  model severe_pre = dsl_igf dsl_insuli;
run;

A Real Example (2) Logistic regression uses a mathematical technique called maximum likelihood estimation, which is not guaranteed to produce a result. Rather, it tries to converge on a valid solution through successive approximations. If it fails to converge on an answer, you have a problem that statisticians like to call infinite parameters.

A Real Example (3) For now, only pay attention to two sections of the output (not shown): verify that your cases are listed first by looking at the frequencies in the response profile, and check the convergence status.

A Real Example (4) The global tests tell you whether the model is any good at all; you want to reject the hypothesis of a worthless model. The parameter estimates tell you about the value of the predictors. The “point estimate” is e^estimate; it tells you the impact on the predicted odds of a one unit increase in the predictor. Notice that neither predictor is statistically significant here.

Beta = 0 Statistics These statistics test whether all of your predictors together are useless. They are all asymptotically equivalent and usually do not differ. If they are wildly different, as in this example, you probably have power problems. The Likelihood Ratio statistic (AKA –2 Log L) is preferable for smaller samples.

Proc Logistic Improved Students don’t like specifying descending because it is confusing. In modern versions of proc logistic you can specify the event explicitly:

proc logistic data=ana_temp;
  model severe_pre (event = "Sick") = dsl_igf dsl_insuli / plcl plrl;
  units dsl_igf = 10;
run;

Enterprise Guide

Categorical Predictors You interpret the exponentiated parameter estimates as the change in odds of an event associated with a one unit increase in the predictor. What happens when you have a categorical predictor? You want to have a model that tells you the change in your odds of an event when you are in a group relative to a referent group.

Categorical Predictors You can get SAS to give you the odds of an event in one category relative to a referent group. Say you have packs of cigarettes smoked per day as a variable called “packs” with the values: none, half, full, many.

proc logistic data = lung;
  class packs (ref="none") / param = ref;
  model cancer (event = "Sick") = packs;
run;
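If your SAS release is recent enough, my understanding is that you can also add an oddsratio statement to get clearly labeled odds ratios for the smoking categories; this is a sketch building on the code above, not part of the original slide.

proc logistic data = lung;
  class packs (ref="none") / param = ref;
  model cancer (event = "Sick") = packs;
  oddsratio packs;    /* labeled odds ratios comparing the levels of packs */
run;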