Different Distributions
David Purdie


Topics
Application of GEE to:
– Binary outcomes: logistic regression
– Events over time (rate): Poisson regression
– Survival data: Cox regression

General form for distributions from the exponential family
Outcome for subject i at time j: $Y_{ij}$, with $E(Y_{ij}) = \mu_{ij}$
Generalized linear model: $g(\mu_{ij}) = X_i \beta$
where $X_i = (x_{i1}, \ldots, x_{ij})$ is the matrix of covariates for subject i

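Binary outcomes: logistic regression
Outcome: $\Pr(Y_{ij} = 1) = \mu_{ij}$ (probability of an event), $\Pr(Y_{ij} = 0) = 1 - \mu_{ij}$
Logit link function: $g(\mu_{ij}) = \log\{\mu_{ij} / (1 - \mu_{ij})\}$
Logistic model: $\log\{\mu_{ij} / (1 - \mu_{ij})\} = X_i \beta$, where $\mu_{ij} = E(Y_{ij} \mid X_i)$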
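As a small illustration of the logit link, the Python sketch below maps a linear predictor to a probability and back. The coefficients are hypothetical values chosen for the example and do not come from the slides.

```python
import math

def logit(p):
    """Logit link: log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link: map a linear predictor eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients (assumptions, not estimates from any data set).
beta0, beta1 = -1.0, 0.5
x = 2.0

eta = beta0 + beta1 * x        # linear predictor, here 0.0
mu = inv_logit(eta)            # fitted probability of an event
odds_ratio = math.exp(beta1)   # odds ratio per unit increase in x
```

Exponentiating a coefficient gives an odds ratio, which is why the logistic results later in the talk are reported as ORs.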

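Events over time: Poisson regression
Outcome: $Y_i$ = number of events in time period $t_i$
$E(Y_i) = \lambda_i t_i$ and $\mathrm{Var}(Y_i \mid t_i) = \lambda_i t_i$ (where $\lambda_i$ is the event rate)
Log link function: $\log(\lambda_i)$
Poisson model: $\log(\lambda_i) = X_i \beta$, equivalently $\log E(Y_i) = \log(t_i) + X_i \beta$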
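The rate formulation can be sketched in Python as follows. The rates and the single 0/1 covariate are hypothetical; the point is only that the expected count is the rate multiplied by the length of the time period.

```python
import math

def expected_count(t, x, beta0, beta1):
    """E(Y) = t * lambda, where log(lambda) = beta0 + beta1 * x."""
    rate = math.exp(beta0 + beta1 * x)
    return t * rate

# Hypothetical parameters: 0.2 events per month when x = 0,
# and a rate ratio of 0.5 when x = 1.
beta0, beta1 = math.log(0.2), math.log(0.5)

count_x0 = expected_count(6.0, 0, beta0, beta1)   # 6 months at rate 0.2
count_x1 = expected_count(6.0, 1, beta0, beta1)   # 6 months at rate 0.1
rate_ratio = math.exp(beta1)
```

The ratio of expected counts over equal periods equals exp(beta1), so exponentiated Poisson coefficients read as rate ratios.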

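Survival data: Cox regression
Parameter: $t_{ij}$ (time to event $y_{ij}$)
Based on a hazard function: $h_t$
Outcome: $T_{ij}$ = time until event $y_{ij}$
Log link function: $\log(h_t)$
Cox model: $\log(h_t) = \log(\lambda_t) + X_i \beta$, that is $h_t = \lambda_t \exp(X_i \beta)$, where $\lambda_t$ is the baseline hazard rate.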
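A brief Python sketch of the proportional hazards idea: whatever the baseline hazard is at a given time, the ratio of hazards between two covariate values stays at exp(beta). The coefficient and the baseline values are hypothetical.

```python
import math

def hazard(baseline, x, beta):
    """Hazard for a single covariate x: baseline * exp(x * beta)."""
    return baseline * math.exp(x * beta)

beta = -0.4  # hypothetical log hazard ratio, not an estimate from any data

# Evaluate the hazard ratio (x = 1 versus x = 0) at several baseline hazards.
ratios = []
for baseline in (0.01, 0.05, 0.30):
    ratios.append(hazard(baseline, 1, beta) / hazard(baseline, 0, beta))
```

Every entry of `ratios` equals exp(beta) up to rounding, which is why the baseline hazard does not need to be specified to estimate the covariate effect.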

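Alternating logistic regression
If the responses are binary, it may make more sense to use a matrix of odds ratios rather than correlations.
Replace $\mathrm{corr}(Y_{ij}, Y_{ik})$ with:
$OR(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij}=1, Y_{ik}=1)\,\Pr(Y_{ij}=0, Y_{ik}=0)}{\Pr(Y_{ij}=1, Y_{ik}=0)\,\Pr(Y_{ij}=0, Y_{ik}=1)}$
The ALR algorithm models $\gamma_{ijk} = \log\{OR(Y_{ij}, Y_{ik})\}$ as $\gamma_{ijk} = z_{ijk} \alpha$, where $\alpha$ are regression parameters and $z_{ijk}$ is fixed and needs to be specified.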
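To make the pairwise odds ratio concrete, the Python sketch below computes the log odds ratio for one pair of binary responses from their joint probabilities. The joint distribution is hypothetical. With an exchangeable log OR structure, every pair within a subject would share the same value.

```python
import math

def pairwise_log_odds_ratio(p11, p10, p01, p00):
    """log OR for two binary responses, from their four joint probabilities."""
    return math.log((p11 * p00) / (p10 * p01))

# Hypothetical joint distribution of two responses (sums to 1).
gamma = pairwise_log_odds_ratio(p11=0.30, p10=0.20, p01=0.10, p00=0.40)

# Under independence the joint probabilities are products of the margins,
# so the log odds ratio is zero.
pj, pk = 0.5, 0.4
gamma_indep = pairwise_log_odds_ratio(
    pj * pk, pj * (1 - pk), (1 - pj) * pk, (1 - pj) * (1 - pk)
)
```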

Mixed Models for Non-Normal Data
$E(y \mid u) = \mu$, $\mathrm{var}(y \mid u) = \phi V(\mu)$, $g(\mu) = X\beta + Zu$
Random coefficients u have distribution f(u)
y|u has the usual GLM distribution
Binary outcome:
– binomial for y|u and beta for u
Count outcome:
– Poisson for y|u and gamma for u
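For the count case, a Poisson outcome with a gamma random effect has a negative binomial marginal distribution. The Python sketch below checks this numerically under one assumed parameterisation (gamma with shape k and mean 1, multiplying the Poisson mean); the values of mu and k are hypothetical.

```python
import math

def neg_binomial_pmf(y, mu, k):
    """Marginal pmf of y when y|u ~ Poisson(mu * u) and u ~ Gamma(shape=k, mean=1)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu))
             + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

mu, k = 2.0, 1.5  # hypothetical mean and gamma shape

# Sum far enough into the tail that the truncation error is negligible.
pmf = [neg_binomial_pmf(y, mu, k) for y in range(400)]
total = sum(pmf)
mean = sum(y * p for y, p in enumerate(pmf))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
```

The marginal mean stays at mu, while the variance is mu + mu²/k, larger than the Poisson variance: the random effect shows up as overdispersion.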

Example - binary
Study of bladder cancer
All patients had superficial bladder tumours on entry, which were removed
Two randomly allocated treatments (group†):
– Placebo (n=47), Thiotepa (n=38)
Many patients had multiple recurrences of tumours
month† is the month since treatment (1 to 53)
Baseline covariates: number of initial tumours (number†) and size of largest tumour (size†)
Lots of missing data: 3585 out of 4505 potential observations (80%) are missing
Model missing data (yes/no) using a binomial GEE with a logit link function to assess whether data are missing at random
† Name in data set
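The missing-data figures can be checked with a few lines of Python, assuming one potential observation per patient per month, which is consistent with the total of 4505 quoted on the slide.

```python
# Figures taken from the slide.
n_placebo, n_thiotepa = 47, 38
n_months = 53
n_missing = 3585

# Assumption: one potential observation per patient per month.
n_potential = (n_placebo + n_thiotepa) * n_months
n_observed = n_potential - n_missing
prop_missing = n_missing / n_potential
```

This gives 4505 potential observations, 920 of them observed, and a missing proportion of about 0.796, which rounds to the 80% quoted.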

Visits per subject
Table columns: N, Mean, Min, Max; rows: Placebo, Thiotepa, Total

Plot of missing proportion over time

Format for the data in SAS
Variables: subject, group, number, count, month, missing, size

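Logistic GEE in SAS
proc genmod data=tumour_miss descending;
  /* descending: model the higher-sorted level of missing */
  class group subject month;
  model missing=group month size number / dist=binomial type3;
  /* independence working correlation; corrw prints the working correlation matrix */
  repeated subject=subject / type=ind corrw within=month;
  /* exp: also report the exponentiated contrast (an odds ratio) */
  estimate 'effect of thiotepa' group -1 1 / exp;
run;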
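The exp option on the estimate statement reports the contrast on the ratio scale. A minimal Python sketch of that back-transformation is given below, assuming a Wald interval on the log scale and assuming the contrast -1 1 corresponds to Thiotepa minus placebo. The numbers are hypothetical and are not results from the bladder cancer data.

```python
import math

def exponentiate(estimate, std_err, z=1.96):
    """Return (ratio, lower, upper) from a log-scale estimate and its standard error."""
    return (math.exp(estimate),
            math.exp(estimate - z * std_err),
            math.exp(estimate + z * std_err))

# Hypothetical log odds ratio and robust standard error (assumptions).
log_or, se = 0.10, 0.25
odds_ratio, lower, upper = exponentiate(log_or, se)

# An interval that contains 1 is consistent with no group difference.
includes_one = lower < 1.0 < upper
```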

ORs for group (Thiotepa vs placebo)
Table columns: Corr structure, OR, 95% CI, P-value
Rows: ind; exch (ρ = 0.12); AR(1); mdep(1); mdep(3) (–); unstr (–, –, –)
Log OR structure: logor=exch (OR = 1.05) (–)
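The correlation structures compared in this table differ only in the pattern imposed on the working correlation matrix. The Python sketch below builds small examples of each pattern; the function name and arguments are illustrative and are not part of SAS.

```python
def working_corr(structure, n, rho=0.0, lag_rhos=()):
    """Build an n x n working correlation matrix.

    structure: 'ind', 'exch', 'ar1' or 'mdep'.
    rho: common correlation for 'exch', lag-1 correlation for 'ar1'.
    lag_rhos: correlations at lags 1..m for 'mdep' (zero beyond lag m).
    """
    if structure not in ("ind", "exch", "ar1", "mdep"):
        raise ValueError("unknown structure: " + structure)
    matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            lag = abs(j - k)
            if lag == 0:
                matrix[j][k] = 1.0
            elif structure == "exch":
                matrix[j][k] = rho
            elif structure == "ar1":
                matrix[j][k] = rho ** lag
            elif structure == "mdep" and lag <= len(lag_rhos):
                matrix[j][k] = lag_rhos[lag - 1]
    return matrix

exch = working_corr("exch", 3, rho=0.12)
ar1 = working_corr("ar1", 3, rho=0.5)
mdep1 = working_corr("mdep", 3, lag_rhos=(0.3,))
```

An unstructured matrix has no pattern at all: each off-diagonal pair gets its own correlation, which is why it needs far more data to estimate.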

Example - Poisson
Response: number of new tumours (count†)
month† is the month since treatment (1 to 53)
Baseline covariates: number of initial tumours (number†) and size of largest tumour (size†)
timesince† is the number of months since the last visit
Missing data are dependent upon treatment group and time
Model new tumour counts using a Poisson GEE with a log link function to assess the treatment effect
† Name in data set

Count of tumours by treatment group
Table columns: N, Mean, Std, Min, Max; rows: Placebo, Thiotepa

New tumour counts over time by treatment group

Plot of observed means over time

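Poisson GEE in SAS
proc genmod data=tumour_count;
  class group subject month;
  /* scale=deviance: estimate the dispersion from the deviance */
  model count=group size number timesince / dist=poisson scale=deviance;
  /* exchangeable working correlation across months within subject */
  repeated subject=subject / type=exch withinsubject=month corrw;
  /* exp: also report the exponentiated contrast (a rate ratio) */
  estimate 'effect of thiotepa' group -1 1 / exp;
run;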
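The scale=deviance option estimates the dispersion as the deviance divided by its degrees of freedom. A minimal Python sketch of that calculation for a Poisson model follows; the observed counts, fitted means and parameter count are hypothetical.

```python
import math

def poisson_deviance(ys, mus):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) = 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total

# Hypothetical observed counts and fitted means (assumptions).
ys = [0, 1, 4, 0, 7]
mus = [1.0, 1.5, 2.0, 0.5, 3.0]
n_params = 2  # hypothetical number of regression parameters

dispersion = poisson_deviance(ys, mus) / (len(ys) - n_params)
```

A dispersion well above 1, as in this made-up example, would indicate more variability than a Poisson model expects.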

RRs for group (Thiotepa vs placebo)
Table columns: Corr structure, RR, 95% CI, P-value
Rows: ind; exch (ρ = 0.08); AR(1); mdep(1); mdep(5); unstr*
*WARNING: The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate.
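One plausible reading of this warning is that the unstructured matrix asks for far more correlation parameters than the data can support. The Python sketch below counts the parameters each structure needs, assuming 53 possible time points (months 1 to 53); the function is illustrative only.

```python
def n_corr_params(structure, n_times, m=1):
    """Number of correlation parameters estimated for a working structure."""
    if structure == "ind":
        return 0
    if structure in ("exch", "ar1"):
        return 1
    if structure == "mdep":
        return m
    if structure == "unstr":
        return n_times * (n_times - 1) // 2
    raise ValueError("unknown structure: " + structure)

n_times = 53  # months 1 to 53, as described on the slides
counts = {
    "ind": n_corr_params("ind", n_times),
    "exch": n_corr_params("exch", n_times),
    "ar1": n_corr_params("ar1", n_times),
    "mdep(5)": n_corr_params("mdep", n_times, m=5),
    "unstr": n_corr_params("unstr", n_times),
}
```

With 53 time points the unstructured matrix has 1378 distinct correlations, compared with at most 5 for the other structures listed.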

Using an offset
data tumour_count;
  set tumour_count;
  /* offset = log of exposure time; adding 1 avoids log(0) when timesince is 0 */
  off=log(timesince+1);
run;
proc genmod data=tumour_count;
  class group subject month;
  /* timesince now enters through the offset instead of as a covariate */
  model count=group size number / dist=poisson scale=deviance offset=off type3;
  repeated subject=subject / type=unstr withinsubject=month;
  estimate 'effect of thiotepa' group -1 1 / exp;
run;
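A short Python sketch of what the offset does: its coefficient is fixed at 1, so the expected count is proportional to timesince+1. The linear predictor value is hypothetical, and the reason for adding 1 (keeping the log defined if timesince can be 0) is an assumption about the author's intent.

```python
import math

def offset(timesince):
    """Offset as defined in the data step: log(timesince + 1)."""
    return math.log(timesince + 1)

def expected_count(timesince, eta):
    """Expected count = exp(offset + eta) = (timesince + 1) * exp(eta)."""
    return math.exp(offset(timesince) + eta)

eta = -1.2  # hypothetical linear predictor (log rate per month)

# Without the +1, a zero gap between visits would be undefined on the log scale.
try:
    math.log(0)
    log_zero_defined = True
except ValueError:
    log_zero_defined = False

count_gap0 = expected_count(0, eta)   # exp(eta)
count_gap3 = expected_count(3, eta)   # 4 * exp(eta)
```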

RRs for group (Thiotepa vs placebo), model with offset
Table columns: Corr structure, RR, 95% CI, P-value
Rows: ind; exch (ρ = 0.07); AR(1); mdep(1); mdep(5); unstr*
*WARNING: The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate.

Interpretation and Presentation
– Descriptive: plots of means or tables of means (percentages, etc.)
– Tables of parameter estimates and confidence intervals (odds ratios or relative risks)
– P-values for effects or interactions (possibly just in the text)
– Emphasize results from descriptive analysis and effect estimates.

Statistical Methods
– What is the distribution of the outcome?
– How were the data summarized?
– Due to the repeated nature of the data, a generalized estimating equations (GEE) approach was used to estimate parameters and test for differences between groups.
– What was the form of the correlation structure?
– What hypotheses were being tested?
– How were missing data handled?
– How were variances calculated?
– What statistical package was used?

Example: Statistical Methods
Mean numbers of new tumours were used to summarise the data. Poisson regression was used to model tumour counts, using the time between successive observations as an offset. Due to the repeated nature of the data, a generalized estimating equations (GEE) approach was used to estimate parameters and test for differences between groups. The main hypothesis being tested was whether Thiotepa affected the numbers of new tumours. The correlation between successive observations was examined and an appropriate correlation structure was specified. Drop-outs and non-attendance were examined to assess for differences between the treatment groups. Robust variance estimation techniques were used to calculate standard errors and confidence intervals. All analyses were performed using SAS version 8.2.