How does Biostatistics at Roche typically analyze longitudinal data?


Longitudinal Data Analysis at Roche: A Biostatistics Perspective Ulrich Beyer

How does Biostatistics at Roche typically analyze longitudinal data?
• Discussion of different approaches based on an example
• Example: 12-week dose-finding study, 5 active doses and placebo
• Visits at weeks 1, 2, 4, 8 and 12
• Continuous endpoint is the change from baseline in parameter xxx
What do we usually do?

Simple Descriptive Analyses, like Summary Tables
Summary of xxx Changes from Baseline, by Dose
_________________________________________________________________________________
Dose   Week    N    Mean   STD     Q1   Median    Q3    Min    Max
_________________________________________________________________________________
 0.0     0    65     0.0   0.0    0.0     0.0    0.0    0.0    0.0
         1    64    -0.1   0.9   -0.7     0.0    0.5   -2.1    1.9
         2    64    -0.2   1.1   -0.6    -0.1    0.5   -3.5    2.1
         4    63    -0.3   1.2   -1.0    -0.3    0.2   -2.8    3.2
         8    62    -0.3   1.6   -1.2    -0.3    0.6   -4.2    4.6
        12    56    -0.7   1.8   -1.7    -0.5    0.3   -5.5    4.6
      LOCF    65    -0.6   1.8   -1.6    -0.5    0.3   -5.5    4.6
 2.5     0    64     0.0   0.0    0.0     0.0    0.0    0.0    0.0
         1    59    -0.2   1.2   -0.7    -0.1    0.5   -4.0    3.5
         2    62    -0.3   1.1   -1.0    -0.4    0.5   -3.4    3.0
         4    63    -0.6   1.4   -1.2    -0.6    0.0   -5.4    3.0
         8    61    -1.2   1.8   -2.1    -1.1    0.0   -7.0    4.5
        12    58    -1.4   1.7   -2.3    -1.3   -0.5   -5.6    2.5
      LOCF    64    -1.5   1.8   -2.3    -1.3   -0.5   -7.0    2.5
 ∙ ∙ ∙
40.0     0    66     0.0   0.0    0.0     0.0    0.0    0.0    0.0
         1    64    -0.8   0.8   -1.2    -0.6   -0.2   -3.0    1.0
         2    65    -1.0   1.2   -1.7    -1.0   -0.1   -4.5    1.9
         4    65    -1.5   1.4   -2.3    -1.3   -0.5   -5.9    0.9
         8    63    -2.4   1.7   -3.3    -1.9   -1.2   -7.9    0.2
        12    61    -2.6   2.1   -3.6    -2.1   -1.2   -8.4    0.9
      LOCF    66    -2.6   2.0   -3.6    -2.2   -1.4   -8.4    0.9
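The LOCF rows in the summary table carry each patient's last available value forward to the final visit. A minimal sketch of that imputation step in Python follows; the function names and the example patient are hypothetical, and the slides do not show how the LOCF values were actually derived.

```python
from typing import List, Optional, Sequence


def locf(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward over missing visits (None)."""
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def locf_endpoint(values: Sequence[Optional[float]]) -> Optional[float]:
    """Return the last non-missing value, or None if nothing was observed."""
    observed = [value for value in values if value is not None]
    return observed[-1] if observed else None


# Hypothetical patient: changes from baseline at weeks 1, 2, 4, 8 and 12;
# the patient dropped out after week 4.
changes = [-0.2, -0.5, -0.9, None, None]
print(locf(changes))           # week 8 and week 12 take the week 4 value
print(locf_endpoint(changes))  # the value entering the LOCF summary row
```

Because the week 4 value simply stands in for week 12, any change that would have occurred after drop-out is ignored.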

Simple Descriptive Analyses Line Plots

Simple Descriptive Analyses Scatter Plots with Regression Lines

Analyses using LOCF 2.1 Anova/Ancova Analyses

2.2 Non-Linear Dose-Response Model using Last Visit Observations only
Parameter estimation can be frequentist or Bayesian, with non-informative or weakly informative priors.

Bayesian parameter estimation using MCMC methods:
Inference for Bugs model at "emax_1.bug", fit using OpenBUGS, 1 chains, each with 80000 iterations (first 30000 discarded)
n.sims = 50000 iterations saved

             mean    sd     2.5%     25%     50%     75%   97.5%
e0           -0.7   0.2     -1.2    -0.9    -0.7    -0.5    -0.2
emax         -2.3   0.4     -3.2    -2.6    -2.3    -2.1    -1.6
ed50          7.9   6.1      2.0     4.2     6.3     9.6    23.7
tau           0.3   0.0      0.2     0.3     0.3     0.3     0.3
deviance   1482.7   2.9   1479.0  1480.5  1482.0  1484.2  1489.9

DIC info (using the rule, pD = Dbar-Dhat)
pD = 3.5 and DIC = 1486.0
DIC is an estimate of expected predictive error (lower deviance is better).

• Predictions for unobserved dose groups (in between or higher) are possible
• Several different model-based dose-response models could be used
• Analysis can be frequentist or Bayesian, allowing (weakly) informative priors
• The complete available information, i.e. the dose response over time, is not used
• Aim: better dose selection by using the complete information
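To illustrate how such a fit yields predictions for unobserved doses, the sketch below evaluates an Emax curve at the posterior medians of e0, emax and ed50 reported above. The model equation itself did not survive in the transcript, so the standard hyperbolic form e0 + emax * dose / (ed50 + dose) is an assumption, and the function name is hypothetical.

```python
def emax_response(dose: float, e0: float, emax: float, ed50: float) -> float:
    """Assumed hyperbolic Emax curve: e0 + emax * dose / (ed50 + dose)."""
    if dose < 0 or ed50 <= 0:
        raise ValueError("dose must be >= 0 and ed50 must be > 0")
    return e0 + emax * dose / (ed50 + dose)


# Posterior medians (50% column) from the OpenBUGS summary.
E0, EMAX, ED50 = -0.7, -2.3, 6.3

# 2.5 and 40 are doses from the summary table; 15, 30 and 60 are
# hypothetical in-between or higher doses chosen for illustration only.
for dose in (0.0, 2.5, 15.0, 30.0, 40.0, 60.0):
    predicted = emax_response(dose, E0, EMAX, ED50)
    print(f"dose {dose:5.1f}: predicted change {predicted:6.2f}")
```

Plugging medians into the curve is a shortcut. A full Bayesian prediction would evaluate the curve for every saved MCMC draw and summarise the resulting distribution, which also conveys the large uncertainty in ed50 (97.5% quantile 23.7).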

General comments regarding Analyses using LOCF
• LOCF is assumed to be conservative; however, the treatment effect estimates are biased
• The handling of missing values is problematic: LOCF assumes that the drop-outs are MCAR (missing completely at random)
• If dose groups are treated as categories, extrapolation to unobserved doses is problematic
• SDs might be reduced when all data are used (within- vs. between-patient variability)
• No longer generally accepted by the Health Authorities (at least FDA)
• ANOVAs/ANCOVAs based on LOCF are replaced by MMRM analyses
• Similarly, non-linear dose-response models based on LOCF can be replaced by longitudinal non-linear mixed-effects models

Mixed Effects Repeated Measurement Model (MMRM)
• Yijk denotes the measurement of the effect at week k from subject j under treatment i
• αi is the effect of treatment i
• bj is the random effect of patient j
• tk is the visit effect for week k
• (αt)ik is the interaction between treatment and time
The covariance structure in this example is selected as compound symmetry; other covariance structures such as AR(1) or Toeplitz could be modeled as well.
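The model equation itself is missing from the transcript. Based on the term definitions above, it plausibly has the following form. The intercept μ, the residual ε and the variance symbols are assumptions introduced here for completeness.

```latex
\begin{aligned}
Y_{ijk} &= \mu + \alpha_i + b_j + t_k + (\alpha t)_{ik} + \varepsilon_{ijk},
\qquad b_j \sim N(0, \sigma_b^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2), \\
\operatorname{Var}(Y_{ijk}) &= \sigma_b^2 + \sigma^2,
\qquad \operatorname{Cov}(Y_{ijk}, Y_{ijk'}) = \sigma_b^2 \quad (k \neq k').
\end{aligned}
```

The second line shows why a random patient intercept with independent residuals gives compound symmetry: every pair of visits from the same patient shares the same covariance, regardless of how far apart the visits are.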

Mixed Effects Repeated Measurement Model (MMRM)
Summary of Adjusted Least-Square Means Differences for the changes in xxx from baseline
_________________________________________________________________________________________
Dose   Week   LsMeansDiff   StdErr   Lower CL   Upper CL   t-statistics   p-value
_________________________________________________________________________________________
 2.5     8       -0.843      0.231    -1.295     -0.390       -3.649       0.000
        12       -0.699      0.240    -1.170     -0.229       -2.914       0.004
 5.0     8       -0.937      0.227    -1.382     -0.491       -4.120       0.000
        12       -0.997      0.233    -1.455     -0.539       -4.270       0.000
10.0     8       -1.448      0.228    -1.895     -1.000       -6.344       0.000
        12       -1.483      0.237    -1.948     -1.018       -6.259       0.000
20.0     8       -1.488      0.230    -1.939     -1.037       -6.470       0.000
        12       -1.804      0.239    -2.273     -1.336       -7.553       0.000
40.0     8       -2.092      0.229    -2.541     -1.642       -9.130       0.000
        12       -1.958      0.237    -2.423     -1.493       -8.262       0.000
_________________________________________________________________________________________

Comments regarding MMRM
• Uses the complete information
• Can deal with missing values under the assumption of missing at random
• The drop-out rate is important for the precision of the contrast estimates
• Biased towards a completer analysis
• Sensitivity analyses are required to address the problem of missing values
• Dose groups are still treated as categories, so the model does not allow estimation of a non-linear dose-response curve; predictions for unobserved doses or unobserved time-points are therefore still a problem
Therefore: use of non-linear (model-based) longitudinal mixed effects models

Non-Linear Longitudinal Mixed Effects Models
Example: A longitudinal Emax model
• There is a time-dependent placebo effect
• A time-dependent Emax
• The Ed50 is assumed to be constant over time

This example is derived from an article on a binary response variable, which additionally used a probit link:
A case study of model-based Bayesian dose response estimation; Huaming Tan, David Gruben, Jonathan French and Neal Thomas; Statistics in Medicine 2011; Vol. 30, Issue 29
Parameter estimation can be done via MCMC using weakly informative priors like:
E0 ~ dnorm(0,0.001) I(-15,15)
Emax ~ dnorm(0,0.001)
K0 ~ dbeta(1,1)
Kmax ~ dbeta(1,1)
P0 ~ scaled beta (P0 <- PS0*30 - 15; PS0 ~ dbeta(1,1))
Ed50 ~ scaled beta (Ed50 <- ud50*120 + 0.5; ud50 ~ dbeta(1,1))
Dmax ~ scaled beta (Dmax <- DS0*30 - 15; DS0 ~ dbeta(1,1))
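The scaled beta priors are linear transformations of a Beta(1,1) variable, which is uniform on (0, 1). The Python sketch below shows the ranges they imply, using only the transformations listed above; the helper names are hypothetical. It also converts the BUGS precision 0.001 into a standard deviation, since dnorm in BUGS is parameterised by mean and precision.

```python
import math
import random


def scaled_beta(u: float, width: float, offset: float) -> float:
    """Map a Beta(1,1) draw u (uniform on 0..1) to offset + width * u."""
    return u * width + offset


def draw_priors(rng: random.Random) -> dict:
    """One joint draw from the beta-based priors listed on the slide."""
    return {
        "P0": scaled_beta(rng.random(), 30.0, -15.0),    # range (-15, 15)
        "Ed50": scaled_beta(rng.random(), 120.0, 0.5),   # range (0.5, 120.5)
        "Dmax": scaled_beta(rng.random(), 30.0, -15.0),  # range (-15, 15)
        "K0": rng.random(),                              # Beta(1,1)
        "Kmax": rng.random(),                            # Beta(1,1)
    }


# dnorm(0, 0.001) has precision 0.001, i.e. sd = 1 / sqrt(0.001), about 31.6.
prior_sd = 1.0 / math.sqrt(0.001)

rng = random.Random(1)  # arbitrary seed, for reproducibility only
print(round(prior_sd, 1))
print(draw_priors(rng))
```

The E0 prior is additionally truncated to (-15, 15) by the I(-15,15) construct, which this sketch does not sample.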

MCMC-Result:
Inference for Bugs model at "emax_mixed.bug", fit using OpenBUGS, 1 chains, each with 80000 iterations (first 30000 discarded)
n.sims = 50000 iterations saved

             mean     sd     2.5%     25%     50%     75%   97.5%
E0            0.0    0.1     -0.2    -0.1     0.0     0.1     0.2
PS0           0.3    0.1      0.1     0.3     0.3     0.4     0.5
P0           -5.1    2.7    -11.6    -6.7    -4.6    -3.2    -1.1
K0            0.0    0.0      0.0     0.0     0.0     0.0     0.1
ud50          0.1    0.0      0.0     0.1     0.1     0.1     0.1
Ed50          8.9    3.1      4.6     6.8     8.4    10.4    17.1
Emax         -0.1    0.2     -0.5    -0.3    -0.1     0.0     0.3
DS0           0.4    0.0      0.4     0.4     0.4     0.4     0.4
Dmax         -2.2    0.3     -2.7    -2.3    -2.1    -2.0    -1.7
Kmax          0.4    0.1      0.2     0.3     0.4     0.4     0.7
tau           1.1    0.0      1.0     1.0     1.1     1.1     1.1
precd         1.3    0.1      1.1     1.2     1.3     1.4     1.5
deviance   6276.5   31.2   6217.3  6255.2  6275.8  6297.0  6339.5

DIC info (using the rule, pD = Dbar-Dhat)
pD = 220.4 and DIC = 6497.0
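As a plausibility check on the DIC line, the standard definition can be applied to the figures in the output. Here \bar{D} (Dbar) is taken to be the posterior mean of the deviance row and \hat{D} (Dhat) the deviance at the posterior means; that the software computes DIC exactly from these rounded values is an assumption.

```latex
\begin{aligned}
p_D &= \bar{D} - \hat{D}, \\
\mathrm{DIC} &= \bar{D} + p_D = \hat{D} + 2\,p_D, \\
\mathrm{DIC} &\approx 6276.5 + 220.4 = 6496.9 \approx 6497.0 .
\end{aligned}
```

DIC values are only comparable between models fitted to the same data, so this value cannot be set against the DIC of the last-visit model, which used only one observation per patient.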

MCMC-Diagnostics:


Model plot based on median parameter estimates:
• The complete available information is used, i.e. all observed dose-response values over time
• Predictions for unobserved dose groups as well as unobserved time points are possible

How good are the predictions?
• Model re-estimated without using the week 12 data
• Predicted dose-response curves (based on MCMC) compared with the observed mean values
Inference for Bugs model at "emax_mixed.bug", fit using OpenBUGS, 1 chains, each with 80000 iterations (first 30000 discarded), n.thin = 10
n.sims = 5000 iterations saved

             mean     sd     2.5%     25%     50%     75%   97.5%
E0            0.1    0.1     -0.2     0.0     0.1     0.1     0.3
PS0           0.5    0.0      0.4     0.5     0.5     0.5     0.5
P0           -0.6    0.5     -2.0    -0.6    -0.5    -0.4    -0.2
K0            0.6    0.3      0.0     0.2     0.7     0.9     1.0
ud50          0.1    0.0      0.0     0.0     0.1     0.1     0.1
Ed50          8.5    4.6      4.0     6.0     7.5     9.7    17.6
Emax         -0.2    0.2     -0.6    -0.3    -0.2    -0.1     0.2
DS0           0.4    0.0      0.3     0.4     0.4     0.4     0.4
Dmax         -2.8    0.8     -5.1    -3.1    -2.6    -2.2    -1.8
Kmax          0.2    0.1      0.1     0.1     0.2     0.3     0.6
tau           1.3    0.0      1.2     1.3     1.3     1.3     1.4
precd         1.6    0.2      1.4     1.5     1.6     1.7     1.9
deviance   4877.3   31.5   4817.3  4856.3  4876.4  4898.2  4942.6

DIC info (using the rule, pD = Dbar-Dhat)
pD = 183.9 and DIC = 5061.0
DIC is an estimate of expected predictive error (lower deviance is better).

How good are the predictions? Model re-estimated without using the week 12 data

MCMC-Diagnostics:


• Without the week 12 data, the parameters that model the time course of the placebo effect and of the maximum effect cannot be estimated reliably
• Predictions outside the observed time period must therefore be interpreted with caution
• Sufficient data outside the initial, nearly linear part of the dose-response time curve are essential
This is illustrated by simulations of the following more extreme design:
• The effect of 5 different dose levels on the parameter xxx is to be explored in an MAD (multiple ascending dose) study with 5 cohorts, each consisting of 9 active and 3 placebo patients, and only 4 weeks of observation time
• 1000 studies were simulated by sampling, for each dose level, 9 active and 3 placebo patients without replacement from the data discussed before, using the first 4 weeks only
• Treatment effects were estimated using a mixed effects model with a linear time effect and compared with the estimates based on the longitudinal Emax model.
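The resampling step of this simulation can be sketched in Python as follows. The patient identifiers and pool sizes are placeholders, and drawing placebo patients afresh for every cohort is an assumption; the slides do not say whether placebo patients could recur across cohorts.

```python
import random

ACTIVE_DOSES = (2.5, 5.0, 10.0, 20.0, 40.0)


def simulate_study(active_pool, placebo_pool, rng, n_active=9, n_placebo=3):
    """One simulated study: per dose cohort, sample patient ids without replacement."""
    study = {}
    for dose in ACTIVE_DOSES:
        study[dose] = {
            "active": rng.sample(active_pool[dose], n_active),
            # Assumption: placebo patients are drawn afresh for every cohort.
            "placebo": rng.sample(placebo_pool, n_placebo),
        }
    return study


# Hypothetical patient ids; the pool size of 60 is a placeholder.
active_pool = {dose: [f"{dose}-{i}" for i in range(60)] for dose in ACTIVE_DOSES}
placebo_pool = [f"pbo-{i}" for i in range(60)]

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
studies = [simulate_study(active_pool, placebo_pool, rng) for _ in range(1000)]
print(len(studies), len(studies[0][40.0]["active"]), len(studies[0][40.0]["placebo"]))
```

Each simulated study would then be restricted to the first 4 weeks of data for the sampled patients and analysed with both models; that fitting step is not shown here.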

Comparison of the 40 mg treatment effect estimates after 4 weeks:

• Both models lead to a similar 4-week treatment effect for the 40 mg dose
• The reason is that many different functions can be fitted through the few available observations with a similar fit
• The first 4 weeks lie mainly in the linear part of the dose-response time curve
• The precision of the parameter estimates is therefore essential for judging the model
• Predictions for unobserved doses or time-points should be made with caution, and the problem and risk of imprecise predictions should be discussed
• Interpolation is in general no problem
• With extrapolation there is a risk that decisions are based on predictions from an imprecisely estimated model and might therefore be wrong

Summary for Longitudinal Data Analysis at Roche from a Biostatistics Point of View
• Biostatistics at Roche analyses all kinds of longitudinal dose-response or exposure-response models for all types of endpoints
• Longitudinal analysis is a broad topic; additional methods not described here include all kinds of survival-type analyses, such as:
  Kaplan-Meier analyses
  Cox models (also time-dependent Cox models)
  Other counting-process-type analyses
  ……
• Biostatistics analyses at Roche are mainly restricted to dose-response or exposure-response models
• More complex PK-PD models or mechanistic models are not handled within Biostatistics at Roche; these are handled in the Modeling and Simulation group