Lecture 20: Study Design and Sample Size Estimation in TTE Studies.


Study Design and Sample Size
Ideally we are involved in a study from the beginning
As statisticians (and epidemiologists), part of our role is to ensure the study is designed to address the primary hypothesis under consideration
– Ensure proper study design
– Ensure appropriate sample size (power and significance level)

Study Design
We’ve already talked (informally) about study design…
Our job is to ask appropriate questions
– What is the study population?
– Primary hypothesis?
– Is the event recurrent?
– Competing risks?
– …

Components of Sample Size
General considerations in any hypothesis test
1. Hypothesis to be tested
2. Test statistic
3. Size of the test (i.e. α)
4. Desired power
5. Assumed effect size

Specific for Time To Event Data
Additional considerations
1. Probability of an event during the study
2. Expected rate of loss (i.e. censoring)
3. Enrollment rate
4. Competing risks

Basic Considerations
Ensure precise specification of the hypothesis
Select a significance level and power appropriate for the study
What is the test statistic that will be used to test the hypothesis?
– Many statistics have well-known properties upon which we base our calculations
– Deviations from assumptions complicate these calculations…

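Null Hypothesis to be Tested
Recall for the log-rank test
$$H_0: \lambda_A(t) = \lambda_B(t) \ \text{for all } t, \quad \text{i.e. } HR = 1$$
Where the hazards are assumed to be proportional (constant HR) for all t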

Test-Statistic
Can use either
– Log-rank test (or some variation thereof)
– HR estimated from Cox PHM
To test the null hypothesis of no difference, consider the log of the hazard ratio
– log(HR) is approximately normally distributed when comparing two groups

Significance Level
The probability that a statistical test will reject H0 when H0 is actually true
– Significance = α
Interpretation: For a given value under the null hypothesis, we’re going to reject the null in favor of the alternative in error (α)100% of the time.

Power
The probability that a statistical test will reject H0 when H0 is false
– Power = 1-β
Interpretation: For a given value under the alternative hypothesis, we’re going to correctly reject the null in favor of the alternative (1-β)100% of the time.

Choice of α and 1-β
Generally choose α = 0.05
Other values can be used but should be justified
– Say we choose α = 0.01: we want very strong evidence of a treatment effect
– Alternatively α = 0.10 or 0.20 might be chosen for something like a pilot study
Power generally set to 80-99%

Effect Size
Generally assume proportional hazards
Hazard ratio:
– Null: HR = 1
– HR < 1 implies longer survival in treatment B
– HR > 1 implies longer survival in treatment A
Base sample size calculation on having sufficient power to detect minimum clinically important effect
– For example, maybe a 30% reduction in the hazard for Trt B vs. Trt A (i.e. HR = 0.7) is clinically meaningful

Minimum Scientifically Important Difference
Definition: the smallest difference which would mandate, in the absence of serious side effects and/or excessive cost, a change in scientific practice/understanding.
This is a scientific question, not a statistical question.

Significance, Power, and Sample Size
Sample size impacted not only by significance level, power, and research question but also practical considerations
– Number of available patients
– Study duration
– Cost

Basic Sample Size Formula
Generic formula for total # subjects/group
Under restrictions
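One common textbook form of a generic per-group formula, for a two-sided comparison of two means with equal group sizes and a common standard deviation, is sketched below. The symbols σ (common standard deviation) and Δ (difference to detect) are assumptions introduced here and may not match the notation or the restrictions used in the lecture.

```latex
% Per-group sample size for comparing two means (equal allocation,
% common standard deviation \sigma, two-sided level \alpha, power 1-\beta).
% \sigma and \Delta are illustrative symbols, not taken from the slides.
\[
  n_{\text{per group}}
  = \frac{2\,\sigma^{2}\,\bigl(z_{\alpha/2} + z_{\beta}\bigr)^{2}}{\Delta^{2}}
\]
```

The structure (normal percentiles squared, divided by a squared effect size) is the same one that reappears in the events formula for time-to-event data.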

Design Consideration in TTE Studies
How will the sample be collected?
– Enroll fixed number of patients and follow for some specified period of time
– Continue study until a sufficient number of events have been observed
Other considerations
– Expected event rate in each group
– How much loss to follow-up is expected

Study Type
Type I study
– All subjects experience an event by the end of the study
Type II study
– A study terminates at fixed time T resulting in administratively censored subjects.
Administrative censoring
– Right censoring that occurs if a subject fails to experience the event prior to the end of the study
Loss to follow-up
– Occurs when a subject fails to complete the study for reasons unrelated to the event of interest

Sample Size in Time to Event Data
For many power calculations we specify significance level, power, variability, and our minimum clinically relevant difference to get our total N
In TTE studies, it is easier to specify the number of events we need to observe rather than the total number of people

Simplest Case for TTE
Trial comparing Treatment A to Treatment B
Simplest case assumes all subjects are followed to the end of the study
Also assumes all subjects within a treatment group have the same probability of experiencing the event
Finally assumes hazard rates are proportional

Required Number of Events
$$d = \frac{(z_{\alpha/2} + z_\beta)^2}{p_A \, p_B \, \theta^2}$$
d is the required total number of events
z_{α/2} and z_β are standard normal percentiles
p_A and p_B are the proportion of subjects allocated to each treatment group
– i.e. with equal allocation both are ½
θ is the minimum clinically relevant difference we want to detect
– In this case it is the log of the hazard ratio under the alternative hypothesis

Example: Required # Events
Suppose α = 0.05 and β = 0.10 (90% power)
– z_{α/2} = 1.96 and z_β = 1.282
Equal allocation
– p_A p_B = ¼
Always round up.
Table columns: Hazard Ratio, # Events Required
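The events calculation above can be sketched in a few lines of Python. This is a minimal illustration that assumes the Schoenfeld-type form d = (z_{α/2} + z_β)² / (p_A p_B θ²) with θ = log(HR); the function name `required_events` and the hazard ratios other than 0.5 are illustrative choices, not values from the lecture.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.90, p_a=0.5):
    """Approximate total number of events for a two-sided log-rank test.

    Assumes proportional hazards and the formula
    d = (z_{alpha/2} + z_beta)^2 / (p_A * p_B * log(HR)^2).
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.282 for 90% power
    theta = math.log(hazard_ratio)                 # effect size on the log scale
    p_b = 1 - p_a
    d = (z_alpha + z_beta) ** 2 / (p_a * p_b * theta ** 2)
    return math.ceil(d)                            # always round up


# Illustrative hazard ratios; HR = 0.5 reproduces the 88 events quoted in the lecture.
for hr in (0.5, 0.6, 0.7, 0.8):
    print(f"HR = {hr}: {required_events(hr)} events")
```

Because θ enters squared, HR = 0.5 and HR = 2 require the same number of events, and the requirement grows quickly as the hazard ratio approaches 1.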

How Many People?
We’ve not yet discussed event rate(s)
We’ve determined for 90% power at a significance level α = 0.05
– To detect a 50% reduction in hazard, we need to observe 88 events
But… How many people do we actually need to enroll to get that many events?

A Simplification
Consider only the case in which each patient is followed for some specified time period, T
More general case
– Patients recruited during accrual period, a
– After recruitment, there is an additional follow-up period, f
– First patient followed for a + f
– Last patient followed for f
– Requires slightly more complex power calculations

Calculating Number of People
Need to consider probability of an event during the study period
Once we have this, estimating the total number of people needed is easy
$$N = \frac{d}{P(\text{event})}$$
NOTE: This still ignores loss to follow-up

Overall Probability of Event
$$P(\text{event}) = 1 - \left[ p_A S_A(T) + p_B S_B(T) \right]$$
Recall p_A and p_B are the proportion of subjects allocated to each treatment group
S_A(T) and S_B(T) are the survival distributions for the two treatment groups, evaluated at the end of follow-up T
How do we get values for S_A(T) and S_B(T)?

Estimating Survival Distributions
Crude rate
– How many events are expected in each group over the course of the study?
Alternatively, we could assume exponentially distributed event times, T ~ Exp(λ)
Recall S(t) = exp{-λt}
Use the assumed distribution to calculate failure probability
– e.g. if λ = 0.1/unit time, then S(1) = 0.905
– Thus, assume by t = 1, 9.5% of subjects will have experienced the event
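The exponential calculation can be checked directly. The short sketch below assumes a constant hazard λ, so that S(t) = exp(-λt); the helper name `exp_survival` is illustrative.

```python
import math


def exp_survival(t, rate):
    """Survival probability at time t under a constant hazard `rate`."""
    return math.exp(-rate * t)


s1 = exp_survival(1.0, 0.1)          # lambda = 0.1 per unit time
print(round(s1, 3), round(1 - s1, 3))  # prints 0.905 0.095
```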

Example 1
One-year study (T = 1) with p_A = p_B, λ = 0.1/yr for control group, and HR = 0.5

Example 2
5 year study (T = 5) with p_A = p_B, λ = 0.1/yr for control group, and HR = 0.5
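Examples 1 and 2 can be worked through with a short sketch. It assumes exponential event times, equal allocation, a control hazard of 0.1 per year, a hazard ratio of 0.5 for the treated group, and the 88 required events quoted earlier; the function names are illustrative, and the lecture's own worked numbers may differ depending on rounding.

```python
import math


def prob_event_fixed_followup(rate_control, hazard_ratio, T, p_control=0.5):
    """P(event) = 1 - [p_control * S_control(T) + p_treat * S_treat(T)].

    Assumes every subject is followed for exactly T and that event times
    are exponential, with treated hazard = hazard_ratio * rate_control.
    """
    s_control = math.exp(-rate_control * T)
    s_treat = math.exp(-rate_control * hazard_ratio * T)
    return 1 - (p_control * s_control + (1 - p_control) * s_treat)


def subjects_needed(events, p_event):
    """Total enrollment so that the expected number of events reaches `events`."""
    return math.ceil(events / p_event)


for label, T in (("Example 1", 1), ("Example 2", 5)):
    p = prob_event_fixed_followup(rate_control=0.1, hazard_ratio=0.5, T=T)
    print(f"{label}: P(event) = {p:.3f}, N = {subjects_needed(88, p)}")
```

Under these assumptions the sketch gives about 1,223 subjects for the one-year study and about 287 for the five-year study, which shows how strongly the length of follow-up drives enrollment when the event is rare.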

More General Case
Consider if subjects enroll over time
– First patient followed for a + f
– Last patient followed for f
Proportion of patients that will survive is the average of the survival curve from time f to a + f
P(event) can be estimated by
$$P(\text{event}) = 1 - \frac{1}{a} \int_{f}^{a+f} S(t)\, dt, \qquad S(t) = p_A S_A(t) + p_B S_B(t)$$

Example 2 Revisited
5 year study with p_A = p_B, λ = 0.1/yr for control group, and HR = 0.5
But accrue patients for 2 years and follow for the remaining 3 years
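The accrual version of Example 2 can be sketched as follows. The code assumes uniform accrual over a = 2 years, additional follow-up f = 3 years, exponential event times with control hazard 0.1 per year and hazard ratio 0.5, and the 88 events from earlier. It averages the survival curve over [f, a + f] in two ways: the exact integral for the exponential case and a Simpson's-rule approximation, which is a commonly used shortcut but is an assumption here about how the lecture evaluated it.

```python
import math


def avg_survival_exact(rate, a, f):
    """(1/a) * integral of exp(-rate * t) over [f, a + f]."""
    return (math.exp(-rate * f) - math.exp(-rate * (a + f))) / (rate * a)


def avg_survival_simpson(rate, a, f):
    """Simpson's-rule average of S(t) = exp(-rate * t) over [f, a + f]."""
    s = lambda t: math.exp(-rate * t)
    return (s(f) + 4 * s(f + a / 2) + s(a + f)) / 6


def prob_event_accrual(rate_control, hazard_ratio, a, f, average=avg_survival_exact):
    """Overall P(event) with equal allocation and uniform accrual."""
    s_control = average(rate_control, a, f)
    s_treat = average(rate_control * hazard_ratio, a, f)
    return 1 - 0.5 * (s_control + s_treat)


p_exact = prob_event_accrual(0.1, 0.5, a=2, f=3)
p_simpson = prob_event_accrual(0.1, 0.5, a=2, f=3, average=avg_survival_simpson)
print(f"exact: {p_exact:.4f}, Simpson: {p_simpson:.4f}")
print("N =", math.ceil(88 / p_exact))
```

Compared with following everyone for the full 5 years, staggered entry lowers the event probability, so more subjects are needed for the same number of events.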

Beyond Basic Considerations
Many factors may cause the statistic to deviate from expected behavior
– Loss to follow-up (non-administrative censoring)
– Failure to comply with treatment
– Non-uniform patient entry
– Non-constant hazard ratio
– Failure times differ greatly from exponential
– Competing risks…
Violations generally require an increase in sample size to achieve desired power

Non-Uniform Entry
Previously assumed uniform entry of subjects into the trial, which could be an error
– For example patients enter trial in staggered fashion
– Sample size related to total number of person-years observed
Alternative “general” sample size equation proposed by Lachin and Foulkes

Lachin and Foulkes Formula
λ_cont and λ_trt are hazard rates for each group
λ̄ is the overall hazard rate
φ(λ) is a component of the variance σ²(λ̂)
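A sketch of the Lachin and Foulkes calculation is given below, using the form of the formula as it is usually stated: N = [z_α·sqrt(φ(λ̄)(1/Q_trt + 1/Q_cont)) + z_β·sqrt(φ(λ_trt)/Q_trt + φ(λ_cont)/Q_cont)]² / (λ_trt - λ_cont)², with φ(λ) = λ² / E(δ | λ). It assumes exponential event times, uniform entry over the accrual period a, analysis at a + f, and no loss to follow-up; the allocation fractions Q and all function names are assumptions introduced for illustration.

```python
import math
from statistics import NormalDist


def prob_event_uniform(lam, a, f):
    """E(delta | lambda): probability that a subject's event is observed,
    assuming exponential event times, uniform entry over [0, a],
    and a common analysis time a + f."""
    return 1 - (math.exp(-lam * f) - math.exp(-lam * (a + f))) / (lam * a)


def phi(lam, a, f):
    """Variance component phi(lambda) = lambda^2 / E(delta | lambda)."""
    return lam ** 2 / prob_event_uniform(lam, a, f)


def lachin_foulkes_total_n(lam_trt, lam_cont, a, f, alpha=0.05, power=0.90,
                           q_trt=0.5, one_sided=True):
    """Total sample size (both groups) under the assumptions listed above."""
    q_cont = 1 - q_trt
    lam_bar = q_trt * lam_trt + q_cont * lam_cont   # overall (pooled) hazard
    z_a = NormalDist().inv_cdf(1 - (alpha if one_sided else alpha / 2))
    z_b = NormalDist().inv_cdf(power)
    null_sd = math.sqrt(phi(lam_bar, a, f) * (1 / q_trt + 1 / q_cont))
    alt_sd = math.sqrt(phi(lam_trt, a, f) / q_trt + phi(lam_cont, a, f) / q_cont)
    n = (z_a * null_sd + z_b * alt_sd) ** 2 / (lam_trt - lam_cont) ** 2
    return math.ceil(n)


# Hazards 0.2 vs 0.3, 3 years of accrual, 2 further years of follow-up,
# one-sided alpha = 0.05, 90% power.
print(lachin_foulkes_total_n(0.2, 0.3, a=3, f=2))
```

Changing the entry or censoring assumptions only changes E(δ | λ), so other scenarios can be handled by swapping out `prob_event_uniform`.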

Non-Uniform Entry using L-F Formulation
Assume patient entry times follow g(t)
– If it is Uniform (for recruitment period a): g(t) = 1/a for 0 ≤ t ≤ a
– If it is not Uniform
If recruitment is faster than expected, power will be greater
If recruitment is slower than expected, power will be reduced

Example Alternate Distribution
Truncated exponential entry distribution
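A truncated exponential entry distribution can be sketched as below. The parameterization g(t) = γ·exp(-γt) / (1 - exp(-γa)) on [0, a] follows the form usually attributed to Lachin and Foulkes and is an assumption here, as are the symbol γ and the function names. The code gives the closed-form probability of observing an event and checks it against a simple numerical integration.

```python
import math


def entry_density(t, gamma, a):
    """Truncated exponential entry density on [0, a]; gamma = 0 is uniform."""
    if abs(gamma) < 1e-12:
        return 1 / a
    return gamma * math.exp(-gamma * t) / (1 - math.exp(-gamma * a))


def prob_event_trunc_exp(lam, gamma, a, f):
    """Closed-form E(delta | lambda, gamma) with exponential event times
    and analysis at T = a + f (no loss to follow-up)."""
    T = a + f
    if abs(gamma) < 1e-12:  # uniform entry
        return 1 - (math.exp(-lam * f) - math.exp(-lam * T)) / (lam * a)
    if abs(lam - gamma) < 1e-12:
        integral = a
    else:
        integral = (math.exp((lam - gamma) * a) - 1) / (lam - gamma)
    return 1 - gamma * math.exp(-lam * T) * integral / (1 - math.exp(-gamma * a))


def prob_event_numeric(lam, gamma, a, f, steps=20000):
    """Midpoint-rule check: integrate g(t) * P(event within T - t) over entry times."""
    T, h = a + f, a / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += entry_density(t, gamma, a) * (1 - math.exp(-lam * (T - t))) * h
    return total


# gamma > 0: faster-than-uniform early entry; gamma < 0: slower early entry.
for gamma in (-1.0, 0.0, 1.0):
    print(gamma, round(prob_event_trunc_exp(0.2, gamma, a=3, f=2), 4))
```

With γ > 0 more subjects enter early and are followed longer, so the event probability (and hence power) rises; γ < 0 has the opposite effect.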

Example of Impact on Power
Table columns: γ; Probability of Event in Trt, E(δ | λ_trt); Probability of Event in Trt, E(δ | γ, λ_trt); Power (assuming Uniform entry); N to maintain 90% power
Trial Conditions:
– λ_trt = 0.2 & λ_cont = 0.3
– a = 3 & f = 2
– α = 0.05 (one-sided) & 1-β = 0.9

Loss To Follow-Up
Again, our early expressions assumed only administrative censoring
BUT clearly this is not always the case.
The Lachin-Foulkes expression can also be adapted to address random right censoring

Right Censoring Formulation
Easiest “case”
– censoring times ~ Exp(η)
– Uniform entry into trial
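The easiest case can be sketched as follows: exponential event times with hazard λ, independent exponential loss to follow-up with rate η, and uniform entry over [0, a] with analysis at a + f. Under those assumptions the probability of observing an event is E(δ | λ, η) = λ/(λ+η) · [1 - (exp(-(λ+η)f) - exp(-(λ+η)(a+f))) / ((λ+η)a)]. The symbol η, the loss rates tried below, and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist


def prob_event(lam, eta, a, f):
    """E(delta | lambda, eta): event observed before loss or end of study."""
    k = lam + eta
    return (lam / k) * (1 - (math.exp(-k * f) - math.exp(-k * (a + f))) / (k * a))


def total_n(lam_trt, lam_cont, eta, a, f, alpha=0.05, power=0.90):
    """Lachin-Foulkes-style total N: equal allocation, one-sided alpha,
    same loss rate eta in both groups."""
    def phi(lam):
        return lam ** 2 / prob_event(lam, eta, a, f)

    lam_bar = 0.5 * (lam_trt + lam_cont)
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    root = (z_a * math.sqrt(4 * phi(lam_bar))
            + z_b * math.sqrt(2 * phi(lam_trt) + 2 * phi(lam_cont)))
    return math.ceil(root ** 2 / (lam_trt - lam_cont) ** 2)


# Illustrative loss rates; eta = 0 recovers the administrative-censoring-only case.
for eta in (0.0, 0.05, 0.10):
    print(f"eta = {eta}: E(delta | trt) = {prob_event(0.2, eta, 3, 2):.3f}, "
          f"N = {total_n(0.2, 0.3, eta, a=3, f=2)}")
```

Loss to follow-up lowers the chance that an event is observed, so the required sample size rises with η.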

Impact of LTFU on Sample Size
Trial Conditions:
– λ_trt = 0.2, λ_cont = 0.3, & η_trt = η_cont = η
– a = 3 & f = 2
– α = 0.05 (one-sided) & 1-β = 0.9
– entry times ~ U(a)
Table columns: η; E(δ | λ_trt); E(δ | λ_cont); E(δ | λ_trt, η); E(δ | λ_cont, η); Power; Sample Size

Competing Risk Setting
Same idea: we want to compare 2 treatments, but now we have some competing risk(s)
Latouche et al. (2004) developed an approach for estimating sample size in data with competing risks
– Extension of Schoenfeld formula
– Also based on Fine and Gray model for competing risks

Recall the Fine and Gray Model
Model and partial likelihood
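For reference, the Fine and Gray model is usually written in terms of the subdistribution hazard for the event of interest (cause 1). The block below is a sketch of that standard form for the simplified case of complete data (no censoring other than at a common end of study); the notation (Z for covariates, ε for the failure type, R_i for the risk set) is an assumption and may differ from the lecture's. With general right censoring, inverse-probability-of-censoring weights enter the denominator.

```latex
% Subdistribution hazard model for the cause of interest (cause 1),
% where F_1 is the cumulative incidence function for cause 1.
\[
  \lambda_1(t \mid Z) = -\frac{d}{dt}\log\bigl\{1 - F_1(t \mid Z)\bigr\}
                      = \lambda_{10}(t)\,\exp\bigl(\beta^{\top} Z\bigr)
\]

% Partial likelihood (complete-data form): subjects who have already failed
% from a competing cause remain in the risk set.
\[
  L(\beta) = \prod_{i:\ \varepsilon_i = 1}
             \frac{\exp\bigl(\beta^{\top} Z_i\bigr)}
                  {\sum_{j \in R_i} \exp\bigl(\beta^{\top} Z_j\bigr)},
  \qquad
  R_i = \bigl\{\, j : T_j \ge T_i \ \text{or}\ (T_j \le T_i,\ \varepsilon_j \ne 1) \,\bigr\}
\]
```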

Competing Risk Sample Size
Number of events
Number of subjects
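The competing-risk calculation is usually stated as a direct extension of the Schoenfeld formula: the number of events of interest is d = (z_{α/2} + z_β)² / (p_A p_B (log θ)²), where θ is the subdistribution hazard ratio, and the number of subjects is N = d / ψ, where ψ is the proportion of subjects expected to fail from the cause of interest by the end of follow-up. The sketch below assumes that form; the symbols θ and ψ, the value ψ = 0.4, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def events_of_interest(sub_hr, alpha=0.05, power=0.90, p_a=0.5):
    """Unrounded number of events of interest for a subdistribution HR."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (p_a * (1 - p_a) * math.log(sub_hr) ** 2)


def subjects_competing_risks(sub_hr, psi, alpha=0.05, power=0.90, p_a=0.5):
    """Total subjects, where psi is the expected proportion failing from
    the cause of interest by the end of follow-up."""
    if not 0 < psi <= 1:
        raise ValueError("psi must be in (0, 1]")
    return math.ceil(events_of_interest(sub_hr, alpha, power, p_a) / psi)


# Hypothetical scenario: subdistribution HR of 0.5, 40% failing from the cause of interest.
print(math.ceil(events_of_interest(0.5)), subjects_competing_risks(0.5, psi=0.4))
```

The event count is the same as in the setting without competing risks; the competing events only enter through ψ, which inflates the number of subjects needed.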

Additional Considerations
There are extensions for >2 groups
If interim analyses are planned, these also need to be accounted for in the sample size analysis
Multi-center trial?
Non-proportional hazards
– Methods by Halpern and Lakatos allow for non-proportional hazards
– Specify event rates for groups within specific time intervals

Additional Considerations
Non-compliance
– Differs from LTFU in that these people don’t comply with treatment but are still followed
– There are adjustments that can be made to the Lachin-Foulkes formula if non-compliance is expected
Covariate adjustments?
– If balanced, adjustment for a covariate shouldn’t impact power
– However, in cases of extreme imbalance, formulas are not valid

If Complications Exist
Consider a 3-stage approach
– Use basic formula as first estimate
– Refine sample size calculation based on likely deviations from assumptions
– If necessary, develop simulation to address more complicated deviations
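The third stage can be sketched with a small Monte Carlo simulation that estimates power empirically. This is a minimal illustration using only the Python standard library, with a hand-written two-group log-rank statistic; it assumes exponential event times, a fixed follow-up time for everyone, and optional exponential loss to follow-up. The function names and the scenario (144 subjects per group, control hazard 0.1 per year, hazard ratio 0.5, 5 years of follow-up) are illustrative choices, and more complicated deviations would be handled by changing how the data are generated.

```python
import math
import random
from statistics import NormalDist


def logrank_z(times, events, groups):
    """Two-group log-rank Z statistic (assumes no tied event times)."""
    # Process subjects in time order; at tied times, events before censorings.
    order = sorted(range(len(times)), key=lambda i: (times[i], -events[i]))
    at_risk = [groups.count(0), groups.count(1)]
    obs_minus_exp, var = 0.0, 0.0
    for i in order:
        n = at_risk[0] + at_risk[1]
        if events[i]:
            obs_minus_exp += (1 if groups[i] == 1 else 0) - at_risk[1] / n
            var += at_risk[0] * at_risk[1] / n ** 2
        at_risk[groups[i]] -= 1
    return obs_minus_exp / math.sqrt(var) if var > 0 else 0.0


def simulate_power(n_per_group, rate_control, hazard_ratio, follow_up,
                   loss_rate=0.0, n_sims=300, alpha=0.05, seed=1):
    """Fraction of simulated trials in which the two-sided log-rank test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        times, events, groups = [], [], []
        for g, rate in ((0, rate_control), (1, rate_control * hazard_ratio)):
            for _ in range(n_per_group):
                t = rng.expovariate(rate)      # event time
                c = follow_up                  # administrative censoring
                if loss_rate > 0:
                    c = min(c, rng.expovariate(loss_rate))  # loss to follow-up
                times.append(min(t, c))
                events.append(1 if t <= c else 0)
                groups.append(g)
        if abs(logrank_z(times, events, groups)) > z_crit:
            rejections += 1
    return rejections / n_sims


print("estimated power:", simulate_power(144, 0.1, 0.5, follow_up=5.0))
```

Setting `hazard_ratio=1.0` gives a check on the type I error rate, and a nonzero `loss_rate` shows how much power is lost when some subjects drop out.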

Implementation
The Lachin and Foulkes method is implemented in the gsDesign package in R
– Includes ability to power for stratified analysis
– Can power for interim analyses
Can use proc power in SAS
– TwoSampleSurvival statement
– Based on log-rank but allows for different weights (Gehan, log-rank, or Tarone-Ware)

Implementation
Alternative software?
– PASS
  Can power for binary or non-binary covariate
  Can power for scenario where an additional correlated covariate is considered
– nQuery
  Basic calculation based on Schoenfeld equation

All R Packages Related to Survival Analysis

Statistician Vs. Lab Researcher

Next Time
1. Nate O’Connell: Sequential Designs for Phase I Clinical Trials with Late-Onset Toxicities
2. Lutffiya Muhammad: Covariate-adjusted non-parametric survival curve estimation
3. James Small: Simulating biologically plausible complex survival data
4. Cameron Miller: Tutorial in biostatistics: Competing risks and multi-state models
5. Jamie Speiser: Bagging survival trees