Statistical Issues in Contraceptive Trials

Daniel L. Gillen, PhD
Department of Statistics, University of California, Irvine
FDA Reproductive Drugs Advisory Committee Meeting, Jan 23-24
D. Gillen, FDA Repro, Jan 23-24

Minimum requirements of a clinical trial
- Appropriate target population
- Use of appropriate comparison groups
- Use of an appropriate outcome measure
- Ability to maintain statistical criteria for evidence
  - Controlling type I and II errors in the frequentist setting

Outline
- Outcome measures
  - Pearl Index vs. life-table methods
- Comparison populations
  - Historical vs. active control trials
- Defining statistical evidence
  - Testing for superiority vs. non-inferiority

Outcome Measures: Pearl Index vs. Life Table Methods

The Pearl Index
- The Pearl Index (number of pregnancies per 100 woman-years of exposure) is a common measure used to summarize contraceptive effectiveness
- However, a drawback of the Pearl Index is that in most situations it depends on the duration of follow-up and must be interpreted accordingly
- This dependence arises because the baseline risk of pregnancy within the study sample changes over time: higher-risk women tend to become pregnant, and leave the risk set, early on

Ex: Sensitivity of the Pearl Index to duration of follow-up
- Suppose our study population of 5,000 women consists of two groups, each with a constant risk of pregnancy over time:
  - “Low risk” group (90% of population): 1-year probability of pregnancy is 5%
  - “High risk” group (10% of population): 1-year probability of pregnancy is 50%

Ex (cont’d): One-year Pearl Index
- Now consider the Pearl Index calculated over the first year
- Expected number of pregnancies: 5000 × (0.90 × 0.05 + 0.10 × 0.50) = 475
- Expected person-years at risk, censoring at pregnancy (a pregnancy contributes on average half a year): 4525 × 1 + 475 × 0.5 = 4762.5
- Pearl Index: (475 / 4762.5) × 100 = 9.97 pregnancies per 100 woman-years

Ex (cont’d): Two-year Pearl Index
- For the Pearl Index calculated over 2 years, we need to consider the impact of censoring the “high risk” group at pregnancy
- By the end of one year:
  - Number left in low risk group: 5000 × 0.90 × (1 - 0.05) = 4275
  - Number left in high risk group: 5000 × 0.10 × (1 - 0.50) = 250
  - Percent of the remaining 4525 women who are in the high risk group: 250 / 4525 = 5.5% (down from 10% at baseline)

Ex (cont’d): Two-year Pearl Index
- Now consider the Pearl Index calculated between years 1 and 2
- Expected number of pregnancies occurring between 1 and 2 years of follow-up: 4275 × 0.05 + 250 × 0.50 = 213.75 + 125 = 338.75
- Expected person-years at risk between year 1 and year 2: (4525 - 338.75) × 1 + 338.75 × 0.5 = 4355.6 person-years
- Pearl Index calculated between years 1 and 2: (338.75 / 4355.6) × 100 = 7.78 pregnancies per 100 woman-years

Ex (cont’d): Two-year Pearl Index
- Now consider the Pearl Index calculated over the full 2 years
- Expected number of pregnancies observed over 2 years: 475 + 338.75 = 813.75
- Expected person-years at risk over 2 years: 4762.5 + 4355.6 = 9118.1 person-years
- Pearl Index calculated over 2 years: (813.75 / 9118.1) × 100 = 8.92 pregnancies per 100 woman-years, compared with 9.97 over the first year alone
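The expected-value calculation in this example can be sketched in Python. The sketch assumes, as the example does, that each group keeps a constant annual pregnancy probability, that women are censored at pregnancy, and that a pregnancy contributes on average half a woman-year in the year it occurs. The year-2 risk set is built from the year-1 survivors (4,275 low-risk and 250 high-risk women). `RiskGroup`, `expected_pearl_by_year` and `cumulative_pearl` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class RiskGroup:
    n: float         # women in the group at baseline
    p_annual: float  # constant 1-year probability of pregnancy


def expected_pearl_by_year(groups, years):
    """Expected (year, pregnancies, woman-years, Pearl Index) for each year of follow-up.

    Women are censored at pregnancy; a pregnancy contributes half a woman-year
    in the year it occurs, as in the worked example.
    """
    at_risk = [g.n for g in groups]
    rows = []
    for year in range(1, years + 1):
        pregnancies = [n * g.p_annual for n, g in zip(at_risk, groups)]
        total = sum(pregnancies)
        woman_years = (sum(at_risk) - total) + 0.5 * total
        rows.append((year, total, woman_years, 100 * total / woman_years))
        # Survivors form the next year's risk set, so its mix shifts toward low risk
        at_risk = [n - d for n, d in zip(at_risk, pregnancies)]
    return rows


def cumulative_pearl(rows):
    """Pearl Index pooled over the rows: total pregnancies / total woman-years x 100."""
    return 100 * sum(r[1] for r in rows) / sum(r[2] for r in rows)


mixture = [RiskGroup(n=4500, p_annual=0.05), RiskGroup(n=500, p_annual=0.50)]
rows = expected_pearl_by_year(mixture, years=2)
for year, preg, wy, pi in rows:
    print(f"year {year}: {preg:.2f} pregnancies, {wy:.1f} woman-years, PI = {pi:.2f}")
print(f"2-year Pearl Index: {cumulative_pearl(rows):.2f}")

# A homogeneous group with the same overall 1-year probability (9.5%):
# the yearly Pearl Index no longer changes with follow-up.
homogeneous = [RiskGroup(n=5000, p_annual=0.095)]
print([round(r[3], 2) for r in expected_pearl_by_year(homogeneous, years=3)])
```

For the mixture, the yearly index falls from about 9.97 in year 1 to about 7.78 in year 2 (about 8.92 pooled over both years). The homogeneous group gives the same value every year, which illustrates why homogeneity and a constant rate matter.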

When is the Pearl Index independent of study support (i.e., the length of follow-up)?
- The Pearl Index will change with the length of follow-up unless:
  - The rate of pregnancy is homogeneous across all possible subgroups, and
  - This rate remains constant over time

When is the Pearl Index independent of study support?
- In the previous example, note that even if we allow participants who become pregnant to re-enter the risk set, the Pearl Index will still depend on time
- This is because a pregnancy results in less at-risk time, so as the maximal duration of follow-up increases, the “high risk” group contributes a proportionately smaller share of the total years of follow-up

A further issue in quantifying the Pearl Index…
- Most confidence intervals for the Pearl Index assume a Poisson distribution
  - This distribution has variance equal to its mean (or rate)
- However, count or rate data are typically overdispersed relative to the Poisson distribution
  - That is, the true variance of the rate that we observe is larger than the Poisson distribution assumes
- Overdispersion in Poisson rates typically arises from heterogeneity of patient populations

Computation of confidence intervals for the Pearl Index
- Consider our previous example with a “low risk” and a “high risk” group
  - Low risk group (90% of population): constant risk of pregnancy; 1-year probability of pregnancy is 5%
  - High risk group (10% of population): 1-year probability of pregnancy is 50%

Computation of confidence intervals for the Pearl Index
- We previously calculated the (true) 1-year Pearl Index to be 9.97 pregnancies per 100 woman-years
- Suppose that in reality we observed 457 pregnancies over 1 year with a total of 4763 woman-years of follow-up, giving a Pearl Index of 9.59 per 100 woman-years
- Assuming a Poisson distribution, the corresponding 95% confidence interval for the 1-year Pearl Index would be (8.73, 10.51)

Computation of confidence intervals for the Pearl Index
- However, because the Pearl Index here is really composed of a mixture of Poisson distributions (from the high and low risk groups), the true variance is actually 19.2% larger than that assumed by the usual (single) Poisson model
- This means that we have underestimated the variance, i.e., our confidence interval is narrower than it should be
- In this case, a 95% confidence interval accounting for the heterogeneity of groups is (8.63, 10.55), approximately 8% wider than the previous interval
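One simple way to account for overdispersion is to inflate the Poisson variance by a dispersion factor, here 1.192 for the 19.2% inflation quoted above. The sketch below assumes a normal-approximation (Wald) interval; the slide does not state which interval method was used, so the unadjusted interval printed here can differ slightly from (8.73, 10.51). `pearl_index_ci` is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist


def pearl_index_ci(pregnancies, woman_years, dispersion=1.0, level=0.95):
    """Pearl Index (per 100 woman-years) with a Wald confidence interval.

    dispersion = 1 gives the usual Poisson variance (variance = mean);
    dispersion > 1 inflates the variance to allow for overdispersion.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rate = 100 * pregnancies / woman_years
    se = 100 * math.sqrt(dispersion * pregnancies) / woman_years
    return rate, (rate - z * se, rate + z * se)


rate, poisson_ci = pearl_index_ci(457, 4763)
_, adjusted_ci = pearl_index_ci(457, 4763, dispersion=1.192)
print(f"Pearl Index = {rate:.2f}")
print("Poisson CI:  (%.2f, %.2f)" % poisson_ci)
print("Adjusted CI: (%.2f, %.2f)" % adjusted_ci)
```

Because the dispersion factor multiplies the variance, the interval width grows by its square root (about 9% for a factor of 1.192).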

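How to deal with the changing composition of the risk set?
- We illustrated one way in our example: consider the probability of failure at specific time points by using conditional probability
- For example, if T is the time of failure, we can compute the probability of failure within two years as
  Pr[T ≤ 2] = 1 - Pr[T > 2] = 1 - Pr[T > 2 | T > 1] Pr[T > 1]
  - Pr[T > 1] = 1 - 475/5000 = 0.905
  - Pr[T > 2 | T > 1] = 1 - (4275 × 0.05 + 250 × 0.50)/4525 = 1 - 338.75/4525 = 0.925
  - Pr[T ≤ 2] = 1 - 0.925 × 0.905 = 0.163
- These conditional probabilities divide by the number of women at risk at the start of each year, whereas the Pearl Index divides by woman-years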

How to deal with the changing composition of the risk set?
- This is called a life-table estimate
- In the setting of contraceptive failure, these conditional probabilities are typically computed monthly to more accurately reflect the changing risk set (see e.g. Potter, 1966)
- When the life-table estimate is evaluated at all distinct failure times, it is called the Kaplan-Meier estimate
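Both estimates are products of conditional survival probabilities, differing only in where the time axis is cut. A minimal Python sketch, assuming individual follow-up times with a failure indicator for the Kaplan-Meier version and aggregated (at risk, failures) counts per interval for the life-table version; `kaplan_meier` and `life_table_survival` are hypothetical names, not a library API. The last lines apply the life-table product to the two-year example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct failure time.

    times:  follow-up time for each participant (e.g. months)
    events: 1 if follow-up ended in failure (pregnancy), 0 if censored
    Returns a list of (t, S(t)) pairs.
    """
    failures = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)
    n_at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            surv *= 1 - d / n_at_risk  # conditional probability of surviving past t
            curve.append((t, surv))
        n_at_risk -= exits[t]          # failures and censorings at t leave the risk set
    return curve


def life_table_survival(intervals):
    """Product of conditional survival probabilities over successive intervals.

    intervals: (number at risk at the start, number of failures) per interval.
    """
    surv = 1.0
    for n, d in intervals:
        surv *= 1 - d / n
    return surv


print(kaplan_meier([3, 5, 5, 8, 12, 12], [1, 0, 1, 1, 0, 0]))
# Two-year failure probability for the example cohort, using the year-1
# survivors (4275 low-risk, 250 high-risk women) as the year-2 risk set
year2_failures = 4275 * 0.05 + 250 * 0.50
print(1 - life_table_survival([(5000, 475), (4525, year2_failures)]))
```

Censorings tied with a failure time are kept in the risk set at that time, which is the usual Kaplan-Meier convention.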

Are there any benefits to using the Pearl Index?
- Clearly, the Pearl Index has been in wide use. The reasons for this are:
  - Ease of interpretation
    - Although the Kaplan-Meier estimator also has a clinically relevant interpretation (probability of failure over T years of use)
  - For historically controlled trials, there is a great deal of data summarized in terms of the Pearl Index
    - This will, of course, change as Kaplan-Meier estimates grow in popularity in the field

Can we incorporate changing treatment regimens?
- Patients may discontinue use, or use additional contraceptives, for some intervals of time
- Technically, the Kaplan-Meier estimator could handle such interruptions through right censoring and delayed entry (left truncation)
  - However, it is not clear when patients should re-enter the risk set

Can we incorporate changing treatment regimens?
- For example, consider the case where a participant uses back-up contraception during the interval (t1, t2)
- This individual could be considered at risk for the interval (0, t1) and then re-entered into the risk set at time t2
- However, by doing this we implicitly assume that this person’s hazard (or risk of pregnancy) at time t2 is the same as that of all others who have been at risk over (0, t2)
- This does not seem a reasonable assumption to me, and I would advise against it

Can we incorporate changing treatment regimens?
- Another option for incorporating changing treatment regimens would come from post-hoc analyses:
  - Stratified Kaplan-Meier estimates
    - The number of strata could become large
  - Time-dependent covariates
    - E.g., consider a proportional hazards framework
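A proportional hazards model with time-dependent covariates needs follow-up split into (start, stop] rows, each carrying the covariate value in force during that interval; this counting-process layout is, for example, what `Surv(start, stop, event)` expects in R's survival package. The Python sketch below is one possible way to build such rows for the back-up contraception example, keeping the back-up period in the data with an indicator rather than removing and re-entering the participant. `Interval` and `split_follow_up` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    start: float
    stop: float
    backup: int  # 1 if back-up contraception was used during (start, stop]
    event: int   # 1 if a pregnancy occurred at `stop`


def split_follow_up(end: float, pregnant: bool,
                    backup_periods: List[Tuple[float, float]]) -> List[Interval]:
    """Split one participant's follow-up (0, end] into counting-process rows
    carrying a time-dependent back-up contraception indicator."""
    inner = {t for period in backup_periods for t in period if 0 < t < end}
    cuts = sorted({0.0, end} | inner)
    rows = []
    for start, stop in zip(cuts, cuts[1:]):
        on_backup = any(a <= start and stop <= b for a, b in backup_periods)
        event = int(pregnant and stop == end)  # only the last row can carry the event
        rows.append(Interval(start, stop, int(on_backup), event))
    return rows


# Participant used back-up contraception during (3, 5] and became pregnant at month 12
for row in split_follow_up(end=12, pregnant=True, backup_periods=[(3, 5)]):
    print(row)
```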

Regardless of the measure, what defines a failure and who is at risk?
- For all new interventions we must consider:
  - Safety: Are there adverse effects that clearly outweigh any potential benefit?
  - Efficacy: Can the intervention reduce the probability of unintended pregnancy in a beneficial way?
  - Effectiveness: Would adoption of the intervention as a standard reduce the probability of unintended pregnancy in the population?

Regardless of the measure, what defines a failure and who is at risk?
- One difference between evaluation of efficacy and effectiveness is in what defines a failure and who should be included in the risk set
- In a clinical trial setting we can truly evaluate only efficacy, because of possible selection bias among the patients who enter contraceptive trials
- However, even in the clinical trial setting it is useful to evaluate:
  - Intervention failure rates during actual use (including inconsistent or incorrect use)
  - Intervention failure rates during perfect use (see e.g. Trussell, Contraception, 2004)

Regardless of the measure, what defines a failure and who is at risk?
- To assess true method efficacy, counting only “method failures” during perfect use, we must include only perfect-use exposure in the risk set
- We also need to consider whether those who are lost to follow-up should be considered at risk all the way up to the time of drop-out
  - One reasonable approach is to censor patients three months prior to the time at which they become lost to follow-up (Trussell, SIM, 1991)
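The three-month rule for participants lost to follow-up translates directly into the (time, event) records fed to a life-table analysis. A minimal Python sketch, assuming follow-up is measured in months and that "the time at which they become lost" is the last contact; `Participant` and `risk_set_record` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    months: float            # time of pregnancy, study completion, or last contact
    pregnant: bool
    lost_to_follow_up: bool


def risk_set_record(p: Participant, lag_months: float = 3.0):
    """Return (time, event) for a life-table analysis.

    Pregnancies are failures at their observed time. Participants lost to
    follow-up are censored lag_months before their last contact (the
    three-month rule cited on the slide); completers are censored at the end.
    """
    if p.pregnant:
        return p.months, 1
    if p.lost_to_follow_up:
        return max(0.0, p.months - lag_months), 0
    return p.months, 0


cohort = [
    Participant(12, pregnant=False, lost_to_follow_up=False),
    Participant(7, pregnant=True, lost_to_follow_up=False),
    Participant(10, pregnant=False, lost_to_follow_up=True),
    Participant(2, pregnant=False, lost_to_follow_up=True),
]
print([risk_set_record(p) for p in cohort])  # [(12, 0), (7, 1), (7.0, 0), (0.0, 0)]
```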

Historical vs. Active Control Trials

Historical control trials vs. active control trials
- In the past, many methods have been assessed via historically controlled trials
  - E.g., a Pearl Index of 1.5 (or more recently 2) or less has been used as an efficacy criterion
  - Such criteria stem from the experience of historical controls
- However, biases resulting from historical control studies can be numerous, particularly when study samples are not comparable with respect to baseline risk, the outcome measure used, or duration of study

Criteria for superiority in historical control trials
- As noted, past studies have used point estimates of the (one-year) Pearl Index below 1.5 or 2 unintended pregnancies per 100 woman-years
- However, we must also acknowledge the uncertainty in these estimates
  - EMEA requires a sample size sufficient to guarantee that the width of the 95% CI for the Pearl Index is no larger than 1
  - Better (in my opinion) to require that the upper bound of the CI be less than the chosen threshold
  - In either case, if the Pearl Index is used, the earlier notes on computation of the CI need to be considered
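Both criteria can be made concrete under a Wald approximation to the Poisson rate, with an optional dispersion factor for the overdispersion discussed earlier. The sketch computes the woman-years needed for a CI of width at most 1 (the EMEA criterion as stated on the slide) and checks the upper-bound criterion. The function names and the 8-pregnancies-in-1000-woman-years data are hypothetical. With the small event counts typical of contraceptive trials, an exact Poisson interval may be preferable to the Wald approximation used here.

```python
import math
from statistics import NormalDist


def pearl_upper_bound(pregnancies, woman_years, dispersion=1.0, level=0.95):
    """Upper limit of a two-sided Wald CI for the Pearl Index (per 100 woman-years)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rate = 100 * pregnancies / woman_years
    return rate + z * 100 * math.sqrt(dispersion * pregnancies) / woman_years


def meets_threshold(pregnancies, woman_years, threshold, **kwargs):
    """Upper-bound criterion: the CI upper limit, not the point estimate, is below threshold."""
    return pearl_upper_bound(pregnancies, woman_years, **kwargs) < threshold


def woman_years_for_width(expected_pi, max_width=1.0, dispersion=1.0, level=0.95):
    """Woman-years T such that the Wald CI width 200 * z * sqrt(dispersion * lam / T)
    is at most max_width, where lam = expected_pi / 100 is the rate per woman-year."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lam = expected_pi / 100
    return math.ceil(dispersion * lam * (200 * z / max_width) ** 2)


print(woman_years_for_width(1.0))                       # 1537
print(woman_years_for_width(1.0, dispersion=1.192))     # more exposure if overdispersed
print(meets_threshold(8, 1000, threshold=2.0))          # True
```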

Historical control trials vs. active control trials
- Because it is impossible to guarantee comparability between historical controls and current study samples, it is almost always advantageous to employ randomization when ethically feasible
- Given the wide use of standard contraceptives, a placebo-controlled trial is not feasible
- However, one can (and should) consider the use of an active control when comparable interventions are in use
  - This also allows comparison of the entire survival curve (logrank test or proportional hazards model?)
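Comparing entire survival curves between a new method and an active control is commonly done with the logrank test. A minimal, unstratified Python sketch using the usual observed-minus-expected statistic with hypergeometric variance; `logrank_test` and the toy data are hypothetical, and a validated survival package would normally be used in practice.

```python
import math


def logrank_test(times, events, groups):
    """Two-sample logrank test comparing whole survival curves.

    times:  follow-up time for each participant
    events: 1 = failure (pregnancy), 0 = censored
    groups: 0 = active control, 1 = new method
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    data = list(zip(times, events, groups))
    o_minus_e, var = 0.0, 0.0
    for t in sorted({tt for tt, e, _ in data if e}):
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n1 = sum(g for _, _, g in at_risk)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d1 = sum(1 for tt, e, g in at_risk if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n              # observed minus expected in group 1
        if n > 1:                                 # hypergeometric variance
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


# Hypothetical toy data: control failures at 1, 2, 3; new method failures at 4, 5, 6
chi2, p = logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1])
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

The p-value uses the identity that the upper tail of a chi-square with one degree of freedom at x equals erfc(sqrt(x / 2)).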

Superiority vs. Non-Inferiority in Active Control Trials

Superiority vs. non-inferiority in active control trials
- Statistical criteria for evidence in a superiority trial
  - Evidence to rule out equality of effect as measured by the chosen parameter (e.g., Pearl Index, 1-year survival estimate, or a hazard ratio)
- Example: the contrast may be the difference in 1-year failure rates as measured by the Kaplan-Meier estimator, KM_Tx(1) - KM_AC(1), where KM_Tx(1) and KM_AC(1) are the estimated probabilities of failure by 1 year for the new treatment and the active control
  - Test: H0: KM_Tx(1) - KM_AC(1) ≥ 0 vs. H1: KM_Tx(1) - KM_AC(1) < 0
  - Rejection of the null hypothesis corresponds to the upper bound of the CI for KM_Tx(1) - KM_AC(1) being less than 0

Superiority vs. non-inferiority in active control trials
- Statistical criteria for evidence in a non-inferiority trial
  - Evidence to rule out efficacy worse than that of the active control by more than some margin
- Example: the contrast may be the difference in 1-year failure rates as measured by the Kaplan-Meier estimator, KM_Tx(1) - KM_AC(1)
  - Test: H0: KM_Tx(1) - KM_AC(1) ≥ δ vs. H1: KM_Tx(1) - KM_AC(1) < δ, for some margin δ > 0
  - Rejection of the null hypothesis corresponds to the upper bound of the CI for KM_Tx(1) - KM_AC(1) being less than δ
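The two decision rules differ only in the value the CI upper bound is compared with (0 for superiority, δ for non-inferiority). A minimal Python sketch, assuming Greenwood standard errors for each arm's Kaplan-Meier failure probability, independent arms, and a normal approximation for the difference; this is one of several possible approaches, and `km_failure_at`, `compare_failure_at` and the data are hypothetical. With samples as small as the toy data, the normal approximation would be unreliable.

```python
import math
from statistics import NormalDist


def km_failure_at(times, events, t_star):
    """Kaplan-Meier probability of failure by t_star and its Greenwood standard error."""
    surv, greenwood = 1.0, 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e and tt <= t_star}):
        n = sum(1 for tt in times if tt >= t)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        surv *= 1 - d / n
        if n > d:
            greenwood += d / (n * (n - d))
    return 1 - surv, surv * math.sqrt(greenwood)


def compare_failure_at(tx, ac, t_star, margin=0.0, level=0.95):
    """Difference KM_Tx(t*) - KM_AC(t*), its CI upper bound, and whether the bound is below margin.

    margin = 0 is the superiority criterion; margin = delta > 0 is non-inferiority.
    tx, ac: (times, events) for the new treatment and the active control.
    """
    f_tx, se_tx = km_failure_at(*tx, t_star)
    f_ac, se_ac = km_failure_at(*ac, t_star)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = f_tx - f_ac
    upper = diff + z * math.sqrt(se_tx ** 2 + se_ac ** 2)
    return diff, upper, upper < margin


# Hypothetical follow-up in years for each arm
tx = ([0.3, 0.9, 1.5, 2.0, 2.0, 2.0], [1, 0, 0, 0, 0, 0])
ac = ([0.2, 0.5, 0.8, 1.2, 2.0, 2.0], [1, 1, 0, 1, 0, 0])
print(compare_failure_at(tx, ac, t_star=1.0, margin=0.0))   # superiority
print(compare_failure_at(tx, ac, t_star=1.0, margin=0.05))  # non-inferiority, delta = 0.05
```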

Superiority vs. non-inferiority in active control trials
- When is it reasonable to consider non-inferiority instead of superiority? (ICH E10 guidelines)
  - The active control treatment must truly be active in the study population
  - If the active control is truly active in the study population:
    - Can a margin to define non-inferiority be established?
    - If the active control is the standard of care, is the new treatment also superior on secondary endpoints?

Superiority vs. non-inferiority in active control trials
- Issues in setting the non-inferiority “margin”:
  - What measure compares distributions?
  - Is the treatment effect random?
  - How much of a decrease in effect is acceptable?
  - How to account for variability in the estimate(s) from historical trials?

Superiority vs. non-inferiority in active control trials
- Precedent for setting the non-inferiority “margin”
  - Is the treatment effect random?
    - Ideally use a meta-analysis of multiple trials
    - Careful! Do the trials have the same duration of follow-up?
  - How much of a decrease in effect is acceptable?
    - 10%, 20%, 50% of the active control effect?
  - How to account for variability in the estimate(s) from historical trials?
    - Use the worst case from the historical 95% CI?
    - Explicitly account for variability in the historical trials
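The three questions can be combined in one calculation: pool historical estimates of the active control's effect with a random-effects meta-analysis (DerSimonian-Laird here, one common choice), take the lower 95% confidence limit as the "worst case" effect, and allow the new method to lose a chosen fraction of it. A minimal Python sketch under those assumptions; the effect is taken to be the reduction in 1-year failure probability attributable to the active control, all historical estimates should refer to the same follow-up duration, and the function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, its standard error, and tau^2 (DerSimonian-Laird)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]  # between-trial variance added
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


def non_inferiority_margin(estimates, variances, fraction_lost, level=0.95):
    """Margin delta = fraction_lost x lower CI limit of the pooled active-control effect."""
    pooled, se, _ = dersimonian_laird(estimates, variances)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    worst_case_effect = pooled - z * se
    return fraction_lost * worst_case_effect


# Hypothetical historical effects (reduction in 1-year failure probability) and variances
effects, variances = [0.40, 0.55, 0.48], [0.004, 0.006, 0.005]
for loss in (0.1, 0.2, 0.5):
    print(loss, round(non_inferiority_margin(effects, variances, loss), 4))
```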

Summary

Summary
- Need to define an appropriate target population, comparison group, and outcome measure, and to maintain statistical criteria for evidence
- The Pearl Index is (usually) implicitly dependent on the length of follow-up, whereas Kaplan-Meier (life-table) estimates make this dependence explicit
- In either case, we need to obtain correct inference (CIs), and the definition of the risk set must correspond to the definition of failure
- When ethically and logistically possible, active controls should be used
- If historical controls are used, uncertainty should be accounted for in defining superiority criteria