The Application of Propensity Score Analysis to Non-randomized Medical Device Clinical Studies: A Regulatory Perspective Lilly Yue, Ph.D.* CDRH, FDA,


The Application of Propensity Score Analysis to Non-randomized Medical Device Clinical Studies: A Regulatory Perspective Lilly Yue, Ph.D.* CDRH, FDA, Rockville MD 20850 *No official support or endorsement by the Food and Drug Administration of this presentation is intended or should be inferred.

Outline: Randomized clinical trials; non-randomized studies and a potential problem; propensity score methods for bias reduction; practical issues with the application of propensity score methodology; limitations of propensity score methods; conclusions.

Randomized Trials All patients have a specified chance of receiving each treatment. Treatments are concurrent. Data collection is concurrent, uniform, and high quality. Expect that all patient covariates, measured or unmeasured, e.g., age, gender, duration of disease, …, are balanced between the two treatment groups.

Randomized Trials Assumptions underlying statistical comparison tests are met. So, the two trt groups are comparable and observed treatment difference is an unbiased estimate of true treatment difference. But, the above advantages are not guaranteed for small, poorly designed or poorly conducted randomized trials.

Nonrandomized Studies and a Potential Problem: None of the advantages provided by randomized trials is available in non-randomized studies. A potential problem: the two treatment groups were not comparable before the start of treatment, i.e., not comparable because of imbalanced covariates between the two treatment groups. So, direct treatment comparisons are invalid.

Adjustments for Covariates Three common methods of adjusting for confounding covariates: Matching Subclassification (stratification) Regression (Covariate) adjustment

Question: What if there are many confounding covariates that need to be adjusted for, e.g., age, gender, …? Matching based on many covariates is not practical. Subclassification is difficult: as the number of covariates increases, the number of subclasses grows exponentially (each covariate with 2 categories, 5 covariates: 32 subclasses). Regression adjustment may not be possible. Potential problem: over-fitting.
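The exponential growth mentioned on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name `subclass_count` is hypothetical and not part of the presentation.

```python
def subclass_count(levels_per_covariate):
    """Number of subclasses when every covariate is cross-classified with every other."""
    total = 1
    for levels in levels_per_covariate:
        total *= levels
    return total

# Binary covariates only, as in the slide's example (5 covariates: 32 subclasses).
for k in (1, 5, 10, 20):
    print(k, "binary covariates:", subclass_count([2] * k), "subclasses")
```

With 20 binary covariates, as in the Device A example later in the talk, the count already exceeds one million, far more than the number of patients in a typical device study.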

Propensity Score Methodology: Replace the collection of confounding covariates with one scalar function of these covariates: the propensity score. Age, gender, duration, … are replaced by 1 composite covariate: the propensity score, a balancing score.

Propensity Score Methodology (cont.) Propensity score (PS): conditional prob. of receiving Trt A rather than Trt B, given a collection of observed covariates. Purpose: simultaneously balance many covariates in the two trt groups and thus reduce the bias.

Propensity scores construction: Statistical modeling of the relationship between treatment membership and covariates. Statistical methods: multiple logistic regression or others. Outcome (event): actual trt membership, A or B. Predictor variables: all measured covariates and some interaction or squared terms, e.g., age, gender, duration of disease, …, age*duration, …

Propensity scores construction (cont.): The clinical outcome variable, e.g., major complication event, is NOT involved in the modeling. No concern about over-fitting. Obtain a propensity score model, a math equation: PS = f(age, gender, …). Calculate estimated propensity scores for all patients.
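The construction steps can be sketched in code. This is a minimal illustration, assuming a plain logistic regression fitted by gradient ascent in pure Python; the ten patients, the two covariates (centered age in decades, gender coded 0/1), and the function names are all hypothetical. A real analysis would use a statistical package and the study's own covariates. Note that no clinical outcome appears anywhere in the fit.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, t, lr=0.1, n_iter=5000):
    """Fit P(t = 1 | x) by batch gradient ascent on the mean log-likelihood.
    X: rows of (already scaled) covariates; t: 0/1 treatment membership.
    Returns [intercept, b1, ..., bk]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, ti in zip(X, t):
            xi = [1.0] + list(row)
            p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            for j in range(k + 1):
                grad[j] += (ti - p) * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Estimated probability of receiving treatment A (t = 1) for each patient."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X]

# Hypothetical data: (centered age in decades, gender), and treatment received (1 = A, 0 = B).
X = [(-1.5, 0), (-1.0, 1), (-0.5, 0), (-0.5, 1), (0.0, 0),
     (0.0, 1), (0.5, 0), (0.5, 1), (1.0, 0), (1.5, 1)]
t = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

beta = fit_logistic(X, t)
ps = propensity_scores(X, beta)
for row, ti, p in zip(X, t, ps):
    print(row, ti, round(p, 3))
```

Interaction or squared terms, as mentioned on the slide, would simply be added as extra columns of `X` before fitting.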

Properties of propensity scores: A group of patients with the same propensity score are equally likely to have been assigned to trt A. Within a group of patients with the same propensity score, e.g., 0.7, some patients actually got trt A and some got trt B, just as if they had been randomly allocated to whichever trt they actually received.

“Randomized After the Fact” (diagram: patients with PS = 0.7 divided into Trt A and Trt B)

When the propensity scores are balanced across two treatment groups, the distributions of all the covariates are balanced in expectation across the two groups. Use the propensity scores as a diagnostic tool to measure treatment group comparability. If the two treatment groups overlap well enough in terms of the propensity scores, we compare the two treatment groups adjusting for the PS.

Compare treatments adjusting for propensity score Matching Subclassification (stratification) Regression (Covariate) adjustment

Matching based on propensity scores (PS): Pair Trt A and Trt B patients with matching propensity scores (PS1, PS2, …, PSm), then compare treatments based on the matched pairs. Problem: may exclude unmatched patients.
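One common way to form such pairs is greedy nearest-neighbour matching with a caliper. The slide does not say which matching algorithm is meant, so the sketch below is an assumption for illustration; the function name, the caliper of 0.05, and the scores are hypothetical.

```python
def greedy_match(ps_a, ps_b, caliper=0.05):
    """1:1 greedy nearest-neighbour matching on the propensity score, without replacement.
    Returns (pairs, unmatched_a), where pairs holds (index in ps_a, index in ps_b)."""
    available = set(range(len(ps_b)))
    pairs, unmatched_a = [], []
    for i, p in enumerate(ps_a):
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = min(sorted(available), key=lambda j: abs(ps_b[j] - p), default=None)
        if best is not None and abs(ps_b[best] - p) <= caliper:
            pairs.append((i, best))
            available.remove(best)
        else:
            unmatched_a.append(i)  # no Trt B patient close enough
    return pairs, unmatched_a

# Hypothetical estimated propensity scores.
ps_a = [0.30, 0.52, 0.71, 0.90]
ps_b = [0.10, 0.28, 0.50, 0.69]
pairs, unmatched_a = greedy_match(ps_a, ps_b)
print(pairs, unmatched_a)
```

Here the Trt A patient with PS = 0.90 has no Trt B counterpart within the caliper and is dropped, which is exactly the problem the slide points out.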

Stratification: All patients are sorted by propensity scores and divided into equal-sized subclasses (e.g., subclasses 1, 2, …, 5). Compare the two trts within each subclass, as in a randomized trial; then estimate the overall trt effect as a weighted average. It is intended to use all patients. But, if the trial size is small, some subclasses may contain patients from only one treatment group.
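The stratification steps can be sketched as follows. This is an illustration under stated assumptions: subclasses are weighted by their size, subclasses containing only one treatment group are skipped and reported, and the ten patients are hypothetical. The slide itself does not specify the weights.

```python
def stratified_difference(ps, treated, outcome, n_strata=5):
    """Sort patients by propensity score, split into equal-sized subclasses, and
    average the within-subclass difference in mean outcome (treated minus control),
    weighting each subclass by its size. Returns (estimate, skipped_subclasses)."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    n = len(order)
    weighted_sum, total_weight, skipped = 0.0, 0, []
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        a = [outcome[i] for i in idx if treated[i] == 1]
        b = [outcome[i] for i in idx if treated[i] == 0]
        if not a or not b:
            skipped.append(s + 1)  # only one treatment group present
            continue
        weighted_sum += len(idx) * (sum(a) / len(a) - sum(b) / len(b))
        total_weight += len(idx)
    estimate = weighted_sum / total_weight if total_weight else None
    return estimate, skipped

# Hypothetical data: 10 patients, binary outcome (1 = event).
ps      = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
treated = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1]
outcome = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
estimate, skipped = stratified_difference(ps, treated, outcome)
print(estimate, skipped)
```

In this toy data the highest subclass contains only treated patients, so it contributes nothing to the estimate; this is the small-sample problem noted on the slide.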

Regression (covariate) adjustment: Treatment effect estimation by model fitting: the relationship between clinical outcome and treatment. Outcome: clinical outcome, e.g., adverse events. Predictor variables: trt received, propensity score, a subset of important covariates. Statistical method: e.g., regression or logistic regression.

Propensity Score Methods Summary Fit propensity score (PS) model using all measured covariates Estimate PS for all patients using PS model Compare treatments adjusting for propensity scores

Practical Issues: Issues in propensity score estimation: how to handle missing baseline covariate values; what terms of covariates should be included. Evaluation of treatment group comparability. Assessment of the resulting balance of the distributions of covariates. Issues in treatment comparison: which method (matching, stratification, or regression). Issues in study design with PS analysis: pre-specified vs. post hoc PS analysis; pre-specify the covariates to be collected in the study and then included in PS estimation; sample size estimation adjusting for the propensity scores.

Example: Device A. Non-concurrent, two-arm, multi-center study. Control: medical treatment without device, N = 65, hospital record collection. Treatment: Device A, N = 130. Primary effectiveness endpoint: treatment success. Hypothesis testing: superiority in success rate. 20 imbalanced, clinically important baseline covariates, e.g., prior cardiac surgery. 22% of patients had missing baseline covariate values.

Enrollment Time

Two treatment groups are not comparable: imbalance in multiple baseline covariates; imbalance in the time of enrollment. So, any direct treatment comparisons on the effectiveness endpoint are inappropriate, and p-values from direct treatment comparisons are uninterpretable. What about treatment comparisons adjusting for the imbalanced covariates? Traditional covariate analysis; propensity score analysis.

Performed propensity score (PS) analysis: Handled missing values (multiple imputation (MI): generate multiple data sets for PS analysis; generate one data set: generalized PS analysis; others). Included all statistically significant and/or clinically important baseline covariates in PS modeling. Checked comparability of the two trt groups through the estimated propensity score distributions. Found that the two trt groups did not overlap well.

Estimated Propensity Scores (with time)

Estimated Propensity Scores (w/o time)

Patients in Propensity Score Quintile

                    1     2     3     4     5   Total
With time
  Ctl      N       38    18     8     1     0      65
           %      58%   28%   12%    2%    0%
  Trt      N        1    21    31    38    39     130
           %       1%   16%   24%   29%   30%
Without time
  Ctl      N       29    24     8     4     0      65
           %      45%   37%   12%    6%    0%
  Trt      N       10    14    32    35    39     130
           %       8%   11%   24%   27%   30%
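The lack of overlap can be read directly from these counts. The sketch below flags quintiles in which either arm is nearly empty; the counts are copied from the slide, while the function name and the threshold of fewer than 2 patients per arm are illustrative assumptions, not a regulatory rule.

```python
# Quintile counts copied from the slide (Device A example).
counts = {
    "with time":    {"Ctl": [38, 18, 8, 1, 0], "Trt": [1, 21, 31, 38, 39]},
    "without time": {"Ctl": [29, 24, 8, 4, 0], "Trt": [10, 14, 32, 35, 39]},
}

def sparse_quintiles(ctl, trt, min_per_arm=2):
    """Quintiles (numbered from 1) where either arm has fewer than min_per_arm patients."""
    return [q + 1 for q, (c, t) in enumerate(zip(ctl, trt)) if min(c, t) < min_per_arm]

for label, arms in counts.items():
    print(label, "sparse quintiles:", sparse_quintiles(arms["Ctl"], arms["Trt"]))
```

With enrollment time in the model, three of the five quintiles have essentially one arm only; even without it, the top quintile contains no control patients.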

Treatment Success by Propensity Score Quintile (S = successes, N = patients)

              1     2     3     4     5   Total
Ctl    S     16     8     1     0            25
       N     38    18     8     1     0      65
Trt    S      0    14    25    24    23      86
       N      1    21    31    38    39     130

Tried a Cochran-Mantel-Haenszel test controlling for PS quintile, and logistic regression using PS as a continuous covariate. However, the significant p-values are uninterpretable.

Conclusion: The two treatment groups did not overlap enough to allow a sensible treatment comparison. So, any treatment comparisons adjusting for imbalanced covariates are problematic.

Example: Device B. New vs. control in a non-randomized study. Primary endpoint: MACE incidence rate at 6 months after treatment. Non-inferiority margin: 7%, in this study. Sample size: new: 290, control: 560. 14 covariates were considered.

Covariate balance checking before and after propensity score stratification adjustment

                   Mean               p-value
Covariate      New    Control     Before     After
----------------------------------------------------
Mi            0.25      0.40      <.0001    0.4645
Diab          0.28      0.21      0.0421    0.8608
CCS           2.41      2.75      0.0003    0.3096
Lesleng      11.02     12.16      <.0001    0.5008
Preref        3.00      3.08      0.0202    0.2556
Presten      62.75     66.81      <.0001    0.4053

Model Building: The PS is the conditional probability that a patient would have been assigned to the new device, based on his or her baseline covariates. A hierarchical logistic regression model with a stepwise selection process was used to build the propensity score model. The final propensity score model includes all covariates as well as a quadratic term.

Table 2. Distribution of patients at five strata

Subclass    Control    New    Total
1               142     28      170
2               127     43      170
3               122     48      170
4               119     51      170
5                50    120      170
Total           560    290      850

Estimated Propensity Scores N(new)=290, N(control)=560


After adjustment balance check: prior Mi rate

Overall:
Group       % patients with prior Mi
New         25
Control     40
Diff        15

After adjustment, by quintile:
Group          1       2       3       4       5
New         70.4    32.6    25.0    17.6    15.0
Control     75.2    32.8    30.0    24.8    10.4
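To see how much stratification shrinks the imbalance in prior Mi, the within-quintile differences can be averaged. The sketch below uses the percentages from this slide and assumes equal weights for the five quintiles, since the strata are equal-sized (170 patients each); the function name is hypothetical.

```python
# Percentage of patients with prior Mi in each PS quintile (from the slide).
new_pct = [70.4, 32.6, 25.0, 17.6, 15.0]
control_pct = [75.2, 32.8, 30.0, 24.8, 10.4]

def stratum_adjusted_difference(a, b, weights=None):
    """Weighted average of within-stratum differences (a minus b).
    Equal weights are assumed when none are given."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w * (x - y) for w, x, y in zip(weights, a, b)) / sum(weights)

unadjusted = 25.0 - 40.0  # overall New minus Control, in percentage points
adjusted = stratum_adjusted_difference(new_pct, control_pct)
print("unadjusted:", unadjusted, "adjusted:", round(adjusted, 2))
```

Under these assumptions the difference drops from 15 percentage points overall to roughly 2.5 percentage points within quintiles, in line with the non-significant p-value after adjustment.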

Percentage of patients with prior Mi

Adjusted difference, New – Control: point estimate: -1.5%; 2-sided 95% C.I.: (-6.6%, 3.6%). Non-inferiority margin: 7%. Claim: non-inferiority w.r.t. 6-month MACE.
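The non-inferiority claim follows from comparing the upper confidence limit with the margin. The check below is a sketch that assumes a lower MACE rate is better and that the difference is taken as New minus Control, as on the slide; the function name is hypothetical.

```python
def is_noninferior(ci_upper, margin):
    """Non-inferiority holds when the upper confidence limit of
    (new minus control) lies below the pre-specified margin."""
    return ci_upper < margin

# Values from the slide, in percentage points.
point_estimate, ci_lower, ci_upper, margin = -1.5, -6.6, 3.6, 7.0
print("non-inferior:", is_noninferior(ci_upper, margin))
```

Because 3.6% is below the 7% margin, the check returns True for the slide's numbers.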

Study Design: Plan in advance. Pre-specify clinically relevant baseline covariates: as many as possible. Sample size estimation: ignore the propensity score adjustment? Could be inappropriate.

Limitations Propensity score methods can only adjust for observed confounding covariates and not for unobserved ones. Propensity score is seriously degraded when important variables influencing selection have not been collected. Propensity score may not eliminate all selection bias.

Limitations (cont.): Propensity score methods work better in larger samples. The propensity score is not the only way of adjusting for covariates, and it may or may not be helpful in a particular comparison study. Randomized trials are considered the highest level of evidence for trt comparison. Propensity score methods lack the discipline and rigor of randomized trials and are not as definitive as randomized trials.

Conclusions: Propensity score methods generalize the technique of adjusting for one confounding covariate to allow simultaneous adjustment for many covariates and thus reduce bias. Propensity score methodology is an addition to, not a substitute for, traditional covariate adjustment methods. Plan ahead and carefully consider the practical issues discussed above. Randomized studies are still preferred and strongly encouraged whenever possible!

References: Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med 1997; 127:757-763. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. JASA 1984; 79:516-524. D'Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine 1998; 17:2265-2281.

References (cont.): Blackstone EH. Comparing apples and oranges. J Thoracic and Cardiovascular Surgery, January 2002; 1:8-15. Grunkemeier GL, et al. Propensity score analysis of stroke after off-pump coronary artery bypass grafting. Ann Thorac Surg 2002; 74:301-305. Winkelmayer WC, et al. Comparing mortality of elderly patients on hemodialysis versus peritoneal dialysis: a propensity score approach. J Am Soc Nephrol 2002; 13:2353-2362.

Thanks!