Study Size Planning for Observational Comparative Effectiveness Research
Prepared for: Agency for Healthcare Research and Quality (AHRQ)
Outline of Material
This presentation will:
- Describe all relevant assumptions and decisions
- Specify the type of hypothesis, the clinically important inferiority margin or minimum clinically important excess/difference, and the level for the confidence interval
- Specify the statistical software and command or the formula to calculate the expected confidence interval
- Specify the expected precision (or statistical power) for any subgroup analyses
- Specify the expected precision (or statistical power) as sensitivity analyses in special situations
Introduction
Study feasibility relies on whether the projected number of accrued patients is adequate to address the scientific aims of the study. Many journal editorial boards endorse reporting of study size rationale. However, this rationale is often missing from study protocols and proposals. Interpreting study findings in terms of statistical significance in relation to the null hypothesis implies a prespecified hypothesis and adequate statistical power. Without the context of a numeric rationale for the study size, readers may misinterpret the results.
Study Size and Power Calculations in Randomized Controlled Trials (1 of 3)
Reporting on study size rationale in the study protocol is often required by institutional review boards before data collection can begin. The rationale for study size depends on calculations of the study size needed to achieve a specified level of statistical power. Statistical power is defined as the probability of rejecting the null hypothesis when an alternative hypothesis is true. Software packages and online tools can assist with these calculations, as in the sketch below.
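Where the slides mention software support, a minimal sketch in Python (the presentation itself cites tools such as Stata and PASS; Python here is an assumption for illustration) shows the basic calculation: the power achieved for a given sample size, effect size, and α. All numeric inputs below are illustrative, not values from the presentation.

```python
# Minimal sketch of a power calculation using statsmodels.
# All inputs (Cohen's d = 0.2, 400 patients per group, alpha = 0.05)
# are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Power: probability of rejecting the null hypothesis when the true
# standardized mean difference is 0.2, given 400 patients per group.
power = analysis.solve_power(effect_size=0.2, nobs1=400, alpha=0.05, ratio=1.0)
print(f"Power: {power:.2f}")  # roughly 0.81
```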
Study Size and Power Calculations in Randomized Controlled Trials (2 of 3)
Specify the clinically meaningful or minimum detectable difference:
- Identify the size of the smallest potential treatment effect that would be of clinical relevance.
- Calculate the study size, assuming this value represents the true treatment effect.
Specify a measure of data variability:
- For continuous outcomes, make assumptions about the standard deviation.
- For event outcomes (e.g., death), an assumed event rate in the control group is needed.
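A minimal sketch of the continuous-outcome case, assuming a hypothetical minimum clinically important difference of 5 units and an assumed standard deviation of 20 units (both illustrative planning inputs, not values from the presentation):

```python
# Sketch: study size for a continuous outcome via a standardized effect.
# The MCID (5 units) and SD (20 units) are hypothetical planning inputs.
from statsmodels.stats.power import TTestIndPower

mcid, sd = 5.0, 20.0
d = mcid / sd  # standardized effect size (Cohen's d) = 0.25

# Patients needed per group for a two-sided test at alpha = 0.05, 80% power.
n_per_group = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80)
print(f"Needed per group: {n_per_group:.0f}")  # roughly 253
```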
Study Size and Power Calculations in Randomized Controlled Trials (3 of 3)
Needed study size depends on the chosen type I error rate (α) and the required statistical power. Use a conventional statistical significance cutoff of α = 0.05 and a standard required power of 80 percent. Consider potential reductions in the number of recruited patients available for analysis.
[Table: example study size scenarios, with columns Scenario, Effect of Interest, Therapy 1 Risk, Therapy 2 Risk, Desired Power, Needed Study Size, and Needed Recruitment; the numeric entries did not survive extraction.] The table is an example of adequately reported consideration of study size under several potential scenarios that vary the baseline risk of the outcome, the minimum clinically relevant treatment effect, and the required power.
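A minimal sketch of one such scenario for an event outcome, assuming hypothetical risks of 10 percent under therapy 1 and 8 percent under therapy 2 and an assumed 15 percent loss to followup (all illustrative, since the table's original values are not recoverable):

```python
# Sketch: study size for comparing two event risks, with recruitment
# inflated for attrition. The risks (10% vs. 8%) and the 15% loss to
# followup are hypothetical scenario inputs.
import math
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p1, p2 = 0.10, 0.08
h = proportion_effectsize(p1, p2)  # Cohen's h for two proportions

# Analyzable patients needed per group at alpha = 0.05 and 80% power.
n_per_group = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.80)
# Recruit extra patients so the analyzable sample survives attrition.
n_recruit = math.ceil(n_per_group / (1 - 0.15))
print(f"Analyzable per group: {n_per_group:.0f}; recruit per group: {n_recruit}")
```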
Considerations for Observational Comparative Effectiveness Research Study Size Planning
Sample size and power calculations developed in the context of randomized controlled trials are relevant for observational studies, but their application may differ. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines recommend explaining how the study size was arrived at. Funding agencies often ask for statistical power calculations, while journal editors ask for confidence intervals.
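Where editors ask for confidence intervals rather than power, expected precision can be projected from anticipated cohort sizes and event counts. A minimal sketch, assuming hypothetical cohorts of 5,000 patients each, a 10 percent reference risk, and a true risk ratio of 1.2, using the standard large-sample (Katz) interval for a risk ratio:

```python
# Sketch: expected 95% CI for a risk ratio from anticipated counts.
# Cohort sizes, baseline risk, and the true risk ratio are assumptions.
import math
from scipy.stats import norm

n1 = n0 = 5000          # exposed and reference cohort sizes
risk0, rr = 0.10, 1.2   # reference risk and assumed true risk ratio
events1, events0 = rr * risk0 * n1, risk0 * n0

# Large-sample standard error of the log risk ratio.
se = math.sqrt(1 / events1 - 1 / n1 + 1 / events0 - 1 / n0)
z = norm.ppf(0.975)     # 95% confidence level
lo = math.exp(math.log(rr) - z * se)
hi = math.exp(math.log(rr) + z * se)
print(f"Expected 95% CI for RR {rr}: ({lo:.2f}, {hi:.2f})")  # about (1.07, 1.34)
```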
Considerations That Differ in Nonrandomized Studies
Confounding bias, measurement error, and other biases should concern investigators more than the expected precision when they consider the feasibility of an observational comparative effectiveness study. Controlling for confounding can also reduce the precision of estimated effects, as often seen in studies with propensity score matching. Retrospective studies often suffer from a higher frequency of missing data, which can limit precision and power.
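The precision cost of dropping patients can be anticipated directly, because the standard error scales with 1/√n. A rough sketch, with the cohort size and retention fractions chosen as illustrative assumptions (e.g., after propensity score matching or complete-case analysis of missing data):

```python
# Sketch: how CI width grows as patients are excluded from analysis.
# The cohort size and retention fractions are illustrative assumptions.
import math

n_full = 10_000
for retained in (1.00, 0.60, 0.40):
    # The CI half-width scales with 1/sqrt(n), so the width relative to
    # the full sample is sqrt(full n / retained n).
    rel_width = math.sqrt(n_full / (retained * n_full))
    print(f"{retained:.0%} retained -> CI about {rel_width:.2f}x as wide")
```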
Conclusions
To ensure adequate study size and appropriate interpretation of results, provide a rationale for study size during the planning and reporting stages. All definitions and assumptions should be specified, including the primary study outcome, the clinically important minimum effect size, the variability measure, and the type I and type II error rates. Consider loss to followup, reductions due to statistical methods to control for confounding, and missing data to ensure the sample size is adequate to detect clinically meaningful differences.
Summary Checklist (1 of 2)
Guidance: Describe all relevant assumptions and decisions.
Key Considerations:
- Report the primary outcome on which the study size or power estimate is based.
- Report the clinically important minimum effect size (e.g., hazard ratio ≥ 1.20).
- Report the type I error level.
- Report the statistical power or type II error level (for study size calculations) or the assumed sample size (for power calculations).
- Report the details of the sample size formulas and calculations, including correction for loss to followup, treatment discontinuation, and other forms of censoring.
- Report the expected absolute risk or rate for the reference or control cohort, including the expected number of events.
Guidance: Specify the type of hypothesis, the clinically important inferiority margin or minimum clinically important excess/difference, and the level of confidence for the interval (e.g., 95%).
Key Considerations:
- Types of hypotheses include equivalence, noninferiority, and inferiority.
Summary Checklist (2 of 2)
Guidance: Specify the statistical software and command or the formula to calculate the expected confidence interval.
Key Considerations:
- Examples include Stata, Confidence Interval Analysis, and Power Analysis and Sample Size (PASS).
Guidance: Specify the expected precision (or statistical power) for any planned subgroup analyses.
Guidance: Specify the expected precision (or statistical power) as sensitivity analyses in special situations.
Key Considerations:
- Special situations include:
  - The investigators anticipate strong confounding that will eliminate many patients from the analysis (e.g., when matching or trimming on propensity scores).
  - The investigators anticipate a high frequency of missing data that cannot (or will not) be imputed, which would eliminate many patients from the analysis.