1 Clinical Investigation and Outcomes Research Statistical Issues in Designing Clinical Research Marcia A. Testa, MPH, PhD Department of Biostatistics Harvard School of Public Health
2 Objective of Presentation Introduce statistical issues that are critical for designing a clinical research study and developing a research protocol, with a special focus on Power and sample size –Readings: Textbook, Designing Clinical Research, Chapter 6, Estimating Sample Size and Power: Applications and Examples and Chapter 19, Writing and Funding a Research Proposal.
3 Research Proposal Carefully planning the analytical and statistical methods is critical to any clinical research study. An outline of the main elements of a research proposal is listed in Table 19.1 of your textbook. Two very important components of the “Research Methods” section are “Measurements” and “Statistical Issues”.
4 Measurement and Statistical Components of the Research Proposal Measurements – you first must define: –Main predictor/independent variables (intervention, if an experiment) –Potential confounding variables –Outcome/dependent variables Statistical Issues – you should outline: –Approach to statistical analyses –Hypothesis, sample size and power
5 Power and Sample Size Depend upon: –measurements and study hypotheses –statistical test used on primary outcome –study design –variability and precision of the dependent measure –alpha (type 1 error) –effect size –number of hypotheses that you want to test
6 Types of Errors Confidence
7 What is power analysis? Statistical power: –the probability of correctly identifying a trend or effect (being correct that there is a trend or effect) Statistical confidence: –the probability of not identifying a false trend or effect (false alarm) (being correct that there is no trend)
8 Why is power analysis useful in research planning? Clinical research is primarily concerned with detecting improvements or worsening due to interventions or risk factors. Power analysis answers the question: “How likely is my statistical test to detect important clinical effects given my research design?”
9 Elements of power analysis Beyond our control: –Variability (stochastic noise in the data) Within our control: –Sample size (accumulated information) –time horizon (e.g., survival analysis) –sampling frequency –replication –Confidence level/statistical test
10 Dealing with Variability Variability is often a barrier to detection Minimizing variability is often the goal Choose variables with a high signal to noise ratio Caution: these variables may be less sensitive to change Sample within a more homogeneous population Caution: greater homogeneity often means we are limiting the inferences we can make. At the extreme we would have highly reliable results that are for the most part clinically irrelevant
11 The Balancing of Cost and Power [Figure: power curve running from low cost to high cost, with regions labeled “low return on investment”, “optimal use of resources” and “effective but inefficient use of resources”]
12 Limitations of power analysis Power analysis is only as good as the information you provide: –How appropriate is the statistical test? –How accurate are estimates of variability? Power analysis can’t tell you: –How much power is enough? –What’s a meaningful change?
13 How much power is enough? There is no universal standard. What is more important? –Not missing a trend? Then Power > Confidence –Not reporting a false trend? Then Confidence > Power Usual range for confidence and power: 80-95%
14 What’s a meaningful change? Example: You want to be able to detect the withdrawal (decline in participation) from a diet and exercise program under “usual care”. Effect size: Power = 95% for decline = -17%
15 What’s a meaningful change? Effect size: Power = 80% for decline = -13%
16 What’s a meaningful change? Effect size: Power = 60% for decline = -10%
17 Is a 17% annual withdrawal rate clinically meaningful? Example – Start with 100 patients and 17% withdrawal after one year. [Table: Year vs. No. of individuals] After 5 years, more than 50% of your original population has withdrawn from the program.
18 What is a meaningful change? Most people would concur that a withdrawal of 17% per year from a diet and exercise program is large enough to be considered clinically meaningful. However, how meaningful are smaller withdrawal rates (13%, 10%, 5%, 1%)? This cannot be answered using a formula. The answer will depend on the research objectives, the clinical objectives and the research budget.
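The arithmetic behind this slide can be sketched in a few lines. This is only an illustration: it assumes the 17% rate is constant and applies each year to the patients still enrolled, which the slide implies but does not state. The function name `remaining_after` is made up for this example.

```python
def remaining_after(start: float, annual_withdrawal: float, years: int) -> float:
    """Patients still enrolled after `years`, assuming a constant
    annual withdrawal rate applied to those remaining (an assumption)."""
    return start * (1.0 - annual_withdrawal) ** years


if __name__ == "__main__":
    # Start with 100 patients and a 17% annual withdrawal rate.
    for year in range(6):
        left = remaining_after(100, 0.17, year)
        print(f"Year {year}: {left:5.1f} enrolled, {100 - left:5.1f} withdrawn")
```

Under that assumption about 39 of the original 100 patients remain after 5 years, so roughly 60% have withdrawn.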
19 1. Choose Statistical Hypothesis Set up Null Hypotheses: Examples 1. Compare sample group mean to a known value μ0 –Mean of group = Known population mean (H0: μ = μ0) vs (HA: μ ≠ μ0) 2. Compare two sample group means –Mean Group (1) = Mean Group (2) (H0: μ1 = μ2) vs (HA: μ1 ≠ μ2) Note – because you are testing “not equal” in the alternative hypothesis (≠) you have selected a “two-tailed test”.
20 2. Choose Statistical Test There are many statistical tests that are used in clinical research; however, for this presentation we will restrict ourselves to the following:
21 3. Choose Alpha Level and Effect Size Alpha = 0.05 – probability of rejecting the null when the null is true = 5% –You will conclude that there was a difference 5% of the time when there really was no difference You would like to detect a difference of X units or more (effect size) in one group as compared to the other
22 4. Need SD of the Dependent Variable Use historical data if available Use the sample data from a feasibility study (e.g. 15 subjects) If you have no data to serve as a reference, you have to make an educated guess. Here’s a trick if your data are mound-shaped and approximately normal. –Choose a representative low and high from your clinical experience, take the difference and divide by 4. –SD estimate = (high – low)/4
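The range/4 rule of thumb is easy to express in code. The sketch below is illustrative only: the low and high glucose values (90 and 222 mg/dL) are hypothetical numbers chosen for the example, not data from the feasibility study.

```python
def sd_from_range(low: float, high: float) -> float:
    """Rough SD guess for mound-shaped, approximately normal data:
    (representative high - representative low) / 4."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (high - low) / 4.0


if __name__ == "__main__":
    # Hypothetical 'typical low' and 'typical high' daily glucose values (mg/dL).
    print(sd_from_range(90, 222))
```

The rule works because about 95% of a normal distribution lies within two SDs either side of the mean, so a "typical" low-to-high range spans roughly four SDs.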
23 5. Calculate a Standard Effect Size Effect size/standard deviation = standardized effect size Choose the β (type 2) error –Remember Power = 1 - β, so a type 2 error of 0.20 yields a power of 0.80 –β is the probability of failing to reject the null hypothesis when the null hypothesis is false, that is, concluding no difference when there really is a difference. Power is the probability of correctly rejecting the null hypothesis in that situation.
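As a small worked illustration of this step, the two quantities can be computed directly. The inputs used here (a 10 mg/dL effect and an SD of 33) are the values from the CGM example later in the deck; the function names are invented for this sketch.

```python
def standardized_effect(effect: float, sd: float) -> float:
    """Standardized effect size = raw effect size / standard deviation."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return effect / sd


def power_from_beta(beta: float) -> float:
    """Power = 1 - beta, where beta is the type 2 error probability."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be between 0 and 1")
    return 1.0 - beta


if __name__ == "__main__":
    print(round(standardized_effect(10, 33), 2))  # 10 mg/dL effect, SD = 33
    print(power_from_beta(0.20))
```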
24 Power and Sample Size Example Continuous Glucose Monitoring Diabetes Study
25 CGM Study Two-group Comparison How many subjects do we need to be able to detect a difference in CGM mean daily glucose between patients on Lantus and Apidra insulin versus Premix analogue insulin? –Before you can answer this question, you must gather some more information.
26 Break down the problem CGM glucose at Week 12 = dependent variable of interest Want to compare two groups – each group has different patients Simple independent t-test Need SD of daily glucose Need to specify how large an effect you want to detect
27 Data from feasibility study Week 12 Data
28 CGM Study Two-group Comparison Compare Lantus & Apidra to Premix at 12 weeks Feasibility data available on 15 patients Independent t test will be used Alpha = 0.05, beta = 0.20, 2-tailed test Power = 0.80 –Null: Mean L & A = Mean Premix (H0: μ1 = μ2) vs (HA: μ1 ≠ μ2)
29 CGM Study Two-group Comparison SD from 15 patient feasibility study = 33
30 Estimating Sample Size of CGM Study Alpha = 0.05 for 2-sided test Beta = 0.20, hence, power = 0.80 Clinically meaningful effect = 10 mg/dL difference (based upon clinical judgement) SD CGM glucose = 33 (from feasibility study) Standardized effect = 10/33 = 0.30 Check Appendix 6A in textbook for power. Table 6A says you need 176 subjects per treatment group for a total of 352 subjects.
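For readers who want to see where a number like 176 comes from, here is a minimal sketch using the standard normal approximation for a two-group comparison of means, n per group ≈ 2(z_{1-α/2} + z_{1-β})² / ES². It is not the textbook's method for building Table 6A; the optional term z_{1-α/2}²/4 is a commonly used adjustment for the t-test and is an assumption of this sketch, as is the function name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80,
                t_adjustment: bool = True) -> int:
    """Approximate subjects per group for a two-sided, two-sample comparison
    of means, given a standardized effect size."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile for power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    if t_adjustment:
        # Small additive adjustment for using a t-test rather than a z-test
        # (an assumption of this sketch, not taken from the slides).
        n += z_alpha ** 2 / 4
    return ceil(n)


if __name__ == "__main__":
    print(n_per_group(0.30))                      # with adjustment
    print(n_per_group(0.30, t_adjustment=False))  # plain normal approximation
```

With a standardized effect of 0.30, alpha = 0.05 (2-sided) and power = 0.80, the adjusted formula gives 176 per group and the plain approximation gives 175, in line with the table and calculator values quoted in these slides.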
31 This is a directory of where you can find sample size and power programs
32 Useful Power Calculator Website
33 Online Power/Sample Size Power = 0.8, detect ES = 0.3 (10 mg/dL) N = 175 per group Power = 0.9, detect ES = 0.35 (11.6 mg/dL) N = 175 per group
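The reverse calculation, power for a fixed sample size, can be sketched the same way. This uses a normal approximation rather than the exact noncentral t distribution an online calculator would typically use, so it should be read as a rough check; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(effect_size: float, n_per_group: int,
                     alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means
    (normal approximation; the negligible opposite tail is ignored)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_per_group / 2)
    return z.cdf(shift - z_alpha)


if __name__ == "__main__":
    for es in (0.30, 0.35):
        print(f"ES = {es:.2f}, N = 175 per group: "
              f"power ~ {power_two_sample(es, 175):.2f}")
```

With 175 subjects per group, the approximation gives power close to 0.80 for ES = 0.30 and close to 0.90 for ES = 0.35, matching the two scenarios on this slide.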
34 Online Power/Sample Size Power = 0.8, detect ES = 0.5 (16.5 mg/dL) Sample size = 64/group Power = 0.8, detect ES = 1.57 (52 mg/dL) Sample size = N1 = 7, N2 = 8
35 CGM Study Paired Comparison Useful for longitudinal assessments CGM Study – You want to detect a decrease between Week 12 and Week 24 of 10 mg/dL You only have one group of patients, but they are measured on two separate occasions (Week 12 and Week 24).
36 15 patient feasibility study What is the mean glucose parameter for the subjects at Week 12 versus Week 24? For simplicity, we are going to use the single-value summary, mean glucose level, at Wk 12 and Wk 24. [Figure: glucose at Wk 0, Wk 12 and Wk 24]
37 Power and Sample Size for Paired t-test Power = 0.8, detect ES = 0.30 Need 92 subjects, i.e., 92 “pairs” of (Wk 12, Wk 24) data. Remember with two independent groups we needed 175 subjects per group for a total of 350 subjects. When patients serve as their own control, you need “fewer” subjects to detect an equivalent effect size (ES) with the same power.
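A rough version of the paired calculation is sketched below. It assumes the effect size is the mean Week 12 to Week 24 change divided by the SD of the within-patient differences, which the slide does not spell out, and it uses a normal approximation with a small additive adjustment for the t-test. Exact calculators use the noncentral t distribution and may give slightly different numbers, so treat the output as a ballpark figure.

```python
from math import ceil
from statistics import NormalDist


def n_pairs(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of pairs for a two-sided paired t-test.
    effect_size = mean difference / SD of the paired differences (assumed)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 / effect_size ** 2
    n += z_alpha ** 2 / 2   # small adjustment for using t rather than z
    return ceil(n)


if __name__ == "__main__":
    print(n_pairs(0.30))
```

The approximation returns about 90 pairs for ES = 0.30, close to the calculator's 92 and far below the 350 subjects needed for two independent groups, because only one group is required and the factor of 2 in the two-sample formula drops out.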
38 HRV Study Correlation and Multiple Regression Single-Group Study –Session 1 – Signal 1 HRV –Session 1 – Signal 2 BP –Demographic variables = Age, Gender –Clinical characteristics = Disease Status Suppose you want to look at associations between HRV, BP, demographic and clinical characteristics -- use the bivariate correlation coefficient for 2 variables, or the multiple regression R² for multiple predictors.
39 Power and Sample Size for Correlations (H0: r = 0) Power = 0.80, r = 0.3, ES = R² = 0.09, Sample size = 85 Power = 0.97, r = 0.4, ES = R² = 0.16, Sample size = 85 Only 1 “regressor” or predictor
40 Power and Sample Size for Correlations (H0: r = 0) Power = 0.80, r = 0.3, ES = R² = 0.09, Sample size = 139, if number of predictor variables = 5 Power = 0.80, r = 0.3, ES = R² = 0.09, Sample size = 177, if number of predictor variables = 10
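The single-predictor figures can be approximated with the Fisher z transformation, a standard textbook method that is not necessarily the one used by the calculator shown on the slide. The sketch assumes a two-sided test at alpha = 0.05; the function names are invented for illustration.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size to detect correlation r against H0: r = 0."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    # Fisher z of r has standard error 1 / sqrt(n - 3).
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


def power_for_correlation(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power to detect correlation r with n subjects."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(atanh(r) * sqrt(n - 3) - z_alpha)


if __name__ == "__main__":
    print(n_for_correlation(0.3))                    # subjects for 80% power
    print(round(power_for_correlation(0.4, 85), 2))  # power with 85 subjects
```

For r = 0.3 the approximation gives 85 subjects at 80% power, and with those 85 subjects the power to detect r = 0.4 is about 0.97.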
41 Power and Sample Size for Test of Two Proportions You want to detect a difference between two proportions. Example: How many patients do you need in each group to detect a difference in the number of patients who adhere to diet and exercise at the end of 5 years? Old Program = 0.5 Adhere New Program = 0.7 Adhere Alpha = 0.05, Power = 0.8. You will need 103 individuals in each group.
42 Final Points Design your study such that you will have a sufficient number of subjects to be able to detect the effects that are clinically meaningful (high power). If you have a limited budget, you cannot afford to increase your sample size to the necessary levels, and lowering the variability is not feasible, you should consider alternative designs and hypotheses rather than proceeding with a study design with low power.
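The 103 per group can be reproduced, approximately, with the usual normal-approximation formula for comparing two proportions followed by a continuity correction. Whether the slide's calculator used exactly this correction is an assumption; the function name and the `continuity` flag are made up for this sketch.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_two_proportions(p1: float, p2: float, alpha: float = 0.05,
                      power: float = 0.80, continuity: bool = True) -> int:
    """Approximate subjects per group for a two-sided test of two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2          # pooled proportion under the null
    diff = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Continuity correction applied to the uncorrected n
        # (assumed here; the slide does not say which correction was used).
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n)


if __name__ == "__main__":
    print(n_two_proportions(0.5, 0.7))                    # with correction
    print(n_two_proportions(0.5, 0.7, continuity=False))  # without correction
```

For adherence of 0.5 versus 0.7, the corrected formula gives 103 per group and the uncorrected one gives 93, which suggests the quoted figure includes a continuity correction.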