Sample size calculation. Ahmed Hassouna, MD. Diploma of biostatistics applied to clinical trials, CESAM, Paris 6 University, France.
Risks of error in statistics: drawing a conclusion from a random event carries a double jeopardy:
Group 1 receives treatment A (TTTa); Group 2 receives treatment B (TTTb).
We begin by assuming that the study groups are comparable BEFORE the study begins. Hence, any difference observed AFTER the study can be attributed to the effect of treatment.
1- We can conclude upon a difference that does not exist.
[Figure: Group 1, TTTa; Group 2, TTTb.]
This is the risk of concluding upon evidence that “does not exist”: the difference recorded AFTER the study was already present BEFORE the study, i.e. the groups were not comparable from the start. This is the primary risk of error (α).
2- We can miss an existing difference.
General hospital (100 boys / 100 girls) versus small hospital (1-3 boys and girls): if the known ratio of baby boys to baby girls is 1:1, in which hospital do you think that, on a given day, twice as many baby girls as baby boys will be born? Skewed data are more probable in a small sample than in a large one, which carries the risk of “missing” an existing relation. This is the secondary risk of error (β).
Absence of evidence is not evidence of absence.
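The hospital question can be checked with an exact binomial calculation. The sketch below assumes independent births with a 50% chance of a girl; the daily birth counts (3, 30 and 200) and the function name are illustrative choices, not figures from a real hospital.

```python
from math import comb


def prob_girls_at_least_double(births: int, p_girl: float = 0.5) -> float:
    """Exact binomial probability that girls >= 2 * boys among `births` births."""
    total = 0.0
    for girls in range(births + 1):
        boys = births - girls
        if girls >= 2 * boys:
            total += comb(births, girls) * p_girl**girls * (1 - p_girl) ** boys
    return total


# Hypothetical daily birth counts: small, medium and large hospital.
for births in (3, 30, 200):
    print(f"{births:>3} births/day: P(girls >= 2 x boys) = "
          f"{prob_girls_at_least_double(births):.6f}")
```

With only 3 births the skewed day happens half of the time; with 200 births it is practically never seen.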
Factors governing sample size calculation
1- How likely are we willing to be wrong in our conclusion (α)?
2- How likely are we willing to miss a conclusion (β)?
3- What is the magnitude of the difference that we want to put into evidence?
4- What is the variability of the outcome measure?
5- What is the strength of our comparator: a competitor or just a placebo?
1- How likely are we to make a “wrong” conclusion? What is the limit of our P value?
We have to accept the “universal” limit of being wrong in 5% of cases. Why? Because this is the maximum risk we are allowed to take of giving the world “a useless or harmful drug”.
2- How likely are we to miss the evidence?
The power of a study is less strict: 80-90%. Why? Because it is a question of money, and because the harm produced, i.e. missing a “potentially beneficial drug”, is less than that associated with the primary risk of error.
                Reality (+)                   Reality (-)
Study (+)       Power of the study (1 - β)    α
Study (-)       β                             no error
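To make the notion of power concrete, here is a minimal sketch of the power of a bilateral comparison of two means, using the normal approximation with a known, common SD. The function name and the numbers (a difference of half an SD, 20 or 63 patients per group) are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(n_per_group: int, d: float, sd: float, alpha: float = 0.05) -> float:
    """Approximate power to detect a true difference d between two means.

    Normal approximation, equal group sizes, common SD, bilateral test.
    The negligible chance of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical point of rejection
    se = sd * sqrt(2 / n_per_group)     # standard error of the difference
    return z.cdf(d / se - z_crit)


# Hypothetical study: true difference of 0.5 SD.
for n in (20, 63):
    print(f"n = {n} per group: power = {power_two_means(n, d=0.5, sd=1.0):.2f}")
```

Under these assumptions, 20 patients per group would miss the difference about two times out of three, while 63 per group reach the conventional 80%.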
3- Magnitude of the difference
“The classic example” is the comparison of “THE CURE” and “THE POISON”: how many patients are you willing to sacrifice to show the huge difference between giving and taking life? In “real life”, differences are usually small, and the smaller the difference, the larger your sample size should be. Always remember the mega trials arranged by pharmaceutical companies.
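The effect of the magnitude of the difference can be sketched with the usual normal-approximation formula for comparing two means, n per group = 2 (z for α + z for β)² (SD/d)². The function name, the SD of 10 and the three differences below are hypothetical values, and the formula assumes equal groups and a bilateral test.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per group to detect a difference d between two means (bilateral test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / d) ** 2)


# Hypothetical outcome with SD = 10: halving the difference quadruples the sample.
for d in (10, 5, 2.5):
    print(f"difference = {d:>4}: n = {n_per_group(d, sd=10)} per group")
```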
4- Variability of outcome
The extreme example is that every patient taking the treatment will survive and everyone deprived of it will eventually die. The common example is that some patients improve on treatment, others show “some” improvement and a “few” patients do not improve at all. The greater the variability of the outcome, the more patients we will have to include in our study.
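For a binary outcome, variability is p(1 - p), which is largest when the event rate is close to 50%. The sketch below uses the simple unpooled normal-approximation formula for two proportions; the function name and the event rates are hypothetical and chosen so that the difference stays at 20 percentage points while the variability changes.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per group to compare two proportions (bilateral test, unpooled variance)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)   # variability of the binary outcome
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Same absolute difference (0.20), different variability.
print("5% vs 25%:", n_per_group_proportions(0.05, 0.25), "per group")
print("40% vs 60%:", n_per_group_proportions(0.40, 0.60), "per group")
```

Under these assumptions, the same difference needs about twice as many patients when the event rates sit near 50%.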
5- The direction of the study
A bilateral (two-sided) study compares 2 treatments, aiming to show which is better: A or B. A unilateral (one-sided) study is usually organized to test the efficacy of a treatment by comparing it to placebo. It offers the opportunity to reach statistical significance with a smaller difference, and hence with a smaller sample.
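The saving offered by a unilateral design can be illustrated with the normal-approximation formula for two means. This is a sketch: the function name and the standardized difference of 0.5 are assumptions, and the whole α of 5% is placed in one tail for the unilateral case.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d_over_sd: float, alpha: float = 0.05,
                power: float = 0.80, bilateral: bool = True) -> int:
    """Patients per group for a comparison of two means, bilateral or unilateral."""
    z = NormalDist()
    tail = alpha / 2 if bilateral else alpha   # unilateral: whole alpha in one tail
    z_alpha = z.inv_cdf(1 - tail)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d_over_sd ** 2)


print("bilateral :", n_per_group(0.5, bilateral=True), "per group")
print("unilateral:", n_per_group(0.5, bilateral=False), "per group")
```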
[Figure: two distribution curves centred on the mean m, each with its 95% confidence interval (CI). Bilateral design: α/2 = 2.5% in each tail, in order to maintain a total α of 5%. Unilateral design: α = 5% in a single tail.]
In a bilateral design, we are interested to know whether the birth weight of a child is significantly different (smaller or larger) from that of the rest of the group; this is the typical question posed by a bilateral design: is “A” significantly better than “B” or significantly worse? In a unilateral design, our only interest is to know whether the child is significantly heavier than the other children. A typical treatment vs. placebo study is only interested in proving that treatment is better than placebo, never the reverse.
[Figure: the null hypothesis under both designs, curves centred on the mean m. Bilateral design: α/2 = 2.5% in each tail, in order to maintain a total α of 5%, critical points at 2 SD. Unilateral design: α = 5% in one tail, critical point at 1.65 SD.]
Bilateral design: H0 = both treatments are comparable. H1 = TTTa is better than TTTb, or TTTb is better than TTTa. The critical point of rejection is about 2 SD (more precisely 1.96 SD) on either side of the mean.
Unilateral design: H0 = treatment and placebo are comparable, or placebo is better. H1 = treatment is better than placebo. The critical point of rejection is 1.65 SD on one side only.
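The two critical points can be recovered from the standard normal distribution. A minimal sketch using Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal: mean 0, SD 1

# Bilateral: alpha is split into two tails of 2.5% each.
bilateral_critical = z.inv_cdf(1 - alpha / 2)
# Unilateral: the whole 5% sits in one tail.
unilateral_critical = z.inv_cdf(1 - alpha)

print(f"bilateral : reject beyond +/- {bilateral_critical:.2f} SD")
print(f"unilateral: reject beyond {unilateral_critical:.2f} SD on one side")
```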
[Diagram: choosing the calculation for a single outcome measure.
Type of outcome: qualitative, ordinal categorical, quantitative, or time-related event (examples on the slide: HF/not, NYHA class, peptide, CRT).
Number of groups: 1 group, 2 groups, or multiple groups.
Number of measurements: once or repeated.]
How to intelligently reduce your sample size? Increase the ratio d/SD: maximize the target difference (d) and reduce the variability (SD).
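Why the ratio d/SD matters can be seen from the usual normal-approximation formula for comparing two means with equal groups and a bilateral test. This is a standard textbook approximation given here as a sketch, not necessarily the formula behind any particular calculator; $n$ is the number of patients per group.

```latex
n \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\bigl(d/\mathrm{SD}\bigr)^{2}}
```

Because d/SD is squared in the denominator, doubling the ratio divides the required sample size by four.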
Maximize the target difference (d)
Avoid choosing CRUDE OUTCOMES as an end point. For example, taking mortality as an indicator of successful myocardial preservation during CABG or PCI will be a total failure, because mortality figures are as low as 1-2%, with differences of only decimals regardless of the preservation technique or type of stent. A more sensitive measure of ischemia such as Troponin I, or a score made of a large number of indicators of morbidity (Syntax score), is capable of showing larger detectable differences among the compared groups.
Reduce variability (SD)
Randomization and stratification, which allow the even distribution of sources of variability between the compared groups.
Adopting a unilateral or cross-over design if feasible, or reducing the power of the study from 80% to 70%, which reduces the sample size by about 20%.
Blocking some sources of variation and narrowing the scope of the study, like excluding women, known for worse outcomes after CABG, from a study on the subject with minimal resources.
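The saving obtained by lowering the power from 80% to 70% can be checked with the normal approximation, in which the sample size is proportional to (z for α + z for β)². This sketch assumes a bilateral α of 5%; the function name is illustrative.

```python
from statistics import NormalDist


def size_multiplier(power: float, alpha: float = 0.05) -> float:
    """(z_alpha + z_beta)^2: the part of the sample size that depends on alpha and power."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


reduction = 1 - size_multiplier(0.70) / size_multiplier(0.80)
print(f"Going from 80% to 70% power reduces the sample size by {reduction:.0%}")
```

The price is a risk of missing a real difference that rises from 20% to 30%.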
Reduce variability (SD)
Choosing a low-variability outcome.
Transforming a binary outcome into an ordered categorical one, which reduces the sample size by 30%.
Repeatedly measuring a quantitative outcome: comparing pre to post values, taking the average of post values or, better, using the repeated measures ANOVA procedure.
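How repeated measurements lower variability can be sketched under a simplifying assumption that is not stated on the slide: every measurement has the same SD and any two measurements on the same patient share the same correlation rho. The function names and the values SD = 10 and rho = 0.5 or 0.8 are hypothetical.

```python
def variance_of_mean(sd: float, k: int, rho: float) -> float:
    """Variance of the average of k equally correlated post-treatment values."""
    return sd**2 * (1 + (k - 1) * rho) / k


def variance_of_change(sd: float, rho: float) -> float:
    """Variance of the change score (post minus pre)."""
    return 2 * sd**2 * (1 - rho)


sd = 10.0
print("single post value       :", variance_of_mean(sd, k=1, rho=0.5))
print("average of 4 post values:", variance_of_mean(sd, k=4, rho=0.5))
print("post minus pre, rho=0.8 :", round(variance_of_change(sd, rho=0.8), 1))
```

Under this assumption, the change score is less variable than a single post value only when rho exceeds 0.5, so the gain depends on how strongly the measurements are correlated.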
Online sample size calculators
Survey study: http://www.raosoft.com/samplesize.html
Comparison of 2 means or 2 proportions: http://www.stat.ubc.ca/~rollin/stats/ssize/n2.html and http://www.stat.ubc.ca/~rollin/stats/ssize/b2.html
Relative risk: http://www.stat.ubc.ca/~rollin/stats/ssize/caco.html
Online sample size calculators
Correlation or regression: www.hedwig.mgh.harvard.edu/sample_size/js/js_associative_quant.html
Cross-over study: http://hedwig.mgh.harvard.edu/sample_size/js/js_crossover_quant.html
Thank you