Chapter 12 Survival Analysis.


Survival Analysis Terminology
- Concerned with the time to some event
- The event is often death
- The event may also be, for example:
  1. Cause-specific death
  2. A non-fatal event or death, whichever comes first:
     - death or hospitalization
     - death or MI
     - death or tumor recurrence

Survival Rates at Yearly Intervals
[Figure: survival rates by group at yearly intervals; x-axis: years]
- At 5 years, the survival rates are the same
- Survival experience in Group A appears more favorable when the 1-, 2-, 3- and 4-year rates are considered together

Beta-Blocker Heart Attack Trial LIFE-TABLE CUMULATIVE MORTALITY CURVE

Survival Analysis
Discuss:
1. Estimation of survival curves
2. Comparison of survival curves
I. Estimation
Simple case:
- All patients are entered at the same time and followed for the same length of time
- The survival curve is estimated at various time points by 1 - (number of deaths)/(number of patients)
- As intervals become smaller and the number of patients larger, a "smooth" survival curve may be plotted
Typical clinical trial setting: patients enter at different times (staggered entry)

Staggered Entry
[Figure: subjects 1 to 4 enter at different calendar times; x-axis: time since start of trial (0, T, 2T years)]
- Each patient has T years of follow-up
- The calendar period in which follow-up takes place may differ from patient to patient

[Figure: subjects 1 to 4 on the calendar-time axis (time since start of trial: 0, T, 2T years), showing administrative censoring, failure, and censoring due to loss to follow-up]
- Failure time is the time from entry until the time of the event
- Censoring means the vital status of the patient is not known beyond that point

[Figure: the same four subjects on the follow-up time axis (0 to T years), showing administrative censoring, failure, and censoring due to loss to follow-up]

Clinical Trial with Common Termination Date
[Figure: follow-up of subjects 1 to 11; x-axis: follow-up time (0, T, 2T years), with the trial termination date marked]

Reduced Sample Estimate (1)
Years of                 Cohort
follow-up   Patients     I      II     Total
1           Entered      100    100    200
            Died         20     25     45
2           Entered      80     75     155
            Died         20
            Survived     60

Reduced Sample Estimate (2)
Suppose we estimate the 1-year survival rate:
a. P(1 yr) = 155/200 = 0.775
b. P(1 yr, cohort I) = 80/100 = 0.80
c. P(1 yr, cohort II) = 75/100 = 0.75
Now estimate 2-year survival:
- Reduced sample estimate = 60/100 = 0.60
- The estimate is based on cohort I only
- Loss of information

Actuarial Estimate (1)
Ref: Berkson & Gage (1950) Proc of Mayo Clinic; Cutler & Ederer (1958) JCD; Elveback (1958) JASA; Kaplan & Meier (1958) JASA
- Note that we can express P(2 yr survival) as
  P(2 yrs) = P(2 yrs survival | survived 1st yr) x P(1st yr survival)
           = (60/80)(155/200)
           = (0.75)(0.775)
           = 0.58
- This estimate uses all the available data
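The arithmetic on the two preceding slides can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example, and the counts are the ones from the two-cohort table.

```python
# Two cohorts of 100 patients each; only cohort I has a second year of follow-up.
entered = {"I": 100, "II": 100}
died_year1 = {"I": 20, "II": 25}
died_year2_cohort1 = 20

alive_after_year1 = {k: entered[k] - died_year1[k] for k in entered}

# One-year survival uses both cohorts: 155/200.
p_year1 = sum(alive_after_year1.values()) / sum(entered.values())

# Reduced sample estimate of two-year survival: cohort I only, 60/100.
alive_after_year2 = alive_after_year1["I"] - died_year2_cohort1
reduced_sample = alive_after_year2 / entered["I"]

# Actuarial estimate: P(2 yrs | survived 1st yr) x P(1st yr) = (60/80)(155/200).
p_year2_given_year1 = alive_after_year2 / alive_after_year1["I"]
actuarial = p_year2_given_year1 * p_year1

print(f"1-year survival:       {p_year1:.3f}")
print(f"reduced sample (2 yr): {reduced_sample:.3f}")
print(f"actuarial (2 yr):      {actuarial:.3f}")
```

The actuarial figure differs from the reduced sample figure because the one-year factor borrows information from cohort II.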

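Actuarial Estimate (2)
- In general, divide the follow-up time into a series of intervals I_1, I_2, ..., I_5 with endpoints t_0 < t_1 < ... < t_5, where I_i runs from t_{i-1} to t_i
- Let p_i = probability of surviving I_i given that the patient is alive at the beginning of I_i (i.e. survived through I_{i-1})
- Then the probability of surviving through t_k is
  \[ P(t_k) = p_1 p_2 \cdots p_k = \prod_{i=1}^{k} p_i \]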

Actuarial Estimate (3)
- Define the following for interval I_i, which runs from t_{i-1} to t_i:
  n_i = number of subjects alive at the beginning of I_i (i.e. at t_{i-1})
  d_i = number of deaths during interval I_i
  l_i = number of losses during interval I_i (either administrative or lost to follow-up)
- We know only that d_i deaths and l_i losses occurred in interval I_i

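Estimation of p_i
a. All deaths precede all losses: \[ \hat{p}_i = \frac{n_i - d_i}{n_i} \]
b. All losses precede all deaths: \[ \hat{p}_i = \frac{n_i - l_i - d_i}{n_i - l_i} \]
c. Deaths and losses uniform (1/2 of the deaths before 1/2 of the losses): \[ \hat{p}_i = \frac{n_i - l_i/2 - d_i}{n_i - l_i/2} \]
   This is the actuarial (Cutler-Ederer) estimate
- The problem is that P(t) is a function of the interval choice
- For some applications we have no choice, but if we know the exact dates of deaths and losses, the Kaplan-Meier method is preferred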

Actuarial Lifetime Method (1)
- Used when exact times of death are not known
- Vital status is known at the end of an interval period (e.g. 6 months or 1 year)
- Assume losses are uniform over the interval

Actuarial Lifetime Method (2)
Life table:
Interval  At risk  Died   Lost   Adjusted no.       Prop. surviving      Prop. surviving up to
          (n_i)    (d_i)  (l_i)  at risk            the interval         end of interval
0-1       50       9      0      50                 41/50 = 0.82         0.82
1-2       41       6      1      41 - 1/2 = 40.5    34.5/40.5 = 0.852    0.852 x 0.82 = 0.699
2-3       34       2      4      34 - 4/2 = 32      30/32 = 0.937        0.937 x 0.699 = 0.655
3-4       28       1      5      28 - 5/2 = 25.5    24.5/25.5 = 0.961    0.961 x 0.655 = 0.629
4-5       22       2      3      22 - 3/2 = 20.5    18.5/20.5 = 0.902    0.902 x 0.629 = 0.567
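The life table above can be reproduced with a short function. The sketch below assumes the uniform-loss adjustment (half of the losses removed from the number at risk); the function name and tuple layout are choices made for this example. Because the slide rounds at every step and the code does not, the last digit can differ slightly.

```python
def actuarial_life_table(rows):
    """rows: (n_at_risk, deaths, losses) for each interval, in order.

    Returns (adjusted_at_risk, interval_survival, cumulative_survival) per interval.
    """
    cumulative = 1.0
    table = []
    for n, d, l in rows:
        adjusted = n - l / 2             # losses assumed uniform over the interval
        p = (adjusted - d) / adjusted    # conditional survival through the interval
        cumulative *= p                  # product of the conditional survivals so far
        table.append((adjusted, p, cumulative))
    return table


rows = [(50, 9, 0), (41, 6, 1), (34, 2, 4), (28, 1, 5), (22, 2, 3)]
life_table = actuarial_life_table(rows)

for year, (adjusted, p, cumulative) in enumerate(life_table):
    print(f"{year}-{year + 1}: adjusted at risk {adjusted:5.1f}, "
          f"p = {p:.3f}, cumulative = {cumulative:.3f}")
```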

Actuarial Survival Curve
[Figure: step-function survival curve; y-axis ticks 20 to 100, x-axis ticks 1 to 5]

Kaplan-Meier Estimate (1) (JASA, 1958)
Assumptions:
1. The "exact" time of each event is known
   Failure = uncensored event
   Loss = censored event
2. For a "tie", failure always comes before loss
3. Divide follow-up time into intervals such that
   a. Each event defines the left side of an interval
   b. No interval has both deaths and losses

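Kaplan-Meier Estimate (2) (JASA, 1958)
- Then \[ \hat{p}_i = \frac{n_i - d_i}{n_i} \] where n_i = # at risk just prior to the death at t_i and d_i = # deaths at t_i
- Note: if an interval contains only losses, p_i = 1.0
- Because of this, for convenience we may combine intervals containing only losses with the previous interval containing only deaths
[Schematic: a death (X) followed by several losses (o) within one combined interval]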

Estimate of S(t) or P(t)
- Suppose that for N patients there are K distinct failure (death) times. The Kaplan-Meier (K-M), or product-limit, estimate of the survival curve P(t) = P(Survival ≥ t) becomes
  \[ \hat{P}(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}, \qquad i = 1, 2, \ldots, K \]
  where
  n_i = n_{i-1} - l_{i-1} - d_{i-1}
  l_{i-1} = # censored events since the death at t_{i-1}
  d_{i-1} = # deaths at t_{i-1}

Estimate of S(t) or P(t)
Variance of P(t), Greenwood's formula:
\[ V[\hat{P}(t)] = \hat{P}(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)} \]

KM Estimate (1)
Example (see Table 14-2 in FFD)
- Suppose we follow 20 patients and observe the event time, either failure (death) or censored (+), as
  [0.5, 0.6+), [1.5, 1.5, 2.0+), [3.0, 3.5+, 4.0+), [4.8], [6.2, 8.5+, 9.0+), [10.5, 12.0+ (7 pts)]
- There are 6 distinct failure (death) times: 0.5, 1.5, 3.0, 4.8, 6.2, 10.5

KM Estimate (2)
1. Failure at t_1 = 0.5, interval [0.5, 1.5)
   n_1 = 20
   d_1 = 1
   l_1 = 1 (i.e. 0.6+)
   If t ∈ [0.5, 1.5), \hat{P}(t) = \hat{p}_1 = 0.95
   \[ V[\hat{P}(t_1)] = (0.95)^2 \left\{ \frac{1}{20(19)} \right\} = 0.0024 \]

KM Estimate (4)
Data: [0.5, 0.6+), [1.5, 1.5, 2.0+), 3.0, etc.
2. Failure at t_2 = 1.5, interval [1.5, 3.0)
   n_2 = n_1 - d_1 - l_1 = 20 - 1 - 1 = 18
   d_2 = 2
   l_2 = 1 (i.e. 2.0+)
   If t ∈ [1.5, 3.0), then \hat{P}(t) = (0.95)(0.89) = 0.84
   \[ V[\hat{P}(t_2)] = (0.84)^2 \left\{ \frac{1}{20(19)} + \frac{2}{18(18 - 2)} \right\} = 0.0068 \]

Kaplan-Meier Life Table for 20 Subjects Followed for One Year
Interval      Interval  Time of  n_j  d_j  l_j  \hat{p}_j  \hat{P}(t)  V[\hat{P}(t)]
              number    death
[0.5, 1.5)    1         0.5      20   1    1    0.95       0.95        0.0024
[1.5, 3.0)    2         1.5      18   2    1    0.89       0.84        0.0068
[3.0, 4.8)    3         3.0      15   1    2    0.93       0.79        0.0089
[4.8, 6.2)    4         4.8      12   1    0    0.92       0.72        0.0114
[6.2, 10.5)   5         6.2      11   1    2    0.91       0.66        0.0135
[10.5, ∞)     6         10.5     8    1    7*   0.88       0.58        0.0164
n_j: number of subjects alive at the beginning of the jth interval
d_j: number of subjects who died during the jth interval
l_j: number of subjects who were lost or censored during the jth interval
\hat{p}_j: estimate for p_j, the probability of surviving the jth interval given that the subject has survived the previous intervals
\hat{P}(t): estimated survival curve
V[\hat{P}(t)]: variance of \hat{P}(t)
* Censored due to termination of study
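The product-limit estimate and Greenwood variance for the 20-patient example can be sketched in plain Python as follows. The function name and the (time, event) encoding are assumptions made for this illustration. The code carries full precision through the product, so the later rows come out slightly different from the slide, which rounds at each step (for example about 0.575 instead of 0.58 in the last row).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate with Greenwood variance.

    times:  observed follow-up times
    events: 1 = death (failure), 0 = censored (loss)
    Returns one tuple (t_i, n_i, d_i, p_i, P(t_i), V[P(t_i)]) per distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv, greenwood_sum, rows = 1.0, 0.0, []
    for t in sorted(deaths):
        n = sum(1 for u in times if u >= t)   # at risk just prior to the death at t
        d = deaths[t]
        p = (n - d) / n
        surv *= p
        if n > d:                              # term is undefined once nobody survives
            greenwood_sum += d / (n * (n - d))
        rows.append((t, n, d, p, surv, surv ** 2 * greenwood_sum))
    return rows


# Control-group data from the example; "+" observations are coded with event = 0.
times = [0.5, 0.6, 1.5, 1.5, 2.0, 3.0, 3.5, 4.0, 4.8, 6.2, 8.5, 9.0, 10.5] + [12.0] * 7
events = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1] + [0] * 7

km = kaplan_meier(times, events)
for t, n, d, p, surv, var in km:
    print(f"t = {t:4.1f}  n = {n:2d}  d = {d}  p = {p:.3f}  P(t) = {surv:.3f}  V = {var:.4f}")
```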

Kaplan-Meier Estimate Survival Curve
[Figure: Kaplan-Meier estimate; y-axis: estimated survival curve \hat{P}(t), 0.5 to 1.0; x-axis: survival time t (months), 2 to 12; deaths marked *, censored observations marked o]

Comparison of Two Survival Curves
- Assume that we now have a treatment group and a control group and wish to compare their survival experience
- 20 patients in each group (all patients censored at 12 months)
  Control: 0.5, 0.6+, 1.5, 1.5, 2.0+, 3.0, 3.5+, 4.0+, 4.8, 6.2, 8.5+, 9.0+, 10.5, 12+'s
  Trt: 1.0, 1.6+, 2.4+, 4.2+, 4.5, 5.8+, 7.0+, 11.0+, 12+'s

Kaplan-Meier Estimate for Treatment
1. t_1 = 1.0
   n_1 = 20
   d_1 = 1
   l_1 = 3
   \hat{p}_1 = (20 - 1)/20 = 0.95
   \hat{P}(t) = 0.95
2. t_2 = 4.5
   n_2 = 20 - 1 - 3 = 16
   d_2 = 1
   \hat{p}_2 = (16 - 1)/16 = 0.94

Kaplan-Meier Estimate
[Figure: Kaplan-Meier curves for the TRT and CONTROL groups; y-axis: estimated survival curve \hat{P}(t), 0.5 to 1.0; x-axis: survival time t (months), 2 to 12]

Comparison of Two Survival Curves
Comparison of point estimates
- Suppose at some time t* we want to compare P_C(t*) for the control and P_T(t*) for treatment
- The statistic
  \[ Z = \frac{\hat{P}_C(t^*) - \hat{P}_T(t^*)}{\sqrt{V[\hat{P}_C(t^*)] + V[\hat{P}_T(t^*)]}} \]
  has, approximately, a normal distribution under H0

Comparison of Overall Survival Curve
H0: P_C(t) = P_T(t)
A. Mantel-Haenszel Test
Ref: Mantel & Haenszel (1959) J Natl Cancer Inst; Mantel (1966) Cancer Chemotherapy Reports
- Mantel and Haenszel (1959) showed that a series of 2 x 2 tables could be combined into a summary statistic (note also: Cochran (1954) Biometrics)
- Mantel (1966) applied this procedure to the comparison of two survival curves
- The basic idea is to form a 2 x 2 table at each distinct death time, determining the number in each group who were at risk and the number who died

Comparison of Two Survival Curves (1)
Suppose we have K distinct times for a death occurring, t_i, i = 1, 2, ..., K. For each death time:
             Died at t_i   Alive        At risk (prior to t_i)
Treatment    a_i           b_i          a_i + b_i
Control      c_i           d_i          c_i + d_i
Total        a_i + c_i     b_i + d_i    N_i
Consider a_i, the observed number of deaths in the TRT group, under H0

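Comparison of Two Survival Curves (2)
\[ E(a_i) = \frac{(a_i + b_i)(a_i + c_i)}{N_i} \]
\[ V(a_i) = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{N_i^2 (N_i - 1)} \]
Mantel-Haenszel statistic:
\[ MH = \frac{\left[ \sum_{i=1}^{K} \left( a_i - E(a_i) \right) \right]^2}{\sum_{i=1}^{K} V(a_i)} \]
Under H0, MH has approximately a chi-square distribution with 1 degree of freedom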

Comparison of Survival Data for a Control Group and an Intervention Group Using the Mantel-Haenszel Procedure
Rank  Event time  Intervention            Control                 Total
j     t_j         a_j+b_j  a_j  l_j       c_j+d_j  c_j  l_j       a_j+c_j  b_j+d_j
1     0.5         20       0    0         20       1    1         1        39
2     1.0         20       1    0         18       0    0         1        37
3     1.5         19       0    2         18       2    1         2        35
4     3.0         17       0    1         15       1    2         1        31
5     4.5         16       1    0         12       0    0         1        27
6     4.8         15       0    1         12       1    0         1        26
7     6.2         14       0    1         11       1    2         1        24
8     10.5        13       0    1         8        1              1        20
a_j + b_j = number of subjects at risk in the intervention group prior to the death at time t_j
c_j + d_j = number of subjects at risk in the control group prior to the death at time t_j
a_j = number of subjects in the intervention group who died at time t_j
c_j = number of subjects in the control group who died at time t_j
l_j = number of subjects who were lost or censored between time t_j and time t_{j+1}
a_j + c_j = number of subjects in both groups who died at time t_j
b_j + d_j = number of subjects in both groups who are at risk minus the number who died at time t_j

Mantel-Haenszel Test
Operationally:
1. Rank event times for both groups combined
2. For each failure, form the 2 x 2 table
   a. Number at risk (a_i + b_i, c_i + d_i)
   b. Number of deaths (a_i, c_i)
   c. Losses (l_Ti, l_Ci)
Example (see Table 14-3 FFD), using the previous data set:
  Trt: 1.0, 1.6+, 2.4+, 4.2+, 4.5, 5.8+, 7.0+, 11.0+, 12.0+'s
  Control: 0.5, 0.6+, 1.5, 1.5, 2.0+, 3.0, 3.5+, 4.0+, 4.8, 6.2, 8.5+, 9.0+, 10.5, 12.0+'s

1. Ranked failure times, both groups combined
   Time:   0.5  1.0  1.5  3.0  4.5  4.8  6.2  10.5
   Group:  C    T    C    C    T    C    C    C
   8 distinct times for death (K = 8)
2. At t_1 = 0.5 (k = 1), interval [0.5, 1.0), containing 0.5 and 0.6+
   T: a_1 + b_1 = 20, a_1 = 0, l_T1 = 0
   C: c_1 + d_1 = 20, c_1 = 1, l_C1 = 1 (1 loss at 0.6+)
          D   A   R
   T      0   20  20
   C      1   19  20
   Total  1   39  40
   \[ E(a_1) = \frac{1 \cdot 20}{40} = 0.5 \]
   \[ V(a_1) = \frac{1 \cdot 39 \cdot 20 \cdot 20}{40^2 \cdot 39} = 0.25 \]

3. At t_2 = 1.0 (k = 2), interval [1.0, 1.5)
   T: a_2 + b_2 = (a_1 + b_1) - a_1 - l_T1 = 20 - 0 - 0 = 20, a_2 = 1, l_T2 = 0
   C: c_2 + d_2 = (c_1 + d_1) - c_1 - l_C1 = 20 - 1 - 1 = 18, c_2 = 0, l_C2 = 0
   so
          D   A   R
   T      1   19  20
   C      0   18  18
   Total  1   37  38
   \[ E(a_2) = \frac{1 \cdot 20}{38} = 0.526 \]
   \[ V(a_2) = \frac{1 \cdot 37 \cdot 20 \cdot 18}{38^2 \cdot 37} = 0.249 \]

Eight 2x2 Tables Corresponding to the Event Times Used in the Mantel-Haenszel Statistic in Survival Comparison of Treatment (T) and Control (C) Groups
Table  Time (mo.)*  Group  D†  A‡  R§
1      0.5          T      0   20  20
                    C      1   19  20
                    Total  1   39  40
2      1.0          T      1   19  20
                    C      0   18  18
                    Total  1   37  38
3      1.5          T      0   19  19
                    C      2   16  18
                    Total  2   35  37
4      3.0          T      0   17  17
                    C      1   14  15
                    Total  1   31  32
5      4.5          T      1   15  16
                    C      0   12  12
                    Total  1   27  28
6      4.8          T      0   15  15
                    C      1   11  12
                    Total  1   26  27
7      6.2          T      0   14  14
                    C      1   10  11
                    Total  1   24  25
8      10.5         T      0   13  13
                    C      1   7   8
                    Total  1   20  21
* Time t_j of a death in either group
† Number of subjects who died at time t_j
‡ Number of subjects who are alive between time t_j and time t_{j+1}
§ Number of subjects who were at risk before the death at time t_j (R = D + A)

Compute MH Statistics
Recall the first three tables:
          K = 1, t_1 = 0.5    K = 2, t_2 = 1.0    K = 3, t_3 = 1.5
          D   A   R           D   A   R           D   A   R
T         0   20  20          1   19  20          0   19  19
C         1   19  20          0   18  18          2   16  18
Total     1   39  40          1   37  38          2   35  37
a. Σ a_i = 2 (only two treatment deaths)
b. Σ E(a_i) = 20(1)/40 + 20(1)/38 + 19(2)/37 + ... = 4.89
c. Σ V(a_i) = 2.22
d. MH = (2 - 4.89)^2/2.22 = 3.76, or Z_MH = (2 - 4.89)/\sqrt{2.22} ≈ -1.94
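The eight 2 x 2 tables and the resulting sums can be rebuilt from the raw data with a short script. This is a sketch under the table definitions given earlier; the function name and the (time, event) encoding are assumptions for the example. Carried at full precision, the variance sum comes out near 2.21 and the statistic near 3.78, close to but not identical with the rounded 2.22 and 3.76 on the slide.

```python
def mantel_haenszel(trt, ctl):
    """trt, ctl: lists of (time, event) pairs, event 1 = death, 0 = censored.

    Returns (sum of observed treatment deaths, sum of E(a_i), sum of V(a_i)).
    """
    death_times = sorted({t for t, e in trt + ctl if e == 1})
    observed = expected = variance = 0.0
    for t in death_times:
        n_t = sum(1 for u, _ in trt if u >= t)            # a_i + b_i
        n_c = sum(1 for u, _ in ctl if u >= t)            # c_i + d_i
        a = sum(1 for u, e in trt if u == t and e == 1)   # treatment deaths at t
        c = sum(1 for u, e in ctl if u == t and e == 1)   # control deaths at t
        n, deaths = n_t + n_c, a + c
        observed += a
        expected += n_t * deaths / n
        if n > 1:
            variance += n_t * n_c * deaths * (n - deaths) / (n ** 2 * (n - 1))
    return observed, expected, variance


trt = [(1.0, 1), (1.6, 0), (2.4, 0), (4.2, 0), (4.5, 1),
       (5.8, 0), (7.0, 0), (11.0, 0)] + [(12.0, 0)] * 12
ctl = [(0.5, 1), (0.6, 0), (1.5, 1), (1.5, 1), (2.0, 0), (3.0, 1), (3.5, 0),
       (4.0, 0), (4.8, 1), (6.2, 1), (8.5, 0), (9.0, 0), (10.5, 1)] + [(12.0, 0)] * 7

observed, expected, variance = mantel_haenszel(trt, ctl)
mh = (observed - expected) ** 2 / variance
print(f"observed = {observed:.0f}, expected = {expected:.2f}, "
      f"variance = {variance:.2f}, MH = {mh:.2f}")
```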

B. Gehan Test (Wilcoxon)
Ref: Gehan, Biometrika (1965); Mantel, Biometrics (1967)
- Gehan (1965) first proposed a modified Wilcoxon rank statistic for survival data with censoring
- Mantel (1967) showed a simpler computational version of Gehan's proposed test
1. Combine all observations X_T's and X_C's into a single sample Y_1, Y_2, ..., Y_{N_C + N_T}
2. Define U_ij, for i, j = 1, ..., N_C + N_T, as
   U_ij = -1 if Y_i < Y_j and death at Y_i
   U_ij = +1 if Y_i > Y_j and death at Y_j
   U_ij = 0 elsewhere
3. Define U_i = Σ_j U_ij, i = 1, ..., N_C + N_T

Gehan Test
Note: U_i = {number of observed times definitely less than Y_i} - {number of observed times definitely greater than Y_i}
4. Define W = Σ U_i (controls)
5. Variance due to Mantel:
   \[ V[W] = \frac{N_C N_T \sum_{i=1}^{N_C + N_T} U_i^2}{(N_C + N_T)(N_C + N_T - 1)} \]
6. Example (Table 14-5 FFD): using the previous data set, rank all observations

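The Gehan statistic, G, involves the scores U_i and is defined as
\[ G = \frac{W^2}{V(W)} \]
where W = Σ U_i (U_i's in the control group only) and
\[ V(W) = \frac{N_C N_T \sum_{i=1}^{N_C + N_T} U_i^2}{(N_C + N_T)(N_C + N_T - 1)} \]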

Example of Gehan Statistics Scores U_i for Intervention (I) and Control (C) Groups
Observation  Ranked          Group     Definitely  Definitely  U_i
i            observed time             less        more
1            0.5             C         0           39          -39
2            (0.6)*          C         1           0           1
3            1.0             I         1           37          -36
4            1.5             C         2           35          -33
5            1.5             C         2           35          -33
6            (1.6)           I         4           0           4
7            (2.0)           C         4           0           4
8            (2.4)           I         4           0           4
9            3.0             C         4           31          -27
10           (3.5)           C         5           0           5
11           (4.0)           C         5           0           5
12           (4.2)           I         5           0           5
13           4.5             I         5           27          -22
14           4.8             C         6           26          -20
15           (5.8)           I         7           0           7
16           6.2             C         7           24          -17
17           (7.0)           I         8           0           8
18           (8.5)           C         8           0           8
19           (9.0)           C         8           0           8
20           10.5            C         8           20          -12
21           (11.0)          I         9           0           9
22-40        (12.0)          12I, 7C   9           0           9
U_i = (definitely less) - (definitely more)
* Times in parentheses are censored observations

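Gehan Test
Thus, summing the control-group scores only,
W = (-39) + (1) + (-33) + (-33) + (4) + ... = -87
and, summing U_i^2 over all 40 observations,
\[ V[W] = \frac{(20)(20)\{(-39)^2 + 1^2 + (-36)^2 + \cdots\}}{(40)(39)} = 2314.35 \]
so G = (-87)^2/2314.35 = 3.27
Note: MH and Gehan are not equal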
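The scores in the Gehan table and the resulting W and V[W] can be verified with the sketch below. It follows the strict-inequality definition of U_ij given earlier; the function name and the (time, event, group) encoding are assumptions made for this example.

```python
def gehan_scores(obs):
    """obs: list of (time, event, group); event 1 = death, 0 = censored.

    U_i = (# observations definitely less than i) - (# definitely greater than i).
    """
    scores = []
    for t_i, e_i, _ in obs:
        # j is definitely less than i if j died strictly before t_i
        less = sum(1 for t_j, e_j, _ in obs if e_j == 1 and t_j < t_i)
        # j is definitely greater than i if i died and t_j is strictly later
        more = sum(1 for t_j, _, _ in obs if e_i == 1 and t_j > t_i)
        scores.append(less - more)
    return scores


trt = [(1.0, 1), (1.6, 0), (2.4, 0), (4.2, 0), (4.5, 1),
       (5.8, 0), (7.0, 0), (11.0, 0)] + [(12.0, 0)] * 12
ctl = [(0.5, 1), (0.6, 0), (1.5, 1), (1.5, 1), (2.0, 0), (3.0, 1), (3.5, 0),
       (4.0, 0), (4.8, 1), (6.2, 1), (8.5, 0), (9.0, 0), (10.5, 1)] + [(12.0, 0)] * 7

obs = [(t, e, "I") for t, e in trt] + [(t, e, "C") for t, e in ctl]
u = gehan_scores(obs)

n_t, n_c = len(trt), len(ctl)
w = sum(u_i for u_i, (_, _, g) in zip(u, obs) if g == "C")     # control scores only
sum_u_sq = sum(u_i ** 2 for u_i in u)                          # all observations
var_w = n_c * n_t * sum_u_sq / ((n_c + n_t) * (n_c + n_t - 1))
g_stat = w ** 2 / var_w

print(f"W = {w}, sum of U^2 = {sum_u_sq}, V[W] = {var_w:.2f}, G = {g_stat:.2f}")
```

Summing the scores over both groups gives zero, which is a quick sanity check that the pairwise definition has been applied symmetrically.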

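Cox Proportional Hazards Model
Ref: Cox (1972) Journal of the Royal Statistical Society
- Recall the simple exponential model: \[ S(t) = e^{-\lambda t} \]
- More complicated: \[ S(t) = \exp\left( -\int_0^t \lambda(s)\, ds \right) \]
- If \lambda(s) = \lambda, we get the simple model
- Adjust for covariates with the Cox PHM: \[ \lambda(t, x) = \lambda_0(t)\, e^{\beta x} \]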

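Cox Proportional Hazards Model
- So \[ S(t, x) = \exp\left( -e^{\beta x} \int_0^t \lambda_0(s)\, ds \right) = \left[ S_0(t) \right]^{e^{\beta x}} \] where S_0(t) is the survival curve for the baseline hazard \lambda_0
- Estimate the regression coefficients (non-linear estimation): \beta, SE(\beta)
- Example:
  x_1 = 1 for Trt, 2 for Control
  x_2 = covariate 1
  \beta_1 is an indicator of the treatment effect, adjusted for x_2, x_3, ...
- If there are no covariates except for treatment group (x_1), PHM = logrank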
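To make the proportional hazards relationship concrete, the sketch below evaluates S(t, x) for a constant baseline hazard and checks that it equals S_0(t) raised to the power e^{βx}. Everything numeric here is hypothetical: the baseline hazard, the coefficient, and the 0/1 coding of the covariate are illustrative assumptions (the slide codes treatment as 1 and control as 2), and no model fitting is performed.

```python
import math


def survival_ph(t, x, beta, baseline_hazard):
    """S(t, x) under proportional hazards with a constant baseline hazard."""
    cumulative_baseline = baseline_hazard * t   # integral of a constant hazard from 0 to t
    return math.exp(-cumulative_baseline * math.exp(beta * x))


lam0 = 0.05    # hypothetical baseline hazard per month
beta = -0.7    # hypothetical log hazard ratio for treatment
t = 12.0       # months

s_control = survival_ph(t, 0, beta, lam0)   # x = 0: baseline (control) curve
s_treated = survival_ph(t, 1, beta, lam0)   # x = 1: treated curve

# The treated curve is the baseline curve raised to the hazard ratio.
hazard_ratio = math.exp(beta)
print(f"S0({t:.0f}) = {s_control:.3f}, S({t:.0f}, x=1) = {s_treated:.3f}, "
      f"S0^HR = {s_control ** hazard_ratio:.3f}")
```

A negative coefficient gives a hazard ratio below 1, so the treated curve lies above the baseline curve at every t.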

Survival Analysis Summary
- Time-to-event methodology is very useful in multiple settings
- Can estimate time-to-event probabilities or survival curves
- Methods can compare survival curves
  - Can stratify for subgroups
  - Can adjust for baseline covariates using a regression model
- Need to plan for this in sample size estimation and overall design