Sample Size Robert F. Woolson, Ph.D. Department of Biostatistics, Bioinformatics & Epidemiology Joint Curriculum.


NIH Study Section
- Significance
- Approach
- Innovation
- Environment
- Investigators

Approach
- Feasibility
- Study Design: Controls, Interventions
- Study Size: Sample Size, Power
- Data Analysis

Sample Size
- # of Animals
- # of Measurement Sites/Animal
- # of Replications

Sample Size
- What #s are Proposed?
- Adequacy of #s?
- Compelling Rationale for Adequacy?
- Do We Need More?
- Can We Answer Questions With Fewer?

Sample Size
- Simple Question to Ask
- Answer May Involve:
  - Assumptions
  - Pilot Data
  - Simplification of Overall Aims to a Single Question

Simplification
- What Is The Question?
- What Is The Primary Outcome Variable?
- What Is The Principal Hypothesis?

Pilot Data
- Relationship To Question
- Relationship To Primary Variable
- Relationship To Hypothesis

Sample Size/Power Freeware on Web:
- …wer/
- …e_size/size.html

Sample Size
- Purchase Software: …m/
- Nquery:

Animal Studies
- Differences usually large
- Variability usually small
- Small sample sizes
- Many groups
- Repeated measures

Sample Size (# Animals Required)
- Excerpts from the MUSC Vertebrate Animal Review Application Form: "A power analysis or other statistical justification is required where appropriate. Where the number of animals required is dictated by other than statistical considerations… justify the number… on this basis."

Sample Size: Ethical Issues in Animal Studies
- Ethical Issues
  - Study too large implies some animals needlessly sacrificed
  - Study too small implies potential for misleading conclusions, unnecessary experimentation
- Mann MD, Crouse DA, Prentice ED. Appropriate animal numbers in biomedical research in light of animal welfare considerations. Laboratory Animal Science, 1991, 41:

Ethical Issues Cont.
- Human studies: the same rationale holds for studies that are too large or too small.

Sample Size: Specifying the Hypothesis
- Specifying the hypothesis
  - difference from control?
  - differences among groups over time?
  - differences among groups at a particular point in time?
- A "non-hypothesis"
  - Animals in Group A will do better than animals in Group B

Sample Size: Specifying the Hypothesis
- H0: Mean blood pressure on drug A = mean blood pressure on drug B, measured six hours after start of treatment.
- Ha: Mean blood pressure on drug A < mean blood pressure on drug B, measured six hours after start of treatment.

Example (SHR)
- Animal blood pressures measured at baseline
- Animals randomly assigned to placebo or minoxidil
- Animals measured 6 hours post treatment
- Changes from baseline calculated for each animal

Example (Continued)
- Placebo changes thought to be centered at 0
- Expect minoxidil to lower blood pressure, we think by 10 mm Hg
- Blood pressure changes have a standard deviation of 5 mm Hg

Example (Continued)
- How many animals/group are needed to have 90% power to detect the 10 mm Hg mean difference?
- How would this sample size change if the standard deviation is 10 mm Hg rather than 5 mm Hg?
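
The two questions above can be sketched numerically. This is a minimal sketch using the normal approximation for two independent groups; the one-sided α = 0.05 is an assumption (the slides give only the one-sided alternative and 90% power), and the function name `n_per_group_means` is hypothetical. Software based on the t distribution will give slightly larger numbers, especially when n is this small.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.90, two_sided=False):
    """Normal-approximation sample size per group for comparing two means.

    delta: difference in means to detect; sd: common standard deviation.
    alpha = 0.05 one-sided is an assumed default, not stated on the slide.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return ceil(n)  # round up to whole animals


n_sd5 = n_per_group_means(delta=10, sd=5)    # SD = 5 mm Hg
n_sd10 = n_per_group_means(delta=10, sd=10)  # SD = 10 mm Hg
print(n_sd5, n_sd10)
```

Doubling the standard deviation multiplies the unrounded sample size by four, since n is proportional to the variance.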

Example (Continued)
- Suppose we change the endpoint to: did the animal achieve a reduction in blood pressure of 10 mm Hg or more?
- Then 50% of those on minoxidil would be expected to have a reduction of 10 or more.
- About 2.5% of those on placebo would have a reduction of 10 or more.

Example (Continued)
- How many animals/group are required to have 90% power to detect the 50% vs. 2.5%?
- Why the difference in sample sizes for the same experiment? Comment on:
  - Assumptions
  - Endpoint
  - Specific hypothesis
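
A hedged sketch of the binary version of the example. It first checks where the 50% and roughly 2.5% response rates come from (normal changes with SD 5 mm Hg, centered at -10 and 0), then applies a standard normal-approximation formula for two proportions. The one-sided α = 0.05 and the name `n_per_group_props` are assumptions; with counts this small and a rate near zero, exact methods may give a different answer.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Probability of a drop of 10 mm Hg or more (change <= -10) in each arm
p_minoxidil = NormalDist(mu=-10, sigma=5).cdf(-10)  # centered at -10
p_placebo = NormalDist(mu=0, sigma=5).cdf(-10)      # two SDs below 0


def n_per_group_props(p1, p2, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for two proportions
    (one-sided test; alpha is an assumed value)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)


n_binary = n_per_group_props(0.50, 0.025)
print(round(p_minoxidil, 3), round(p_placebo, 3), n_binary)
```

Under these assumptions the dichotomized endpoint needs more animals per group than the continuous endpoint with SD 5 mm Hg, because collapsing a measurement to yes/no discards information.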

Sample Size: Distribution of Response
- Nominal/binary (Binomial): dead, alive
- Ordinal (Non-parametric): inflammation (mild, moderate, severe)
- Continuous (Normal*): blood pressure
* may require transformation

Sample Size: Distribution of Response
- Binomial: N is a function of probability of response in control and probability of response in treated animals
- Normal: N is a function of difference in means and standard deviation

Sample Size: One Sample or 2-sample Test
- One sample
  - Change from baseline in one group
  - Comparison to standard (historical controls)
- Two sample
  - Two independent study groups

Sample Size: One or Two sided test
- One sided test: Ha: a > 0 or Ha: a < 0
- Two sided test: Ha: a ≠ 0

Sample Size: Choosing α
- α = probability of Type 1 error
- probability of rejecting H0 when H0 is true
- significance level usually 0.01 or 0.05
- "calling an innocent person guilty"
- "concluding two groups are different when they are not"

Sample Size: Choosing α
- Multiple testing can lead to α errors.
- Pre-specified hypotheses: may not need to adjust.
- If all pairwise comparisons are of interest, adjust α (α/#tests)
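
The α/#tests adjustment above is often called the Bonferroni correction. A minimal sketch, assuming a hypothetical study with four groups in which every pair of groups is compared; the function name is illustrative only.

```python
from math import comb


def bonferroni_alpha(alpha, n_groups):
    """Return the number of pairwise comparisons among n_groups and the
    per-test significance level alpha / #tests."""
    n_tests = comb(n_groups, 2)  # k groups give k(k-1)/2 pairs
    return n_tests, alpha / n_tests


n_tests, alpha_per_test = bonferroni_alpha(alpha=0.05, n_groups=4)
print(n_tests, round(alpha_per_test, 4))
```

The smaller per-test α should also be the one used in the sample size calculation, which increases the required N.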

Sample Size: Choosing β
- β = probability of type II error
- probability of failing to reject H0 when a true difference exists
- "Calling guilty person innocent"
- "Missing a true difference"
- Power = 1 - β
- Large clinical trials use 0.9 or 0.95; animal studies usually use 0.8 (80% power).

Sample Size: Power
- Concluding groups do not differ when power is low is risky. True difference may have been missed.
- 80% power implies a 20% chance of missing a true difference.
- 40% power implies a 60% chance of missing a true difference.

Sample Size: Calculation
- Calculate N
  - specify difference to be detected
  - specify variability (continuous data)
OR
- Calculate detectable difference:
  - specify N
  - specify variability (continuous) or control %
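
The second route, fixing N and solving for the detectable difference, can be sketched for a continuous outcome with the normal approximation for two independent groups. The inputs in the example call (10 animals per group, SD 5) are hypothetical, and `detectable_difference` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n_per_group, sd, alpha=0.05, power=0.80, two_sided=True):
    """Smallest difference in means detectable with the given power,
    using the normal approximation for two independent groups."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sd * sqrt(2 / n_per_group)


# Hypothetical inputs: 10 animals per group, SD of 5 units
d = detectable_difference(n_per_group=10, sd=5)
print(round(d, 2))
```

If the resulting difference is larger than anything biologically plausible, the fixed N is too small for the question.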

Sample Size: Putting it all together
Continuous (Normal) Distribution
Need all but one: $\alpha$, $\beta$, $\sigma^2$, $\delta$, $N$
$Z_\alpha = 1.96$ (2 sided, 0.05); $Z_\beta = 1.645$ (always one-sided, 0.05, 95% power)
$\delta$ = difference between means
$\sigma^2$ = pooled variance
$$2n = \frac{4\,(Z_\alpha + Z_\beta)^2\,\sigma^2}{\delta^2}$$

[Figure: Power versus Difference (P1 - P2); α = 0.05, one-sided test, N per group = 100, P1 = 0.5]

[Figure: Sample Size; α = 0.05, one-sided test, P1 = 0.5, P2 = 0.3]
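
The plots themselves are not in the transcript, but one point can be recomputed from the stated settings. This is a sketch using a standard normal approximation for two proportions (one-sided α = 0.05, N = 100 per group, P1 = 0.5, P2 = 0.3); it is not necessarily the method used to draw the original figures, and `power_two_props` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def power_two_props(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-proportion test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    p_bar = (p1 + p2) / 2
    num = abs(p1 - p2) * sqrt(n_per_group) - z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    den = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return z.cdf(num / den)


power_at_100 = power_two_props(0.5, 0.3, n_per_group=100)
print(round(power_at_100, 2))

# Power rises with the size of the difference and with N
for p2 in (0.45, 0.40, 0.35, 0.30):
    print(p2, round(power_two_props(0.5, p2, 100), 2))
```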

Nquery Advisor
- About $700
- Many more options than many other programs
- Available in student room in our department

Nquery Advisor
- Under "file" choose "New"
- Choices
  - means
  - proportions
  - agreement
  - survival (time to event)
  - regression
- # groups (1, 2, >2)
- testing, confidence intervals, equivalence

Examples
- Continuous response
- Binary response

Sample Size: Specifying the Hypothesis
- H0: Mean blood pressure on drug A = mean blood pressure on drug B, measured six hours after start of treatment.
- Ha: Mean blood pressure on drug A < mean blood pressure on drug B, measured six hours after start of treatment.

Example (SHR)
- Animal blood pressures measured at baseline
- Animals randomly assigned to placebo or minoxidil
- Animals measured 6 hours post treatment
- Changes from baseline calculated for each animal

Example (Continued)
- Placebo changes thought to be centered at 0
- Expect minoxidil to lower blood pressure, we think by 10 mm Hg
- Blood pressure changes have a standard deviation of 5 mm Hg

Example (Continued)
- How many animals/group are needed to have 90% power to detect the 10 mm Hg mean difference?
- How would this sample size change if the standard deviation is 10 mm Hg rather than 5 mm Hg?

Example (Continued)
- Suppose we change the endpoint to: did the animal achieve a reduction in blood pressure of 10 mm Hg or more?
- Then 50% of those on minoxidil would be expected to have a reduction of 10 or more.
- About 2.5% of those on placebo would have a reduction of 10 or more.

Example (Continued)
- How many animals/group are required to have 90% power to detect the 50% vs. 2.5%?
- Why the difference in sample sizes for the same experiment? Comment on:
  - Assumptions
  - Endpoint
  - Specific hypothesis

Sample Size: More Than One Primary Response
- Use largest sample size.

Sample Size: Food for Thought
- Is detectable difference biologically meaningful?
- Is sample size too small to be believable?
- N = 5 is a "rule of thumb", but is this valid for the experiment being planned?

Sample Size: Misunderstandings
- "Larger the difference, smaller the sample size"
  - ignores contribution of variability
- Failing to report power for negative study
  - calculate based on hypothesized difference and observed variability
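
Reporting power for a negative study, as the last bullet suggests, can be sketched with the normal approximation: plug in the difference hypothesized before the study and the variability actually observed. All numbers in the example call are hypothetical, and the calculation ignores the negligible chance of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison of means.

    delta: hypothesized difference (set before the study, not the observed one)
    sd: observed standard deviation
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_group)
    return z.cdf(delta / standard_error - z_alpha)


# Hypothetical negative study: 10 per group, hypothesized difference 1, observed SD 2
achieved_power = power_two_means(delta=1, sd=2, n_per_group=10)
print(round(achieved_power, 2))
```

With power this low, "no significant difference" says very little about whether a true difference of that size exists.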

Sample Size: Keeping It Small
- Study continuous rather than binary outcome (if variability does not increase)
  - change in tumor size instead of recurrence
- Study surrogate outcome where effect is large
  - cholesterol reduction rather than mortality

Examples Of Surrogate Outcome Measures?
- Bone density
- Quality of life
- Patency
- Pain relief
- Functional Status
- Cholesterol

Sample Size: Keeping It Small
- Decrease variability
  - Change from baseline or analysis of covariance
  - training
  - equipment
  - choice of animal model

Sample Size: Keeping It Small
- α = 0.05, 2-sided test
- β = 0.2; power = 0.8 (80%)
- Difference between two means = 1
- Standard deviation = 2; N = 64/group
- Standard deviation = 1; N = 17/group
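
The two sample sizes above can be approximately reproduced. The plain normal-approximation formula gives slightly smaller numbers; adding z²/4 per group, a commonly used small-sample correction for the t test, recovers the quoted values. This is a sketch of that approximation, not necessarily the calculation behind the slide, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_sided(delta, sd, alpha=0.05, power=0.80, t_correction=True):
    """Per-group sample size for a two-sided comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2
    if t_correction:
        n += z_alpha ** 2 / 4  # small-sample adjustment (an approximation)
    return ceil(n)


n_sd2 = n_per_group_two_sided(delta=1, sd=2)
n_sd1 = n_per_group_two_sided(delta=1, sd=1)
print(n_sd2, n_sd1)
print(n_per_group_two_sided(1, 2, t_correction=False),
      n_per_group_two_sided(1, 1, t_correction=False))
```

Halving the standard deviation cuts the required N to roughly a quarter, which is why reducing variability is so effective at keeping studies small.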

Sample Size Estimation
- Parameters are estimates
- Estimate of relative effectiveness based on other populations
- Effectiveness overstated
- Patients in trials do better
- Assuming mathematical models
- Compromise between available resources/objectives

Sample Size: Pilot Studies
- No information on variability
- No information on efficacy
- Use effect size from similar studies or gather pilot data for estimation

Simplification
- What Is The Question?
- What Is The Primary Outcome Variable?
- What Is The Principal Hypothesis?

Additional Comments

Pilot Studies
- Complication rate: $P = 1 - (1 - r)^N$, where
  - P = probability of detecting at least one complication
  - r = complication rate
  - N = sample size
- If desired P and N are known, can solve for r
- If desired r and P are known, can solve for N

Example to Work
- Want to have 90% probability of detecting at least one complication, given a 25% complication rate. What N do you need?
- You are studying 25 people and want 80% probability of detecting at least one complication. What is the complication rate that would yield this probability?
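
Both exercises follow from rearranging P = 1 - (1 - r)^N. A minimal sketch, with hypothetical function names, assuming complications occur independently across subjects:

```python
from math import ceil, log


def n_for_detection(r, p):
    """Smallest N with probability >= p of seeing at least one complication,
    from N >= log(1 - p) / log(1 - r)."""
    return ceil(log(1 - p) / log(1 - r))


def rate_for_detection(n, p):
    """Complication rate r at which N subjects give probability p of seeing
    at least one complication: r = 1 - (1 - p)^(1/N)."""
    return 1 - (1 - p) ** (1 / n)


n_needed = n_for_detection(r=0.25, p=0.90)
r_needed = rate_for_detection(n=25, p=0.80)
print(n_needed, round(r_needed, 3))
```

Note that 8 subjects fall just short of 90% (1 - 0.75^8 is about 0.8999), so the answer rounds up to the next whole subject.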

Pilot Studies
- Use larger alpha (>0.05, e.g or 0.2) to compute sample size
  - If reject null hypothesis will test in future study
- Underlying concept: futility; ensure new treatment not worse than standard.

Pilot Studies
- Can reformulate hypothesis
  - H0: new treatment = placebo
  - Ha: new treatment < placebo
  - Continue to larger study if fail to reject H0.

Avoid Data Driven Comparisons
- Test here

Randomization: Bias Due to Order of Observations
- Learning effect
- Change in laboratory techniques
- Different litters
- Carry-over effects
  - underestimate carry-over
  - two treatments, same animal
  - give A & B; can only test effect of B after A
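
One way to guard against order effects when each animal receives both treatments is to randomize the order per animal while keeping the two orders balanced. A minimal sketch; the animal IDs, the seed, and the function name are hypothetical, and this does not remove carry-over, it only prevents order from being confounded with treatment.

```python
import random


def randomize_orders(animal_ids, seed=None):
    """Assign each animal the order 'AB' or 'BA' at random, with the two
    orders as balanced as the number of animals allows."""
    rng = random.Random(seed)
    orders = ["AB", "BA"] * (len(animal_ids) // 2)
    if len(animal_ids) % 2:  # odd count: one extra order chosen at random
        orders.append(rng.choice(["AB", "BA"]))
    rng.shuffle(orders)
    return dict(zip(animal_ids, orders))


animals = [f"animal_{i}" for i in range(1, 9)]
assignment = randomize_orders(animals, seed=1)
for animal, order in assignment.items():
    print(animal, order)
```

Fixing the seed makes the allocation reproducible for the study record; the same approach extends to randomizing the order in which animals or samples are processed.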

Randomization: Order Effects Continued
- System fatigue
  - rabbit heart's ability to function after two different treatments

Randomization: Order Effects Continued
- Seasonal variability
  - All rats male, same weight, same age; media temperature and other incubation conditions identical; housed in identical conditions
  - Outcome: unstimulated renin release from kidneys (in vitro), samples at 30 minutes
  - Outcome: metastasis, winter 16% (n=767); summer 8% (n=142); logistic regression p<0.03 for season