Sample size and study design

Slides:



Advertisements
Similar presentations
1 COMM 301: Empirical Research in Communication Lecture 15 – Hypothesis Testing Kwan M Lee.
Advertisements

Anthony Greene1 Simple Hypothesis Testing Detecting Statistical Differences In The Simplest Case:  and  are both known I The Logic of Hypothesis Testing:
Inference Sampling distributions Hypothesis testing.
Our goal is to assess the evidence provided by the data in favor of some claim about the population. Section 6.2Tests of Significance.
EPIDEMIOLOGY AND BIOSTATISTICS DEPT Esimating Population Value with Hypothesis Testing.
An Inference Procedure
Evaluating Hypotheses
1 The Basics of Regression Regression is a statistical technique that can ultimately be used for forecasting.
1 Hypothesis Testing In this section I want to review a few things and then introduce hypothesis testing.
Statistics for the Social Sciences Psychology 340 Fall 2006 Hypothesis testing.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Basic Business Statistics.
Hypothesis Tests for Means The context “Statistical significance” Hypothesis tests and confidence intervals The steps Hypothesis Test statistic Distribution.
Statistics for the Social Sciences Psychology 340 Spring 2005 Hypothesis testing.
Statistics for the Social Sciences Psychology 340 Fall 2006 Hypothesis testing.
Inferences About Process Quality
Chapter 9 Hypothesis Testing.
BCOR 1020 Business Statistics
Statistics for Managers Using Microsoft® Excel 5th Edition
Sample Size Determination Ziad Taib March 7, 2014.
Statistical considerations for grants Brian Healy.
Probability Distributions and Test of Hypothesis Ka-Lok Ng Dept. of Bioinformatics Asia University.
Overview Definition Hypothesis
Confidence Intervals and Hypothesis Testing - II
Introduction to Biostatistics and Bioinformatics
Tuesday, September 10, 2013 Introduction to hypothesis testing.
Fundamentals of Hypothesis Testing: One-Sample Tests
Section #4 October 30 th Old: Review the Midterm & old concepts 1.New: Case II t-Tests (Chapter 11)
Inference in practice BPS chapter 16 © 2006 W.H. Freeman and Company.
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
Week 8 Fundamentals of Hypothesis Testing: One-Sample Tests
June 18, 2008Stat Lecture 11 - Confidence Intervals 1 Introduction to Inference Sampling Distributions, Confidence Intervals and Hypothesis Testing.
1 Today Null and alternative hypotheses 1- and 2-tailed tests Regions of rejection Sampling distributions The Central Limit Theorem Standard errors z-tests.
CHAPTER 16: Inference in Practice. Chapter 16 Concepts 2  Conditions for Inference in Practice  Cautions About Confidence Intervals  Cautions About.
Jan 17,  Hypothesis, Null hypothesis Research question Null is the hypothesis of “no relationship”  Normal Distribution Bell curve Standard normal.
1 Power and Sample Size in Testing One Mean. 2 Type I & Type II Error Type I Error: reject the null hypothesis when it is true. The probability of a Type.
1 CSI5388: Functional Elements of Statistics for Machine Learning Part I.
Elementary Statistical Methods André L. Souza, Ph.D. The University of Alabama Lecture 22 Statistical Power.
1 Statistical Inference Greg C Elvers. 2 Why Use Statistical Inference Whenever we collect data, we want our results to be true for the entire population.
Topics: Statistics & Experimental Design The Human Visual System Color Science Light Sources: Radiometry/Photometry Geometric Optics Tone-transfer Function.
10.2 Tests of Significance Use confidence intervals when the goal is to estimate the population parameter If the goal is to.
1 Lecture note 4 Hypothesis Testing Significant Difference ©
Hypothesis testing Summer Program Brian Healy. Last class Study design Study design –What is sampling variability? –How does our sample effect the questions.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Statistical Hypotheses & Hypothesis Testing. Statistical Hypotheses There are two types of statistical hypotheses. Null Hypothesis The null hypothesis,
Lecture 16 Section 8.1 Objectives: Testing Statistical Hypotheses − Stating hypotheses statements − Type I and II errors − Conducting a hypothesis test.
통계적 추론 (Statistical Inference) 삼성생명과학연구소 통계지원팀 김선우 1.
Statistical Inference Statistical Inference involves estimating a population parameter (mean) from a sample that is taken from the population. Inference.
Section 10.1 Confidence Intervals
Chapter 8 Delving Into The Use of Inference 8.1 Estimating with Confidence 8.2 Use and Abuse of Tests.
10.1: Confidence Intervals Falls under the topic of “Inference.” Inference means we are attempting to answer the question, “How good is our answer?” Mathematically:
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 8-1 Chapter 8 Fundamentals of Hypothesis Testing: One-Sample Tests Statistics.
Introduction to the Practice of Statistics Fifth Edition Chapter 6: Introduction to Inference Copyright © 2005 by W. H. Freeman and Company David S. Moore.
Analysis of Variance (ANOVA) Brian Healy, PhD BIO203.
Ch 10 – Intro To Inference 10.1: Estimating with Confidence 10.2 Tests of Significance 10.3 Making Sense of Statistical Significance 10.4 Inference as.
Copyright ©2013 Pearson Education, Inc. publishing as Prentice Hall 9-1 σ σ.
Issues concerning the interpretation of statistical significance tests.
Fall 2002Biostat Statistical Inference - Proportions One sample Confidence intervals Hypothesis tests Two Sample Confidence intervals Hypothesis.
Stats Lunch: Day 3 The Basis of Hypothesis Testing w/ Parametric Statistics.
Inen 460 Lecture 2. Estimation (ch. 6,7) and Hypothesis Testing (ch.8) Two Important Aspects of Statistical Inference Point Estimation – Estimate an unknown.
Business Statistics for Managerial Decision Farideh Dehkordi-Vakil.
© Copyright McGraw-Hill 2004
Statistical Techniques
Education 793 Class Notes Inference and Hypothesis Testing Using the Normal Distribution 8 October 2003.
Hypothesis test flow chart
Uncertainty and confidence Although the sample mean,, is a unique number for any particular sample, if you pick a different sample you will probably get.
BIOL 582 Lecture Set 2 Inferential Statistics, Hypotheses, and Resampling.
Hypothesis Testing and Statistical Significance
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
Central Limit Theorem, z-tests, & t-tests
Presentation transcript:

Sample size and study design Brian Healy, PhD

Comments from last time We did not cover confounding Too much in one class/Not enough examples/Superficial level I wanted to show one example for each type of analysis so that you can determine what your data matches. This way you can speak to a statistician knowing the basic ideas. My hope was for you to feel confident enough to learn more about the topics relevant to you Worked example lectures This is not basic biostatistics I did Teach for America

Objectives Type II error How to improve power? Sample size calculation Study design considerations

Review Previous classes we have focused on data analysis AFTER data collection Hypothesis testing allowed us to determine whether there was a statistically significant: Difference between groups Association between two continuous factors Association between two dichotomous factors

Example We know that the heart rate for healthy adult is 80 beats per minute and this has an approximately normal distribution (according to my wife) Some elite athletes, like Lance Armstrong, have lower heart rate, but it is not known if this is true on average How could we address this question?

Experimental design One way to do this is to collect a sample of normal controls and a sample of elite athletes and compare their mean What test would you use? Another way is to collect a sample of elite athletes and compare their mean to the known population mean This is a one sample test Null hypothesis: meanelite=80

Question How large a sample of elite athletes should I collect? What is the benefit of having a large sample size? More information More accurate estimate of the population mean What is the disadvantage of a large sample size? Cost Effort required to collect What is the “correct” sample size?

Effect of sample size Let’s say we wanted to estimate the blood pressure of people at MGH If we sampled 3 people, would we have a good estimate of the population mean? How much will sample mean vary from sample to sample? Does our estimate of the improve if we sampled 30 people? Would the sample mean to vary more or less from sample to sample? What about 300 people?

Simulation http://onlinestatbook.com/stat_sim/sampling_dist/index.html What is the shape of the distribution of sample means? Where is the curve centered? What happens to curve as sample size increases? Technical: Central limit theorem

Standard error of the mean There are two measures of spread in the data Standard deviation: measure of spread of the individual observations The estimate of this is the standard deviation of the observations: Standard error: standard deviation of the sample mean The estimate of this is the standard deviation of the observations divided by the sample size

Technical: Distribution of sample mean under the null If we took repeated samples and calculated the sample mean, the distribution of the sample means would have a distribution Spread in distribution is based on standard error Mean of distribution=80

Type I error We could plot the distribution of the sample means under the null before collecting data Type I error is the probability that you reject the null given that the null is true a = P(reject H0 | H0 is true) Notice that the shaded area is still part of the null curve, but it is in the tail of the distribution a

Hypothesis test-review After data collection, we can calculate the p-value If the p-value is less than the pre-specified a-level, we reject the null hypothesis

As the sample size increases, the standard error decreases p-value is based on the standard error As you sample size increases, the p-value decreases if the mean and standard deviation do not change With an extremely large sample, a very small departure from the null is statistically significant What would you think if you found the sample mean heart rate of three elite athletes was 70 beats per minute? Do your thoughts change if you sampled 300 athletes and found the same sample mean?

How much data should we collect? Depends on several factors: Type I error Type II error (power) Difference we are trying to detect (null and alternative hypotheses) Standard deviation Remember this is decided BEFORE the study!!!

Type II error Definition: when you fail to reject the null hypothesis when the alternative is in fact true (type II error) This type of error is based on a specific alternative b= P(fail to reject the H0 | HA is true)

Power Definition: the probability that you reject the null hypothesis given that the alternative hypothesis is true. This is what we want to happen. Power = P(reject Ho | HA is true) = 1 - b Since this is a good thing, we want this to be high

Fail to reject H0 Reject Ho This is the population distribution under the null hypothesis The location of the curve is m0 and the spread in the curve is the standard error This is the cut-off value. This is the population distribution under the alternative hypothesis m0 m1

Fail to reject H0 Reject Ho a = P(reject H0| H0 is true) m0 Power = P(reject H0| HA is true) = P(fail to reject H0| HA is true) m1

Life is a trade off These two errors are related We usually assume that the type I error is 0.05 and calculate the type II error for a specific alternative If you are want to be more strict and falsely reject the null only 1% of the time (a=0.01), the chance of a type II error increases Sensitivity/specificity or false positive/false negative

Changing the power Note how the power (green) increases as you increase the difference between the null and alternative hypotheses How else do you think we could increase the power?

Another way to increase power is to increase type I error rate Two other ways to increase power involve changing the shape of the distribution Increasing the sample size When the sample size increases, the curve for the sample means tightens Decreasing the variability in the population When there is less variability, the curve for the sample means also tightens

Example For our study, we know that we can enroll 40 elite athletes. We also know that the population mean is 80 beats per minute and the standard deviation is 20 We believe the elite athletes will have a mean of 70 beats per minute How much power would we have to detect this difference at the two-sided 0.05 level? All this information fully defined our curves

Using STATA, we find that we have 88 Using STATA, we find that we have 88.5% power to detect the difference of 10 beats per minute between the groups at the two-sided 0.05 level using a one sample z-test Question: If we were able to enroll more subjects would our power increase or decrease?

Conclusions For a specific sample size, standard deviation, difference between the means and type I error, we can calculate the power Changing any of the four parameters above will change the power Some under the control of the investigator, but others are not

Sample size Up to now we have shown how to find the power given a specific sample size, difference between the means, standard deviation and alpha level. We can vary any four of these five factors and find the fifth. Usually the alpha level is required to be two-sided 0.05 How can we calculate the sample size for specific values of the remaining parameters?

Two approaches to sample size Hypothesis testing When you have a specific null AND alternative hypothesis in mind Confidence interval When you want to place an interval around an estimate

Hypothesis testing approach State null and alternative hypothesis Null usually pretty easy Alternative is more difficult, but very important State standard deviation of outcome State desired power and alpha level Power=0.8 Alpha=0.05 for two-sided test State test Use statistical package to calculate sample size

We know the location of the null and alternative curves, but we do not know the shape because the sample size determines the shape. We need to find the sample size that will give the curves the shape so that the a level and power equal the specified values. Alpha=0.025 Power=0.8 Beta=0.2

General form of sample size calculation Here is the general form of the normal sample size One-sided Two-sided Standard deviation Related to Type I error Sample size Mean under null and alternative Related to Type II error

Hypothesis testing approach State null and alternative hypothesis H0: m0=80 HA: m1=70 sd=20 State desired power and alpha level Power=0.8 Alpha=0.05 for two-sided test State test: z-test n=31.36  n=32

Example-more complex In a recently submitted grant, we investigated the sample size required to detect a difference between RRMS and SPMS patients in terms of levels of a marker Preliminary data: RRMS: mean level=0.54 +/- 0.37 SPMS: mean level=0.94 +/- 0.42

Hypothesis testing approach State null and alternative hypothesis H0: meanRRMS=meanSPMS=0.54 HA: meanRRMS=0.54, meanSPMS=0.94, Difference between groups=0.4 sdRRMS=0.37, sdSPMS=0.42 State desired power and alpha level Power=0.8 Alpha=0.05 for two-sided test State test: t-test

Results Use these values in statistical package 17 samples from each group are required Website: http://hedwig.mgh.harvard.edu/sample_size/size.html

Statistical considerations for grant “Group sample sizes of 17 and 17 achieve at least 80% power to detect a difference of -0.400 between the null hypothesis that both group means are 0.540 and the alternative hypothesis that the mean of group 2 is 0.940 with estimated group standard deviations of 0.370 and 0.420 and with a significance level (alpha) of 0.05 using a two-sided two-sample t-test.”

Technical remarks So we have shown that we can calculate the power for a given sample size and sample size for a given power. We can also change the clinically meaningful difference if we set the sample size and power. In many grant applications, we show the power for a variety of sample sizes and differences in the means in a table so that the grant reviewer can see that there is sufficient power to detect a range of differences with the proposed sample size.

Confidence interval approach If we do not have a set alternative, we can choose the sample size based on how close to the truth we want to get In particular we choose the sample size so that the confidence interval is of a certain width

Under a normal distribution, the confidence interval for a single sample mean is We can choose the sample size to provide the specified width of the confidence interval

Conclusions Sample size can be calculated if the power, alpha level, difference between the groups and standard deviation are specified For more complex setting than those presented here, statisticians have worked out the sample size calculations, but still need estimates of the hypothesized difference and variability in the data

Study design

Reasons for differences between groups Actual effect-when there is a difference between the two groups (ex. the treatment has an effect) Chance Bias Confounding

Chance When we run a study, we can only take a sample of the population. Our conclusions are based on the sample we have drawn. Just by chance, sometimes we can draw an extreme sample from the population. If we had taken a different sample, we may have drawn different conclusions. We call this sampling variability.

Note on variability Even though your experiments are well controlled, not all subjects will behave exactly the same This is true for almost all experiments If all animals acted EXACTLY the same, we would only need one animal Since one is not enough, we observe a group of mice We call this our sample Based on our sample, we draw a conclusion regarding the entire population

Study design considerations Null hypothesis Outcome variable Explanatory variable Sources of variability Experimental unit Potential correlation Analysis plan Sample size

Example We start with a single group (ex. Genetically identical mice) The group are broken into 3 groups that are treated with 3 different interventions An outcome is measured in each individual Questions: What analysis should we do? What is the effect of starting from the same population? Do we need to account for repeated measures?

Original group Condition 1 Condition 3 Condition 2

Generalizability Assume that we have found a difference between our exposure and control group and we have shown that this result is not likely due to chance, bias or confounding. What does this mean for the general population? Specifically, to which group can we apply our results? This is often based on how the sample was originally collected.

Example 2 We want to compare the expression of a marker in patients vs. controls Full sample size is 288 samples Can only run 24 samples (1 plate) per day Questions: What types of analysis should we do? Can we combine across the plates? Could other confounders be important to collect?

Plate 1: 10 patients, 14 controls Estimate of difference in this plate Plate 2: 14 patients, 10 controls Estimate of difference in this plate Plate 3: 12 patients, 12 controls Estimate of difference in this plate We can test if there is a different effect in each plate by investigating the interaction

Example 3 We want to compare the expression of 6 markers We measure the six markers in 5 mice Questions: What types of analysis should we do? How many independent groups do we have? What is the null hypothesis?

Example 4 “In our experiments, we collect 3 measurements. If it is significant, we call it a day. If it is close to significant, we measure 1 more animal” Question: Is this valid? Always more statistically valid if the number is specified BEFORE the experiment

Spreadsheet formation What to collect Everything that might be important for the analysis Plate Batch Technician All potential sources of variability All potential confounders Most accurate version of this you can If it is continuous, collect it as such. Can always dichotomize later

Spreadsheet formation Easiest to move to a statistical package if One row per measurement One column for the outcome, each predictor and potential confounders No open space

Conclusions Sample size for experiment must be considered BEFORE collecting data Can improve power by reducing standard deviation, increasing sample size or increasing difference between groups Important to consider study design as you develop your analysis plan