Sampling for an Effectiveness Study or “How to reject your most hated hypothesis” Mead Over, Center for Global Development and Sergio Bautista, INSP Male.

Slides:



Advertisements
Similar presentations
Designing an impact evaluation: Randomization, statistical power, and some more fun…
Advertisements

Inferential Statistics
Statistics Review – Part II Topics: – Hypothesis Testing – Paired Tests – Tests of variability 1.
Statistics.  Statistically significant– When the P-value falls below the alpha level, we say that the tests is “statistically significant” at the alpha.
1 Hypothesis Testing Chapter 8 of Howell How do we know when we can generalize our research findings? External validity must be good must have statistical.
Chapter 10 Section 2 Hypothesis Tests for a Population Mean
Statistical Techniques I EXST7005 Lets go Power and Types of Errors.
EPIDEMIOLOGY AND BIOSTATISTICS DEPT Esimating Population Value with Hypothesis Testing.
Evaluating Hypotheses Chapter 9. Descriptive vs. Inferential Statistics n Descriptive l quantitative descriptions of characteristics.
Evaluating Hypotheses Chapter 9 Homework: 1-9. Descriptive vs. Inferential Statistics n Descriptive l quantitative descriptions of characteristics ~
Independent Sample T-test Often used with experimental designs N subjects are randomly assigned to two groups (Control * Treatment). After treatment, the.
Sample Size Determination In the Context of Hypothesis Testing
Sampling Designs and Techniques
Today Concepts underlying inferential statistics
Using Statistics in Research Psych 231: Research Methods in Psychology.
Agenda: Block Watch: Random Assignment, Outcomes, and indicators Issues in Impact and Random Assignment: Youth Transition Demonstration –Who is randomized?
Impact Evaluation Session VII Sampling and Power Jishnu Das November 2006.
SAMPLING AND STATISTICAL POWER Erich Battistin Kinnon Scott Erich Battistin Kinnon Scott University of Padua DECRG, World Bank University of Padua DECRG,
Introduction to Hypothesis Testing for μ Research Problem: Infant Touch Intervention Designed to increase child growth/weight Weight at age 2: Known population:
Chapter 8 Hypothesis testing 1. ▪Along with estimation, hypothesis testing is one of the major fields of statistical inference ▪In estimation, we: –don’t.
Section 9.1 Introduction to Statistical Tests 9.1 / 1 Hypothesis testing is used to make decisions concerning the value of a parameter.
COLLECTING QUANTITATIVE DATA: Sampling and Data collection
LOT QUALITY ASSURANCE SAMPLING (LQAS). What is LQAS A sampling method that:  Is simple, in-expensive, and probabilistic.  Combines two standard statistical.
Testing Hypotheses Tuesday, October 28. Objectives: Understand the logic of hypothesis testing and following related concepts Sidedness of a test (left-,
Chapter 1: Introduction to Statistics
Evidence Based Medicine
Povertyactionlab.org Planning Sample Size for Randomized Evaluations Esther Duflo MIT and Poverty Action Lab.
Sample size determination Nick Barrowman, PhD Senior Statistician Clinical Research Unit, CHEO Research Institute March 29, 2010.
F OUNDATIONS OF S TATISTICAL I NFERENCE. D EFINITIONS Statistical inference is the process of reaching conclusions about characteristics of an entire.
1 Power and Sample Size in Testing One Mean. 2 Type I & Type II Error Type I Error: reject the null hypothesis when it is true. The probability of a Type.
Understanding the Variability of Your Data: Dependent Variable Two "Sources" of Variability in DV (Response Variable) –Independent (Predictor/Explanatory)
Hypothesis Testing: One Sample Cases. Outline: – The logic of hypothesis testing – The Five-Step Model – Hypothesis testing for single sample means (z.
Chapter 8 Introduction to Hypothesis Testing
STA Statistical Inference
Step 3 of the Data Analysis Plan Confirm what the data reveal: Inferential statistics All this information is in Chapters 11 & 12 of text.
Maximum Likelihood Estimator of Proportion Let {s 1,s 2,…,s n } be a set of independent outcomes from a Bernoulli experiment with unknown probability.
1 If we live with a deep sense of gratitude, our life will be greatly embellished.
통계적 추론 (Statistical Inference) 삼성생명과학연구소 통계지원팀 김선우 1.
KNR 445 Statistics t-tests Slide 1 Introduction to Hypothesis Testing The z-test.
14 Statistical Testing of Differences and Relationships.
Introduction to hypothesis testing Hypothesis testing is about making decisions Is a hypothesis true or false? Ex. Are women paid less, on average, than.
URBDP 591 I Lecture 4: Research Question Objectives How do we define a research question? What is a testable hypothesis? How do we test an hypothesis?
Sampling and Statistical Analysis for Decision Making A. A. Elimam College of Business San Francisco State University.
Hypothesis Testing Introduction to Statistics Chapter 8 Feb 24-26, 2009 Classes #12-13.
Statistical Techniques
AP Statistics Chapter 11 Notes. Significance Test & Hypothesis Significance test: a formal procedure for comparing observed data with a hypothesis whose.
Results: How to interpret and report statistical findings Today’s agenda: 1)A bit about statistical inference, as it is commonly described in scientific.
Chapter 8: Introduction to Hypothesis Testing. Hypothesis Testing A hypothesis test is a statistical method that uses sample data to evaluate a hypothesis.
What IE is not and is not Male Circumcision Impact Evaluation Meeting Johannesburg, South Africa January 18-23, 2010 Nancy Padian UC Berkeley.
Hypothesis Testing. Statistical Inference – dealing with parameter and model uncertainty  Confidence Intervals (credible intervals)  Hypothesis Tests.
Section 9.1 First Day The idea of a significance test What is a p-value?
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
IMPACT EVALUATION PBAF 526 Class 5, October 31, 2011.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 21 More About Tests and Intervals.
Introduction to Inference
Statistical Core Didactic
Unit 5: Hypothesis Testing
Understanding Results
Hypothesis Testing and Confidence Intervals (Part 1): Using the Standard Normal Lecture 8 Justin Kern October 10 and 12, 2017.
Hypothesis Testing: Hypotheses
More about Tests and Intervals
Statistical Process Control
Power, Sample Size, & Effect Size:
Significance Tests: The Basics
Sampling and Power Slides by Jishnu Das.
Impact Evaluation Designs for Male Circumcision
Statistical significance using p-value
Sample Sizes for IE Power Calculations.
AP STATISTICS LESSON 10 – 4 (DAY 2)
Nancy Padian UC Berkeley
Presentation transcript:

Sampling for an Effectiveness Study or “How to reject your most hated hypothesis” Mead Over, Center for Global Development and Sergio Bautista, INSP Male Circumcision Evaluation Workshop and Operations Meeting January 18-23, 2010 Johannesburg, South Africa

Outline Why are sampling and statistical power important to policymakers? Sampling and power for efficacy evaluation Sampling and power for effectiveness evaluation The impact of clustering on power and costs Major cost drivers Conclusions

Why are sampling and statistical power important to policymakers? Because they are the tools you can use to reject the claims of skeptics

What claims will skeptics make about MC rollout? They might say: ‘Circumcision has no impact’ ‘Circumcision has too little impact’ ‘Intensive Circumcision Program has no more impact than Routine Circumcision Program’ ‘Circumcision has no benefit for women’ Which of these do you hate the most?

So make sure the researchers design MC rollout so that you will have the evidence to reject your most hated hypothesis when it is false If it turns out to be true, you will get the news before the skeptics and can alter the program accordingly.

Hypotheses to reject Circumcision has no impact Circumcision has too little impact Intensive Circumcision Program has no more impact than Routine Circumcision Program Circumcision has no benefit for women

Efficacy Evaluation

Hypothesis to reject Circumcision has no impact Circumcision has too little impact Intensive Circumcision Program has no more impact than Routine Circumcision Program Circumcision has no benefit for women

Statistical power in the context of efficacy evaluation Objective: To reject the hypothesis of “no impact” in a relatively pure setting, where the intervention has the best chance of succeeding – to show “proof of concept”. In this context, statistical power can be loosely defined as the probability that you find a benefit of male circumcision when there really is a benefit.

Statistical power is the ability to reject the hated hypothesis that MC doesn’t work when it really does True state of the world MC does not change HIV incidence MC reduces HIV incidence Ho = MC does not reduce HIV incidence This column represents MC really working

Statistical power is the ability to reject the hated hypothesis that MC doesn’t work when it really does True state of the world MC does not change HIV incidence MC reduces HIV incidence Estimate MC does not change HIV incidence MC reduces HIV incidence Ho = MC does not reduce HIV incidence This row represents the evaluation finding that MC is working

Statistical power is the ability to reject the hated hypothesis that MC doesn’t work when it really does True state of the world MC does not change HIV incidence MC reduces HIV incidence Estimate MC does not change HIV incidence MC reduces HIV incidence Correct rejection of Ho Ho = MC does not reduce HIV incidence We believe MC works We hope evaluation will confirm that it works If MC works, we want to maximize the chance that evaluation says it works

Statistical power is the ability to reject the hated hypothesis that MC doesn’t work when it really does True state of the world MC does not change HIV incidence MC reduces HIV incidence Estimate MC does not change HIV incidence Correct acceptance of Ho MC reduces HIV incidence Correct rejection of Ho Ho = MC does not reduce HIV incidence But we’re willing to accept bad news, if it’s true

There are two types of error that we want to avoid True state of the world MC does not change HIV incidence MC reduces HIV incidence Estimate MC does not change HIV incidence Correct acceptance of Ho MC reduces HIV incidence Type I Error “False positive” Correct rejection of Ho Ho = MC does not reduce HIV incidence Evaluation says MC works when it doesn’t

There are two types of error that we want to avoid True state of the world MC does not change HIV incidence MC reduces HIV incidence Estimate MC does not change HIV incidence Correct acceptance of Ho Type II Error “False Negative” MC reduces HIV incidence Type I Error “False positive” Correct rejection of Ho Ho = MC does not reduce HIV incidence Evaluation says MC doesn’t work when it really does

Statistical power is the chance that we reject the hated hypothesis when it is false True state of the world MC does not change HIV incidence MC reduces HIV incidence Estimate MC does not change HIV incidence Correct acceptance of Ho Type II Error “False Negative” MC reduces HIV incidence Type I Error “False positive” Correct rejection of Ho Ho = MC does not reduce HIV incidence Power: probability that you reject “no impact” when there really is impact

Confidence, power, and two types of mistakes Confidence describes the test’s ability to minimize type-I errors (false positives) Power describes the test’s ability to minimize type-II errors (false negatives) Convention is to be more concerned with type-I than type-II errors – (ie, more willing to mistakenly say that something didn’t work when it actually did, than to say that something worked when it actually didn’t) We usually want confidence to be 90 – 95%, but will settle for power of 80 – 90%

Power As power increases, the chances of saying “no impact” when in reality there is a positive impact, decline Power analysis can be used to calculate the minimum sample size required to accept the outcome of a statistical test with a particular level of confidence

The problem Sample Size Time of Experiment 1 person, 1 year All men in the country, 20 years Impact? Impact!

The problem Sample Size Time of Experiment

The problem In principle, we would like: – The minimum sample size – The minimum observational time – The maximum power So we are confident enough about the impact we find, at minimum cost

The problem Sample Size Time of Experiment Not enough confidence

The problem Sample Size Time of Experiment Enough confidence

The problem Sample Size Time of Experiment

The problem Sample Size Time of Experiment Minimum Sample Size Time Constraint (usually external)

Things that increase power More person-years – More persons – More years Greater difference between control and treatment – Control group has large HIV incidence – Intervention greatly reduces HIV incidence Good cluster design: Get the most information out of each observed person-year – Increase number of clusters – Minimize intra-cluster correlation

Power is higher with larger incidence in the control group or greater effect

Gaining Precision Person-Years Effectiveness (% reduction In HIV incidence) N in efficacy trial Precision we got Estimated Average

With more person-years, we can narrow in to find the “real” effect Person-Years Effectiveness (% reduction In HIV incidence) REAL N in efficacy trial Could be even this

The “real” effect might be higher than in efficacy trials … Person-Years Effectiveness (% reduction In HIV incidence) REAL N in efficacy trial

… or the “real” effect might be lower Person-Years Effectiveness (% reduction In HIV incidence) REAL N in efficacy trial

1,000 efficacy studies when the control group incidence is 1% Power is 68%

1,000 efficacy studies when the control group incidence is 5% Power is 85%

Sampling for efficacy Population of interest: HIV negative men Relevant characteristics Sample: Inclusion Criteria Respondents Controls Treatment

Effectiveness Evaluation

Hypothesis to reject Circumcision has no impact Circumcision has too little impact Intensive Circumcision Program has no more impact than Routine Circumcision Program Circumcision has no benefit for women

What level of impact do you want to reject in an effectiveness study? For a national rollout, you want the impact to be a lot better than zero! What’s the minimum impact that your constituency will accept? What’s the minimum impact that will make the intervention cost-effective?

The “Male Circumcision Decisionmaker’s Tool” is available online at:

Using the “MC Decisionmaker’s Tool,” let’s compare 60% effect … Suppose the effect is 60 % ???

To a 20 % effect. Suppose the effect is only 20 % ???

Less effective intervention means less reduction in incidence 60% Effectiveness20% Effectiveness

Less effective intervention means less cost-effective 60% Effectiveness 20% Effectiveness At 20 % effectiveness, MC costs about $5,000 per HIV infection averted in example country

Hypothesis to reject in effectiveness evaluation of MC Circumcision has no impact Circumcision has too little impact Intensive Circumcision Program has no more impact than Routine Circumcision Program Circumcision has no benefit for women

Differences between effectiveness and efficacy that affect sampling Main effect on HIV incidence in HIV- men – Null hypothesis: impact > 0 (+) – Effect size because of standard of care (+) Investigate determinants of effectiveness – Supply side (+ / -) – Demand side (+ / -) Investigate impact on secondary outcomes and their determinants (+ / -) Seek “external validity” on effectiveness issues

Sample size must be larger to show that the effect is at least 20 Person-Years Effectiveness (% reduction In HIV incidence) N to be able to reject <20

Sampling for effectiveness Population of interest: HIV negative men Relevant characteristics Sampling frame: All men (+ and -) Sample Respondents Control Treatment

Two levels of effectiveness Person-Years Effectiveness (% reduction In HIV incidence) REAL 2 N to detect difference between 1 and 2 REAL 1

Sampling for effectiveness Population of interest: HIV negative men Relevant characteristics Sampling frame: All men (+ and -) Sample Respondents Control Intensity 1 Intensity 2

Sampling methods for effectiveness evaluation Probability sampling – Simple random: each unit in the sampling frame has the same probability of being selected into the sample – Stratified: first divide the sampling frame into strata (groups, blocks), then do a simple random sample within each strata – Clustered: sample clusters of units. Eg. villages with all the persons that live there One stage: Random sample of villages, then survey all men in selected villages Two stage: Random sample of villages, then random sample of men in selected villages

Sampling (→ Representative data) Representative surveys – Goal: learning about an entire population Ex. LSMS/ national household survey – Sample: representative of the national population Impact evaluation – Goal: measuring changes in key indicators for the target population that are caused by an intervention – In practice: measuring the difference in indicators between treatment and control groups – We sample strategically in order to have a representative sample in the treatment and control groups – Which is not necessarily the same as a representative sample of the national population

Cluster Sampling Design

Cluster Sampling In some situations, individual random samples are not feasible – When interventions are delivered at the facility/community level – When constructing a frame of the observation units may be difficult, expensive, or even impossible Customers of a store Birds in a region – When is of interest to identify community level impact – When budget constraints don’t allow it M.K. Campbell et al. Computers in Biology and Medicine 34 (2004) 113 – 125

Clustering and sample size Clustering reduces efficiency of the design – Standard sample size calculation for individual- based studies only accommodate for variation between individuals – In cluster studies, there are two components of variation Variation among individuals within clusters Variation in outcome between clusters

Clustering and sample size Individual-based studies assume independence of outcomes among individuals In cluster randomization: – Individuals within a cluster are more likely to be similar Measure of this intracluster dependence among individuals is ICC – Based in within-cluster variance High when individuals in cluster are more “similar” Not taking ICC into account may lead to under- powered study (too small sample)

Taking ICC into account In a cluster randomized design, in order to achieve the equivalent power of an individual random study, the sample size must be inflated by a factor called a “design effect”: Deff = 1 + (ñ – 1) ρ to consider cluster effect ñ = average cluster size ρ = ICC Assuming clusters of similar size

How big is the impact of cluster design on sample size Person-Years Effectiveness (% reduction In HIV incidence) 20 At a given number of person-years With 6 clusters, 20 cannot be rejected With12 clusters, 20 is rejected

When 19,950 individuals are in 15 clusters Power is 60 %

When 19,950 individuals are in 150 clusters Power is 97 %

Increasing number of clusters vs increasing number of individuals per cluster Increasing the number of clusters has a much stronger effect on power and confidence – Intuitively, the sample is the number of units (clusters) at the level where the random assignment takes place. It is not the same as the number of people surveyed Challenge is to engineer the logistics to maximize the number of clusters, given budget

How big is the impact of cluster design on sample size N per cluster Power 20 clusters 50 clusters 100 clusters

Major cost drivers

Things that affect costs in an evaluation of MC effectiveness Including HIV positive men Including women Prevalence of HIV Length of questionnaire – To measure more outcomes – To measure implementation of intervention and costs For cost-effectiveness For control quality and other characteristics of the intervention

Sampling for effectiveness Population of interest: HIV negative men Relevant characteristics Sampling frame: All men (+ and -) Sample Control Intensity 1 Intensity 2

Some Scenarios 150 clusters, 100 men per cluster Including women  double number of HIV tests Low and High prevalence  additional men to be surveyed High, medium, low cost – Dispersion of clusters  distance among them – Length of questionnaire  time in fieldwork, data collection staff

Conclusions Philosophy of sample design is different for efficacy and effectiveness studies – Efficacy: narrow & deep – Effectiveness: broad & shallow Many of the special requirements of effectiveness sampling will increase sample size Clustering reduces data collection costs, but at a sacrifice of power Survey costs also affected by: – Number of indicators collected – Number of non-index cases interviewed Most cost-effective way to reject your “hated hypothesis” is through randomized, efficiently powered, sampling