Impact Evaluation
Session VII: Sampling and Power
Jishnu Das
November 2006
Slide 2: Sample Selection in Evaluation
- Population-based representative surveys:
  - Sample is representative of the whole population
  - Good for learning about the population
  - Not always the most efficient design for impact evaluation
- Sampling for impact evaluation:
  - Balance between treatment and control groups
  - Power for statistical inference on the groups of interest
  - Concentrate the sample strategically
- Survey budget as a major consideration:
  - In practice, sample size is often set by the budget
  - Concentrate the sample on key populations to increase power
Slide 3: Purposive Sampling
- Risk: we systematically bias our sample, so results do not generalize to the rest of the population or to other sub-groups
- Trade-off between power within the population of interest and representation of the full population
- Results are internally valid, but not generalizable
Slide 4: Survey Sampling
- Population: all cases of interest
- Sampling frame: list of all potential cases
- Sample: cases selected for analysis
- Sampling method: technique for selecting cases from the sampling frame
- Sampling fraction: proportion of the population selected for the sample (n/N)
Slide 5: Sampling Frame
- Simple sampling
- Stratified sampling
- Cluster sampling
Slide 6: Sampling Methods
- Random sampling
- Systematic sampling
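As an illustration of the designs on slides 5 and 6 (not part of the original slides), here is a minimal Python sketch that draws simple random, systematic, stratified, and cluster samples from a hypothetical sampling frame; the frame, variable names, and sample sizes are all made up.

```python
# Illustrative sketch (not from the slides): drawing samples from a
# hypothetical frame of 1,000 households spread over 50 villages and 3 regions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame({
    "household_id": range(1000),
    "village": rng.integers(0, 50, 1000),                        # cluster id
    "region": rng.choice(["North", "Central", "South"], 1000),   # stratum
})
n = 100  # target sample size (hypothetical)

# Simple random sample: every case has the same selection probability.
simple = frame.sample(n=n, random_state=0)

# Systematic sample: every k-th case from a random start.
k = len(frame) // n
start = int(rng.integers(0, k))
systematic = frame.iloc[start::k]

# Stratified sample: a simple random sample within each region.
stratified = frame.groupby("region", group_keys=False).apply(
    lambda g: g.sample(frac=n / len(frame), random_state=0))

# Cluster sample: select 10 whole villages, then keep all their households.
villages = rng.choice(frame["village"].unique(), size=10, replace=False)
cluster = frame[frame["village"].isin(villages)]
```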
Slide 7: The Design Effect in Clustering
- Necessary to take into account when samples are clustered
Slide 8: Intra-cluster Correlation (ρ)
- DEFF depends on the size of the cluster and on the intra-cluster correlation
- ρ measures the degree of homogeneity within a cluster, and is called the "intra-cluster" correlation
Slide 9: Sample Size
- The necessary sample size will be larger in clustered samples
- But you need some idea of the intra-cluster correlation coefficient to arrive at this number!
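In standard notation (a common formulation assuming equal cluster sizes; the exact expression on the original slide is not recoverable), the design effect and the implied sample-size inflation are

\[
\mathrm{DEFF} = 1 + (m - 1)\,\rho,
\qquad
n_{\text{clustered}} = \mathrm{DEFF} \times n_{\text{SRS}},
\]

where m is the number of observations per cluster, ρ is the intra-cluster correlation, and n_SRS is the sample size a simple random sample would need. For example, with m = 20 and ρ = 0.1, DEFF = 1 + 19(0.1) = 2.9, so the clustered sample must be nearly three times as large.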
Slide 10: Power Calculations
- Test the significance of a null hypothesis; for example, whether two means are different
Slide 11: Type I and Type II Errors
- Type I error = α = significance level
- Type II error = β
- Power = 1 − β
Slide 12: Type I and Type II Errors
- Type I error: rejecting the null hypothesis when it is true
  - Significance level: the probability of rejecting the null when it is true (probability of a Type I error)
- Type II error: accepting (failing to reject) the null hypothesis when it is false
  - Power: the probability of rejecting the null when the alternative hypothesis is true (1 − probability of a Type II error)
- We want to minimize both types of errors: increase the sample size
Slide 13: Type I and Type II Errors
- Type I error: the probability that you conclude the intervention had an effect when it actually did not
- Type II error: the probability that you conclude the intervention had no effect when it actually did
- Power = 1 − Type II error: the probability of correctly concluding that the intervention had an effect
- Fix the Type I error and use the sample size to increase the power
Slide 14: Power Calculations for Sample Size
- Fix the confidence level; then, as you increase the size of the sample (n↑):
  - The rejection region gets larger
  - The power increases (illustrated below)
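As a rough illustration (not from the slides), the sketch below computes the power of a two-sided test of equal means under a normal approximation for several per-arm sample sizes; the effect size delta and standard deviation sigma are hypothetical.

```python
# Sketch: power of a two-sided test of equal means as the per-arm sample
# size grows; delta (effect size) and sigma (SD) are hypothetical values.
from scipy.stats import norm

delta, sigma, alpha = 0.2, 1.0, 0.05
z_crit = norm.ppf(1 - alpha / 2)

for n in (50, 100, 200, 400, 800):
    se = sigma * (2 / n) ** 0.5                        # SE of the difference in means
    shift = delta / se
    power = 1 - norm.cdf(z_crit - shift) + norm.cdf(-z_crit - shift)
    print(f"n per arm = {n:4d}   power = {power:.2f}")
```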
Slide 15: What We Have So Far
- Clustering increases the required sample size
- So does the need for statistical testing: if we know
  - the estimated size of the treatment effect, and
  - the variance of the outcome distribution,
  we can start making power calculations for evaluations (see the sketch below)
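Inverting the same normal approximation gives the required sample size, which can then be inflated by the design effect for a clustered sample. This is only a sketch under assumed values; the function n_per_arm and all numerical inputs are hypothetical.

```python
# Sketch: required sample size per arm (normal approximation), then
# inflated by the design effect for a clustered sample.
# All numerical inputs are hypothetical.
from math import ceil
from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """n per arm to detect a mean difference `delta` given outcome SD `sigma`."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

n_srs = n_per_arm(delta=0.2, sigma=1.0)   # simple random sample: ~393 per arm
m, rho = 20, 0.1                          # cluster size, intra-cluster correlation
deff = 1 + (m - 1) * rho                  # design effect = 2.9
n_clustered = ceil(n_srs * deff)          # ~1,140 per arm once clustering is accounted for
print(n_srs, deff, n_clustered)
```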
Slide 16: In Practice
- There are many, many analytical statistical results
- It may be simpler to use simulations in Stata or a similar package
- Simulation easily accounts for complicated designs (see the sketch below)
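The slides point to Stata; as an alternative sketch of the same idea in Python, the code below simulates a village-randomized experiment many times and counts how often the estimated treatment effect is statistically significant. The parameters (number of villages, children per village, effect size, intra-cluster correlation) are hypothetical, and the analysis is done on village means for simplicity.

```python
# Sketch of simulation-based power for a village-randomized design.
# All parameters (villages, children per village, effect size, ICC) are hypothetical.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

def simulated_power(n_villages=60, children=20, effect=0.2,
                    icc=0.1, sims=1000, alpha=0.05):
    # Split a total variance of 1.0 into village and child components via the ICC.
    sd_village, sd_child = np.sqrt(icc), np.sqrt(1 - icc)
    treat = np.repeat([0, 1], n_villages // 2)   # treatment assigned at the village level
    significant = 0
    for _ in range(sims):
        village_means = np.array([
            (rng.normal(0, sd_village)            # village random effect
             + effect * treat[v]                  # treatment effect
             + rng.normal(0, sd_child, children)).mean()
            for v in range(n_villages)
        ])
        # Analyze at the village level: t-test on village mean outcomes.
        _, p = ttest_ind(village_means[treat == 1], village_means[treat == 0])
        significant += p < alpha
    return significant / sims

print(simulated_power())   # share of simulations in which the effect is significant
```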
Slide 17: In Practice: An Example
- Does information improve child performance in schools? (Pakistan)
- Randomized design
- Interested in villages where there are private schooling options
- What villages should we work in?
  - Stratification: North, Central, South
  - Random sample: villages chosen randomly from the list of all villages with a private school
Slide 18: In Practice: An Example
- How many villages should we choose? It depends on:
  - How many children there are in each village
  - How big we think the treatment effect will be
  - How much overall variability there will be in the outcome variable
Slide 19: In Practice: An Example – Simulation Tables
- Table 1 assumes very high variability in test scores
- X, Y: X is for an intervention with a small effect size; Y is for a larger effect size
- N: significant in < 1% of simulations
- S: significant in < 10% of simulations
- A: significant in > 99% of simulations
Slide 20: In Practice: An Example – Simulation Tables
- Table 2 assumes lower variability in test scores
- X, Y: X is for an intervention with a small effect size; Y is for a larger effect size
- N: significant in < 1% of simulations
- S: significant in < 10% of simulations
- A: significant in > 99% of simulations
Slide 21: A Smorgasbord of Topics
- Probability-proportional-to-size (PPS) sampling to pick clusters (see the sketch below)
- Using weights
- Estimating means vs. estimating regressions
- Increasing efficiency using matched randomizations
- Using evaluations to say something about baseline populations
- Age-targeted programs
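As a brief illustration of the first topic (not from the slides), systematic probability-proportional-to-size selection of clusters can be sketched as follows; the number of villages and their sizes are made up.

```python
# Sketch: systematic PPS (probability-proportional-to-size) selection of
# clusters; the number of villages and their sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
sizes = rng.integers(50, 500, size=200)   # households per village (hypothetical)
n_select = 20                             # number of villages to select

cum = np.cumsum(sizes)                    # cumulative size totals
interval = cum[-1] / n_select             # sampling interval
start = rng.uniform(0, interval)          # random start within the first interval
points = start + interval * np.arange(n_select)
selected = np.searchsorted(cum, points)   # village whose cumulative range contains each point
print(selected)                           # indices of the selected villages
```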
Slide 22: When Do We Really Worry About This?
- IF there are very small samples at the unit of treatment!
  - Suppose treatment is in 20 schools and control is in 20 schools
  - Even with 400 children in every school, this is still a small sample
- IF we are interested in sub-groups (blocks)
  - Sample size requirements increase sharply, since each sub-group must be adequately powered on its own
- IF we are using regression discontinuity designs