The Importance of Sample Size and Its Varying Effects on Precision in Large-Scale Surveys Dipankar Roy, PhD Bangladesh Bureau of Statistics Presented at the International Seminar at Rajshahi University October 2012 Rajshahi, Bangladesh
Sample size determination – The act of choosing the number of observations to be surveyed – The method should be statistically sound and formula-based – Samples should be selected with known selection probabilities (which determine the base weights) – Samples should be allocated scientifically (according to need)
Sampling – The process of selecting units – Study them – Generalize the result (estimate/statistic) back to the population (parameter) – Infer about the population through the sample
A major goal of data analysis – Use the sample mean (or proportion) to estimate the corresponding parameter in the respective population – Statistical inference is about the population, NOT about the sample
Two approaches – Precision-based approach – Power-based approach
How large a sample is needed to – enable statistical judgments that are accurate and reliable, AND – attain a desired level of precision?
Sample size should not be determined – arbitrarily – without solving the sample size equation. The required (optimum) sample size can ensure accurate, precise and reliable estimates – Samples that are too small lack precision – Unnecessarily large samples yield minimal gain
Sampling Error – Standard Error (SE) – Margin of Error (MOE) – Confidence Interval (CI)
MOE – Indicates that a data user can be confident, at the stated confidence level, that the estimate (statistic) and the population value (parameter) differ by no more than the value of the MOE
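The link between SE, MOE and CI can be sketched for a sample proportion. This is only an illustration under simple random sampling with a normal approximation; the function name `proportion_interval` and the inputs (a proportion of 0.26 from 400 units) are assumptions chosen for the example, not values from the survey.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Return (SE, MOE, CI) for a sample proportion.

    Assumes simple random sampling and the normal approximation.
    """
    # Two-sided critical value, about 1.96 at 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error
    moe = z * se                         # margin of error
    ci = (p_hat - moe, p_hat + moe)      # confidence interval
    return se, moe, ci


# Hypothetical inputs, for illustration only.
se, moe, ci = proportion_interval(p_hat=0.26, n=400)
print(f"SE={se:.4f}  MOE={moe:.4f}  CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

The MOE is the SE scaled by the critical value, and the CI is the estimate plus or minus the MOE.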
There is some margin of error d in the estimated proportion p in relation to the true proportion P – There is some risk α that the actual error is larger than d: Pr(|p-P| > d) = α, or equivalently Pr(|p-P| <= d) = 1-α
n = [z^2*P*(1-P)]/d^2, so the sample size depends on – the level of precision (d), – the level of confidence or risk (z, set by α), and – the degree of variability in the attribute (P*(1-P))
A sample of size n is required to – estimate the proportion p of an event – within d of its true value – at the 100(1-α)% confidence level
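As a worked example of n = [z^2*P*(1-P)]/d^2, the sketch below computes the sample size for a proportion under simple random sampling. The function name `sample_size_proportion` and the inputs (P = 0.5, d = 0.05, α = 0.05) are assumptions for illustration; P = 0.5 is used because it maximizes P*(1-P).

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(P, d, alpha=0.05):
    """Sample size to estimate a proportion P within d at 100(1-alpha)% confidence."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    n = z**2 * P * (1 - P) / d**2
    return ceil(n)                           # round up to a whole unit


# Hypothetical inputs: most conservative P, 5-point margin, 95% confidence.
n_5pt = sample_size_proportion(P=0.5, d=0.05)
# A tighter margin of error requires a larger sample.
n_2_5pt = sample_size_proportion(P=0.5, d=0.025)
print(n_5pt, n_2_5pt)
```

Halving d roughly quadruples n, because d enters the formula squared.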
Formulae for MICS Sample Size
HIES – The coefficient of variation should have been used in determining the sample size for a study variable such as income – Household income, by its nature, tends to be heterogeneous within and/or between localities
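One common way to bring the coefficient of variation (CV) into the calculation is the formula for estimating a mean within a relative margin of error, n = (z*CV/r)^2. The sketch below assumes that formula under simple random sampling; the function name `sample_size_mean_cv` and the CV values are hypothetical and are not HIES figures.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean_cv(cv, rel_moe, alpha=0.05):
    """Sample size to estimate a mean within a relative margin of error.

    cv      : coefficient of variation (standard deviation / mean)
    rel_moe : margin of error expressed as a fraction of the mean
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * cv / rel_moe) ** 2)


# Hypothetical CVs: a more heterogeneous variable needs a larger sample.
n_moderate = sample_size_mean_cv(cv=0.8, rel_moe=0.05)
n_high = sample_size_mean_cv(cv=1.2, rel_moe=0.05)
print(n_moderate, n_high)
```

Because the CV enters squared, a variable that is 1.5 times as variable needs about 2.25 times the sample for the same relative precision.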
Template
Input Values                                                Symbol  Value
Predicted value of indicator (in target/base population)    r       0.26
Design Effect (DEFF)                                        f       1.4
Margin of error at 95% Confidence                           e       0.09
Proportion of base population in total population           p       0.04
Average Household Size                                      s       4.5
Adjustment for Non-Response                                 k       1.05
Output                                                      Symbol  Value
Number of Households (Sample Size)                          n       776
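The template output can be reproduced with a short calculation. The template itself does not state the formula, so the sketch below assumes the form n = 4*r*(1-r)*f*k / (e^2*p*s), where the factor 4 approximates z^2 at 95% confidence. The function name `households_needed` is hypothetical. With the template inputs this assumed form gives about 775.9, which rounds to the 776 shown.

```python
def households_needed(r, f, e, p, s, k, z_squared=4.0):
    """Number of households under an assumed MICS-style formula.

    r : predicted value of the indicator
    f : design effect (DEFF)
    e : margin of error at 95% confidence
    p : proportion of the base population in the total population
    s : average household size
    k : adjustment factor for non-response
    z_squared : assumed factor of 4, approximating 1.96**2
    """
    return z_squared * r * (1 - r) * f * k / (e**2 * p * s)


# Inputs taken from the template.
n = households_needed(r=0.26, f=1.4, e=0.09, p=0.04, s=4.5, k=1.05)
print(round(n))
```

The denominator term p*s converts households into expected members of the base population, which is why a small p inflates the required number of households.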
Sample size vs. coverage rate
Sample size vs. margin of error
n vs. N
Interval width is equal to twice the margin of error and is directly proportional to 1/√n – If the sample size is increased by a factor of 4, the interval width is reduced by half – Higher levels of precision require larger sample sizes – Higher confidence levels require larger sample sizes
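The factor-of-4 statement can be checked numerically. The sketch below assumes a proportion under simple random sampling with the normal approximation; the function name `interval_width` and the inputs (P = 0.5, n = 400 and 1600) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(P, n, confidence=0.95):
    """Width of the confidence interval, i.e. twice the margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sqrt(P * (1 - P) / n)


# Hypothetical sample sizes: quadrupling n halves the width.
w_400 = interval_width(P=0.5, n=400)
w_1600 = interval_width(P=0.5, n=1600)
# At the same n, a higher confidence level gives a wider interval.
w_400_99 = interval_width(P=0.5, n=400, confidence=0.99)
print(f"{w_400:.4f} {w_1600:.4f} {w_400_99:.4f}")
```

To hold the width fixed while raising the confidence level, n has to grow, which is the sense in which higher confidence requires a larger sample.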
Sample size depends on domain-level estimation – Sample size does not necessarily depend on how large the population is – Beyond a certain point, there is no need to increase the sample size as the population becomes larger – For any complex design, the sample size should be inflated by the design effect.
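The last two points can be illustrated with the standard finite population correction, n = n0 / (1 + (n0-1)/N), followed by multiplication by the design effect. This is a sketch under those assumptions. The function name `adjust_sample_size`, the starting value n0 = 385 (the simple random sampling size for P = 0.5, d = 0.05 at 95% confidence) and the population sizes are hypothetical.

```python
from math import ceil


def adjust_sample_size(n0, N, deff=1.0):
    """Apply the finite population correction, then inflate by the design effect.

    n0   : sample size for an effectively infinite population (SRS)
    N    : population size
    deff : design effect of the complex design (1.0 means SRS)
    """
    n_fpc = n0 / (1 + (n0 - 1) / N)
    return ceil(n_fpc * deff)


# The required n levels off as N grows.
sizes = {N: adjust_sample_size(385, N) for N in (1_000, 10_000, 100_000, 10_000_000)}
print(sizes)

# The same large population under a complex design with DEFF = 1.4.
n_complex = adjust_sample_size(385, 10_000_000, deff=1.4)
print(n_complex)
```

Beyond a population of about 100,000 the required n barely changes, while the design effect scales it directly.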