A prediction approach to representative sampling
Ib Thomsen & Li-Chun Zhang
Statistics Norway
The birth of the representative method
Kruskal and Mosteller (1979a,b,c): origins and development of the concept of representative sampling
N. Kiær’s representative method (ISI meeting, 1895, Bern)
–A three-stage design, with the 1890 census as frame:
   1st: 128 counties and 23 towns throughout the country
   2nd: cohorts of males of age 17, 22, 27, 32, etc.
   3rd: persons with surname initial A, B, C, L, M, N
–Comparison of sample marginal averages with census averages
ISI committee in 1924 & report at the following meeting: “I think I may venture to say that nowadays there is hardly one statistician, who in principle will contest the legitimacy of the representative method”. (Jensen)
Bowley (1926) was a member of the committee.
Rise and fall of the representative method: Balance vs. randomization
Kiær did not take a probabilistic point of view.
–Representative sample surveys instead of representative sampling
–Idea of variability of population over time (quote)
–Miniature population → multivariate simple balance
Design-based approach:
–Neyman (1934): representative sampling = randomization (quote)
–Subsequent development: Hansen & co., Deming, Kish, Cochran, Mahalanobis, etc.
–Godambe (1955): no minimum variance linear estimator
–Representative sampling vs. efficient estimation
Prediction approach:
–Royall (1970): purposive sample
–Royall and Eberhardt (1975): Simple balance for bias protection (quote)
–Representative sample vs. efficiency
A definition of representative sampling from a prediction point of view
Prediction of each individual in the population
Representative sampling connected to individual mean squared error of prediction (IMSEP), i.e.
Conditional IMSEP: zero inside the sample, positive outside
Use randomization design to control unconditional IMSEP, i.e. expected amount of information about each population unit.
Control of individual prediction as a design criterion, i.e.
An example under ratio model
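As a minimal sketch of how conditional and unconditional IMSEP could be computed, the code below assumes a ratio model y_i = beta * x_i + e_i with Var(e_i) = sigma2 * x_i and the ratio predictor beta_hat * x_i, where beta_hat is the sample total of y divided by the sample total of x. The model form, the function names, and the toy size values in `x` are assumptions made for illustration, not taken from the slides. Under these assumptions the conditional IMSEP of a unit outside the sample is sigma2 * x_i + sigma2 * x_i^2 / (sample total of x), and the unconditional IMSEP is its average over the sampling design.

```python
from itertools import combinations


def conditional_imsep(x, sample, sigma2=1.0):
    """Conditional IMSEP of every unit given the selected sample.

    Assumed model (illustrative): y_i = beta*x_i + e_i, Var(e_i) = sigma2*x_i,
    with predictor beta_hat*x_i and beta_hat = sum_s(y) / sum_s(x).
    """
    in_sample = set(sample)
    x_s = sum(x[i] for i in in_sample)  # sample total of the size variable
    result = []
    for i, xi in enumerate(x):
        if i in in_sample:
            result.append(0.0)  # y_i is observed, so no prediction error
        else:
            # own model variance + variance contributed by estimating beta
            result.append(sigma2 * xi + sigma2 * xi ** 2 / x_s)
    return result


def unconditional_imsep(x, design, sigma2=1.0):
    """Average the conditional IMSEP over a design {sample: probability}."""
    total = [0.0] * len(x)
    for sample, prob in design.items():
        cond = conditional_imsep(x, sample, sigma2)
        total = [t + prob * c for t, c in zip(total, cond)]
    return total


def srs_design(pop_size, n):
    """Simple random sampling without replacement: all samples equally likely."""
    samples = list(combinations(range(pop_size), n))
    return {s: 1.0 / len(samples) for s in samples}


# Hypothetical size measures for a population of four units.
x = [1.0, 2.0, 4.0, 8.0]
cond = conditional_imsep(x, sample=(0, 3))
uncond = unconditional_imsep(x, srs_design(len(x), 2))
```

In this toy setup the unconditional IMSEP under SRS increases with x_i, so an equal-probability design does not give equal prediction under the assumed ratio model; a design with higher inclusion probabilities for larger units would move the values closer together.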
Motivating familiar but seemingly unconnected sampling techniques from a unified point of view
Constant mean and variance throughout the population: equal prediction → epsem/SRS
Constant mean and variance in subpopulation groups: stratified equal prediction → stratified epsem/SRS; relative equal prediction w.r.t. individual variance → stratified epsem/SRS with proportional allocation
Business survey:
–Division of take-all, take-some and take-none units
–Stratified SRS with progressive allocation
Two-stage sampling:
–PPS-SRS and SRS-SRS are equal prediction designs, respectively, provided zero or unity intra-cluster correlation
–Stratified SRS-SRS with progressive first-stage allocation
Three principal advantages of CIP (control of individual prediction) as a design criterion
Model-based inference as a mode of inference
–Prediction of individuals is impossible under the design-based perspective
Randomization designs motivated by prediction
–Simple random sampling (SRS) is unmotivated for efficiency
–SRS yields non-informative sampling, but so can any randomization.
–SRS aims at simple balance, but it is not effective for that.
Combination with optimality/efficiency for total (OPT)
–Need for population totals
–Need for socio-economic micro-data
–Need for statistics at more detailed levels