Weighting sample surveys with Bascula Harm Jan Boonstra Statistics Netherlands.

Outline
- General overview
  - Calibration/weighting
  - Estimation and variance estimation
- Demonstration with example data from the Dutch Labour Force Survey (LFS)
- Other applications at Statistics Netherlands

Bascula
- Part of Blaise (current version 4.7), a general system for computer-assisted survey processing developed at Statistics Netherlands
- History: predecessor LINWEIGHT, developed by Jelke Bethlehem in the 1980s

Main features
- Calibration: computation of weights using auxiliary information encoded in a weighting model
- Estimation of (sub)population totals, means, proportions and ratios
- Variance estimation: Taylor linearisation and balanced repeated replication (BRR) for several sampling designs

Weighting
- Reduction of MSE
  - Reduction of (non-response) bias
  - Reduction of sampling variance
- Calibration to auxiliary totals for consistency with known population totals
- A single set of weights
  - Easy tabulation
  - Mutual consistency between estimated tables

‘Small sample’ problems
- Full consistency with register data or data from related surveys can usually not be achieved (overfitting). Not all information can be used at the same time.
- Weighting can be ineffective for (small) domain estimates.
- For sufficiently large samples, weighting is an effective and convenient way to improve estimates!

Weighting/calibration methods in Bascula
Based on the general regression (GREG) estimator:
- Poststratification, e.g. Region x AgeClass
- Ratio estimator, e.g. AgeClass x Income
- Linear weighting, e.g. Region + AgeClass x Income
Based on Iterative Proportional Fitting (IPF):
- Multiplicative weighting, e.g. Region + AgeClass
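The two families of methods can be illustrated with a minimal Python sketch for the model Region + AgeClass. This is not Bascula code: the function names, the toy data and the population totals are all hypothetical, and NumPy is assumed to be available. Linear weighting solves one linear system for the GREG correction, while multiplicative weighting rescales the weights to each margin in turn until they stabilise.

```python
import numpy as np

def linear_weights(d, X, totals):
    """Linear (GREG) weighting: w_i = d_i * (1 + x_i' lambda)."""
    d, X = np.asarray(d, float), np.asarray(X, float)
    T = (X * d[:, None]).T @ X                      # sum_i d_i x_i x_i'
    lam = np.linalg.solve(T, np.asarray(totals, float) - X.T @ d)
    return d * (1.0 + X @ lam)

def multiplicative_weights(d, factors, margins, n_iter=100, tol=1e-10):
    """Multiplicative weighting by IPF: fit each margin in turn."""
    w = np.asarray(d, float).copy()
    for _ in range(n_iter):
        max_change = 0.0
        for codes, target in zip(factors, margins):
            current = np.bincount(codes, weights=w, minlength=len(target))
            ratio = np.asarray(target, float) / current
            max_change = max(max_change, np.max(np.abs(ratio - 1.0)))
            w *= ratio[codes]
        if max_change < tol:                        # all margins already fitted
            break
    return w

# Hypothetical sample of 8 persons with equal inclusion weights (sum = 100).
region = np.array([0, 0, 0, 0, 0, 1, 1, 1])
age = np.array([0, 0, 1, 1, 1, 0, 0, 1])
d = np.full(8, 12.5)

# Design matrix for Region + AgeClass: both region indicators plus one
# age indicator (the other age class is implied, which avoids collinearity).
X = np.column_stack([region == 0, region == 1, age == 1]).astype(float)
totals = np.array([60.0, 40.0, 45.0])               # assumed population totals

w_lin = linear_weights(d, X, totals)
w_mult = multiplicative_weights(d, [region, age], [[60, 40], [55, 45]])
print(np.round(w_lin, 3))
print(np.round(w_mult, 3))
```

Both weight vectors reproduce the same margins but generally differ per unit; multiplicative weights stay positive by construction, whereas linear weights can become negative for poorly fitting models.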

Further weighting options
- Bounding of weights for linear weighting (Huang and Fuller algorithm)
- Consistent linear weighting, e.g. for equal weights within households (Lemaître and Dufour)

Estimation of totals
- Based on the calibration weights:
- General regression estimator:
- Also ratios of totals, means, proportions, subclasses
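The formulas on this slide are not preserved in the transcript. In standard survey-sampling notation (the symbols are assumptions, not taken from the slides), with inclusion weights d_i, calibration weights w_i, auxiliary vectors x_i, known population totals t_x and Horvitz-Thompson (HT) estimators marked as such, the two forms of the estimator are usually written as follows.

```latex
% Weighted-total form, using the calibration weights w_i
\hat{t}_y = \sum_{i \in s} w_i \, y_i

% Regression form of the same estimator (linear weighting)
\hat{t}_y^{\mathrm{GREG}}
  = \hat{t}_y^{\mathrm{HT}}
  + \bigl(t_x - \hat{t}_x^{\mathrm{HT}}\bigr)^{\top} \hat{b},
\qquad
\hat{b} = \Bigl(\sum_{i \in s} d_i \, x_i x_i^{\top}\Bigr)^{-1}
          \sum_{i \in s} d_i \, x_i \, y_i

% Weights that make the two forms coincide
w_i = d_i \Bigl[ 1 + \bigl(t_x - \hat{t}_x^{\mathrm{HT}}\bigr)^{\top}
      \Bigl(\sum_{j \in s} d_j \, x_j x_j^{\top}\Bigr)^{-1} x_i \Bigr]

% Ratio of two estimated totals
\hat{R} = \hat{t}_y \,/\, \hat{t}_z
```

Means and proportions follow as special cases of the ratio, and subclass estimates are obtained by setting y_i to zero outside the subclass.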

Variance estimation
- Direct/Taylor method (HT and GREG only)
- Balanced Repeated Replication (BRR)
Sampling designs supported:
- Stratified two-stage element or cluster design with simple random sampling without replacement in both stages
- Stratified multistage cluster designs with replacement in the first stage and unequal probabilities

Taylor variance
- Taylor linearisation:
- Modified variance estimator (default in Bascula):
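The slide's formulas are not preserved in the transcript. As a hedged illustration, the sketch below computes the textbook Taylor variance of a GREG total under simple random sampling without replacement, based on the regression residuals e_i = y_i - x_i'b. The "modified" variant is assumed here to be the common one that multiplies each residual by its correction weight g_i = w_i / d_i; the slides do not confirm that this is exactly Bascula's default. All names and data are hypothetical.

```python
import numpy as np

def greg_taylor_variance(y, X, d, totals, N, modified=True):
    """Taylor variance of a GREG total under SRS without replacement.

    modified=True uses g-weighted residuals g_i * e_i (an assumption about
    what 'modified' means); modified=False uses the plain residuals e_i.
    """
    y, X, d = (np.asarray(a, float) for a in (y, X, d))
    n = len(y)
    T = (X * d[:, None]).T @ X                  # sum_i d_i x_i x_i'
    b = np.linalg.solve(T, X.T @ (d * y))       # weighted regression coefficients
    e = y - X @ b                               # residuals
    if modified:
        g = 1.0 + X @ np.linalg.solve(T, np.asarray(totals, float) - X.T @ d)
        e = g * e
    return N**2 * (1.0 - n / N) / n * np.var(e, ddof=1)

# Hypothetical data: n = 5 units drawn from N = 1000, one auxiliary variable.
N = 1000
y = np.array([3.0, 5.0, 4.0, 8.0, 6.0])
x = np.array([1.0, 2.0, 2.0, 4.0, 3.0])
X = np.column_stack([np.ones(5), x])
d = np.full(5, N / 5)
totals = np.array([1000.0, 2500.0])             # assumed population totals

v_taylor = greg_taylor_variance(y, X, d, totals, N, modified=False)
v_modified = greg_taylor_variance(y, X, d, totals, N, modified=True)
print(v_taylor, v_modified)
```

The two versions coincide when the Horvitz-Thompson estimates of the auxiliary totals already equal the population totals, because all g_i are then equal to one.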

BRR variance
- R balanced half samples (partially balanced if R < #strata)
- Fay factor
- Grouped BRR (more than 2 PSUs per stratum allowed)
  - Artificial strata
  - Repeated grouping
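A minimal Python sketch of BRR with a Fay factor for a weighted total, assuming exactly two PSUs per stratum and a fully balanced set of half samples taken from a Hadamard matrix. It is not Bascula's implementation, it does not cover grouped BRR, and the function names and data are hypothetical.

```python
import numpy as np

def hadamard(order):
    """Sylvester construction; order must be a power of two."""
    H = np.array([[1]])
    while H.shape[0] < order:
        H = np.block([[H, H], [H, -H]])
    return H

def brr_variance(y, w, stratum, psu, fay=0.0):
    """BRR variance of the weighted total sum(w * y).

    stratum: stratum codes 0..H-1; psu: 0 or 1 within each stratum.
    fay: Fay factor k with 0 <= k < 1 (k = 0 gives standard BRR).
    """
    y, w = np.asarray(y, float), np.asarray(w, float)
    stratum, psu = np.asarray(stratum), np.asarray(psu)
    n_strata = stratum.max() + 1
    R = 1
    while R <= n_strata:                 # smallest power of two > n_strata
        R *= 2
    signs = hadamard(R)[:, 1:n_strata + 1]   # drop the all-ones column
    theta = np.sum(w * y)
    replicates = np.empty(R)
    for r in range(R):
        # +1 puts PSU 0 of the stratum in the half sample, -1 puts PSU 1 in it.
        selected = (signs[r, stratum] == 1) == (psu == 0)
        factor = np.where(selected, 2.0 - fay, fay)
        replicates[r] = np.sum(w * factor * y)
    return np.sum((replicates - theta) ** 2) / (R * (1.0 - fay) ** 2)

# Hypothetical sample: 3 strata, 2 PSUs per stratum, equal weights.
y = np.array([10.0, 12.0, 7.0, 9.0, 20.0, 15.0, 11.0, 14.0])
w = np.full(8, 5.0)
stratum = np.array([0, 0, 0, 1, 1, 2, 2, 2])
psu = np.array([0, 0, 1, 0, 1, 0, 1, 1])

v_brr = brr_variance(y, w, stratum, psu)
v_fay = brr_variance(y, w, stratum, psu, fay=0.5)
print(v_brr, v_fay)
```

For a linear statistic such as a total, full balance makes the result equal to the sum over strata of the squared difference between the two weighted PSU totals, whatever the Fay factor; the factor matters for nonlinear statistics, where it keeps every unit in every replicate.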

Input
- Sample data file: ASCII (fixed column or separated), Blaise, other OLE DB compatible
- Blaise meta information; the Blaise Textfile Wizard helps in making a data model for ASCII files
- Tables of population totals
- Selection of weighting scheme and other parameters that influence the weighting
- Some additional input required for estimation and variance estimation: target tables and sampling design details

Data integrity checks
- Consistency of set of population tables
- Sample counts per cell do not exceed population counts
- Enough sample observations for each cell in weighting model
- Inclusion weights/sampling fractions compatible with sampling design specified
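Checks of this kind can be sketched in a few lines of Python. The functions below are hypothetical illustrations, not Bascula's own checks: one compares sample counts per weighting cell with the population table, the other verifies only the simplest form of consistency between population tables, namely that they share the same grand total.

```python
from collections import Counter

def check_cell_counts(sample_cells, population_counts, min_obs=1):
    """Return a list of messages for cells that fail the basic checks."""
    sample_counts = Counter(sample_cells)
    problems = []
    for cell, n_sample in sample_counts.items():
        if cell not in population_counts:
            problems.append(f"{cell}: in sample but not in population table")
        elif n_sample > population_counts[cell]:
            problems.append(
                f"{cell}: sample count {n_sample} exceeds "
                f"population count {population_counts[cell]}"
            )
    for cell in population_counts:
        if sample_counts[cell] < min_obs:
            problems.append(f"{cell}: fewer than {min_obs} sample observations")
    return problems

def check_grand_totals(tables, tol=1e-6):
    """Population tables should at least agree on the grand total."""
    totals = {name: sum(table.values()) for name, table in tables.items()}
    reference = next(iter(totals.values()))
    return [
        f"{name}: grand total {total} differs from {reference}"
        for name, total in totals.items()
        if abs(total - reference) > tol
    ]

# Hypothetical population table for Region x AgeClass and a small sample.
population = {
    ("North", "young"): 50, ("North", "old"): 30,
    ("South", "young"): 3, ("South", "old"): 40,
}
sample = [("North", "young")] * 4 + [("North", "old")] * 2 + [("South", "young")] * 5

cell_problems = check_cell_counts(sample, population)
table_problems = check_grand_totals({
    "Region": {"North": 80, "South": 43},
    "AgeClass": {"young": 53, "old": 70},
})
for message in cell_problems + table_problems:
    print(message)
```

In this toy input one cell has more sample units than population units and one cell has no observations, so two messages are reported; the two population tables agree on the grand total.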

Output
- Set of final and correction weights (written to the sample file and to a separate weights file)
- Optionally: fitted values
- Tables of estimates (including estimates of standard errors) in export file; format compatible with population data file

Example: Dutch Labour Force Survey
- Rotating panel design with five waves; CAPI in first wave, CATI in subsequent waves
- CATI data are first calibrated on the most important target variable (employment in several categories) to the initial CAPI panel, to reduce panel attrition bias
- Weighted CATI data are combined with CAPI data and together calibrated to population totals of the weighting scheme Region44 x Age4 x Sex2 + Age21 x Sex2 + Age5 x MarStat2 + Sex2 x Age5 x Ethnicity8 + CWI3

Dacseis software evaluation report on Bascula:
‘Bascula is a part of Blaise (an integrated system for survey processing), and it might not be reasonable to purchase Blaise only for the use of Bascula. When having Blaise available, Bascula provides an advanced weighting tool (linear or multiplicative weighting) with abilities for proper variance estimation based on Taylor’s linearisation. When the basic order of the weight and estimate calculations of Bascula is understood, the operations can be carried out quite easily.’

Usage
- menu-based interactive version
- from Blaise’s script language Manipula
- from most modern programming languages, e.g. VB, VBA, Delphi, C++, C#
- from other software able to act as automation client, e.g. S-Plus

Automation
Bascula component (dll) can be used to automate weighting/estimation processes
- For recurring weighting/estimation processes, batch processing, integration into production systems
- Build custom tools utilizing Bascula’s functionality

Tools that use Bascula component
- Tool that integrates imputation/outlier detection and handling/weighting for the Production Statistics
- Tool for analysing results of experiments
- Tool for repeated weighting
- Simple simulation tools
  - Variance estimation (Dacseis)
  - GREG as input for small area estimators

Repeated weighting
- Practical sequential approach to make tables of estimates consistent between data sources
- Two-step procedure
  1. Start with GREG estimates
  2. Adjust these estimates such that they are consistent with register totals (not used in the weighting scheme of GREG) and possibly with previously estimated marginal tables from a combination of surveys.
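The two steps can be sketched for the simplest possible case: a target table Education x Sex whose Sex margin is known from a register but was not part of the original weighting scheme. With a single categorical margin, recalibrating the GREG weights by linear weighting reduces to a ratio adjustment within each category, which is what the hypothetical function below does. Real repeated weighting handles several margins per table and a sequence of tables; the names and numbers here are illustrative only.

```python
import numpy as np

def adjust_to_margin(w, codes, register_totals):
    """Rescale weights so the weighted counts per category match the register."""
    w = np.asarray(w, float)
    register_totals = np.asarray(register_totals, float)
    current = np.bincount(codes, weights=w, minlength=len(register_totals))
    return w * (register_totals / current)[codes]

def weighted_table(w, rows, cols, shape):
    """Two-way table of weighted counts."""
    table = np.zeros(shape)
    np.add.at(table, (rows, cols), w)
    return table

# Step 1: start from GREG weights (hypothetical values, sum = 200).
w_greg = np.array([20.0, 25.0, 30.0, 25.0, 20.0, 30.0, 25.0, 25.0])
sex = np.array([0, 0, 0, 0, 1, 1, 1, 1])
education = np.array([0, 1, 2, 0, 1, 2, 0, 1])
table_step1 = weighted_table(w_greg, education, sex, (3, 2))

# Step 2: adjust so the Sex margin agrees with the register totals.
register_sex = np.array([96.0, 104.0])          # assumed register counts
w_adjusted = adjust_to_margin(w_greg, sex, register_sex)
table_step2 = weighted_table(w_adjusted, education, sex, (3, 2))

print(table_step1.sum(axis=0))   # Sex margin before adjustment
print(table_step2.sum(axis=0))   # Sex margin after adjustment
```

The adjusted weights are used only for this table, which is why the approach yields consistent tables rather than a single set of weights.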

Software tool
[Diagram showing the components: micro database, rectangular datasets, VRD meta database, Bascula estimation (input: dataset, weighting model, population totals), export of estimates, StatBase. Source: System documentation VRD, V. Snijders]

Use of Bascula at Statistics Netherlands
- Labour Force Survey
- Repeated weighting for the Social Statistical Database
- Survey on Household Incomes
- Budget Survey
- Survey on Living Conditions
- Production Statistics
- and more

Survey on Household Incomes
- Calibration on both person totals and household totals, both obtained from municipal registrations
- Consistent linear weighting: Region29 x Age8 x Sex2 + Region29 x HouseholdType9 x OneHH
- OneHH is an auxiliary variable that sums to one over each household
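A small Python sketch of consistent linear weighting in the spirit of Lemaître and Dufour: each person's auxiliary values are replaced by the household mean before linear weighting, which gives all members of a household the same weight while still reproducing the person totals. OneHH is taken as one divided by the household size, so that it sums to one over each household and its weighted sum estimates the number of households. The data, totals and function names are hypothetical, and the model is far smaller than the one on the slide.

```python
import numpy as np

def linear_weights(d, X, totals):
    """Linear (GREG) weighting: w_i = d_i * (1 + x_i' lambda)."""
    T = (X * d[:, None]).T @ X
    lam = np.linalg.solve(T, totals - X.T @ d)
    return d * (1.0 + X @ lam)

def household_means(X, hh):
    """Replace each person's row by the mean row of his or her household."""
    sums = np.zeros((hh.max() + 1, X.shape[1]))
    np.add.at(sums, hh, X)
    sizes = np.bincount(hh)
    return (sums / sizes[:, None])[hh]

# Hypothetical sample: 8 persons in 4 households of sizes 1, 2, 3 and 2.
hh = np.array([0, 1, 1, 2, 2, 2, 3, 3])
male = np.array([1, 1, 0, 0, 1, 0, 1, 1], dtype=float)
one_hh = 1.0 / np.bincount(hh)[hh]          # sums to one over each household
X = np.column_stack([male, 1.0 - male, one_hh])
d = np.full(8, 10.0)

# Assumed register totals: men, women, number of households.
totals = np.array([48.0, 34.0, 41.0])

w = linear_weights(d, household_means(X, hh), totals)
print(np.round(w, 3))
print(X.T @ w)                               # person and household totals
```

Because the weights are constant within a household, calibrating on the household means is equivalent to calibrating on the original person-level values, so both the person totals and the household total are met.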

Production Statistics
- Continuous auxiliary variables available from Tax Office; categorical variables from Business Register
- Weighting scheme: Activity x SizeClass x Source x Tax + Activity x SizeClass x Source
- Variable Source indicates whether tax info can be matched to surveyed businesses

Finally,
- Priorities for further development have not been very high in the last three years, but that may change
- Possible extensions: variance structure, Newton-Raphson for exponential method, two-phase regression estimator, synthetic estimation for subpopulations, small area estimation?