Published by Pierce Harper. Modified over 6 years ago.
1
Estimation methods for the integration of administrative sources
Specific contract n° ESTAT N° under framework contract Lot 1 n°
Estimation methods for the integration of administrative sources
Responsible person at Commission: Fabrice Gras, Eurostat – Unit B1
Authors: Nicoletta Cibella, Ton de Waal, Marco Di Zio, Mauro Scanu, Sander Scholtus, Tiziana Tuoto, Arnout van Delden, Li-Chun Zhang
Project manager: Laurent Jacquet
Sogeti: Sanja Vujackov
Working group on methodology, Luxembourg, 5 April 2017
2
Overview of the project
Project “Estimation methods for the integration of administrative sources”
Duration 1/06/ /03/2017
Part of ESS.VIP Admin Project (Work package 2 “Statistical methods”)
Sogeti is the main contractor
Experts from ISTAT and Statistics Netherlands, and an external advisor from the University of Southampton
3
Aim of the project
Identification and presentation of existing statistical methods, together with the associated contextual framework, in order to enable and ease the integration of administrative sources into a statistical production system
4
The main objectives
Identification of the main possible uses of administrative sources
Identification of the different steps of the statistical production system where methods can be used for integrating administrative sources
A literature review presenting actual examples in the NSIs
Drafting of a technical summary sheet for each identified statistical method
5
Overview of the project: 7 tasks
Task 1. Specify usages of admin data
Task 2. Identification and description of possible statistical tasks where methods can be envisaged in order to integrate administrative sources
Task 3. Comprehensive identification and enumeration of possible estimation methods that could be used for cases identified in Task 2
6
Overview of the project: 7 tasks
Task 4. Literature review presenting examples in NSIs for the types of use of administrative sources and for the steps that have been previously identified
Task 5a. Provision of a template for the review of estimation methods
Task 5b. Methods description
Task 6 & 7. Final presentation and report
7
Task 1: Statistical usages
Direct
1. Direct tabulation
2. Substitution and supplementation
Indirect
1. Creation and maintenance of registers
2. Editing and imputation
3. Indirect estimation
4. Data validation/confrontation
8
Task 1: Direct usage
Direct tabulation: admin data are used to produce statistics without resorting to any statistical data.
Substitution and supplementation for direct collection (replacement for data collection): admin data are used directly as input observations for the production of statistics but are not sufficient for achieving all objectives.
Split-population approach, split-data approach
9
Task 1: Indirect usage
Creation and maintenance of registers and survey frames
  Identification of frame units and their connections to population elements
  Identification of classification and auxiliary variables
Editing and imputation
  Construction of edit rules
  Construction of models to find errors in data
  Auxiliary data to construct imputation models
10
Task 1. Indirect usage
Indirect estimation
  Creation of population benchmarks for calibration
  Use of administrative data in a predictive setting
  Estimation where administrative and statistical data are used on an equal footing
Data validation/confrontation
  Validation of survey estimates and/or other administrative data sources
  Addressing quality issues
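The use of population benchmarks for calibration can be illustrated with its simplest special case, post-stratification. The sketch below is only an illustration under stated assumptions: the record layout (`stratum`, `weight`, `y`), the function name `poststratify` and all figures are hypothetical and do not come from the project documents.

```python
from collections import defaultdict


def poststratify(sample, benchmarks):
    """Rescale design weights so that the weighted count in each stratum
    equals a population count taken from an administrative source.

    sample: list of dicts with keys 'stratum', 'weight', 'y' (hypothetical layout).
    benchmarks: dict mapping stratum -> admin-based population count.
    """
    weighted_count = defaultdict(float)
    for unit in sample:
        weighted_count[unit["stratum"]] += unit["weight"]

    calibrated = []
    for unit in sample:
        # Adjustment factor: benchmark count / current weighted count.
        g = benchmarks[unit["stratum"]] / weighted_count[unit["stratum"]]
        calibrated.append({**unit, "weight": unit["weight"] * g})
    return calibrated


# Made-up survey records and admin benchmark counts.
sample = [
    {"stratum": "A", "weight": 10.0, "y": 3.0},
    {"stratum": "A", "weight": 10.0, "y": 5.0},
    {"stratum": "B", "weight": 20.0, "y": 2.0},
]
benchmarks = {"A": 30.0, "B": 10.0}

calibrated = poststratify(sample, benchmarks)
total_y = sum(u["weight"] * u["y"] for u in calibrated)
```

After the adjustment the weights in each stratum sum to the admin benchmark, and the estimated total of `y` is computed with the calibrated weights.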
11
Task 2: Statistical tasks
Statistical tasks for using integrated admin data
Data editing and imputation
Creation of joint statistical micro data
Alignment of statistical data
Multisource estimation at aggregated level
12
Data editing and imputation
Resolving micro-data inconsistencies and imputing missing data
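As one small illustration of imputing missing data from an edit rule, the sketch below applies deductive imputation to a balance edit of the form total = sum of components. The variable names (`turnover`, `domestic`, `export`) and the helper `deductive_impute` are hypothetical; the sketch assumes the observed values are error-free.

```python
def deductive_impute(record, total_key, component_keys):
    """Fill a single missing value that is fully determined by the
    balance edit record[total_key] == sum of the components.

    Returns a copy of the record; if zero or more than one value is
    missing, nothing can be deduced and the copy is unchanged.
    """
    keys = [total_key] + list(component_keys)
    missing = [k for k in keys if record[k] is None]
    filled = dict(record)
    if len(missing) != 1:
        return filled

    known_components = sum(
        record[k] for k in component_keys if record[k] is not None
    )
    if missing[0] == total_key:
        filled[total_key] = known_components
    else:
        filled[missing[0]] = record[total_key] - known_components
    return filled


# Made-up record with one missing component.
record = {"turnover": 100, "domestic": 70, "export": None}
filled = deductive_impute(record, "turnover", ["domestic", "export"])
```

Here the missing `export` value follows directly from the edit rule, so no statistical model is needed.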
13
Creation of joint statistical micro data
Data linkage: identification of the set of unique units residing in multiple datasets
Statistical matching: inference of the joint distribution based on marginal observations
Hashing techniques
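Statistical matching can be sketched with a nearest-neighbour distance hot deck, one of several possible matching methods. This is a minimal illustration, not the approach of any particular NSI: it assumes two files with no units in common, a single numeric common variable, and conditional independence of the two non-shared variables given that common variable. All names and values are hypothetical.

```python
def distance_hot_deck(recipients, donors, match_var, donated_var):
    """For each recipient record, copy donated_var from the donor record
    that is closest on the common variable match_var."""
    fused = []
    for r in recipients:
        donor = min(donors, key=lambda d: abs(d[match_var] - r[match_var]))
        fused.append({**r, donated_var: donor[donated_var]})
    return fused


# File A observes (age, income); file B observes (age, spend).
file_a = [{"age": 30, "income": 100}, {"age": 60, "income": 200}]
file_b = [{"age": 28, "spend": 5}, {"age": 65, "spend": 9}]

fused = distance_hot_deck(file_a, file_b, "age", "spend")
```

The fused file contains income and spend jointly, but their relationship is driven entirely by the conditional independence assumption.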
14
Alignment of statistical data
Alignment of units: harmonisation of relevant units, creation of target statistical units
Alignment of measurements: harmonisation of relevant variables, derivation of target statistical variable
15
Multisource estimation at aggregated level
Population size estimation: multiple lists with imperfect coverage of target population
Univalent estimation: numerically and statistically consistent estimation of common variables
Coherent estimation: aggregates that relate to each other in terms of accounting equations
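For population size estimation, the simplest two-list case is the Lincoln-Petersen (dual-system) estimator. The sketch below assumes the two lists are independent, that records are linked without error, and that the population is closed; the identifiers are made up for illustration.

```python
def dual_system_estimate(list_a, list_b):
    """Lincoln-Petersen estimate of population size from two lists.

    N_hat = n_a * n_b / m, where m is the number of units found on both lists.
    """
    a, b = set(list_a), set(list_b)
    m = len(a & b)
    if m == 0:
        raise ValueError("the lists do not overlap; the estimate is undefined")
    return len(a) * len(b) / m


# Hypothetical unit identifiers in a register and in a coverage survey.
register = {1, 2, 3, 4, 5, 6}
survey = {4, 5, 6, 7, 8}

n_hat = dual_system_estimate(register, survey)
```

With 6 units on the first list, 5 on the second and 3 on both, the estimate is 6 × 5 / 3 = 10 units, of which 8 were actually observed on at least one list.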
16
Relationships among usages and statistical tasks
Su1: one admin data source; Su2: more than one admin data source; …
17
Task 5a: Template for methods description
Template: presentation of the method, the contextual framework for using it and the conditions of application, the pros and cons, an example of use, and a list of related existing software
18
Task 5: Data editing methods
Deductive editing
Selective editing
Automatic editing
Interactive editing
Macro-editing
Deductive imputation
Model-based imputation
Donor imputation
Imputation for longitudinal data
Imputation under edit constraints
Outlier detection
Reconciling conflicting micro-data: prorating, minimum adjustment methods, generalised ratio adjustments
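Of the reconciliation methods listed, prorating is the easiest to show in code. The sketch assumes a record whose components should add up to a total that is regarded as reliable (for example, one taken from an administrative source); the function name and figures are hypothetical.

```python
def prorate(components, total):
    """Scale all components by the same factor so that they sum to total.

    Ratios between the components are preserved.
    """
    current = sum(components)
    if current == 0:
        raise ValueError("cannot prorate components that sum to zero")
    factor = total / current
    return [c * factor for c in components]


# Reported components sum to 105, but the trusted total is 100.
reported = [40.0, 35.0, 30.0]
adjusted = prorate(reported, 100.0)
```

Minimum adjustment methods differ in that they allow component-specific changes chosen to minimise a distance function, whereas prorating applies one common factor.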
19
Task 5. Creation of joint statistical micro data methods
Matching of object characteristics (unweighted and weighted matching)
Probabilistic record linkage
Data fusion at micro level (relevant choice of statistical matching methods)
Data hashing and anonymisation techniques
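Probabilistic record linkage is commonly formulated in the Fellegi-Sunter framework, where each compared field contributes a log-likelihood-ratio weight. The sketch below computes such a composite weight for one record pair. The m- and u-probabilities are made-up values (in practice they are estimated, for example with the EM algorithm), and fields are assumed to agree or disagree independently of each other.

```python
import math


def match_weight(agreement, m_probs, u_probs):
    """Composite Fellegi-Sunter weight for one record pair.

    agreement: dict field -> True/False (do the two records agree?).
    m_probs:   dict field -> P(agree | pair is a true match).
    u_probs:   dict field -> P(agree | pair is a non-match).
    """
    weight = 0.0
    for field, agrees in agreement.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


# Hypothetical parameters for two comparison fields.
m_probs = {"surname": 0.95, "birth_year": 0.90}
u_probs = {"surname": 0.01, "birth_year": 0.05}

w_both = match_weight({"surname": True, "birth_year": True}, m_probs, u_probs)
w_mixed = match_weight({"surname": True, "birth_year": False}, m_probs, u_probs)
w_none = match_weight({"surname": False, "birth_year": False}, m_probs, u_probs)
```

Pairs with a weight above an upper threshold are declared links, pairs below a lower threshold non-links, and the remainder is typically sent to clerical review.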
20
Task 5. Alignment of statistical data methods
Alignment of units
  No general methods are available
Alignment of measurements
  Harmonisation based on latent variable models
21
Task 5. Multisource estimation at aggregated level methods
Generalised Regression Estimator
EBLUP Area Level for Small Area Estimation (Fay-Herriot) Method
Small Area Estimation Methods for Time Series Data
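The area-level (Fay-Herriot) predictor is a weighted average of the direct survey estimate and a regression-synthetic estimate, with more weight on the direct estimate where its sampling variance is small. The sketch below shows only this final shrinkage step. It assumes the model variance `sigma2_v` and the synthetic predictions (which could be built from administrative covariates) have already been estimated; all numbers are hypothetical.

```python
def fay_herriot_predict(direct, sampling_var, synthetic, sigma2_v):
    """Shrinkage step of the area-level small area predictor.

    direct:       direct survey estimates per area.
    sampling_var: sampling variances of the direct estimates (treated as known).
    synthetic:    regression-synthetic predictions x_i' beta_hat per area.
    sigma2_v:     estimated variance of the area random effects.
    """
    predictions = []
    for y, psi, syn in zip(direct, sampling_var, synthetic):
        gamma = sigma2_v / (sigma2_v + psi)  # weight on the direct estimate
        predictions.append(gamma * y + (1 - gamma) * syn)
    return predictions


# Area 1 has a precise direct estimate, area 2 a noisy one.
direct = [10.0, 20.0]
sampling_var = [1.0, 9.0]
synthetic = [12.0, 14.0]

eblup = fay_herriot_predict(direct, sampling_var, synthetic, sigma2_v=3.0)
```

In this example the first area keeps a weight of 0.75 on its direct estimate, while the second area, with the larger sampling variance, keeps only 0.25 and is pulled towards the synthetic value.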
22
Multisource estimation at aggregated level methods
IVa. Population size estimation: multiple lists with imperfect coverage of target population.
  Multiple-list models for population size estimation
IVb. Univalent estimation: numerically and statistically consistent estimation of common variables.
  Repeated weighting
  Mass imputation
  Repeated imputation
  Macro-integration
23
Multisource estimation at macro level methods
Univalent estimates at longitudinal level
  State space models (estimation of unobserved variable and possible application to alignment of statistical data)
  Temporal disaggregation/benchmarking methods: Denton's method and Chow-Lin method
IVc. Coherent estimation: aggregates that relate to each other in terms of accounting equations
  Macro-integration
24
Task 4: NSIs actual examples
Statistics on road accidents - Probabilistic Record Linkage
The creation of a Social Policy Simulation Database (Stat Canada) - Statistical Matching
Modelling measurement error in admin and survey variables on turnover
Estimating classification errors under edit-restrictions in combined register-survey data
Variable harmonisation based on latent variable models: Dutch Population and Housing Census
Statistical methods for achieving univalent estimates for cross-sectional data
Macro-Integration (Repeated weighting)
25
Conclusions
The documents provide an overview of, and quick access to, methods used in the production of statistics involving integrated admin data
The methods are at different levels of maturity
Although research is ongoing for all of them, latent class models for harmonisation are the ones requiring the most further study
The task of unit harmonisation deserves research in terms of methods and impact of errors
26
Thank you for your attention