Estimation methods for the integration of administrative sources


1 Estimation methods for the integration of administrative sources
Specific contract n° ESTAT N° under framework contract Lot 1 n°
Responsible person at Commission: Fabrice Gras, Eurostat – Unit B1
Authors: Nicoletta Cibella, Ton de Waal, Marco Di Zio, Mauro Scanu, Sander Scholtus, Tiziana Tuoto, Arnout van Delden, Li-Chun Zhang
Project manager: Laurent Jacquet
Sogeti: Sanja Vujackov
Working group on methodology, Luxembourg, 5 April 2017

2 Overview of the project
Project “Estimation methods for the integration of administrative sources”
Duration: 1/06/ /03/2017
Part of the ESS.VIP Admin Project (Work package 2 “Statistical methods”)
Sogeti is the main contractor
Experts from ISTAT and Statistics Netherlands, plus an external advisor from the University of Southampton

3 Aim of the project
Identification and presentation of existing statistical methods, together with the associated contextual framework, in order to enable and ease the integration of administrative sources into a statistical production system

4 The main objectives
Identification of the main possible uses of administrative sources
Identification of the steps of the statistical production system where methods can be used for integrating administrative sources
A literature review presenting actual examples from the NSIs
Drafting of a technical summary sheet for each identified statistical method

5 Overview of the project: 7 tasks
Task 1. Specify usages of admin data
Task 2. Identification and description of possible statistical tasks where methods can be envisaged in order to integrate administrative sources
Task 3. Comprehensive identification and enumeration of possible estimation methods that could be used for the cases identified in Task 2

6 Overview of the project: 7 tasks
Task 4. Literature review presenting examples in NSIs for the types of use of administrative sources and for the steps previously identified
Task 5a. Provision of a template for the review of estimation methods
Task 5b. Methods description
Task 6 & 7. Final presentation and report

7 Task 1: Statistical usages
Direct
  1. Direct tabulation
  2. Substitution and supplementation
Indirect
  1. Creation and maintenance of registers
  2. Editing and imputation
  3. Indirect estimation
  4. Data validation/confrontation

8 Task 1: Direct usage
Direct tabulation: admin data are used to produce statistics without resorting to any statistical data
Substitution and supplementation for direct collection (replacement for data collection): admin data are used directly as input observations for the production of statistics but are not sufficient for achieving all objectives
  Split-population approach
  Split data approach

9 Task 1: Indirect usage
Creation and maintenance of registers and survey frames
  Identification of frame units and their connections to population elements
  Identification of classification and auxiliary variables
Editing and imputation
  Construction of edit rules
  Construction of models to find errors in data
  Auxiliary data to construct imputation models

10 Task 1: Indirect usage
Indirect estimation
  Creation of population benchmarks for calibration
  Use of administrative data in a predictive setting
  Estimation where administrative and statistical data are used on an equal footing
Data validation/confrontation
  Validation of survey estimates and/or other administrative data sources
  Address quality issues
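As an illustration of calibration to population benchmarks, one of the simplest cases is post-stratification: survey weights are rescaled so that the weighted count in each stratum reproduces a count taken from an administrative register. The sketch below is only a minimal example under that assumption; the function name `poststratify` and the toy data are hypothetical and do not come from the project documents.

```python
def poststratify(weights, strata, benchmarks):
    """Rescale survey weights so each stratum sums to its register benchmark.

    weights    -- initial survey weights, one per respondent
    strata     -- stratum label of each respondent
    benchmarks -- dict mapping stratum label to the admin-based population count
    """
    # Weighted sample count per stratum before calibration.
    totals = {}
    for w, s in zip(weights, strata):
        totals[s] = totals.get(s, 0.0) + w

    # Multiply every weight by benchmark / current weighted count of its stratum.
    return [w * benchmarks[s] / totals[s] for w, s in zip(weights, strata)]


# Toy data (assumed): two strata with register counts of 30 and 10 units.
calibrated = poststratify([10, 10, 20], ["a", "a", "b"], {"a": 30, "b": 10})
```

After calibration, the weights within each stratum sum to the register count, so any estimate of that count from the survey agrees with the administrative source.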

11 Task 2: Statistical tasks
Statistical tasks for using integrated admin data
  Data editing and imputation
  Creation of joint statistical micro data
  Alignment of statistical data
  Multisource estimation at aggregated level

12 Data editing and imputation
Resolving micro-data inconsistencies and imputing missing data

13 Creation of joint statistical micro data
Data linkage: identification of the set of unique units residing in multiple datasets
Statistical matching: inference of the joint distribution based on marginal observations
Hashing techniques
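To make the role of hashing in data linkage concrete, the sketch below replaces an identifier by a keyed hash before two sources are linked, so that records can be joined without exchanging the raw identifier. This is an assumed, simplified setup (exact matching on one identifier, a shared secret key); the function names and sample records are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 hash of a normalised identifier."""
    normalised = identifier.strip().upper()
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def link_on_hash(source_a, source_b, secret_key):
    """Join two lists of (identifier, value) pairs on the hashed identifier."""
    index = {pseudonymise(ident, secret_key): value for ident, value in source_a}
    linked = []
    for ident, value_b in source_b:
        key = pseudonymise(ident, secret_key)
        if key in index:
            linked.append((key, index[key], value_b))
    return linked


# Toy data (assumed): a register and a survey sharing one unit.
key = b"shared-secret"
register = [("ab123", "employed"), ("cd456", "retired")]
survey = [(" AB123 ", 41000), ("zz999", 15000)]
pairs = link_on_hash(register, survey, key)
```

Exact matching on hashes only finds units whose identifiers agree after normalisation; typing errors in the identifier would require the probabilistic approaches listed later in the presentation.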

14 Alignment of statistical data
Alignment of units: harmonisation of relevant units, creation of target statistical units
Alignment of measurements: harmonisation of relevant variables, derivation of target statistical variable

15 Multisource estimation at aggregated level
Population size estimation: multiple lists with imperfect coverage of the target population
Univalent estimation: numerically and statistically consistent estimation of common variables
Coherent estimation: aggregates that relate to each other in terms of accounting equations

16 Relationships among usages and statistical tasks
Su1: one admin data source; Su2: more than one admin data source; …

17 Task 5a: Template for methods description
Template:
  Presentation of the method
  The contextual framework for using it, as well as the conditions of application
  The pros and cons
  An example of use
  A list of related existing software

18 Task 5: Data editing methods
Deductive editing
Selective editing
Automatic editing
Interactive editing
Macro-editing
Deductive imputation
Model-based imputation
Donor imputation
Imputation for longitudinal data
Imputation under edit constraints
Outlier detection
Reconciling conflicting micro-data: prorating, minimum adjustment methods, generalised ratio adjustments
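Of the reconciliation methods listed, prorating is the easiest to show in a few lines: when reported components do not add up to a total that is considered reliable (for example a total taken from an admin source), every component is multiplied by the same factor so that the balance edit holds. The sketch below assumes a single balance edit and a trusted total; the function name and figures are hypothetical.

```python
def prorate(components, total):
    """Scale components proportionally so that they sum to the trusted total."""
    current = sum(components)
    if current == 0:
        raise ValueError("cannot prorate components that sum to zero")
    factor = total / current
    return [c * factor for c in components]


# Toy data (assumed): survey components sum to 40, the admin total is 80.
adjusted = prorate([10, 30], 80)
```

Prorating preserves the ratios between components. The minimum adjustment methods named on the same slide instead minimise a distance between the original and adjusted values, which matters when several edits must hold at once.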

19 Task 5. Creation of joint statistical micro data methods
Matching of object characteristics (unweighted and weighted matching)
Probabilistic record linkage
Data fusion at micro level (relevant choice of statistical matching methods)
Data hashing and anonymisation techniques
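A common formulation of probabilistic record linkage is the Fellegi-Sunter model, in which each compared field contributes a log-likelihood-ratio weight depending on whether the two records agree on it. The slides do not specify which formulation the project describes, so the sketch below is an assumption; the m- and u-probabilities (probability of agreement among true matches and among non-matches) are made-up values.

```python
import math


def match_weight(agreements, m_probs, u_probs):
    """Sum of Fellegi-Sunter field weights for one candidate record pair.

    agreements -- list of booleans, True if the pair agrees on that field
    m_probs    -- P(agreement | pair is a true match), per field
    u_probs    -- P(agreement | pair is not a match), per field
    """
    total = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        if agree:
            total += math.log2(m / u)              # evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # evidence against a match
    return total


# Toy parameters (assumed): fields are surname and date of birth.
m = [0.9, 0.8]
u = [0.1, 0.2]
w_agree = match_weight([True, True], m, u)
w_mixed = match_weight([True, False], m, u)
```

Pairs whose total weight exceeds an upper threshold would be treated as links, pairs below a lower threshold as non-links, and the rest sent to clerical review; the thresholds themselves are a design choice not covered here.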

20 Task 5. Alignment of statistical data methods
Alignment of units
  No general methods are available
Alignment of measurements
  Harmonisation based on latent variable models

21 Task 5. Multisource estimation at aggregated level methods
Generalised regression estimator
EBLUP area-level method for small area estimation (Fay-Herriot)
Small area estimation methods for time series data
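The generalised regression (GREG) estimator corrects a weighted survey total using the gap between a known auxiliary total, typically from an admin source, and its survey estimate. The sketch below covers only the special case of one auxiliary variable with a ratio model (no intercept, variance proportional to x), where the GREG estimator reduces to the ratio estimator. The function name and data are hypothetical.

```python
def greg_ratio_total(y, x, weights, x_total):
    """GREG estimate of the total of y under a ratio working model.

    y, x     -- survey values of the target and auxiliary variable
    weights  -- design weights of the sampled units
    x_total  -- known population total of x (e.g. from a register)
    """
    ht_y = sum(w * yi for w, yi in zip(weights, y))  # weighted total of y
    ht_x = sum(w * xi for w, xi in zip(weights, x))  # weighted total of x
    slope = ht_y / ht_x                              # ratio-model coefficient
    # Weighted total plus a regression adjustment for the gap in x.
    return ht_y + slope * (x_total - ht_x)


# Toy data (assumed): the register total of x is 40, the survey estimates 30.
estimate = greg_ratio_total(y=[2, 4], x=[1, 2], weights=[10, 10], x_total=40)
```

With several auxiliary variables the slope becomes a vector obtained by weighted least squares, but the structure (survey total plus regression adjustment) stays the same.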

22 Multisource estimation at aggregated level methods
IVa. Population size estimation: multiple lists with imperfect coverage of the target population
  Multiple-list models for population size estimation
IVb. Univalent estimation: numerically and statistically consistent estimation of common variables
  Repeated weighting
  Mass imputation
  Repeated imputation
  Macro-integration
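The simplest multiple-list model uses two lists and is often called the dual-system or Lincoln-Petersen estimator. It is shown here only as an illustration of the idea and assumes that the two lists are independent, that linkage between them is error-free, and that neither list contains units outside the target population. The function name and counts are hypothetical.

```python
def dual_system_estimate(n_list_a, n_list_b, n_both):
    """Estimate population size from two incomplete lists.

    n_list_a -- number of units on list A
    n_list_b -- number of units on list B
    n_both   -- number of units found on both lists after linkage
    """
    if n_both == 0:
        raise ValueError("no overlap between lists: estimator is undefined")
    return n_list_a * n_list_b / n_both


# Toy counts (assumed): 200 units on list A, 150 on list B, 100 on both.
population_size = dual_system_estimate(200, 150, 100)
```

With three or more lists, log-linear models can relax the independence assumption by allowing dependence between pairs of lists.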

23 Multisource estimation at macro level methods
Univalent estimates at longitudinal level
  State space models (estimation of unobserved variables and possible application to alignment of statistical data)
  Temporal disaggregation/benchmarking methods: Denton's method and Chow-Lin method
IVc. Coherent estimation: aggregates that relate to each other in terms of accounting equations
  Macro-integration

24 Task 4: NSIs actual examples
Statistics on road accidents: probabilistic record linkage
The creation of a Social Policy Simulation Database (Statistics Canada): statistical matching
Modelling measurement error in admin and survey variables on turnover
Estimating classification errors under edit restrictions in combined register-survey data
Variable harmonisation based on latent variable models: Dutch Population and Housing Census
Statistical methods for achieving univalent estimates for cross-sectional data
Macro-integration (repeated weighting)

25 Conclusions
The documents provide an overview of, and quick access to, methods used in the production of statistics involving integrated admin data
The methods differ in their level of maturity
Although research is ongoing for all of them, latent class models for harmonisation are the ones requiring the most further study
The task of unit harmonisation deserves research in terms of methods and impact of errors

26 Thank you for your attention

