Methods for Data-Integration


Methods for Data-Integration
Ton de Waal, 14 March 2017

Overview of project
- Project: “Estimation methods for the integration of administrative sources”
- Aim of the project: identifying and presenting statistical methods for the integration of administrative data into a statistical production system
- Duration: April 2016 to end of March 2017
- Part of the ESS.VIP Admin Project (Work package 2 “Statistical methods”, Lot 1 “Methodological support”)
- Sogeti is the main contractor
- Experts from ISTAT, Statistics Netherlands and the University of Southampton

Overview of project
- Task 1: Specify usages of admin data
- Task 2: Identification and description of statistical tasks where using estimation methods can be envisaged in order to integrate administrative sources
- Task 3: Comprehensive identification and enumeration of possible estimation methods that could be used for cases identified in Task 2
- Task 4: Literature review presenting actual examples for types of usage and tasks identified in Task 2 or 3
- Task 5: Methods description
- Task 6 & 7: Final presentation and report

Task 1: Usages of admin data: Direct
- Direct tabulation: admin data used to produce statistics without resorting to any statistical data
  - Exploiting only one administrative data source
  - Exploiting multiple administrative data sources
- Substitution and supplementation for direct collection: admin data directly used as input but not sufficient for achieving all objectives
  - Split-population approach: the population is split into two or more parts. Admin data are used for units where these data are of sufficient quality, and statistical sources are used for the remainder of the units
  - Split-data approach: administrative data are used to provide some of the variables for all population units
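The split-population approach can be illustrated with a minimal sketch. The field names (`admin_quality_ok`, `admin_value`, `survey_value`, `survey_weight`) and the toy records are hypothetical and not taken from the project. The sketch assumes that units with usable admin data are fully enumerated and that the remainder is covered by a weighted sample.

```python
def split_population_total(units):
    """Estimate a population total under a split-population design.

    Units with usable admin data contribute their admin value directly;
    the remaining units are sampled and contribute weight * survey value.
    """
    total = 0.0
    for unit in units:
        if unit["admin_quality_ok"]:
            # Admin part: full enumeration, no weighting needed.
            total += unit["admin_value"]
        else:
            # Survey part: each sampled unit represents `survey_weight` units.
            total += unit["survey_weight"] * unit["survey_value"]
    return total


# Hypothetical toy data: two units covered by admin data, two by a survey.
units = [
    {"id": 1, "admin_quality_ok": True, "admin_value": 100.0},
    {"id": 2, "admin_quality_ok": True, "admin_value": 250.0},
    {"id": 3, "admin_quality_ok": False, "survey_value": 40.0, "survey_weight": 2.0},
    {"id": 4, "admin_quality_ok": False, "survey_value": 60.0, "survey_weight": 2.0},
]

estimated_total = split_population_total(units)
```

In practice the rule that decides which part a unit belongs to is the critical design choice; here it is reduced to a precomputed flag.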

Task 1: Usages of admin data: Indirect
- Creation and maintenance of registers and survey frames
  - Identification of frame units and their connections to population elements
  - Identification of classification and auxiliary variables
- Editing and imputation
  - Construction of edit rules
  - Construction of models to find errors in data
  - Auxiliary data to construct imputation models
- Indirect estimation
  - Creation of population benchmarks to be used for calibration
  - Use of administrative data in a predictive setting
  - Estimation where administrative and statistical data are used on an equal footing
- Data validation/confrontation
  - Validation of survey estimates and/or other administrative data sources
  - Address quality issues
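As an illustration of using admin-based population benchmarks for calibration, the sketch below applies post-stratification, one of the simplest calibration techniques. It is not necessarily the method used in the project. The strata, weights and benchmark counts are hypothetical.

```python
from collections import defaultdict


def poststratify(sample, benchmarks):
    """Rescale design weights so that the weighted count in each stratum
    equals the benchmark count derived from an administrative source."""
    weighted_counts = defaultdict(float)
    for rec in sample:
        weighted_counts[rec["stratum"]] += rec["weight"]

    calibrated = []
    for rec in sample:
        factor = benchmarks[rec["stratum"]] / weighted_counts[rec["stratum"]]
        calibrated.append({**rec, "weight": rec["weight"] * factor})
    return calibrated


# Hypothetical survey sample with design weights and a target variable y.
sample = [
    {"stratum": "young", "weight": 10.0, "y": 1.0},
    {"stratum": "young", "weight": 10.0, "y": 0.0},
    {"stratum": "old", "weight": 20.0, "y": 1.0},
]
# Hypothetical population counts per stratum taken from a register.
benchmarks = {"young": 30.0, "old": 50.0}

calibrated = poststratify(sample, benchmarks)
estimated_total_y = sum(rec["weight"] * rec["y"] for rec in calibrated)
```

After calibration the weighted stratum counts reproduce the register counts exactly, which is the defining property of a calibration estimator.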

Task 2: Possible statistical tasks
We have matched statistical tasks to usages by means of the GSBPM
Statistical tasks for using integrated administrative data:
I. Data editing and imputation
II. Creation of joint statistical micro data
   a) Data linkage: identification of the set of unique units residing in multiple datasets
   b) Statistical matching: inference of joint distribution based on marginal observations
III. Alignment of statistical data
   a) Alignment of units: harmonization of relevant units, creation of target statistical units
   b) Alignment of measurements: harmonization of variables, derivation of target variables
IV. Multisource estimation at aggregated level
   a) Population size estimation: multiple lists with imperfect coverage of target population
   b) Univalent estimation: numerically consistent estimation of common variables
   c) Coherent estimation: aggregates that relate to each other
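Statistical matching (task II b) can be sketched with a random hot deck within classes of a common variable. This is only one of several matching techniques, and it relies on the conditional independence assumption: the variable observed only in the donor file is taken to be independent of the variables observed only in the recipient file, given the common variable. All names and records below are hypothetical.

```python
import random


def hot_deck_match(recipients, donors, common_key, donor_var, seed=0):
    """Attach `donor_var` to each recipient by drawing a random donor
    that has the same value of the common variable `common_key`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    # Group donor values by class of the common variable.
    pools = {}
    for donor in donors:
        pools.setdefault(donor[common_key], []).append(donor[donor_var])

    fused = []
    for rec in recipients:
        pool = pools.get(rec[common_key])
        # No donor in the class: leave the value missing rather than guess.
        value = rng.choice(pool) if pool else None
        fused.append({**rec, donor_var: value})
    return fused


# Hypothetical files: file A observes (region, employed), file B (region, income).
file_a = [
    {"region": "north", "employed": True},
    {"region": "south", "employed": False},
    {"region": "east", "employed": True},
]
file_b = [
    {"region": "north", "income": 30000},
    {"region": "north", "income": 35000},
    {"region": "south", "income": 22000},
]

fused = hot_deck_match(file_a, file_b, "region", "income")
```

The fused file supports inference on the joint distribution of `employed` and `income` only as far as the conditional independence assumption holds, since the two variables are never observed together.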

Task 3: Possible estimation methods
I. Data editing and imputation
   - Most methods usually applied for surveys can also be applied for admin data
   - There are editing methods developed specifically for data obtained through an integration process (micro-integration)
II. Creation of joint statistical microdata
   - Identification of the set of unique units residing in multiple datasets (data linkage), including probabilistic record linkage
   - Inference of joint distribution based on marginal observations (statistical matching)
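Probabilistic record linkage is commonly formulated in the Fellegi-Sunter framework, where each compared field adds a log-likelihood-ratio weight to a record pair. The sketch below assumes that formulation. The m-probabilities (agreement given a true match) and u-probabilities (agreement given a non-match) are hypothetical values chosen for illustration, and fields are compared by exact equality only.

```python
import math


def match_weight(rec_a, rec_b, params):
    """Sum of Fellegi-Sunter agreement/disagreement weights over fields.

    params maps field name -> (m, u), where
      m = P(fields agree | records are a true match)
      u = P(fields agree | records are not a match)
    """
    weight = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            weight += math.log2(m / u)              # agreement weight
        else:
            weight += math.log2((1 - m) / (1 - u))  # disagreement weight
    return weight


# Hypothetical m/u probabilities for two linkage variables.
params = {"surname": (0.9, 0.1), "birth_year": (0.8, 0.2)}

admin_rec = {"surname": "jansen", "birth_year": 1980}
survey_same = {"surname": "jansen", "birth_year": 1980}
survey_other = {"surname": "devries", "birth_year": 1975}

w_same = match_weight(admin_rec, survey_same, params)
w_other = match_weight(admin_rec, survey_other, params)
```

Pairs with a weight above an upper threshold would be declared links and pairs below a lower threshold non-links, with the region in between sent to clerical review. The thresholds, like the m and u values, have to be chosen or estimated for each application.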

Task 3: Possible estimation methods
III. Alignment of statistical data
   a) Alignment of units
   b) Alignment of measurements: recently, latent variable models have been proposed
IV. Multisource estimation at aggregated level
   a) Population size estimation: multiple lists with imperfect coverage of population
   b) Univalent estimation: numerically consistent estimation of common variables
      - Obtaining univalent estimates at cross-sectional level
      - Obtaining univalent estimates at longitudinal level
   c) Coherent estimation: aggregates that relate to each other in terms of accounting equations
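For population size estimation from two lists with imperfect coverage, a classical starting point is the dual-system (Lincoln-Petersen) estimator. The sketch below is illustrative and is not claimed to be the method the project recommends. It assumes two independent lists, error-free linkage between them, a closed population, and equal inclusion probabilities within each list. The unit identifiers are hypothetical.

```python
def dual_system_estimate(list_a, list_b):
    """Lincoln-Petersen estimate of population size: n_a * n_b / n_ab,
    where n_ab is the number of units found on both lists."""
    ids_a, ids_b = set(list_a), set(list_b)
    overlap = len(ids_a & ids_b)
    if overlap == 0:
        raise ValueError("no overlap between lists; estimator is undefined")
    return len(ids_a) * len(ids_b) / overlap


# Hypothetical linked identifiers: 8 units on list A, 6 on list B, 4 on both.
register_a = [1, 2, 3, 4, 5, 6, 7, 8]
register_b = [5, 6, 7, 8, 9, 10]

n_hat = dual_system_estimate(register_a, register_b)
```

The ten distinct units observed across both lists are a lower bound. The estimate of twelve adds the units that are expected to be missed by both lists under the stated assumptions.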

Task 4: Examples at NSIs
We have focused on the examples that offer the most interesting information
We have given actual examples in NSIs for:
- Direct tabulation
- Split data approach
- Indirect estimation
- Data validation

Task 4: Examples at NSIs
- Direct tabulation: use of probabilistic record linkage for statistics on victims and injured people of road accidents (Istat)
- Split data approach: creation of a social policy simulation database by means of statistical matching (Statistics Canada)
- Indirect estimation: use of repeated weighting and macro-integration for the construction of the Dutch Population and Housing Census (Statistics Netherlands)
- Data validation: estimation of classification errors in admin and survey variables on home ownership (Statistics Netherlands)

Task 5: Methods description
We will describe:
- Editing and imputation methods, including micro-integration
- Methods for creation of joint microdata, including probabilistic linkage and statistical matching
- Methods for alignment of statistical data, including latent variable models
- Methods for multisource estimation at aggregated level, including methods for multiple lists with imperfect coverage of population, and methods for obtaining univalent estimates (at cross-sectional level and at longitudinal level)

Thank you
Thank you for your attention. Any questions or comments?