Supporting the use of administrative data in official statistics.

Presentation transcript:

Supporting the use of administrative data in official statistics.
The System of Integrated Microdata - SIM and the Quality Report Card of Administrative data - QRCA
Marina Venturi, Grazia Di Bella
Italian National Institute of Statistics - Istat
NTTS 2017, Brussels, 14 - 16 March 2017

New Istat modernization process
Innovation of the statistical production system. Use of AdminData.
The overall Istat modernization process is focused on the new System of Integrated Registries, based mainly on AdminData:
  Business Register
  Population Register
  Places Register
  Activities Register
A single Directorate deals with all aspects of data collection, for both surveys and AdminData, and with the first data treatment process.

Automation of processes for the management of the AdminData flows
Considering the complexity of the system and the need to process large amounts of data
Strategy
  Use of IT tools to acquire, store, integrate and disseminate AdminData
  Strengthening of the metadata system to automate procedures and to create updated documentation of AdminData quality

AdminData centralized management
See 13C001 Istat presentation by Giacomi
AdminData management functions
  A. Collection of AD requirements from Istat statistics producers
  B. Formulation of AD requests for each AD holder and for each AD source
  C. AD acquisition and storing
  D. Procedures to ensure data confidentiality
  D. AD formal concept analysis / identification of objects and relations
  E. Data loading
  F. AD integration
  G. Recoding
  H. Dissemination to internal users
  AD quality evaluation and documentation
IT TOOLS: EDI, SIM, ARCAM

SIM - The Integrated System of Administrative Microdata
SIM is the Istat database of integrated Admin microdata, built to support the Istat statistical production process.
Subsets of Admin sources (datasets) that contain microdata referring to statistical target units enter the system:
  Individuals
  Economic units
  Places

The SIM process: concept analysis
AD come from different sources and therefore have different characteristics.
To make data consistent with the integration system:
  data analysis according to the entity-relationship model
  standardized procedures for each dataset delivered periodically
  data loading into relational tables (object/entity and its attributes)
SIM has integrated 70 different AdminData subsets each year since 2011 and is going to integrate further AdminData subsets in 2017.

The SIM process: data loading
Administrative objects/entities recognized as statistical target units of type k, with k=1,2,3, feed the three main subsystems:
  Individuals
  Economic units
  Places
AdminDatasets may contain more than one basic unit in the same instance:
  Individuals (family relationships)
  Economic units (local units)
SIM relations subsystems:
  Individuals and economic units (e.g. Workers LEED, Students-Schools)
  Places and economic units
  Places and individuals
Statistical Registers: Population/Business/Places/Activities

The SIM process: integrating data [1]
The integration stage is incremental: as the datasets are acquired, they are progressively integrated in SIM with the data already present.
Operationally, the integration of the i-th dataset is made through a series of record linkage procedures between the input dataset and the corresponding Base Bk (k=1,2,3).
Depending on the type of unit and on the AD domain defined by several criteria, a suitable integration strategy and a set of algorithms are applied.

The SIM process: integrating data [2]
The integration of a dataset involves as many integration processes as there are types of units.
The integration stage ends with the attribution of a unique identification code (the SIM code) within the respective Base Bk, and each instance becomes part of the Base.
Units that the integration procedures do not match with others already recorded in the Base receive a new code.
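
The incremental integration described above can be illustrated with a minimal sketch. It is not Istat's procedure: the two matching steps (exact tax code first, then last name, name and date of birth), the field names and the function names `make_base` and `integrate` are all assumptions made for illustration.

```python
from itertools import count


def make_base():
    """Create an empty Base: two lookup indexes and a generator of new SIM codes."""
    return {"by_tax_code": {}, "by_demographics": {}, "next_code": count(1)}


def _demographic_key(rec):
    # Hypothetical fallback key: normalized last name, name and date of birth.
    return (rec["last_name"].strip().upper(),
            rec["name"].strip().upper(),
            rec["date_of_birth"])


def integrate(base, dataset):
    """Link each input record to the Base; return one (sim_code, step) per record.

    step 1 = matched on tax code, step 2 = matched on demographics,
    step 0 = no match, so a new SIM code is issued.
    """
    results = []
    for rec in dataset:
        tax = rec.get("tax_code")
        demo = _demographic_key(rec)
        if tax and tax in base["by_tax_code"]:
            code, step = base["by_tax_code"][tax], 1
        elif demo in base["by_demographics"]:
            code, step = base["by_demographics"][demo], 2
        else:
            code, step = next(base["next_code"]), 0
        # The record becomes part of the Base, so later datasets can match it.
        if tax:
            base["by_tax_code"].setdefault(tax, code)
        base["by_demographics"].setdefault(demo, code)
        results.append((code, step))
    return results
```

Calling `integrate` once per acquired dataset, always on the same `base`, reproduces the incremental behaviour: each new dataset is linked against everything integrated before it.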

The Base for Integration B1
  YEAR: Last year of inclusion in the Base instance
  ID_ADMINDATASET: Identifies the supply of a subset of an AdminSource; together with the time reference, it identifies the input administrative dataset.
  PROGRESSIVE (OF THE ACQUIRED ADMIN_DATASET): In hierarchy with ID_ADMINDATASET, enables connection with other data on it (other specific attributes of the units).
  SIM CODE: SIM code derived from the integration procedures
  TAX CODE, LAST NAME, NAME, SEX, DATE OF BIRTH, CODCAT BIRTH, PROVINCE BIRTH, MUNICIPALITY BIRTH, COUNTRY BIRTH: Linkage variables (individual representation)
  ENTRY DATE, DEATH DATE
  A2009, A2010, ..., A2018: Reference year (dummy variables)
  STEP: Step at which the unit is integrated into the Base, as part of the integration strategy adopted for the dataset
  SUBSET: In case of provisional/final data
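
As a reading aid, the record layout of Base B1 could be modelled as below. This is only a sketch: the class name, the Python types, the choice of optional fields and the `present_in_year` mapping (standing in for the A2009 ... A2018 dummy variables) are assumptions, and only a subset of the linkage variables is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class BaseB1Record:
    """One individual in Base B1 (hypothetical modelling of the slide's layout)."""
    year: int                     # YEAR: last year of inclusion in the Base
    id_admindataset: str          # ID_ADMINDATASET: identifies the supplied dataset
    progressive: int              # PROGRESSIVE: row position within that dataset
    sim_code: int                 # SIM CODE assigned by the integration procedures
    # Linkage variables (subset shown; types are assumptions)
    tax_code: Optional[str] = None
    last_name: Optional[str] = None
    name: Optional[str] = None
    sex: Optional[str] = None
    date_of_birth: Optional[str] = None
    # Stand-in for the A2009 ... A2018 reference-year dummy variables
    present_in_year: Dict[int, bool] = field(default_factory=dict)
    step: int = 0                 # STEP at which the unit entered the Base
    subset: Optional[str] = None  # SUBSET: provisional or final data

    def source_key(self) -> Tuple[str, int]:
        """Key for reconnecting to the other attributes in the source dataset."""
        return (self.id_admindataset, self.progressive)
```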

The SIM process: a step for data security
Identification variables of the individuals are stored in a separate table.
Integrated anonymized data are made available to internal users who have requested them and who are authorized to use them.
The advantage of integrating data centrally is twofold:
  To allow users to link datasets simply by using the SIM codes (no use of identification variables), avoiding duplicate work
  To comply with legislation on data privacy
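
The separation of identification variables and the linkage by SIM code can be sketched as follows. The set of identifying fields and the function names are assumptions for illustration; the actual tables are not described in the slides.

```python
# Hypothetical set of identification variables kept in the separate table.
IDENTIFIERS = {"tax_code", "last_name", "name", "date_of_birth"}


def split_record(sim_code, record):
    """Split a record into an identifier row and an anonymized row.

    Both rows carry the SIM code, so they can be rejoined by authorized staff.
    """
    ident = {k: v for k, v in record.items() if k in IDENTIFIERS}
    anon = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    return {"sim_code": sim_code, **ident}, {"sim_code": sim_code, **anon}


def link_by_sim_code(left, right):
    """Join two anonymized datasets on the SIM code only (inner join)."""
    right_index = {row["sim_code"]: row for row in right}
    return [{**row, **right_index[row["sim_code"]]}
            for row in left if row["sim_code"] in right_index]
```

A user holding two anonymized datasets only needs `link_by_sim_code`; names and tax codes never leave the identifier table.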

About AdminData quality - What is useful and for whom
For Istat users
  AD usability analysis function: information on AD availability and usability
  AD monitoring function: to identify possible regulatory changes that may induce discontinuity and that are not notified in advance; to detect an unexpected lack of quality
  Support the collection of AD requirements
  Reporting on AdminData usability
For the AD acquisition process unit
  AD supply monitoring function: to promptly check AD compliance with respect to data requests
For AD holders
  Feedback to improve AD quality: to share data quality with suppliers in a specified manner, defined case by case

Implementation using metadata
How to MAKE and UPDATE AD quality evaluation and documentation efficiently and in a timely way, in a generalized manner, for all of the about 300 datasets acquired yearly by Istat, corresponding to about a terabyte of data?
In the context of Istat modernization: metadata-driven production paradigm
Istat IT systems contain many useful metadata

Useful metadata [1]
DB supporting AdminData acquisition - ARCAM
  Metadata to identify ADsets
  Agreements for delivery with the AD holders
  AD internal users
  List of Admin datasets available (source, data holder, dataset name, periodicity)
Quality measures
  Admin dataset relevance (users, related EU regulations)
  For each delivery (reference time, period): punctuality, timeliness
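
The two per-delivery measures can be computed from dates that a delivery agreement would typically record. The definitions below are the usual ones (punctuality as the lag between agreed and actual delivery, timeliness as the lag between the end of the reference period and delivery) and are assumed here; the slides do not spell out how Istat operationalizes them.

```python
from datetime import date


def punctuality_days(agreed_delivery: date, actual_delivery: date) -> int:
    """Days between the agreed and the actual delivery date (positive = late)."""
    return (actual_delivery - agreed_delivery).days


def timeliness_days(reference_period_end: date, actual_delivery: date) -> int:
    """Days between the end of the reference period and the actual delivery."""
    return (actual_delivery - reference_period_end).days


# Example: data referring to 2016, agreed for 1 March 2017, delivered 11 March 2017.
late_by = punctuality_days(date(2017, 3, 1), date(2017, 3, 11))
lag = timeliness_days(date(2016, 12, 31), date(2017, 3, 11))
```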

Useful metadata [2]
DB Oracle of the Integrated AdminMicrodata - SIM
Metadata supporting ETL and integration procedures - SISME
  Variables list
  Categories for categorical variables (classifications)
  Kind of target units and kind of relationships available

Useful metadata [2]
DB Oracle of the Integrated AdminMicrodata - SIM
To support data quality evaluation in the acquisition process (compliance analysis), a parameterized table contains for each file:
  number of records
  NOT NULL values
  frequency distribution for categorical variables
Quality measures
  Technical checks for data compliance (variables, units)
  Quality monitoring: comparison of the quality measures with those of the previously delivered datasets, to promptly contact the supplier in case of a serious problem
  Percentage of missing values for each relevant variable (also in time series)
  Metadata completeness for category descriptions
  …
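
A minimal sketch of such a per-file profile, and of the comparison with the previous delivery, is given below. The dictionary layout, the function names and the 5-percentage-point alert threshold are assumptions for illustration; missing values are taken to be `None` or the empty string.

```python
from collections import Counter

MISSING = (None, "")


def profile(rows, categorical=()):
    """Compute record count, NOT NULL counts, % missing and category frequencies."""
    n = len(rows)
    variables = sorted({key for row in rows for key in row})
    not_null = {v: sum(1 for row in rows if row.get(v) not in MISSING)
                for v in variables}
    pct_missing = {v: (100.0 * (n - not_null[v]) / n if n else 0.0)
                   for v in variables}
    frequencies = {v: Counter(row[v] for row in rows if row.get(v) not in MISSING)
                   for v in categorical}
    return {"n_records": n, "not_null": not_null,
            "pct_missing": pct_missing, "frequencies": frequencies}


def flag_changes(current, previous, max_increase=5.0):
    """List variables whose % missing rose by more than max_increase points."""
    return sorted(v for v, pct in current["pct_missing"].items()
                  if pct - previous["pct_missing"].get(v, 0.0) > max_increase)
```

Running `flag_changes` on each new delivery gives the acquisition unit a short list of variables to raise with the supplier.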

Record linkage quality indicators
DB Oracle of the Integrated AdminMicrodata - SIM
Metadata of the record linkage process
Bk data
  Measures of linkage variables quality (which variables are available and their quality)
  Measures for monitoring the quality of the integration procedures (deterministic measures)
  Record linkage quality indicators (false positives and false negatives), to be estimated for certain values of the monitoring quality measures
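
One common way to estimate the false positive and false negative indicators is from a sample of record pairs whose true status has been checked manually. The sketch below assumes that setup and uses the usual definitions (false match rate over declared links, missed match rate over true matches); it is not taken from the slides.

```python
def linkage_error_rates(reviewed_pairs):
    """Estimate linkage error rates from manually reviewed pairs.

    reviewed_pairs: iterable of (linked, true_match) booleans, where `linked`
    is the decision of the linkage procedure and `true_match` the reviewer's.
    """
    tp = fp = fn = 0
    for linked, true_match in reviewed_pairs:
        if linked and true_match:
            tp += 1          # correct link
        elif linked:
            fp += 1          # false positive: linked but not the same unit
        elif true_match:
            fn += 1          # false negative: same unit but not linked
    return {
        "false_match_rate": fp / (tp + fp) if (tp + fp) else 0.0,
        "missed_match_rate": fn / (tp + fn) if (tp + fn) else 0.0,
    }
```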

Other useful information/metadata
ARCAM, SIM and also:
The DB that manages the National Statistical Programme, the legislative measure that defines statistics production for the NSS. It holds information on the AD that statistical processes can use.
  Measures for compliance with the rules on confidentiality
SIQual, the information system on quality for Istat surveys (oriented to users of statistics).
  Measures of the response burden reduction using AD, output quality documentation about the use of AD…
SUM, the Unified System of Metadata related to statistical data and processes, modelled according to GSIM. Very useful for standardizing metadata and supporting the modernization process; up to now used for statistics dissemination.
  Admin variables conceptual comparability (comparing input and output)

A critical point is systems interoperability
Each system is designed with a specific functionality, and it is necessary to define a line that connects the information:
  From the conceptual point of view
  From the technical point of view
If possible, it would be useful for IT specialists and statisticians to share objectives from the beginning, in order to standardize procedures and make metadata reusable.

Steps and progress
  a. Adoption of an AdminData quality framework
  b. Definition of measures within the framework
  c. Deep analysis of existing processes, metadata and data flows
  d. Classification of measures, including:
     implementable in the short term, with metadata already available
     implementable in the medium term, with metadata available but still not accessible
     implementable in the long term, with information to acquire
  e. Propose and support the interoperability of systems
  f. Measures implementation
  g. Prototype by the use of IT generalized systems - BI (Microstrategy) to produce the Quality Report
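
The classification rule in step d can be written down directly. In this sketch the two boolean flags, the function names and the example measure names are hypothetical; only the three horizons and their criteria come from the slide.

```python
from collections import Counter


def implementation_horizon(metadata_available: bool, metadata_accessible: bool) -> str:
    """Classify a quality measure by when it can be implemented."""
    if metadata_available and metadata_accessible:
        return "short-term"    # metadata already available
    if metadata_available:
        return "medium-term"   # metadata exist but are not yet accessible
    return "long-term"         # information still has to be acquired


def tally(measures):
    """Count measures per horizon; measures maps name -> (available, accessible)."""
    return Counter(implementation_horizon(*flags) for flags in measures.values())


# Hypothetical measures, for illustration only.
example = {
    "punctuality": (True, True),
    "pct_missing": (True, True),
    "linkage_false_match_rate": (True, False),
    "coverage": (False, False),
}
counts = tally(example)
```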

Metadata and Measures by dimension and implementation level
HYPERDIMENSIONS; Implementation: short-term, mid-term, long-term; Total
SOURCE 12 7 13 32
  AD source definition 1 3 4
  Metadata to identify dataset to acquire 5 6
  Metadata to identify dataset supplied 2
  Data Privacy procedure
  AD source Relevance
  Feedback with AD holder
METADATA 8 15
  Clarity for units and variables
  Comparability for units and variables
  Change of Metadata over time
  Data treatment by the source holder
DATA 26 29 62
  Technical Checks
  Accuracy 1 8 9
  Completeness 4 12
  Integrability/Integration 16 23
  Time-related dimension 14
Total 19 41 49 109

Objectives to achieve
Quality report card of administrative data
For each ADset (about 300 datasets, of which 100 integrated in SIM):
  A report automatically generated using metadata and data from existing DBs, and periodically updated
  + Other quality measures carried out by the users, to share with others (coverage indicators, accuracy indicators)
Further development
  First quality check of the Integrated System (Activities Register)

Prospects and challenges
  Improve efficiency
  Improve quality
  Manage complexity
  Interaction with many actors
  Need to comply with the timeliness of the processes