Supporting the use of administrative data in official statistics.


1 Supporting the use of administrative data in official statistics.
The System of Integrated Microdata (SIM) and the Quality Report Card of Administrative Data (QRCA)
Marina Venturi, Grazia Di Bella
Italian National Institute of Statistics (Istat)
NTTS 2017, Brussels, March 2017

2 New Istat modernization process
Innovation of the statistical production system. Use of AdminData.
The overall Istat modernization process is focused on the new System of Integrated Registries, based mainly on AdminData:
- Business Register
- Population Register
- Places Register
- Activities Register
A single Directorate deals with all aspects of data collection, for both surveys and AdminData, and with the first data treatment process.

3 Automation of processes for the management of the AdminData flows
Considering the complexity of the system and the need to process large amounts of data, the strategy is:
- Use of IT tools to acquire, store, integrate and disseminate AdminData
- Strengthening of the metadata system to automate procedures and to create up-to-date documentation of AdminData quality

4 AdminData centralized management
See the Istat presentation 13C001 by Giacomi.
AdminData management functions:
A. Collection of AD requirements from Istat statistics producers
B. Formulation of AD requests for each AD holder and for each AD source
C. AD acquisition and storing
D. Procedures to ensure data confidentiality
D. AD formal concept analysis / identification of objects and relations
E. Data loading
F. AD integration
G. Recoding
H. Dissemination to internal users
AD quality evaluation and documentation
IT tools: EDI, SIM, ARCAM

5 SIM - The Integrated System of Administrative Microdata
SIM is the Istat database of integrated Admin microdata, built to support the Istat statistical production process.
The supplied subsets of Admin sources (datasets) enter the system when they contain microdata referring to statistical target units:
- Individuals
- Economic units
- Places

6 The SIM process: concept analysis
AD come from different sources and therefore have different characteristics.
To make the data consistent with the integration system:
- data analysis according to the entity-relationship model
- standardized procedures for each dataset delivered periodically
- data loading into relational tables (object/entity and its attributes)
SIM has integrated 70 different AdminData subsets each year since 2011 and is going to integrate further AdminData subsets in 2017.

7 The SIM process: data loading
Administrative objects/entities recognized as statistical target units of type k, with k = 1, 2, 3, feed the three main subsystems:
- Individuals
- Economic units
- Places
AdminDatasets may contain more than one basic unit in the same instance:
- Individuals (family relationships)
- Economic units (local units)
SIM relation subsystems:
- Individuals and economic units (e.g. Workers Leed, Students-Schools)
- Places and economic units
- Places and individuals
Statistical Registers: Population / Business / Places / Activities

8 The SIM process: integrating data [1]
The stage of Integration is incremental: as the datasets are acquired they are progressively integrated in SIM with the data already present. Operationally, the integration of the i-th dataset is made through a series of record linkage procedures between the input dataset and the corresponding Base Bk (k=1,2,3). Depending on the type of unit and on the AD domain defined by several criteria, a suitable integration strategy and a set of algorithms are applied.

9 The SIM process: integrating data [2]
Integrating a dataset involves as many integration processes as there are types of units. The integration stage ends with the attribution of a unique identification code (the SIM code) within the respective Base Bk, and each instance becomes part of the Base. Units that the integration procedures do not match with others already recorded in the Base receive a new code.
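The incremental assignment of codes can be illustrated with a minimal Python sketch. It is not the Istat implementation: the `Base` class, the field names and the two deterministic matching steps (exact tax code, then exact name and date of birth) are assumptions chosen only to show how a matched unit inherits an existing code while an unmatched unit receives a new one.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Base:
    """Hypothetical in-memory stand-in for one integration Base Bk."""
    by_tax_code: dict = field(default_factory=dict)  # tax code -> SIM code
    by_identity: dict = field(default_factory=dict)  # (last name, name, birth date) -> SIM code
    _next_code: count = field(default_factory=lambda: count(1))

    def integrate(self, record: dict) -> tuple:
        """Return (sim_code, step) for one input record and update the Base."""
        tax = record.get("tax_code")
        identity = (record["last_name"].upper(), record["name"].upper(), record["birth_date"])
        if tax and tax in self.by_tax_code:
            # Assumed step 1: exact match on the tax code.
            sim, step = self.by_tax_code[tax], 1
        elif identity in self.by_identity:
            # Assumed step 2: exact match on name and date of birth.
            sim, step = self.by_identity[identity], 2
        else:
            # No match with units already in the Base: issue a new code.
            sim, step = next(self._next_code), 3
        # The instance becomes part of the Base for later datasets.
        if tax:
            self.by_tax_code.setdefault(tax, sim)
        self.by_identity.setdefault(identity, sim)
        return sim, step


base = Base()
first = base.integrate({"tax_code": "AAA", "last_name": "Rossi",
                        "name": "Anna", "birth_date": "1980-01-01"})
second = base.integrate({"tax_code": None, "last_name": "rossi",
                         "name": "anna", "birth_date": "1980-01-01"})
print(first, second)  # the second record reuses the code issued to the first
```

Recording the step at which each unit was matched makes it possible to monitor afterwards how much of a dataset was integrated by each rule.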

10 The Base for Integration B1
Field: description
YEAR: Last year of inclusion in the Base instance
ID_ADMINDATASET: Identifies the supply of the subset of an AdminSource; together with the time reference, it identifies the input administrative dataset
PROGRESSIVE (OF THE ACQUIRED ADMIN_DATASET): In hierarchy with ID_ADMINDATASET, enables connection with other data on it (other specific attributes of the units)
SIM CODE: SIM code derived from the integration procedures
TAX CODE, LAST NAME, NAME, SEX, DATE OF BIRTH, CODCAT BIRTH, PROVINCE BIRTH, MUNICIPALITY BIRTH, COUNTRY BIRTH: Linkage variables (individual representation)
ENTRY DATE
DEATH DATE
A2009, A2010, ..., A2018: Reference year (dummy variables)
STEP: Step at which the unit is integrated into the Base, as part of the integration strategy adopted for the dataset
SUBSET: In case of Provisional/Final data

11 The SIM process: a step for data security
Identification variables of individuals are stored in a separate table. Integrated anonymized data are made available to internal users who have requested them and who are authorized to use them.
The advantage of integrating data centrally is twofold:
- It allows users to link datasets simply by using the SIM codes (no use of identification variables), avoiding duplicate work
- It complies with legislation on data privacy
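As an illustration of this separation, the sketch below splits integrated records into an identifier table and an anonymized table, then links two anonymized datasets on the SIM code alone. The function names, the `sim_code` key and the choice of identifying fields are assumptions made for the example, not the structure of the actual SIM tables.

```python
def split_identifiers(records, id_fields=("tax_code", "last_name", "name")):
    """Split records into an identifier table and an anonymized table, both keyed by sim_code."""
    identifiers, anonymized = {}, []
    for rec in records:
        # Identification variables go to a separate, restricted table.
        identifiers[rec["sim_code"]] = {f: rec[f] for f in id_fields if f in rec}
        # Everything else, including the SIM code, is released to users.
        anonymized.append({k: v for k, v in rec.items() if k not in id_fields})
    return identifiers, anonymized


def link_by_sim_code(left, right):
    """Inner join of two anonymized datasets on sim_code (assumes one right record per code)."""
    right_index = {r["sim_code"]: r for r in right}
    return [{**l, **right_index[l["sim_code"]]}
            for l in left if l["sim_code"] in right_index]


records = [{"sim_code": 1, "tax_code": "AAA", "last_name": "Rossi",
            "name": "Anna", "income": 100}]
identifiers, anonymized = split_identifiers(records)
schools = [{"sim_code": 1, "school": "X"}]
print(link_by_sim_code(anonymized, schools))
```

Because the join key is the SIM code, users never need access to names or tax codes to combine sources.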

12 About AdminData quality - What is useful and for whom
For Istat users:
- AD usability analysis function: information on AD availability and usability
- AD monitoring function: to identify possible regulatory changes that may induce discontinuity and that are not notified in advance; to detect an unexpected lack of quality
- Support the collection of AD requirements
- Reporting on AdminData usability
For the AD acquisition process unit:
- AD supply monitoring function: to promptly check AD compliance with respect to data requests
For AD holders:
- Feedback to improve AD quality: to share data quality information with suppliers in a specified manner, defined case by case

13 Implementation using metadata
How can AD quality evaluation and documentation be made and updated efficiently and in a timely way, in a generalized manner, for all of the about 300 datasets acquired yearly by Istat, corresponding to about a terabyte of data?
In the context of the Istat modernization: a metadata-driven production paradigm.
Istat IT systems contain many useful metadata.

14 Useful metadata [1] DB supporting AdminData acquisition - ARCAM
Metadata to identify ADsets:
- Agreements for delivery with the AD holders
- AD internal users
- List of Admin datasets available (source, data holder, dataset name, periodicity)
Quality measures:
- Admin dataset relevance (users, related EU regulations)
- For each delivery (reference time, period): punctuality, timeliness
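The two delivery-level measures can be derived from dates held in the acquisition metadata. The sketch below assumes the definitions commonly used in statistical quality frameworks: punctuality as the lag between the agreed and the actual delivery date, and timeliness as the lag between the end of the reference period and the actual delivery date. The function and parameter names are hypothetical, and the definitions applied in ARCAM may differ.

```python
from datetime import date


def punctuality_days(agreed: date, delivered: date) -> int:
    """Days between the agreed and the actual delivery date (positive means late)."""
    return (delivered - agreed).days


def timeliness_days(reference_period_end: date, delivered: date) -> int:
    """Days between the end of the reference period and the actual delivery."""
    return (delivered - reference_period_end).days


# Illustrative delivery: data for 2016, agreed for 1 March 2017, received on 15 March 2017.
print(punctuality_days(date(2017, 3, 1), date(2017, 3, 15)))    # 14
print(timeliness_days(date(2016, 12, 31), date(2017, 3, 15)))   # 74
```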

15 Useful metadata [2] DB Oracle of the Integrated AdminMicrodata - SIM
Metadata supporting ETL and integration procedures (SISME):
- Variables list
- Categories for categorical variables (classifications)
- Kinds of target units and kinds of relationships available

16 Useful metadata [2] DB Oracle of the Integrated AdminMicrodata - SIM
To support data quality evaluation in the acquisition process (compliance analysis), a parameterized table contains for each file:
- number of records
- NOT NULL values
- frequency distribution for categorical variables
- Technical checks for data compliance (variables, units)
- Quality monitoring: comparison of the quality measures with those of the previously delivered datasets, to promptly contact the supplier in case of serious problems
- Percentage of missing values for each relevant variable (also in time series)
- Metadata completeness for category descriptions
- …
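A minimal sketch of such a per-file profile follows, with a comparison against the previous delivery. The dictionary layout, the function names and the 5-point alert threshold are assumptions made for illustration; they do not describe the actual parameterized table in SIM.

```python
from collections import Counter


def profile(rows, categorical=()):
    """Profile one delivered file. rows: list of dicts; None or '' counts as missing."""
    n = len(rows)
    variables = sorted({k for r in rows for k in r})
    out = {"n_records": n, "not_null": {}, "pct_missing": {}, "frequencies": {}}
    for v in variables:
        present = [r[v] for r in rows if r.get(v) not in (None, "")]
        out["not_null"][v] = len(present)
        out["pct_missing"][v] = 100.0 * (n - len(present)) / n if n else 0.0
        if v in categorical:
            # Frequency distribution only for variables declared categorical.
            out["frequencies"][v] = dict(Counter(present))
    return out


def missing_alerts(current, previous, threshold=5.0):
    """Variables whose missing percentage rose by more than `threshold` points."""
    return sorted(v for v, pct in current["pct_missing"].items()
                  if pct - previous["pct_missing"].get(v, 0.0) > threshold)


previous = profile([{"sex": "M"}, {"sex": "F"}, {"sex": "F"}, {"sex": "M"}], categorical=("sex",))
current = profile([{"sex": "M"}, {"sex": None}, {"sex": "F"}, {"sex": "F"}], categorical=("sex",))
print(current["pct_missing"], missing_alerts(current, previous))
```

Storing one profile per delivery gives the time series of missing-value percentages for free, and the alert list is the kind of signal that could trigger a prompt contact with the supplier.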

17 Record linkage quality indicators
DB Oracle of the Integrated AdminMicrodata - SIM
- Metadata of the record linkage process
- Bk data
- Measures of linkage variable quality (which variables are available and their quality)
- Measures for monitoring the quality of the integration procedures (deterministic measures)
- Record linkage quality indicators (false positives and false negatives), to be estimated for certain values of the monitoring quality measures
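One common way to express false positives and false negatives as indicators is through a false match rate and a missed match rate, estimated for example on a clerically reviewed sample. The sketch below assumes those standard definitions; the slides do not state which formulas or estimation method Istat uses, and the function name and counts are hypothetical.

```python
def linkage_error_rates(true_positive, false_positive, false_negative):
    """Return (false match rate, missed match rate) from reviewed link counts.

    false match rate  = false links / all links made
    missed match rate = missed links / all true matches
    """
    links = true_positive + false_positive
    true_matches = true_positive + false_negative
    false_match_rate = false_positive / links if links else 0.0
    missed_match_rate = false_negative / true_matches if true_matches else 0.0
    return false_match_rate, missed_match_rate


# Illustrative counts from a hypothetical reviewed sample.
print(linkage_error_rates(true_positive=90, false_positive=10, false_negative=30))  # (0.1, 0.25)
```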

18 Other useful information/metadata
ARCAM, SIM and also:
- The DB that manages the National Statistical Programme, the legislative measure that defines statistics production for the NSS. It holds information on the AD that statistical processes can use. (Measures for compliance with the rules on confidentiality)
- SIQual, the information system on quality for Istat surveys (oriented to users of statistics). Measures of the response burden reduction from using AD, output quality documentation about the use of AD…
- SUM, the Unified System of Metadata related to statistical data and processes, modelled according to GSIM. Very useful for standardizing metadata and supporting the modernization process; used so far for statistics dissemination. Conceptual comparability of Admin variables (comparing input and output)

19 A critical point is Systems interoperability
Each system is designed with a specific functionality, and it is necessary to define a line that connects the information:
- from the conceptual point of view
- from the technical point of view
Where possible, it would be useful for IT specialists and statisticians to share objectives from the beginning, in order to standardize procedures and make metadata reusable.

20 Steps and progress
a. Adoption of an AdminData quality framework
b. Definition of measures within the framework
c. Deep analysis of existing processes, metadata and data flows
d. Classification of measures, including those:
   - implementable in the short term with metadata already available
   - implementable in the medium term with metadata available but still not accessible
   - implementable in the long term with information to acquire
e. Propose and support the interoperability of systems
f. Measures implementation
g. Prototype using generalized IT systems, BI (MicroStrategy), to produce the Quality Report

21 Metadata and Measures by dimension and implementation level
HYPERDIMENSIONS (implementation: short-term, mid-term, long-term, Total)
SOURCE 12 7 13 32
- AD source definition 1 3 4
- Metadata to identify dataset to acquire 5 6
- Metadata to identify dataset supplied 2
- Data Privacy procedure
- AD source Relevance
- Feedback with AD holder
METADATA 8 15
- Clarity for units and variables
- Comparability for units and variables
- Change of Metadata over time
- Data treatment by the source holder
DATA 26 29 62
- Technical Checks
- Accuracy 1 8 9
- Completeness 4 12
- Integrability/Integration 16 23
- Time-related dimension 14
Total 19 41 49 109

22 Objectives to achieve: Quality Report Card of administrative data
For each ADset (about 300 datasets, of which 100 integrated in SIM):
- A report automatically generated using metadata and data from existing DBs, and periodically updated
- Plus other quality measures carried out by the users, to share with others (coverage indicators, accuracy indicators)
Further development:
- First quality check of the Integrated System (Activities Register)

23 Prospects and challenges
- Improve efficiency
- Improve quality
- Manage complexity
- Interaction with many actors
- Need to comply with the timeliness of the processes

