The Italian Integrated System of Statistical Registers
On the Design of an Ontology-based Data Integration Architecture
R. Radini (radini@istat.it), M. Scannapieco (scannapi@istat.it), G. Garofalo (garofalo@istat.it)
Italian National Institute of Statistics - Istat
Monica Scannapieco – Brussels, NTTS, 14-16 March 2017
Outline
Introduction to ISSR
OBDM and examples
Data architecture
Correspondence with EARF
DV vs DW
Conclusions
ISSR – Italian Integrated System of Statistical Registers
Istat has launched a modernization programme aimed at a significant revision of statistical production.
One of the main pillars of this revision is the design of production processes based on an Integrated System of Statistical Registers.
The ISSR is a single logical environment that supports the consistency of statistical production processes at Istat, in particular consistency in “identification” and “estimation” across the whole integrated system of units and variables.
ISSR: Types of Registers
RSB (Base registers): contain several statistical populations and the minimum set of variables needed to characterize statistical units.
RSE (Extended registers): extend the information held in a specific RSB for one of that RSB's populations.
RST (Thematic registers): support multiple statistical processes through a consistent, shared treatment of specific topics.
OBDM: Ontology-Based Data Management System
Ontology (or computational ontology): a conceptual representation of data expressed through «computational» languages.
In mathematical logic: an axiomatic first-order theory expressible in description logic.
OBDM is an integration system in which the usual ER global schema is replaced by the conceptual model of the application domain, formulated as an ontology.
OBDM Architecture
Three-level architecture: Ontology, Mapping, Sources
Main features:
Data source transparency (called data virtualization by IT platforms)
Global view
Consistency
Diagram: Ontology at the top, Mapping in the middle, Data source 1, Data source 2 and Data source 3 at the bottom
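The three levels can be illustrated with a minimal Python sketch. It is not the ISSR implementation: the source names, column names, and the `instances` helper are all hypothetical, and the ontology is reduced to a single concept with two properties. The point it shows is data source transparency: the caller asks for instances of a concept and never refers to the physical layout of a source.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Row = Dict[str, Any]

# Sources level: each source keeps its own physical layout (hypothetical data).
SOURCES: Dict[str, List[Row]] = {
    "source_1": [{"cod_ind": 1, "reg": "Lazio"}, {"cod_ind": 2, "reg": "Puglia"}],
    "source_2": [{"person_id": 3, "region_name": "Lazio"}],
}

# Ontology level: reduced here to concept names and their properties.
ONTOLOGY: Dict[str, Tuple[str, ...]] = {"Individual": ("id", "region")}

# Mapping level: how rows of each source become instances of a concept.
MAPPINGS: Dict[str, List[Tuple[str, Callable[[Row], Row]]]] = {
    "Individual": [
        ("source_1", lambda r: {"id": r["cod_ind"], "region": r["reg"]}),
        ("source_2", lambda r: {"id": r["person_id"], "region": r["region_name"]}),
    ],
}


def instances(concept: str) -> Iterator[Row]:
    """Global view: yield every instance of a concept, whatever its source."""
    if concept not in ONTOLOGY:
        raise KeyError(f"unknown concept: {concept}")
    for source_name, to_concept in MAPPINGS[concept]:
        for row in SOURCES[source_name]:
            yield to_concept(row)


# The query is phrased only in ontology terms ("Individual", "region").
lazio_ids = sorted(i["id"] for i in instances("Individual") if i["region"] == "Lazio")
print(lazio_ids)
```

Adding a fourth source in this sketch would mean adding one entry to `SOURCES` and one mapping; the query at the bottom would stay unchanged.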
Excerpt of the Ontology of the Working Relationships
Concepts shown in the diagram: Worker, Employee, Self-employee
Excerpt of the Population Ontology
Concepts shown in the diagram: Individual, Family, Family registry, Common law family
Data Integration: same concept
Individual (Population Ontology)
Individual (Working Relationships Ontology)
Querying over the ontology
Query: find the people resident in a certain region and classify them by age, educational degree and employment condition.
We do not need to know how the information is stored in the sources!
Query: people resident in a certain region, classified by age, educational degree and employment condition
Query -> Ontology -> Mapping -> Query rewritten over the sources
RS of Individuals: people resident in a certain region, classified by age and educational degree
RS of Labour: classification by employment condition
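The rewriting step can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual OBDM rewriting algorithm: the two register extracts, their field names, the age classes, and the `residents_by_profile` function are hypothetical, and the rewriting is written by hand as two sub-queries joined on a shared person identifier.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical extract of the RS of Individuals.
RS_INDIVIDUALS: List[Dict[str, Any]] = [
    {"id": 1, "region": "Lazio", "age": 34, "degree": "tertiary"},
    {"id": 2, "region": "Lazio", "age": 61, "degree": "secondary"},
    {"id": 3, "region": "Puglia", "age": 45, "degree": "secondary"},
    {"id": 4, "region": "Lazio", "age": 29, "degree": "tertiary"},
]

# Hypothetical extract of the RS of Labour.
RS_LABOUR: List[Dict[str, Any]] = [
    {"id": 1, "condition": "employee"},
    {"id": 2, "condition": "self-employed"},
    {"id": 4, "condition": "employee"},
]


def age_class(age: int) -> str:
    """Illustrative age classes; the real classification is an assumption."""
    if age < 35:
        return "<35"
    if age < 65:
        return "35-64"
    return "65+"


def residents_by_profile(region: str) -> Counter:
    """Count residents of a region by (age class, degree, employment condition)."""
    # Sub-query over RS of Individuals: residence, age and educational degree.
    residents = [p for p in RS_INDIVIDUALS if p["region"] == region]
    # Sub-query over RS of Labour: employment condition per person.
    condition_by_id = {w["id"]: w["condition"] for w in RS_LABOUR}
    # Join the two partial answers on the shared identifier and classify.
    return Counter(
        (
            age_class(p["age"]),
            p["degree"],
            condition_by_id.get(p["id"], "not in labour register"),
        )
        for p in residents
    )


counts = residents_by_profile("Lazio")
for profile, n in sorted(counts.items()):
    print(profile, n)
```

The caller only supplies the region; which register answers which part of the question is decided inside the function, which plays the role of the mapping layer in this sketch.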
High expressive power
It is possible to give different definitions of a concept depending on the instance.
It is possible to express different constraints for each definition.
Example: CorporationManager has different semantics according to the domain (Labour Force, National Account).
Concepts shown in the diagram: Employee, Self-employee, Corporation Manager
Data architecture
Compliance with EARF (Enterprise Architecture Reference Framework)
Metadata Management, Primary Data Storage, Quality Assessment
Unitary Metadata System
Logical centralization of ISSR
Data consistency
OBDM
Data architecture: IT View
Features                                      DV                       DW
Storage of Historical Data                    NO                       YES
Capture Every Change in Production Data (requires integration with CDC)
Multi-Dimensional Data Structures
Data Pre-Aggregation
Query performance on large amounts of data    SLOW (relative to DW)    FAST (relative to DV)
Data Integration on Demand
Operational Cost                              LOW                      HIGH
Time-To-Market
Easy to Make Changes
Dependence on IT
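One way to read the DV/DW comparison is as a routing rule in a hybrid setup. The sketch below is purely hypothetical (the `Request` type, its fields, and `choose_layer` are assumptions, not an Istat design); it encodes only the two rows whose values are stated above: DV does not store historical data, and DW is faster on large amounts of data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Request:
    """A hypothetical data request against the integrated system."""
    concept: str
    as_of: Optional[date] = None   # None means "current data"
    large_scan: bool = False       # True for queries over large amounts of data


def choose_layer(request: Request) -> str:
    """Return "DV" or "DW" for a request, following the comparison table."""
    if request.as_of is not None:
        return "DW"  # DV does not store historical data
    if request.large_scan:
        return "DW"  # DW is faster than DV on large amounts of data
    return "DV"      # default: integrate current data on demand


print(choose_layer(Request("Individual")))
print(choose_layer(Request("Individual", as_of=date(2015, 12, 31))))
print(choose_layer(Request("Worker", large_scan=True)))
```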
Conclusions
EA approach for ISSR design and implementation
ISSR Data Architecture: hybrid solution with DV and DW, e.g. a DV-based data architecture with a DW for historical data and dissemination
Next steps:
Prototypes of RSB Individual, Families and Cohabitations and RST Working Relationships
Guidelines for the Management of the Integrated System of Statistical Registers