S-DWH Approach to statistical data management: The practice case of the SBS production process in ISTAT Francesco Altarocca, ISTAT Diego Bellisai, ISTAT.

Presentation transcript:

S-DWH Approach to statistical data management: The practice case of the SBS production process in ISTAT Francesco Altarocca, ISTAT Diego Bellisai, ISTAT Antonio Laureti Palma, ISTAT

Summary
- What is the SBS-Frame
- SBS-Frame production process
- A Statistical Data Warehouse (S-DWH) for supporting the production workflow
- INSIDE: INtegrated StatIstical Datawarehouse Environment software
- INSIDE to support the SBS-Frame workflow
- Features of INSIDE

SBS-Frame in the context of business statistics
- The Frame allows ISTAT to obtain, by simple summation, the main economic aggregates required by the Eurostat SBS (Structural Business Statistics) Regulation
- The Frame allows ISTAT to overcome the limitations of the estimation domains of the sample surveys, making accurate estimates possible for a relevant number of sub-populations
- A detailed and multidimensional mapping of the enterprises is possible
- It represents the new base for the National Accounts ESA 2010 (SEC 2010) estimates
- It will be integrated at the micro level with foreign trade statistics (geographic features of export, foreign controlled enterprises) and labour statistics (employees' characteristics, wage levels)

SBS-Frame variables
The SBS Frame is an archive of the main annual economic variables on all the active Italian enterprises (around 4.4 million units).
SBS Frame base variables are:
from the SBR
1- Enterprise ID number
2- Economic activity
3- Number of employees
new variables derived for the SBS-Frame
4- Turnover
5- Labour Cost
6- Total purchases of goods and services
7- Value-added at factor cost
8- Gross operating margin
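As a minimal sketch of how the eight variables listed above could be held per enterprise and then summed into aggregates, the following Python fragment may help. The field names, the `aggregate_by_activity` helper and the toy figures are all assumptions made for illustration; they are not taken from the ISTAT system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRecord:
    """One enterprise in the SBS-Frame (field names are hypothetical)."""
    enterprise_id: str             # 1 - from the SBR
    economic_activity: str         # 2 - from the SBR
    employees: float               # 3 - from the SBR
    turnover: float                # 4
    labour_cost: float             # 5
    purchases: float               # 6 - total purchases of goods and services
    value_added: float             # 7 - value-added at factor cost
    gross_operating_margin: float  # 8


def aggregate_by_activity(records, variable):
    """Sum one economic variable over all enterprises of each activity."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec.economic_activity] += getattr(rec, variable)
    return dict(totals)


# Made-up toy data, not real figures.
frame = [
    FrameRecord("E1", "10", 5, 1000.0, 200.0, 600.0, 400.0, 200.0),
    FrameRecord("E2", "10", 2, 500.0, 80.0, 300.0, 200.0, 120.0),
    FrameRecord("E3", "47", 1, 300.0, 0.0, 180.0, 120.0, 120.0),
]
turnover_by_activity = aggregate_by_activity(frame, "turnover")
```

Because the Frame holds one record per active enterprise, an aggregate for any sub-population is a plain sum over the matching records, with no sampling weights involved.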

The Frame's Sources
(ID, source: description. Supplier; units; number of variables)
- FS, Financial Statements: annual profit and loss statements of limited liability companies. Supplier: Chambers of Commerce; units: 750K; vars: ~300
- SS, Sector Studies survey: SMEs with turnover in [30K-7.5M] euros. Supplier: Italian Revenue Agency; units: 3.5M; vars: ~60
- UN, Tax returns form: unified model of tax declarations by legal form, containing economic information for different legal forms. Supplier: Italian Revenue Agency; units: 4.4M; vars: ~60
- IRAP, Regional Tax on Productive Activities form: model of declaration for Regional Tax on Productive Activities payment. Supplier: Italian Revenue Agency; units: 4.4M; vars: ~70
- SME, Small Medium Ent. Survey: sample survey on enterprises with less than 100 employees. Supplier: ISTAT; units: 100K; vars: ~200
- RACLI, Labour Cost by Enterprise Reg.: Register of Labour Cost by Enterprise. Supplier: ISTAT; units: 1.5M; vars: ~20
- SBR, Business Register: Italian official Business Register of Active Enterprises. Supplier: ISTAT; units: 4.4M; vars: ~20

The Frame's Sources
[Figure: integrated micro-data view. The SBR provides the N (4.3 mil) units with ID, Ateco and NEm; the other sources cover part of the SMEs: FS (16% of SMEs), SS (80% of SMEs), UNICO (97% of SMEs), RACLI (33% of SMEs); not covered (4%).]

The SBS-Frame process
[Figure: process flow. Diagram labels: sources FS, SS, UN, IRAP; data harmonization; consistency CHK; duplicate CHK; discard bad data; labour cost correction; standardized dataset; integration with SBR and RACLI; base FRAME; imputation, estimation of key vars; imputation of missing units; FRAME key variables; estimation, imputation of other variables; FRAME variables of detail; final dataset.]
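The steps named in the process diagram (duplicate check, discarding bad data, integration against the SBR, imputation of missing units) can be sketched roughly as follows. Everything here is an assumption made for illustration: the record layout, the consistency rule, the order in which sources are tried, and the ratio-based imputation, which stands in for whatever estimation method ISTAT actually applies.

```python
def drop_duplicates(rows):
    """Duplicate check: keep the first record per enterprise ID."""
    seen, kept = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            kept.append(row)
    return kept


def discard_bad(rows):
    """Consistency check (hypothetical rule): turnover must be non-negative."""
    return [r for r in rows if r["turnover"] is not None and r["turnover"] >= 0]


def integrate(sbr, sources):
    """Base frame: one row per SBR unit, value from the first source having it."""
    lookups = [{r["id"]: r for r in discard_bad(drop_duplicates(rows))}
               for rows in sources]
    frame = []
    for unit in sbr:
        turnover = None
        for lookup in lookups:  # assumed priority: order of the sources list
            if unit["id"] in lookup:
                turnover = lookup[unit["id"]]["turnover"]
                break
        frame.append({**unit, "turnover": turnover})
    return frame


def impute_missing(frame):
    """Stand-in imputation: turnover per employee within the same activity."""
    sums = {}
    for r in frame:
        if r["turnover"] is not None and r["employees"] > 0:
            tot, emp = sums.get(r["activity"], (0.0, 0.0))
            sums[r["activity"]] = (tot + r["turnover"], emp + r["employees"])
    out = []
    for r in frame:
        if r["turnover"] is None and r["activity"] in sums:
            tot, emp = sums[r["activity"]]
            r = {**r, "turnover": tot / emp * r["employees"], "imputed": True}
        out.append(r)
    return out


# Made-up toy data.
sbr = [
    {"id": "E1", "activity": "10", "employees": 4},
    {"id": "E2", "activity": "10", "employees": 2},
    {"id": "E3", "activity": "10", "employees": 1},
]
fs = [{"id": "E1", "turnover": 800.0}, {"id": "E1", "turnover": 800.0}]
ss = [{"id": "E2", "turnover": 400.0}, {"id": "E3", "turnover": -5.0}]

final = impute_missing(integrate(sbr, [fs, ss]))
```

In this toy run the duplicate FS record for E1 is dropped, the negative SS value for E3 is discarded, and E3 is then imputed from the turnover per employee of E1 and E2.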


SBS-Frame process features
- annual activity
- complex workflow
- iterative activities
- many actors' interactions
- different actor skills
- variability of the sources
- distributed computing
- tracking methodological choices
- replicability of results
- documenting processes
- storing distributed knowledge for safety in the production process

SBS-Frame process features
- The SBS-Frame production process is executed once a year
- Due to source variability, the process has to be adjusted every year
- The adjustment phase takes two months
- At the end of the adjustment phase the process is executed only once

Scientific workflows
In the terms of the IT literature, the SBS-Frame production process can be managed by a scientific workflow management system based on a data-centric approach.
A scientific workflow is a specialized workflow designed to compose and execute a series of complex computational or data manipulation steps.
Basic requirements are:
- to construct simple what-if scenarios
- to support iterative hypothesis testing activities
- to share knowledge
- to track methodological choices

Data-centric approach, scientific workflows
[Figure: workflow phases around a data warehouse. Diagram labels: SETUP; reuse, composition; EXECUTION; ANALYSIS; evaluation, validation; DISSEMINATION.]

Statistical Data Warehouse (S-DWH)
To support a data-centric approach we use a Statistical Data Warehouse (S-DWH) as a single central data store.
Basic requirements for the S-DWH are:
- an easy-to-use environment for accessing complex data
- control of information visibility
- support of multiple-purpose statistical information in a specific statistical domain
- a metadata-driven model
- a single integrated system

Layered S-DWH
From an architectural point of view, we can identify four conceptual layers in the S-DWH:
- access layer: for the final presentation, dissemination and delivery of the information sought;
- interpretation and data analysis layer: enables data analysis or evaluation for statistical design;
- integration layer: where all operational activities are carried out; in this layer data are integrated and transformed in order to increase the performance and usability of the upper layers;
- source layer: where data sources are stored, either internal data (from surveys or processing steps) or external data (from administrative provisions).

SBS-Frame process in S-DWH
[Figure: the SBS-Frame process mapped onto the four S-DWH layers (source, integration, interpretation, access) as a data-centric workflow. Diagram labels: sources IRAP, Unico, SME survey, SS, FS, SBR; variables mapping; preliminary treatment; data linking; integrated micro-data; treatment of inconsistent data; missing imputation; base Frame; Key VAR Frame; detail VAR Frame; FRAME SBS; SBS views.]

The implementation of INSIDE
To support the SBS-Frame production, the INSIDE (INtegrated StatIstical Datawarehouse Environment) software application has been implemented.
INSIDE is the S-DWH used to support the SBS-Frame workflow.
INSIDE is:
- specialized in structural business statistics
- metadata driven
- based on 4 layers

INSIDE basic architecture

INSIDE: user roles (by layer: Source, Integration, Interpretation, Access)
- source mapper: a source expert responsible for the mapping of economic variables
- data analyst: performs statistical analysis and is in charge of all or part of the statistical production process
- data administrator: responsible for managing the data flows, user authorization and system maintenance


INSIDE: user: mapper
- Data mapping is the process of creating data element mappings between two distinct data models.
- The mapper is a source expert of economic variables, responsible for the coherent mapping between source variables and internal S-DWH variables.
- This operational activity is carried out in order to overcome the lack of control over source provisions.
[Figure: variables mapping from the sources (IRAP, …, SME survey, SS, FS) in the source layer to the internal dictionary and the SBS-Frame in the integration layer.]

INSIDE: user: mapper
The mapping module:
- allows data analysts to access data regardless of their knowledge of the data-source layout
- works via a web interface directly on the source layer, through a common metadata layer
- maps, automatically or manually, the source variables to the internal dictionary
- can search for keywords in the documentation attached to the source supply


INSIDE: user: analyst
Data analysts make the statistical evaluations. They work through an access-layer web interface to reach the interpretation-layer data. The environment is optimized for the analysis of large amounts of data. This allows: basic analysis, data mining, testing by hypotheses, simulation, model processes, procedure setup.
[Figure: the data analyst reaches FRAME SBS in the interpretation layer through the SBS and NA views in the access layer. Diagram labels: FRAME SBS; View SBS; View NA; SAS; WF.]

INSIDE: user: analyst
The data analyst:
- extracts microdata from a list of selected data sources
- previews the extraction
- has access permissions
- creates a view in a private area
- shares the created views
- can access their views through standard statistical software (SAS, R, Stata, ...) or through a database query language (SQL, PL/SQL)
- creates views as the starting points for further elaboration steps in the S-DWH life cycle
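A minimal sketch of the "create a view, preview it, then query it" cycle described for the analyst, using Python's built-in `sqlite3` module as a stand-in for the production database. The table `frame_sbs`, its columns, the view name and the figures are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact table standing in for the interpretation layer.
    CREATE TABLE frame_sbs (
        enterprise_id TEXT,
        ref_year      INTEGER,
        activity      TEXT,
        turnover      REAL
    );
    INSERT INTO frame_sbs VALUES
        ('E1', 2012, '10', 1000.0),
        ('E2', 2012, '47',  300.0),
        ('E3', 2011, '10',  500.0);

    -- The analyst's view: selected variables plus where conditions.
    CREATE VIEW v_activity10_2012 AS
        SELECT enterprise_id, turnover
        FROM frame_sbs
        WHERE ref_year = 2012 AND activity = '10';
""")

# Preview of the extraction (first rows only).
preview = conn.execute("SELECT * FROM v_activity10_2012 LIMIT 5").fetchall()

# Number of records returned by the view.
n_records = conn.execute("SELECT COUNT(*) FROM v_activity10_2012").fetchone()[0]
```

Once defined, the view can be queried like a table from any client that speaks SQL, which is what makes it usable as a starting point for later elaboration steps.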

INSIDE: user: data administrator
The data administrator:
- is responsible for the data flow management
- loads data sources, metadata and microdata, and carries out acceptance operations
- manages data in all layers
- optimizes the data warehouse
[Figure: objects managed by the data administrator. Diagram labels: provisions; provision layout; dictionary; facts; view layout; docs; dimensions.]

data model: metadata logical schemes
[Figure: metadata schemes across the source, integration, interpretation and access layers. Diagram labels: provisions; layouts; dictionary; provision; view layouts; docs; dimensions; facts; timing; monitoring; user.]

data model: data logical schemes
[Figure: data schemes across the integration, interpretation and access layers. Diagram labels: source tables; provisions; data hub; fact tables; views/marts; SBR; UNICO; FS; SS; EMP CLASS; GEO; ATECO; JUR. FORM; FS DIM; SS DIM; DICTIONARY; surveys; derived source; PROVISION; SBS.]

data model: metadata logical scheme
Key metadata concepts:
- PROVISION: the set of information provided by a source (administrative or survey) relative to a certain period
- SUBPROVISIONS: microdata, record layout, metadata, classifications and documentation, which represent the detailed subsets of the provision
- METASOURCE: the record layout, i.e. the elementary variables contained in a source
- DOCUMENTATION: the set of documents (general information, tax instructions, etc.) related to a certain provision
- DICTIONARY: the items of the S-DWH, classified into measures, dimensions, unique identifiers and other non-identifying variables
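To make the relationships between these concepts concrete, here is a small Python sketch. The class and field names are assumptions made for illustration and do not reproduce the actual INSIDE schema; only the concepts themselves (provision, record layout, documentation, dictionary item classes) come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class ItemKind(Enum):
    """The four classes of dictionary items named in the slide."""
    MEASURE = "measure"
    DIMENSION = "dimension"
    UNIQUE_ID = "unique identifier"
    OTHER = "other non-identifying variable"


@dataclass
class DictionaryItem:
    name: str
    kind: ItemKind


@dataclass
class MetaSource:
    """Record layout: the elementary variables contained in a source."""
    variables: list


@dataclass
class Provision:
    """Information supplied by one source for one period."""
    source: str                      # e.g. "FS"
    period: int                      # reference year
    layout: MetaSource
    documentation: list = field(default_factory=list)


# Hypothetical example values.
turnover = DictionaryItem("turnover", ItemKind.MEASURE)
provision = Provision(
    source="FS",
    period=2012,
    layout=MetaSource(variables=["tax_code", "revenues"]),
    documentation=["general_information.pdf"],
)
```

Keeping the record layout and the documentation attached to each provision is what lets a mapper work on a new yearly supply without prior knowledge of its physical structure.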

data model: metadata logical scheme
Key metadata concepts:
- DW_META_SOURCE: the key associative table; it contains the association between the microdata table fields and the DW dictionary items
- DW_META_ATTRIB: the dictionary of the DW concepts
- DW_META_SOURCE_PCT: a table functional to the mapping operation; it contains the calculated distances between the microdata table fields and the DW dictionary items
- DW_DIMENSION: the description of, and the physical references to, the dimensions used in the DW
- DW_FACT: the description of, and the physical references to, the facts contained in the DW

INSIDE software application: user modules MAPPING VIEWER

INSIDE software application: mapper welcome
[Screenshot: selection list of provision types and provision period.]

INSIDE software application: mapper
[Screenshot: provision list and time period.]

INSIDE software application: mapper
[Screenshot: mapping view. Labels: results; sources' variables; button for automatic mapping; S-DWH dictionary.]

INSIDE software application: mapper
[Screenshot: automatic mapping results. Labels: probabilistic matching, percentage of association; manual matching.]

INSIDE software application: mapper
[Screenshot: detail of the selected meta-attribute. Probabilistic matching: details of the possible matches and percentage of association for the first five results.]
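The slides do not say how the percentage of association is computed. As a rough illustration of the idea (score every dictionary item against a source variable and show the five best candidates), the sketch below uses the string similarity ratio from Python's standard `difflib` module. That measure, the function names and the sample labels are assumptions, not the method used in INSIDE.

```python
from difflib import SequenceMatcher


def association_pct(source_var, dictionary_item):
    """Similarity of two labels as a percentage between 0 and 100."""
    matcher = SequenceMatcher(None, source_var.lower(), dictionary_item.lower())
    return round(100 * matcher.ratio(), 1)


def top_candidates(source_var, dictionary, n=5):
    """Return the n dictionary items most similar to the source variable."""
    scored = [(item, association_pct(source_var, item)) for item in dictionary]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical dictionary labels, loosely based on the Frame variables.
dictionary = [
    "turnover",
    "labour cost",
    "purchases of goods and services",
    "value added",
    "gross operating margin",
    "number of employees",
    "economic activity",
]

candidates = top_candidates("Total turnover", dictionary)
best_item, best_pct = candidates[0]
```

A mapper would then accept the best candidate or fall back to manual matching when none of the proposed percentages is convincing.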

INSIDE architecture: two user modules MAPPING VIEWER VIDEO


INSIDE software application: viewer
The viewer is based on two sub-modules:
- View Builder: module for query building, with the possibility of a data preview;
- View Manager: module for the management of the views created.

INSIDE software application: viewer: view builder
[Screenshot: facts list; building area: select area; building area: where area.]

INSIDE software application: viewer: view builder
[Screenshot: detail of variable.]

INSIDE software application: viewer: view builder
[Screenshot: search for variables through tags.]

INSIDE software application: viewer: view builder
[Screenshot: where conditions; select condition; fact links.]

INSIDE software application: viewer: view builder
[Screenshot: view preview; view name; filter of yearly supply.]

INSIDE software application: viewer: view manager
[Screenshot: view manager; query detail of the view recalled.]

INSIDE software application: viewer: view manager
[Screenshot: query detail of the view.]

INSIDE software application: viewer: view manager
[Screenshot: view data preview.]

INSIDE software application: viewer: view manager
[Screenshot: view features: query structure, number of records, execution time.]

INSIDE architecture: two user modules MAPPING VIDEO VIEWER

INSIDE architecture: two user modules
[Figure: 2-tier system. INSIDE data analyst: desktop application environment (VIDEO). Diagram labels: PUBLISHING & SHARING SERVICES; CONTENT MANAGEMENT.]

INSIDE architecture: two user modules
[Figure: 2-tier system. INSIDE data analyst: workflow environment. Diagram labels: CONTENT MANAGEMENT; PUBLISHING & SHARING SERVICES; WORKFLOW SERVICE.]

INSIDE Technologies/FW
(Technology: Description)
- Red Hat EL Server 5.11: Web server operating system
- PHP 5.3: Application development language
- APACHE 2.0: Web server software
- Yii 1.1: MVC application framework
- Bootstrap: HTML, CSS and JavaScript framework (included in Yii)
- jQueryUI: User interface framework JavaScript library
- PdfJS viewer: PDF viewer library
- Highcharts: Interactive charting library in JavaScript
- DataTables: jQuery plug-in for advanced interaction controls on HTML tables
- Red Hat EL Server 5.8: Database server operating system
- Oracle 11g: Relational database management system

thanks for your attention