Data warehouse approach to statistical data management and the prospect of its use for scanner data Antonio Laureti Palma Workshop scanner.


Presentation transcript:

Data warehouse approach to statistical data management and the prospect of its use for scanner data Antonio Laureti Palma Workshop scanner data. Rome 1-2 October 2015

Summary
- the context, SBS-Frame, in which the DWH has been developed
- INSIDE: INtegrated StatIstical Datawarehouse Environment
- software features of INSIDE
  - mapping data
  - extracting data
- possible application in the context of the scanner data project

SBS-Frame in the context of business statistics
- The Frame, based on administrative data, allows ISTAT to obtain by sum the main economic aggregates required by the Eurostat SBS (Structural Business Statistics) Regulation.
- The Frame allows ISTAT to overcome the limitations of the estimation domains of the sample surveys, making accurate estimates possible for a relevant number of sub-populations.
- A detailed and multidimensional mapping of the enterprises is possible.
- It represents the new base for the National Accounts ESA 2010 (SEC 2010) estimates.

The Frame Sources

ID | Source | Description | Supplier | Units
FS | Financial Statements | annual profit and loss statements of limited liability companies | Chambers of Commerce | 750K
SS | Sector Studies survey | SMEs with turnover in [30K-7.5M] euros | Italian Revenue Agency | 3.5M
UN | Tax returns form | unified model of tax declarations by legal form, containing economic information for different legal forms | Italian Revenue Agency | 4.4M
IRAP | Regional Tax on Productive Activities form | model of declaration for Regional Tax on Productive Activities payment | Italian Revenue Agency | 4.4M
SME | Small Medium Ent. Survey | sample survey on enterprises with less than 100 employees | ISTAT | 100K
RACLI | Labour Cost by Enterprise Reg. | Register of Labour Cost by Enterprise | ISTAT | 1.5M
SBR | Business Register | Italian official Business Register of Active Enterprises | ISTAT | 4.4M

SBS-Frame process features
- annual activity
- variability of the sources
- many actors' interactions
- complex workflow
- different actor skills
- tracking methodological choices
- replicability of results
- documenting processes
- storing distributed knowledge

Statistical Data Warehouse (S-DWH)
To support the workflow, a data-centric approach is used, with a Statistical Data Warehouse (S-DWH) as a single central data store.
Basic requirements for the S-DWH are:
- an easy-to-use environment to access complex data
- control of information visibility
- support of multiple-purpose statistical information in a specific statistical domain
- a metadata-driven model
- a single integrated system

The implementation of INSIDE
To support the SBS-Frame production, the INSIDE (INtegrated StatIstical Datawarehouse Environment) software application has been implemented.
[Diagram: INSIDE basic architecture]

Layered S-DWH
From an architectural point of view, we identify four conceptual layers in the S-DWH:
- access layer: for the final presentation, dissemination and delivery of the information sought;
- interpretation and data analysis layer: enables data analysis or evaluation for statistical design;
- integration layer: where all operational activities are carried out; in this layer data are integrated and transformed in order to increase the performance and usability of the upper layer;
- source layer: the level where data sources are stored, either internal data (from surveys or from intermediate processing steps) or external data (from administrative provisions).
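The layering above can be written down as an ordered enumeration. This is only an illustrative sketch: the names `Layer` and `upstream_of` are hypothetical and are not part of INSIDE.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The four conceptual S-DWH layers, ordered from raw data to delivery."""
    SOURCE = 1          # data sources are stored (internal or external)
    INTEGRATION = 2     # data are integrated and transformed
    INTERPRETATION = 3  # data analysis and evaluation for statistical design
    ACCESS = 4          # final presentation, dissemination and delivery


def upstream_of(layer: Layer) -> list:
    """Return the layers that sit below `layer` in the stack (hypothetical helper)."""
    return [candidate for candidate in Layer if candidate < layer]
```

For instance, `upstream_of(Layer.ACCESS)` lists the source, integration and interpretation layers, in that order.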

INSIDE: user roles

Role | Description
source mapper | a source expert responsible for the mapping of economic variables
data analyst | performs statistical analysis and is in charge of all or part of the statistical production process
data administrator | responsible for managing the data flows, user authorization and system maintenance

Layer columns on the slide: Source, Integration, Interpretation, Access.

INSIDE: user mappers
“data mapping is the process of creating data element mappings between two distinct data models in order to overcome the lack of control in source provisions”
- the mapper is a source expert, specialized in a topic, responsible for the coherent mapping with the internal S-DWH dictionary
- has access permission
- mapping is automatic or manual
[Diagram: variables from the sources (IRAP, SS, FS, survey, …) are mapped to the internal dictionary of the Frame, across the source and integration layers]
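A minimal sketch of what such a mapping could look like in code, assuming a lookup table keyed by source and source variable. The variable names, the `MAPPING` table and `map_record` are all hypothetical; the slides do not describe how INSIDE stores its mappings.

```python
# Hypothetical mapping table: (source, source variable) -> internal dictionary name.
MAPPING = {
    ("FS", "ricavi_vendite"): "TURNOVER",
    ("SS", "F01"): "TURNOVER",
    ("IRAP", "costo_personale"): "PERSONNEL_COSTS",
}


def map_record(source: str, record: dict) -> tuple:
    """Rename the variables of one source record to internal dictionary names.

    Returns the mapped record and the list of variables that have no mapping,
    which a source mapper would then have to resolve manually.
    """
    mapped, unmapped = {}, []
    for variable, value in record.items():
        target = MAPPING.get((source, variable))
        if target is None:
            unmapped.append(variable)
        else:
            mapped[target] = value
    return mapped, unmapped
```

Calling `map_record("FS", {"ricavi_vendite": 1000, "altro": 5})` maps the first variable to `TURNOVER` and reports `altro` as unmapped.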

INSIDE: user analysts
Data analysts make the statistical evaluations. The access layer is optimized for interacting easily with complex data. This allows:
- basic analysis
- creating a view in a private area from a list of selected data sources
- access to the views through standard statistical software
- has access permission
[Diagram: the data analyst works in the interpretation and access layers, with SBS and NA views built on the SBS Frame; other label: SASWF]
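The idea of building a view from a list of selected data sources can be sketched with Python's built-in `sqlite3` module. The `facts` table, its columns and the `create_view` helper are assumptions made for illustration; the actual INSIDE schema and database engine are not described in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: one row per enterprise, source and variable.
conn.execute(
    "CREATE TABLE facts (enterprise_id TEXT, source TEXT, variable TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("E1", "FS", "TURNOVER", 1200.0),
        ("E1", "SS", "TURNOVER", 1150.0),
        ("E2", "IRAP", "PERSONNEL_COSTS", 300.0),
    ],
)


def create_view(conn: sqlite3.Connection, name: str, sources: list) -> None:
    """Create a view restricted to the selected data sources."""
    if not name.isidentifier():
        raise ValueError("invalid view name")
    if not all(s.isalnum() for s in sources):
        raise ValueError("invalid source code")
    # SQLite does not allow bound parameters inside a view definition,
    # so the validated values are inlined.
    in_list = ", ".join(f"'{s}'" for s in sources)
    conn.execute(
        f"CREATE VIEW {name} AS SELECT * FROM facts WHERE source IN ({in_list})"
    )


create_view(conn, "my_sbs_view", ["FS", "SS"])
rows = conn.execute(
    "SELECT enterprise_id, source, value FROM my_sbs_view ORDER BY source"
).fetchall()
```

Any tool that can query the database (a statistical package included) can then read `my_sbs_view` like an ordinary table.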

INSIDE: data administrator
[Diagram: synthetic metadata model. Schemas: source, integration, interpretation, access. Elements shown: provisions, layouts, dictionary, provision, view, layouts, docs, dimensions, facts, timing, monitoring, user]

INSIDE: data model
[Diagram: synthetic microdata model. Schemas: integration (data hub), interpretation (fact tables), access (views/marts). Elements shown: SBR, UNICO, FS, EMP CLASS, GEO, ATECO, SS, JUR. FORM, FS DIM, SS DIM, DICTIONARY, source tables, provisions, SBR, surveys, derived source, PROVISION, SBS]

INSIDE software application: user modules
- MAPPING
- VIEWER

INSIDE software application: mapper
[Screenshot labels: mapping view; results; sources' variables; S-DWH dictionary]

INSIDE software application: mapper
[Screenshot labels: automatic mapping results (probabilistic matching, percentage of association); manual matching; not matched]
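The slides do not say how the automatic matching computes its percentage of association. As a stand-in, the sketch below scores candidate dictionary entries with plain string similarity from the standard library's `difflib`; the dictionary labels, the threshold and `best_match` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical internal dictionary labels.
DICTIONARY = ["turnover", "personnel costs", "value added"]


def best_match(label: str, threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Return the closest dictionary entry and its similarity as a percentage.

    Below the threshold the variable is reported as not matched (None),
    which leaves it for manual matching.
    """
    scored = [
        (SequenceMatcher(None, label.lower(), entry).ratio(), entry)
        for entry in DICTIONARY
    ]
    score, entry = max(scored)
    percentage = round(100 * score, 1)
    return (entry if score >= threshold else None), percentage
```

String similarity is a much cruder signal than a probabilistic matching model, so the threshold would need tuning against manually verified mappings.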

INSIDE architecture: two user modules
- MAPPING
- VIEWER

INSIDE software application: view builder
[Screenshot labels: facts list; building area: select area; building area: where area]

INSIDE software application viewer: view builder
[Screenshot labels: view preview; view name]

INSIDE software application viewer: view manager
[Screenshot label: view manager]

INSIDE architecture: two user modules
[Diagram labels: 2-tier system; INSIDE data analyst: desktop application environment; PUBLISHING & SHARING SERVICES; CONTENT MANAGEMENT]

Possible application in the context of the scanner data project
Mapping variables
INSIDE is optimized for managing complex sources:
- managing the acceptance process of any new (EAN) metadata provision
- managing the substitution of products (at GTIN/EAN code level):
  - filtering by ECR classification
  - mapping by text matching
  - temporal data pre-viewing (turnover check)
- code linking the EAN to the COICOP classification
- articulating the mapping activities within different source competence groups, by ECR area or COICOP area
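Two of the steps above lend themselves to a short sketch: a basic check on incoming EAN codes and a search for substitution candidates by ECR filter plus text matching. The EAN-13 check-digit rule is the standard one; everything else (the record layout, the `ecr` and `description` fields, `substitution_candidates`) is a hypothetical illustration, not INSIDE's acceptance logic.

```python
def is_valid_ean13(code: str) -> bool:
    """Check the EAN-13 check digit (weights 1 and 3 alternate over the first 12 digits)."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def substitution_candidates(products: list, ecr_category: str, text: str) -> list:
    """Filter products by ECR category, then keep those whose description contains the text."""
    needle = text.lower()
    return [
        p for p in products
        if p["ecr"] == ecr_category and needle in p["description"].lower()
    ]


# Hypothetical product records; category codes and descriptions are placeholders.
PRODUCTS = [
    {"ean": "4006381333931", "ecr": "CAT_A", "description": "Sparkling water 150 cl"},
    {"ean": "0000000000017", "ecr": "CAT_A", "description": "Still water 50 cl"},
    {"ean": "0000000000024", "ecr": "CAT_B", "description": "Sparkling wine 75 cl"},
]

candidates = substitution_candidates(PRODUCTS, "CAT_A", "sparkling")
```

A turnover check over time and the link to COICOP would follow the same pattern, as further filters or lookups on the candidate list.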

Possible application in the context of the scanner data project
Data analysis
INSIDE is optimized for access to complex data, allowing:
- easy access to the microdata at outlet level for several months
- control of the data visibility of the users by product area
- analysis of microdata (temporally or spatially) by COICOP or ECR classification, or both
- possibility of using any statistical software for analysis
- use of the access layer as standard input for production processes
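As an illustration of the kind of analysis listed above, the sketch below totals turnover over any combination of dimensions (month, outlet, classification). The rows and the classification codes `C1` and `C2` are made-up placeholders, and `turnover_by` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical scanner microdata: one row per outlet, month and product class.
ROWS = [
    {"outlet": "A", "month": "2015-01", "coicop": "C1", "turnover": 100.0},
    {"outlet": "B", "month": "2015-01", "coicop": "C1", "turnover": 50.0},
    {"outlet": "A", "month": "2015-02", "coicop": "C1", "turnover": 120.0},
    {"outlet": "A", "month": "2015-02", "coicop": "C2", "turnover": 30.0},
]


def turnover_by(rows: list, *keys: str) -> dict:
    """Sum turnover over the given dimensions, e.g. ("month", "coicop")."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["turnover"]
    return dict(totals)


by_month_and_class = turnover_by(ROWS, "month", "coicop")
by_outlet = turnover_by(ROWS, "outlet")
```

The same function covers the temporal view (`"month"`), the spatial view (`"outlet"`) and any cross-classification of the two.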

INSIDE software application: mapper
[Screenshot labels: EAN in; EAN matched; EAN internal]


INSIDE software application: mapper
[Screenshot labels: filter by ECR; filter by EAN text (search: SORG.ORTICAIA); DESC EAN: ACQUA SANTA EGERIA STD TAVOLA MINERALE GAS PLAS CL; results: SORG.ORTICAIA …… CL; data preview]


Thanks for your attention