“Mapping the GSBPM on a SDW architecture”


“Mapping the GSBPM on a SDW architecture”
Antonio Laureti Palma, IT - Structural Business Statistics Unit, National Institute of Statistics – Italy
Workshop ESSnet on Micro Data Linking and Data Warehousing in Statistical Production, 22 & 23 September 2011

Overview
The aim of this study is to define and contextualize a statistical data warehouse, in order to provide a framework that assists the development and definition of “data warehousing and data linking”. The data warehousing architecture presented can be considered the IT conclusion of the activities of the first year of the ESSnet, while the proposed modelling approach indicates the roadmap for the future IT representation of the context. It is described by:
- Data Warehousing as a Single Coherent Statistical Production System
- Statistical Data Warehousing: an Architecture schema
- Modeling the Business Domain - Designer’s view of the GSBPM on DWA schema
- Modeling the Data/Metadata Domain
- Conclusion

The Data Warehouse
IT definition: In computing, a data warehouse is a database used for reporting.
… the concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse" (from Wikipedia).
… as Bill Inmon says, “the data warehouse is at the center of the corporate information factory, which provides a logical framework for decision support environments and business management capabilities”.
… in essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to delivering business intelligence.

Data Warehousing for Enterprise
DW centrality in an enterprise is obtained through an IT infrastructure transversal to all the operational systems. The data from the operational systems are Extracted, Transformed and Loaded (ETL) into the DW, and are then available to the DSS and MIS.
[Diagram: enterprise production line. The operational systems MARKETING, RESOURCES, PRODUCTION, DISTRIBUTION and SALES each feed the DATA WAREHOUSE through ETL; the warehouse serves the DSS (Decision Support System) and the MIS (Management Information System).]
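The ETL flow described on this slide can be illustrated with a minimal Python sketch. It is not taken from the presentation: the in-memory `DataWarehouse` class, the `item` and `amount` fields and the sample records are all hypothetical, chosen only to show operational systems feeding one store that a DSS/MIS-style query then reads.

```python
from dataclasses import dataclass, field


@dataclass
class DataWarehouse:
    """Hypothetical in-memory stand-in for the DW: one list of rows per table."""
    tables: dict = field(default_factory=dict)

    def load(self, table, rows):
        # Load: append the transformed rows to the target table.
        self.tables.setdefault(table, []).extend(rows)


def extract(source_rows):
    """Extract: read raw records from an operational system (here, a list)."""
    return list(source_rows)


def transform(rows, system):
    """Transform: normalise text and types, and tag each row with its origin."""
    return [
        {
            "system": system,
            "item": r["item"].strip().lower(),
            "amount": float(r["amount"]),
        }
        for r in rows
    ]


def run_etl(warehouse, system, source_rows, table="facts"):
    warehouse.load(table, transform(extract(source_rows), system))


dw = DataWarehouse()
# Two operational systems with inconsistent formats (made-up sample data).
run_etl(dw, "SALES", [{"item": " Widget ", "amount": "10.5"}])
run_etl(dw, "PRODUCTION", [{"item": "WIDGET", "amount": "7"}])

# A DSS/MIS-style query: total amount per item across all systems.
totals = {}
for row in dw.tables["facts"]:
    totals[row["item"]] = totals.get(row["item"], 0.0) + row["amount"]
print(totals)
```

The point of the sketch is only the direction of flow: each operational system has its own ETL step, and reporting reads from the warehouse rather than from the operational systems.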

Data Warehousing for Statistics
In an NSI, if the DW is mainly used for improving production efficiency, as in an enterprise, it is transversal to the statistical production line.
[Diagram: statistical production line. REGULATIONS, RESOURCES, SURVEYS, ADMIN DATA, ELABORATION and OUTPUT each feed the DATA WAREHOUSE through ETL; the warehouse serves the DSS (Decision Support System) and the MIS (Management Information System).]

Data Warehousing for Statistics
In an NSI, if the DW is used both for “improving the production efficiency” (DSS-MIS) and for “creating the statistical product” (SD), then the DW is part of the production line.
… in this case, the DW can be considered a single logical repository, the center of the information factory, for all the information generated by the NSI.
[Diagram: statistical production line. REGULATIONS, RESOURCES, SURVEYS and ADMIN DATA feed the DATA WAREHOUSE through ETL; the warehouse serves SD (Statistical Dissemination), the DSS and the MIS.]

From the survey, two issues arise:
Single coherent system (questions 6 to 13)
- 15 countries declare that they do not have a single coherent system, although 11 of them are planning to change this... the situation will probably change considerably in the next five years...
- Current output requirements are not integrated into the data systems of 10 countries, and the situation will probably change for half of them...
- Those who have a single coherent system do not want to change it; metadata and data input are fully integrated in the data system, as are admin data.
Motivation to start DW (question 14)
- The main motivations are linked to the ways of (re)using data, the improvement of efficiency and process integration in business statistics production...
- Additional motivations are integrating the project into the organization's processing model, reducing the burden (cost and time) on survey respondents, and increasing consistency and quality.

Disadvantages of a stove-pipe-like production
In a stove-pipe production system every single production line corresponds to a specific domain of statistics, together with the corresponding production system. For each domain, the whole production process, from survey design to dissemination, takes place independently of the other domains, and each has its own data suppliers and user groups.
[Diagram: parallel production lines for Structural Business Statistics (SBS), Short Term Business Statistics (STS), Information Society (IS), Science Technology Innovation (STI) and I/O. Each line runs from survey data and administrative data through data integration and elaboration to statistical output; the Business Register is shown alongside.]

Data Warehousing as a Single Coherent System
In an NSI, a single coherent Data Warehousing System (DWSys) is aimed at improving production efficiency and creating the statistical products in a fully integrated way. From this point of view, the DWSys becomes the “effective” Information System of the full statistical production line. The term DWSys should therefore be used to refer to the interaction between People, Business Processes, Data and Technology. The Statistical Data Warehouse (SDW) can then be seen as a central statistical data store, regardless of the data’s source, for managing all available data of interest, improving the NSI’s ability to:
- (re)use data to create new data/new outputs;
- perform reporting;
- execute analysis;
- produce the necessary information.

DWSys Architectural description
A DWSys Architecture (DWA) for statistics is a rigorous description of the structure of the NSI production, which comprises the DWSys components (business entities or sub-processes), the externally visible properties of those components, and the relationships (e.g. the behavior) between them. The DWA should be a framework for an NSI which defines how to organize the DWSys:
- provide the mechanisms for communicating information about the relationships that are important in the architecture;
- provide the discipline to gather and organize the data and construct the views in a way that helps ensure integrity, accuracy and completeness;
- support the application of methods and the use of tools.

Layers of the enterprise architecture
In the creation of an enterprise architecture it is common to recognize four types of architecture, each corresponding to its particular architectural domain.

DWA – Business Domain
To provide a DWA as detailed as possible, in the context of statistics production, we can articulate the business domain in four functional layers:
- data source layer,
- integration layer,
- interpretation and data analysis layer,
- access layer.
Each layer has its own data domain structure:
- operational data, for data warehousing;
- metadata, the data describing the SDW, usually used to manage, describe and monitor the information systems.

DWA layered business architecture
[Diagram: the four layers SOURCE, INTEGRATION, INTERPRETATION & DATA ANALYSIS and ACCESS, shown left to right. Components shown: REGULATIONS, RESOURCES, STATISTICAL SURVEYS 1 to n, BUSINESS REGISTER, ADMIN DATA 1 to n, STAGING AREA, PRIMARY DATA, several DATA MARTs, DISSEMINATION, DSS and MIS, with META DATA MANAGEMENT spanning the layers.]

DWA - functional Layer
Source Database Layer: This level is responsible for storing, physically or virtually, the data from internal (surveys) or external (archives) sources for statistical purposes. Typical data sources, in the context of business statistics, are:
- specific surveys, like STS, ICT, CIS, SBS,
- Customs Agency,
- Revenue Agency,
- Chambers of Commerce,
- National Social Security Institute.

DWA - functional Layer
Integration layer: It is used for all integration and reconciliation activities on the data sources. In this layer we have the set of applications that perform the main ETL, which manages:
- inconsistent coding for the same object; consistency is obtained through the coding defined by the data warehouse;
- adjustment of different units of measurement and inconsistent formats;
- alignment of inconsistent labels, where the same object is named differently; usually the data are identified according to the definition contained in the metadata of the system;
- incomplete or incorrect data; in this case the operation may require human intervention to resolve issues that cannot be predicted a priori;
- data linking, in which different sources enable the creation of extended, or new, units of analysis.
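The reconciliation tasks listed for the integration layer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code list, the label aliases, the unit factors, the field names (`unit_id`, `activity`, `turnover`, `employees`) and the sample records are all hypothetical and do not come from the presentation.

```python
# Hypothetical code list: source-specific codes mapped to the warehouse coding.
CODE_MAP = {"M": "manufacturing", "MANUF": "manufacturing", "R": "retail"}
# Hypothetical label aliases: the same object named differently by each source.
LABEL_ALIASES = {
    "turnover_keur": "turnover",
    "sales_eur": "turnover",
    "unit": "unit_id",
    "id": "unit_id",
}
# Hypothetical conversion factors to the warehouse unit of measurement (euros).
UNIT_FACTORS = {"turnover_keur": 1000.0, "sales_eur": 1.0}


def harmonise(record):
    """Return (clean_record, needs_review) for one source record."""
    clean = {}
    for label, value in record.items():
        target = LABEL_ALIASES.get(label, label)      # align labels
        if label in UNIT_FACTORS and value is not None:
            value = float(value) * UNIT_FACTORS[label]  # adjust units
        if target == "activity" and value is not None:
            value = CODE_MAP.get(value)               # recode; None = unknown code
        clean[target] = value
    # Incomplete or incorrect records are flagged for human intervention.
    required = ("unit_id", "activity", "turnover")
    needs_review = any(clean.get(key) is None for key in required)
    return clean, needs_review


def link(survey_rows, admin_rows):
    """Data linking: extend survey units with a variable from an admin source."""
    admin_by_id = {row["unit_id"]: row for row in admin_rows}
    return [
        {**s, "employees": admin_by_id[s["unit_id"]]["employees"]}
        for s in survey_rows
        if s["unit_id"] in admin_by_id
    ]


survey_raw = [
    {"unit": "A1", "activity": "M", "turnover_keur": "12"},
    {"unit": "B2", "activity": "??", "turnover_keur": None},
]
admin_raw = [{"id": "A1", "employees": 40}]

survey = [harmonise(r) for r in survey_raw]
accepted = [rec for rec, review in survey if not review]
review_queue = [rec for rec, review in survey if review]
admin = [harmonise(r)[0] for r in admin_raw]
linked = link(accepted, admin)
print(linked)
print(review_queue)
```

Records that cannot be repaired automatically go to `review_queue` rather than being dropped, which mirrors the slide's remark that some cases need human intervention.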

DWA - functional Layer
Interpretation and data analysis layer: The basic functions performed at this level are advanced analysis and interpretation of data elaborations, both based on statistical algorithms. Here “statistical expert users” operate to produce information of strategic value, working with the data at maximum granularity. Only a small number of users are allowed to access the data, to avoid degrading server performance. This is a strategy of “process of information delivery”, in which the demand for new statistical information does not involve the construction of new statistical production lines, but rather the creation of other data marts. The results of these activities are unplanned aggregate data for the next (access) layer, or software rules developed for the next iteration, delivered through data marts, which are regarded as subsets of the DW and are usually oriented to a specific business line or team.

DWA - functional Layer
Access Layer: It is the layer for the final presentation of the information sought, addressed to a wide range of users, not necessarily expert in business statistics or in IT tools. The tools are:
- Specialized Business Intelligence tools: in this category, which is extensive in terms of solutions on the market, we find tools to build queries and navigational tools (OLAP viewers), including Web browsers.
- Graphics and publishing tools: the Business Intelligence tools are able to generate graphs and tables for their users; this solution consists essentially of just a couple of steps, which avoids inefficiency.
- Office Automation tools: this is a reassuring solution for users who come to the data warehouse context for the first time, as they are not forced to learn new, complex instruments. The problem is that this solution, while adequate with regard to productivity and efficiency, is very restrictive in the use of the data warehouse, since these instruments have significant architectural and functional limitations.

DWA – Modeling the Business Domain
The designer's view of the business is also known as the analytical view, and there are various standards for modeling this view. One of the most commonly used modeling standards is the Generic Statistical Business Process Model (GSBPM). The GSBPM definition by UNECE (version 4) is: “The original intention was for the GSBPM to provide a basis for statistical organizations to agree on standard terminology to aid their discussions on developing statistical metadata systems and processes. The GSBPM should therefore be seen as a flexible tool to describe and define the set of business processes needed to produce official statistics”. So, in order to define a general and comprehensive architecture for statistical production, it may be useful to identify and locate the different phases of a generic statistical production process on the different functional levels of the DWA.

Generic Statistical Business Process Model

DWA - Mapping the GSBPM on DWA
The analysis of the locations of the sub-processes on an SDW architecture is graphically represented in the next slides, with the SDW functional layers on the horizontal axis and the nine GSBPM phases on the vertical axis. Each element inside the graph is a sub-process; we will consider the 4th to the 7th GSBPM phases. This is only an example of Model Processing. Each case must be validated and discussed in its own operational context; this is just a basis for setting up and starting the modelling work for the next two years of the ESSnet. In this context, each sub-process must be regarded from a methodological, planning, technological or operational point of view. Blank sub-processes are related to methodological, or planning, metadata definitions, while brown sub-processes are related to operational, or technological, functions for data elaboration.

Designer's view - Mapping the GSBPM on DWA
Sub-processes of the GSBPM allocated on the functional layers of the DWA.
[Diagram: grid with the DWA layers as columns (Source Layer, Integration Layer, Interpretation and analysis Layer, Access Layer) and the GSBPM phases as rows.]
7 Disseminate: 7.1-update output systems; 7.2-produce dissemination; 7.3-manage release of dissemination products; 7.4-promote dissemination; 7.5-manage user support
6 Analyze: 6.1-prepare draft output; 6.2-validate outputs; 6.3-scrutinize and explain; 6.4-apply disclosure control; 6.5-finalize outputs

Designer's view - Mapping the GSBPM on DWA
Sub-processes of the GSBPM allocated on the functional layers of the DWA.
[Diagram: grid with the DWA layers as columns (Source Layer, Integration Layer, Interpretation and analysis Layer, Access Layer) and the GSBPM phases as rows.]
5 Process: 5.1-integrate data; 5.2-classify & code; 5.3-review, validate & edit; 5.4-impute; 5.5-derive new variables and statistical units; 5.6-calculate weights; 5.7-calculate aggregate; 5.8-finalize data files
4 Collect: 4.1-select sample; 4.2-set up collection; 4.3-run collection; 4.4-finalize collection
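The layer-by-phase grid used in these mapping slides can be represented as a small data structure. The Python sketch below is a hedged illustration: the sub-process names come from the slides, but the layer and the blank/brown kind assigned to each one are placeholder assumptions made for the example, not the allocation shown in the original diagrams.

```python
from collections import defaultdict
from enum import Enum


class Layer(Enum):
    SOURCE = "Source"
    INTEGRATION = "Integration"
    INTERPRETATION = "Interpretation and data analysis"
    ACCESS = "Access"


class Kind(Enum):
    # "Blank" sub-processes in the slides: methodological or planning metadata.
    METADATA_DEFINITION = "methodological or planning"
    # "Brown" sub-processes in the slides: operational or technological functions.
    DATA_ELABORATION = "operational or technological"


# (sub-process, layer, kind).
# ASSUMPTION: every placement and kind below is illustrative only and must be
# validated against the real operational context, as the slides require.
ALLOCATION = [
    ("4.1-select sample", Layer.SOURCE, Kind.METADATA_DEFINITION),
    ("4.3-run collection", Layer.SOURCE, Kind.DATA_ELABORATION),
    ("5.1-integrate data", Layer.INTEGRATION, Kind.DATA_ELABORATION),
    ("5.2-classify & code", Layer.INTEGRATION, Kind.DATA_ELABORATION),
    ("6.2-validate outputs", Layer.INTERPRETATION, Kind.DATA_ELABORATION),
    ("7.2-produce dissemination", Layer.ACCESS, Kind.DATA_ELABORATION),
]


def build_grid(allocation):
    """Group sub-processes by (GSBPM phase number, DWA layer)."""
    grid = defaultdict(list)
    for name, layer, kind in allocation:
        phase = int(name.split(".", 1)[0])  # "5.1-integrate data" -> 5
        grid[(phase, layer)].append((name, kind))
    return dict(grid)


grid = build_grid(ALLOCATION)
for (phase, layer), cells in sorted(
    grid.items(), key=lambda item: (item[0][0], item[0][1].value)
):
    print(phase, layer.value, [name for name, _ in cells])
```

Keeping the allocation as plain data makes it easy to revise each placement during the validation that the slides call for, without touching the grid-building logic.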

Designer's view – Modeling the Data Domain Graphic scheme of layered architecture with a focus on “statistical data”:

SDA – Modeling the Meta Data Domain
Our purpose is to refer to an IT infrastructure for the SDW, so we should consider only structured metadata, articulated as:
- Structural Metadata (SM): they are used for the description, identification and retrieval of statistical and quality information. Moreover, they can link the various components of the SDW.
- Process Metadata (PM): they are used to store information on data usage and on the maintenance of process administration, as well as the information needed for the automatic execution of workflows or management systems.
Both can be Active, when they enable operational use, manual or automated, by one or more processes, or Passive in all other uses.
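The two-way classification of metadata (Structural or Process, Active or Passive) can be expressed as a small Python model. This is a sketch under assumptions: the `MetadataItem` class, the `used_by` field and the four catalogue entries are hypothetical examples, and "active" is modelled simply as "used operationally by at least one process".

```python
from dataclasses import dataclass
from enum import Enum


class MetadataType(Enum):
    STRUCTURAL = "SM"  # description, identification, retrieval
    PROCESS = "PM"     # data usage, process administration, workflow execution


@dataclass(frozen=True)
class MetadataItem:
    name: str
    kind: MetadataType
    used_by: tuple = ()  # processes that use this item operationally

    @property
    def active(self):
        # Active when it enables operational use by one or more processes,
        # passive in all other uses.
        return len(self.used_by) > 0


# Hypothetical catalogue entries, invented for illustration.
catalogue = [
    MetadataItem("variable definition: turnover", MetadataType.STRUCTURAL,
                 ("5.2-classify & code",)),
    MetadataItem("quality report template", MetadataType.STRUCTURAL),
    MetadataItem("ETL job schedule", MetadataType.PROCESS,
                 ("5.1-integrate data",)),
    MetadataItem("archived run log", MetadataType.PROCESS),
]

summary = {
    (item.kind.value, "active" if item.active else "passive"): item.name
    for item in catalogue
}
for key in sorted(summary):
    print(key, summary[key])
```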

Designer's view - Modeling the Meta Data Domain Graphic scheme of layered architecture with a focus on “meta data”:

Conclusion
We have contextualized statistical production in a Data Warehousing Architecture, and in doing so introduced a general Enterprise Architecture vision for an SDW production system. We have shown how the GSBPM representation can be used for modelling the business domain of the SDW layered architecture, giving a complete operational view for the deployment of statistical production cases. Finally, we have shown the corresponding four-level data domain of the architecture for a Statistical Data Warehouse.