Item 5.3 of the Agenda
Census Hub – progress report
Francesco Rizzo, Unit B5
21-22 October 2009, IT Director's Group Meeting

2 Key factors for the Census Hub project
Main objective: the dissemination of the census results in the EU should give users easy access to detailed census data that are methodologically comparable between the MSs and structured in the same way.
Starting point: Eurostat Census TF, April 2007:
- analysis of the alternatives
- launch of the pilot
Supervision: the Census TF and the Census WG.
Besides the census experts, NSIs have designated IT contacts, whose main aim is to stay in touch with Eurostat and to co-ordinate the work when the NSI decides to join the Census Hub project.

3 The Census Hub architecture
Elements: SDMX standards; pull mode and data sharing; hub.
SDMX standards:
- SDMX is anchored in internationally recognised bodies such as ISO.
- SDMX is not just a new data transmission format. It consists of an Information Model, content-oriented guidelines, and an architecture for the efficient exchange and sharing of statistical data and metadata.
- SDMX is based on an open, enduring format ("XML") rather than a proprietary one.
Pull mode: the collecting organisation retrieves the data directly from the provider's website.
Data sharing: a group of partners agree to provide access to their data according to standard processes, formats and technologies.
Hub: based on agreed hypercubes. The hypercubes are not sent to the central system; instead, data are fetched directly from the data producers' databases when a user requests them.

4 Data Repository (Warehousing) Architecture
[Diagram labels: NSI, Eurostat, Pull Requestor, eDAMIS, Data Input, SDMX Registry, Intermediate storage, Verification / Conversion to SDMX, Received data in SDMX-ML, Loader, Warehouse (Eurobase), Dissemination, PUSH, PULL]
The Single Entry Point allows both push and pull methods.
eDAMIS is now able to recognise and deliver SDMX-ML files; NL has been running a very interesting pilot on the MILK statistics.
The pull approach consists of the following steps:
Step 1: when new data are available, the NSI should:
- create an SDMX-ML file containing the new data, or
- do nothing if the NSI WS builds SDMX-ML messages upon request.
Step 2: the NSI should add a new feed entry, including an SDMX-ML Query message describing the new Dataset, to the NSI feed.
Step 3: the Pull Requestor reads the new feed entry and:
- retrieves the SDMX-ML file from the specified URL, if it resides at a URL, or
- uses the Query message included in the feed to query the NSI WS, if the data are prepared by the NSI WS.
Step 4: the Pull Requestor forwards the SDMX-ML dataset to the rest of the modules within the Eurostat production environment.
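The pull steps above can be sketched as a small routine. This is a minimal illustration, not Eurostat's implementation: the `FeedEntry` fields and the `fetch_url`, `query_ws` and `forward` callables are hypothetical names introduced here, standing in for an HTTP download, a call to the NSI web service and the hand-over to the downstream modules.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FeedEntry:
    """One entry in an NSI feed (hypothetical structure for illustration)."""
    dataset_id: str
    query_message: str               # SDMX-ML Query message describing the new dataset
    file_url: Optional[str] = None   # set only when a ready-made SDMX-ML file exists


def pull_dataset(entry: FeedEntry,
                 fetch_url: Callable[[str], str],
                 query_ws: Callable[[str], str]) -> str:
    """Return the SDMX-ML dataset announced by a feed entry.

    If the entry points at a file, download it; otherwise send the
    query message from the feed to the NSI web service.
    """
    if entry.file_url is not None:
        return fetch_url(entry.file_url)
    return query_ws(entry.query_message)


def process_feed(entries, fetch_url, query_ws, forward):
    """Pull every announced dataset and forward it to downstream modules."""
    for entry in entries:
        forward(entry.dataset_id, pull_dataset(entry, fetch_url, query_ws))
```

Passing the transport functions in as arguments keeps the branching logic separate from networking, so it can be exercised without a live NSI endpoint.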

5 Data Hub Architecture
[Diagram labels: NSI, Data Portal, Query, Retrieve dataset, Dissemination cache, XSL for SDMX-ML]
The architecture is based on the concept of data sharing, where a group of partners agree to provide access to their data according to standard processes, formats and technologies.
The hub is based on agreed hypercubes, but the hypercubes are not sent to the central system; instead, data are fetched directly from the data producers' databases when a user requests them.
SDMX formats and architecture are used.
Data providers can:
- make data available directly from their systems through a querying system.
Data users can:
- browse the hub to define a dataset of interest via structural metadata;
- retrieve the dataset from the NSIs.

6 The SDMX Census Hub reference architecture
[Diagram labels: Eurostat Hub; NSI SDMX infrastructure (Web Service, SDMX Query Parser, Data Retriever, SDMX-ML Data Generator); NSI Census Data warehouse (Database)]
- The Web Service receives an SDMX query, handles errors, validates the query, then forwards the SDMX query to the SDMX Query Parser.
- The SDMX Query Parser breaks down the query and sends it to the SQL Query Builder.
- The Data Retriever/SQL Query Builder creates one or more SQL queries and sends them to the Database.
- The SDMX-ML Data Generator receives a result set from the Database, assembles it into an SDMX-ML data message and forwards it to the Web Service.
- The Web Service responds with an SDMX-ML message.
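The request path described above can be sketched end to end. This is a simplified illustration under stated assumptions: the query is a tiny XML fragment invented for this example rather than a real SDMX-ML Query message, the `census` table and its columns are hypothetical, and the output only mimics an SDMX-ML data message.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical dissemination table columns that a query may filter on.
ALLOWED_DIMENSIONS = {"SEX", "AGE", "GEO"}


def parse_query(query_xml: str) -> dict:
    """Query parser: turn <Query><Dim id="SEX" value="F"/></Query> into a dict."""
    root = ET.fromstring(query_xml)
    filters = {d.get("id"): d.get("value") for d in root.findall("Dim")}
    unknown = set(filters) - ALLOWED_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(map(str, unknown))}")
    return filters


def build_sql(filters: dict):
    """SQL query builder: column names come from the whitelist, values are bound."""
    sql = "SELECT SEX, AGE, GEO, OBS_VALUE FROM census"
    if filters:
        sql += " WHERE " + " AND ".join(f"{dim} = ?" for dim in filters)
    return sql + " ORDER BY SEX, AGE, GEO", list(filters.values())


def generate_message(rows) -> str:
    """Data generator: wrap the result set in a simplified XML data message."""
    dataset = ET.Element("DataSet")
    for sex, age, geo, value in rows:
        ET.SubElement(dataset, "Obs", SEX=sex, AGE=age, GEO=geo, OBS_VALUE=str(value))
    return ET.tostring(dataset, encoding="unicode")


def handle_request(conn: sqlite3.Connection, query_xml: str) -> str:
    """Web service entry point: validate, query the database, respond."""
    try:
        filters = parse_query(query_xml)
    except (ET.ParseError, ValueError) as exc:
        error = ET.Element("Error")
        error.text = str(exc)
        return ET.tostring(error, encoding="unicode")
    sql, params = build_sql(filters)
    return generate_message(conn.execute(sql, params).fetchall())
```

Each function corresponds to one box in the reference architecture, which keeps validation, SQL generation and message assembly independently replaceable.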

7 Benefits of participating in the pilot project
- Participants will be part of an SDMX project that lets them share experiences among the different actors, both statisticians and IT staff, at different levels (planning, production, etc.).
- Participants will build an IT infrastructure that is useful not only for this exercise but also for their 2011 census data warehouse, using internationally recognised standards.
- The same SDMX architecture could be used in other projects with few or no changes.

8 Costs of participating in the pilot project
- The costs of implementing the SDMX infrastructure needed for the Census Hub Pilot Project are limited and can be embedded in the more general project that each NSI will support for the 2011 Census.
- The use of an XML-based data format will help to reduce implementation costs as follows:
  - many NSIs are already using, or planning to use, XML as the basis for their data management and dissemination systems;
  - a wide selection of commercial IT applications and tools is available to work with XML-based data;
  - expertise for working with XML is readily available and will often be available in-house.
- All the countries that have already developed the SDMX infrastructure have prepared a final report. The final reports, available on CIRCA, include a cost/benefit evaluation: IE 120 man days; DE ; PT 9 man days; IT 15 man days; CZ 30 man days.

9 The Census Hub Project
[Diagram labels: National Statistical Institute, Eurostat, Census]
The workflow is:
Step 1: a “data user” browses the Hub to define a dataset of interest via structural metadata. The user browses the dimensions and selects a dataset, then chooses the output layout by specifying which dimensions will match the X-axis and Y-axis and which dimension will vary item after item to generate new tables.
Step 2: the Hub converts the user request into an SDMX Query.
Step 3: the Hub sends the SDMX Query to the NSI Web Service concerned.
Step 4: the NSI Web Service converts the SDMX Query into a set of SQL queries and sends them to the NSI data warehouse.
Step 5: the NSI data warehouse sends the result to the NSI Web Service.
Step 6: the NSI Web Service converts the result into an SDMX-ML Data message and sends it to the Hub.
Step 7: the same steps are repeated if the user has requested data from different NSIs.
Step 8: the Hub puts together all the SDMX-ML data messages coming from the NSIs concerned and presents the result to the “data user” in the web browser in a readable format.
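On the Hub side, the workflow amounts to a fan-out and merge across the selected NSIs. The sketch below assumes each NSI web service is represented by a plain Python callable that takes a query and returns a list of observations; these callables and the `query_nsis` name are hypothetical, and real calls would go over HTTP and return SDMX-ML messages.

```python
from concurrent.futures import ThreadPoolExecutor


def query_nsis(query, services):
    """Send the same query to each selected NSI and merge the answers.

    `services` maps a country code to a callable standing in for that
    NSI's web service. Failures are recorded per country so that one
    unavailable NSI does not block the others.
    """
    merged, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        # Submit all requests first so they run concurrently.
        futures = {code: pool.submit(call, query) for code, call in services.items()}
        for code, future in futures.items():
            try:
                merged[code] = future.result()
            except Exception as exc:  # keep going if one NSI fails
                errors[code] = str(exc)
    return merged, errors
```

Running the requests in parallel means the total waiting time is governed by the slowest NSI rather than by the sum of all response times.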

11 Census Hub Pilot project – statistical aspects
- The planned pilot hypercube is a very simple one, so that NSIs can produce it in a short period.
- Data are not expected to be exact, just simulated to provide a proof of concept.
- Data must comprise the following dimensions:
  - Sex
  - Age
  - Current Activity Status
  - Geography
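Because the pilot data only need to be simulated, a test hypercube over the four dimensions can be generated in a few lines. The code lists below are placeholders invented for this sketch; they are not the official census code lists.

```python
import itertools
import random

# Placeholder code lists (assumptions for illustration only).
DIMENSIONS = {
    "SEX": ["M", "F"],
    "AGE": ["Y0-14", "Y15-64", "Y65+"],
    "CAS": ["EMP", "UNE", "INAC"],   # current activity status
    "GEO": ["REGION_1", "REGION_2"],
}


def simulate_hypercube(dimensions, seed=0, max_count=10_000):
    """Return one simulated observation per combination of dimension codes."""
    rng = random.Random(seed)  # fixed seed gives reproducible test data
    names = list(dimensions)
    cells = []
    for combo in itertools.product(*(dimensions[n] for n in names)):
        cell = dict(zip(names, combo))
        cell["OBS_VALUE"] = rng.randint(0, max_count)
        cells.append(cell)
    return cells
```

With these placeholder lists the cube has 2 × 3 × 3 × 2 = 36 cells, one per combination of codes.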

12 The Census Hub pilot project phase I – main results
- Verified the feasibility of the project
- Developed the central software (Eurostat)
- Developed 4 NSI SDMX infrastructures (DE, IE, IT, PT)
- Produced the “Census Hub Web Service Implementing Guidelines version 1.0” (Eurostat)
- Produced the specification for the pilot phase II
- Eurostat provided technical advice through bilateral meetings (IT, PT, LV, BG)
- Eurostat provided technical advice to help NSIs get started

13 The Census Hub pilot project phase II milestones and interim results
- Involving other Member States in the project
  - BG, CZ, EE, MT, SI, ES, PL (UK will join starting from summer 2010; CY is in the process of deciding)
- Providing technical advice on implementing an SDMX IT architecture
  - Bilateral technical meetings (HU, MT, SI, FI, UK, ES, NL)
- Designing a generic reference SDMX architecture for NSIs
  - Development of SDMX Reference Architecture components (in .NET and Java)
- Developing and testing additional functionalities for the central application
  - New GUI
  - Cache system and multi-threading
  - Off-line queries for bulk requests
- Developing all the necessary DSDs related to more than 100 hypercubes

14 NSI SDMX Reference Architecture
A simplified picture of the SDMX Reference Architecture for MSs
The architecture represents the synthesis of several experiences worldwide and should be read as a guide or “best practice” document rather than a strict specification.
The main objective is to provide a description/specification of a generalised architecture to be used partially or as a whole by MSs interested in starting SDMX projects.
- Dissemination database: the final storage data warehouse kept by each NSI for data that can be published to potential Data Consumers.
- Mapping Assistant: the module responsible for creating the mappings between an SDMX Data Structure Definition (DSD) and a DB schema (dissemination database) or a set of dissemination data files. It maps the DB schema from the database to the SDMX DSD (“SDMX Structure File” artefact).
- Mapping Store: the artefact that keeps the mappings between SDMX and the native format (a file or a DB schema).
- SDMX Structure File: the SDMX-ML Data Structure Definition required by the “Mapping Assistant” module in order to map its components (i.e. Dimensions, Attributes, Measures) to the dissemination database columns and tables.
- Data Retriever: the module that queries the dissemination database and gets the respective recordset.
- Data Loader: the module that loads new data from the NSI’s production environment/database to the dissemination environment/database and updates the “RSS Generator” module.
- SDMX-ML Data Generator: the module that, upon receiving the recordset and the respective mappings from the “Data Retriever”, generates an SDMX-ML Dataset message.
- Web Service Provider: the module that exposes the Dataset through a Web Service interface providing SDMX-ML messages.
- RSS Generator: the module that generates a feed entry when new data arrive from the “Data Loader”.
- SDMX Query Parser: the module that gets the request from the “Web Service Provider” and populates the internal data model, i.e. the SDMX data model.
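The role of the Mapping Store can be illustrated with a small lookup that translates DSD component identifiers and codes into dissemination database columns and local values, and back again. All component identifiers, column names and codes below are hypothetical; in the reference architecture the real mappings would be created with the Mapping Assistant.

```python
# Hypothetical mapping: SDMX component id -> (DB column, {SDMX code: local value}).
MAPPING_STORE = {
    "SEX": ("gender_cd", {"M": 1, "F": 2}),
    "GEO": ("region", {"ITC1": "Piemonte"}),
}


def to_native_filters(sdmx_filters, mapping=MAPPING_STORE):
    """Translate {component: code} from an SDMX query into {column: value}."""
    native = {}
    for component, code in sdmx_filters.items():
        if component not in mapping:
            raise KeyError(f"no mapping for component {component!r}")
        column, codes = mapping[component]
        if code not in codes:
            raise KeyError(f"no mapping for code {code!r} of {component!r}")
        native[column] = codes[code]
    return native


def to_sdmx_record(row, mapping=MAPPING_STORE):
    """Translate a DB row {column: value} back into SDMX component codes."""
    record = {}
    for component, (column, codes) in mapping.items():
        reverse = {local: code for code, local in codes.items()}
        record[component] = reverse[row[column]]
    return record
```

The first function serves the Data Retriever when it builds a database query; the second serves the SDMX-ML Data Generator when it turns a recordset back into coded observations.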

15 Census Hub Demonstration

16 Conclusion
- The Census Hub pilot project phase I has demonstrated the feasibility of the proposed architecture.
- The Census Hub pilot project phase II represents a consolidation of the entire project, from both a technical and a participation point of view.
- The architecture used represents the most advanced example of the data sharing architecture detailed in the SDMX standards.
- As the pilot has been planned to be as simple as possible, NSIs can participate with minor effort.
- Countries may confirm their participation through their representative in the Census Task Force or Census Working Group.

