GESMES and SDMX-ML - Practical issues

Presentation transcript:

16th Meeting of the Working Group of Local Coordinators for Data Transmission, 13-14 March 2006
Agenda Item 15: GESMES and SDMX-ML - Practical issues
John Allen

Time for assessment
- More than 10 years of background for GESMES
- 2 revolutions in 10 years: email and internet
- eDAMIS developments in the last 2 years
- SDMX strategy endorsed by ITDG
- New Code of Practice for the European Statistical System

Objective for data collection
- EDI => efficient, automatic data collection:
  - data and metadata are associated
  - agreed data structures are used
- EDIFACT and SDMX are not objectives; they are the means to reach the objective

Current situation – data providers
- Very decentralised data collection:
  - 32 countries providing detailed data
  - 1000 Competent National Authorities (800 in the EU25 alone)
- The IT architecture of the NSIs (the main data providers) is itself very decentralised and heterogeneous (results of the GLC Questionnaire on 30 NSIs in 2005):
  - all NSIs use several production environments (20 have more than 5 production environments)
  - no single source for transmission of data to Eurostat
  - at least 27 countries send data to Eurostat directly from their production environments
=> Technical issues linked to data transmission to Eurostat are often in the hands of the staff of statistical production units in NSIs

IT architecture in NSI (GLC Questionnaire – 2005)
- Schema 1: one production environment, one data warehouse, dissemination environment: 0 cases (0%)
- Schema 2: several production environments, one data warehouse, dissemination environment: 5 cases (17%)
- Schema 3: several production environments, dissemination environment: 8 cases (27%)
- Schema 4: several production environments, several data warehouses, dissemination environment: 17 cases (57%)
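The percentages on this slide can be checked against the case counts. The short Python sketch below assumes only the four counts quoted above (variable names are illustrative). It shows that the counts add up to the 30 NSIs covered by the questionnaire, and that the rounded shares add up to 101 because each was rounded up.

```python
# Case counts per schema, as quoted on the slide.
cases = {"Schema 1": 0, "Schema 2": 5, "Schema 3": 8, "Schema 4": 17}

total = sum(cases.values())

# Share of each schema, rounded to the nearest whole percent.
shares = {name: round(100 * count / total) for name, count in cases.items()}

print(total)
print(shares)
```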

Situation in Eurostat – resources (CVD Questionnaire – 2005)
Staff and workload:
- 444 persons work in 20 main Units involved in the collection of data
- Equivalent of 72 persons (intra-muros) full time work on reception of data
- about 33% of the workload of the statisticians in charge of data production
Conclusions:
- Data collection => important burden (manual work) on production units, often tedious tasks
- manual work can lead to delays and data losses or mistakes
- workload related to data collection is closely linked to the number of data providers
[Chart: Workload per task]

How is GESMES produced? (results of GLC questionnaire in 2004)
As export from database? (26 NSIs responded)
- only 3 NSIs (CZ, DK, NO) produced GESMES exclusively as an export format of their database
- 6 NSIs (FR, IT, NL, PT, FI, SE) produced GESMES partially (for some domains) as an export format of their database
- 23 NSIs use Excel or structured flat files (CSV, tab delimited or fixed length records) as input format to generate GESMES
Using which tools? (27 NSIs responded)
- 18 NSIs use GENEDI
- 11 NSIs use an Excel add-in (mainly for National Accounts)
- 11 NSIs use a direct export filter linked to their database, an in-house product or a public/commercial product

Figures of GESMES transmissions
- About 66% of the data files received via the Single Entry Point (SEP) are in GESMES format:
  - Almost all data files received for the following domains are in GESMES format: BOP, ESAxxxx, LCI, PRODCOM, STSxxxx
  - Many data files received for transport statistics are also in GESMES format
  - Other domains are almost not covered
- But SEP represents only 25% of the files sent to Eurostat
=> GESMES coverage is probably below 20%
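The "below 20%" estimate follows from multiplying the two shares on this slide. The Python sketch below uses only those two figures (variable names are illustrative). It gives the share of all files that are GESMES files arriving through SEP, about 16.5%. The slide's looser "probably below 20%" presumably leaves room for GESMES files sent outside SEP, for which no figure is given.

```python
# Shares quoted on the slide, expressed as fractions of files.
gesmes_share_within_sep = 0.66  # GESMES files among files received via SEP
sep_share_of_all_files = 0.25   # SEP files among all files sent to Eurostat

# GESMES files arriving through SEP, as a share of all files sent to Eurostat.
gesmes_via_sep_overall = gesmes_share_within_sep * sep_share_of_all_files

print(f"GESMES via SEP: {gesmes_via_sep_overall:.1%} of all files")
```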

EDI objective reached?
- results of the 2004 GLC Questionnaire showed that GENEDI or Excel add-ins (ESA – National Accounts) are used in most cases to produce GESMES
- results of the 2005 GLC Questionnaire showed that NSIs do not have central databases used to transmit data to Eurostat
- contacts with statisticians in NSIs (missions in NSIs or visits to Eurostat) showed that statisticians mostly use manual approaches to generate GESMES (with GENEDI or other tools)
- problems detected by the Eurostat GESMES support team show that many GESMES files were built or modified manually
Therefore, the objective of efficient, automatic transmissions from NSIs to Eurostat is, on the whole, not reached

Main Eurostat tools used in NSIs
- GENEDI:
  - for flat files (data records extracted from databases)
  - basic data validation and GESMES generation
  - used in transport, SBS, STS, LCI, Prodcom and Comext
- Excel add-in:
  - for tabular data
  - data entry, basic validation and GESMES generation
  - used for national accounts (ESA data)

Bad practices
GESMES produced manually by statisticians and sent manually to Eurostat:
- typically at least 2 manual steps are carried out:
  - production of GESMES file from GENEDI or Excel add-in (often in several sub-steps)
  - creation of a second envelope in SWA or eWA to send data to Eurostat
- In many cases, additional steps are done such as:
  - data prepared in Excel from several sources (selection, aggregation, calculations, layout)
  - codes converted or fields mapped with GENEDI specific modules
  - GESMES header changed using a text editor

Bad practices - example
6 manual steps:
1. User connects to the database, selects the data needed for Eurostat, extracts it and saves it as « CSV »
2. User opens the CSV file with Excel
3. User opens an Excel table with macro instructions that generate ARR++ segments and copies and pastes the data from the CSV file
4. User runs the macro to generate ARR++ segments and saves the file
5. User opens the file with Notepad, copies and pastes a standard GESMES header and footer and changes some identification information in the header (or forgets to change it!). User saves the GESMES file.
6. User manually sends the GESMES file via SWA or eWA (new envelope created)

Recommended approach in NSI with eWA
[Diagram: at the CNA, one action (extraction) delivers data in GESMES or SDMX-ML to eWA, which provides basic validation, automatic transmission, local monitoring and notification. Data travel via Statel to eDAMIS at Eurostat, which provides monitoring, archive, dispatching and notification, delivers the data to the production unit and returns an acknowledgement.]

Alternative solution
For GESMES « non-priority » domains, the following approach is simple to implement and can be made automatic:
- export data as « structured flat file » (CSV)
- save using the file name convention in the eWA intray directory
If needed, eDAMIS will handle the data format conversion at Eurostat before delivery to the production unit
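The two steps above (export as CSV, save into the intray directory) can be scripted so that no manual handling remains. The following Python sketch is illustrative only. It assumes an SQLite database standing in for an NSI production database, a temporary directory standing in for the eWA intray directory, and a placeholder file name, since the actual eWA file name convention is not given in these slides. The function `export_to_intray` and the table `obs` are hypothetical names.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_to_intray(conn, query, intray_dir, file_name):
    """Run `query` and write the result as a CSV file into `intray_dir`."""
    cursor = conn.execute(query)
    header = [column[0] for column in cursor.description]
    target = Path(intray_dir) / file_name
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)   # column names as first record
        writer.writerows(cursor)  # one CSV record per database row
    return target


# In-memory database standing in for a production database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (country TEXT, period TEXT, value REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [("CZ", "2005", 1.5), ("DK", "2005", 2.5)],
)

# Temporary directory standing in for the eWA intray directory (assumption).
intray = tempfile.mkdtemp()

# Placeholder name: the real eWA file name convention is not specified here.
written = export_to_intray(
    conn,
    "SELECT country, period, value FROM obs ORDER BY country",
    intray,
    "EXAMPLE_DATASET_2005.csv",
)
print(written.read_text(encoding="utf-8"))
```

Run on a schedule, a script of this kind reduces the transmission to the single extraction action; the file name and target directory would have to follow the conventions agreed for eWA.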

Conclusions (1)
- efficient, automatic data transmission is the objective
- GESMES and SDMX-ML are means (transmission formats used to implement agreed data structure definitions)
- for GESMES « non-priority domains », structured flat files (CSV) are currently acceptable to implement agreed data structure definitions; in future, SDMX-ML implementations may be developed, in agreement with the domains concerned
- because it uses XML, SDMX-ML can be generated relatively easily from many databases, and it can be processed using standard XML tools
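The last point, that XML output can be produced and read back with standard tools, can be illustrated with Python's built-in XML library. This is a minimal sketch and not SDMX-ML: the element and attribute names (`DataSet`, `Obs`, `country`, `period`, `value`) are invented for illustration, and a real SDMX-ML message must follow the SDMX schemas and the agreed data structure definition.

```python
import xml.etree.ElementTree as ET

# Rows as they might come out of a database query (illustrative data).
rows = [
    {"country": "CZ", "period": "2005", "value": "1.5"},
    {"country": "DK", "period": "2005", "value": "2.5"},
]

# Build the document. Element and attribute names are illustrative only,
# they are NOT the SDMX-ML schema.
root = ET.Element("DataSet")
for row in rows:
    ET.SubElement(root, "Obs", attrib=row)

xml_text = ET.tostring(root, encoding="unicode")

# Read the document back with the same standard parser.
parsed = ET.fromstring(xml_text)
values = {
    obs.get("country"): float(obs.get("value"))
    for obs in parsed.iter("Obs")
}
print(values)
```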

Conclusions (2)
- eDAMIS will handle (including basic validations) data structure definitions implemented in GESMES, SDMX-ML and CSV formats
- staff of statistical production units in NSIs should not have to care about the data format: this technical issue should be handled by the NSI’s information systems
- implementation of an EDI strategy requires optimisation of the NSI’s IT architecture: ideally all data for Eurostat are available in a single data warehouse which could export data directly in GESMES or SDMX-ML format

What’s next?
- further assessment of current practices in NSIs and Eurostat (GLC questionnaire, possible studies…)
- early experience in implementation of SDMX in selected domains will enable a realistic approach to more efficient practices in data transmission and dissemination by using SDMX
- a stand-alone format conversion tool will not be supported by Eurostat for SDMX-ML production in NSIs
- SDMX open-source software « components » may be used by NSIs in their information systems development; Eurostat will contribute to the development of these components