1
GESMES and SDMX-ML - Practical issues
16th Meeting of the Working Group of Local Coordinators for Data Transmission, 13-14 March 2006
Agenda Item 15
John Allen
2
Time for assessment
- More than 10 years of background for GESMES
- 2 revolutions in 10 years: & internet
- eDAMIS developments in the last 2 years
- SDMX strategy endorsed by ITDG
- New Code of Practice for the European Statistical System
3
Objective for data collection
EDI => efficient, automatic data collection:
- data and metadata are associated
- agreed data structures are used
EDIFACT and SDMX are not objectives; they are the means to reach the objective.
4
Current situation – data providers
Very decentralised data collection:
- 32 countries providing detailed data
- 1000 Competent National Authorities (800 in EU25 only)
The IT architecture of the NSIs (the main data providers) is itself very decentralised and heterogeneous (results of the GLC Questionnaire on 30 NSIs in 2005):
- all NSIs use several production environments (20 have more than 5 production environments)
- no single source for transmission of data to Eurostat
- at least 27 countries send data to Eurostat directly from their production environments
=> Technical issues linked to data transmission to Eurostat are often in the hands of the staff of statistical production units in NSIs
5
IT architecture in NSI (GLC Questionnaire – 2005)
Schema 1: one production environment, one data warehouse, dissemination environment: 0 cases (0%)
Schema 2: several production environments, one data warehouse, dissemination environment: 5 cases (17%)
Schema 3: several production environments, several data warehouses, dissemination environment: 8 cases (27%)
Schema 4: several production environments, dissemination environment: 17 cases (57%)
6
Situation in Eurostat – resources (CVD Questionnaire – 2005)
Staff and workload:
- 444 persons work in the 20 main Units involved in the collection of data
- the equivalent of 72 persons (intra-muros) work full time on reception of data
- about 33% of the workload of the statisticians in charge of data production
Conclusions:
- data collection => important burden (manual work) on production units, often tedious tasks
- manual work can lead to delays and data losses or mistakes
- workload related to data collection is closely linked to the number of data providers
[Chart: workload per task]
7
How is GESMES produced? (results of GLC questionnaire in 2004)
As export from database? (26 NSIs responded)
- only 3 NSIs (CZ, DK, NO) produced GESMES exclusively as an export format of their database
- 6 NSIs (FR, IT, NL, PT, FI, SE) produced GESMES partially (for some domains) as an export format of their database
- 23 NSIs use Excel or structured flat files (CSV, tab-delimited or fixed-length records) as input format to generate GESMES
Using which tools? (27 NSIs responded)
- 18 NSIs use GENEDI
- 11 NSIs use an Excel add-in (mainly for National Accounts)
- 11 NSIs use a direct export filter linked to their database, an in-house product or a public/commercial product
8
Figures of GESMES transmissions
About 66% of the data files received via the Single Entry Point (SEP) are in GESMES format:
- almost all data files received for the following domains are in GESMES format: BOP, ESAxxxx, LCI, PRODCOM, STSxxxx
- many data files received for transport statistics are also in GESMES format
- other domains are hardly covered
But SEP represents only 25% of the files sent to Eurostat (66% of 25% is about 16.5% of all files) => GESMES coverage is probably below 20%
9
EDI objective reached?
- results of the 2004 LC Questionnaire showed that GENEDI or Excel add-ins (ESA, National Accounts) are used in most cases to produce GESMES
- results of the 2005 GLC Questionnaire showed that NSIs do not have central databases used to transmit data to Eurostat
- contacts with statisticians in NSIs (missions in NSIs or visits to Eurostat) showed that statisticians mostly use manual approaches to generate GESMES (with GENEDI or other tools)
- problems detected by the Eurostat GESMES support team show that many GESMES files were built or modified manually
Therefore, the objective of efficient, automatic transmissions from NSIs to Eurostat is, overall, not reached
10
Main Eurostat tools used in NSIs
GENEDI:
- for flat files (data records extracted from databases)
- basic data validation and GESMES generation
- used in transport, SBS, STS, LCI, Prodcom and Comext
Excel add-in:
- for tabular data
- data entry, basic validation and GESMES generation
- used for national accounts (ESA data)
11
Bad practices
GESMES produced manually by statisticians and sent manually to Eurostat. Typically at least 2 manual steps are carried out:
- production of the GESMES file from GENEDI or the Excel add-in (often in several sub-steps)
- creation of a second envelope in SWA or eWA to send data to Eurostat
In many cases, additional steps are done, such as:
- data prepared in Excel from several sources (selection, aggregation, calculations, layout)
- codes converted or fields mapped with GENEDI-specific modules
- GESMES header changed using a text editor
12
Bad practices - example
6 manual steps:
1. User connects to the database, selects the data needed for Eurostat, extracts it and saves it as « CSV »
2. User opens the CSV file with Excel
3. User opens an Excel table with macro instructions that generate ARR++ segments and copies and pastes the data from the CSV file
4. User runs the macro to generate ARR++ segments and saves the file
5. User opens the file with Notepad, copies and pastes a standard GESMES header and footer and changes some identification information in the header (or forgets to change it!). User saves the GESMES file.
6. User manually sends the GESMES file via SWA or eWA (new envelope created)
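For contrast with the manual chain above, here is a minimal sketch of the first step done as one scripted action: a single query whose result is written directly as CSV, with no Excel or copy and paste in between. The table name `indicators`, its columns and the sample values are hypothetical, and an in-memory SQLite database stands in for whatever production database an NSI actually uses.

```python
import csv
import io
import sqlite3


def extract_to_csv(conn, query, out):
    """Run one query and write the result, with a header row, as CSV."""
    cursor = conn.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    # Column names come from the query itself, so nothing is typed by hand.
    writer.writerow([col[0] for col in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


# Hypothetical production table, built in memory so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicators (country TEXT, period TEXT, value REAL)")
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?)",
    [("BE", "2005Q1", 101.5), ("BE", "2005Q2", 102.0)],
)

buffer = io.StringIO()
n_rows = extract_to_csv(
    conn,
    "SELECT country, period, value FROM indicators ORDER BY period",
    buffer,
)
csv_text = buffer.getvalue()
```

Because the selection lives in the query, the same extraction can be rerun for each reporting period without manual editing.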
13
Recommended approach in NSI with eWA
[Diagram. CNA side: production unit, one action (extraction of data in GESMES or SDMX-ML); eWA: basic validation, automatic transmission, local monitoring, notification. Transport: Statel. Eurostat side: eDAMIS: monitoring, archive, dispatching, notification, acknowledgement.]
14
Alternative solution
For GESMES « non-priority » domains, the following approach is simple to implement and can be made automatic:
- export data as a « structured flat file » (CSV)
- save it, using the file name convention, in the eWA intray directory
If needed, eDAMIS will handle the data format conversion at Eurostat before delivery to the production unit
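A minimal sketch of the "export and save in the intray" step follows. The file name pattern in `intray_file_name` is purely hypothetical (the actual eWA file name convention is not given in these slides), and the write-then-rename pattern assumes that whatever watches the intray ignores `.tmp` files; both points would need checking against the eWA documentation.

```python
import csv
import os
import tempfile
from pathlib import Path


def intray_file_name(domain, country, period):
    # Hypothetical naming pattern; the real eWA convention is not given here.
    return f"{domain}_{country}_{period}.csv"


def drop_in_intray(rows, header, intray, file_name):
    """Write rows as CSV into the intray directory and return the final path."""
    intray = Path(intray)
    intray.mkdir(parents=True, exist_ok=True)
    final_path = intray / file_name
    # Write to a temporary file in the same directory, then rename it, so a
    # process watching the directory never sees a half-written CSV file.
    fd, tmp_name = tempfile.mkstemp(dir=str(intray), suffix=".tmp")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    os.replace(tmp_name, final_path)
    return final_path


# Demonstration in a throwaway directory standing in for the eWA intray.
with tempfile.TemporaryDirectory() as demo_intray:
    demo_path = drop_in_intray(
        rows=[("BE", "2005Q1", 101.5)],
        header=("country", "period", "value"),
        intray=demo_intray,
        file_name=intray_file_name("STS", "BE", "2005Q1"),
    )
    demo_name = demo_path.name
    demo_lines = demo_path.read_text(encoding="utf-8").splitlines()
```

Scheduled after the extraction, a step like this leaves nothing for the statistician to send by hand.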
15
Conclusions (1)
- efficient, automatic data transmission is the objective
- GESMES and SDMX-ML are means (transmission formats used to implement agreed data structure definitions)
- for GESMES « non-priority » domains, structured flat files (CSV) are currently acceptable to implement agreed data structure definitions; in future, SDMX-ML implementations may be developed, in agreement with the domains concerned
- because it uses XML, SDMX-ML can be generated relatively easily from many databases, and it can be processed using standard XML tools
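To illustrate the last point, here is a minimal sketch of producing XML from database-style rows and reading it back with a standard parser, using only the Python standard library. The element and attribute names (`DataSet`, `Series`, `Obs`, `REF_AREA`, `TIME_PERIOD`, `OBS_VALUE`) are illustrative choices for a series/observation layout; the output has no namespaces or header and is not schema-valid SDMX-ML.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows):
    """Group (country, period, value) rows into one Series element per country."""
    root = ET.Element("DataSet")
    series_by_country = {}
    for country, period, value in rows:
        series = series_by_country.get(country)
        if series is None:
            series = ET.SubElement(root, "Series", {"REF_AREA": country})
            series_by_country[country] = series
        ET.SubElement(
            series, "Obs", {"TIME_PERIOD": period, "OBS_VALUE": str(value)}
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample rows, as they might come out of a database query.
rows = [("BE", "2005Q1", 101.5), ("BE", "2005Q2", 102.0), ("FR", "2005Q1", 99.8)]
xml_text = rows_to_xml(rows)

# Any standard XML parser can read the result back.
parsed = ET.fromstring(xml_text)
values = [obs.get("OBS_VALUE") for obs in parsed.iter("Obs")]
```

A real implementation would follow the agreed data structure definition for the domain and validate the output against the corresponding SDMX-ML schema.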
16
Conclusions (2)
- eDAMIS will handle (including basic validations) data structure definitions implemented in GESMES, SDMX-ML and CSV formats
- staff of statistical production units in NSIs should not have to worry about the data format: this technical issue should be handled by the NSI's information systems
- implementation of an EDI strategy requires optimisation of the NSI's IT architecture: ideally, all data for Eurostat are available in a single data warehouse which can export data directly in GESMES or SDMX-ML format
17
What's next?
- further assessment of current practices in NSIs and Eurostat (GLC questionnaire, possible studies…)
- early experience in implementation of SDMX in selected domains will enable a realistic approach to more efficient practices in data transmission and dissemination by using SDMX
- a stand-alone format conversion tool will not be supported by Eurostat for SDMX-ML production in NSIs
- SDMX open-source software « components » may be used by NSIs in their information systems development; Eurostat will contribute to the development of these components