Eurostat validation grants: outcomes
Item 4.2, Data Transmission Coordinators Group meeting, 17-18 June 2019
Luca Gramaglia, Eurostat, Unit B5
History of validation grants
Timeline of validation grants
Over the past years, Eurostat has launched several rounds of grants on validation: two ESSnets (multi-beneficiary grants) and two rounds of individual grants.
Participating countries by grant:
- ESSnet ValiDat Foundation: DE, IT, NL, LT
- ESSnet ValiDat Integration: DE, LT, NL, PT, PL, SE
- 2017 Validation Grants: IT, LT, HR, PL
- 2018 Validation Grants (2018, ongoing): FR, HU, IS, NL, PL, PT, SE
Main deliverables of validation grants
ESSnet ValiDat Foundation
- Methodological Handbook for Data Validation
- Study on VTL
ESSnet ValiDat Integration
- Update to the Methodological Handbook for Data Validation
- Proposal for a standard validation report structure
- Cost-benefit analysis on implementation scenarios
2017 Validation Grants
- Studies on the internal organisation of validation (LT, HR)
- Development of a VTL Editor (IT)
- Development and testing of a national VTL-based validation approach (PL)
Final reports are not publicly available yet.
2018 Validation Grants
- Comparison of different validation scenarios (FR, HU, PT, SE)
- Improvement of internal validation processes and tools (IS, NL)
- Work on the implementation of VTL and “standard” validation reports (PL)
Still ongoing.
Highlights of 2017 grants
IT: VTL Rule Editor
Highlights of 2017 grants
PL: VTL-based approach to data validation in the Farm Structure Survey domain
2018 Grants: some issues
Analysis of different scenarios
Many beneficiaries of the 2018 round of individual grants chose to test and assess the costs and benefits of the different scenarios proposed for implementing the ESS Business Architecture for validation:
- Scenario 1: Use the Rules (HU, PT)
- Scenario 2: Use the Services (FR, HU, PT, SE)
- Scenario 3: Use the Process (FR, HU, PT)
Testing the different scenarios creates a dependency between the grant actions and the availability of deliverables from Eurostat. Eurostat deliverables needed per scenario:
- Scenario 1: VTL rules available for the domains covered by the grants
- Scenario 2: STRUVAL service available for replication by data providers; CONVAL service available for replication by data providers
- Scenario 3: Validation process available for the domains covered by the grants
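Scenario 2 depends on Eurostat making STRUVAL (structural validation) and CONVAL (content validation) available for replication. The Python sketch below only illustrates why these are two steps that each need Eurostat inputs: the structural step needs the data structure and its code lists, while the content step needs the validation rules. The code lists, column names and the single rule are hypothetical, and the functions are not the interface of STRUVAL or CONVAL.

```python
from dataclasses import dataclass

# Hypothetical code lists standing in for the data structure of one dataset.
CODE_LISTS = {
    "GEO": {"FR", "HU", "PT", "SE"},
    "INDIC": {"TOTAL", "CROPS", "LIVESTOCK"},
}
REQUIRED_COLUMNS = ("GEO", "INDIC", "OBS_VALUE")


@dataclass
class Finding:
    level: str  # "structural" or "content"
    where: str
    message: str


def structural_check(rows):
    """STRUVAL-like step: required columns exist and codes belong to their code lists."""
    findings = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            findings.append(Finding("structural", f"row {i}", f"missing {missing}"))
            continue
        for dim, allowed in CODE_LISTS.items():
            if row[dim] not in allowed:
                findings.append(
                    Finding("structural", f"row {i}", f"{dim}={row[dim]!r} not in code list")
                )
    return findings


def content_check(rows):
    """CONVAL-like step with one hypothetical rule: TOTAL == CROPS + LIVESTOCK per GEO."""
    by_geo = {}
    for row in rows:
        by_geo.setdefault(row["GEO"], {})[row["INDIC"]] = row["OBS_VALUE"]
    findings = []
    for geo, values in sorted(by_geo.items()):
        if {"TOTAL", "CROPS", "LIVESTOCK"}.issubset(values):
            parts = values["CROPS"] + values["LIVESTOCK"]
            if values["TOTAL"] != parts:
                findings.append(
                    Finding("content", f"GEO={geo}", f"TOTAL {values['TOTAL']} != components {parts}")
                )
    return findings


def validate(rows):
    # Content rules assume well-formed data, so they run only if the structure passes.
    structural = structural_check(rows)
    return structural if structural else content_check(rows)
```

For example, a file with FR values TOTAL 100, CROPS 60 and LIVESTOCK 30 passes the structural step and yields one content finding for GEO=FR. Running content rules only after a clean structural pass is a design choice of this sketch, so that rules never see unknown codes.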
Fixing the problems
Eurostat will work with the individual beneficiaries affected by the delays. Eurostat has also changed how validation work is organised internally, which should improve coordination and help avoid these issues in the future:
- Before: Unit B1 (grants + standards), Unit B5 (implementation), Unit A3 (developments)
- After: Unit B5 (implementation + grants + standards), Unit A3 (developments)
Validation Benchmarking
Benchmarking ESS partners
Project overview

Context
- European statistics are produced by the European Statistical System (ESS).
- Eurostat and the other members of the ESS jointly developed a common vision: the ESS Vision 2020.
- To allow the sharing of information, services and costs among ESS partners, the following are required: the development of a reference ESS enterprise architecture (EA), more standardised processes, increased interoperability between processes, and standardised solutions.
- Within this context, EA was set up as a supporting framework to contribute to the objectives of the ESS Vision.
- The ESS Enterprise Architecture Reference Framework (ESS EA RF) aims to guide the modernisation of the ESS according to the ESS Vision 2020.
- However, there is still a gap towards implementation, and ESS stakeholders, mainly NSIs, need guidance to assess their architecture readiness.

Objectives and scope
The three main objectives of the project are:
- to benchmark the architecture and IT landscape of the Member States' NSIs in four domains: Data Validation; European System of Interoperable Statistical Business Registers (ESBRs); Data Management (SERV); Linked Open Data (LOD);
- to provide tailored recommendations for ESS project managers and ESS partners;
- to assess the global interoperability of the ESS based on the data collected during the benchmarking, and to provide recommendations.
Note that the decision was taken to put the ESBRs stream of the benchmarking exercise on hold.
Benchmarking exercise on validation
In the context of the implementation of the ESS Vision 2020, Eurostat launched a benchmarking exercise to assess the maturity of the ESS as a whole towards reaching the objectives of the Vision. One of the topics chosen for this exercise was “data validation”. During this exercise, 14 countries were surveyed and specific ESS-wide recommendations were made for the way forward.
Data validation overview (1/2): findings by dimension

Harmonisation of the approach
- The landscape of validation tools, methodologies and processes is highly diverse across NSIs and across domains, mainly because:
  - NSIs are at different stages of maturity, having started sending data to Eurostat at different points in time;
  - Eurostat provides more than 10 tools to the NSIs for preparing and validating their data prior to official transmission;
  - each domain has its own data and organisational particularities (e.g. many different teams in the domain, the nature of the statistics produced and the data sources used).

IT centralisation & data storage
- Data storage is mostly domain-specific and, secondarily, IT-platform-specific.
- A single data warehouse exists in only a few NSIs (EE, HU, NL, PL, PT), but it does not cover the data of all statistical domains.
- Many NSIs have a centralised data structure and code-list repository, but not a centralised validation rules repository.
- Some NSIs (e.g. FR, HR, PT) have achieved a good level of metadata centralisation.

Validation process
- In general, preparing and validating a single file for Eurostat requires a number of manual steps.
- Data validation and the generation of the data structures are the steps with the heaviest workload and where the most common difficulties arise.
- Domains that were able to apply largely the same processes to national and Eurostat data reported fewer issues with the production process and higher satisfaction with Eurostat tools.

Documentation and rules
- The validation rules and the validation process are documented in the majority of the surveyed domains, but are rarely stored centrally.
- The majority of the NSIs use the validation rules defined by Eurostat, although some domains, like STS, use them to a lesser extent. Most NSIs also use additional validation rules for their national needs.
- Awareness of the levels of validation is very high across the ESS.
- It is often difficult to identify the latest data structures, validation rules and other documentation that the NSIs should be using.

Eurostat tools
- The SDMX Converter is the most commonly used Eurostat tool. It is mostly used in manual mode, due to a lack of consistent communication.
- MS-Excel templates are still used by many NSIs for the NA domain, although Eurostat would like to stop supporting them.
- SDMX-RI implementations exist in a number of NSIs, but only for a small number of domains (e.g. NA and Census).
- Validation feedback is received quickly (although exceptions were reported), but the reports can be difficult to understand.
- Many Eurostat tools require manual input from data providers, copying and pasting of data, or the transmission of files one by one.
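Two of the documentation findings (no centralised validation rules repository, and difficulty identifying the latest rules) point towards a repository that records which version of each rule set is currently recommended. The Python sketch below assumes hypothetical class names and uses the three statuses (recommended, beta, archived) that the benchmarking recommendations propose for the documentation portal; it is not an existing Eurostat or NSI system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    BETA = "beta"
    RECOMMENDED = "recommended"
    ARCHIVED = "archived"


@dataclass
class RuleSetVersion:
    domain: str
    version: str
    status: Status
    rules: list = field(default_factory=list)  # hypothetical: rule expressions as text


class RuleRepository:
    """Hypothetical central store of validation rule sets, grouped by domain."""

    def __init__(self):
        self._versions = {}  # domain -> list of RuleSetVersion, in publication order

    def publish(self, rule_set: RuleSetVersion) -> None:
        versions = self._versions.setdefault(rule_set.domain, [])
        if rule_set.status is Status.RECOMMENDED:
            # Keep exactly one recommended version per domain by archiving the old one.
            for old in versions:
                if old.status is Status.RECOMMENDED:
                    old.status = Status.ARCHIVED
        versions.append(rule_set)

    def recommended(self, domain: str) -> RuleSetVersion:
        for rule_set in self._versions.get(domain, []):
            if rule_set.status is Status.RECOMMENDED:
                return rule_set
        raise LookupError(f"no recommended rule set for domain {domain!r}")

    def versions(self, domain: str) -> list:
        return list(self._versions.get(domain, []))
```

Publishing a new recommended version archives the previous one, so a data provider asking for the recommended rule set of a domain (for instance a hypothetical "FSS" key) always gets exactly one answer, while beta and archived versions stay visible.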
Data validation overview (2/2): findings by dimension

Share & reuse
- Most countries are open to sharing and reusing validation tools and services, but actual sharing and reuse is not widespread.
- The use of open-source components is limited to a few NSIs (IT, IS, NL). The main obstacle is the lack of formal support.
- Scenario 2, in which STRUVAL and CONVAL are used as replicated validation services to validate both output data for Eurostat and (especially for CONVAL) national data, is the preferred one. In general, NSIs are interested in a replicated validation service that they could use to run their own national validation rules at an earlier stage.

Standards (VTL, SDMX)
- NSIs have stated an interest in VTL, and many of them already have projects to use it in the future (IT, LT, NL, PL, PT, SE). However, they expressed concerns about translating VTL into other languages (e.g. R) and about whether VTL can be used to prepare checks on data from more than one data flow.
- The SDMX standard is widely used by the NSIs, but the level of adoption and the way SDMX-ML files are generated differ between domains and NSIs. INSEE (FR) is the only NSI that generates SDMX-ML files directly with its own tools.

Enterprise architecture
- The adoption of service-oriented architectures is low across NSIs, because NSIs still have legacy systems in the different statistical domains.
- The use of an Enterprise Architecture Framework and of advanced integration technologies such as an enterprise service bus (ESB) is very limited across the ESS.
- A few NSIs are trying to move towards a more service-oriented architecture by implementing micro-services.
- NSIs prefer to implement validation web services locally, in their own infrastructures, so that they can also use them for national or confidential data.

Technology landscape in the ESS
- SQL is used by all the surveyed NSIs.
- SAS and Java are the next most common technologies across the ESS. SAS is used in some domains as an end-to-end solution for statistical production, and some respondents proposed that SAS validation scripts be made available for their domains.
- A small number of ESS members, spearheaded by the Netherlands, have a strong preference for open-source technologies and R.
- Virtualisation, artificial intelligence and cloud computing are, or will become, the most important technologies for the NSIs.
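Two findings above lend themselves to a sketch: NSIs want a replicated service in which they can run their own national rules alongside Eurostat's at an earlier stage, and they worry about checks that span more than one data flow. The Python sketch below assumes a hypothetical interface in which every rule receives all data flows in scope, keyed by name, so a cross-data-flow check is simply another rule. The class, rule functions and data flow names are illustrative only; this is not VTL and not the STRUVAL or CONVAL API.

```python
from typing import Callable, Dict, List

Dataset = List[dict]
# Each rule sees every data flow in scope, so cross-data-flow checks are ordinary rules.
Rule = Callable[[Dict[str, Dataset]], List[str]]


class ReplicatedValidationService:
    """Hypothetical validation service deployed inside an NSI's own infrastructure."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def register(self, rule_id: str, rule: Rule) -> None:
        self._rules[rule_id] = rule

    def run(self, dataflows: Dict[str, Dataset]) -> Dict[str, List[str]]:
        """Return messages per rule ID; rules that find nothing are omitted."""
        report = {}
        for rule_id, rule in self._rules.items():
            messages = rule(dataflows)
            if messages:
                report[rule_id] = messages
        return report


def regions_add_up(dataflows: Dict[str, Dataset]) -> List[str]:
    """Hypothetical cross-data-flow rule: regional values sum to the national total."""
    regional: Dict[str, int] = {}
    for row in dataflows["REGIONAL"]:
        regional[row["TIME"]] = regional.get(row["TIME"], 0) + row["OBS_VALUE"]
    return [
        f"{row['TIME']}: regions sum to {regional.get(row['TIME'], 0)}, "
        f"national total is {row['OBS_VALUE']}"
        for row in dataflows["NATIONAL"]
        if regional.get(row["TIME"], 0) != row["OBS_VALUE"]
    ]


def non_negative(dataflows: Dict[str, Dataset]) -> List[str]:
    """Hypothetical national rule, run before anything is sent to Eurostat."""
    return [
        f"{name}: negative value for {row['TIME']}"
        for name, rows in dataflows.items()
        for row in rows
        if row["OBS_VALUE"] < 0
    ]
```

Usage: register `regions_add_up` as if it were a Eurostat-supplied rule and `non_negative` as a national rule, then call `run` on a dict such as `{"NATIONAL": [...], "REGIONAL": [...]}`; the report lists only the rules that flagged something. Keeping rule registration local reflects the finding that NSIs want to run such services in their own infrastructure, including on national or confidential data.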
Key recommendations (1/2)
Data validation: key recommendations by dimension (owner shown where stated)

Harmonisation of the approach
- Adopt a product management approach across domains to create a common roadmap for validation. This effort has to be supported by enterprise architecture to ensure that strategic product platforms are aligned with common ESS standards. (Owner: Eurostat)

Harmonisation of the approach: IT centralisation
- Continue down a pragmatic path of centralisation and standardisation, with the aim of reducing the number of domain-specific solutions over time. Build common IT solutions, customisable per statistical domain, based on centralised S-DWHs and metadata management capabilities. (Owner: NSIs)
- Use new operating models, such as the product-driven organisation, to build domain-oriented practitioner communities across IT system teams and departmental borders. These domain-specific groups could be extended further and combined with an NSI chapter on validation to ensure that best practices are communicated and systems are reused whenever possible. (Owner: Eurostat and NSIs)

Validation process
- Value-map business processes to identify wasteful steps and establish how they can be improved. One best practice observed is the automation of manual process steps, e.g. the SDMX-ML conversion.
- Provide tangible accelerators and other reusable assets to help NSIs adopt standards. While Eurostat and the ESS have encouraged the adoption of standards and methodologies for statistical production, these have so far had only a limited impact on day-to-day production in the NSIs: standards provide high-level guidance, but NSIs also need concrete, reusable assets.

Documentation
- Implement a central information portal for all domains and systems, where NSIs can access all relevant documentation and software versions. There should always be a recommended version, as well as archived and beta versions. Additional features could include user reviews/ratings and a discussion/support forum for each topic, to gather user feedback and foster collaboration.
- Introduce customisable per-country thresholds for validation rules to reduce the number of warnings and errors.
- On the NSI side, manage documentation centrally across domains, initially with a focus on increasing visibility within and across domains.

Eurostat tools
- Evaluate the NSIs' requests for specific enhancements to the IT systems provided by Eurostat. A cost-benefit analysis would enable Eurostat to decide which enhancements to implement.
- Establish a user experience centre of excellence or task force that works with the different domains and technology teams at Eurostat to define and implement requested and new improvements to key systems.
- Investigate how applications that face peak-time performance issues could be moved to the cloud to benefit from increased scalability and elasticity.
- Produce a quick-start guide and automation manual for all tools that Eurostat provides. Existing guides and manuals (e.g. the one for the Eurostat SDMX Converter) could be enhanced to start with a use-case chapter.
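The recommendation to introduce customisable per-country thresholds can be sketched as a rule with default warning and error limits that individual countries may override. The rule type (a period-on-period relative change check), its default limits and any override values are hypothetical; the point is only that the same rule can produce fewer routine warnings for a country whose series are naturally volatile.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RelativeChangeRule:
    """Hypothetical rule: flag large period-on-period relative changes.

    The defaults apply to every country; a per-country override raises or
    lowers the limits for countries whose series are naturally more volatile.
    """

    rule_id: str
    warn_at: float = 0.10   # relative change at or above this -> "warning"
    error_at: float = 0.50  # relative change at or above this -> "error"
    overrides: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def check(self, country: str, previous: float, current: float) -> str:
        warn_at, error_at = self.overrides.get(country, (self.warn_at, self.error_at))
        if previous == 0:
            # No relative change can be computed; treat any jump from zero as an error.
            return "ok" if current == 0 else "error"
        change = abs(current - previous) / abs(previous)
        if change >= error_at:
            return "error"
        if change >= warn_at:
            return "warning"
        return "ok"
```

With `overrides={"IS": (0.25, 0.75)}`, a move from 100 to 120 is a warning for HU under the defaults but passes for IS. Keeping the override next to the rule, rather than in a separate national copy of the rule, is what keeps the rule itself shared.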
Key recommendations (2/2)
Data validation: key recommendations by dimension (owner shown where stated)

Share & reuse
- Establish an ESS open-source community, potentially on GitHub or another open-source collaboration platform, that becomes a single point of access for sharing and reuse between Member States. (Owner: Eurostat and NSIs)
- Use open-source components to shorten development cycles and reduce costs, provided the NSI has the capabilities to customise and maintain them. (Owner: NSIs)

Standards (VTL, SDMX)
- Review the end-to-end process for producing SDMX via SDMX-RI and the converter, as well as the supported input data formats. NSIs can review their implementations of the SDMX Converter in the different domains and switch to batch mode where the converter is still used in manual mode. Eurostat is already working to support SDMX-CSV, standard CSV and an SDMX-compatible Excel format. (Owner: Eurostat and NSIs)
- Support collaboration on VTL and, for solutions relevant to several ESS members, create Eurostat-supported base libraries to accelerate the development of translators in the NSIs. (Owner: Eurostat)

Enterprise architecture
- Define a model for making validation web services available for NSIs to implement in their own infrastructures (replicated). This will accelerate the adoption of these web services, so that legacy solutions can be retired sooner.
- Review NSIs' EA strategy to define how NSIs can increase the flexibility of their IT landscape while aiming for standardisation from statistical business services down to the IT system level, following a pragmatic approach that prioritises efficiency.

Technology landscape in the ESS
- Confirm when ESS members are considering replacing SAS with open-source alternatives. This information can be used to build a roadmap in which NSIs share their timeframes for adopting, e.g., R-driven solutions; collaboration, technology support and knowledge-sharing events can then be planned on the basis of this roadmap.
- Regularly review the IT landscape in the NSIs and focus on supporting the key technologies rather than a wider range of tools, which would dilute Eurostat's efforts. This also relates to harmonisation and to the creation of a strategic product roadmap for statistical production.
- Standardise on the smallest possible number of technologies, to create synergies across domains and for IT. However, this should not become a justification for limiting innovation: organisations can allow more flexibility when new technologies are trialled.
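The recommendation to move from manual to batch conversion, together with Eurostat's support for SDMX-CSV, can be illustrated with a small Python sketch that turns a directory of plain CSV extracts into SDMX-CSV files in a single run instead of file by file. It assumes the SDMX-CSV 1.0 layout (a leading DATAFLOW column holding the reference AGENCY:ID(VERSION), followed by dimension, OBS_VALUE and attribute columns), which should be checked against the specification; the dataflow reference, column names and directory layout are hypothetical, and this is not the Eurostat SDMX Converter.

```python
import csv
from pathlib import Path
from typing import Dict, List


def to_sdmx_csv(src: Path, dst: Path, dataflow: str, columns: List[str]) -> int:
    """Copy one plain CSV extract to an SDMX-CSV 1.0 style file; return rows written.

    `dataflow` is the full reference, e.g. "AGENCY:ID(1.0)" (hypothetical here).
    `columns` lists the dimension, OBS_VALUE and attribute columns in output order.
    """
    with src.open(newline="", encoding="utf-8") as fin, \
            dst.open("w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.writer(fout)
        writer.writerow(["DATAFLOW", *columns])
        count = 0
        for row in reader:
            writer.writerow([dataflow, *(row[c] for c in columns)])
            count += 1
    return count


def convert_batch(in_dir: Path, out_dir: Path, dataflow: str, columns: List[str]) -> Dict[str, int]:
    """Convert every *.csv file in in_dir in one run, instead of one file at a time."""
    out_dir.mkdir(parents=True, exist_ok=True)
    return {
        src.name: to_sdmx_csv(src, out_dir / src.name, dataflow, columns)
        for src in sorted(in_dir.glob("*.csv"))
    }
```

A scheduled job could call `convert_batch` once per domain, which removes the copy-and-paste and one-file-at-a-time steps reported in the findings. Selecting output columns explicitly, rather than copying whatever the extract contains, keeps the output aligned with the column order the data structure expects.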
Next round of grants
Next round of grants
- The call for proposals for the 2019 grants was published at the end of May.
- The focus is narrower than in previous rounds: VTL developments and the automation of the validation of data transmitted to Eurostat are the main topics.
- To avoid the issues experienced with the 2018 round, the list of priority domains in the call for proposals has been aligned with the list of domains for which validation rules are available.
Questions & Discussion
- Questions or comments on the validation grants?
- Any comments or suggestions on the improvement actions that came out of the benchmarking exercise?
- DTCG members are invited to spread the word about the 2019 round of validation grants.
Thank you for your attention!