EDIT – Eurostat’s editing tool


EDIT – Eurostat’s editing tool SDMX - TWG Paris, 13-14 Dec. 2012

EDIT - features
- Eurostat's generic data editing and imputation tool.
- The process relies on a scripting language capable of expressing complex rules.
- Data and metadata are isolated into independent ‘Domains’.
- Available in a stand-alone version, a server version, and a freely accessible Web version protected by one-way SSL and ECAS.
- The Web version lets any registered user edit statistical data without installing any software.

Some functional capabilities
- Formats can be defined with an editor in the user interface.
- Imports and exports metadata from/to external files.
- Imports and exports from/to Oracle.
- Imports auxiliary data (lookup datasets).
- Accepts GESMES, CSV, SDMX-ML and flat files.
- Executes programs on imported dataset instances.
- Data set operations.
- Editing and deterministic imputation.
- Outlier detection: TRAMO, Hidiroglou-Berthelot and sigma-gap.
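The slide only names the outlier detection methods. As a rough illustration of one of them, the sketch below follows a common textbook description of the sigma-gap method: sort the values, walk outward from the median, and flag everything beyond the first gap larger than k times the standard deviation. The function name, the parameter `k` and the use of the population standard deviation are assumptions made for this example; the slides do not describe how EDIT implements the method.

```python
import statistics


def sigma_gap_outliers(values, k=1.0):
    """Return the values lying beyond the first large gap on either side of the median.

    A gap is "large" when two consecutive sorted values differ by more
    than k times the standard deviation (assumed definition).
    """
    data = sorted(values)
    n = len(data)
    if n < 3:
        return []
    sigma = statistics.pstdev(data)
    if sigma == 0:
        return []  # all values identical, nothing to flag

    mid = n // 2
    low_outliers, high_outliers = [], []

    # Upper side: walk from the median towards the largest value.
    for i in range(mid, n - 1):
        if data[i + 1] - data[i] > k * sigma:
            high_outliers = data[i + 1:]
            break

    # Lower side: walk from the median towards the smallest value.
    for i in range(mid, 0, -1):
        if data[i] - data[i - 1] > k * sigma:
            low_outliers = data[:i]
            break

    return low_outliers + high_outliers


flagged = sigma_gap_outliers([10, 11, 12, 13, 14, 100])
```

A smaller `k` flags more values; the right threshold depends on the statistical domain.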

Integration Capabilities
- Integration with EDAMIS, Eurostat's Single Exchange Point (ongoing):
  - detects incoming files and processes them in unattended mode;
  - publishes validation results to the EDAMIS Back Channel.
- Integration with the Euro SDMX Registry:
  - fetches DSDs into EDIT structures;
  - loads code lists from the Registry.

EDIT main principles
- Treatment of micro and macro data.
- Scripting principle (symbols/placeholders).
- Editing seen as computations.
- Multi-dataset approach.
- Cube approach in computations.

EDIT Terminology
- Domain: an isolated workspace inside EDIT.
- Dataset Definition (Format): defines the structure of the data; identified by name inside a Domain.
- Dataset: a collection of data rows that follow the structure of a Format.
- Program: a set of operations to be performed on a specified Format.
- Key set: the set of fields that uniquely identify a row in a data set.
- Partition: a sub-set of the data identified by a fixed sub-set of the key set.
- Transposition: the key set fields that uniquely identify a row inside a partition.
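The relationship between key set, partition and transposition can be sketched in a few lines of Python. This is only an illustration of the definitions above, not EDIT code: the field names, the sample rows and the helper `split_into_partitions` are hypothetical.

```python
from collections import defaultdict

# Hypothetical key set: together these fields identify one row.
KEY_SET = ("COUNTRY", "MONTH", "PRODUCT")
# Fixed sub-set of the key set that identifies a partition.
PARTITION_KEYS = ("MONTH", "PRODUCT")
# Remaining key fields identify a row inside a partition.
TRANSPOSITION_KEYS = tuple(k for k in KEY_SET if k not in PARTITION_KEYS)

rows = [
    {"COUNTRY": "FR", "MONTH": "2012-01", "PRODUCT": "A", "VALUE": 10},
    {"COUNTRY": "DE", "MONTH": "2012-01", "PRODUCT": "A", "VALUE": 7},
    {"COUNTRY": "FR", "MONTH": "2012-01", "PRODUCT": "B", "VALUE": 5},
]


def split_into_partitions(rows, partition_keys, transposition_keys):
    """Group rows by partition key; index each group by transposition key."""
    partitions = defaultdict(dict)
    for row in rows:
        p = tuple(row[k] for k in partition_keys)
        t = tuple(row[k] for k in transposition_keys)
        if t in partitions[p]:
            # The key set must identify each row uniquely.
            raise ValueError(f"duplicate key {p + t}")
        partitions[p][t] = row
    return dict(partitions)


partitions = split_into_partitions(rows, PARTITION_KEYS, TRANSPOSITION_KEYS)
```

Each partition here holds all countries for one month and product, which is the shape a vertical rule would work on.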

Scripting Language Capabilities
- Custom scripting language designed specifically for data editing.
- Attempts to be as simple as possible while remaining flexible enough to fit the requirements of any known or analyzed domain.
- Programs describe the rules and are composed of a set of steps with inputs and outputs.
- Drawbacks:
  - programs are difficult for non-programmers to write;
  - the language does not follow a pure cube approach.

Working with EDIT
- Define a format (input file characteristics).
- Write a program composed of:
  - field rules: treat a single cell;
  - horizontal rules: work at the level of records;
  - vertical rules: a sub-set of the data set is seen as transposed;
  - hierarchical rules: work on two or more interlinked datasets;
  - dataset operations.
- Import the data set instance.
- Import auxiliary data (other datasets, lookup tables).
- Establish or import program parameters.
- Execute a job, i.e. run the program against all the imported or set data.

Rules - examples

1.
    RECORD FL171 {
        CONDITION (NOT isNull (A1bis)) -> inLookup (A1bis, NACE, "CODE");
        ERRMSG "Rule FL171 failed for field [A1bis]: NACE rev 1.1"
        SEVERITY "Warning" (A1bis) ;
    }

2.
    RECORD pureRecord {
        PRICE := 20;
    }

3.
    RECORD conditionalRecord {
        CONDITION isNull(VALUE);
        THEN {
            VALUE := PRICE * QUANTITY;
        }
        ELSE {
            PRICE := VALUE / 5;
            QUANTITY := VALUE / PRICE;
        }
    }

4.
    VERTICAL pureVertical {
        EXPRESSION {
            KEYS COUNTRY, CTYPE, MONTH, PRODUCT; // dimensions – now the data set is a cube
            TRKEYS COUNTRY; //divide the cube in sub-cubes by country
            VALUE['TOTAL'] := nvl(VALUE['TOTAL'],0);
        }
    }
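For readers unfamiliar with the EDIT syntax, the two record rules FL171 and conditionalRecord can be paraphrased in Python. This is one reading of the examples, not a translation endorsed by the slides: it assumes that `->` denotes logical implication, that a failed condition produces the stated message and severity, and that a record can be modelled as a dictionary. The lookup values in `NACE_CODES` are made up.

```python
NACE_CODES = {"01.11", "15.11"}  # hypothetical content of the NACE lookup


def rule_fl171(record, lookup=NACE_CODES):
    """If A1bis is filled in, it must appear in the lookup; otherwise warn."""
    a1bis = record.get("A1bis")
    if a1bis is None or a1bis in lookup:
        return None  # implication holds, nothing to report
    return {
        "rule": "FL171",
        "severity": "Warning",
        "field": "A1bis",
        "message": "Rule FL171 failed for field [A1bis]: NACE rev 1.1",
    }


def conditional_record(record):
    """Mirror of example 3: derive VALUE when missing, else PRICE and QUANTITY."""
    out = dict(record)  # leave the input record untouched
    if out.get("VALUE") is None:
        out["VALUE"] = out["PRICE"] * out["QUANTITY"]
    else:
        out["PRICE"] = out["VALUE"] / 5
        # Raises ZeroDivisionError when VALUE is 0, as the rule is written.
        out["QUANTITY"] = out["VALUE"] / out["PRICE"]
    return out


warning = rule_fl171({"A1bis": "99.99"})
imputed = conditional_record({"VALUE": None, "PRICE": 2, "QUANTITY": 3})
```

The first function only reports a problem, while the second changes the record, which matches the slides' view of editing as computation.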

Editing rules - overview
1. Cell-level rules may involve determining:
   - whether the entry of any cell is an invalid blank;
   - whether the recorded entries are among a set of valid codes for the cell.
2. Horizontal validation rules work at the level of a record and are usually specified on the basis of extensive knowledge of the subject matter of the data. Example: combinations of fields which are jointly unacceptable.
3. Vertical validation rules involve a data integrity check for entries across a collection of related records. Examples:
   - the total number of imports for a given product is equal to the sum of imports from individual countries;
   - the stock value at the beginning of a month is equal to the closing stocks of the previous month.
4. Hierarchical validation rules are checks involving two or more hierarchically interlinked datasets.
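The first vertical example (the total equals the sum over countries) can be sketched as follows. The field names, the `TOTAL` country code and the function `check_totals` are assumptions chosen for illustration; the sketch shows the idea of a vertical check, not how EDIT evaluates it.

```python
from collections import defaultdict


def check_totals(rows, total_code="TOTAL", tol=1e-9):
    """Return the (MONTH, PRODUCT) partitions whose TOTAL row differs
    from the sum of the individual country rows."""
    sums = defaultdict(float)
    totals = {}
    for row in rows:
        part = (row["MONTH"], row["PRODUCT"])
        if row["COUNTRY"] == total_code:
            totals[part] = row["VALUE"]
        else:
            sums[part] += row["VALUE"]
    return sorted(p for p, t in totals.items() if abs(t - sums[p]) > tol)


rows = [
    {"COUNTRY": "FR", "MONTH": "2012-01", "PRODUCT": "A", "VALUE": 10},
    {"COUNTRY": "DE", "MONTH": "2012-01", "PRODUCT": "A", "VALUE": 7},
    {"COUNTRY": "TOTAL", "MONTH": "2012-01", "PRODUCT": "A", "VALUE": 17},
    {"COUNTRY": "FR", "MONTH": "2012-01", "PRODUCT": "B", "VALUE": 5},
    {"COUNTRY": "DE", "MONTH": "2012-01", "PRODUCT": "B", "VALUE": 1},
    {"COUNTRY": "TOTAL", "MONTH": "2012-01", "PRODUCT": "B", "VALUE": 9},
]

failures = check_totals(rows)
```

Only product B is reported, because its country rows add up to 6 while the recorded total is 9.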

Eurostat's Meta-Language SDMX - TWG Paris, 13-14 Dec. 2012

VIP on Validation (VIP = Vision Infrastructure Project)
- It is an ongoing project.
- Main task of the project: organize and optimize data editing among Member States (MSs) and Eurostat for ESS data collections.
- Main deliverables: a set of standards, including a common Meta Language.
- Additionally, a GUI-driven IT system capable of generating rule sets in a meta language, plus standardised documentation of the validation rules that business users can understand.

The need for a common language
- A formal, unambiguous language was needed so that rules can be encoded once and then translated into other existing syntaxes.
- This can help create a more efficient production chain, with responsibilities clearly assigned to the different actors.

Accompanying tools
- Guidelines for selecting a set of rules to ensure a defined minimum standard of quality for the data exchanged.
- Guidelines for the assignment of responsibility in the data editing chain (Member States and Eurostat).
- User requirements and functional specifications for:
  - a tool to edit and monitor compliance;
  - a tool to specify the rules.

Scope of the Meta-Language (ML)
- A formal, unambiguous language for encoding validation rules.
- Friendly to statisticians: where possible, rules should be expressed in a human-readable way.
- Able to treat both micro data and aggregate data.
- Allows the exchange of validation rules between organizations.
- Able to work with cubes and with bi-dimensional data sets (e.g. micro-data).

The Meta Language (ML), under development
- Information model:
  - the simpler the information model, the more flexible the language;
  - data model = bi-dimensional data sets consisting of rows and columns;
  - should allow working with all types of incoming files.
- Operators/functions/calculations:
  - oriented to statistical needs;
  - act on data model objects, i.e. data sets;
  - allow the expression of logical operators and computations.

Documentation of the rules
- The same operator, function or expression may be used to express different statistical meanings.
- A rule is documented by:
  - a list of the ML expressions used;
  - a set of parameters;
  - documentation: a text provided by the user when implementing the rule.
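The three-part documentation of a rule maps naturally onto a small record type. The sketch below is a hypothetical data structure, not part of the Meta Language: the class name, the field names and the expression text are placeholders, since the slides describe the ML as still under development and show no syntax for it.

```python
from dataclasses import dataclass, field


@dataclass
class RuleDocumentation:
    """One documented rule: expressions used, parameters, and free text."""
    name: str
    expressions: list[str]
    parameters: dict[str, object] = field(default_factory=dict)
    documentation: str = ""


# The same placeholder expression carries two different statistical meanings.
SHARED_EXPRESSION = "total equals sum of components"  # not real ML syntax

imports_rule = RuleDocumentation(
    name="imports_total",
    expressions=[SHARED_EXPRESSION],
    parameters={"component": "COUNTRY"},
    documentation="Total imports of a product equal the sum over partner countries.",
)
stocks_rule = RuleDocumentation(
    name="stocks_total",
    expressions=[SHARED_EXPRESSION],
    parameters={"component": "WAREHOUSE"},
    documentation="Total stock equals the sum over warehouses.",
)
```

Keeping the user's text next to the expressions is what lets two rules with identical expressions remain distinguishable to business users.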