Published by Fanny Straub. Modified over 6 years ago.
1
Validation in two countries: a Proof of Concept (PoC)
Michael Schäfer, Mark van der Loo & Olav ten Bosch
2
Contents
Goals
Process
It is all on GitHub…
Experiences from Germany
Experiences from the Netherlands
BTW: This is still work in progress...
3
Goals
To test the feasibility of the results of the ESSnet:
Can we implement a set of validation rules, specified in one common language, in two different systems in two different countries?
  Validation package in R from CBS
  Validation language eStatistik of Destatis
Can we use VTL for that, in a way that is unambiguous and easy to understand?
Can data be validated using these validation rules?
What does the comparison of the results say about the use of VTL for specifying validation rules for the ESS?
What more do we learn from executing these steps?
4
ESSnet on Validation, Proof of Concept (PoC)
[Process diagram. Labels: The Survey (WP1); 1800 examples of validation rules; Select; 18 test validation rules; A study on VTL (WP4); Translate; 18 rules in VTL syntax; 18 rules in Validate syntax; 18 rules in eStatistik syntax; Generate; Synthetic datasets; Run Validate; Run eStatistik; Validation results (one per system); Compare.]
5
PoC on GitHub
6
Some of the rules
7
A dataset for rule 5
8
A dataset for rule 5
9
Rule 5, 3 implementations: VTL, eStatistik (DE), Validate (NL)
10
Experiences from Germany
Data Validation and Editing Specification Language
[Diagram. Role labels: Control; Assert and set; Reuse.]
Properties:
Procedures: full instruction set; can iterate over reference data; scoping/scenarios
Functions: reduced instruction set; can iterate over reference data; have one return value
Validation rules: small instruction set; return TRUE or FALSE; errors trigger automated edits
Automated edits: small instruction set
11
Experiences from Germany
Some notable characteristics of the German system (I)
Only the results of validation rules determine whether data are valid.
Automated edits cannot be invoked explicitly, only implicitly by validation rules that return TRUE.
There is always only one data set under test per reference period, and only its current record (or hierarchical set of records) is visible to the procedure, rule or function being executed.
12
Experiences from Germany
Some notable characteristics of the German system (II)
There is no built-in handling of missing values; precondition cases can only be handled by writing appropriate code in a procedure.
Similarly, the state of a validated record is always CORRECT or INCORRECT, but never UNVALIDATED; that state must be emulated by writing a separate validation rule that checks the precondition and is interpreted as a soft check. The procedure must then skip the invocation of the actual validation rule.
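The emulation of an UNVALIDATED state described above can be sketched in Python. This is only an illustration of the control flow, not eStatistik code: the field name `turnover`, the rule itself and all function names are hypothetical, and the sketch assumes that a rule returning True signals a problem.

```python
def precondition_missing(record: dict) -> bool:
    """Soft check (hypothetical): True when the field the actual rule needs is missing."""
    return record.get("turnover") is None


def turnover_negative(record: dict) -> bool:
    """Actual validation rule (hypothetical): True when turnover is negative."""
    return record["turnover"] < 0


def run_procedure(record: dict) -> dict:
    """Procedure: evaluate the soft check first and skip the actual rule if it fires."""
    soft_flag = precondition_missing(record)
    # The actual rule is only invoked when its precondition holds.
    hard_flag = False if soft_flag else turnover_negative(record)
    # Only two record states exist; the soft flag carries the "unvalidated" reading.
    state = "INCORRECT" if hard_flag else "CORRECT"
    return {"state": state, "soft_precondition_flag": soft_flag}
```

A record with a missing value ends up as CORRECT with the soft flag set, so a reader of the results has to interpret that flag as "not validated" rather than "valid".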
13
Experiences from Germany
Some notable characteristics of the German system (III)
Procedures, validation rules, edits and functions are edited separately and are distinct pieces of code; since one procedure and one rule are required, this poses a problem for code transformations.
14
Experiences from Germany
(Minor) technical issues:
Column/field names of the PoC contained an illegal character and had to be renamed ('-' replaced with '_').
The field separator of the CSV files had to be changed to ';' because it is fixed (except when importing reference data, where it is configurable).
The decimal separator had to be changed to ',' for the same reason.
The header rows had to be deleted from the CSV files.
Records not complying with the expected data set structure (missing fields) cannot be processed.
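The file adaptations listed above could be scripted along the following lines. This is a minimal sketch, not the conversion actually used in the PoC: it assumes the input is comma-separated with '.' as decimal separator and one header row, and the function names are hypothetical. Note that any field that parses as a number is converted, which would be wrong for code-like values such as "1.5" meant as text.

```python
import csv
import io


def is_decimal(value: str) -> bool:
    """True if the value parses as a number and contains a '.' decimal separator."""
    try:
        float(value)
    except ValueError:
        return False
    return "." in value


def adapt_for_import(text: str):
    """Return (renamed field names, header-less ';'-separated body, rejected rows)."""
    reader = csv.reader(io.StringIO(text))
    # '-' is not allowed in field names, so replace it with '_'.
    header = [name.replace("-", "_") for name in next(reader)]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    rejected = []
    for row in reader:
        # Rows that do not match the expected structure cannot be processed.
        if len(row) != len(header):
            rejected.append(row)
            continue
        # Switch the decimal separator from '.' to ','.
        writer.writerow([v.replace(".", ",") if is_decimal(v) else v for v in row])
    return header, out.getvalue(), rejected
```

The header is returned separately because the header row itself is dropped from the output, and rows with missing fields are collected rather than silently discarded.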
15
Experiences from Germany
Usability issues:
Extracting the idea of a rule from the VTL code was easy only in very simple cases, partly due to unfamiliarity with the relational paradigm and related concepts.
In most cases, rules could not be implemented without resorting to the informal description of the rules, and there language barriers can arise.
17
Experiences from Germany
Lessons so far:
All VTL rules could be translated and then produced the expected results, but some adaptations were necessary, including splitting code into procedures, rules and functions.
This indicates that transforming VTL automatically is probably a demanding task and may only fully work under a set of restrictions, which may in turn affect its usability.
Understanding VTL currently requires a specific skill set, and finding the right staff may not be easy for all NSIs.
18
Experiences from the Netherlands
Notes from the translator are on GitHub; here are some of them:
19
Conclusions? To be done…