Michael Schäfer, Mark van der Loo & Olav ten Bosch

Presentation transcript:

Validation in two countries: a Proof of Concept (PoC). Michael Schäfer, Mark van der Loo & Olav ten Bosch

Contents: Goals; Process; It is all on github…; Experiences from Germany; Experiences from Netherlands. BTW: This is still work in progress...

Goals: To test the feasibility of the results of the ESSnet. Can we implement a set of validation rules, specified in one common language, in two different systems in two different countries: the Validation package in R from CBS and the validation language eStatistik of Destatis? Can we use VTL for that, unambiguously and in a way that is easy to understand? Can data be validated using these validation rules? What does the comparison of the results say about the usage of VTL in terms of specifying validation rules for the ESS? What more do we learn from executing these steps?

ESSnet on Validation, Proof of Concept (PoC) [process diagram]. Inputs: The Survey (WP1), with 1800 examples of validation rules; A study on VTL (WP4). Steps: Select 18 test validation rules; Translate them into 18 rules in VTL syntax, 18 rules in Validate syntax and 18 rules in eStatistik syntax; Generate synthetic datasets; Run Validate and Run eStatistik to obtain validation results; Compare.
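The final "Compare" step of the process could be sketched roughly as follows. This is only an illustration: the result layout (a mapping from a hypothetical `(rule_id, record_id)` pair to a pass/fail flag) and the function name `compare_results` are assumptions made here, not the format used in the PoC repository.

```python
def compare_results(validate_results, estatistik_results):
    """List every (rule, record) pair on which the two systems disagree.

    Both arguments are assumed (hypothetically) to map a
    (rule_id, record_id) tuple to True (record passes) or False (fails).
    A pair missing from one system is reported with None on that side.
    """
    all_keys = sorted(set(validate_results) | set(estatistik_results))
    differences = []
    for key in all_keys:
        left = validate_results.get(key)
        right = estatistik_results.get(key)
        if left != right:
            differences.append((key, left, right))
    return differences


# Made-up example results for rule 5 on three synthetic records.
nl = {(5, 1): True, (5, 2): False, (5, 3): True}
de = {(5, 1): True, (5, 2): True}
print(compare_results(nl, de))
```

An empty list would mean that both implementations of the rules agree on every record of the synthetic dataset.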

PoC on Github

Some of the rules

A dataset for rule 5


Rule 5, 3 implementations: VTL, eStatistik (DE), Validate (NL)

Experiences from Germany: Data Validation and Editing Specification Language. Procedures: full instruction set; can iterate over reference data; scoping/scenarios. Functions: reduced instruction set; can iterate over reference data; have one return value. Validation rules: small instruction set; return TRUE or FALSE; errors trigger automated edits. Automated edits: small instruction set. Further diagram labels: Control, Assert and set, Reuse, Properties.

Experiences from Germany: Some notable characteristics of the German system (I). Only the results of validation rules are relevant in terms of determining the validity of data. Automated edits cannot be invoked explicitly, only implicitly by validation rules that return TRUE. There is always only one data set under test per reference period, of which only the current (hierarchical set of) record(s) is visible to the procedure, rule or function being executed.

Experiences from Germany: Some notable characteristics of the German system (II). There is no built-in handling of missing values; precondition cases can only be implemented by writing appropriate code in a procedure. Similarly, the state of a validated record is always CORRECT or INCORRECT, but never UNVALIDATED; this must be emulated by writing a separate validation rule that checks the precondition and is interpreted as a soft check. The procedure must then skip the invocation of the actual validation rule.
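The emulation of an UNVALIDATED state described here can be illustrated with a small sketch. It is written in Python rather than in the eStatistik language, and the field name `turnover`, the rule names and the function `run_procedure` are all hypothetical. It also assumes, as the earlier slides suggest, that a rule returning TRUE signals an error.

```python
def turnover_is_missing(record):
    """Precondition rule, read as a soft check: TRUE means the value is missing."""
    return record.get("turnover") is None


def turnover_is_negative(record):
    """Actual validation rule: TRUE means the record is in error."""
    return record["turnover"] < 0


def run_procedure(record):
    """Procedure that runs the precondition rule first.

    If the precondition fails, the actual rule is skipped, so the record
    gets no state for it; that absence stands in for UNVALIDATED.
    """
    states = {}
    missing = turnover_is_missing(record)
    states["turnover_is_missing (soft)"] = "INCORRECT" if missing else "CORRECT"
    if not missing:
        failed = turnover_is_negative(record)
        states["turnover_is_negative"] = "INCORRECT" if failed else "CORRECT"
    return states


print(run_procedure({"turnover": 5}))
print(run_procedure({"turnover": None}))
```

In the second call only the soft check appears in the result, which is how a reader of the output would recognise that the actual rule was never evaluated.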

Experiences from Germany: Some notable characteristics of the German system (III). Procedures, validation rules, edits and functions are edited separately and are distinct pieces of code; since one procedure and one rule are required, this poses a problem for code transformations.

Experiences from Germany: (Minor) technical issues. Column/field names of the PoC contained an illegal character and had to be renamed ('-' replaced with '_'). The field separator of the CSV files had to be changed to ';' because it is fixed (except when importing reference data, where it is configurable). The decimal separator had to be changed to ',' for the same reason. The header rows had to be deleted from the CSV files. Records not complying with the expected data set structure cannot be processed (missing fields).
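The file adjustments listed above could be automated along the following lines. This is a minimal sketch, not the conversion actually used in the PoC: it assumes the original files are comma-separated with '.' as decimal separator (the slides do not state this), and the function name `convert_for_import` and the sample column names are made up.

```python
import csv
import io


def is_decimal(cell):
    """True if the cell is a number written with a '.' decimal separator."""
    try:
        float(cell)
    except ValueError:
        return False
    return "." in cell


def convert_for_import(text):
    """Convert CSV text (assumed ',' separated, '.' decimals).

    Returns the renamed field names, the header-less body using ';' as
    field separator and ',' as decimal separator, and the records that
    were set aside because their field count does not match the header.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[1:]
    field_names = [name.replace("-", "_") for name in header]

    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    rejected = []
    for record in records:
        if len(record) != len(field_names):
            rejected.append(record)  # structure mismatch: cannot be processed
            continue
        writer.writerow(
            [cell.replace(".", ",") if is_decimal(cell) else cell for cell in record]
        )
    return field_names, out.getvalue(), rejected


sample = "id,turn-over,staff\n1,10.5,3\n2,7.25\n3,abc,4\n"
names, body, rejected = convert_for_import(sample)
print(names)
print(body)
print(rejected)
```

The header is returned separately instead of being written out, because the header rows had to be removed from the files while the renamed field names are still needed when writing the rules.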

Experiences from Germany: Usability issues. Extracting the idea of a rule from the VTL code was only easy in very simple cases, partly due to unfamiliarity with the relational paradigm and related concepts. In most cases, rules could not be implemented without resorting to the informal description of the rules, but then there can be language barriers.


Experiences from Germany: Lessons so far. All VTL rules could be translated and then produced the expected results, but some adaptations were necessary, including splitting code into procedures, rules and functions. This indicates that transforming VTL automatically is probably a demanding task and may only fully work under a set of restrictions, which may in turn affect its usability. Understanding VTL currently requires a specific skill set, and finding the right staff may not be easy for all NSIs.

Experiences from Netherlands: Notes from the translator are on github; here are some of them:

Conclusions? To be done…