Published by Brian Joly. Modified over 6 years ago.
1
A handbook on validation methodology
Marco Di Zio, Istat
Workshop ValiDat Foundation – Wiesbaden, November 2015
2
Underlying idea of the HB
Why a handbook on methodology for data validation? To standardize the language and the elements involved, and to provide common measures for evaluation: in short, to establish a common reference framework and to develop metrics for evaluating data validation (DV). The HB is composed of two main parts: (1) a generic framework for data validation, and (2) a discussion of metrics to evaluate a validation procedure (tuning, evaluating the procedure, etc.).
ValiDat foundation workshop Wiesbaden November 2015
3
Generic framework for data validation
The objective of this first section is to clarify the what, the why, the how and …
4
Generic framework for data validation
Clearly establish the relation with other phases of the statistical production process and with international standards such as GSBPM, GSDEMs and GSIM.
Describe the data validation life cycle, which is useful for managing the data validation process.
5
What is data validation… Definition
Data validation is an activity verifying whether or not a combination of values is a member of a set of acceptable combinations. This is not far from the UNECE definition, "An activity aimed at verifying whether the value of a data item comes from the given (finite or infinite) set of acceptable values", but it is essentially different: the HB definition refers to combinations of values rather than to the value of a single data item.
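A minimal sketch of why the two definitions differ, assuming two hypothetical categorical variables, "sex" and "pregnant" (names and value sets are illustrative, not from the HB). A record can pass a per-item check on every single value and still fail a check on the combination of values.

```python
# Hypothetical variables and value sets, chosen only for illustration.
# Per-item acceptable values (in the spirit of the UNECE definition).
ACCEPTABLE_VALUES = {
    "sex": {"F", "M"},
    "pregnant": {"yes", "no"},
}

# Acceptable combinations of (sex, pregnant) (in the spirit of the HB definition).
ACCEPTABLE_COMBINATIONS = {("F", "yes"), ("F", "no"), ("M", "no")}


def valid_per_item(record: dict) -> bool:
    """True if every single value belongs to its own set of acceptable values."""
    return all(record[var] in allowed for var, allowed in ACCEPTABLE_VALUES.items())


def valid_combination(record: dict) -> bool:
    """True if the combination of values belongs to the set of acceptable combinations."""
    return (record["sex"], record["pregnant"]) in ACCEPTABLE_COMBINATIONS


record = {"sex": "M", "pregnant": "yes"}
print(valid_per_item(record))     # True: each value is acceptable on its own
print(valid_combination(record))  # False: the combination is not acceptable
```

Here the set of acceptable combinations is finite and listed explicitly; for numeric variables it is usually infinite and must be described by rules instead.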
6
What…
It is a decisional procedure ending with the acceptance or refusal of the data. The decisional procedure is generally based on rules expressing the acceptable combinations of values.
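A minimal sketch of such a decisional procedure, assuming hypothetical rules on an enterprise record (turnover, costs, profit). Each rule is a predicate that implicitly describes a possibly infinite set of acceptable combinations; the choice to accept a record only when every rule is satisfied, and the numeric tolerance, are assumptions made for this illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Rule = Callable[[Record], bool]

# Hypothetical rules; each one describes a (possibly infinite) set of
# acceptable combinations of values without listing them.
RULES: Dict[str, Rule] = {
    "turnover_non_negative": lambda r: r["turnover"] >= 0,
    "costs_non_negative": lambda r: r["costs"] >= 0,
    # Assumed tolerance for comparing floating-point values.
    "profit_balance": lambda r: abs(r["profit"] - (r["turnover"] - r["costs"])) < 1e-9,
}


def failed_rules(record: Record, rules: Dict[str, Rule]) -> List[str]:
    """Names of the rules that the record does not satisfy."""
    return [name for name, rule in rules.items() if not rule(record)]


def decide(record: Record, rules: Dict[str, Rule]) -> str:
    """Decisional procedure: accept the record only if no rule fails."""
    return "accept" if not failed_rules(record, rules) else "refuse"


print(decide({"turnover": 100.0, "costs": 60.0, "profit": 40.0}, RULES))  # accept
print(decide({"turnover": 100.0, "costs": 60.0, "profit": 50.0}, RULES))  # refuse
```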
7
Why do we perform data validation…
The purpose of data validation is to ensure a certain level of quality of the final data, but quality has several aspects. We clarified which aspects are related to DV: essentially those related to the 'structure of the data', namely accuracy, comparability and coherence. Other aspects are also connected; e.g., timeliness can be seen as a constraining factor.
8
How to perform DV…
Two main elements:
Validation levels: to what extent a data set has been validated.
Validation rules: rules are applied to data; a failure of a rule implies that the corresponding validation level is not attained by the data at hand (decisional process: accept/not accept).
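A minimal sketch of how rules and levels interact, assuming a hypothetical grouping of rules into three levels (single values, several variables within one record, several records of the data set). The level numbering and the rules are illustrative only and are not the levels defined in the HB; the sketch only shows that a single failing rule prevents the corresponding level from being attained.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
DataSet = List[Record]
Rule = Callable[[DataSet], bool]

# Hypothetical rules grouped by validation level (illustrative assumption).
RULES_BY_LEVEL: Dict[int, List[Rule]] = {
    # single values
    1: [lambda d: all(r["turnover"] >= 0 and r["costs"] >= 0 for r in d)],
    # several variables within one record (assumed tolerance for floats)
    2: [lambda d: all(abs(r["profit"] - (r["turnover"] - r["costs"])) < 1e-9 for r in d)],
    # several records of the same data set: identifiers must be unique
    3: [lambda d: len({r["id"] for r in d}) == len(d)],
}


def levels_attained(data: DataSet) -> Dict[int, bool]:
    """A level is attained only if none of its rules fails on the data at hand."""
    return {level: all(rule(data) for rule in rules)
            for level, rules in sorted(RULES_BY_LEVEL.items())}


data = [
    {"id": 1, "turnover": 100.0, "costs": 60.0, "profit": 40.0},
    {"id": 1, "turnover": 80.0, "costs": 30.0, "profit": 50.0},
]
print(levels_attained(data))  # {1: True, 2: True, 3: False}
```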
9
Validation levels
They are related to the perspective of the 'validator'… In the HB:
Business perspective: starting from the elements that usually characterise the DV process (increasing information).
Formal approach: looking at the elements characterising a data point in a statistical setting.
10
Validation levels: business perspective
11
Validation levels: formal approach
Metadata aspects that are necessary to identify a data point:
The universe U from which a statistical object originates (e.g., households, companies).
The time t of selecting an element u from the current population p(t).
The selected element u. This determines the values of the variables X that may be observed over time.
The variable selected for measurement.
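A minimal sketch of a data point addressed by these four metadata elements, assuming a hypothetical key type and a hypothetical growth rule (field names, values and the 3x bound are illustrative). It shows that a rule can compare data points that share U, u and the variable but differ in the time t.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class DataPointKey:
    """Metadata identifying a data point; the field names are illustrative."""
    universe: str  # U, e.g. "enterprise" or "household"
    time: str      # t, when the element u is selected from the population p(t)
    unit: str      # u, the selected element
    variable: str  # the variable selected for measurement


# Hypothetical data: each observed value is addressed by its full key.
values: Dict[DataPointKey, float] = {
    DataPointKey("enterprise", "2014", "E001", "turnover"): 100.0,
    DataPointKey("enterprise", "2015", "E001", "turnover"): 450.0,
}


def growth_within_bounds(data: Dict[DataPointKey, float], key: DataPointKey,
                         t_next: str, max_ratio: float = 3.0) -> bool:
    """Hypothetical rule on two data points that differ only in the time t."""
    before = data[key]
    after = data[replace(key, time=t_next)]  # same U, u and variable; new t
    return before > 0 and after / before <= max_ratio


key = DataPointKey("enterprise", "2014", "E001", "turnover")
print(growth_within_bounds(values, key, "2015"))  # False: 450 / 100 = 4.5 > 3
```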
13
Data validation - GSDEMs
In the Generic Statistical Data Editing Models (GSDEMs), statistical data editing is composed of three different function types: Review, Selection and Amendment. The review functions are defined as: "Functions that examine the data to identify potential problems. This may be by evaluating formally specified quality measures or edit rules or by assessing the plausibility of the data in a less formal sense, for instance by using graphical displays."
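A minimal sketch of a review function in this sense, assuming hypothetical edit rules and data. It evaluates formally specified edit rules and summarises them as a simple quality measure (the share of records failing each rule); it deliberately does not select or amend anything, since those are separate function types.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]

# Hypothetical edit rules, for illustration only.
EDIT_RULES: Dict[str, Callable[[Record], bool]] = {
    "turnover_non_negative": lambda r: r["turnover"] >= 0,
    "profit_balance": lambda r: abs(r["profit"] - (r["turnover"] - r["costs"])) < 1e-9,
}


def review(data: List[Record]) -> Dict[str, float]:
    """Review: share of records failing each edit rule (a simple quality measure).

    The data are only examined; selecting records for treatment and amending
    values are separate function types and are not performed here.
    """
    if not data:
        return {}
    return {name: sum(not rule(r) for r in data) / len(data)
            for name, rule in EDIT_RULES.items()}


data = [
    {"turnover": 100.0, "costs": 60.0, "profit": 40.0},
    {"turnover": -10.0, "costs": 5.0, "profit": -15.0},
    {"turnover": 50.0, "costs": 20.0, "profit": 20.0},
    {"turnover": 70.0, "costs": 70.0, "profit": 5.0},
]
print(review(data))  # {'turnover_non_negative': 0.25, 'profit_balance': 0.5}
```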
14
Data validation - GSDEMs
Among the different GSDEMs function categories there is 'Review of data validity', defined as: "Functions that check the validity of data values against a specified range or a set of values and also the validity of specified combinations of values. Each check leads to a binary value (TRUE, FALSE)."
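A minimal sketch of the three kinds of validity checks named in this definition (range, set of values, combination of values), each returning True or False. The helper names, variables, bounds and categories are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Collection, Dict, Tuple

Record = Dict[str, Any]
Check = Callable[[Record], bool]


def range_check(var: str, low: float, high: float) -> Check:
    """Validity of a value against a specified range (bounds included)."""
    return lambda r: low <= r[var] <= high


def set_check(var: str, allowed: Collection[Any]) -> Check:
    """Validity of a value against a specified set of values."""
    return lambda r: r[var] in allowed


def combination_check(variables: Tuple[str, ...], condition: Callable[..., bool]) -> Check:
    """Validity of a specified combination of values of several variables."""
    return lambda r: condition(*(r[v] for v in variables))


# Hypothetical checks; variables, bounds and categories are illustrative only.
CHECKS: Dict[str, Check] = {
    "age_range": range_check("age", 0, 120),
    "status_set": set_check("marital_status", {"single", "married", "divorced", "widowed"}),
    "age_status": combination_check(
        ("age", "marital_status"),
        lambda age, status: not (age < 15 and status == "married"),
    ),
}


def review_validity(record: Record) -> Dict[str, bool]:
    """Each check leads to a binary value (True or False)."""
    return {name: bool(check(record)) for name, check in CHECKS.items()}


print(review_validity({"age": 12, "marital_status": "married"}))
# {'age_range': True, 'status_set': True, 'age_status': False}
```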
15
Data Validation - GSBPM
16
Data validation life cycle
18
Second part of the document: Metrics
Evaluating a validation procedure (covered in the next presentation)
19
Thanks for your attention