Generic Statistical Data Editing Models (GSDEMs)

The Generic Statistical Data Editing Models are one of the outputs of …..

Workshop on the Modernisation of Official Statistics, The Hague, 24 November 2015

2 Aim of GSDEMs

- Data editing is one of the most expensive and time-consuming parts of the statistical production process.
- It uses advanced methodology and algorithms, which offers opportunities for sharing methods and tools.
- The use of multiple and new data sources poses new challenges for data editing.
- To exchange ideas, experiences and practices, we need common concepts, models and definitions.
- The GSDEMs are standard references for statistical data editing, consistent with existing standards (GSBPM and GSIM).
- They aim to facilitate understanding, communication, practice and development.

Data editing is an important field for the modernisation of statistics, for the reasons above: its cost, its methodological complexity and the challenges of new data sources. To face these issues together and to cooperate, we need common concepts, models and definitions. That is what the GSDEMs are intended to provide, while staying fully consistent with other standards such as GSBPM and GSIM.

3 The Task Team

- A Generic Process Framework for Statistical Data Editing was proposed at the 2014 UNECE Work Session on Data Editing.
- The GSDEMs have been developed by an international task team (Finland, France, Italy, Norway, Netherlands, UNECE), under the authority of the High Level Group for the Modernisation of Official Statistics (HLG-MOS).
- The result is a report describing data editing business functions (what), methods to perform these functions (how) and the organisation of the process (process flow models).

The idea of creating a framework for data editing originated at the 2014 Work Session on Data Editing. Following that idea, the GSDEMs have been developed by an international task team with six contributing institutions, working under the authority of the HLG-MOS. The work resulted in a report describing the data editing business functions (what is done in data editing), the methodology used in data editing (how it is done) and the organisation of the process (in process flow models).

4 Data Editing Functions
Data editing functions perform different kinds of tasks (GSBPM) and can be classified by purpose into three broad categories:

- Review (of input data): assessing the consistency and plausibility of input data; quality measurement and validation.
- Selection (for further processing): selection of values or units for specified further treatment.
- Amendment (manual editing, imputation): changing or replacing data values to improve data quality.

The business functions in data editing perform different kinds of tasks that can be found in GSBPM. They can be classified by purpose into three broad categories: Review, Selection and Amendment. Review functions assess the consistency and plausibility of the data, Selection functions select data values or units for specific further treatment, and Amendment functions actually change data to improve their quality. Each category contains a number of lower-level, more specific functions.
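To make the three categories concrete, the sketch below shows one function of each kind applied to a toy survey record. It is a minimal illustration under stated assumptions, not taken from the GSDEMs report: the record layout, the field names (`turnover`, `costs`) and the edit rules are all hypothetical. The point it shows is the division of labour: review only measures, selection only chooses units, and amendment is the only function that returns changed data.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical business-survey record; None marks a missing value.
    unit_id: str
    turnover: Optional[float]
    costs: Optional[float]


def review(rec: Record) -> List[str]:
    """Review: return the names of failed edit rules; the record is never changed."""
    failed = []
    if rec.turnover is None:
        failed.append("turnover_missing")
    elif rec.turnover < 0:
        failed.append("turnover_negative")
    # Hypothetical plausibility rule: costs should not exceed twice the turnover.
    if rec.turnover is not None and rec.costs is not None and rec.costs > 2 * rec.turnover:
        failed.append("costs_implausible")
    return failed


def select(records: List[Record], failures: Dict[str, List[str]]) -> List[str]:
    """Selection: pick units for further treatment (here: any unit with a failed rule)."""
    return [r.unit_id for r in records if failures[r.unit_id]]


def amend(rec: Record, new_turnover: float) -> Record:
    """Amendment: the only function that returns changed data (a copy with a new turnover)."""
    return replace(rec, turnover=new_turnover)


records = [Record("A", 100.0, 80.0), Record("B", None, 50.0), Record("C", 10.0, 40.0)]
failures = {r.unit_id: review(r) for r in records}
selected = select(records, failures)   # ['B', 'C']
fixed_b = amend(records[1], 60.0)      # a copy of B with turnover 60.0; records[1] is unchanged
```

Making `Record` frozen enforces in code what the classification says in words: review and selection cannot modify a record, and amendment produces a new, changed copy.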

5 Methods for data editing functions
- Review. Function: (im)plausibility of values. Methods: evaluation of pre-specified edit rules; outlier detection algorithms.
- Selection. Function: selection for manual review. Methods: automatic selection by influence scores; graphical inspection of estimates.
- Amendment. Function: estimation of missing values. Methods: regression imputation; imputation by donor values.

GSDEMs describe not only what is done by the specific data editing functions but also how it is done: the commonly used methods for each function are listed. Often several methods can be used for the same function, depending on the circumstances.

6 Process flow: Organising the data editing process
A data editing process consists of functions with specified methods that are executed in an organised way. The organisation is described by subdividing the process into:

- Process steps: sub-processes, each consisting of a number of specified functions (Amendment and Review, but no Selection).
- Process controls: used to describe the routing of the data through the process steps. They contain Selection but no Amendment functions: they do not modify data values, but specify different streams of data through the process flow.

The report contains process flows for a number of scenarios: Structural Business Statistics, short-term statistics, business census, household statistics and statistics based on multiple sources.

A data editing process consists of a considerable number of functions with specified methods that need to be executed in an organised way. To describe the process, GSDEMs divide it into sub-processes, or process steps. We also need to describe the navigation of the data through these process steps, and for that we use process controls.
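The distinction between process steps and process controls can be sketched as two kinds of functions: a step takes a record and may return an amended one, while a control only reads a record and returns the name of the stream it should follow. This is a minimal sketch, not a GSDEMs specification: the record layout, the "thousands error" correction with its 1,000,000 cut-off, the size threshold of 500 and the stream names are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
ProcessStep = Callable[[Record], Record]   # Review + Amendment: may return changed values
ProcessControl = Callable[[Record], str]   # Selection only: returns a stream name


def run_flow(records: List[Record], step: ProcessStep,
             control: ProcessControl) -> Dict[str, List[Record]]:
    """Apply a process step to every record, then let a process control route it to a stream."""
    streams: Dict[str, List[Record]] = {}
    for rec in records:
        rec = step(rec)              # the step may amend values (returns a new dict)
        stream = control(rec)        # the control only reads the record
        streams.setdefault(stream, []).append(rec)
    return streams


def fix_thousands_error(rec: Record) -> Record:
    """Hypothetical step: turnover above 1,000,000 is assumed to be reported in units, not thousands."""
    if rec["turnover"] > 1_000_000:
        return {**rec, "turnover": rec["turnover"] / 1000}
    return rec


def route_by_size(rec: Record) -> str:
    """Hypothetical control: large units go to interactive editing, the rest to automatic editing."""
    return "interactive" if rec["turnover"] > 500 else "automatic"


records = [
    {"id": "A", "turnover": 120.0},
    {"id": "B", "turnover": 2_500_000.0},
    {"id": "C", "turnover": 900.0},
]
streams = run_flow(records, fix_thousands_error, route_by_size)
# streams["automatic"] holds A; streams["interactive"] holds B (turnover now 2500.0) and C
```

A fuller flow would chain several such steps and controls, with each stream continuing into its own process steps; the same two function types are enough to express that routing.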

7 Example: SDE flow model for Households statistics

8 Example: SDE flow model for statistics using data integration

