Theme (v): Managing change

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE
CONFERENCE OF EUROPEAN STATISTICIANS
Work Session on Statistical Data Editing
24-26 April 2017, The Hague, Netherlands

Discussants: Agnes Andics, Sander Scholtus

Introduction to theme (v)
- Many generic ideas and methods for efficient and effective data editing are now well established, e.g.:
  - Selective editing
  - Automatic editing
  - The use of generalised data editing systems
- Introduction of new data sources (administrative/big data) can also prompt new approaches to data editing
- However, successful introduction of new methodology into statistical production processes is not trivial

Introduction to theme (v)
- Important factors for successful changes:
  - Obtaining support and cooperation from top-level management and editing staff as soon as possible
  - Planning development work in achievable and measurable stages
  - Evaluating the quality of data editing processes and monitoring the effects of changes

Introduction to theme (v)
- In this theme: presentations on
  - Evaluating data editing processes
    - The use of process data
    - Indicators for impact of different process steps on results
  - Developing a generic, metadata-driven data editing system
  - Developing data editing methodology for a new data source

Agenda for theme (v)
- United Kingdom: Improvements in editing methods and processes for use of Value Added Tax data in UK National Accounts
- Denmark: Usage of process data in data editing at Statistics Denmark
- Switzerland: Data preparation process analysis of the structural survey of the Swiss population census
- Finland: An information model for a metadata-driven editing and imputation system
- General discussion

Summary: presentations in theme (v)
WP.24 Improvements in editing methods and processes for use of Value Added Tax data in UK National Accounts (UK)
- Development of a system to process VAT data for short-term statistics
- Review and refinement of previously developed methods
  - Rules for detecting systematic errors (£1000 error, quarterly patterns), automatic editing of suspicious turnover values
  - Suggestion to introduce selective editing of suspicious turnover values
- VAT units vs. reporting units
  - In principle, would like to edit as early in the process as possible (VAT units)
  - However, better auxiliary information is available for reporting units
  - Investigation of whether auxiliary information can be translated to VAT unit level
- Development process: agile, four-week sprints, continuous feedback
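
A rule for the "£1000 error" (turnover reported in pounds where thousands of pounds were expected, or the reverse) could look roughly like the sketch below. It is an illustration only and not the rule described in WP.24: the reference value (for example, the same unit's turnover in an earlier period) and the ratio bounds `low` and `high` are assumptions made for this example.

```python
def looks_like_thousand_error(reported, reference, low=300.0, high=3000.0):
    """Flag a value that is roughly 1000 times larger than its reference.

    `reference`, `low` and `high` are hypothetical choices for this sketch.
    """
    if reference is None or reference <= 0 or reported <= 0:
        return False  # no usable reference: leave the value to other edits
    ratio = reported / reference
    return low <= ratio <= high


def correct_turnover(reported, reference):
    """Return (value, was_corrected); divide flagged values by 1000."""
    if looks_like_thousand_error(reported, reference):
        return reported / 1000.0, True
    return reported, False
```

A wide ratio band is used here because turnover can change genuinely between periods; in practice the bounds would need to be tuned on real data.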

Summary: presentations in theme (v)
WP.25 Usage of process data in data editing at Statistics Denmark (Denmark)
- Project to modernise data editing at Statistics Denmark
- Construction of a common database (Data Archive) for storing and editing surveys; includes storage of process data
- Feedback from process data to improve data editing process, e.g.:
  - Extend score function for selective editing with the probability that a suspicious value would lead to a correction
  - Investigate unusual changes in process indicators such as edit rate
  - Investigate systematic reporting problems
- Application to survey on pig farmers: yearly feedback process
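
The idea of extending a selective editing score with a correction probability can be sketched as follows. This is not the score function from WP.25: the basic score (weighted absolute deviation from an anticipated value, relative to an estimated total), the smoothing prior, and all names are assumptions chosen for illustration. The correction probability is estimated from process data as the smoothed share of past flags of the same edit rule that ended in a correction.

```python
def local_score(weight, reported, anticipated, total_estimate):
    """Basic score: weighted deviation relative to the estimated total."""
    return weight * abs(reported - anticipated) / total_estimate


def correction_probability(history, edit_rule, prior=0.5, prior_weight=2.0):
    """Smoothed share of past flags for `edit_rule` that led to a correction.

    `history` is a list of (edit_rule, was_corrected) pairs taken from
    process data; `prior` and `prior_weight` are hypothetical smoothing
    parameters that keep the estimate stable for rarely triggered rules.
    """
    outcomes = [corrected for rule, corrected in history if rule == edit_rule]
    return (sum(outcomes) + prior * prior_weight) / (len(outcomes) + prior_weight)


def extended_score(weight, reported, anticipated, total_estimate, p_correction):
    """Basic score scaled by the estimated probability of a correction."""
    return local_score(weight, reported, anticipated, total_estimate) * p_correction
```

With this form, values flagged by rules that rarely lead to a correction drop down the follow-up list, which is the kind of feedback from process data the slide refers to.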

Summary: presentations in theme (v)
WP.26 Data preparation process analysis of the structural survey of the Swiss population census (Switzerland)
- Evaluation of the data editing process of structural census survey
- Computed indicators proposed in EDIMBUS manual, including
  - Item response rate / ratio
  - Imputation rate / ratio
  - Structural missingness rate
- Compared indicators at different stages of the editing process for different years (evolution monitoring)
- Implemented indicators in R package ‘sdap’ (available online)
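
Simplified, unweighted versions of the three indicators named above might be computed as in the sketch below. The exact EDIMBUS definitions and the interface of the ‘sdap’ package are not reproduced here; the data layout (a raw column, a final column, and an `applicable` flag marking whether the item applies to the unit) is an assumption for this example, and non-empty inputs are assumed.

```python
def item_response_rate(raw, applicable):
    """Share of applicable units with a reported (non-missing) raw value."""
    eligible = [value for value, app in zip(raw, applicable) if app]
    return sum(value is not None for value in eligible) / len(eligible)


def imputation_rate(raw, final, applicable):
    """Share of applicable units whose final value was filled in or changed."""
    eligible = [(r, f) for r, f, app in zip(raw, final, applicable) if app]
    imputed = sum(1 for r, f in eligible if f is not None and r != f)
    return imputed / len(eligible)


def structural_missingness_rate(applicable):
    """Share of units for which the item does not apply (missing by design)."""
    return sum(not app for app in applicable) / len(applicable)
```

Computing the same functions on the data set as it stands after each processing step, and for each survey year, gives the kind of evolution monitoring described on the slide.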

Summary: presentations in theme (v)
WP.27 An information model for a metadata-driven editing and imputation system (Finland)
- Possibility of developing a generic system for data editing and imputation, driven by metadata (no hard-coded parameters)
- Metadata information model influenced by Statistics Canada’s Banff
- Parameters (such as edit rules) defined as separate metadata objects
  - Facilitates, e.g., re-use of the same edit rule by different methods
  - Parameters are versioned so the outcome of an editing process is reproducible
- ‘Naïve’ data organisation model proposed; will be made more efficient
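
To make the idea of versioned edit rules as separate metadata objects concrete, a minimal sketch is given below. It does not reproduce the information model of WP.27 or the Banff system; the classes `EditRule` and `RuleRegistry`, the rule format (one variable compared with a bound), and the function `failed_rules` are hypothetical names invented for this example.

```python
import operator
from dataclasses import dataclass

# Comparison operators that a rule may refer to by name.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}


@dataclass(frozen=True)
class EditRule:
    """An edit rule stored as data, identified by (rule_id, version)."""
    rule_id: str
    version: int
    variable: str
    op: str
    bound: float

    def passes(self, record):
        return OPS[self.op](record[self.variable], self.bound)


class RuleRegistry:
    """Keeps every version of every rule; versions are never overwritten."""

    def __init__(self):
        self._rules = {}

    def register(self, rule):
        key = (rule.rule_id, rule.version)
        if key in self._rules:
            raise ValueError(f"rule {key} is already registered")
        self._rules[key] = rule

    def get(self, rule_id, version):
        return self._rules[(rule_id, version)]


def failed_rules(record, registry, rule_refs):
    """Return the (rule_id, version) references that the record violates."""
    return [ref for ref in rule_refs if not registry.get(*ref).passes(record)]
```

Because a process run refers to rules by identifier and version rather than containing them, the same rule can be shared by several methods, and an earlier run can be repeated with exactly the parameters that were in force at the time.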

Questions / Points for discussion
- Paper by Statistics Denmark: regular feedback (using process data) to improve data collection and data editing processes
  - Experiences of other statistical institutes?
- Indicators in papers by Statistics Denmark and SFSO are based on properties of data before and after editing/imputation
  - Advantage: indicators can be computed directly from available data
  - Limitation: no direct information on the effects of editing/imputation on the accuracy of statistical output (in terms of bias, variance)
  - Could impact of editing/imputation on accuracy of statistical output also be measured during regular production? Any experiences?

Questions / Points for discussion
- Selective editing of administrative data (e.g., VAT data)
  - Number of influential errors may be too large for manual follow-up
- Previous work sessions: ‘probabilistic selective editing’ proposed
  - Select random sample of records for manual follow-up (e.g., probability proportional to score)
  - Estimate correction term for the residual errors in the non-selected records
- Any experiences of statistical institutes in practice?
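
The two steps of probabilistic selective editing can be sketched as below. This is one possible reading, not the method from the earlier work sessions: it assumes Poisson sampling with inclusion probabilities proportional to score (capped at 1) and a Horvitz-Thompson type estimate of the error left in the non-selected records. All function names are hypothetical, and every record needs a positive score for the estimate to be unbiased.

```python
import random


def inclusion_probabilities(scores, expected_n):
    """Probabilities proportional to score, capped at 1."""
    total = sum(scores)
    return [min(1.0, expected_n * score / total) for score in scores]


def poisson_sample(probs, rng):
    """Select each record independently with its own probability."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def residual_error_estimate(sample, probs, reported, corrected):
    """Estimate the total error remaining in the non-selected records.

    `corrected` maps a sampled index to its value after manual follow-up.
    Each observed error (reported minus corrected) also represents
    (1 / p - 1) records that were not selected.
    """
    return sum((reported[i] - corrected[i]) * (1.0 / probs[i] - 1.0)
               for i in sample)
```

A usage sketch: draw `sample = poisson_sample(probs, random.Random(2017))`, follow up the sampled records manually, replace their reported values with the corrected ones, and subtract `residual_error_estimate(...)` from the resulting total.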