Standardized and modernized data editing in Statistics Denmark
Hanne-Pernille Stax, hps@dst.dk
Background
Main area of responsibility:
- Digitalisation of questionnaires for business statistics
- From 0 to 100 in 10 years
- 350,000 digital reports per year
Special area of interest:
- Lower response burden
- Better data quality
- Automatic data validation in online questionnaires
Online validation – so far
Types of checks:
- Data type
- Value: range / match
- Absolute error
- Likely error
- Relation between values
- Relation between new and known values
- Data confrontation / cross validation…
Examples of messages to the respondent:
- "Must be number"
- "Max 100 pct."
- "Please specify product"
- "High salary per reported employee"
- "High price compared to last report"
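To illustrate how the check types above can be expressed as rules, the sketch below encodes the five example messages, separating absolute errors (severity "error", which must be corrected) from likely errors (severity "warning", which the respondent may confirm). The variable names (turnover, export_share, product_other, product_text, salaries, employees, price), the thresholds and the error/warning behaviour are hypothetical assumptions, not Statistics Denmark's actual rules.

```python
from dataclasses import dataclass
from typing import Callable


def is_number(x) -> bool:
    """Data type check: True for int/float values (bool excluded)."""
    return isinstance(x, (int, float)) and not isinstance(x, bool)


@dataclass
class Check:
    message: str   # text shown to the respondent
    severity: str  # "error" = absolute error; "warning" = likely error (hypothetical convention)
    passes: Callable[[dict, dict], bool]  # (report, known_values) -> True if the check is OK


# Hypothetical variable names and thresholds, for illustration only.
CHECKS = [
    # Data type
    Check("Must be number", "error",
          lambda r, k: is_number(r.get("turnover"))),
    # Value range
    Check("Max 100 pct.", "error",
          lambda r, k: not is_number(r.get("export_share"))
          or 0 <= r["export_share"] <= 100),
    # Conditional completeness: if "other product" is ticked, a description is required
    Check("Please specify product", "error",
          lambda r, k: r.get("product_other") is not True
          or bool(r.get("product_text"))),
    # Relation between values (likely error)
    Check("High salary per reported employee", "warning",
          lambda r, k: not (is_number(r.get("salaries"))
                            and is_number(r.get("employees"))
                            and r["employees"] > 0)
          or r["salaries"] / r["employees"] <= 1_500_000),
    # Relation between new and known values (likely error)
    Check("High price compared to last report", "warning",
          lambda r, k: not (is_number(r.get("price"))
                            and is_number(k.get("price"))
                            and k["price"] > 0)
          or r["price"] / k["price"] <= 1.5),
]


def validate(report: dict, known: dict) -> list:
    """Return (severity, message) for every check that fails."""
    return [(c.severity, c.message) for c in CHECKS if not c.passes(report, known)]
```

Checks that depend on missing values simply pass here, so a respondent is not confronted with a relation check before both values have been entered; that is a design choice of the sketch, not a documented rule.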
Data confrontation – perspectives
Prefill each enterprise-specific online questionnaire with known values for that enterprise, from:
- Values from the prior report to the same survey
- Admin data
- Values/data from other surveys
Status: promising results; non-standard processes; call for data sharing
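A minimal sketch of enterprise-specific prefill, assuming known values are held per enterprise in two lookups: values from the prior report to the same survey and admin data. The function name build_prefill, the field names and the rule that the prior report takes precedence over admin data are hypothetical assumptions for illustration.

```python
def build_prefill(enterprise_id: str, fields: list,
                  prior_reports: dict, admin_data: dict) -> dict:
    """Prefill values for one enterprise's questionnaire.

    Returns {field: {"value": ..., "source": ...}}. Fields with no known
    value are left out, so the respondent fills them in from scratch.
    """
    prior = prior_reports.get(enterprise_id, {})
    admin = admin_data.get(enterprise_id, {})
    prefill = {}
    for name in fields:
        # Assumption: the prior report to the same survey takes precedence over admin data.
        if name in prior:
            prefill[name] = {"value": prior[name], "source": "prior_report"}
        elif name in admin:
            prefill[name] = {"value": admin[name], "source": "admin_data"}
    return prefill


# Usage with made-up data:
prior_reports = {"E1": {"employees": 12}}
admin_data = {"E1": {"employees": 11, "turnover": 5_000}}
questionnaire = build_prefill("E1", ["employees", "turnover", "exports"], prior_reports, admin_data)
```

Keeping the source next to each value lets the questionnaire tell the respondent where a prefilled figure came from.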
General reform of data editing at Statistics Denmark
- One system for storing raw and edited data for official statistics
- Standardize data editing across diverse production systems
- Prioritized editing of raw data from surveys and registers
- Automatic validation in online questionnaires for survey-based statistics
Why?
A standard approach to data editing, across divisions and production systems, to facilitate:
- efficient, transparent and robust processes
- resources aimed at a pre-set quality level
- knowledge sharing and mobility
When?
- 2015: project initiation
- 2022: all systems reformed
Who?
- Methods, IT, Data collection, Subject matter
How?
- Develop standard tools and methods for editing during and after data collection
- Analyse existing errors, the editing done and its effect
- Move checks from post-data-collection editing into the digital surveys
- Reform (prioritize) post-data-collection editing
Set-up for reforming data editing processes in survey-based statistics
- Review: current data quality, the editing process and the effect of each editing step
- Recommend: prioritized data editing
- Implement: data sets and editing in the Data Archive
- Recommend: online validation in questionnaires
- Implement: edit checks in questionnaires
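The slides do not say how editing is prioritized. One common technique is score-based selective editing: each unit gets a score measuring how much a suspicious value could affect the published total, and only units above a threshold are sent for manual review. The sketch below assumes that technique; the function names, the choice of expected value (for example the prior period's value) and the threshold are hypothetical.

```python
def local_score(reported: float, expected: float, weight: float, domain_total: float) -> float:
    """Weighted deviation from the expected value, relative to the estimated domain total."""
    return weight * abs(reported - expected) / domain_total


def prioritize(units: list, domain_total: float, threshold: float) -> list:
    """Return ids of units whose score exceeds the threshold, highest score first."""
    scored = [(local_score(u["reported"], u["expected"], u["weight"], domain_total), u["id"])
              for u in units]
    flagged = [(score, uid) for score, uid in scored if score > threshold]
    flagged.sort(key=lambda t: t[0], reverse=True)
    return [uid for _, uid in flagged]


# Usage with made-up data: "expected" could be, say, last period's value.
units = [
    {"id": "A", "reported": 1000, "expected": 100, "weight": 1},
    {"id": "B", "reported": 150, "expected": 100, "weight": 10},
    {"id": "C", "reported": 101, "expected": 100, "weight": 1},
]
review_list = prioritize(units, domain_total=10_000, threshold=0.01)
```

Units below the threshold are accepted without manual review, which is how resources can be aimed at a pre-set quality level rather than at every flagged value.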
Organisation
Board: Subject Matter, IT, Methods, Business Data Collection
Project manager
- Team 1: Process & Methods (Statistical Methods)
- Team 2: Data Archive (IT)
- Team 3: Online Validation (Business Data Collection)
- Team 4: Test and Super Users (Subject Matter)
Why reform? (As Is)
60,000 enterprises use 100 digital questionnaires on 1 platform for digital reporting, feeding 1 input database that splits into 100 separate stovepipes. These are connected to 100 separate data processing systems, each with its own data validation tools and its own data archiving processes, and with no systematic sharing or integration of data.
As Is
[Figure: 1 platform → input database → Stat 1 … Stat 5]
Data processing – in stovepipes (As Is)
[Figure: input database feeding Stat 1 … Stat 100, each with its own chain of processes (editing, integration, imputation, … process n)]
Prefill for online validation – from own stovepipe (As Is)
Challenges:
- Unique database
- Unique prefill system
- No updates during data collection
- Closed system – no data sharing
New Central Data Archive (To Be)
[Figure: input database; Stat 1 … Stat 100]
Standardized data editing – layered edited data (To Be)
[Figure: Data Editing]
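One way to read "layered edited data" is that raw values are kept untouched and each editing step is stored as its own layer on top, so the effect of every step can be measured afterwards. The sketch below assumes that interpretation; the class LayeredRecord, its methods and the step names are hypothetical, not the design of the Data Archive.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LayeredRecord:
    unit_id: str
    raw: dict                                   # values as reported; never overwritten
    layers: list = field(default_factory=list)  # [(step_name, {variable: value}), ...] in order applied

    def apply(self, step: str, changes: dict) -> None:
        """Add an editing layer instead of overwriting earlier values."""
        self.layers.append((step, dict(changes)))

    def value(self, variable: str, up_to_step: Optional[str] = None):
        """Value after all layers, or after the named step if given."""
        val = self.raw.get(variable)
        for step, changes in self.layers:
            if variable in changes:
                val = changes[variable]
            if step == up_to_step:
                break
        return val

    def effect(self) -> dict:
        """Number of values each editing step actually changed."""
        current = dict(self.raw)
        counts = {}
        for step, changes in self.layers:
            changed = sum(1 for k, v in changes.items() if current.get(k) != v)
            counts[step] = counts.get(step, 0) + changed
            current.update(changes)
        return counts


# Usage with made-up data and hypothetical step names:
rec = LayeredRecord("E1", {"turnover": 5000, "employees": 0})
rec.apply("micro_edit", {"employees": 12})
rec.apply("imputation", {"exports": 300})
rec.apply("macro_edit", {"employees": 12})
```

A step that changes nothing (here the last one) shows up with a count of zero, which is the kind of evidence a review of editing effects can use to drop or deprioritize steps.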
Systematic use of edited data in online validation/data confrontation (To Be)
Pre-populate with known values from:
- Preceding reports
- Other surveys
- Other sources
Online data confrontation and editing
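A minimal sketch of online data confrontation against several sources of known values, assuming the known values for the enterprise are grouped by source and that a newly entered value is flagged when its relative deviation from a known value exceeds a tolerance. The source names, the 30 % tolerance and the function confront are hypothetical.

```python
# Hypothetical source names, checked in this order.
SOURCE_ORDER = ["preceding_report", "other_survey", "other_source"]


def confront(variable: str, new_value: float, known: dict, tolerance: float = 0.3) -> list:
    """Compare a newly entered value with known values; return a message per large deviation."""
    messages = []
    for source in SOURCE_ORDER:
        old = known.get(source, {}).get(variable)
        if old is None or old == 0:  # nothing to compare with (or relative deviation undefined)
            continue
        rel_dev = abs(new_value - old) / abs(old)
        if rel_dev > tolerance:
            messages.append(f"{variable}: {new_value} differs {rel_dev:.0%} from {source} ({old})")
    return messages


# Usage with made-up known values for one enterprise:
known = {
    "preceding_report": {"employees": 10},
    "other_survey": {"employees": 11},
    "other_source": {"employees": 20},
}
msgs = confront("employees", 20, known)
```

Naming the source in each message lets the respondent see which earlier figure the new value is being confronted with.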
The way ahead
- A stronger evidence base for implementing automatic data validation in online questionnaires
- New possibilities for intelligent and flexible cross validation between new and known data
- New possibilities for smart reporting from large and complex units
- A need to think ahead when data editing is moved from post-data-collection processing into the online data collection process (GSBPM)
Thank you for your attention
Contact information: Hanne-Pernille Stax, HPS@dst.dk