Theme (i): New and emerging methods
UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE
CONFERENCE OF EUROPEAN STATISTICIANS
Work Session on Statistical Data Editing
24-26 April 2017, The Hague, Netherlands
New perspectives for data editing in the context of new data sources and data integration
New and emerging methods
This topic aims to bring together contributions that include innovative ideas and applications related to various aspects of editing and imputation. Of special interest are methods for the detection and amendment of errors in alternative or new data sources, including administrative and big data, either on their own or in combination with survey data. They should also highlight the expected impact that new methods might have on the statistical agency, including how they contribute to standardizing concepts, terminology, methods, data structures and the quality of its data products.
Seven papers 1/2
Correcting for misclassification under edit restrictions in combined survey-register data using Multiple Imputation Latent Class modelling (MILC) – Netherlands. Multi-source data: how to include a covariate when the LC model is constructed using the ‘three-step’ approach.
Simplifying constraints in data editing – Netherlands. How algorithms from Operations Research and Artificial Intelligence can be used to simplify real-life edit sets.
An automatic procedure for selecting weights in kNN imputation – Austria. Selection of proper weights for the distance variables when applying k-Nearest-Neighbour imputation to different kinds of variables.
Imputation methods satisfying constraints – Netherlands. An overview of imputation methods that satisfy edit restrictions and/or constraints due to known or previously estimated totals.
Seven papers 2/2
Evaluating the quality of business survey data before and after automatic editing – Netherlands. Evaluating the measurement quality of automatic editing methods by modelling the residual measurement errors in the data.
Computational estimates of data editing related variance – Netherlands. The process of data editing is considered to be part of the estimator. The purpose is to estimate the total variance of the estimator, including the stochastic contribution due to the editing steps.
A comparison of Kokic and Bell and conditional bias methods for outlier treatment – France. Comparing two outlier treatment methods in a simulation study on real data with a complicated sampling design.
Enjoy the presentations!
Discussion 1/3
MILC method by the ‘three-step’ approach: is the covariate considered error-free or not? Could more covariates be included?
Simplifying constraints in data editing: how efficient is it in terms of both cost and time? Despite the simplification of constraints they offer, such algorithms are not often applied in the field of data editing. Why?
Selecting weights in kNN imputation: the Lasso weights are set equal to 1 for all variables with non-zero regression coefficients. Why not differentiate by using the respective coefficients themselves as weights?
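To make the kNN question concrete, here is a minimal sketch, not the paper's procedure, that contrasts the two weighting schemes in a weighted-distance nearest-neighbour imputation. Everything in it is an assumption made for illustration: the function names, the toy donor records, the hypothetical Lasso coefficients, the use of a weighted Euclidean distance on numeric variables only, and the use of absolute coefficients as weights (which presumes the distance variables are on comparable scales).

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two records over the distance variables."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_impute(recipient, donors, donor_targets, weights, k=1):
    """Impute the missing target as the mean target value of the k nearest donors."""
    ranked = sorted(
        range(len(donors)),
        key=lambda i: weighted_distance(recipient, donors[i], weights),
    )
    nearest = ranked[:k]
    return sum(donor_targets[i] for i in nearest) / len(nearest)

def binary_weights(coefs, tol=1e-12):
    """Weight 1 for every variable with a non-zero coefficient, 0 otherwise."""
    return [1.0 if abs(c) > tol else 0.0 for c in coefs]

def coefficient_weights(coefs):
    """Alternative raised in the discussion: use the coefficient sizes as weights."""
    return [abs(c) for c in coefs]

# Hypothetical Lasso coefficients for three distance variables (illustrative only).
coefs = [2.0, 0.1, 0.0]

# Toy donor records (values of the distance variables) and their observed targets.
donors = [(1.0, 5.0, 9.0), (3.0, 1.0, 0.0), (1.5, 4.0, 2.0)]
donor_targets = [10.0, 20.0, 30.0]

# Recipient record whose target value is missing.
recipient = (1.0, 1.0, 0.0)

imputed_binary = knn_impute(recipient, donors, donor_targets, binary_weights(coefs))
imputed_coef = knn_impute(recipient, donors, donor_targets, coefficient_weights(coefs))
print(imputed_binary, imputed_coef)
```

In this toy setting the two schemes select different donors, because the coefficient-based weights let the first variable dominate the distance. Whether that is desirable in practice is exactly the question put to the authors.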
Discussion 2/3
An overview of imputation methods: an “ideal” imputation method has not been found, nor does one exist. As always, an imputation method can be good for some purposes but bad for others. Should the choice between imputing and weighting be treated as a purely practical question?
Evaluating the quality of business survey data: consider the following automatic editing procedure: “in case of potential errors, set the changeable value(s) to that of the most reliable unchangeable input source”. How would this method look under the same evaluation approach? More generally, how can numerical consistency be distinguished from truthfulness?
Discussion 3/3
Data editing related variance: (for the audience) are there experiences with estimating imputation variance at NSIs? What methods are used? Is the proposed method an option?
Outlier treatment: both methods give similar results. Which one is going to be applied? Are there theoretical or practical reasons to choose one over the other?