Topic (I) Selective and macro editing: Introduction
Jeroen Pannekoek, Statistics Netherlands
Work Session on Statistical Data Editing, Oslo, Norway, 24 September 2012
Selection for manual editing
The objective of macro editing and selective editing is to limit time-consuming and costly manual editing (reviewing, treatment or re-contact) as much as possible without substantially decreasing output quality. Both macro editing and selective editing are selection methods with the common purpose of selecting, for interactive (manual) review, the units with potentially influential errors: that is, directing manual effort to the cases with the highest expected benefit.
Issues covered
Selective editing (scoring):
Define a score for each unit such that the units with high scores are those containing the most (potentially) influential errors.
Define a threshold value for the scores; only units with scores above the threshold are reviewed and treated manually.
The threshold value can be obtained by evaluating the "pseudo bias" in the estimates caused by not editing all units.
Macro editing (aggregate method):
How to identify suspect aggregates.
How to drill down to the responsible units.
The papers in this session cover applications and extensions of traditional selection methods and discuss issues in their implementation.
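The scoring-and-threshold mechanism above can be sketched as follows. This is a minimal illustration with made-up data: the influence-type score, the anticipated values and the 5% threshold are all assumptions for the example, not taken from any of the session papers.

```python
import numpy as np

# Hypothetical micro-data: raw reported values, anticipated values
# (e.g. from a previous period) and survey weights.
raw = np.array([105.0, 98.0, 5000.0, 110.0, 95.0])
anticipated = np.array([100.0, 100.0, 100.0, 100.0, 100.0])
weights = np.ones(5)

# Influence-type score: weighted absolute deviation from the anticipated
# value, scaled by the anticipated total, so the score approximates the
# unit's relative influence on the estimated total.
scores = weights * np.abs(raw - anticipated) / np.sum(weights * anticipated)

threshold = 0.05  # review units that could shift the total by more than 5%
flagged = np.where(scores > threshold)[0]

# "Pseudo bias": relative difference between the partially edited total
# (only flagged units corrected) and the fully edited total, taking the
# anticipated values as a stand-in for the true, fully edited values.
edited = raw.copy()
edited[flagged] = anticipated[flagged]
pseudo_bias = (np.sum(weights * edited) - np.sum(weights * anticipated)) \
    / np.sum(weights * anticipated)
```

Lowering the threshold flags more units and shrinks the pseudo bias; the threshold is chosen so the pseudo bias stays within an acceptable tolerance.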
Issues covered
Some papers also discuss more recent, model-based approaches to selective editing. These model the error mechanism: observed value = true value + error, with the error a random variable. Optimal selection of units, construction of score functions and setting of threshold values can all use model-based evaluations of the error.
Presentations
Long presentations from Spain and Italy. Both papers use models for the error distribution; selection is based on model-based estimates of the error rather than on traditional score functions.
Spain: Optimization as a theoretical framework for selective editing. Minimize the number of units to edit subject to a bound on the mean squared measurement error of the non-edited units.
Italy: Multivariate selective editing via mixture models: first applications to Italian structural business surveys. Shows an application of a scoring approach based on a mixture model to structural business statistics.
Presentations
Census Bureau: An application of selective editing to the U.S. Census Bureau trade data. A scoring method for selective editing of foreign trade data; evaluation of pseudo bias.
Sweden: Tree analysis – a method for constructing edit groups. The acceptance region can differ between homogeneous subgroups of the data.
Break (20 minutes)
Germany: An automated comparison of statistics. Automatic checking of aggregates and flagging of suspicious records (drilling down).
Presentations
Sweden: The use of evaluation data sets when implementing selective editing. Implementation issues of selective editing, especially with respect to setting threshold values.
Italy: Selective editing as a part of the estimation procedure. Explores the possibility of using a sample from the non-edited units to estimate, and correct for, the bias due to not editing all units.
Enjoy the presentations!
Summary
Spain paper (optimization as a theoretical framework): Defines selective editing as an optimization problem: choose the minimal subset of records for editing such that the remaining measurement error stays below a specified fraction.
Italian paper (application of a contamination model): Uses a (different) model-based approach, derives score functions from the model-predicted values and applies them to structural business statistics.
Italian paper (selective editing and estimation): Uses the SeleMix approach to selection and considers sampling, with probabilities proportional to the scores, from the unedited units to estimate, and correct for, the bias.
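Sampling from the unedited units with probabilities proportional to their scores, as in the last summary point, can be sketched as a Horvitz-Thompson-type bias estimate. The Poisson sampling design and all numbers below are illustrative assumptions, not the procedure of the Italian paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scores of the non-edited units (all below the threshold).
scores = np.array([0.010, 0.020, 0.005, 0.040, 0.015])

# Poisson sampling sketch: inclusion probability proportional to score,
# scaled to an expected sample size of n_sample, capped at 1.
n_sample = 2
probs = np.minimum(1.0, n_sample * scores / scores.sum())
sampled = rng.random(scores.size) < probs

# Suppose re-inspection of the sampled units reveals these corrections
# (observed minus true; zero where no error is found).
errors = np.array([0.0, 1.0, 0.0, 3.0, 0.0])

# Horvitz-Thompson estimate of the total remaining error (the bias
# of the estimate based on the unedited values).
bias_estimate = np.sum(errors[sampled] / probs[sampled])
```

Weighting each inspected correction by the inverse of its inclusion probability makes the estimate unbiased for the total error in the unedited part, which can then be subtracted from the published estimate.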
Discussion
The Spanish approach needs a covariance matrix of measurement errors, estimated from fully edited previous data, and it is demanding to obtain this matrix. Is there an indication of how sensitive the solution (the selection of units) is to misspecification of this covariance matrix?
In the mixture model there is a mixing probability: the expected proportion of error-free data. This seems a quantity of interest in itself. Can it say something about the quality of the data and the amount of editing necessary? Can the posterior (unit-level) probability that an observation is free of error be used as a "score"?
The errors have zero mean under the model, so editing reduces only variance and not bias? Can the model still be usefully applied when this assumption is violated, for example for data with large positive errors (thousand errors)?
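The posterior-probability-as-score idea raised in the discussion can be illustrated with a simple univariate two-component contamination model: error-free observations follow a normal distribution and contaminated ones a variance-inflated component. The parameter values below are invented for the illustration and are not those of the SeleMix model.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), evaluated elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Assumed model parameters: mixing probability p (expected proportion of
# error-free data), clean mean/sd, and a variance-inflation factor c for
# the contaminated component.
p, mu, sigma, c = 0.9, 100.0, 5.0, 10.0

def posterior_error_free(y):
    """Posterior probability that each observation is error-free (Bayes rule)."""
    clean = p * normal_pdf(y, mu, sigma)
    contaminated = (1.0 - p) * normal_pdf(y, mu, c * sigma)
    return clean / (clean + contaminated)

y = np.array([101.0, 98.0, 160.0])
post = posterior_error_free(y)

# A low posterior probability of being error-free serves as the score:
flagged = np.where(post < 0.5)[0]
```

Under this reading, the mixing probability p summarizes overall data quality, while the unit-level posterior ranks individual observations for review.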
Summary
Census Bureau: Selective editing of trade data. A scoring method that modifies the HB method, since previous values are now available; evaluation of pseudo bias.
Sweden: Tree model for constructing edit groups. Soft edits are used to detect suspicious values; the acceptance region can differ between homogeneous subgroups of the data.
Germany: An automated comparison of statistics. Automatic checking of aggregates and flagging of suspicious records (drilling down), based on principal components instead of the original variables.
Sweden: Use of evaluation data sets. Implementation issues of selective editing, especially with respect to setting threshold values; advice on how to choose evaluation data sets.
Discussion
Is the tree-based method for creating edit groups already applied in practice? Are there practical experiences?
Macro editing using principal components: is it necessary that the "true" (error-free) correlation structures of the reference and actual data sets are similar?
Evaluation of the absolute pseudo bias: is this expensive, since it requires fully edited data? How often do we need such evaluations? Can it also be done for only a sample from the non-edited units?
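The aggregate-check-and-drill-down pattern discussed for macro editing can be sketched as follows: flag an aggregate whose change against the reference data exceeds a tolerance, then rank units by their contribution to that change. The 20% tolerance and the data are illustrative assumptions; a principal-components variant would apply the same check to component scores instead of the original variable.

```python
import numpy as np

# Hypothetical micro-data for one variable: current vs. reference period.
current = np.array([100.0, 120.0, 9000.0, 95.0])
reference = np.array([98.0, 118.0, 110.0, 97.0])

# Macro check: flag the aggregate if it moved by more than 20%.
rel_change = abs(current.sum() - reference.sum()) / reference.sum()

suspects = np.array([], dtype=int)
if rel_change > 0.20:
    # Drill down: rank the units by their contribution to the change,
    # largest first, so review starts with the most influential record.
    contrib = np.abs(current - reference)
    suspects = np.argsort(contrib)[::-1]
```

The drill-down delivers an ordered review list rather than a yes/no flag, so manual effort can stop once the remaining contributions are negligible.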