Jeroen Pannekoek - Statistics Netherlands Work Session on Statistical Data Editing Oslo, Norway, 24 September 2012 Topic (I) Selective and macro editing.

Presentation transcript:

Jeroen Pannekoek - Statistics Netherlands Work Session on Statistical Data Editing Oslo, Norway, 24 September 2012 Topic (I) Selective and macro editing Introduction

Selection for manual editing
The objective of macro editing and selective editing is to limit time-consuming and costly manual editing (review, treatment or re-contact) as much as possible without substantially decreasing output quality. Both are selection methods with the common purpose of selecting units with potentially influential errors for interactive (manual) review, so that manual effort goes to the cases with high expected benefit.

Issues covered
Selective editing (scoring): Define a score for each unit such that units with high scores contain the most (potentially) influential errors. Define a threshold value for the scores; only units with scores above the threshold are reviewed and treated manually. The threshold value can be obtained by evaluating the “pseudo bias” in estimates due to not editing all units.
Macro editing (aggregate method): How to identify suspect aggregates and how to drill down to the responsible units.
Papers in this session will cover applications and extensions of traditional selection methods and discuss issues in their implementation.
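The scoring steps above can be sketched as follows. This is a minimal illustration, not a method taken from any of the session papers: the score definition (weighted absolute deviation from an anticipated value, scaled by the anticipated total), the function names and the example figures are all assumptions. The pseudo bias is evaluated on a data set for which both raw and fully edited values are available.

```python
def local_scores(raw, anticipated, weights):
    """Assumed score: weighted absolute deviation from the anticipated value,
    scaled by the anticipated weighted total."""
    total = sum(w * a for w, a in zip(weights, anticipated))
    return [w * abs(y - a) / total for y, a, w in zip(raw, anticipated, weights)]


def pseudo_bias(raw, edited, weights, scores, threshold):
    """Relative difference between the estimate in which only units scoring
    above `threshold` are edited and the estimate from fully edited data."""
    full = sum(w * e for w, e in zip(weights, edited))
    partial = sum(
        w * (e if s > threshold else r)
        for r, e, w, s in zip(raw, edited, weights, scores)
    )
    return abs(partial - full) / abs(full)


def choose_threshold(raw, edited, weights, scores, tolerance):
    """Largest candidate threshold whose pseudo bias stays within `tolerance`.
    The last candidate (-inf) edits every unit, so the pseudo bias is zero."""
    candidates = sorted(set(scores), reverse=True) + [float("-inf")]
    for t in candidates:
        if pseudo_bias(raw, edited, weights, scores, t) <= tolerance:
            return t


# Hypothetical evaluation data: unit 1 contains a factor-10 error.
raw = [100, 5200, 98, 310, 45]
edited = [100, 520, 98, 301, 45]
anticipated = [95, 500, 100, 300, 50]
weights = [1, 1, 1, 1, 1]

scores = local_scores(raw, anticipated, weights)
threshold = choose_threshold(raw, edited, weights, scores, tolerance=0.01)
selected = [i for i, s in enumerate(scores) if s > threshold]
```

Because errors can cancel, the pseudo bias need not decrease monotonically as the threshold is lowered; the sketch simply returns the largest threshold that meets the tolerance.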

Issues covered
Some papers will also discuss more recent, model-based approaches to selective editing. These model the error mechanism: observed value = true value + error, with the error a random variable. Optimal selection of units, construction of score functions and setting of threshold values may all use model-based evaluations of error.
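One way to make "observed value = true value + error" concrete is a two-component contamination model. The sketch below is an illustration under strong assumptions that are not taken from the papers: a single variable, known parameters, a normally distributed true value, and an error that is zero with probability `pi_clean` and otherwise normal with zero mean, so that contaminated observations have variance `inflation * var`. All names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def prob_error_free(y, mean, var, pi_clean, inflation):
    """Posterior probability (Bayes' rule) that observation y contains no error."""
    clean = pi_clean * normal_pdf(y, mean, var)
    contaminated = (1.0 - pi_clean) * normal_pdf(y, mean, inflation * var)
    return clean / (clean + contaminated)


def expected_error(y, mean, var, pi_clean, inflation):
    """Expected error given y: zero if clean; if contaminated, the true value is
    predicted by shrinking y towards the mean with factor 1 / inflation."""
    p_clean = prob_error_free(y, mean, var, pi_clean, inflation)
    return (1.0 - p_clean) * (1.0 - 1.0 / inflation) * (y - mean)


def model_score(y, weight, mean, var, pi_clean, inflation):
    """Assumed score: weighted absolute expected error."""
    return weight * abs(expected_error(y, mean, var, pi_clean, inflation))
```

With `mean=100`, `var=100`, `pi_clean=0.9` and `inflation=25`, an observation of 160 gets a posterior probability of being error free close to zero and therefore a high score, while an observation of 105 gets a score near zero.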

Presentations
Spain: Optimization as a theoretical framework to selective editing. Minimize the number of units to edit subject to a bound on the mean squared measurement errors of the non-edited units.
Italy: Multivariate selective editing via mixture models: first applications to Italian structural business surveys. Shows the application of a scoring approach based on a mixture model to structural business statistics.
These are the long presentations of the session. Both papers use models for the error distribution; selection is based on model-based estimates of error rather than on traditional score functions.
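The optimization idea in the Spanish contribution can be illustrated with a deliberately simplified case. The sketch assumes a single variable and uncorrelated errors, so that the expected squared error remaining after editing is just the sum of per-unit terms; the paper's own formulation is more general, and the function name and inputs here are hypothetical. Under this assumption, editing the units with the largest expected squared errors first gives the smallest edit set.

```python
def minimal_edit_set(expected_sq_errors, bound):
    """Indices of the units to edit so that the summed expected squared error
    of the units left unedited does not exceed `bound`.

    Units are taken in decreasing order of expected squared error, which
    minimises the number of edited units when the terms simply add up.
    """
    order = sorted(
        range(len(expected_sq_errors)),
        key=lambda i: expected_sq_errors[i],
        reverse=True,
    )
    remaining = sum(expected_sq_errors)
    selected = []
    for i in order:
        if remaining <= bound:
            break
        selected.append(i)
        remaining -= expected_sq_errors[i]
    return selected
```

For example, `minimal_edit_set([9.0, 1.0, 4.0, 0.5], bound=2.0)` selects units 0 and 2 and leaves an expected squared error of 1.5 among the unedited units.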

Presentations
Census Bureau: An application of selective editing to the U.S. Census Bureau trade data. A scoring method for selective editing of foreign trade data. Evaluation of pseudo bias.
Sweden: Tree analysis – a method for constructing edit groups. The acceptance region can differ between homogeneous subgroups of the data.
Break (20 minutes)
Germany: An automated comparison of statistics. Automatic checking of aggregates and flagging of suspicious records (drilling down).
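Checking aggregates and drilling down to the responsible records can be sketched generically as below. This is not the method of the German paper: the record layout `(unit, group, value_now, value_ref)`, the relative-change rule and the function names are assumptions made for illustration, with the reference value standing for any comparison figure such as the previous period.

```python
from collections import defaultdict


def flag_aggregates(records, tolerance):
    """Return the groups whose total deviates from the reference total
    by more than `tolerance` (as a fraction of the reference total)."""
    now = defaultdict(float)
    ref = defaultdict(float)
    for _unit, group, value_now, value_ref in records:
        now[group] += value_now
        ref[group] += value_ref
    return [g for g in now if abs(now[g] - ref[g]) > tolerance * abs(ref[g])]


def drill_down(records, group, top=3):
    """Units of a suspect group, ranked by their absolute contribution
    to the change in the group total."""
    members = [
        (unit, value_now - value_ref)
        for unit, g, value_now, value_ref in records
        if g == group
    ]
    members.sort(key=lambda m: abs(m[1]), reverse=True)
    return members[:top]


# Hypothetical records: unit "b" drives the jump in the "retail" total.
records = [
    ("a", "retail", 100, 98),
    ("b", "retail", 900, 95),
    ("c", "retail", 50, 52),
    ("d", "construction", 200, 205),
    ("e", "construction", 310, 300),
]
suspect = flag_aggregates(records, tolerance=0.10)
culprits = drill_down(records, "retail", top=1)
```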

Presentations
Sweden: The use of evaluation data sets when implementing selective editing. Implementation issues of selective editing, especially with respect to setting threshold values.
Italy: Selective editing as a part of the estimation procedure. Explores the possibility of using a sample from the non-edited units to estimate and correct the bias due to not editing all units.
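The idea of sampling from the non-edited units to estimate the remaining bias can be illustrated with a standard Hansen-Hurwitz estimator under sampling with replacement and probabilities proportional to the scores. The sketch is an assumption-laden illustration rather than the estimator of the Italian paper: `review` is a hypothetical callback standing for manual editing of one unit, and all scores of non-edited units are assumed to be strictly positive.

```python
import random


def estimate_bias(unedited_ids, scores, weights, raw, review, n, rng):
    """Estimate the weighted total error left in the non-edited units.

    A sample of size n is drawn with replacement, with probabilities
    proportional to the scores; each sampled unit is reviewed and its
    weighted error is divided by its selection probability.
    """
    total_score = sum(scores[i] for i in unedited_ids)
    probs = [scores[i] / total_score for i in unedited_ids]
    sample = rng.choices(range(len(unedited_ids)), weights=probs, k=n)
    terms = []
    for pos in sample:
        i = unedited_ids[pos]
        error = raw[i] - review(i)  # observed minus edited value
        terms.append(weights[i] * error / probs[pos])
    return sum(terms) / n
```

The estimate can then be subtracted from the weighted total of the raw values of the non-edited units. When the weighted errors are exactly proportional to the scores, every sampled unit yields the same term and the estimate equals the true total error, which is why well-calibrated scores make this design efficient.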

Presentations Enjoy the presentations!

Summary
Spain paper (optimization as theoretical framework): Defines selective editing as an optimization problem: choose the minimal subset of records for editing such that the remaining measurement error is below a specified fraction.
Italian paper (application of contamination model): Uses a different model-based approach, derives score functions from the model-predicted values and applies them to structural business statistics.
Italian paper (selective editing and estimation): Uses the SeleMix approach to selection and considers sampling from the unedited units, with probabilities proportional to the scores, to estimate and correct for bias.

Discussion
The approach of Spain needs a covariance matrix of measurement errors, estimated on fully edited previous data. It is demanding to obtain this matrix. Is there an indication of how sensitive the solution (the selection of units) is to misspecification of this covariance matrix?
In the mixture model there is a mixing probability, which is the expected proportion of error-free data. This seems a quantity of interest in itself. Can it say something about the quality of the data and the amount of editing necessary? Can the posterior (unit-level) probability of an observation being free of error be used as a “score”?
The errors have zero mean under the model. So does editing reduce only variance and not bias? Can the model still be usefully applied when this assumption is violated? What about data with large positive errors (thousand errors)?

Summary
Census Bureau: Selective editing of trade data. A scoring method that modifies HB-method since now previous values are available. Evaluation of pseudo bias.
Sweden: Tree model for constructing edit groups. Soft edits are used to detect suspicious values. The acceptance region can differ between homogeneous subgroups of the data.
Germany: An automated comparison of statistics. Automatic checking of aggregates and flagging of suspicious records (drilling down). Based on principal components instead of the original variables.
Sweden: Use of evaluation data sets. Implementation issues of selective editing, especially with respect to setting threshold values. Advice on how to choose evaluation data sets.

Discussion
Is the tree-based method for creating edit groups already applied? Are there practical experiences?
Macro editing using principal components: is it necessary that the “true” correlation structures (with no errors) of the reference and the actual data set are similar?
Evaluation of absolute pseudo bias: is this expensive, since it requires fully edited data? How often do we need such evaluations? Can it also be done for only a sample from the non-edited units?