Automatic Editing Data. A New Version of DIA System. Prepared by J.M. Gomez. Presented by D. Lorca, National Statistical Institute of Spain.



Summary DIA system: generalized software for automatic editing and imputation of qualitative data, based on the Fellegi-Holt methodology. Aims of the new version: – A heuristic algorithm to extend the DIA system to continuous and integer data – A modified treatment of systematic errors to avoid possible re-imputations

The Heuristic Algorithm It solves the error localisation (EL) problem without having to calculate the Complete Set of Edits (CSE). In most real cases it also avoids having to break down the Set of Explicit Edits (SEE).

The Heuristic Algorithm When a record R0 fails an edit, we determine the minimum set (MS) of variables to impute by working not with the CSE but with the SEE plus, if necessary, a small set of implicit edits (SIE) required specifically to impute the erroneous record R0.
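To make the error localisation problem concrete, here is a minimal Python sketch that finds a minimum set of variables to impute by exhaustive search over the explicit edits. It is not the DIA heuristic, whose details the slides do not give. The variables, domains, edits, and function names are all hypothetical, and the brute-force search is only practical for toy examples.

```python
from itertools import combinations, product

# Hypothetical toy domains and explicit edits; not taken from the DIA system.
DOMAINS = {
    "age": ["child", "adult"],
    "marital": ["single", "married"],
    "employed": ["no", "yes"],
}

# Each edit maps variables to the values that, taken jointly, make a record fail.
EDITS = [
    {"age": {"child"}, "marital": {"married"}},
    {"age": {"child"}, "employed": {"yes"}},
]


def fails(record, edit):
    """A record fails an edit when every variable in the edit takes a listed value."""
    return all(record[var] in values for var, values in edit.items())


def passes_all(record, edits):
    """True when the record fails none of the edits."""
    return not any(fails(record, edit) for edit in edits)


def minimum_set(record, edits, domains):
    """Return a smallest set of variables that can be re-valued so the record
    passes every edit (exhaustive search, smallest subsets first)."""
    variables = sorted(domains)
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            # Try every combination of values for the chosen variables.
            for values in product(*(domains[v] for v in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if passes_all(candidate, edits):
                    return set(subset)
    return None  # no consistent record exists for these edits


r0 = {"age": "child", "marital": "married", "employed": "yes"}
print(minimum_set(r0, EDITS, DOMAINS))
```

In this toy case the record fails both edits, yet changing the single variable `age` repairs it, so the minimum set has one element rather than two.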

The Heuristic Algorithm Labour Survey: Variables: 146; Explicit edits: 1,500; Valid values: 3,521

The Heuristic Algorithm Break-down SEE: current version DIA 5, new version DIA 1. Number of imputations: current version DIA 27,421, new version DIA 26,825.

Treatment of systematic errors The DIA system contains a module for processing systematic errors: the Rules of Deterministic Imputation (RDIs). Example: we assume that a systematic error has occurred if a record has the values A=1, B=2 and C=3; if so, we impute the value ‘Blank’ to the variable B.
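The example rule can be sketched in Python as follows. The dictionary layout, the function name `apply_rdi`, and the use of `None` to encode ‘Blank’ are assumptions made for illustration. The slides do not show how DIA stores or executes RDIs.

```python
BLANK = None  # assumed encoding of the 'Blank' value

# Hypothetical representation of an RDI: the systematic-error pattern
# and the imputation it triggers.
RDI = {
    "condition": {"A": 1, "B": 2, "C": 3},
    "impute": {"B": BLANK},
}


def apply_rdi(record, rdi):
    """Return a copy of the record, with the imputation applied
    only when the record matches the RDI's condition."""
    matches = all(record.get(var) == val for var, val in rdi["condition"].items())
    if matches:
        return {**record, **rdi["impute"]}
    return dict(record)


print(apply_rdi({"A": 1, "B": 2, "C": 3}, RDI))
print(apply_rdi({"A": 1, "B": 5, "C": 3}, RDI))
```

The first record matches the pattern and has B set to Blank. The second does not match and is returned unchanged.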

Treatment of systematic errors RDI example: the left-hand side of the equals sign expresses the systematic error, and the right-hand side specifies the imputation.

Treatment of systematic errors Current version: first, the DIA system executes the RDIs; afterwards, it imputes data following the Fellegi-Holt methodology. The gap between the two types of process can lead to re-imputations. To avoid them, we define a new edit named the Deterministic Imputation Edit (DIE).

Treatment of systematic errors Steps to convert an RDI into a DIE: 1) The failure condition imposed on the Deterministic Imputation (DI) variable in the RDI becomes a failure condition imposed on a new variable, named the image of the DI variable (IMA_DIA), in the DIE. 2) The complement (¬) of the imputation in the RDI becomes a failure condition imposed on the DI variable in the DIE.
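The two conversion steps can be illustrated with a short Python sketch applied to the rule "if A=1, B=2 and C=3, impute Blank to B". The edit representation (variable mapped to a set of failing values), the `IMA_` prefix for the image variable, the assumed domain of B, and the function names are hypothetical. The actual DIA notation is not shown in the slides.

```python
BLANK = "Blank"  # assumed encoding of the 'Blank' value


def rdi_to_die(condition, di_var, imputed_value, domain, image_prefix="IMA_"):
    """Convert an RDI (condition plus imputation on di_var) into a DIE.

    condition: dict mapping each variable to the set of values forming
    the systematic-error pattern.
    domain: all valid values of the DI variable, needed for the complement.
    """
    die = {}
    for var, values in condition.items():
        if var == di_var:
            # Step 1: the condition on the DI variable moves to its image.
            die[image_prefix + var] = set(values)
        else:
            die[var] = set(values)
    # Step 2: the complement of the imputed value becomes a failure
    # condition on the DI variable itself.
    die[di_var] = set(domain) - {imputed_value}
    return die


def fails(record, edit):
    """A record fails an edit when every variable in the edit takes a listed value."""
    return all(record[var] in values for var, values in edit.items())


die = rdi_to_die({"A": {1}, "B": {2}, "C": {3}}, "B", BLANK, domain={1, 2, 3, BLANK})

# The image variable holds the originally reported value of B.
raw = {"A": 1, "B": 2, "C": 3, "IMA_B": 2}
imputed = {**raw, "B": BLANK}
print(fails(raw, die), fails(imputed, die))
```

Under these assumptions the raw record fails the DIE, while the record with B set to Blank passes it. The edit therefore expresses the deterministic imputation as an ordinary failure condition.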

Treatment of systematic errors RDI example: DIE example: Both edits express the same condition, and the DIE matches the normal form of edits required by the Fellegi-Holt model.

Treatment of systematic errors The DIA system calculates the MS of variables to impute taking both types of error into account together. Since the MS cannot contain repeated variables, the possibility of re-imputation disappears.

Conclusions (I) The heuristic algorithm presented makes it possible to extend the DIA system to quantitative data. In most real cases it avoids having to break down the SEE into several subsets, which reduces the number of imputations.

Conclusions (II) The DIE makes it possible to integrate edits expressing systematic errors with edits expressing random errors, in accordance with the Fellegi-Holt model. We can thus apply the DIA system to both types of error simultaneously, avoiding possible re-imputations.