Automatic Editing Data. A New Version of DIA System
Prepared by J.M. Gomez
Presented by D. Lorca
National Statistical Institute of Spain
Summary
DIA system: generalized software for the automatic editing and imputation of qualitative data, based on the Fellegi-Holt methodology.
– A heuristic algorithm to extend the DIA system to continuous and integer data
– A modified treatment of systematic errors to avoid possible re-imputations
The Heuristic Algorithm
– It solves the error localisation (EL) problem without having to calculate the Complete Set of Edits (CSE).
– In most real cases, it avoids having to break down the Set of Explicit Edits (SEE).
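To see why calculating the CSE is costly, here is a minimal Python sketch of the classic Fellegi-Holt step that generates one implicit edit from explicit ones; building the CSE means repeating this over many combinations of edits and generating variables. Edits are assumed to be in normal form, represented (hypothetically) as a dict from variable name to the set of values on which the edit fails, with an absent variable meaning "unrestricted". The function name `generate_implied_edit` is illustrative and not part of DIA.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical normal-form edit: variable -> failing values.
# A variable missing from the dict is unrestricted (its whole domain).
Edit = Dict[str, FrozenSet[int]]


def generate_implied_edit(
    edits: List[Edit], domains: Dict[str, FrozenSet[int]], gen_var: str
) -> Optional[Edit]:
    """One Fellegi-Holt generation step on the generating variable gen_var.

    Returns the implied edit, or None when no essentially new edit arises.
    """
    # Every contributing edit must explicitly involve the generating variable.
    if not edits or any(gen_var not in e for e in edits):
        return None
    # The union of the failure sets on gen_var must cover its whole domain,
    # so that gen_var drops out of the implied edit.
    if frozenset().union(*(e[gen_var] for e in edits)) != domains[gen_var]:
        return None
    implied: Edit = {}
    for var, dom in domains.items():
        if var == gen_var:
            continue
        # Intersect the failure sets of every other variable.
        common = dom
        for e in edits:
            common = common & e.get(var, dom)
        if not common:
            return None  # empty intersection: no record could fail the result
        if common != dom:
            implied[var] = common  # keep only restricting conditions
    return implied


domains = {v: frozenset({1, 2}) for v in "ABC"}
e1 = {"A": frozenset({1}), "B": frozenset({1})}  # A=1 and B=1 fails
e2 = {"A": frozenset({2}), "C": frozenset({1})}  # A=2 and C=1 fails
print(generate_implied_edit([e1, e2], domains, "A"))
# {'B': frozenset({1}), 'C': frozenset({1})}
```

The implied edit says that any record with B=1 and C=1 fails whatever the value of A, a fact neither explicit edit states on its own.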
The Heuristic Algorithm
When a record R₀ fails an edit, we determine the minimum set (MS) of variables to impute by working not with the CSE but with the SEE plus, if necessary, a small set of implicit edits (SIE) specifically required to impute the erroneous record R₀.
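As a baseline for what any EL method must achieve, the following Python sketch solves the error localisation problem by exhaustive search: it tries variable sets of increasing size and, for each, every combination of values. This is not DIA's heuristic algorithm; it only illustrates, on tiny hypothetical domains, why the SEE alone can mislead and an implicit edit is sometimes needed. The edit representation and function names are assumptions.

```python
from itertools import combinations, product
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, int]
Edit = Dict[str, Set[int]]  # hypothetical normal form: variable -> failing values


def fails(record: Record, edit: Edit) -> bool:
    # A record fails a normal-form edit when every condition in it holds.
    return all(record[v] in values for v, values in edit.items())


def localise_errors(
    record: Record, edits: List[Edit], domains: Dict[str, Set[int]]
) -> Optional[Tuple[Set[str], Record]]:
    """Return a minimum set of variables to impute and one passing record."""
    names = sorted(domains)
    for k in range(len(names) + 1):  # smallest sets first
        for subset in combinations(names, k):
            for values in product(*(sorted(domains[v]) for v in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if not any(fails(candidate, e) for e in edits):
                    return set(subset), candidate
    return None  # the edits are inconsistent


domains = {v: {1, 2} for v in "ABC"}
edits = [{"A": {1}, "B": {1}},  # explicit edit: A=1 and B=1 fails
         {"A": {2}, "C": {1}}]  # explicit edit: A=2 and C=1 fails
r0 = {"A": 1, "B": 1, "C": 1}
print(localise_errors(r0, edits, domains))
# ({'B'}, {'A': 1, 'B': 2, 'C': 1})
```

R₀ fails only the first edit, which involves A and B, yet changing A cannot work: A=2 trips the second edit. That hidden constraint is the implicit edit "B=1 and C=1 fails", the kind of edit the SIE is meant to supply without generating the whole CSE.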
The Heuristic Algorithm
Labour Survey:
– Variables: 146
– Explicit edits: 1,500
– Valid values: 3,521
The Heuristic Algorithm
                             Current version DIA  New version DIA
Break-down of SEE (subsets)                    5                1
Number of imputations                     27,421           26,825
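The size of the reduction in imputations can be checked with a few lines of Python; the difference and percentage below are simple arithmetic on the two counts in the table, not figures reported by the source.

```python
current_imputations = 27_421  # current version of DIA
new_imputations = 26_825      # new version of DIA

saved = current_imputations - new_imputations
share = saved / current_imputations
print(f"{saved} fewer imputations ({share:.1%})")
# 596 fewer imputations (2.2%)
```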
Treatment of systematic errors
The DIA system contains a module for processing systematic errors: Rules of Deterministic Imputation (RDIs).
Example: suppose a systematic error arises whenever a record has the values A=1, B=2 and C=3; in that case, we impute the value ‘Blank’ to the variable B.
Treatment of systematic errors
RDI example: the left-hand side of the equals sign expresses the systematic error, and the right-hand side specifies the imputation.
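A minimal Python sketch of how an RDI such as the one above might be represented and applied, assuming a record is a dict of variable values and ‘Blank’ is encoded as the string "Blank". The `RDI` class is hypothetical and is not DIA's actual interface; it simply pairs the left-hand side (the error pattern) with the right-hand side (the imputation).

```python
from dataclasses import dataclass
from typing import Dict, Hashable

Record = Dict[str, Hashable]


@dataclass
class RDI:
    condition: Dict[str, Hashable]   # left-hand side: the systematic error pattern
    imputation: Dict[str, Hashable]  # right-hand side: values to impute

    def apply(self, record: Record) -> Record:
        """Return a copy of the record, imputed if the error pattern matches."""
        if all(record.get(v) == x for v, x in self.condition.items()):
            return {**record, **self.imputation}
        return dict(record)


rule = RDI(condition={"A": 1, "B": 2, "C": 3}, imputation={"B": "Blank"})
print(rule.apply({"A": 1, "B": 2, "C": 3}))  # {'A': 1, 'B': 'Blank', 'C': 3}
print(rule.apply({"A": 1, "B": 1, "C": 3}))  # {'A': 1, 'B': 1, 'C': 3}
```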
Treatment of systematic errors
Current version:
– First, the DIA system executes the RDIs.
– Afterwards, the DIA system imputes data following the Fellegi-Holt methodology.
The gap between these two processes can cause re-imputations, where a variable already imputed by an RDI is imputed again. To avoid them, we define a new edit named the Deterministic Imputation Edit (DIE).
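The risk of re-imputation in the two-stage design can be reproduced with a small Python sketch: an RDI runs first, then an exhaustive Fellegi-Holt error localisation runs on its output. The random-error edit ("B may not be Blank when D=1"), the integer code 0 for Blank, the alphabetical tie-break and all function names are hypothetical, chosen only to trigger the problem.

```python
from itertools import combinations, product
from typing import Dict, List, Optional, Set, Tuple

BLANK = 0  # hypothetical integer code for 'Blank'
Record = Dict[str, int]
Edit = Dict[str, Set[int]]  # normal form: variable -> failing values


def fails(record: Record, edit: Edit) -> bool:
    return all(record[v] in values for v, values in edit.items())


def apply_rdi(record: Record, condition: Record,
              imputation: Record) -> Tuple[Record, Set[str]]:
    """Stage 1: deterministic imputation; also report which variables changed."""
    if all(record[v] == x for v, x in condition.items()):
        return {**record, **imputation}, set(imputation)
    return dict(record), set()


def localise_errors(record: Record, edits: List[Edit],
                    domains: Dict[str, Set[int]]) -> Optional[Tuple[Set[str], Record]]:
    """Stage 2: exhaustive Fellegi-Holt error localisation (alphabetical tie-break)."""
    names = sorted(domains)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            for values in product(*(sorted(domains[v]) for v in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if not any(fails(candidate, e) for e in edits):
                    return set(subset), candidate
    return None


domains = {"A": {1, 2, 3}, "B": {BLANK, 1, 2, 3}, "C": {1, 2, 3}, "D": {1, 2}}
random_edits = [{"B": {BLANK}, "D": {1}}]  # hypothetical: B may not be Blank when D=1

record = {"A": 1, "B": 2, "C": 3, "D": 1}
after_rdi, rdi_vars = apply_rdi(record, {"A": 1, "B": 2, "C": 3}, {"B": BLANK})
ms, final = localise_errors(after_rdi, random_edits, domains)
print(rdi_vars, ms, final)  # {'B'} {'B'} {'A': 1, 'B': 1, 'C': 3, 'D': 1}
```

Here the second stage undoes the RDI: B, already imputed to Blank, is imputed again (to 1). Changing D would have been an equally small fix; which one wins depends only on the tie-break, which is exactly the kind of interaction the two separate stages cannot see.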
Treatment of systematic errors
Steps to convert an RDI into a DIE:
1) The failure condition imposed on the Deterministic Imputation (DI) variable in the RDI becomes a failure condition imposed, in the DIE, on a new variable named the image of the DI variable: IMA_DIA.
2) The complement (¬) of the imputation in the RDI becomes the failure condition imposed on the DI variable in the DIE.
Treatment of systematic errors
RDI example:
DIE example:
Both edits express the same condition, and the DIE matches the normal form of edit required by the Fellegi-Holt model.
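The two conversion steps, and the claim that the RDI and the DIE express the same thing, can be sketched in Python. Assumptions: edits use the Fellegi-Holt normal form (variable mapped to its set of failing values), Blank is coded as 0, and the image variable is named by prefixing `IMA_` to the DI variable; these names and the `rdi_to_die` helper are hypothetical. The loop checks, over every record of a small domain, that the DIE fails exactly where the RDI fires and that the RDI's imputation satisfies the DIE.

```python
from itertools import product
from typing import Dict, Set

BLANK = 0  # hypothetical integer code for 'Blank'
Edit = Dict[str, Set[int]]  # normal form: variable -> failing values


def rdi_to_die(condition: Dict[str, int], di_var: str, imputed: int,
               domains: Dict[str, Set[int]]) -> Edit:
    """Convert the RDI 'condition => di_var := imputed' into a DIE."""
    die: Edit = {}
    for var, value in condition.items():
        if var == di_var:
            # Step 1: the failure condition on the DI variable moves to its image.
            die["IMA_" + di_var] = {value}
        else:
            die[var] = {value}
    # Step 2: the complement of the imputation becomes the failure condition
    # on the DI variable itself.
    die[di_var] = domains[di_var] - {imputed}
    return die


def fails(record: Dict[str, int], edit: Edit) -> bool:
    return all(record[v] in values for v, values in edit.items())


domains = {"A": {1, 2, 3}, "B": {BLANK, 1, 2, 3}, "C": {1, 2, 3}}
condition = {"A": 1, "B": 2, "C": 3}
die = rdi_to_die(condition, "B", BLANK, domains)
print(die)  # {'A': {1}, 'IMA_B': {2}, 'C': {3}, 'B': {1, 2, 3}}

names = sorted(domains)
for values in product(*(sorted(domains[v]) for v in names)):
    raw = dict(zip(names, values))
    raw["IMA_B"] = raw["B"]  # before editing, B equals its image
    rdi_fires = all(raw[v] == x for v, x in condition.items())
    assert fails(raw, die) == rdi_fires            # same failing records
    if rdi_fires:
        assert not fails({**raw, "B": BLANK}, die)  # imputation satisfies the DIE
```

Because the DIE is an ordinary normal-form edit, it can sit in the same edit set as the edits for random errors.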
Treatment of systematic errors
The DIA system calculates the MS of variables to impute by taking both types of errors into account together. Since the MS cannot contain repeated variables, the possibility of re-imputation disappears.
Conclusions (I)
– The heuristic algorithm presented makes it possible to extend the DIA system to quantitative data.
– In most real cases, it avoids having to break down the SEE into several subsets, which reduces the number of imputations.
Conclusions (II)
The DIE makes it possible to integrate edits expressing systematic errors with edits expressing random errors, following the Fellegi-Holt model. Thus the DIA system can be applied to both types of errors simultaneously, avoiding possible re-imputations.