Evaluating the Quality of Editing and Imputation: the Simulation Approach M. Di Zio, U. Guarnera, O. Luzi, A. Manzari ISTAT – Italian Statistical Institute.


1 Evaluating the Quality of Editing and Imputation: the Simulation Approach M. Di Zio, U. Guarnera, O. Luzi, A. Manzari ISTAT – Italian Statistical Institute UN/ECE Work Session on Statistical Data Editing Ottawa, 16-18 May 2005

2 Outline
- Introduction
- The simulation approach
- Performance indicators
- An example: the Istat software ESSE

3 Quality of E&I = Accuracy
- Accuracy at micro level: the capability of editing to correctly identify errors, and the capability of imputation to correctly recover the true data
- Accuracy at macro level: the capability of editing/imputation to preserve the data distributions and target estimates
- The quality of E&I in terms of accuracy can be measured only when it is possible to compare the edited and imputed data with the corresponding true data

4 Why evaluate the quality of E&I
- Analyse the performance of an editing/imputation method
  - for a specific type of data/error
  - under different data/error scenarios
- Improve the performance of an editing/imputation method for a specific type of data/error
- Choose among alternative editing/imputation methods for a specific type of data/error

5 The evaluation framework
“E&I represent additional sources of non sampling errors in the statistical production process”
[Diagram: True values (super-population / finite population) -> Error/missing mechanisms -> Observed (corrupted) values -> Editing model -> Localized errors -> Imputation model -> Final values. Each stage is marked with a "?" in the original figure.]

6 Evaluating the quality of E&I
- The quality of editing and/or imputation has to be evaluated taking into account the other mechanisms involved in the statistical production process
- This corresponds to measuring the effects on the data induced by the editing and/or imputation mechanisms, conditional on the other mechanisms influencing the survey results

7 The simulation approach
Artificial generation of some of the key elements of the evaluation framework, based on predefined mechanisms/models
- Controlled experiments on:
  - data distributions and data relations
  - error and missing data mechanisms
  - error and missing data incidence
- Variability due to each stochastic mechanism (repeated simulations)
- Low cost
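The "repeated simulations" point can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the missing data mechanism is MCAR, imputation is a plain respondent-mean fill, and the function names `one_replication` and `repeated_simulation` are hypothetical, not part of any Istat tool.

```python
import random
import statistics

def one_replication(true_values, missing_rate, rng):
    """Corrupt the true data (MCAR), impute, and return the deviation of the mean."""
    observed = [None if rng.random() < missing_rate else v for v in true_values]
    respondents = [v for v in observed if v is not None]
    if not respondents:
        return None  # nothing to impute from in this replication
    fill = statistics.mean(respondents)  # assumed imputation model: respondent mean
    final = [fill if v is None else v for v in observed]
    return statistics.mean(final) - statistics.mean(true_values)

def repeated_simulation(true_values, missing_rate, n_reps, seed=0):
    """Repeat the experiment; return mean and standard deviation of the deviations."""
    rng = random.Random(seed)
    deviations = []
    for _ in range(n_reps):
        d = one_replication(true_values, missing_rate, rng)
        if d is not None:
            deviations.append(d)
    return statistics.mean(deviations), statistics.stdev(deviations)

true_values = [float(v) for v in range(10, 210, 10)]  # 20 artificial "true" values
bias, spread = repeated_simulation(true_values, missing_rate=0.3, n_reps=200, seed=1)
print(f"mean deviation: {bias:.3f}, std. dev. across replications: {spread:.3f}")
```

The standard deviation across replications is the part that a single application of E&I cannot provide: it reflects the variability due to the stochastic missing data mechanism.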

8 The simulation approach
- High modelling effort for:
  - true data
  - raw data

9 Simulation of true data
Let $(X_1, \dots, X_p)$ be a random variable following the probability function $F(x_1, \dots, x_p; \theta)$.
When $F(x_1, \dots, x_p; \theta)$ is unknown:
- parametric approaches (specify a data model; estimate parameters; re-sampling techniques)
- non parametric approaches (no assumptions; re-sampling techniques)
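The two families of approaches can be sketched in Python as follows. This is a simplified illustration, not the authors' procedure: the non parametric version resamples whole records with replacement (a bootstrap), and the parametric version assumes, purely for the example, a univariate normal model. Both function names are hypothetical.

```python
import random
import statistics

def simulate_nonparametric(records, n, rng):
    """Draw n whole records with replacement, so joint relations between variables are kept."""
    return [rng.choice(records) for _ in range(n)]

def simulate_parametric_normal(values, n, rng):
    """Assume a normal data model, estimate its parameters, then draw n new values."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(2005)
# Artificial records: (employees, turnover)
records = [(5, 120.0), (12, 340.0), (40, 1500.0), (7, 150.0), (90, 4100.0)]

resampled = simulate_nonparametric(records, 8, rng)
turnover = simulate_parametric_normal([r[1] for r in records], 8, rng)
print(resampled[:3])
print([round(v, 1) for v in turnover[:3]])
```

Resampling whole records sidesteps the need to model the joint distribution, while the normal model shown here would be a poor choice for skewed data and does not enforce any edit constraints.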

10 Simulation of true data
Additional problems:
- Modelling multivariate distributions (reproducing joint relations/dependencies between variables)
- Modelling asymmetric multivariate distributions
- Modelling under edit constraints

11 Simulation of raw data
Parametric/non parametric approaches:
- Generating missing data
- Generating errors (deviations from true data)

12 Simulation of missing data
- Assumptions on non response mechanisms (MCAR, MAR, NMAR)
- Assumptions on the incidence of non response (non response rates)
- In multivariate contexts, modelling patterns of non response:
  - Assumptions on multivariate non response mechanisms (e.g. independence)
  - Assumptions on rates of non response patterns
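As a rough Python sketch of the first two assumptions, the functions below generate missing values under MCAR (one common rate) and under MAR (a rate that depends only on an observed covariate). The function names, the variable names and the rates are all hypothetical choices for the example.

```python
import random

def make_mcar(records, var, rate, rng):
    """MCAR: every record has the same probability `rate` of losing `var`."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy, so the true data stay untouched
        if rng.random() < rate:
            rec[var] = None
        out.append(rec)
    return out

def make_mar(records, var, covariate, rate_by_group, rng):
    """MAR: the probability of losing `var` depends only on an observed covariate."""
    out = []
    for rec in records:
        rec = dict(rec)
        if rng.random() < rate_by_group[rec[covariate]]:
            rec[var] = None
        out.append(rec)
    return out

rng = random.Random(42)
true_data = [
    {"size": "small", "turnover": 120},
    {"size": "small", "turnover": 95},
    {"size": "large", "turnover": 4100},
    {"size": "large", "turnover": 3900},
]
raw_mcar = make_mcar(true_data, "turnover", rate=0.25, rng=rng)
raw_mar = make_mar(true_data, "turnover", "size", {"small": 0.5, "large": 0.1}, rng)
print(raw_mcar)
print(raw_mar)
```

An NMAR mechanism would instead let the probability depend on the value of `turnover` itself; it is left out here to keep the sketch short.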

13 Simulation of errors
- Assumptions on error mechanism (EAR, ECAR, ENAR)
- Assumptions on the incidence of errors (error rates)
- Assumptions on the intensity of errors (error magnitude; intermittent nature of errors)
- In a multivariate context, modelling error patterns:
  - Assumptions on multivariate error mechanisms (e.g. independence)
  - Assumptions on rates of error patterns
- Overlapping mechanisms (e.g. stochastic + systematic)
- Simulation of errors under constraints
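The roles of error incidence, error intensity and overlapping mechanisms can be sketched in Python. The example assumes an ECAR-style stochastic error (same probability for every value) whose intensity is a bounded relative perturbation, overlaid on a systematic unit-of-measure error; both functions and all parameter values are hypothetical.

```python
import random

def add_random_errors(values, error_rate, max_rel_size, rng):
    """Stochastic errors: each value is perturbed with probability `error_rate`.

    The perturbation is a relative change of at most `max_rel_size` (the intensity).
    Returns the corrupted values and a flag per value marking where errors were placed.
    """
    corrupted, flags = [], []
    for v in values:
        if rng.random() < error_rate:
            corrupted.append(v * (1 + rng.uniform(-max_rel_size, max_rel_size)))
            flags.append(True)
        else:
            corrupted.append(v)
            flags.append(False)
    return corrupted, flags

def add_unit_error(values, affected, factor=1000):
    """Systematic error: the units in `affected` report in the wrong unit of measure."""
    return [v * factor if i in affected else v for i, v in enumerate(values)]

rng = random.Random(7)
true_values = [120.0, 95.0, 4100.0, 3900.0, 310.0]
# Overlapping mechanisms: systematic error first, stochastic error on top of it.
step1 = add_unit_error(true_values, affected={2})
raw, flags = add_random_errors(step1, error_rate=0.4, max_rel_size=0.2, rng=rng)
print(raw)
print(flags)
```

Keeping the error flags alongside the raw data is what later makes it possible to count detected and undetected errors.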

14 How to measure: evaluation indicators under the simulation approach
- Evaluation objectives
  - Accuracy at micro level
  - Accuracy w.r.t. distributions and target estimates
- Indicators
  - Level (micro/macro; local/global)
  - Identification
  - Priority

15 An Istat tool for evaluating E&I under the simulation approach
ESSE (Editing Systems Standard Evaluation) system (SAS language + SAS/AF environment)
- Module for raw data simulation
- Module for evaluation

16 Module for raw data simulation
- Approach: non parametric
- Missing data mechanisms: MCAR, MAR and independent non responses
- Error mechanisms: Completely At Random (ECAR) and independent errors (e.g. misplacement errors, interchange of values, interchange errors, loss or addition of zeroes, …)
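To make some of these error types concrete, here is a small Python sketch. It is not ESSE code (ESSE is written in SAS) and the exact definitions used by ESSE are not given in the slides; the functions below are one plausible reading, restricted to non-negative integer values, with hypothetical names.

```python
def interchange_values(record, var_a, var_b):
    """Interchange of values: the values of two variables are swapped."""
    out = dict(record)
    out[var_a], out[var_b] = record[var_b], record[var_a]
    return out

def interchange_digits(value, i):
    """Interchange error (assumed reading): digits at positions i and i+1 are swapped."""
    s = str(value)
    if i < 0 or i + 1 >= len(s):
        return value  # nothing to swap at this position
    return int(s[:i] + s[i + 1] + s[i] + s[i + 2:])

def add_zero(value):
    """Addition of a zero: the value is reported ten times too large."""
    return value * 10

def drop_zero(value):
    """Loss of a zero: a trailing zero is dropped when there is one."""
    return value // 10 if value != 0 and value % 10 == 0 else value

record = {"employees": 12, "turnover": 3400}
print(interchange_values(record, "employees", "turnover"))
print(interchange_digits(1234, 1), add_zero(12), drop_zero(1200))
```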

17 Module for evaluation
Assumptions:
- Editing is a classification procedure that assigns each raw value to one of two states:
  - (1) acceptable
  - (2) not acceptable
- Imputation affects only values previously classified by the editing process as unacceptable
- Imputation is successful if the newly assigned value is equal to the original one

18 Module for evaluation
- Evaluation objective: assessing the accuracy of E&I at micro level (capability to detect as many errors as possible; capability to restore the true values)
- Evaluation approach: single application of E&I (no variability)
- Evaluation level: micro level
- Indicators: local indicators (hit rates) based on the number of detected, undetected, introduced and corrected errors
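Hit rates of this kind can be computed by comparing true, raw and final values for one variable. The Python sketch below is an assumption about how such counts might be turned into rates; the slides do not give ESSE's exact indicator formulas, and the function name `ei_hit_rates` and the rate definitions are hypothetical.

```python
def ei_hit_rates(true, raw, flagged, final):
    """Micro-level indicators for one variable.

    true/raw/final: lists of values; flagged: True where editing classified
    the raw value as not acceptable.
    """
    is_error = [t != r for t, r in zip(true, raw)]
    n_errors = sum(is_error)
    n_correct = len(true) - n_errors

    detected = sum(e and f for e, f in zip(is_error, flagged))
    undetected = n_errors - detected
    # Errors introduced by E&I: the raw value was true, the final value is not.
    introduced = sum((not e) and fin != t for e, fin, t in zip(is_error, final, true))
    # Errors corrected by E&I: the raw value was wrong, the final value is true.
    corrected = sum(e and fin == t for e, fin, t in zip(is_error, final, true))

    def rate(count, total):
        return count / total if total else None

    return {
        "detected": detected,
        "undetected": undetected,
        "introduced": introduced,
        "corrected": corrected,
        "detection_rate": rate(detected, n_errors),
        "correction_rate": rate(corrected, n_errors),
        "introduction_rate": rate(introduced, n_correct),
    }

true = [10, 20, 30, 40, 50]
raw = [10, 25, 30, 44, 50]                    # errors in the 2nd and 4th values
flagged = [False, True, True, False, False]   # one hit, one false alarm, one miss
final = [10, 20, 31, 44, 50]                  # one correction, one introduced error
print(ei_hit_rates(true, raw, flagged, final))
```

Because it relies on the true values, this comparison is only possible in a simulation setting, where the true data are known by construction.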

19 Future work at ISTAT
- Identify standard measures to assess the accuracy of E&I at macro level
- Simulating multivariate patterns of errors/missing values (dependent errors/non response)
- Evaluating the impact of E&I on variability at micro/macro level

