Imputation in UNECE Statistical Databases: Principles and Practices


1 Imputation in UNECE Statistical Databases: Principles and Practices
Steven Vale and Heinrich Brüngger, UNECE Statistical Division

2 Contents
- The ECOSOC view of statistical imputation
- Current practices
- Basic principles
- Step-by-step implementation
- Conclusions and open questions
14 November 2018

3 ECOSOC views: Resolution 2006/6 on strengthening statistical capacity
- Sets limits on the use of imputation ...
- ... but also implicitly endorses it as a statistical technique
- Statistical agencies need to review their practices to ensure compliance

4 Defining imputation
“A procedure for entering a value for a specific data item where the response is missing or unusable”
Boundary issues:
- Imputing and editing
- Imputing and forecasting

5 Current practice in UNECE
- Very limited, ad hoc imputation
- Four cases:
  - Account identities
  - Regional aggregates
  - Poor-quality national data with little impact on regional totals
  - Re-classification
- Using imputations from others: is there sufficient transparency in the source metadata?
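The first case, account identities, lets a missing item be derived exactly rather than estimated. The sketch below is hypothetical: it assumes the simplest identity, a total equal to the sum of its components, and the function name and the "total" key are illustrative, not part of any UNECE system.

```python
def impute_from_identity(total, components):
    """Derive one missing item from the identity total = sum(components).

    `components` maps a (hypothetical) item name to its value, None if missing.
    Returns a dict with the derived item, or {} if the identity cannot pin it down.
    """
    missing = [name for name, value in components.items() if value is None]
    known = sum(value for value in components.values() if value is not None)
    if total is None and not missing:
        # All parts reported: the total follows directly.
        return {"total": known}
    if total is not None and len(missing) == 1:
        # Exactly one part missing: it is the residual of the identity.
        return {missing[0]: total - known}
    # Two or more unknowns: the identity alone is not enough.
    return {}


print(impute_from_identity(100.0, {"a": 60.0, "b": None, "c": 15.0}))
```

With more than one unknown the function deliberately returns nothing, because any value it produced would no longer follow from the identity.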

6 Basic principles (1)
- Imputed national data are not published, which avoids the need for consultation
- Only official sources are used for imputation
- Preference for data from the same country
- Clear distinction between “real” and imputed data
- Transparency: imputed data are clearly flagged, and methods are documented
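One way to keep the distinction between “real” and imputed data is to carry an explicit flag on every observation, withhold flagged values from national outputs, and still let them feed regional aggregates. This is a minimal sketch under assumptions: the `Observation` record, the variable name `gdp_current_prices` and the `IMPUTATION_METHOD` mapping are hypothetical illustrations, not the structures of the UNECE database.

```python
from dataclasses import dataclass

# Hypothetical variable-level metadata: the documented method for each variable.
IMPUTATION_METHOD = {"gdp_current_prices": "linear trend, forward only"}


@dataclass(frozen=True)
class Observation:
    variable: str
    country: str
    period: int
    value: float
    imputed: bool = False  # flag separating "real" from imputed data


def published_national_rows(observations):
    """Country-level output: imputed national values are withheld."""
    return [o for o in observations if not o.imputed]


def regional_total(observations):
    """Imputed values are still used when building regional aggregates."""
    return sum(o.value for o in observations)


def footnote(observation):
    """Flag text for any output to which an imputed value contributes."""
    if not observation.imputed:
        return ""
    method = IMPUTATION_METHOD.get(observation.variable, "method not documented")
    return f"imputed ({method})"
```

Because the flag travels with the value, the same record can serve both the national tables and the aggregates without the two ever being confused.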

7 Basic principles (2)
- Aggregates must contain more than 90% “real” data, covering more than 50% of countries
- Imputed data are recalculated periodically to adjust for revisions
- The method used is defined at the level of the variable and stored as an attribute
- Decisions on the use of imputation are to be taken with regard to the quality framework
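The two limits on aggregates can be checked mechanically before a regional figure is released. The slide does not say what the 90% is measured against; this sketch assumes it is the share of the aggregate's value that comes from reported data, assumes non-negative values, and treats "covering more than 50% of countries" as the share of the region's countries with a reported value. The names are hypothetical.

```python
from typing import NamedTuple


class Contribution(NamedTuple):
    country: str
    value: float   # assumed non-negative
    imputed: bool  # True if the national value was imputed


def aggregate_meets_thresholds(rows, min_real_share=0.90, min_country_share=0.50):
    """Check both limits for a regional aggregate built from `rows`."""
    if not rows:
        return False
    total = sum(r.value for r in rows)
    if total <= 0:
        return False
    real_value = sum(r.value for r in rows if not r.imputed)
    real_countries = sum(1 for r in rows if not r.imputed)
    # More than 90% of the aggregate's value must come from reported data ...
    value_ok = real_value / total > min_real_share
    # ... and reported data must cover more than half of the countries.
    coverage_ok = real_countries / len(rows) > min_country_share
    return value_ok and coverage_ok
```

Both tests are needed: a region dominated by one large, fully reported country can pass the value test while most smaller countries are imputed, which the coverage test catches.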

8 Step-by-step application
- Automatic imputation routines to extend imputation towards the boundaries set by the ECOSOC Resolution
- One step at a time, pausing after each to review quality and cost/benefit
- A “dashboard” to allow statisticians to choose the most appropriate method
- Implemented as part of the re-engineering of the statistical database system

9 First step: use a linear trend to impute missing values
Requirements:
- Sufficient time-series observations (at least 3 of the previous 5 periods)
- Close fit of the linear trend (R² close to 1)
Constraints:
- Validity of R² with few observations
- Forward imputation only
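The first-step rule can be sketched as an ordinary least-squares trend fitted to the reported values in the five periods before the gap, used only when at least three of them are present and the fit is close. The slides give no numeric R² cut-off, so `min_r2=0.9` is an assumption, as are the function names and the flag text. "Forward only" is read here as extending a series past its last reported value, leaving gaps at the start or in the middle untouched.

```python
from dataclasses import dataclass


@dataclass
class TrendFit:
    slope: float
    intercept: float
    r_squared: float


def fit_linear_trend(points):
    """Ordinary least-squares line through (period, value) pairs, with R²."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in points)
    # A perfectly flat series is fitted exactly, so treat its R² as 1.
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return TrendFit(slope, intercept, r_squared)


def impute_forward(series, period, window=5, min_obs=3, min_r2=0.9):
    """Return (value, flag) for `period`, or (None, None) if a rule is not met.

    `series` maps period -> reported value (None = missing).
    """
    if series.get(period) is not None:
        return series[period], "reported"
    observed = [t for t, v in series.items() if v is not None]
    # Forward only: periods up to the last reported value are not filled.
    if not observed or period <= max(observed):
        return None, None
    # Only reported values from the previous `window` periods are used.
    history = [(t, series[t]) for t in range(period - window, period)
               if series.get(t) is not None]
    if len(history) < min_obs:
        return None, None
    fit = fit_linear_trend(history)
    if fit.r_squared < min_r2:
        return None, None
    return fit.intercept + fit.slope * period, "imputed: linear trend"


print(impute_forward({2001: 10.0, 2002: 12.0, 2003: 14.0}, 2004))
```

Imputed values are never fed back into `history`, so a run of missing years is extrapolated from reported data alone. With only three points, an R² near 1 is weak evidence of a real trend, which is the constraint the slide flags.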

10 [Figure: data availability (Y = yes, N = no) and whether a value is imputed, by year, 2000 to 2007]

11 Next steps: more flexibility
- Longer time series
- Imputing values at the start and in the middle of a time series
- Non-linear trends?
- Cross-country imputation in strictly limited cases?

12 Conclusions
- Strong links between imputation and quality
- Trade-off between accessibility and accuracy
- A step-by-step, pause-and-review approach seems appropriate
- Transparency is essential
- Standardization of practices between international organizations would help

13 Open questions
- Are other organizations interested in defining a common policy on the use of imputation in response to the ECOSOC Resolution?
- Could we go further and consider harmonizing methods and tools?
- How should this be done? Is a specific forum needed, or can this be dealt with alongside work on data quality?
- Have other organizations modified their imputation policies in light of the ECOSOC Resolution, and if so, how?

