Deliverable 2.6: Selective Editing
Hannah Finselbach (ONS, UK) and Orietta Luzi (ISTAT, Italy)

Overview
– Introduction
– Related projects
– Combining data sources
– Selective editing: data sources and tools
– Selective editing in the SDWH framework
– Proposed case studies
– Deliverable outcomes and recommendations

Introduction
– Selective editing options for a Statistical Data Warehouse, including options for weighting the importance of different outputs – UK and Italy
– Review or quality assure – Sweden (SELEKT)
Q1: Would you like to review and give comments? (Yes/No)

Statistical Data Warehouse (SDWH)
Benefits:
– Decreased cost of data access and analysis
– Common data model
– Common tools
– Drives increased use of administrative data
– Faster and more automated data management and dissemination

Statistical Data Warehouse (SDWH)
Drawbacks:
– Can have high costs of maintenance and of implementing changes
– Tools may need to be developed for statistical processes
– Methodological issues of the SDWH framework – covered by WP2 Phase 1 (SGA-1)
– “Work in progress” for most NSIs

Combining data sources
Many NSIs use administrative data or registers to produce statistics.
Advantages include:
– Reduction in data collection and statistical production costs; a large amount of data available; re-use of data to reduce respondent burden.
Drawbacks include:
– Different unit types (statistical and legal); timeliness; discrepancies in variable definitions.
A mixed-source approach is usually required.

Editing
UNECE Glossary of Terms on Statistical Data Editing:
– “an activity that involves assessing and understanding data, and the three phases of detection, resolving, and treating anomalies…”
Large amount of literature on:
– Editing business surveys
– Editing administrative data

Aims and related projects
This deliverable aims to add value by investigating how to apply selective editing when combining sources.
Mapping with other projects:
– ESSnet on Data Integration
– ESSnet on Administrative Data
– MEMOBUST
– EDIMBUS Project (2007)
– EUREDIT Project
– BLUE-ETS
Q2: Do you know of any other relevant projects? (Yes/No)

Editing combined data sources
The SDWH will combine survey, register and admin data sources.
Editing required for:
– maintaining the business register and its quality;
– a specific output and its integrated sources;
– improving the statistical system.
Part of quality control in the SDWH.
Split processes for different data sources? (e.g. France)

Combined sources – questions
Q3: Do you currently combine data sources?
– A. Yes; B. No; C. Unsure.
Q4: Do you have separate editing processes for each data source?
– A. Only survey data edited (admin data not edited);
– B. Data sources edited separately;
– C. Data sources edited separately, but units/variables in both sources edited for coherence;
– D. Other.

Selective editing
Editing is traditionally time-consuming and expensive.
Selective / significance editing:
– Prioritises records based on a score function that expresses the impact of their potential errors on the estimates
– The score should consist of a risk (suspicion) component and an influence (potential impact) component (an illustrative form is given below)
– Divides anomalies into a critical and a non-critical stream for possible clerical or manual resolution (possibly including follow-up)
– Gives a more efficient editing process
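To make the risk and influence components concrete, an illustrative local score for unit i and variable j (a sketch of the usual form in the selective editing literature, not necessarily the exact formulation used in SELEKT or SELEMIX) is

s_{ij} = r_{ij} \cdot \frac{w_i \, \lvert y_{ij} - \hat{y}_{ij} \rvert}{\hat{T}_j}

where r_{ij} is the estimated risk that the reported value y_{ij} is in error, w_i is the design weight, \hat{y}_{ij} is an anticipated value (e.g. from historical or register data) and \hat{T}_j is an estimate of the domain total. A global unit score, e.g. S_i = \max_j s_{ij} (or the sum over variables), is then compared with a threshold to allocate the unit to the critical or non-critical stream.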

Selective editing – survey and admin data
– Use admin data as auxiliary data in the selective editing score function for survey data (e.g. UK, Italy)
– Use a score of the differences between data sources to determine which units need manual intervention (e.g. France; see the sketch below)
– Use scores based on historical data
– Apply selective editing to admin data with the same score function as for survey data, but with weights = 1 (e.g. France SBS system)
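As a rough illustration of the "score of differences between data sources" idea above (a minimal sketch with hypothetical function and variable names, not the actual French or ONS implementation):

def difference_score(survey_value, admin_value, domain_total, weight=1.0):
    # Influence of the discrepancy between the two sources on the domain estimate;
    # weight=1.0 corresponds to treating the admin unit as representing only itself.
    return weight * abs(survey_value - admin_value) / domain_total

def needs_manual_review(survey_value, admin_value, domain_total, threshold, weight=1.0):
    # Only discrepancies whose score exceeds the threshold are routed to manual intervention.
    return difference_score(survey_value, admin_value, domain_total, weight) > threshold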

Selective editing – question
Q5: Is selective editing used in the processing of admin/register data at your organisation?
– A. No;
– B. No, but admin data used as auxiliary data for selective editing of survey data;
– C. No, but a score function is used to compare data sources;
– D. Yes, selective editing is applied to admin data;
– E. Not sure.

Selective editing – tools
– SELEMIX – ISTAT
– SELEKT – Statistics Sweden
– Significance Editing Engine (SEE) – ABS
– SLICE – Statistics Netherlands
Q6: Are you aware of any other selective editing tools?
– A. Yes, I can provide documentation;
– B. Yes;
– C. No.

Selective editing in the SDWH
Methodological issues:
– Survey weights are not meaningful in the SDWH: weights = 1? Several sets of weights tailored for different uses?
– Selective editing of data “without purpose”: an importance weight for all potential uses? An alternative editing approach?
– Scores to compare data sources: should score functions be used, or should all discrepancies be followed up, or automatically corrected?
– Selective editing of admin data – manual intervention? Is selective editing appropriate if manual intervention is not possible? Should automatic correction be applied to admin data identified as suspicious?

Any solutions? Survey weights used in the selective editing score are not meaningful
Q7: What do you think would be the best option?
– A. Everything in the SDWH represents itself, and therefore weights = 1
– B. Calculate several survey weights for all known uses of a unit's data item and incorporate them into one global score
– C. Calculate separate scores for all outputs, and combine them (max, average, sum) – see the sketch below
– D. Other – discuss!
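A minimal sketch of option C above (hypothetical names; the choice of combination function is exactly the open question being posed):

from statistics import mean

def global_score(output_scores, method="max"):
    # output_scores: one selective-editing score per known output/domain for a unit.
    if method == "max":
        return max(output_scores)
    if method == "average":
        return mean(output_scores)
    if method == "sum":
        return sum(output_scores)
    raise ValueError(f"unknown combination method: {method}")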

Any solutions? Selective editing of data “without purpose”
Q8: Is selective editing appropriate if the data will be used multiple times, with unknown purpose at collection?
– A. No;
– B. No, another editing approach would be better;
– C. Yes, we would use key known/likely outputs to calculate the score;
– D. Yes, I can suggest/recommend a solution;
– E. Not sure.

Any solutions? Scores to compare data sources
Q9: Should score functions be used to compare sources, or should all discrepancies be followed up, or automatically corrected?
– A. All discrepancies need to be investigated by a data expert;
– B. All discrepancies need to be flagged, and can then be corrected automatically;
– C. Scores should be used to flag only significant/influential discrepancies, which should be investigated by a data expert;
– D. Scores should be used to flag only significant/influential discrepancies, which can then be corrected automatically;
– E. Other – discuss;
– F. Not sure.

Any solutions? Selective editing of admin data
Q10: Is selective editing appropriate if manual intervention is not possible?
– A. No, only correct fatal errors, systematic errors (e.g. unit errors), and suspicious reporting patterns;
– B. No, identify all errors/suspicious values and automatically correct/impute;
– C. Yes, identify only influential errors, to avoid over-editing/imputing the admin source;
– D. Yes, as well as fatal errors, systematic errors and suspicious reporting patterns, also identify influential errors;
– E. Other;
– F. Not sure.

Experimental studies
ISTAT: prototype DWH for SBS
– Use SELEMIX
– Combine statistical and admin data sources at micro level to estimate variables on economic accounts, for known domains
– Evaluate the quality of model-based selective editing and automatic correction
– Re-use available data for other outputs
ONS: combined sources for STS
– Use SELEKT
– Monthly Business Survey and VAT turnover data
– Compare selective editing with traditional editing of admin data (followed by automatic correction), for known domains
– Re-use available data for other outputs

Deliverable outcome – recommendations
– Draft report placed on the CROS portal – will include input from this workshop
– Recommendations for methodological issues of using selective editing in the SDWH, drawing on best practice from NSIs and the outcomes of the experimental studies
– Metadata checklist

Metadata requirements
Input to editing:
– Quality indicators (e.g. of the data source)
– Threshold for the selective editing score
– Potential publication domains
– Question number
– Predictor/expected value for the score (e.g. historical data, register data)
– Domain total and/or standard error estimate for the score
– Edit identification
– …
Output from editing:
– Raw and edited values
– Selective editing score
– Error number/description/type
– Flag if suspicious
– Flag if changed
– …
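A rough sketch of how such a checklist might be carried as a record alongside each data item in the SDWH (field names are purely illustrative, not a proposed standard):

from dataclasses import dataclass
from typing import Optional

@dataclass
class EditingMetadata:
    # Input metadata (illustrative names based on the checklist above)
    source_quality_indicator: Optional[float] = None
    selective_editing_threshold: Optional[float] = None
    publication_domain: Optional[str] = None
    question_number: Optional[str] = None
    predictor_value: Optional[float] = None        # e.g. historical or register value
    domain_total_estimate: Optional[float] = None
    edit_id: Optional[str] = None
    # Output metadata
    raw_value: Optional[float] = None
    edited_value: Optional[float] = None
    selective_editing_score: Optional[float] = None
    error_description: Optional[str] = None
    flagged_suspicious: bool = False
    value_changed: bool = False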

Thank you!