Statistical Data Editing. Anders Norberg, Statistics Sweden (SCB), 2010-11-23.



Papers by our colleague Leopold Granquist
• Granquist (1984). On the role of editing. Statistical Review 2.
• Granquist (1997). The New View on Editing. International Statistical Review.
• Granquist and Kovar (1997). Editing of Survey Data: How Much is Enough? In Survey Measurement and Process Quality. Wiley.

If …
• we only ask businesses for information that we know they have, and
• we ask for that information in a way they understand, and
• we motivate them to deliver data of as good quality as possible, and
• we help them to avoid accidental errors in answering questionnaires,
then editing would be a minor process!

The role of editing
• Quality control of the measurement process
  – Find errors (efficient controls)
  – Treat every identified error as a sign that the respondent had a problem delivering correct data through our collection instrument
  – Identify sources of error (process data)
  – Analyse process data and communicate with cognitive specialists
• Contribute to the quality declaration
• Adjust (change/correct) significant errors

Types of errors
• Obvious errors / fatal errors
  – Non-valid values
  – Item non-response
  – Data structure or model errors (total ≠ sum of components)
  – Contradictions
• Suspected data values
  – Deviation errors (outliers): suspiciously high/low values, data outside of predetermined limits
  – Definition errors (inliers): many respondents misunderstand a question in the same way; many respondents fetch data from information systems with other definitions

Suspected data values: deviation errors
• Manual follow-up takes time and is expensive
• Few deviation errors have impact on output statistics (low hit-rate; many changes in data have very little impact)
• Editing must have impact on the output!
• Remember response burden!

Suspected data values: definition errors (inliers)
Difficult to find. Ways to find them:
• Combined editing for several surveys
• Deep interviews in focus groups
• Use statistics from FEQ and from re-contacts with respondents
• High proportions of item non-response
• Graphical editing
• Good examples

The Process Perspective
• Analyze and act
• Measure the effects
• Collect process data

The Process Perspective
• Audit and improve data collection (measurement instrument and collection process) and the editing process itself
• Un-edited data must be saved in order to produce important process indicators, such as hit-rate and impact on output!

Process data
• Sources of errors = problems for the respondents
• Suspicions
• Error codes
• Manual actions (accepted / amended values)
• Automatic actions

Process indicators
• Sources of errors (problems for the respondents)
• Proportion of flagged units and variables
• Proportion of manually and automatically reviewed units and variables
• Proportion of amended values and impact of the changes, per variable
• Hit-rate for edits
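A minimal sketch of how some of these indicators could be computed once both un-edited and edited values are saved. The record layout and the definition of hit-rate used here (share of flagged values that were actually amended) are assumptions for illustration, not the SCB implementation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    unit: str         # hypothetical unit identifier
    weight: float     # design weight w
    unedited: float   # value as reported (must be saved)
    edited: float     # value after review (equals unedited if unchanged)
    flagged: bool     # True if an edit flagged the value


def process_indicators(records):
    """Return proportion flagged, hit-rate and impact on the weighted total."""
    n = len(records)
    flagged = [r for r in records if r.flagged]
    # A "hit" is assumed to be a flagged value that was amended during review.
    hits = [r for r in flagged if r.edited != r.unedited]
    total_unedited = sum(r.weight * r.unedited for r in records)
    total_edited = sum(r.weight * r.edited for r in records)
    return {
        "prop_flagged": len(flagged) / n if n else 0.0,
        "hit_rate": len(hits) / len(flagged) if flagged else 0.0,
        "impact_on_total": total_unedited - total_edited,
    }


records = [
    Record("A", 10.0, 1200.0, 120.0, True),   # flagged and amended
    Record("B", 10.0, 95.0, 95.0, True),      # flagged, accepted as reported
    Record("C", 5.0, 300.0, 300.0, False),    # not flagged
    Record("D", 5.0, 410.0, 400.0, True),     # flagged, small amendment
]
indicators = process_indicators(records)
print(indicators)
```

In this toy data, unit A accounts for almost all of the impact on the total, while unit D is a hit with very little impact.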

Statistical Production Process
1 Establish needs: 1.1 Survey statistical needs; 1.2 Affirm customer needs; 1.3 Develop table plan; 1.4 Identify data sources; 1.5 Examine disclosure; 1.6 Carry out market process
2 Plan and design: 2.1 Plan and design table plan; 2.2 Plan and design frame/population and sample; 2.3 Plan and design data collection; 2.4 Plan and design data processing; 2.5 Plan and design analysis and reporting; 2.6 Plan production flow; 2.7 Design production system
3 Create and test: 3.1 Create and test measurement instrument; 3.2 Develop existing and build new production tools; 3.3 Assure communication between production tools; 3.4 Test production system; 3.5 Carry out pilot test; 3.6 Implement production tools; 3.7 Implement production system
4 Collect: 4.1 Create frame and draw sample; 4.2 Handle respondent issues; 4.3 Prepare data collection; 4.4 Carry out data collection; 4.5 Transfer and store data electronically
5 Prepare and process: 5.1 Classify and code micro data; 5.2 Check micro data; 5.3 Impute for non-response; 5.4 Complement data set; 5.5 Calculate weights; 5.6 Establish final observation register
6 Analyse: 6.1 Produce statistical values; 6.2 Quality assurance of produced statistics; 6.3 Interpret and explain; 6.4 Prepare contents for reporting and communication; 6.5 Establish contents for reporting and communication
7 Report and communicate: 7.1 Prepare data and statistical values for dissemination; 7.2 Produce final output; 7.3 Report and communicate final output; 7.4 Handle inquiries; 7.5 Market final output


Average proportions of costs of sub-processes
Table: proportion of total cost (%), by process.
Columns: All products; Short-period surveys; Annual and periodic surveys.
Rows: Respondent service; Manual pre-editing; Data-registration editing; Production editing; Output editing; Total editing cost.

Web data collection
Demands:
• High hit-rate in electronic questionnaires
• System that can measure hit-rate?
Question: Can it be a goal for us to move all editing to electronic data collection?

Expectations on the production editing process at Statistics Sweden
• Generic IT-tools
  – Less IT-maintenance
  – Easier planning of work and personnel at data collection units
  – Better working environments
  – Methodology studies
• Efficient editing methods
  – Selective/significance editing
  – Better working environments
  – Less response burden
• Collection and analysis of process data
  – Continuous improvement of data collection and editing processes
  – Information for quality declaration of statistics

Input, throughput, output

Impact
• Actual impact = w (y_une – y_edi) for observation k is the impact on the domain total T if the un-edited value y_une is kept instead of making a review to find the edited value y_edi.
• Potential impact = w (y_une – y_pred) is a proxy for actual impact to be used in practice, as y_edi will not be known until review. y_pred is a prediction (expected value) for y_edi.
• Anticipated (expected) impact (per domain, variable, observation) is the product of suspicion and potential impact.
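The three impact measures can be illustrated with a small worked example. This is a sketch under stated assumptions: the function names are hypothetical, and the potential impact is taken here as the weighted difference between the reported (un-edited) value and its prediction, since the edited value is not available before review.

```python
def actual_impact(w, y_une, y_edi):
    """Impact on the domain total of keeping y_une instead of y_edi."""
    return w * (y_une - y_edi)


def potential_impact(w, y_une, y_pred):
    """Proxy for actual impact: y_pred stands in for the unknown y_edi."""
    return w * (y_une - y_pred)


def anticipated_impact(suspicion, w, y_une, y_pred):
    """Product of suspicion (between 0 and 1) and potential impact."""
    return suspicion * potential_impact(w, y_une, y_pred)


# Hypothetical observation: reported 1200, predicted 130, true value 120.
w, y_une, y_pred, y_edi = 10.0, 1200.0, 130.0, 120.0

print(actual_impact(w, y_une, y_edi))              # known only after review
print(potential_impact(w, y_une, y_pred))          # available before review
print(anticipated_impact(0.5, w, y_une, y_pred))   # suspicion of 0.5 assumed
```

The closer the prediction is to the value a review would produce, the closer the potential impact is to the actual impact.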

Suspicion: Traditional edits

Selective editing
[Figure: potential impact plotted against suspicion (0 to 1), with the flagged region marked.]

Predicted (expected) values
Data / predictor
• Time series: previous value, forecast
• Cross section: mean/standard error, median/quartile
Edit groups
• All data
• Blue collar workers; White collars
• Monthly pay; Weekly pay; Payment by the hour
• Profession=1; Profession=2; Profession=3; Profession=9
• Profession=3111; Profession=3112; Profession=3480
• Men; Women
• Profession=

Suspicion
R =
Suspicion = R / (TAU + R)
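A minimal sketch of the transformation Suspicion = R/(TAU+R). The definition of R is not preserved in this transcript, so the code assumes only that R is a non-negative measure of how far a value deviates and that TAU is a positive tuning constant; the function name is hypothetical.

```python
def suspicion(r, tau):
    """Map a non-negative deviation measure r to a suspicion in [0, 1)."""
    if r < 0:
        raise ValueError("r is assumed to be non-negative")
    if tau <= 0:
        raise ValueError("tau is assumed to be positive")
    return r / (tau + r)


for r in (0.0, 1.0, 9.0, 99.0):
    print(r, suspicion(r, tau=1.0))
```

Under these assumptions the suspicion is 0 when R is 0, equals 0.5 when R equals TAU, and approaches 1 as R grows, so TAU controls how quickly a deviation becomes suspicious.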

Score function
Local score, by domain d, variable j and observed unit k,l, is the anticipated impact related to an appropriate measure of size for the domain/variable, say the standard error of the estimate.
• VIOLIN_j = weights for variables (j)
• CLARINET_c = weights for classifications (domains) c(d)
• OBOE_j = adjustment for size of estimated total or its standard error (j)
LScore_d,j,k,l = Suspicion_j,k,l × (Potential impact) × CELLO_d(c),j

Score function
Global scores aggregate local scores over domains, variables and possibly second-stage units into one score for each primary unit or respondent. Methods: sum, sum of squares, sum of local scores truncated by local thresholds, maximum, etc.
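The aggregation methods listed above can be sketched as follows. The function and method names are hypothetical. The truncated sum is read here as summing only the part of each local score that exceeds its local threshold, which is one possible interpretation of "truncated by local thresholds".

```python
def global_score(local_scores, method="sum", thresholds=None):
    """Aggregate the local scores of one primary unit into a global score."""
    if method == "sum":
        return sum(local_scores)
    if method == "sum_of_squares":
        return sum(s * s for s in local_scores)
    if method == "max":
        return max(local_scores, default=0.0)
    if method == "truncated_sum":
        if thresholds is None or len(thresholds) != len(local_scores):
            raise ValueError("one local threshold per local score is required")
        # Assumed interpretation: only the excess over the threshold counts.
        return sum(max(s - t, 0.0) for s, t in zip(local_scores, thresholds))
    raise ValueError(f"unknown method: {method}")


# Local scores for one respondent over three domain/variable combinations.
scores = [0.25, 1.5, 3.0]
print(global_score(scores, "sum"))
print(global_score(scores, "sum_of_squares"))
print(global_score(scores, "max"))
print(global_score(scores, "truncated_sum", thresholds=[0.5, 0.5, 0.5]))
```

The choice matters: the sum lets many small local scores add up to a flag, while the maximum flags a unit only when a single local score is large.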

Evaluation Relative pseudo-bias is a measure of error in output due to incomplete data review

Evaluation
Pseudo-bias for PPI relative to the overall price index. Observation units ordered in descending order of impact.
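A rough sketch of how a pseudo-bias curve could be computed from saved un-edited and edited data. It assumes a simple weighted total as the estimator and measures the pseudo-bias relative to the fully edited estimate; the slides do not give the exact formula, and the field names are hypothetical.

```python
def relative_pseudo_bias(units, n_reviewed):
    """Relative error in the total if only the n_reviewed top units are edited.

    Units are ordered in descending order of impact; reviewed units
    contribute their edited value, the rest keep their un-edited value.
    """
    ordered = sorted(units, key=lambda u: u["impact"], reverse=True)
    full = sum(u["w"] * u["y_edi"] for u in ordered)
    if full == 0:
        raise ValueError("fully edited total is zero; relative bias undefined")
    partial = sum(
        u["w"] * (u["y_edi"] if i < n_reviewed else u["y_une"])
        for i, u in enumerate(ordered)
    )
    return (partial - full) / full


units = [
    {"impact": 10800.0, "w": 10.0, "y_une": 1200.0, "y_edi": 120.0},
    {"impact": 50.0, "w": 5.0, "y_une": 410.0, "y_edi": 400.0},
    {"impact": 0.0, "w": 10.0, "y_une": 95.0, "y_edi": 95.0},
    {"impact": 0.0, "w": 5.0, "y_une": 300.0, "y_edi": 300.0},
]
# Pseudo-bias after reviewing 0, 1, 2, ... units.
curve = [relative_pseudo_bias(units, n) for n in range(len(units) + 1)]
print(curve)
```

In this toy data nearly all of the pseudo-bias disappears after reviewing the single unit with the largest impact, which is the pattern selective editing relies on.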

Editing – remaining methodology issues
• Fatal errors
  – Classifying variables
  – Survey variables
• Confidence (respondents and clients)
• New and old respondents
• Edited in earlier processes
  – Web questionnaires
  – Scanned paper questionnaires
• Data and methods for computing predicted values etc.
• Homogeneous groups
• Priorities: variables, domains (from the clients' perspective)
• Score functions
• How to decide threshold values
• Sampling below threshold
  – Inference
  – Data for evaluation

SELEKT Parameters (in)
Give AUTO-SELEKT the parameters:
  %let PATH_sys=C:\SELEKT\1.0;
  %let PATH_app=C:\SELEKT\Prod\Demo1_Enterprise;
  %let EDIT_parms=Ent_Parms1;
  %let EDIT_data=Demo1_Adap_Ent_Inflowdata;
  %let EDIT_T1_Value=2009;
  %let EDIT_T2_Value=1;
From the parameter table &EDIT_parms, AUTO-SELEKT knows what to do.

SELEKT Error list (out)
Column name = Variable
Identification:
• Id1 = Identity for respondent [optional]
• Id2 = Identity for primary sampling unit (PeOrgnr, CfarNr etc.)
• Id3 = Identity for observational unit (social security number, CN8 for products)
• EditNumber = Edit identification, if the edit flags for suspicion or obvious error
• Timestamp = Time when the questionnaire passes SELEKT
Process data:
• EditFlag = 0 = accepted, 1-5 = error flagged
• EditSuspicion = Suspicion generated by continuous edits
• Score1 = (Local) Score for respondent [optional]
• Score2 = (Local/Global) Score for primary sampling unit [optional]
• Score3 = (Global) Score for observational unit [optional]
• N_Obs = Number of observations that have gone through the edit round
• N_Obs_Flagg = Number of error-flagged observations in the PSU, on this list
• N_PSU = Number of PSUs for the respondent that have passed the edit round
• N_PSU_Flagg = Number of error-flagged primary sampling units, on this list

Edits
• EDIT GROUP AND ACCEPTANCE REGION: Edit identification; Edit group; Acceptance range
• EDIT: Edit identification; Type of edit; Active; Section; Internal error message; External error message; Instruction for data review; Un-edited test variable; Error flag
• KEY: Edit identification; Survey variable
• IMPACT ON STATISTICS: Survey variable; Potential impact on statistics
• FLAGGING OBJECTS
• EDIT PRACTICAL SUPPORT: Edit identification; Standard edit rule; Edited test variable; Suspicion probability value produced by the SELEKT system