UNSD-UNESCAP Regional Workshop on Census Data Processing: Contemporary technologies for data capture, methodology and practice of data editing, documentation.

Presentation transcript:

UNSD-UNESCAP Regional Workshop on Census Data Processing: Contemporary technologies for data capture, methodology and practice of data editing, documentation and archiving Bangkok, Thailand, September 2008 Data Editing: Introduction

Objectives of Session
- Editing is the procedure for detecting and correcting errors in data.
- Imputation is the procedure for assigning values to missing or inconsistent data.
- The objective of the session is to present an overview of the concepts and definitions, and to discuss their application and the issues involved.

Summary
- Types of Errors in the census process
- Objectives of Editing: Why do it?
- How to and Why Edit? Some illustrative examples
- Principles of Editing: How to do it
- Fatal versus Query Edits
- Micro-editing versus Macro-editing
- Manual versus automatic editing
- Impact of capture mode on editing
- Pitfalls of Over-editing
- Other considerations

Types of Errors in the Census Process
- Coverage Errors
  - Incomplete/inaccurate maps or EAs
  - Incomplete canvassing of all units
  - Duplicate counting
  - Omission of persons unwilling to be enumerated
  - Erroneous treatment of visitors or non-resident aliens (especially in relation to de jure versus de facto methods)
  - Loss or destruction of census records after enumeration
  - ……

Types of Errors in the Census Process
- Content Errors
  - Errors in questionnaire design
  - Enumerator errors
  - Respondent errors
  - Coding errors
  - Data entry errors
  - Errors in computer editing
  - Errors in tabulation

Types of Errors in the Census Process
- Two types of errors at the processing stage:
  - Those that block further processing
  - Those that produce invalid/inconsistent results without interrupting the logical flow of subsequent processing operations
- ALL errors of the first kind must be corrected, and as many of the second kind as possible

Objectives of Editing: Why do it?
- Objectives of editing (Granquist, 1984)
  - “Tidy up data” so as to facilitate analysis (creation of complete file)
  - Identify types and sources of errors (for reporting on data quality)
  - Improve quality of census data (for current and future census)
- Important not only to detect errors but also to identify causes, in order to take appropriate corrective measures and improve overall quality

How to Edit?
TABLE 1: 2010 Population by Age and Sex, Unedited and Edited

                      Unedited data                               Edited data
Age group             Total   Male    Female  Sex not reported    Total   Male    Female
Total                 4,147   2,033   …       …                   4,147   2,043   2,104
Less than 15 years    …       …       …       …                   …       …       …
15 to 29 years        …       …       …       …                   …       …       …
30 to 44 years        …       …       …       …                   …       …       …
45 to 59 years        …       …       …       …                   …       …       …
60 to 74 years        …       …       …       …                   …       …       …
75 years and over     …       …       …       …                   …       …       …
Age not reported      15      6       7       2

How to Edit?
TABLE 1: Population by Age and Sex, Unedited and Edited
- How to deal with data “not reported”?
  - Distribute the age unknowns and the sex unknowns in the same proportion as the corresponding known values
  - For example, for 23 sex unknowns, distribute (2033/4147)*23 ≈ 11 to males (and the remaining 12 to females by subtraction); see RHS of Table 1
  - Similarly, distribute the 15 age unknowns across the 6 age groups in proportion to known values; see RHS of Table 1
  - This method could give biased results if the number of unknowns (non-responses) is high, since the distributions of knowns and unknowns may be very different
  - An improved strategy would be to use multivariate distributions involving other variables, such as the relationship between spouses, having a positive entry for number of children born, etc.
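The proportional ("pro-rata") distribution described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `distribute_unknowns` is hypothetical, the counts are made up and are not the Table 1 figures, and the largest-remainder rounding rule is one possible way to keep the allocations as whole persons; the slides do not say which rounding rule was used.

```python
from math import floor

def distribute_unknowns(known_counts, n_unknown):
    """Allocate n_unknown cases across categories in proportion to known counts.

    Largest-remainder rounding (an assumption, not from the slides) makes the
    integer allocations sum exactly to n_unknown.
    """
    total_known = sum(known_counts.values())
    if total_known == 0:
        raise ValueError("no known cases to base the allocation on")
    # Exact (possibly fractional) share of the unknowns for each category.
    exact = {k: n_unknown * v / total_known for k, v in known_counts.items()}
    alloc = {k: floor(x) for k, x in exact.items()}
    leftover = n_unknown - sum(alloc.values())
    # Give the remaining cases to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical counts for illustration only (not the Table 1 figures).
known = {"male": 600, "female": 400}
extra = distribute_unknowns(known, 25)
edited = {k: known[k] + extra[k] for k in known}
print(extra)   # {'male': 15, 'female': 10}
print(edited)  # {'male': 615, 'female': 410}
```

Because the allocation only uses the marginal distribution of the knowns, it inherits the bias risk noted above when unknowns are numerous.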

Why Edit?
TABLE 2: Population by Age with Unknowns for 2000 and 2010

                      Numbers           Percent
Age group             2010    2000      2010    2000
Total                 4,147   3,…       …       …
Less than 15 years    1,639   1,…       …       …
15 to 29 years        1,…     …         …       …
30 to 44 years        …       …         …       …
45 to 59 years        …       …         …       …
60 to 74 years        …       …         …       …
75 years and over     …       …         …       …
Age not reported      …       …         …       …

Why Edit?
TABLES 2 and 3: Population by Age with and without Unknowns for 2000 and 2010
- Another problem is that unknowns may affect the analysis of trends
- In Table 2, if unknowns are not taken into account, the percentage of persons aged … years appears to increase from 27.2% in 2000 to 30.3% in 2010
- Redistributing unknowns may change this trend
- In Table 3, after distributing unknowns, there is only an increase from 28.7% in 2000 to 29.3% in 2010
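A small numerical sketch may help show why unknowns can manufacture a trend. The figures below are hypothetical and are not the Table 2 or Table 3 values; the function names are likewise illustrative. The sketch assumes the unknowns are redistributed pro rata, in which case a group's adjusted share is simply its count divided by the number of cases with a reported age.

```python
def raw_share(group_count, total, n_unknown):
    """Percentage share when the unknowns stay in the denominator."""
    return 100 * group_count / total

def adjusted_share(group_count, total, n_unknown):
    """Percentage share after pro-rata redistribution of the unknowns."""
    return 100 * group_count / (total - n_unknown)

# Hypothetical censuses: the earlier one has many more age unknowns.
census_2000 = {"group_count": 270, "total": 1000, "n_unknown": 100}
census_2010 = {"group_count": 297, "total": 1000, "n_unknown": 10}

for label, c in (("2000", census_2000), ("2010", census_2010)):
    print(label,
          f"raw {raw_share(**c):.1f}%",
          f"adjusted {adjusted_share(**c):.1f}%")
# 2000 raw 27.0% adjusted 30.0%
# 2010 raw 29.7% adjusted 30.0%
```

In this made-up case the apparent rise in the raw shares comes entirely from the drop in the number of unknowns between the two censuses.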

Why Edit?
TABLE 3: Population by Age without Unknowns for 2000 and 2010

                      Numbers           Per cent
Age group             2010    2000      2010    2000
Total                 4,147   3,…       …       …
Less than 15 years    1,743   1,…       …       …
15 to 29 years        1,…     …         …       …
30 to 44 years        …       …         …       …
45 to 59 years        …       …         …       …
60 to 74 years        …       …         …       …
75 years and over     …       …         …       …

Principles of Editing: How to do it
- In general the editing system should be:
  - Minimalist (change only obvious errors and as few as possible)
  - Automated (as much as possible, for both detection and correction)
  - Systematic
  - Consistent with other NSO statistical collections
  - Compliant with UN or other international standards

Fatal versus Query Edits
- Types of edits:
  - Fatal Edits: identify errors with certainty
  - Query Edits: identify suspected errors
- Fatal Edits identify fatal errors, which include invalid or missing entries as well as errors due to inconsistencies
- Query Edits identify data items that fall outside subjective data bounds, or items that are relatively high or low compared with other data on the same questionnaire
- Fatal edits must be resolved; query edits are more difficult to correct, bring fewer benefits than the detection and resolution of fatal edits, and add more to the cost of the process
- For query edits, subject-matter specialists should review the edits developed for pilot censuses and those developed during processing to assess whether each individual edit is worth its cost (e.g., look at hit rates, i.e. the share of flags that result in changes to the original data)
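The distinction between fatal and query edits, and the hit-rate measure, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the field names, the valid codes, the 0 to 120 age range, the under-15 bound for a household head, and the idea that fertility items are expected only for women are not taken from the slides.

```python
VALID_SEX = {"M", "F"}  # assumed coding scheme

def fatal_errors(rec):
    """Errors identified with certainty: invalid, missing or inconsistent entries."""
    errs = []
    if rec.get("sex") not in VALID_SEX:
        errs.append("sex invalid or missing")
    age = rec.get("age")
    if age is None or not (0 <= age <= 120):
        errs.append("age invalid or missing")
    # Inconsistency between items (assumes fertility is recorded for women only).
    if rec.get("sex") == "M" and (rec.get("children_born") or 0) > 0:
        errs.append("children ever born reported for a male")
    return errs

def query_flags(rec):
    """Suspected errors: values outside subjective bounds, to be reviewed."""
    flags = []
    age = rec.get("age")
    if rec.get("relationship") == "head" and age is not None and 0 <= age < 15:
        flags.append("very young household head")
    return flags

def hit_rate(n_flags, n_changed):
    """Share of flags that resulted in a change to the original data."""
    return n_changed / n_flags if n_flags else 0.0

records = [
    {"sex": "F", "age": 34, "relationship": "head", "children_born": 2},
    {"sex": "M", "age": 12, "relationship": "head", "children_born": 0},
    {"sex": "X", "age": 150, "relationship": "child", "children_born": 0},
]
for r in records:
    print(fatal_errors(r), query_flags(r))

# If 40 query flags were reviewed and 6 led to a change:
print(hit_rate(40, 6))  # 0.15
```

A query edit with a persistently low hit rate is a candidate for removal or for looser bounds, since each flag costs review time.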

Micro-editing versus Macro-editing
- Micro-editing: concerns ways to ensure validity and consistency of individual data records and of relationships between records in a household
- Macro-editing: checks aggregated data to make sure that they are reasonable
- Example: if census results show a large percentage of persons without a reported age, imputing for age (at micro level) will produce a complete data set
- BUT it is far more essential to make checks at macro (aggregate) level to ensure that imputation does not skew the overall age distribution
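One simple form of the macro-level check mentioned above is to compare the age distribution among persons with a reported age against the distribution after imputation. The sketch below is illustrative only: the function names, the age groups, the counts and the one-percentage-point threshold are all assumptions, not values from the slides.

```python
def shares(counts):
    """Percentage distribution of a dict of counts."""
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}

def macro_check(reported, after_imputation, max_shift_pp=1.0):
    """Return the groups whose share moved by more than max_shift_pp points."""
    before = shares(reported)
    after = shares(after_imputation)
    return {
        k: round(after[k] - before[k], 2)
        for k in before
        if abs(after[k] - before[k]) > max_shift_pp
    }

# Hypothetical counts: 1,000 persons with reported age, 100 imputed.
reported = {"0-14": 400, "15-59": 500, "60+": 100}
imputed_evenly = {"0-14": 440, "15-59": 550, "60+": 110}   # shares unchanged
imputed_skewed = {"0-14": 400, "15-59": 500, "60+": 200}   # all imputed as 60+

print(macro_check(reported, imputed_evenly))  # {}
print(macro_check(reported, imputed_skewed))
```

A flagged group is not necessarily wrong, since non-respondents can genuinely differ from respondents, but it signals that the imputation rules deserve a second look.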

Impact of Capture Mode on Editing
- Types of capture modes typically used: manual (key-entry), OMR, OCR/ICR, PDA, Internet
- For key-entry, PDA, Internet: some limited detection and correction of errors can be done in “real time”
- This is not possible for OMR or OCR/ICR (scanning from paper questionnaires), which are limited to “batch editing” after the fact

Manual versus Automated Editing
- Manual edits may be done in several places along the editing chain: by enumerator, supervisor, field office worker, coder, key entry clerk, etc.
- The disadvantage is that manual editing takes an enormous amount of time (months or years), energy (human resources) and money
- If the data set is small, timing is not crucial and a work force is available, then manual editing may be feasible
- Automated editing reduces the time required, decreases the introduction of human error, and allows for the creation of an edit trail (and is therefore reproducible)
- Unlike manual editing, automated editing makes it feasible and efficient to impute responses based on other information in the questionnaire or on reported information for a unit with similar characteristics
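Imputing from "a unit with similar characteristics" while keeping an edit trail can be sketched as follows. The slides do not name a method; the sketch assumes a sequential hot-deck, one common approach, in which a missing value is taken from the most recent earlier record in the same class. The function name, field names and class definition (sex and relationship) are hypothetical.

```python
def hot_deck_impute(records, field, class_keys):
    """Fill missing `field` values in place from the last donor in the same class.

    Returns an edit trail listing every change, so the run can be audited
    and reproduced. Records with no earlier donor are left unchanged.
    """
    last_donor = {}   # class -> (record index, value)
    trail = []
    for i, rec in enumerate(records):
        cls = tuple(rec[k] for k in class_keys)
        if rec[field] is None:
            if cls in last_donor:
                donor_idx, value = last_donor[cls]
                rec[field] = value
                trail.append({"record": i, "field": field,
                              "old": None, "new": value, "donor": donor_idx})
        else:
            # A reported value becomes the donor for later records in its class.
            last_donor[cls] = (i, rec[field])
    return trail

records = [
    {"sex": "F", "relationship": "spouse", "age": 34},
    {"sex": "M", "relationship": "child", "age": 7},
    {"sex": "F", "relationship": "spouse", "age": None},
    {"sex": "M", "relationship": "child", "age": None},
]
trail = hot_deck_impute(records, "age", ["sex", "relationship"])
for entry in trail:
    print(entry)
```

Imputed values are never reused as donors here, which avoids propagating one reported value through a long run of missing records in the same class.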

Pitfalls of Over-editing
- Reduced timeliness
- Increased costs
- Potential distortion of true values
- False sense of security

Other Considerations
- Determination of tolerance levels for error detection
  - For most items in a census, some small percentage of the respondents will not give “acceptable” responses, for whatever reason
  - Not every failure is pervasive, so not every failure is worthy of remedial action; see Pitfalls of Over-editing
  - Tolerance levels indicate the number of invalid and inconsistent responses allowed before editing teams take remedial action
  - Decided by the editing team, including both subject-matter and data processing specialists
  - For key items such as age and sex, tolerance is typically low (1%-2%), whereas for less key items such as literacy and disability it is typically higher (5%-10%)
  - Correction may occur by returning enumerators to the field, conducting telephone re-interviews or applying specific knowledge of an area
- Learning from the editing process / quality assurance systems
  - Positive and negative feedback loops need to be recorded to improve the quality of both the current census and future censuses and surveys
  - Audit trails, performance measures and diagnostic statistics are crucial
  - This is often the most important outcome of editing
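The tolerance-level idea above lends itself to a simple diagnostic: compute the failure rate for each item and compare it with the agreed tolerance. The sketch below is illustrative; the function name and failure counts are hypothetical, and the tolerances are set at the upper end of the ranges quoted on the slide (2% for age and sex, 10% for literacy and disability) purely as an assumption.

```python
# Assumed tolerances, taken from the upper end of the ranges on the slide.
TOLERANCE = {"age": 0.02, "sex": 0.02, "literacy": 0.10, "disability": 0.10}

def items_needing_action(failures, n_records, tolerance=TOLERANCE):
    """Return {item: failure rate} for items whose rate exceeds its tolerance."""
    flagged = {}
    for item, n_fail in failures.items():
        rate = n_fail / n_records
        if rate > tolerance[item]:
            flagged[item] = rate
    return flagged

# Hypothetical counts of invalid or inconsistent responses in one batch.
failures = {"age": 150, "sex": 40, "literacy": 300, "disability": 1200}
print(items_needing_action(failures, n_records=10_000))
# {'disability': 0.12}
```

Storing these rates per batch or per enumeration area gives the kind of diagnostic statistics that feed the quality assurance loop described on the slide.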

Other Considerations
- Cost of editing
  - The cost of editing has not decreased much in the last 20 years, although the process has been rationalized by continuous exploitation of technological developments
  - In general, editing activities take a disproportionate amount of time (and therefore staff costs) relative to other activities
  - Excessive editing can delay census results
- Archiving
  - Both edited and unedited data files should be preserved for later analysis, and in several places
  - Documentation should be complete enough for census planners to be able to reconstruct the same processes at a later date

THANK YOU!