UNSD-UNESCAP Regional Workshop on Census Data Processing: Contemporary technologies for data capture, methodology and practice of data editing, documentation and archiving
Bangkok, Thailand, September 2008

Results generated from the questionnaire disseminated prior to the workshop

The objectives of the questionnaire
- To better understand data processing activities at the country level
- To invite country experiences, providing a forum for further collaboration on the effective use of data processing techniques and methods
- To support the development and management of the workshop and of future activities
- To understand what information and technical training is needed on the use of specific data processing methods

Data Capture: Methods for census/survey data capture
Common methods used for census/survey data capture were:
- manual data entry
- OMR (optical mark recognition)
- OCR/ICR (optical character recognition / intelligent character recognition)
Several countries are interested in improving efficiency through the use of PDAs and the Internet.

Data Capture: Scanners and features used by countries
- Kodak i: images/sheets per minute (Philippines)
- Kodak: dpi (ppm); at 300 dpi, 40-50 ppm (Singapore round)
- Fujitsu M4099D: ~90 ppm (simplex) to 180 images per minute (duplex), up to 400 dpi (Malaysia)
All rates depend on resolution, orientation, paper feeding, etc. (ppm = pages per minute)
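The quoted speeds can be turned into a rough capacity plan. The sketch below assumes that a duplex image count covers both sides of a sheet (so 180 images per minute duplex is about 90 sheets per minute), and it discounts the rated speed by an efficiency factor, a hypothetical allowance for jams, batch preparation and rescans. The form count and available hours in the usage lines are illustrative only.

```python
import math


def sheets_per_minute(rated: float, duplex_images: bool = False) -> float:
    """Convert a rated speed to sheets per minute.

    Assumes a duplex rating counts two images (front and back) per sheet.
    """
    return rated / 2 if duplex_images else rated


def scanning_hours(sheets: int, sheets_per_min: float, efficiency: float = 1.0) -> float:
    """Scanner-hours for a batch, discounting rated speed by a hypothetical efficiency factor."""
    if sheets_per_min <= 0 or not 0 < efficiency <= 1:
        raise ValueError("speed must be positive and efficiency in (0, 1]")
    return sheets / (sheets_per_min * efficiency) / 60


def scanners_needed(sheets: int, sheets_per_min: float,
                    hours_per_scanner: float, efficiency: float = 1.0) -> int:
    """Smallest number of scanners that finishes the batch in the available hours."""
    return math.ceil(scanning_hours(sheets, sheets_per_min, efficiency) / hours_per_scanner)


# Fujitsu M4099D figure from the survey: 180 images/min duplex -> 90 sheets/min
speed = sheets_per_minute(180, duplex_images=True)
print(round(scanning_hours(1_000_000, speed, efficiency=0.75)))                    # 247
print(scanners_needed(1_000_000, speed, hours_per_scanner=160, efficiency=0.75))   # 2
```

With these illustrative inputs, 1,000,000 sheets need about 247 scanner-hours, so two scanners are required if each is available for 160 hours.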

Data Capture: Outsourcing of processes
For manual data entry, the data capture process is not always outsourced. Approaches included using a database management system such as Oracle together with CSPro, with data entered, edited and coded in-house.
For OMR and OCR/ICR, the data capture process is often partially or entirely outsourced.

Data Capture: Planned data capture method for the next census round
- Some countries are undecided as to which method to choose.
- Many countries plan to use OMR/OCR/ICR, with all or part of the process outsourced (e.g. Bangladesh, Indonesia, Sri Lanka).
- Mobile devices and the Internet are also proposed (e.g. Singapore; Iran, for surveys).

Data Capture: Archiving methods and policies used for storing forms
- Many countries store forms electronically.
- Some countries store forms both electronically and in hardcopy.
- Several countries have laws requiring forms to be kept for a given period.
- Issues raised about storing hardcopy forms are that they take up space and may deteriorate over time.

Data Editing: Coding for Major Classifications of Occupations
All offices use coding for the major classifications of occupation, industry and education.
- Occupation: most use ISCO (International Standard Classification of Occupations), with several countries also using national classifications.
- Industry: most use ISIC (International Standard Industrial Classification of All Economic Activities), with several countries also using national classifications.
- Education: most use ISCED (International Standard Classification of Education), with several countries also using national classifications.
Ethnicity was also mentioned as a major classification for which coding is used.
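Where coding is automated, one common design is a coding index that matches normalised write-in responses to codes and refers unmatched responses to manual coders. The sketch below illustrates that idea for occupation. The four index entries are hypothetical and code only to ISCO-08 major groups, whereas production systems typically hold large indexes and code to the detailed (4-digit) level.

```python
import re

# Hypothetical coding index: normalised occupation text -> ISCO-08 major group.
CODING_INDEX = {
    "primary school teacher": "2",  # Professionals
    "rice farmer": "6",             # Skilled agricultural, forestry and fishery workers
    "bus driver": "8",              # Plant and machine operators, and assemblers
    "street sweeper": "9",          # Elementary occupations
}


def normalise(text: str) -> str:
    """Lower-case, replace non-letters with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z]+", " ", text.lower()).split())


def auto_code(responses: dict) -> tuple[dict, list]:
    """Code exact index matches; refer everything else to manual coding."""
    coded, referred = {}, []
    for rid, text in responses.items():
        code = CODING_INDEX.get(normalise(text))
        if code is None:
            referred.append(rid)   # no match: send to a manual coder
        else:
            coded[rid] = code
    return coded, referred
```

For example, `auto_code({1: "Primary School Teacher", 2: "bus-driver", 3: "astronaut"})` codes records 1 and 2 and refers record 3 to manual coding, which mirrors the mixed manual/automated practice reported by some countries.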

Data Editing: Manual or Automated Coding
Coding is done manually in most cases, with some countries using both manual and automated methods. When coding is automated, the software is developed in-house (e.g. Egypt), built on a commercial product such as Oracle (Lebanon), or developed by a private contractor and further configured by national statistical office (NSO) staff (e.g. Morocco).
Most countries have an editing system as part of the census/survey processing phase. The dominant error detection methods reported in the questionnaire were validity checks and consistency checks. Also mentioned were across-record and within-record checks and macro tabulation.
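Validity and consistency checks can be expressed as simple rules applied to each person record (within-record), plus a rule applied across the records of a household (across-record). The codes, the 0-120 age range and the minimum marriage age below are hypothetical; each country sets its own edit specifications.

```python
# Hypothetical codes and thresholds, for illustration only.
VALID_SEX = {1, 2}        # 1 = male, 2 = female
MARRIED = 2               # marital status code for "married"
MIN_MARRIAGE_AGE = 12     # assumed edit threshold; country-specific in practice
HEAD = 1                  # relationship-to-head code for the household head


def validity_errors(person: dict) -> list:
    """Each item must hold a permissible value."""
    errs = []
    if not 0 <= person["age"] <= 120:
        errs.append("age out of range")
    if person["sex"] not in VALID_SEX:
        errs.append("invalid sex code")
    return errs


def consistency_errors(person: dict) -> list:
    """Items within one record must agree with each other."""
    errs = []
    if person["marital"] == MARRIED and person["age"] < MIN_MARRIAGE_AGE:
        errs.append("married but below minimum marriage age")
    return errs


def household_errors(household: list) -> list:
    """Across-record check: exactly one head per household."""
    heads = sum(1 for p in household if p["rel"] == HEAD)
    return [] if heads == 1 else [f"{heads} household heads"]


def edit_household(household: list) -> dict:
    """Return error messages keyed by person index, plus 'household' for across-record errors."""
    report = {}
    for i, person in enumerate(household):
        errs = validity_errors(person) + consistency_errors(person)
        if errs:
            report[i] = errs
    hh_errs = household_errors(household)
    if hh_errs:
        report["household"] = hh_errs
    return report
```

Flagged records would then go to manual review or to an imputation routine, depending on the country's editing strategy.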

Data Editing
In many cases, imputation is done manually, with software such as CSPro, IMPS, SPSS and Oracle. Countries create automated routines using statistical software such as SPSS and Stata, and batch editing programs attached to the data entry program (the CSPro batch editing tool). Several countries reported that, alongside software such as CSPro, editing system routines will be developed in-house (e.g. Bangladesh, Rep. of Korea, Philippines).
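The questionnaire does not say which automated imputation methods countries use. One widely used technique in census editing is sequential hot-deck imputation, in which a missing item takes the value of the most recent valid record in the same imputation class. The sketch below assumes classes defined by sex and 15-year age group and a hypothetical "cold deck" of starting values; it is an illustration, not the routine of any particular country or software package.

```python
def age_group(age: int) -> int:
    """Hypothetical 15-year groups: 0-14, 15-29, 30-44, 45-59, 60+."""
    return min(age // 15, 4)


def hot_deck_impute(records, item, cold_deck):
    """Sequential hot-deck: fill a missing `item` from the last valid donor in the same class.

    `cold_deck` supplies a starting value for every (sex, age group) class.
    Returns new records; imputed ones are flagged with `<item>_imputed`.
    """
    deck = dict(cold_deck)
    result = []
    for rec in records:
        rec = dict(rec)                        # leave the input records unchanged
        key = (rec["sex"], age_group(rec["age"]))
        if rec[item] is None:
            rec[item] = deck[key]              # value from the most recent similar donor
            rec[item + "_imputed"] = True
        else:
            deck[key] = rec[item]              # a valid value becomes the new donor
        result.append(rec)
    return result
```

For instance, with a hypothetical literacy item coded 1 (literate) or 2 (not literate), a 25-year-old man with a missing value receives the value reported by the previous valid man aged 15-29; if no such donor has been seen yet, the cold-deck value is used.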

Staff and Training
Iran (part-time): data capture 414; data coding 729; error detection 450; imputation 10
Sri Lanka (full-time): data capture 75; data coding 40; error detection 20; imputation 20
Philippines (full-time / part-time): data capture; data coding; error detection; imputation
Nepal (full-time / part-time): data capture; data coding; error detection 5 / 5; imputation 2 / 2
Rep. of Korea (full-time / ad-hoc): data capture ,000; data coding 5 / 70; error detection; imputation 10
Indonesia (full-time / ad-hoc): data capture; data coding 165 / 1,053; error detection 165 / 1,332; imputation

Staff and Training (cont.)
Training for each step ranges widely across countries:
- Data capture: ~5 days to 1 month
- Data coding: ~5 days to 2 weeks
- Error detection: ~5 days to 3 weeks
- Imputation: ~1 day to 3 weeks
Example (manual): data capture 1 week; data coding 3 days; error detection 3 days; imputation 1 day
Example (OMR): data capture 12 days; data coding 5 days; error detection 5 days
Example (PDA): data capture 14 days (Buildings & Housing Units Census) and 14 days (Population Census); data coding 7 days; error detection 7 days

Quality control procedures
Country examples relating to the various steps of data processing:
Cambodia: data processing coordinator, QA team, verifiers (supervisors), verification form, production form, QA news bulletin, small group QA meetings, targeted training for specific individuals.
Indonesia: a supervisor is usually assigned to three to five data capture operators to assist them through the completion of the data capture process.
Malaysia:
1. Data capture: sample checks on data interpretation and verification
2. Data coding: checks on sampled forms
3. Imputation: consistency checks and production of a summary table
4. Tabulation: production of dummy tables
Philippines:
- ICR-based data capture: handwritten characters and other shadings and marks interpreted by the machine are subject to key verification through key-from-image entry.
- Data entry/manual coding: entered records are subject to sample key verification, usually at a 10% or 20% sampling rate based on specified thresholds.
- Marginal frequencies: items are tabulated as frequencies before and after the edit/imputation step to detect potential anomalies in the edit/imputation rules or in their implementation in the software.
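The Philippines procedures can be sketched in two parts: drawing a key-verification sample at 10% or 20%, and comparing an item's marginal frequencies before and after edit/imputation. The questionnaire does not state what the thresholds are, so the rule here (20% when an operator's previous error rate exceeds a hypothetical 2%, otherwise 10%) is an assumption.

```python
import random
from collections import Counter

ERROR_THRESHOLD = 0.02  # hypothetical: operators above 2% errors get the higher rate


def verification_sample(record_ids, prior_error_rate, seed=None):
    """Draw records for re-keying at 10% or 20% depending on the operator's error history."""
    rate = 0.20 if prior_error_rate > ERROR_THRESHOLD else 0.10
    k = max(1, round(len(record_ids) * rate))
    return random.Random(seed).sample(record_ids, k)


def key_error_rate(first_pass, second_pass):
    """Share of re-keyed records whose value differs from the first entry."""
    diffs = sum(1 for rid, value in second_pass.items() if first_pass[rid] != value)
    return diffs / len(second_pass)


def frequency_shift(before, after):
    """Change in the marginal frequency of each code caused by edit/imputation."""
    fb, fa = Counter(before), Counter(after)
    return {code: fa[code] - fb[code] for code in fb.keys() | fa.keys()}
```

A large shift for a single code, such as a hypothetical "not stated" code disappearing entirely after imputation, is the kind of anomaly the before/after comparison is meant to surface for review.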

Thank you