CHAPTER 19 Data processing


Technical review meeting on World Programme for the Census of Agriculture 2020
Volume 2 – Operational guidelines on implementing census of agriculture
Rome, Italy, 30-31 January 2017
Item 4
Neli Georgieva, Statistician, FAO Statistics Division (ESS)

CONTENT
- Hardware and software
- Testing computer programmes
- Data processing activities
  - Data coding and entry; data entry methods
  - Data editing
  - Imputation
  - Data validation and tabulation

Hardware and Software
The ICT strategy for the census should be part of the overall agricultural census strategy. It depends strongly on the data collection option and the modality of census taking chosen. The decision needs to be taken at an early stage to allow sufficient time for testing and implementing the data processing system.
Key management issues to be addressed:
- Strategic directions for the census programme, often related to timeliness and cost;
- Existing technology infrastructure;
- Level of technical support available;
- Capacity of census agency staff;
- Technologies used in previous censuses;
- Establishing the viability of the technology;
- Cost-benefit.

Hardware and Software – cont’d
Hardware requirements
Main characteristics of agricultural census data processing:
- large amounts of data to be entered in a short time, with multiple users and servers working in parallel processing mode;
- large amounts of data storage required;
- relatively simple transactions;
- relatively large numbers of tables to be prepared;
- extensive use of raw data files which need to be used simultaneously;
- method of data capture chosen by the census office.
Basic hardware equipment:
- Many data entry devices (PCs, handheld devices, depending on the data collection mode)
- Central processor/server and networks
- Fast, high-resolution graphics printers
- Number of PCs/handheld devices to be carefully considered
Software
- Allows for smooth data processing
- Preferable to use standard software maintained by the manufacturer, with available documentation (to allow for data portability)

Testing computer programmes
- Considerable time is required to write computer programmes.
- The computer programmes should be tested with data from pretests and/or the pilot census.
- It is useful to enter erroneous data to test the full range of error detection.
- The tabulation process should be simulated during the test.
- Data transfer should be tested during the pilot census (for CAPI, CATI, CASI).

Data processing activities
- Data coding and entry
- Data editing
- Validation and tabulation
- Calculation of sampling error and additional data analysis
Data coding and entry
Data coding: operation where original information from the paper-based questionnaire, as recorded by enumerators, is replaced by a numerical code required for processing:
- Manual
- Computer
Data entry methods:
- Manual;
- Optical scanning;
- Handheld device;
- Internet and computer-assisted telephone interviews (CASI and CATI).
Manual data entry
- Time consuming
- Subject to human error
- More staff needed
- Rigorous verification procedure needed
- Simple software
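As an illustration of computer coding, the sketch below replaces written responses with numerical codes from a look-up table and sets aside anything it cannot match for manual coding. The code list, the function names and the crop example are all hypothetical; they are not taken from any FAO classification or software.

```python
# Hypothetical code list: crop names as written by enumerators -> numeric codes.
# The codes are invented for illustration and follow no official classification.
CROP_CODES = {"maize": 101, "rice": 102, "wheat": 103}


def code_response(text, code_list=CROP_CODES):
    """Return the numeric code for a written response, or None if no match."""
    key = " ".join(text.lower().split())  # normalise case and whitespace
    return code_list.get(key)


def code_batch(responses):
    """Split responses into computer-coded values and a manual-coding queue."""
    coded, manual_queue = [], []
    for text in responses:
        code = code_response(text)
        if code is None:
            manual_queue.append(text)  # left for a human coder
        else:
            coded.append(code)
    return coded, manual_queue
```

Because identical responses always receive the same code, this kind of look-up gives the consistent treatment that manual coding cannot guarantee; unmatched responses still need a manual step.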

Data processing activities – cont’d
Optical scanning
O/ICR solution
Advantages:
- Savings in salaries (responses can be automatically coded);
- Additional savings if using electronic images rather than physical forms;
- Automatic coding provides improvements in data quality, as consistent treatment of identical responses is guaranteed;
- Processing time can be reduced;
- Form design does not need to be as stringent as that required for optical mark recognition (OMR);
- Enables digital filing of forms, resulting in efficiency of storage.
Disadvantages:
- Higher costs of equipment (sophisticated hardware and software required);
- Character substitution, which affects data quality;
- Tuning of the recognition engine and process to accurately recognize characters is critical, with trade-offs between quality and cost;
- Handwritten responses must be written in a constrained response area.
OMR solution
Advantages:
- The capture of tick-box responses is much faster than manual entry;
- Equipment is reasonably inexpensive;
- It is relatively simple to install and run;
- It is a well-established technology that has been used for a number of years in many countries.
Disadvantages:
- Can recognize only marks made with a special pencil on numbers or letters pre-printed on special questionnaires;
- Precision required in the printing process of questionnaires;
- Restrictions on the type of paper/ink used;
- Precision required in cutting of sheets;
- Restrictions as to form design.

Data processing activities – cont’d
Handheld device
- Use of CAPI with an electronic questionnaire; data entry is completed directly by the enumerators
- Cost-effective
- Allows for automatic coding and editing
- Allows for skip patterns
- Thorough testing of the data-entry application required:
  - Functional testing
  - Usability testing
  - Data transfer testing
CASI and CATI data entry
- Usually administered in conjunction with other methods
- Similar to handheld data collection: online forms are used; an application guides the respondent through the questionnaire
- Testing the flow and skip patterns of the online form is essential
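A skip pattern in an electronic questionnaire can be thought of as a routing rule attached to each question. The sketch below is a minimal illustration under that assumption; the three questions, their identifiers and the `route` function are hypothetical and do not come from any CAPI package.

```python
# Hypothetical questionnaire: each question has a rule that returns the id of
# the next question to ask, given the answer just recorded (None ends the form).
QUESTIONS = {
    "Q1": {"text": "Does the holding keep livestock?",
           "next": lambda answer: "Q2" if answer == "yes" else "Q3"},
    "Q2": {"text": "Number of cattle?",
           "next": lambda answer: "Q3"},
    "Q3": {"text": "Total area of holding (ha)?",
           "next": lambda answer: None},
}


def route(answers, start="Q1"):
    """Return the ids of the questions asked, in order, for the given answers."""
    asked, current = [], start
    while current is not None:
        asked.append(current)
        current = QUESTIONS[current]["next"](answers.get(current))
    return asked
```

Testing the flow then amounts to feeding the routing function sets of answers and checking that the expected questions, and only those, are asked.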

Examples of data scanning and computer-assisted systems used for the 2010 round of agricultural censuses
TECHNOLOGY | COUNTRY
Optical character recognition (OCR) | Albania, Czech Republic, Greece, Ireland, Malawi, Norway, Philippines, Sweden
Intelligent character recognition (ICR) | Tanzania, Canada, Cook Islands
CAPI/PDA | Argentina, Brazil, Colombia, France, French Guiana, Iran, Jordan, Lithuania, Martinique, Mexico, Mozambique, Slovenia, Thailand, Venezuela
CAWI, CATI, CAPI combined | Estonia, Finland, Iceland, Italy, Latvia, Poland, Sweden, Spain
Source: FAO, Metadata reports; http://www.fao.org/economic/ess/ess-wca/en/

Data editing
Process involving the review and adjustment of collected census data.
Purpose: to control the quality of the collected data.
The effect of editing questionnaires:
- to achieve consistency within the data and consistency within the tabulations (within and between tables);
- to detect and verify, correct or eliminate outliers.
Manual data editing (when using PAPI)
- Verifies the completeness of the questionnaire, to minimize non-response
- Should begin as soon as possible after data collection and close to the source of the data
- Very often errors are due to illegible handwriting
- Has some advantages: identifies paper-based questionnaires to be returned for completion; helps to detect poor enumeration

Data editing – cont’d
Automated data editing
- Electronic correction of digital data
- Efficient editing approach for censuses in terms of costs, required resources and processing time
- Checks the general credibility of the digital data with respect to missing data, range tests, and logical and/or numerical consistency
Two ways:
- Interactively at the data entry stage:
  - error messages are prompted immediately on the screen and/or the data may be rejected unless they are corrected;
  - very useful for simple mistakes such as keying errors, but may greatly slow down the data entry process;
  - aimed mainly at discovering errors made in data entry, while more difficult cases, such as non-response, are left for a separate automatic editing operation;
  - used with CAPI, CATI or CASI data collection methods.
- Using batch processing:
  - done after data entry; consists of a review of many questionnaires in one batch;
  - the result is usually a file with error messages;
  - applicable to all data collection modes.
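The batch approach can be sketched as a pass over all records that collects error messages instead of correcting anything. The example below is a minimal illustration, assuming records are held as dictionaries; the field names, the range limits and the consistency rule are hypothetical.

```python
# Hypothetical edit rules; field names and limits are invented for illustration.
RANGE_RULES = {"holder_age": (15, 110), "total_area_ha": (0.0, 10000.0)}


def edit_record(rec):
    """Return the error messages found in one questionnaire record."""
    errors = []
    for field, (low, high) in RANGE_RULES.items():
        value = rec.get(field)
        if value is None:
            errors.append(f"{rec['id']}: {field} is missing")  # missing data
        elif not (low <= value <= high):
            errors.append(f"{rec['id']}: {field}={value} outside [{low}, {high}]")
    # Numerical consistency: cropland cannot exceed the total area of the holding.
    total, crop = rec.get("total_area_ha"), rec.get("cropland_ha")
    if total is not None and crop is not None and crop > total:
        errors.append(f"{rec['id']}: cropland_ha={crop} exceeds total_area_ha={total}")
    return errors


def edit_batch(records):
    """Review a whole batch and return the combined list of error messages."""
    errors = []
    for rec in records:
        errors.extend(edit_record(rec))
    return errors
```

Writing the returned list to a file gives the error listing described above; the same `edit_record` rules could also be called at the data entry stage for interactive editing.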

Data editing – cont’d
Automated data editing – cont’d
Two categories of errors:
- critical: need to be corrected; could even block further processing or data capture;
- non-critical: produce invalid or inconsistent results without interrupting the flow of subsequent processing phases; as many as possible should be corrected, while avoiding over-editing.
Data editing (error detection) is applied at several levels:
- at item level, which is usually called “range checking”;
- at questionnaire level (checks are done across related items of the questionnaire);
- at hierarchical level, which involves checking items in related sub-questionnaires.
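The three levels of checking and the two categories of errors can be combined in one routine. The sketch below is only an illustration: the record layout (a holding with parcel sub-records), the choice of which errors count as critical and the tolerance are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class EditError:
    level: str       # "item", "questionnaire" or "hierarchical"
    critical: bool   # True if the record cannot be processed further
    message: str


def check_holding(holding, tolerance=0.01):
    """Apply one hypothetical check at each level to a holding record."""
    errors = []
    # Item level: a record without an identifier cannot be processed
    # (treated as critical here, as an assumption).
    if not holding.get("id"):
        errors.append(EditError("item", True, "missing holding identifier"))
    # Questionnaire level: related items on the same questionnaire must agree.
    if holding["cropland_ha"] > holding["total_area_ha"]:
        errors.append(EditError("questionnaire", False, "cropland exceeds total area"))
    # Hierarchical level: parcel sub-questionnaires must add up to the holding total.
    parcel_sum = sum(p["area_ha"] for p in holding["parcels"])
    if abs(parcel_sum - holding["total_area_ha"]) > tolerance:
        errors.append(EditError("hierarchical", False, "parcel areas do not sum to total area"))
    return errors


def blocks_processing(errors):
    """A record is held back only if at least one error is critical."""
    return any(e.critical for e in errors)
```

Keeping the severity on each error lets non-critical cases flow through to later phases while critical ones are stopped for correction.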

Imputation
The process of addressing the missing, invalid or inconsistent responses identified during editing, on the basis of knowledge available in the office.
- When imputation is used, a flag should be set.
- Two imputation techniques are commonly used:
  (a) cold-deck imputation (static look-up tables)
  (b) hot-deck imputation (dynamic look-up tables)
- Can be done manually or automatically by computer.
Aspects to be considered for automatic editing and imputation:
- the immediate goal in an agricultural census is to collect data of good quality. If only a few errors are discovered, any method of correcting them may be considered satisfactory;
- it is important to keep a record of the number of errors discovered and the corrective action (by kind of correction);
- non-response can always be tabulated as such in a separate column;
- redundancy of information collected in the questionnaire can be useful to help detect errors.
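The difference between the two techniques is where the donor value comes from. The sketch below is a simplified illustration, assuming records are dictionaries grouped by a field such as district: the hot-deck version takes the most recent valid value seen in the same group during the run, while the cold-deck version reads from a fixed table prepared beforehand. Function names, field names and the flag naming are hypothetical.

```python
def hot_deck_impute(records, field, group_field):
    """Fill missing values from the last valid record in the same group.

    The look-up table is dynamic: it is updated as valid records are read.
    """
    last_valid = {}
    result = []
    for rec in records:
        out = dict(rec)
        group = rec[group_field]
        if rec.get(field) is not None:
            last_valid[group] = rec[field]      # update the deck
            out[f"{field}_imputed"] = False
        elif group in last_valid:
            out[field] = last_valid[group]      # borrow from the donor
            out[f"{field}_imputed"] = True      # set the imputation flag
        else:
            out[f"{field}_imputed"] = False     # no donor yet; stays missing
        result.append(out)
    return result


def cold_deck_impute(records, field, group_field, lookup):
    """Fill missing values from a static table (e.g. from an earlier source)."""
    result = []
    for rec in records:
        out = dict(rec)
        imputed = rec.get(field) is None and rec[group_field] in lookup
        if imputed:
            out[field] = lookup[rec[group_field]]
        out[f"{field}_imputed"] = imputed
        result.append(out)
    return result
```

Counting the flags by field and by technique gives the record of corrective actions mentioned above.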

Data validation and tabulation
- Should run parallel to the other processes
- All data items should be checked for consistency and accuracy for all categories at different levels of geographic aggregation
- Macro-edit: the process of checking data at the aggregated level. It focuses the analysis on errors which have an impact on published data
- Validating the data before they leave the processing centre ensures that errors that are significant and considered important can be corrected in the final file
- Final file: serves as the source database for the production of all dissemination products
Tabulation
- Very important part of the census; the tables are the most visible outcome of the whole census operation and the most used output
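One way to picture a macro-edit is to aggregate a data item by geographic area and compare each total with a reference figure. The sketch below assumes that reference totals (for example from a previous census or administrative source) are available and flags areas whose relative change exceeds a threshold; the reference data, the field names and the 25 percent threshold are assumptions made for illustration.

```python
from collections import defaultdict


def macro_edit(records, reference_totals, field="total_area_ha", max_change=0.25):
    """Return {district: relative change} for totals that deviate too much."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["district"]] += rec[field]   # aggregate to district level
    flagged = {}
    for district, total in totals.items():
        ref = reference_totals.get(district)
        if ref:  # skip districts with no (or zero) reference value
            change = (total - ref) / ref
            if abs(change) > max_change:
                flagged[district] = round(change, 3)
    return flagged
```

Flagged areas are candidates for review rather than proven errors: a large change may be real, so the underlying questionnaires are examined before anything is corrected in the final file.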

FEEDBACK EXPECTED
- Relevance of this section on organisation of field work?
- Main elements missing?
- How can it be improved to be useful for census planners?

THANK YOU