UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data.

Presentation transcript:

UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data editing Doha, State of Qatar, May 2008 Coding of Census Information: An Overview United Nations Statistics Division

Outline of Presentation
• What is coding?
• Coding methodologies
• Coding indexes
• Types of coding operations
• Types of codes
• Open-ended questions
• Coding systems
• Coding mechanics
• Sources of coding errors

What is coding?
• Process in which census questionnaire entries are assigned numerical and/or alphanumeric values
• Objective is to prepare data in a form suitable for entry into a computer and for further analysis by users
• Done by setting up the possible responses to each question in the census questionnaire and mapping these responses onto numerical or alphanumeric codes

Coding methodologies
• Simple
  - Straightforward
  - Limited to reference to one question on the census form, e.g., birthplace
• Structured
  - Used for complex topics (e.g., occupation, industry, education)
  - Reference may be made to more than one question
  - Coding rules can be built into the structured coding system to guide the operators

Coding methodologies (contd)
• Bounded
  - Used where it is necessary to obtain different levels of detail before a code can be assigned
  - Commonly used for addresses
  - Coder starts a search at a broader geographic level (e.g., province, district, municipality) then moves to lower levels (e.g., city, street), as necessary to obtain a classification code
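
The bounded search described above can be sketched as a walk down a nested index, from the broadest geographic level to the finest one that can be resolved. This is only an illustrative sketch: the place names, the codes, and the `bounded_code` helper are all hypothetical and do not come from the presentation.

```python
# Hypothetical nested geographic index: province -> district -> city.
# All names and codes are invented for illustration.
GEO_INDEX = {
    "north province": {
        "code": "1",
        "children": {
            "lake district": {
                "code": "12",
                "children": {
                    "port town": {"code": "123", "children": {}},
                },
            },
        },
    },
}


def bounded_code(levels, index=GEO_INDEX):
    """Walk from the broadest level down and return the finest code resolved.

    Returns None when even the broadest level is not in the index.
    """
    children = index
    code = None
    for name in levels:
        entry = children.get(name.strip().lower())
        if entry is None:
            break  # stop at the deepest level that could be matched
        code = entry["code"]
        children = entry["children"]
    return code
```

Under these assumptions, `bounded_code(["North Province", "Lake District", "Port Town"])` resolves all three levels, while an address that only matches the province falls back to the province-level code.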

Coding indexes
• Regardless of the system used, all coding operations rely on coding indexes
• The indexes are lists of typical responses likely to be given on a census form, each with an associated classification code
• The lists should be based on what respondents typically report, not simply on the categories in the classification structure, because respondents answer in everyday language rather than in classification terms
• Thus they enable responses to be "mapped" onto the various classification structures
• The quality of these indexes is paramount; the time and effort needed to build them should not be underestimated
• Indexes are not static and sometimes need to be updated during processing to cater for new responses
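
As a rough illustration of such an index, the sketch below maps everyday-language responses onto a classification code and lets new responses be added during processing. The response texts and the helper names (`normalize`, `look_up`, `add_entry`) are assumptions made for this example; the code 9111 is the ISCO-88 code for street food vendors.

```python
import re

# Hypothetical index entries: several everyday phrasings share one code.
OCCUPATION_INDEX = {
    "street food seller": "9111",
    "sells food on the street": "9111",
    "food hawker": "9111",
}


def normalize(response):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", response.lower())
    return " ".join(cleaned.split())


def look_up(response, index):
    """Return the code for a response, or None if the index has no entry."""
    return index.get(normalize(response))


def add_entry(index, response, code):
    """Register a new response encountered during processing."""
    index[normalize(response)] = code
```

A response that returns `None` would be referred to a coder, who can then decide whether it deserves a new index entry.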

Types of coding operations
Coding operations may involve one of three options:
• Assigning numerical codes to responses recorded in words, or in a form requiring modification before data entry/capture, e.g., items such as geographic location, occupation, industry
• Rewriting numeric codes recorded on a questionnaire to a separate coding sheet to facilitate data entry
• Use of pre-coded entries on questionnaires, which may be used directly for data entry

Types of codes
• Pre-coded answers
• Office coding

(a) Pre-coded answers
• Work best with closed-ended questions; the numbers in the questionnaire boxes are used to code the answers
• To the extent possible, census questionnaires should use pre-coded responses with numerical or alphanumeric codes
• Coding categories should be mutually exclusive and exhaustive
• Pros: codes are easier to develop; saves time
• Cons: cannot be used for many open-ended questions

(b) Office coding
• Not all census questions can be pre-coded, e.g., those requiring open-ended answers
• Full range of responses may not be known and therefore cannot be coded on the spot, so coding is done after enumeration

Open-ended Questions: Advantages
• Allow respondents to express themselves in their own words, instead of in words chosen by the census planner
• Particularly appropriate for more complex concepts such as occupation
• Researchers can see how respondents actually think about the topic at hand
• Different analysts with different research interests can find information of value to them in the answers to the same questions

Open-ended Questions: Disadvantages
• Different respondents may approach the same question from different perspectives so that their answers may not be fully comparable
• Open-ended questions are a common source of measurement error in censuses
• They are more difficult to analyze than closed-ended questions because census coders must code responses into categories before analysis can begin. The coding may involve grouping together respondents who provided similar answers. Because no two respondents may ever give identical answers, the coder may fill in details of an answer by making guesses about what a respondent meant to say.

Open-ended Questions: Issues around Coding
• Not all questions in a census can be pre-coded (e.g., many of those related to economic characteristics)
• Trained personnel are needed to determine appropriate codes and to match them with the existing coding lists on the basis of information supplied by respondents
• An "Other" category is usually included because the full range of responses is often not known
• Some questions are not intended to carry previously determined codes; their responses are coded after the fact in the office

Coding systems
• Coding becomes necessary because computer editing and tabulation of textual material is not practical
• Textual and verbal responses have to be replaced by codes via the following types of interventions:
  - Manual
  - Computer-assisted
  - Automatic
  - Combination of some of the above

(a) Manual/clerical coding
• Coding clerks manually match responses to code indexes/books
• They then manually enter codes onto a form for later data capture and processing
• Pro: Simple
• Cons:
  - Tedious
  - Subject to bias and over-coding (a coder may be overzealous in finding a code even when it is not obvious)
  - Subject to higher error rates than other types of coding

(b) Computer-assisted coding
• Computerized systems (mainframes, PCs, etc.) are used to assist coders
• The indexes are as described before, but are now computer-based. The associated codes are stored in a database file and accessed during the coding operation
• A typist can sit at a computer terminal and type from coding sheets; alternatively, coding sheets may not be required, as the coder can type each response directly from the questionnaire

Computer-assisted coding (contd)
• Practical execution:
  - Coder types a few characters of each word in the response
  - Computer returns a matching list from an appropriate coding index
  - Coder selects the matching index entry from the list of possibilities
  - The computer automatically records the code corresponding to the matching index entry
• Example: for "poultry farmer" the coder enters "far pou"
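
One plausible way to implement the lookup step above is to treat each typed fragment as a word prefix and return every index entry in which all fragments match some word, in any order. The sketch below assumes exactly that matching rule; the index entries and the codes (`X001` and so on) are placeholders, not real classification codes.

```python
# Hypothetical index of (response text, placeholder code) pairs.
OCCUPATION_INDEX = [
    ("poultry farmer", "X001"),
    ("dairy farmer", "X002"),
    ("poultry processing worker", "X003"),
    ("pharmacist", "X004"),
]


def candidates(typed, index):
    """Return entries where every typed fragment is a prefix of some word."""
    fragments = typed.lower().split()
    matches = []
    for text, code in index:
        words = text.lower().split()
        if all(any(word.startswith(frag) for word in words) for frag in fragments):
            matches.append((text, code))
    return matches


# The coder types "far pou" and picks from the returned list.
for text, code in candidates("far pou", OCCUPATION_INDEX):
    print(text, code)
```

Typing fewer fragments widens the list (for instance, "pou" alone also matches the processing worker entry), which is why the coder still makes the final selection.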

Computer-assisted coding (contd)
• Pros:
  - Relatively more efficient
  - More coding rules can be incorporated into the system to guide the processors, which results in better quality data
  - Suitable for structured coding in particular
• Cons:
  - Relatively complex
  - Takes time and substantial cost to develop

(c) Automatic coding
• A computerized algorithm matches the captured textual response (e.g., from ICR) against indexes and assigns a code to the majority of cases without any human intervention
• Typically involves a scoring mechanism, where a particular score is required before a response is regarded as a match
• Matching rates depend on the algorithms used and the types of variables
• When a score is above a certain level, the response is considered acceptable and the code is assigned automatically
• When a score is below that level, human intervention is usually necessary
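
A minimal sketch of such a scoring mechanism follows. It is not the algorithm used by any particular census system: it simply uses the similarity ratio from Python's standard `difflib.SequenceMatcher` as the score, and the index entries, placeholder codes, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical index with placeholder codes.
INDEX = {"poultry farmer": "X001", "dairy farmer": "X002"}
THRESHOLD = 0.85  # assumed cut-off; a real system would tune this per variable


def auto_code(response, index=INDEX, threshold=THRESHOLD):
    """Return (code, score) when the best score meets the threshold.

    Returns (None, score) otherwise, signalling referral to a human coder.
    """
    text = " ".join(response.lower().split())
    best_entry, best_score = None, 0.0
    for entry in index:
        score = SequenceMatcher(None, text, entry).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        return index[best_entry], best_score
    return None, best_score
```

With these assumptions, a slightly misspelled capture such as "Poultry farmr" still clears the threshold, whereas an unrelated response such as "teacher" is returned without a code for manual handling.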

Automatic coding (contd)
• Pros:
  - Speed
  - High efficiency
  - Good quality
  - Especially suitable for structured coding
• Cons:
  - Complex
  - High cost
  - Risk of systematic errors in case of faults with matching algorithms and indexes

Coding mechanics
• National statistical offices (NSOs) often develop a list of common codes for items used both in the census and in related surveys, e.g., birthplace, language, ethnicity/race, citizenship
• An example of a common coding scheme for "place" might be a 3-digit code with a hierarchy for different levels of geography; i.e., the first digit is the broadest level of geography and the third digit is the finest level
• A common problem arises when definitions differ or change between censuses (or between a census and a survey) for variables such as work or ethnicity; the NSO needs to develop a policy on how to take these changes into account so that coherent trends can be produced
• For "Simple Coding" the NSO must set a list of codes for the possible responses to each question
  - E.g., Sex of respondent: male-1, female-2
  - E.g., Reason for being economically non-active: housewife-0, student-1, retired-2, too young-3, too old-4, pensioner-5, other-7
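
The two simple code lists above, and the idea of a hierarchical 3-digit place code, can be sketched as follows. The code values are the ones given on the slide; the function names and the choice to return `None` for an unlisted response are assumptions made for this example.

```python
# Code lists as given on the slide.
SEX_CODES = {"male": 1, "female": 2}
NON_ACTIVE_CODES = {
    "housewife": 0,
    "student": 1,
    "retired": 2,
    "too young": 3,
    "too old": 4,
    "pensioner": 5,
    "other": 7,
}


def simple_code(response, codes):
    """Look up one response in one code list; None means 'needs review'."""
    return codes.get(response.strip().lower())


def split_place_code(code):
    """Split a 3-digit place code into its broadest, middle and finest levels."""
    if len(code) != 3 or not code.isdigit():
        raise ValueError("expected a 3-digit code, got %r" % code)
    return code[:1], code[:2], code
```

For instance, a hypothetical place code "123" would belong to broad area "1" and intermediate area "12", so tabulations at any level only need the corresponding prefix.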

Coding mechanics (contd)
• For "Structured Coding", there are a number of international classification systems that NSOs can use directly or adapt to their own national variants
• Examples:

(a) International Standard Industrial Classification, ISIC Rev. 4

Type of code     | Level    | Category                                                              | Code
Two-digit code   | Division | Manufacturing of food                                                 | 10
Three-digit code | Group    | Manufacturing of grain mill products, starches and starch products    | 106
Four-digit code  | Class    | Manufacturing of grain mill products                                  | 1061

Coding mechanics (contd)

(b) International Standard Classification of Occupations, ISCO-88

Type of code     | Labeling of level | Name of category                           | Code
Two-digit code   | Sub-major Group   | Sales and services elementary occupations  | 91
Three-digit code | Minor Group       | Street vendors and related workers         | 911
Four-digit code  | Occupation        | Street food vendors                        | 9111
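
Because each level of these classifications is a prefix of the next, a four-digit code carries its higher-level groups with it. The sketch below illustrates this with the ISCO-88 example; the helper names and the sample list of codes passed to `tabulate` are assumptions for illustration only.

```python
from collections import Counter

# Labels taken from the ISCO-88 example.
ISCO88_LABELS = {
    "91": "Sales and services elementary occupations",
    "911": "Street vendors and related workers",
    "9111": "Street food vendors",
}


def levels(code):
    """Derive the two-, three- and four-digit levels from a four-digit code."""
    return {
        "sub_major_group": code[:2],
        "minor_group": code[:3],
        "occupation": code[:4],
    }


def tabulate(codes, digits):
    """Count coded records at a chosen level of detail by truncating codes."""
    return Counter(code[:digits] for code in codes)


print(ISCO88_LABELS[levels("9111")["minor_group"]])
```

The same truncation applies to the ISIC Rev. 4 example, where class 1061 sits in group 106 and division 10.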

Sources of coding errors
• Coding rules might be deficient
• Coding rules may not be properly applied
• Developing a quality coding operation is difficult since coding can be highly subjective
• Coding operations can be large in censuses and therefore difficult to manage

THANK YOU