Published by Andra Davidson. Modified over 9 years ago.
1
UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data editing
Doha, State of Qatar, 18-22 May 2008
Coding of Census Information: An Overview
United Nations Statistics Division
2
Outline of Presentation
- What is coding?
- Coding methodologies
- Coding indexes
- Types of coding operations
- Types of codes
- Open-ended questions
- Coding systems
- Coding mechanics
- Sources of coding errors
3
1. What is coding?
- A process in which census questionnaire entries are assigned numerical and/or alphanumeric values
- The objective is to prepare the data in a form suitable for entry into a computer and for further analysis by users
- Done by setting up the possible responses to each question in the census questionnaire and mapping these responses onto numerical or alphanumeric codes
4
2. Coding methodologies
Simple
  - Straightforward
  - Limited to a single question on the census form, e.g., birthplace
Structured
  - Used for complex topics (e.g., occupation, industry, education)
  - Reference may be made to more than one question
  - Coding rules can be built into the structured coding system to guide the operators
5
Coding methodologies (contd)
Bounded
  - Used where it is necessary to obtain different levels of detail before a code can be assigned
  - Commonly used for addresses
  - The coder starts the search at a broader geographic level (e.g., province, district, municipality) and then moves to lower levels (e.g., city, street) as necessary to obtain a classification code
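The bounded search described above can be sketched roughly as follows. This is a minimal illustration, not a census system: the nested index, the place names, the codes and the function name `bounded_code` are all hypothetical, and the sketch assumes that the most detailed code reached is the one to keep.

```python
# Hypothetical geographic index: each level holds a code and its finer levels.
GEOGRAPHY_INDEX = {
    "north province": {
        "code": "1",
        "children": {
            "river district": {
                "code": "12",
                "children": {
                    "old town": {"code": "123", "children": {}},
                },
            },
        },
    },
}


def bounded_code(levels, index):
    """Walk from the broadest level to finer ones.

    Returns the code of the most detailed level that could be matched,
    or None if even the broadest level is not in the index.
    """
    code = None
    node = index
    for name in levels:
        entry = node.get(name.strip().lower())
        if entry is None:
            break  # cannot go deeper; keep the broader code found so far
        code = entry["code"]
        node = entry["children"]
    return code


print(bounded_code(["North Province", "River District", "Old Town"], GEOGRAPHY_INDEX))
print(bounded_code(["North Province", "Unknown District"], GEOGRAPHY_INDEX))
```

In practice an unmatched lower level would more likely be referred to a coder than silently coded at the broader level; the fallback here is only one possible policy.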
6
3. Coding indexes
- All coding systems, whatever their type, rely on coding indexes
- The indexes are lists of typical responses likely to be given on a census form, each with an associated classification code
- The lists must be based on what respondents typically report, not simply on the categories of the classification structure: respondents answer in everyday language, not in classification terms
- The indexes thus enable responses to be "mapped" onto the various classification structures
- The quality of these indexes is paramount; the time and effort needed to build them should not be underestimated
- Indexes are not static and sometimes need to be updated during processing to cater for new responses
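A coding index of this kind can be pictured as a lookup table keyed on everyday wording. The sketch below is illustrative only: the index entries, the normalisation rule and the function names are assumptions, and the code 9111 is borrowed from the ISCO-88 "street food vendors" example used later in this presentation.

```python
import re

# Hypothetical index: several everyday phrasings map to one classification code.
OCCUPATION_INDEX = {
    "street food vendor": "9111",
    "street food seller": "9111",
    "sells food on the street": "9111",
}


def normalize(text):
    """Lower-case the response and collapse punctuation and extra spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def lookup(response, index):
    """Return the code for a response, or None if the index has no entry."""
    return index.get(normalize(response))


def add_entry(response, code, index):
    """Indexes are not static: add a new response seen during processing."""
    index[normalize(response)] = code


print(lookup("Sells food on the street.", OCCUPATION_INDEX))
add_entry("Food hawker", "9111", OCCUPATION_INDEX)
print(lookup("food hawker", OCCUPATION_INDEX))
```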
7
4. Types of coding operations
Coding operations may involve one of three options:
- Assigning numerical codes to responses recorded in words, or in a form requiring modification before data entry/capture (e.g., geographic location, occupation, industry)
- Transcribing numeric codes already recorded on the questionnaire onto a separate coding sheet to facilitate data entry
- Using pre-coded entries on the questionnaire, which can be used directly for data entry
8
5. Types of codes
- Pre-coded answers
- Office coding
9
(a) Pre-coded answers
- Work best with closed-ended questions; the numbers in the questionnaire boxes are used to code the answers
- To the extent possible, census questionnaires should use pre-coded responses with numerical or alphanumeric codes
- Coding categories should be mutually exclusive and exhaustive
Pros:
  - Easier to develop codes
  - Saves time
Cons:
  - Cannot be used for many open-ended questions
10
(b) Office coding
- Not all census questions can be pre-coded, e.g., those requiring open-ended answers
- The full range of responses may not be known and therefore cannot be coded on the spot, so coding is done after enumeration
11
Open-ended Questions: Advantages
- Allow respondents to express themselves in their own words rather than in words chosen by the census planner
- Particularly appropriate for more complex concepts such as occupation
- Researchers can see how respondents actually think about the topic at hand
- Analysts with different research interests can each find information of value in the answers to the same question
12
Open-ended Questions: Disadvantages
- Different respondents may approach the same question from different perspectives, so their answers may not be fully comparable
- Open-ended questions are a common source of measurement error in censuses
- They are more difficult to analyze than closed-ended questions because census coders must code the responses into categories before analysis can begin
- The coding may involve grouping together respondents who provided similar answers
- Because two respondents rarely give identical answers, the coder may fill in the details of an answer by guessing what the respondent meant to say
13
Open-ended Questions: Issues around Coding
- Not all questions in a census can be pre-coded (e.g., many of those related to economic characteristics)
- Trained personnel are needed to determine the appropriate codes and to match them with the existing coding lists on the basis of the information supplied by respondents
- An "Other" category is usually included because the full range of responses is often not known
- Some questions are not intended to carry predetermined codes at all; their responses are coded after the fact in the office
14
6. Coding systems
- Coding is necessary because computer editing and tabulation of textual material is not practical
- Textual and verbal responses have to be replaced by codes through one of the following types of intervention:
  - Manual
  - Computer-assisted
  - Automatic
  - A combination of the above
15
(a) Manual/clerical coding
- Coding clerks manually match responses to code indexes/books
- They then manually enter the codes onto a form for later data capture and processing
Pro:
  - Simple
Cons:
  - Tedious
  - Subject to bias and over-coding (a coder may be overzealous in finding a code even when it is not obvious)
  - Subject to higher error rates than other types of coding
16
(b) Computer-assisted coding
- Computerized systems (mainframes, PCs, etc.) are used to assist coders
- The indexes are as described before, but are now computer-based: the associated codes are stored in a database file and accessed during the coding operation
- A typist can key from coding sheets at a computer terminal; alternatively, coding sheets may not be needed at all, since the coder can type each response directly from the questionnaire
17
Computer-assisted coding (contd)
Practical execution:
- The coder types a few characters of each word in the response
- The computer returns a list of matches from the appropriate coding index
- The coder selects the matching index entry from the list of possibilities
- The computer automatically records the code corresponding to that index entry
Example: for "poultry farmer" the coder enters "far pou"
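The "far pou" example can be mimicked with a simple prefix search. The sketch below is one possible matching rule, not the rule of any particular census system: it assumes an entry matches when every typed fragment is the start of some word in the entry, in any order. The index entries and the four-digit codes are placeholders.

```python
# Placeholder index; the codes are invented for illustration.
OCCUPATION_INDEX = {
    "poultry farmer": "0001",
    "dairy farmer": "0002",
    "poultry processor": "0003",
    "farm labourer": "0004",
}


def candidates(query, index):
    """Return (entry, code) pairs whose words start with every typed fragment."""
    fragments = query.lower().split()
    if not fragments:
        return []
    matches = []
    for entry, code in index.items():
        words = entry.lower().split()
        # Each fragment must be a prefix of at least one word of the entry.
        if all(any(word.startswith(frag) for word in words) for frag in fragments):
            matches.append((entry, code))
    return sorted(matches)


# The coder types "far pou" and picks from the returned list.
for entry, code in candidates("far pou", OCCUPATION_INDEX):
    print(entry, code)
```

Typing only "far" would return three entries here, which is why the coder still makes the final selection from the list.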
18
Computer-assisted coding (contd)
Pros:
  - More efficient than manual coding
  - More coding rules can be built into the system to guide the coders, which results in better-quality data
  - Particularly suitable for structured coding
Cons:
  - Relatively complex
  - Takes substantial time and cost to develop
19
(c) Automatic coding
- A computerized algorithm matches the captured textual response (e.g., from ICR) against the indexes and assigns a code to the majority of cases without any human intervention
- Typically involves a scoring mechanism in which a particular score is required before a response is regarded as a match
- Matching rates depend on the algorithms used and the types of variables
- When the score is above a certain level, the match is accepted and the code is assigned automatically
- When the score is below that level, human intervention is usually necessary
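A score-and-threshold rule of this kind can be sketched with the Python standard library. The similarity measure (`difflib.SequenceMatcher.ratio`), the threshold of 0.85, the index entries and the codes are all assumptions chosen for illustration; real automatic coders use their own matching algorithms and tuned thresholds.

```python
from difflib import SequenceMatcher

# Placeholder index; the codes are invented for illustration.
OCCUPATION_INDEX = {
    "poultry farmer": "0001",
    "dairy farmer": "0002",
    "poultry processor": "0003",
}


def auto_code(response, index, threshold=0.85):
    """Score the response against every index entry.

    Returns (code, score) when the best score reaches the threshold,
    otherwise (None, score) so the case can be referred to a human coder.
    """
    text = response.strip().lower()
    best_entry, best_score = None, 0.0
    for entry in index:
        score = SequenceMatcher(None, text, entry).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        return index[best_entry], best_score
    return None, best_score


# A character misread by ICR ("1" for "l") still scores above the threshold.
code, score = auto_code("pou1try farmer", OCCUPATION_INDEX)
print(code, round(score, 3))
```

Because every response passes through the same algorithm and index, a flaw in either one is repeated across all affected records, which is the systematic-error risk of automatic coding.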
20
Automatic coding (contd)
Pros:
  - Speed
  - High efficiency
  - Good quality
  - Especially suitable for structured coding
Cons:
  - Complex
  - High cost
  - Risk of systematic errors if the matching algorithms or indexes are faulty
21
7. Coding mechanics
- National statistical offices (NSOs) often develop a list of common codes for items used both in the census and in related surveys, e.g., birthplace, language, ethnicity/race, citizenship
- An example of a common coding scheme for "place" is a 3-digit code with a hierarchy of geographic levels: the first digit is the broadest level of geography and the third digit the finest
- A common problem arises when definitions differ or change between censuses (or between a census and a survey) for variables such as work or ethnicity; the NSO needs a policy on how to account for such changes so that coherent trends can still be produced
- For "Simple Coding", the NSO must set a list of codes for the possible responses to each question
  - E.g., sex of respondent: male-1, female-2
  - E.g., reason for being economically non-active: housewife-0, student-1, retired-2, too young-3, too old-4, pensioner-5, other-7
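The two simple code lists given above translate directly into lookup tables. The code values are taken from the slide; the function name `simple_code` and the choice to return `None` for an unlisted response (so that it can be reviewed rather than guessed) are assumptions of this sketch.

```python
# Code lists as given on the slide.
SEX_CODES = {"male": 1, "female": 2}
NON_ACTIVE_CODES = {
    "housewife": 0,
    "student": 1,
    "retired": 2,
    "too young": 3,
    "too old": 4,
    "pensioner": 5,
    "other": 7,
}


def simple_code(response, code_list):
    """Return the code for a response, or None if it is not in the list."""
    return code_list.get(response.strip().lower())


print(simple_code("Female", SEX_CODES))
print(simple_code("Too old", NON_ACTIVE_CODES))
```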
22
Coding mechanics (contd)
For "Structured Coding", there are a number of international classification systems that NSOs can use directly or adapt into their own national variants.
Examples:
(a) International Standard Industrial Classification, ISIC Rev. 4

Type of code      Level     Category                                                            Code
Two-digit code    Division  Manufacturing of food                                               10
Three-digit code  Group     Manufacturing of grain mill products, starches and starch products  106
Four-digit code   Class     Manufacturing of grain mill products                                1061
23
Coding mechanics (contd)
(b) International Standard Classification of Occupations, ISCO-88

Type of code      Labeling of level  Name of category                             Code
Two-digit code    Sub-major Group    Sales and services elementary occupations    91
Three-digit code  Minor Group        Street vendors and related workers           911
Four-digit code   Occupation         Street food vendors                          9111
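In both tables each broader level is a prefix of the finer code, so the higher levels can be derived from a four-digit code by truncation. The sketch below shows this for the examples above; the function names and the list of input codes passed to `tabulate` are illustrative assumptions.

```python
from collections import Counter


def hierarchy(code):
    """Split a four-digit code into its two-, three- and four-digit levels."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected a four-digit code, got %r" % code)
    return {"two_digit": code[:2], "three_digit": code[:3], "four_digit": code}


def tabulate(codes, digits):
    """Count coded records at the chosen level of the hierarchy."""
    return Counter(code[:digits] for code in codes)


print(hierarchy("9111"))  # ISCO-88 example from the table above
print(hierarchy("1061"))  # ISIC Rev. 4 example from the earlier table

# Illustrative four-digit inputs, tabulated at the three-digit level.
print(tabulate(["9111", "9112", "9120"], 3))
```

Coding at the finest level that the response supports keeps the option of publishing at any broader level later.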
24
8. Sources of coding errors
- Coding rules may be deficient
- Coding rules may not be properly applied
- Developing a quality coding operation is difficult because coding can be highly subjective
- Coding operations in censuses can be large and therefore difficult to manage
25
THANK YOU