UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data editing


1
UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data editing
Doha, State of Qatar, 18-22 May 2008
Coding of Census Information: An Overview
United Nations Statistics Division

2
Outline of Presentation
• What is coding?
• Coding methodologies
• Coding indexes
• Types of coding operations
• Types of codes
  - Open-ended questions
• Coding systems
• Coding mechanics
• Sources of coding errors

3
1. What is coding?
• Process in which census questionnaire entries are assigned numerical and/or alphanumeric values
• Objective is to prepare data in a form suitable for entry into a computer and for further analysis by users
• Done by setting up the possible responses to each question in the census questionnaire and mapping these responses onto numerical or alphanumeric codes

4
2. Coding methodologies
• Simple
  - Straightforward
  - Limited to reference to one question on the census form, e.g., birthplace
• Structured
  - Used for complex topics (e.g., occupation, industry, education)
  - Reference may be made to more than one question
  - Coding rules can be built into the structured coding system to guide the operators

5
Coding methodologies (contd)
• Bounded
  - Used where it is necessary to obtain different levels of detail before a code can be assigned
  - Commonly used for addresses
  - Coder starts the search at a broader geographic level (e.g., province, district, municipality), then moves to lower levels (e.g., city, street) as necessary to obtain a classification code
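The bounded search for addresses might be sketched in Python as follows. This is only an illustration under stated assumptions: the nested index, the place names, and the codes are all invented for the example and do not come from any real census geography.

```python
# Hypothetical nested address index: province -> district -> city.
# All names and codes below are invented for illustration.
ADDRESS_INDEX = {
    "north province": {
        "code": "1",
        "children": {
            "river district": {
                "code": "12",
                "children": {
                    "old town": {"code": "123", "children": {}},
                },
            },
        },
    },
}


def bounded_code(levels):
    """Walk from the broadest level down and return the finest code found.

    `levels` lists the reported place names from broadest to finest.
    The search stops at the first level that cannot be matched, so a
    partially reported address still receives a broader code.
    """
    node = {"code": None, "children": ADDRESS_INDEX}
    for name in levels:
        child = node["children"].get(name.strip().lower())
        if child is None:
            break  # cannot go lower; keep the code found so far
        node = child
    return node["code"]
```

Under these assumptions, a fully reported address resolves to the finest code, while an address that matches only at province level keeps the province code and can be routed for review.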

6
3. Coding indexes
• Regardless of the coding system used, all systems rely on coding indexes
• The indexes are lists of typical responses likely to be given on a census form, each with an associated classification code
• It is important that the lists be based on what respondents typically report and not simply contain the categories of the classification structure, because respondents answer in everyday language, not in classification terms
• The indexes thus enable responses to be “mapped” onto the various classification structures
• The quality of these indexes is paramount; the time and effort needed to build them should not be underestimated
• Indexes are not static and sometimes need to be updated during processing to cater for new responses
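A coding index of this kind could be represented as a simple lookup table that is extended during processing. The sketch below is a minimal illustration, not a description of any particular census system: the class name, the normalisation rule, and the sample responses are assumptions; only the code 9111 (street food vendors, ISCO-88) is taken from the example later in this presentation.

```python
def normalize(response):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(response.lower().split())


class CodingIndex:
    """Maps everyday-language responses to classification codes."""

    def __init__(self, entries):
        self._entries = {normalize(text): code for text, code in entries.items()}
        self.unmatched = []  # responses to review when updating the index

    def lookup(self, response):
        key = normalize(response)
        code = self._entries.get(key)
        if code is None:
            self.unmatched.append(key)  # keep for later index maintenance
        return code

    def add_entry(self, response, code):
        """Extend the index during processing to cater for a new response."""
        self._entries[normalize(response)] = code


# Hypothetical index entries written the way respondents might report them.
OCCUPATION_INDEX = CodingIndex({
    "street food vendor": "9111",
    "sells cooked food at roadside stall": "9111",
})
```

Responses that fail to match are collected in `unmatched`, which reflects the point that indexes are not static: reviewers can inspect that list and call `add_entry` for wording that recurs.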

7
4. Types of coding operations
Coding operations may involve one of three options:
• Assigning numerical codes to responses recorded in words or in a form requiring modification before data entry/capture, e.g., items such as geographic location, occupation, industry
• Rewriting numeric codes recorded on, say, a questionnaire onto a separate coding sheet to facilitate data entry
• Using pre-coded entries on questionnaires, which may be used directly for data entry

8
5. Types of codes
• Pre-coded answers
• Office coding

9
(a) Pre-coded answers
• Work better with closed-ended questions; the numbers in the questionnaire boxes are used to code the answers to closed-ended questions
• To the extent possible, pre-coded responses with numerical or alphanumeric codes should be used in census questionnaires
• Coding categories should be mutually exclusive and exhaustive
• Pros: easier to develop codes; saves time
• Cons: cannot be used for many open-ended questions

10
(b) Office coding
• Not all census questions can be pre-coded, e.g., those requiring open-ended answers
• The full range of responses may not be known and therefore cannot be coded on the spot, so coding is done after enumeration

11
Open-ended Questions: Advantages
• Allow respondents to express themselves in their own words, instead of in words chosen by the census planner
• Particularly appropriate for more complex concepts such as occupation
• Researchers can see how respondents actually think about the topic at hand
• Analysts with different research interests can each find information of value in the answers to the same questions

12
Open-ended Questions: Disadvantages
• Different respondents may approach the same question from different perspectives, so their answers may not be fully comparable
• Open-ended questions are a common source of measurement error in censuses
• They are more difficult to analyze than closed-ended questions because census coders must code responses into categories before analysis can begin. The coding may involve grouping together respondents who provided similar answers. Because no two respondents may ever give identical answers, the coder may fill in details of an answer by guessing what a respondent meant to say.

13
Open-ended Questions: Issues around Coding
• Not all questions in a census can be pre-coded (e.g., many of those related to economic characteristics)
• Trained personnel are needed to determine appropriate codes and to match them with the existing coding lists on the basis of information supplied by respondents
• An “Other” category is usually included because the full range of responses is often not known
• Some questions are not intended to carry previously determined codes; their responses are therefore coded after the fact in the office

14
6. Coding systems
• Coding becomes necessary because computer editing and tabulation of textual material is not practical
• Textual and verbal responses have to be replaced by codes via the following types of intervention:
  - Manual
  - Computer-assisted
  - Automatic
  - Combination of some of the above

15
(a) Manual/clerical coding
• Coding clerks manually match responses to code indexes/books
• They then manually enter codes onto a form for later data capture and processing
• Pro:
  - Simple
• Cons:
  - Tedious
  - Subject to bias and over-coding (a coder may be overzealous in finding a code even if it is not obvious)
  - Subject to higher error rates than other types of coding

16
(b) Computer-assisted coding
• Computerized systems (mainframes, PCs, etc.) are used to assist coders
• The indexes used are as described before, but this time they are computer-based. The associated codes are stored in a database file and accessed during the coding operation
• A typist can sit at a computer terminal and type from coding sheets; alternatively, coding sheets may not be required, as the coder can sit at the computer and type each response directly from the questionnaire

17
Computer-assisted coding (contd)
• Practical execution:
  - Coder types a few characters of each word in the response
  - Computer returns a matching list from an appropriate coding index
  - Coder selects the matching index entry from the list of possibilities
  - The computer automatically records the code corresponding to the matching index entry
• Example: for “poultry farmer” the coder enters “far pou”
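The “far pou” lookup can be sketched as a word-prefix search over the coding index. This is a minimal illustration, assuming that each typed fragment must begin some word of an index entry, in any order; the index entries and the codes "X01" to "X04" are hypothetical placeholders, not real classification codes.

```python
# Hypothetical index entries with placeholder codes (not real classification codes).
SAMPLE_INDEX = {
    "poultry farmer": "X01",
    "dairy farmer": "X02",
    "poultry processor": "X03",
    "farm labourer": "X04",
}


def candidates(query, index):
    """Return (entry, code) pairs where every typed fragment starts some word."""
    fragments = query.lower().split()
    matches = []
    for entry, code in index.items():
        words = entry.lower().split()
        # Each fragment must be a prefix of at least one word in the entry.
        if all(any(word.startswith(frag) for word in words) for frag in fragments):
            matches.append((entry, code))
    return sorted(matches)


# The coder picks one entry from this list; the system then records its code.
shortlist = candidates("far pou", SAMPLE_INDEX)
```

With this sample index, typing "far" alone returns three entries, while "far pou" narrows the list to "poultry farmer", which is why a few characters per word are usually enough for the coder to find the right entry.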

18
Computer-assisted coding (contd)
• Pros:
  - Relatively more efficient
  - More coding rules can be incorporated into the system to guide the processors, which results in better quality data
  - Particularly suitable for structured coding
• Cons:
  - Relatively complex
  - Takes time and substantial cost to develop

19
(c) Automatic coding
• A computerized algorithm matches the captured textual response (e.g., from ICR) against indexes and assigns a code number to the majority of cases without any human intervention
• Typically involves a scoring mechanism in which a particular score is required before a response is regarded as a match
• Matching rates depend on the algorithms used and the types of variables
• When the score is above a certain level, the match is considered acceptable and the code is assigned automatically
• When the score is below that level, human intervention is usually necessary
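The scoring mechanism can be illustrated with a small Python sketch. It assumes a generic string-similarity score from the standard library's `difflib.SequenceMatcher` and an arbitrary threshold of 0.85; both are stand-ins chosen for the example, not the algorithm or cut-off of any actual census system.

```python
from difflib import SequenceMatcher


def auto_code(response, index, threshold=0.85):
    """Return (code, score); code is None when the best score is too low.

    `index` maps index entries (text) to codes. The threshold is an
    illustrative assumption and would need tuning per variable.
    """
    text = response.lower().strip()
    best_entry, best_score = None, 0.0
    for entry in index:
        score = SequenceMatcher(None, text, entry.lower()).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        return index[best_entry], best_score  # accepted automatically
    return None, best_score  # below threshold: refer to a human coder


# Hypothetical one-entry index; 9111 is the ISCO-88 code for street food vendors.
SAMPLE_INDEX = {"street food vendor": "9111"}
code, score = auto_code("stret food vendor", SAMPLE_INDEX)
```

A misspelled capture such as "stret food vendor" still scores high enough to be coded, whereas an unrelated response falls below the threshold and would be passed to manual or computer-assisted coding.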

20
Automatic coding (contd)
• Pros:
  - Speed
  - High efficiency
  - Good quality
  - Especially suitable for structured coding
• Cons:
  - Complex
  - High cost
  - Risk of systematic errors in case of faults in the matching algorithms and indexes

21
7. Coding mechanics
• NSOs often develop a list of common codes for items used both in the census and in related surveys, e.g., birthplace, language, ethnicity/race, citizenship
• An example of a common coding scheme for “place” might be a 3-digit code with a hierarchy for different levels of geography, i.e., the first digit is the broadest level of geography and the third digit is the finest level
• A common problem arises when definitions differ or change between censuses (or between a census and a survey) for variables such as work or ethnicity; the NSO needs to develop a policy on how to take these changes into account so that coherent trends can be produced
• For “Simple Coding”, the NSO must set a list of codes for the possible responses to each question
  - E.g., Sex of respondent: male-1, female-2
  - E.g., Reason for being economically non-active: housewife-0, student-1, retired-2, too young-3, too old-4, pensioner-5, other-7
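The two simple code lists above translate directly into lookup tables. The sketch below uses the codes exactly as listed on the slide; the helper names `simple_code` and `split_place_code`, and the reading of the 3-digit place code as nested prefixes, are assumptions made for illustration.

```python
# Code lists copied from the slide.
SEX_CODES = {"male": 1, "female": 2}
NON_ACTIVE_CODES = {
    "housewife": 0,
    "student": 1,
    "retired": 2,
    "too young": 3,
    "too old": 4,
    "pensioner": 5,
    "other": 7,
}


def simple_code(response, code_list, default=None):
    """Look up the code for a response that refers to a single question."""
    return code_list.get(response.strip().lower(), default)


def split_place_code(code):
    """Split a 3-digit hierarchical place code into its nested levels.

    Assumes each additional digit narrows the geography, so the prefix
    of length 1 is the broadest unit and the full code is the finest.
    """
    if len(code) != 3 or not code.isdigit():
        raise ValueError("expected a 3-digit place code")
    return {"broadest": code[:1], "middle": code[:2], "finest": code}
```

For instance, `simple_code("student", NON_ACTIVE_CODES)` returns 1, and an unlisted reason can be sent to the "other" category by passing `default=NON_ACTIVE_CODES["other"]`.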

22
Coding mechanics (contd)
• For “Structured Coding”, there are a number of international classification systems that NSOs can use directly or adapt to their own national variants
• Examples:
(a) International Standard Industrial Classification, ISIC Rev. 4

Type of code     | Level    | Category                                                          | Code
Two-digit code   | Division | Manufacturing of food                                             | 10
Three-digit code | Group    | Manufacturing of grain mill products, starches and starch products | e.g. 106
Four-digit code  | Class    | Manufacturing of grain mill products                              | e.g. 1061

23
Coding mechanics (contd)
(b) International Standard Classification of Occupations, ISCO-88

Type of code     | Labeling of level | Name of category                          | Code
Two-digit code   | Sub-major Group   | Sales and services elementary occupations | 91
Three-digit code | Minor Group       | Street vendors and related workers        | 911
Four-digit code  | Occupation        | Street food vendors                       | 9111
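Because each level of these classifications extends the code of the level above by one digit, the higher-level groups of a detailed code can be recovered by truncation. The following sketch illustrates this with the ISCO-88 and ISIC Rev. 4 rows shown in the two examples; the function name `hierarchy` and the dictionary layout are assumptions for the example.

```python
# Rows taken from the ISCO-88 and ISIC Rev. 4 examples in this presentation.
ISCO88_LABELS = {
    "91": "Sales and services elementary occupations",
    "911": "Street vendors and related workers",
    "9111": "Street food vendors",
}
ISIC4_LABELS = {
    "10": "Manufacturing of food",
    "106": "Manufacturing of grain mill products, starches and starch products",
    "1061": "Manufacturing of grain mill products",
}


def hierarchy(code, labels):
    """Return (code, label) pairs from the two-digit level down to `code`.

    Relies on the prefix structure of the classification: dropping the
    last digit of a code gives the code of its parent group.
    """
    return [
        (code[:n], labels[code[:n]])
        for n in range(2, len(code) + 1)
        if code[:n] in labels
    ]
```

This prefix property is what allows data coded at the four-digit level to be tabulated later at the three-digit or two-digit level without recoding.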

24
8. Sources of coding errors
• Coding rules might be deficient
• Coding rules may not be properly applied
• Developing a quality coding operation is difficult since coding can be highly subjective
• Coding operations in censuses can be large and therefore difficult to manage

25
THANK YOU

