Published by Kathryn Moore. Modified over 9 years ago.
1
Coding
Michael J. Levin
Harvard Center for Population and Development Studies
Michael.levin@yahoo.com
2
Coding
Most countries now scan their census forms, but many still key their survey forms.
Either way, certain variables need to be translated from words into numbers.
Coding is the process of converting responses into machine-readable numeric or alphanumeric codes.
3
Coding Considerations: Investment
When developing a coding scheme, census and survey staff must weigh the return on each investment of time, energy and funds.
These considerations matter less for small countries or small surveys, where there is far less to process than in a large census.
4
Coding Considerations: Software
Some packages can easily accept and work with alphanumeric data.
However, most packages have difficulty categorizing and performing calculations (sums, percentages, medians, etc.) when non-numeric data are included.
5
Codes made up entirely of alphabetic characters, or of a mix of alphabetic characters and numbers (alphanumerics), should be avoided whenever possible.
When forms are scanned, alphanumerics are not a great problem at capture, but many computer packages require considerable manipulation to use them.
In many cases, editing programs require alpha characters to be placed between quotation marks, or marked in some other manner, before they can be processed.
6
Coding Considerations: Editing
Scanned data don’t suffer as much from additional columns of information.
For example, for codes 1 through 9, the scanner may pick up an alpha character, a blank, or a stray mark converted to some readable character.
These issues are readily handled in the edit, as described later.
But when two columns are used for an item, for example relationship, scanning will introduce errors that would not be present if a single column were used.
7
Coding Considerations: Editing
When two columns are used for an item, say for codes 1 to 10, a whole new realm of errors is introduced.
Instead of legal values 1 to 9, incoming values can now range anywhere from 0 to 99, in addition to the alpha characters, blanks, and stray marks mentioned earlier.
In most cases, the subject specialists provide the edit specifications for the item, but these extra values automatically increase the time and complexity of the edit and could decrease the quality of the final data set.
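The categories an edit has to handle for a scanned field can be sketched in a few lines of Python. This is only an illustration: the function name `classify_scanned_value` and the category labels are assumptions, not part of any census editing package.

```python
def classify_scanned_value(raw, low, high):
    """Sort one scanned field into the categories an edit has to handle.

    `low` and `high` are the legal code limits for the item (assumed inclusive).
    """
    text = raw.strip()
    if text == "":
        return "blank"
    if not (text.isascii() and text.isdigit()):
        return "non-numeric"  # alpha character or stray mark read as a character
    value = int(text)
    if low <= value <= high:
        return "legal"
    return "out of range"


# One-column item with legal codes 1 to 9
for raw in ["3", "0", "A", " "]:
    print(repr(raw), classify_scanned_value(raw, 1, 9))

# Two-column item with legal codes 1 to 10
for raw in ["10", "13", "1A", "  "]:
    print(repr(raw), classify_scanned_value(raw, 1, 10))
```

With one column, the only numeric value outside 1 to 9 is 0; with two columns, every value from 11 to 99 (plus 0) falls into the "out of range" branch and needs an edit rule.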
8
Coding Considerations: Editing I. Common Problems
When the editors receive a value of 13 for relationship, they must start making strategic decisions about what to do with this value.
– Was it meant to be 3, and the 1 is erroneous?
– Was it meant to be 10, and the 3 is wrong?
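The ambiguity of a value such as 13 can be made concrete with a small Python sketch that lists every legal code differing from the received value in exactly one column. The helper name `single_digit_candidates` and the one-column-error assumption are illustrative; a real edit would choose among the candidates using other items on the record.

```python
def single_digit_candidates(received, legal_codes, width=2):
    """Return legal codes that differ from the received value in exactly one column."""
    padded = received.zfill(width)
    candidates = []
    for code in legal_codes:
        target = str(code).zfill(width)
        # Count the columns where the received value and the legal code disagree.
        mismatches = sum(a != b for a, b in zip(padded, target))
        if mismatches == 1:
            candidates.append(code)
    return candidates


# Legal relationship codes 1 to 10, stored in two columns
print(single_digit_candidates("13", range(1, 11)))  # [3, 10]
```

Both candidates are equally close to the received value, which is why the editors cannot resolve the case from this item alone.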
9
Coding Considerations: Editing II. Common Problems
Many countries collect up to 12 items of information on fertility (children in the household, children elsewhere, children dead, etc.).
The issue here is how many digits each of those items should have.
– When two columns are used, the number of boys in the house could be anywhere from 0 to 99.
– When only one column is used, the number can only range from 0 to 9.
10
Coding Considerations: Editing II. Common Problems
Since it is extremely unlikely that a woman would have more than 9 boys in the household, using two digits introduces a high probability of picking up stray marks or scanning misreads, such as reading a 9 for a 0 and recording 91 children instead of 01.
However, for total children in the house, total children elsewhere, total children dead, and total children, two columns might be more appropriate.
These decisions depend largely on the fertility levels in the country.
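The cost of an extra column can be put in numbers. The sketch below counts how many numeric patterns a field can hold beyond its legal range; the function name `illegal_value_count` and the upper limit of 20 for total children are hypothetical choices for illustration.

```python
def illegal_value_count(columns, max_legal):
    """Count numeric patterns a field of this width can hold beyond the legal range 0..max_legal."""
    total_patterns = 10 ** columns  # 0..9 for one column, 0..99 for two
    return total_patterns - (max_legal + 1)


# Boys in the household, legal range 0 to 9
print(illegal_value_count(1, 9))   # 0
print(illegal_value_count(2, 9))   # 90

# Total children, assuming a hypothetical legal range of 0 to 20
print(illegal_value_count(2, 20))  # 79
```

A one-column field for boys in the household leaves no room for an out-of-range number, while a two-column field admits 90 of them, each of which the edit must catch.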
11
Coding Considerations: Good Practice
The following set of standard codes covers the majority of relationships for most countries:
1. Head of household (or householder)
2. Spouse
3. Child
4. Adopted or step-child
5. Sibling
6. Parent
7. Grandchild
8. Other relative
9. Nonrelative
Some countries use a “0” code for head of household and can then add a 10th category to the others.
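Translating written responses into these codes is essentially a table lookup. The following Python sketch uses the nine codes listed above; the synonym table and the function name `code_relationship` are hypothetical, and anything unrecognized is returned as `None` so it can be routed to manual coding.

```python
# Standard one-digit relationship codes from the list above.
RELATIONSHIP_CODES = {
    "head of household": 1,
    "spouse": 2,
    "child": 3,
    "adopted or step-child": 4,
    "sibling": 5,
    "parent": 6,
    "grandchild": 7,
    "other relative": 8,
    "nonrelative": 9,
}

# Hypothetical write-in variants a coder might map to the standard labels.
SYNONYMS = {
    "householder": "head of household",
    "wife": "spouse",
    "husband": "spouse",
    "son": "child",
    "daughter": "child",
    "brother": "sibling",
    "sister": "sibling",
}


def code_relationship(response):
    """Return the numeric code for a written response, or None if it needs manual review."""
    key = " ".join(response.lower().split())  # normalize case and spacing
    key = SYNONYMS.get(key, key)
    return RELATIONSHIP_CODES.get(key)


print(code_relationship("  Wife "))     # 2
print(code_relationship("Grandchild"))  # 7
print(code_relationship("lodger"))      # None
```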
12
Coding Considerations: Good Practice
Many countries, particularly those experiencing the HIV/AIDS epidemic, need much more detailed information than these codes can provide.
Specific information on children-in-law, parents-in-law, grandparents, nieces and nephews, and so forth becomes crucial in analyzing the HIV/AIDS situation in a country.
In this situation, the statistical office needs additional codes to carry out its mission, and so two-digit codes are required.
13
Coding Considerations: Good Practice
Once the decision is made to use two columns, the subject matter specialists for this item may choose to give each column its own significance. For example:
Code  Relationship
10    Head of Household
11    Spouse
12    Sibling
13    Sibling’s Spouse
21    Child
22    Adopted Child
23    Step Child
24    Niece/Nephew
31    Parent
32    Parent-in-Law
33    Uncle/Aunt
41    Grandchild
77    Other Relative
88    Non-Relative
90    Institutional Population
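When the first column carries meaning, related categories can be pulled together by that digit alone. The Python sketch below groups the example codes by their first digit. The slide does not state what each first digit stands for; reading 1 to 4 as generations relative to the head is an interpretation, and the function name `group_by_first_digit` is hypothetical.

```python
from collections import defaultdict

# Two-digit relationship codes from the example table.
RELATIONSHIP_CODES = {
    10: "Head of Household", 11: "Spouse", 12: "Sibling", 13: "Sibling's Spouse",
    21: "Child", 22: "Adopted Child", 23: "Step Child", 24: "Niece/Nephew",
    31: "Parent", 32: "Parent-in-Law", 33: "Uncle/Aunt",
    41: "Grandchild",
    77: "Other Relative", 88: "Non-Relative", 90: "Institutional Population",
}


def group_by_first_digit(codes):
    """Group relationship labels by the first (tens) digit of their code."""
    groups = defaultdict(list)
    for code, label in sorted(codes.items()):
        groups[code // 10].append(label)
    return dict(groups)


for first_digit, labels in group_by_first_digit(RELATIONSHIP_CODES).items():
    print(first_digit, labels)
```

An edit or tabulation can then test a whole group with one comparison on the first digit instead of listing every code.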
14
Coding Considerations: Good Practice
This type of coding should be considered for certain social and economic variables.
Ethnicity
– The major tribal or ethnic grouping would be in the first of two digits, and the minor tribal or ethnic grouping (like a sect) would be in the second digit.
Occupation/Industry
– The first digit would be for the major occupation/industry, the second digit for the minor occupation/industry, and the third digit for the specific occupation or industry.
Note: Most international coding schemes, such as those of the United Nations agencies and the U.S. Census Bureau, already have the levels embedded in the codes, so the statistical office does not have to do any additional work.
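One practical benefit of embedding levels in the digits is that tabulation at the major or minor level is just a truncation. The Python sketch below assumes three-digit occupation codes stored as strings; the code values and the function name `tabulate_by_level` are hypothetical and do not come from any published classification.

```python
from collections import Counter


def tabulate_by_level(codes, level):
    """Count records after truncating each code to its first `level` digits."""
    return Counter(code[:level] for code in codes)


# Hypothetical three-digit occupation codes from five records
records = ["611", "612", "621", "711", "611"]

print(tabulate_by_level(records, 1))  # major groups
print(tabulate_by_level(records, 2))  # minor groups
print(tabulate_by_level(records, 3))  # specific occupations
```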
15
Coding Considerations: Common Codes
A set of common codes for closely related variables can reduce coding errors and assist the data processors during the edit.
Common codes also allow data processors, where appropriate, to use an entry from one item to determine another.
– For example, in many countries, the codes for place (birthplace, parental birthplace, previous residence, work place), language, ethnicity/race, and citizenship are very similar.
– A common coding scheme for “place” might be developed as three-digit codes, with the first digit representing the continent, the second the region, and the third the specific country.
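A three-digit place code of this kind can be decomposed and checked by one routine shared across every place item. The Python sketch below is a minimal illustration; the names `PlaceCode`, `parse_place_code`, and `check_place_items`, and the code values in the example record, are hypothetical.

```python
from typing import NamedTuple


class PlaceCode(NamedTuple):
    continent: int
    region: int
    country: int


def parse_place_code(code):
    """Split a three-digit place code into continent, region, and country digits."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not a three-digit place code: {code!r}")
    return PlaceCode(int(code[0]), int(code[1]), int(code[2]))


def check_place_items(record):
    """Apply the same check to every place item; return the names of the items that fail."""
    failed = []
    for item, code in record.items():
        try:
            parse_place_code(code)
        except ValueError:
            failed.append(item)
    return failed


# Hypothetical record: three place items share one coding scheme
record = {"birthplace": "231", "previous_residence": "234", "work_place": "2X1"}
print(parse_place_code(record["birthplace"]))
print(check_place_items(record))  # ['work_place']
```

Because all place items use the same scheme, a single parser and a single edit rule serve birthplace, previous residence, and work place alike.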
16
Coding Considerations: Common Codes
The structure of the codes can facilitate the coding process as well as later processing during editing, tabulation and analysis.
For large countries with many immigrants or ethnic groups, codes based on continent, region and country, with different codes or digits assigned to each, would be preferable to a simple listing.
National census/statistical offices can also use the numerical country codes developed by international organizations such as the United Nations Statistics Division (United Nations, 1999).
17
Coding Considerations: Common Codes
Examples of common codes for selected items
Columns: Group, Birthplace, Citizenship, Language, Ethnicity
France/French 10
Spain/Spanish 20
Latin America 25 20 25
Philippines/Filipino 30
Ilokano 32
Tagalog 32
England/English 40
Canada 50 40 50
USA 52 40 52
18
Coding Considerations: Final Notes
If the items in a group on a questionnaire are not independent of each other, national census/survey staff probably should not ask all of them. The editing team must decide, on a case-by-case basis, when to use other items directly for assignment and when to use other available variables.
When definitions differ between censuses (or between a census and a survey) for variables such as work or ethnicity, the national census/statistical office must decide how to take these changes into account, both for the data currently being edited and for datasets from the prior census, in order to show trends. If the original, unedited data are available, data processors can change the appropriate edits and rerun all of them.