
Slide 1: Technology for census data coding, editing and imputation
Paolo Valente (UNECE), UNECE Statistical Division
UNECE Workshop on Census Technology for SPECA and CIS member countries (Astana, 7-8 June 2007)

Slide 2: Content
1. Coding
2. Editing and imputation
Reference material:
- Handbook on Census Management for Population and Housing Censuses (Chapter IV, sections D-F)
- Handbook on Population and Housing Census Editing

Slide 3: 1. Census data coding
Questions:
- How did you code the data in the last census?
- Were you satisfied or not with coding?
- What problems did you find in coding?
- Any problems with specific variables?

Slide 4: Census data coding
- Data coding = assigning classification codes to the responses written on the census form
- Coding systems:
  a) Manual
  b) Computer-assisted
  c) Automatic
  d) Mix of a), b) or c)
- Coding methodologies:
  - Simple (1 or 2 words): e.g. birth place
  - Structured (> 1 question): e.g. occupation
  - Hierarchical: e.g. address

Slide 5: Manual data coding
- Clerks identify the code using “code books” and write it on the census form for later processing
- Pros:
  - Easy to implement
  - No technology needed
- Cons:
  - Time consuming
  - Labor intensive
  - Risk of inconsistency

Slide 6: Computer-assisted coding
- Assisted by a computerized system
- Computer-based code books
- How it works:
  - Coder types only a few characters
  - System displays a list of matching entries
  - Coder chooses the right code
  - Code is automatically recorded by the system
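
The four steps above can be sketched in a few lines of Python. This is only an illustration, not the design of any particular census system: the code book entries, the codes and the function names `suggest` and `record_choice` are all hypothetical.

```python
# Hypothetical computer-based code book: description -> classification code.
# The descriptions and codes are invented for illustration only.
CODE_BOOK = {
    "car mechanic": "C01",
    "cardiologist": "C02",
    "carpenter": "C03",
    "teacher, primary school": "T01",
}


def suggest(prefix, code_book=CODE_BOOK):
    """Return the (description, code) pairs that start with the typed characters."""
    typed = prefix.strip().lower()
    return sorted((d, c) for d, c in code_book.items() if d.startswith(typed))


def record_choice(form, variable, suggestions, index):
    """Store the code of the entry the coder picked in the census form record."""
    _description, code = suggestions[index]
    form[variable] = code
    return form


# The coder types "car", sees three candidates and picks the third one.
candidates = suggest("car")
form = record_choice({}, "occupation", candidates, 2)
```

A real system would typically add coding rules for structured variables (for example, combining occupation and industry answers), which this sketch leaves out.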

Slide 7: Computer-assisted coding
- Pros:
  - Efficiency
  - Good quality
  - Particularly suitable for structured coding (possibility to include coding rules)
- Cons:
  - Relatively complex system
  - Long time needed for development
  - Relatively high cost

Slide 8: Automatic coding
- Based on computerized algorithms
- No human intervention
- Text captured by ICR and matched against indexes
- A score is assigned by the system to the matched response:
  - If the score is above a certain level, the response is accepted
  - If the score is below that level, human intervention is needed (computer-assisted coding)
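
A minimal sketch of the score-and-threshold logic, under stated assumptions: the similarity score here is `difflib.SequenceMatcher.ratio()` from the Python standard library, used as a stand-in for whatever matching algorithm a real system applies. The index entries, the codes and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical index for a simple variable (birth place): text -> code.
INDEX = {"kazakhstan": "KZ", "kyrgyzstan": "KG", "tajikistan": "TJ"}

# Assumed acceptance level; a real system would tune this per variable.
THRESHOLD = 0.85


def auto_code(text, index=INDEX, threshold=THRESHOLD):
    """Match captured text against the index and accept or refer the response."""
    cleaned = text.strip().lower()
    best_entry, best_score = None, 0.0
    for entry in index:
        # Similarity in [0, 1]; 1.0 means the strings are identical.
        score = SequenceMatcher(None, cleaned, entry).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        return index[best_entry], best_score, "accepted"
    # Below the threshold: hand the response over to a human coder.
    return None, best_score, "refer to computer-assisted coding"
```

For example, `auto_code("Kazakstan")` (one letter dropped, as an ICR error might produce) is still accepted as `"KZ"`, while `auto_code("xyz")` is referred to a human coder.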

Slide 9: Automatic coding
- Matching rates depend on the algorithms used and the type of variable
- Maximum matching rates in ideal circumstances:
  - For simple variables (birth place), approx. 80%
  - For complex variables (occupation, industry), approx. 50%
- All responses not matched have to be processed with computer-assisted coding
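
Because these rates are maxima, they give a lower bound on the workload left for computer-assisted coding. The arithmetic can be sketched as follows; the figure of one million responses is a hypothetical input, not a number from the presentation.

```python
# Approximate maximum matching rates quoted above, by type of variable.
MAX_MATCH_RATE = {"simple": 0.80, "complex": 0.50}


def residual_workload(n_responses, variable_type):
    """Minimum number of responses left for computer-assisted coding."""
    rate = MAX_MATCH_RATE[variable_type]
    return round(n_responses * (1 - rate))


# Hypothetical census with 1,000,000 responses per variable.
birth_place_left = residual_workload(1_000_000, "simple")
occupation_left = residual_workload(1_000_000, "complex")
```

Under these assumptions at least 200,000 birth place responses and 500,000 occupation responses would still need a human coder.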

Slide 10: Automatic coding
- Pros:
  - High efficiency
  - Good quality (if the system is developed accurately)
  - Consistency
  - Particularly suitable for structured coding (possibility to include coding rules)
- Cons:
  - Very complex system
  - Long time needed for development
  - High cost
  - Risk of systematic errors in case of faults in matching algorithms or indexes

Slide 11: Coding: practices in the 2000 round
- In general, CIS countries used manual coding
- About half of UNECE countries used automatic coding, in combination with computer-assisted or manual coding
- In most cases software was developed in-house
- Software for automatic coding:
  - ACTR (Automated Coding by Text Recognition), developed by Statistics Canada, also used by Italy and the UK
- Integrated software system, including computer-assisted coding: CSPro (US Census Bureau)
See “Measuring Population and Housing”, Chapter III

Slide 12: Coding in the 2010 census round
Questions:
- What are your plans for coding the data of the next census?
- Are you considering computer-assisted coding?
- Why? …or why NOT?

Slide 13: 2. Editing and imputation
Questions on editing:
- Which data did you edit in the last census?
- How did you edit the data?
- Did you have any problems?

Slide 14: 2. Editing and imputation
Questions on imputation:
- Did you impute any missing data? If yes:
  - For which variables?
  - What method and software did you use?
  - Did you produce statistics on imputation rates?

Slide 15: Editing and imputation
- Editing = detecting and correcting errors in census data
- Imputation = assigning values to missing data
- The two concepts are related and the two terms are sometimes used in different ways

Slide 16: Editing and imputation
- Different types of errors:
  - Coverage errors (e.g. omissions, duplicates)
  - Enumerator errors
  - Respondent errors
  - Coding errors
  - Data entry errors
  but also…
  - Editing errors!

Slide 17: Editing and imputation
- It is important not only to detect errors, but also to identify their causes, in order to take appropriate measures and improve overall quality
- Objectives of editing and imputation:
  - Improve quality of census data
  - Facilitate analysis of census data
  - Identify types and sources of errors

Slide 18: Editing and imputation
- Dilemma: what should be edited and what should NOT be edited?
- Complex editing systems can be difficult and expensive to implement, and in some cases may introduce distortions
- Go for a relatively simple editing system!

Slide 19: Editing and imputation
- In general, the editing system should be:
  - Minimalist (only obvious errors)
  - Automated (as much as possible)
  - Systematic
  - Compliant with other NSI procedures
  - Compliant with international standards

Slide 20: Editing and imputation
General guidelines for editing:
- Make the fewest required changes possible
- Eliminate obvious inconsistencies
- Supply entries for erroneous or missing items by using other entries for the housing unit, person, or other persons in the household or comparable group as a guide

Slide 21: Editing and imputation
Example of inconsistent information 1:
- Reference person and spouse have the same sex

Slide 22: Editing and imputation
Example of inconsistent information 2:
- Excessive age difference between mother and children
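
The two example inconsistencies can be expressed as simple household-level checks. The sketch below assumes a hypothetical record layout (`id`, `relation`, `sex`, `age`, `mother_id`) and an illustrative limit of 50 years for the mother-child age difference; the presentation does not specify either.

```python
def check_household(persons, max_mother_child_gap=50):
    """Return a list of messages, one per failed consistency check."""
    problems = []

    # Example 1: reference person and spouse have the same sex.
    ref = next((p for p in persons if p["relation"] == "reference person"), None)
    spouse = next((p for p in persons if p["relation"] == "spouse"), None)
    if ref and spouse and ref["sex"] == spouse["sex"]:
        problems.append("reference person and spouse have the same sex")

    # Example 2: excessive age difference between a mother and her child.
    by_id = {p["id"]: p for p in persons}
    for child in persons:
        mother = by_id.get(child.get("mother_id"))
        if mother is not None and mother["age"] - child["age"] > max_mother_child_gap:
            problems.append(
                f"age difference between mother {mother['id']} "
                f"and child {child['id']} exceeds {max_mother_child_gap} years"
            )
    return problems


household = [
    {"id": 1, "relation": "reference person", "sex": "F", "age": 70},
    {"id": 2, "relation": "spouse", "sex": "F", "age": 40},
    {"id": 3, "relation": "child", "sex": "M", "age": 5, "mother_id": 1},
]
issues = check_household(household)
```

The function only reports the inconsistencies; deciding which value to change is a separate step.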

Slide 23: Editing and imputation
Editing approaches:
- Top-down: items are edited in sequence, from first to last
- Multiple variable (Fellegi-Holt):
  - A set of statements and relationships among variables is checked in the household
  - The edit keeps track of all false statements
  - The system assesses how best to change the data
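
The multiple-variable idea can be illustrated with a strongly simplified sketch: all edits are evaluated, the failed ones are recorded, and the smallest set of fields that is involved in every failed edit is proposed for change. This is not a full Fellegi-Holt implementation (it ignores implied edits and does not choose the new values), and the edit rules and age limits below are hypothetical.

```python
from itertools import combinations

# Each edit: (name, fields involved, predicate that is True when the record passes).
# The rules and age limits are invented for illustration.
EDITS = [
    ("married implies age >= 15", ("marital_status", "age"),
     lambda r: not (r["marital_status"] == "married" and r["age"] < 15)),
    ("employed implies age >= 10", ("employment", "age"),
     lambda r: not (r["employment"] == "employed" and r["age"] < 10)),
]


def failed_edits(record):
    """Keep track of every statement that is false for this record."""
    return [(name, fields) for name, fields, passes in EDITS if not passes(record)]


def fields_to_change(record):
    """Smallest set of fields that touches every failed edit (brute force)."""
    failures = failed_edits(record)
    if not failures:
        return ()
    candidates = sorted({f for _, fields in failures for f in fields})
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(set(fields) & set(subset) for _, fields in failures):
                return subset
    return tuple(candidates)  # not reached: the full set always covers all failures


record = {"age": 8, "marital_status": "married", "employment": "employed"}
proposal = fields_to_change(record)
```

Here both edits fail and both involve `age`, so the sketch proposes changing only `age` rather than marital status and employment separately, in line with the guideline of making the fewest changes possible.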

Slide 24: Editing and imputation
Imputation methods:
- Static imputation (or “cold deck”)
  - Used mainly for missing values only
  - Value assigned from a predetermined set, or from a distribution of valid responses
  - The set of values does not change over time
- Dynamic imputation (or “hot deck”)
  - Used for missing or inconsistent values
  - Value assigned from a “donor” with similar characteristics; the donor changes constantly
  - Response imputations change over time
See “Handbook on Census Editing”, Ch. II.E and Annex V
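
The difference between the two methods can be sketched as follows. This is a minimal sequential hot deck with a cold deck fallback, assuming hypothetical records with `sex`, `age` and `marital_status`, and assuming that "similar characteristics" means same sex and same ten-year age group; real systems define donor classes per variable.

```python
# Static ("cold deck") values: a predetermined set that never changes.
# The default value is hypothetical.
COLD_DECK = {"marital_status": "never married"}


def impute_hot_deck(records, variable, cold_deck=COLD_DECK):
    """Fill missing values from the most recent donor with similar characteristics."""
    deck = {}  # dynamic deck: (sex, age group) -> last valid value seen
    result = []
    for rec in records:
        key = (rec["sex"], rec["age"] // 10)
        if rec.get(variable) is None:
            # Use the current donor value; fall back to the cold deck if none yet.
            value = deck.get(key, cold_deck[variable])
            result.append({**rec, variable: value, "imputed": True})
        else:
            # A valid response becomes the new donor for its class.
            deck[key] = rec[variable]
            result.append({**rec, "imputed": False})
    return result


records = [
    {"sex": "F", "age": 34, "marital_status": "married"},
    {"sex": "F", "age": 37, "marital_status": None},
    {"sex": "M", "age": 12, "marital_status": None},
]
imputed = impute_hot_deck(records, "marital_status")
```

The second record receives its value from the first (a donor in the same class), while the third has no donor yet and receives the static cold deck value. The `imputed` flag makes it straightforward to produce statistics on imputation rates afterwards.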

Slide 25: Editing and imputation
- Types of edits:
  - Fatal edits identify errors with certainty
  - Query edits identify suspected errors
- Structure edits
  - Check coverage and relations between different units: persons, households, housing units, enumeration areas etc.
- Edits for population and housing items
See “Handbook on Census Editing”, Chapters III, IV and V
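
As a rough illustration of how these kinds of edits differ, the sketch below sorts the findings for one household into fatal, query and structure edits. The specific rules (negative age as a certain error, age above 105 as a suspected error, exactly one reference person per household) are assumptions chosen for the example, not rules taken from the handbook.

```python
def classify_edits(household, query_age=105):
    """Group edit failures for one household by type of edit."""
    findings = {"fatal": [], "query": [], "structure": []}

    # Structure edit: relation between units (persons within a household).
    heads = [p for p in household if p["relation"] == "reference person"]
    if len(heads) != 1:
        findings["structure"].append(
            f"expected exactly one reference person, found {len(heads)}"
        )

    for position, person in enumerate(household, start=1):
        if person["age"] < 0:
            # Fatal edit: a negative age is an error with certainty.
            findings["fatal"].append(f"person {position}: negative age")
        elif person["age"] > query_age:
            # Query edit: possible but suspicious, so flag for review.
            findings["query"].append(f"person {position}: age above {query_age}")
    return findings


household = [
    {"relation": "child", "age": -1},
    {"relation": "child", "age": 110},
]
report = classify_edits(household)
```

In this example the household fails one edit of each kind: no reference person, one impossible age and one suspicious age.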

Slide 26: Editing and imputation
Practices in the 2000 round
- Most ECE countries (33 out of 40) performed computer-supported editing, including several CIS countries
- 22 countries performed automatic imputations
- Most countries developed specific software
- Some countries used SAS, Oracle, SQL, CSPro
See “Measuring Population and Housing”, Chapter III

Slide 27: Editing and imputation
Plans for the 2010 round
Questions:
- What are your plans for editing and imputation?
- What editing approaches/methods are you considering?

Slide 28: Editing and imputation
Plans for the 2010 round
Questions:
- For which variables would you consider imputation of missing values?

