1
Treatment of Missing Data Pres. 8
United Nations Regional Workshop on the 2010 World Programme on Population and Housing Censuses: Census Evaluation and Post Enumeration Surveys, Bangkok, Thailand, May 2010
2
Treatment of Missing Data
Why are some data missed?
  Refusals
  Item non-response
  Time constraints
  Paucity of resources
  Lax enumerators
  Units not found
  Insufficient data for matching, etc.
3
Treatment of Missing Data
Four types of missing data
  Unit missing data: household non-interview
  Item missing data: some information for a household or person is available and some is not
  Unresolved match or residence status: match or residence status in the P-sample could not be determined for PES estimation
  Unresolved enumeration status: correct or erroneous enumeration status in the E-sample could not be determined for PES estimation
4
How to treat missing data?
  A. Do nothing
  B. Use only the complete records
  C. Use a weighting method
  D. Impute the missing values
5
A. Doing nothing
  If very few data are missing, the effect on data use may not be significant and the missing values can be ignored
  This requires working with an incomplete dataset
6
B. Use only the complete records
Easy but risky option.
The subset of respondents may be:
  too small to support reliable estimates,
  not representative of the total population under study
Estimates may be seriously biased unless non-response does not depend on any of the variables of interest
This option should be considered only for a rapid descriptive analysis
7
C. Use a weighting method
Unit non-response: increase the respondents' weights to compensate for the non-respondents. The objective is to produce approximately unbiased estimates.
Item non-response: reweighting methods can be used, but their main disadvantage is that the same record ends up with different weights (one for each variable). For this reason reweighting is generally not used for item non-response.
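The unit non-response adjustment described above can be sketched as follows. This is only an illustration under stated assumptions: the adjustment cells ("urban", "rural"), the field names and the sample values are all hypothetical, and the slides do not prescribe how cells should be formed.

```python
from collections import defaultdict

def adjust_weights(households):
    """Give the non-respondents' weight to respondents in the same cell.

    Assumes every cell contains at least one respondent.
    """
    total = defaultdict(float)      # weight of all sampled units, per cell
    responded = defaultdict(float)  # weight of respondents only, per cell
    for h in households:
        total[h["cell"]] += h["weight"]
        if h["responded"]:
            responded[h["cell"]] += h["weight"]

    adjusted = []
    for h in households:
        if h["responded"]:
            factor = total[h["cell"]] / responded[h["cell"]]
            adjusted.append({**h, "weight": h["weight"] * factor})
    return adjusted

# Hypothetical sample: one urban household did not respond.
sample = [
    {"id": 1, "cell": "urban", "weight": 10.0, "responded": True},
    {"id": 2, "cell": "urban", "weight": 10.0, "responded": False},
    {"id": 3, "cell": "rural", "weight": 20.0, "responded": True},
    {"id": 4, "cell": "rural", "weight": 20.0, "responded": True},
]
result = adjust_weights(sample)
```

Within each cell the weighted total is preserved: the urban respondent's weight doubles, while the rural weights are unchanged.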
8
D. Imputation
Imputation changes one or more responses or missing values in a record, or in several records, so that the resulting records are internally coherent
Before using any imputation method, the best strategy is to start with a manual study of the responses; imputation can then handle the remaining unresolved edit failures
Two methods of imputation: Cold Deck and Hot Deck
Cold Deck Imputation:
  Used mainly for missing or unknown values (not for inconsistent/invalid values)
  Values are imputed on a proportional basis from a distribution of valid responses (e.g., from a previous census)
  Cold deck therefore draws values from a fixed (but possibly outdated) distribution of values
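A minimal sketch of cold deck imputation as described above, assuming a fixed distribution taken from an earlier source. The proportions, the function name and the `_imputed` flag field are hypothetical; the missing code 9 for sex follows the coding used in the examples at the end of this presentation.

```python
import random

def cold_deck_impute(records, field, distribution, missing_code, seed=0):
    """Replace missing values of `field` with draws from a fixed distribution."""
    rng = random.Random(seed)          # seeded so that a run can be repeated
    values = list(distribution.keys())
    weights = list(distribution.values())
    out = []
    for rec in records:
        rec = dict(rec)                # leave the original record untouched
        if rec[field] == missing_code:
            # Proportional draw: the distribution never changes during the run.
            rec[field] = rng.choices(values, weights=weights, k=1)[0]
            rec[field + "_imputed"] = True
        else:
            rec[field + "_imputed"] = False
        out.append(rec)
    return out

# Hypothetical proportions of sex (1=Male, 2=Female) from a previous census.
prior_sex_distribution = {1: 0.49, 2: 0.51}
people = [{"id": 1, "sex": 1}, {"id": 2, "sex": 9}, {"id": 3, "sex": 2}]
imputed = cold_deck_impute(people, "sex", prior_sex_distribution, missing_code=9)
```

Reported values are never altered; only the record coded 9 receives a drawn value and a flag.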
9
D. Imputation (contd.)
Hot Deck or Dynamic Imputation:
  Used for both missing data and inconsistent/invalid items
  Uses one or more variables to estimate the likely response, based on data about individuals with similar characteristics
  The “donor set” (or imputation matrix) constantly changes through updating; therefore, imputations change dynamically while all the records are being edited
  Thus, hot deck draws from a distribution that changes with each imputation and eventually (through modifications) “approaches” the distribution of the current data set
  Caution: if two or more items in a record have unknown values, hot deck may not use the same “donor” to impute each of them; in this case, it is preferable to use the same donor for all of the items
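The dynamic imputation matrix described above can be sketched as follows for age, keyed by sex and relationship. This is an assumption-laden illustration: the starting value of 30 in every cell and the four sample records are hypothetical, sex and relationship are assumed to be already valid, and 99 is taken as the missing code for age.

```python
MISSING_AGE = 99

def hot_deck_age(records, initial_matrix):
    """Impute missing ages from the most recent valid donor with the same
    sex and relationship, updating the matrix as records are processed."""
    matrix = dict(initial_matrix)      # (sex, relationship) -> last valid age
    out = []
    for rec in records:
        rec = dict(rec)
        key = (rec["sex"], rec["relationship"])
        if rec["age"] == MISSING_AGE:
            rec["age"] = matrix[key]   # take the current donor value
            rec["age_imputed"] = True
        else:
            matrix[key] = rec["age"]   # a valid record becomes the new donor
            rec["age_imputed"] = False
        out.append(rec)
    return out, matrix

# Hypothetical starting matrix: the same placeholder age in every cell.
initial = {(sex, rel): 30 for sex in (1, 2) for rel in (1, 2, 3, 4, 5)}
household = [
    {"id": 1, "relationship": 1, "sex": 1, "age": 39},
    {"id": 2, "relationship": 2, "sex": 2, "age": 35},
    {"id": 3, "relationship": 3, "sex": 1, "age": 13},
    {"id": 4, "relationship": 3, "sex": 1, "age": 99},
]
edited, final_matrix = hot_deck_age(household, initial)
```

The fourth person receives the age of the preceding male child rather than the placeholder, because the matrix was updated when the third record was processed. The sketch handles a single item; the caution about using one donor for several missing items is not addressed here.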
10
D. Imputation (contd.)
Unresolved match or residence status in P-sample:
  Estimate probabilities of match (residence) status
  Form cells/groups to estimate the probabilities
  Each cell should be homogeneous with respect to the probability being estimated
  Probabilities should differ (be heterogeneous) between cells/groups
  Use the reasons for field follow-up to form cells
Unresolved enumeration status in E-sample:
  Estimate probabilities of correct enumeration
  Probabilities should differ (be heterogeneous) between cells/groups
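One way to read the cell-based approach above is to assign each unresolved case the weighted proportion of matches among the resolved cases in its cell. The sketch below assumes exactly that; the cell labels, field names and weights are hypothetical, and the slides do not state whether the proportions should be weighted.

```python
from collections import defaultdict

def impute_match_probability(cases):
    """Assign unresolved cases the match rate of resolved cases in their cell.

    `match` is 1 (matched), 0 (not matched) or None (unresolved).
    Assumes every cell has at least one resolved case.
    """
    matched = defaultdict(float)   # weighted count of matches, per cell
    resolved = defaultdict(float)  # weighted count of resolved cases, per cell
    for c in cases:
        if c["match"] is not None:
            resolved[c["cell"]] += c["weight"]
            matched[c["cell"]] += c["weight"] * c["match"]

    out = []
    for c in cases:
        c = dict(c)
        if c["match"] is None:
            c["match_prob"] = matched[c["cell"]] / resolved[c["cell"]]
        else:
            c["match_prob"] = float(c["match"])
        out.append(c)
    return out

# Hypothetical cell formed from a single reason for field follow-up.
p_sample = [
    {"id": 1, "cell": "followup_reason_A", "weight": 1.0, "match": 1},
    {"id": 2, "cell": "followup_reason_A", "weight": 1.0, "match": 1},
    {"id": 3, "cell": "followup_reason_A", "weight": 1.0, "match": 1},
    {"id": 4, "cell": "followup_reason_A", "weight": 1.0, "match": 0},
    {"id": 5, "cell": "followup_reason_A", "weight": 1.0, "match": None},
]
with_probs = impute_match_probability(p_sample)
```

Here three of the four resolved cases matched, so the unresolved case is given a match probability of 0.75. The same pattern would apply to correct-enumeration status in the E-sample.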
11
D. Imputation (contd.)
Essential to evaluation, process planning and management:
  i) number of cases of each type of error;
  ii) non-response rates for each item;
  iii) imputation rates for each item, ...
It is important to generate an edit trail showing all data changes and substituted values, with their tallies
If the original value of a data item is changed in any way, a flag should be added to each item that is changed or imputed
This information is critical for planning future censuses; e.g., as a means to investigate the age threshold below which a female with “child ever born” triggers a query edit, and to decide whether the threshold should be modified for future rounds
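The tallies and rates listed above could be derived from an edit trail along these lines. The trail layout (`record_id`, `item`, `old`, `new`, `reason`) and all values are hypothetical; the sketch only shows that keeping one entry per changed item is enough to produce the counts.

```python
from collections import Counter

def summarize_edit_trail(trail, n_records):
    """Tally changes by item and reason, and compute imputation rates per item."""
    counts = Counter((e["item"], e["reason"]) for e in trail)
    changed_per_item = Counter(e["item"] for e in trail)
    rates = {item: n / n_records for item, n in changed_per_item.items()}
    return counts, rates

# Hypothetical edit trail: one entry for every value that was changed or imputed.
trail = [
    {"record_id": 6, "item": "age", "old": 99, "new": 13, "reason": "missing"},
    {"record_id": 7, "item": "sex", "old": 9, "new": 2, "reason": "missing"},
    {"record_id": 9, "item": "age", "old": 5, "new": 36, "reason": "inconsistent"},
]
counts, rates = summarize_edit_trail(trail, n_records=10)
```

Because each entry keeps the original value, the trail also serves as the flag showing which items were changed, and it can be re-examined later, for example when reviewing an edit threshold.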
12
A useful reference
Handbook on Population and Housing Census Editing, Rev. 1
Available on the UNSD website and currently being printed
13
Thank You!
14
Example of Hot Deck for Sample Household (Sex Only)
Columns: ID number, Relationship, Sex, Age, Dynamic Imputation Matrix
1 39 2 35 3 13 4 10 5 40 6 99* 7 8 9 44 36
Missing Information: 9, 99
Relationship: 1=Head; 2=Spouse; 3=Child; 4=Other Relative; 5=Non-Relative
Sex: 1=Male; 2=Female
15
Example of Hot Deck for Age (Sex and Relationship)
Initial Imputation Matrix for Age Based on Sex and Relationship
Relationship: Head of Household (1), Spouse (2), Son/Daughter (3), Other Relative (4), Non-Relative (5)
Male (1): 35 12 40
Female (2): 32 37
16
Example of Hot Deck for Age (Sex and Relationship)
Columns: ID number, Relationship, Sex, Age
1 39 2 35 3 13 4 10 5 40 6 7 8 9 44 36
Missing Information: 9, 99
Relationship: 1=Head; 2=Spouse; 3=Child; 4=Other Relative; 5=Non-Relative
Sex: 1=Male; 2=Female
17
Example of Hot Deck for Age (Sex and Relationship)
Initial Imputation Matrix for Age Based on Sex and Relationship
Relationship: Head of Household (1), Spouse (2), Son/Daughter (3), Other Relative (4), Non-Relative (5)
Male (1): 35 12 40
Female (2): 32 37
Dynamic Imputation Matrix After Multiple Changes: 39* 13* 44* 35* 36*