Generic Statistical Business Process-Censuses

Presentation transcript:

DATA VALIDATION and ANALYSIS OF PERFORMANCE OF IMPUTATION United Nations Statistics Division

Generic Statistical Business Process-Censuses
Planning, Questionnaire, Mapping, Testing, Enumeration, Data processing, Analyzing, Dissemination, Evaluation
Pre-enumeration operations
Preliminary evaluation of data quality

Data validation during data processing
The steps of data processing depend on the technology used. In general, the process covers the following steps:
- Preparation
- Scanning/Data capture
- Coding
- Editing/Imputation
- Validation
- Processing control
- Master file
Review and validate data against predefined rules: identify potential problems such as missing data, inconsistency and inappropriate editing/imputation BEFORE PRODUCING CENSUS OUTPUTS

Data analysis
Steps for Data Analysis:
- Prepare outputs
- Validation of outputs
- Interpret and explain outputs
- Apply disclosure control
- Finalize outputs
Checking data quality with appropriate methods; comparing the statistics with previous censuses and other relevant data sources (both internal and external); investigating inconsistencies in the statistics

Data validation
- Checking population distribution by geographic areas
- Checking the quality of editing/imputation
- Checking internal consistency and missing data

Data validation-1
Checking population distribution by geographic areas
Ensuring enumerated population is fully processed: enumerated persons/households may not be fully captured (undercoverage) or may be captured twice (overcoverage)
Controlling captured records (people/housing units) against census documents such as:
- Control forms, prepared by enumerators/supervisors
- Reports, prepared by Local/Regional Census Committees
- Number of questionnaires received from the field, prepared by the headquarters
- Number of scanned questionnaires, if applicable
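The control comparison described above can be sketched as follows. This is a minimal illustration, assuming hypothetical per-area person counts taken from control forms and from the captured data file; the area codes, counts and status labels are invented for the example and are not part of the presentation.

```python
def coverage_check(control_counts, captured_counts):
    """Compare captured record counts with control-form counts per area.

    Returns a dict mapping area -> (difference, status), where the
    difference is captured minus control.
    """
    report = {}
    for area in sorted(set(control_counts) | set(captured_counts)):
        expected = control_counts.get(area, 0)
        captured = captured_counts.get(area, 0)
        diff = captured - expected
        if diff < 0:
            status = "possible undercoverage"
        elif diff > 0:
            status = "possible overcoverage"
        else:
            status = "ok"
        report[area] = (diff, status)
    return report


# Hypothetical counts of persons per enumeration area.
control = {"EA-001": 412, "EA-002": 380, "EA-003": 295}
captured = {"EA-001": 412, "EA-002": 371, "EA-003": 301}

report = coverage_check(control, captured)
for area, (diff, status) in report.items():
    print(area, diff, status)
```

Areas with a non-zero difference would then be traced back to the field documents before any output is produced.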

Data validation-2
Checking the quality of editing/imputation
- Editing rules may be insufficient to identify all types of errors
- Imputation may introduce new errors in data because of incorrect application
- Some unexpected patterns may not be identified with editing/consistency rules

Basic definitions
Editing: a list of rules to determine invalid and inconsistent data
Imputation: the process of resolving problems concerning invalid or inconsistent data, and missing values, identified during editing
All records must respect a set of editing rules formulated to correct errors and finally disseminate reliable data

Some examples for invalid data
Age
- Age equal to 99, where the instruction was: if age is greater than or equal to 98, write 98
- Age written in one digit, such as 1 or 5
How to correct?
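A validity rule for the age item might look like the sketch below. It assumes the age is captured as a short text field and that 98 is the top code, as in the instruction quoted above; the status labels and the decision to send single-digit entries to manual review are illustrative assumptions, not rules stated in the presentation.

```python
def check_age_field(raw, top_code=98):
    """Classify a raw age entry as 'missing', 'invalid', 'review' or 'valid'."""
    text = raw.strip()
    if text == "":
        return "missing"
    if not text.isdecimal():
        return "invalid"          # non-numeric characters
    if len(text) < 2:
        return "review"           # only one digit was written
    if int(text) > top_code:
        return "invalid"          # e.g. 99 when the top code is 98
    return "valid"


for entry in ["99", "98", " 5", "05", "", "4x"]:
    print(repr(entry), check_age_field(entry))
```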

Some examples for inconsistent data
- Children ever born alive, living and dead children: the number of children ever born is not equal to the sum of the number of living children and the number of dead children
- Last live birth and household deaths: the last live birth is an infant who is no longer alive, but no infant death is registered in the household deaths
- Age of father/mother and children: the age of the father/mother is lower than, or only a few years higher than, the age of a child
What will the decision be?
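The first consistency rule above (children ever born equals living plus dead children) can be expressed as a simple check. The record layout and field names below are hypothetical.

```python
def children_counts_consistent(ceb, living, dead):
    """Return True when children ever born equals living plus dead children."""
    return ceb == living + dead


# Hypothetical records for three women.
records = [
    {"id": 1, "ceb": 4, "living": 3, "dead": 1},
    {"id": 2, "ceb": 3, "living": 3, "dead": 1},
    {"id": 3, "ceb": 0, "living": 0, "dead": 0},
]

failed = [
    r["id"]
    for r in records
    if not children_counts_consistent(r["ceb"], r["living"], r["dead"])
]
print(failed)
```

The check only identifies the failing records; which of the three values to change is the editing decision the slide asks about.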

Dealing with missing data
Decisions needed for dealing with missing data:
- Will missing data (item non-response) be imputed?
- Which variables will be imputed for missing data?
- What methods will be used for imputation?

Assessing the performance of imputation
Objectives:
- Comparing the distribution of the observed values with the distribution of the imputed values
- Comparing the distribution of observed values to the complete distribution including the imputed values
- To analyze the effect of imputation on the original data set
- To ensure the distribution of imputed values is reasonable or meets the expected pattern

Assessing the performance of imputation
Method for assessing the performance: after implementation of editing/imputation, data should be classified as follows:
- Observed (consistent) data: the values which meet all editing rules
- Non-response or unknown: no value
- Inconsistent data: the values which failed at least one editing rule
- Imputed data, for inconsistency and non-response
For this analysis, all procedures performed in the database should be identifiable
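One way to keep every procedure identifiable is to store a status for each value alongside the final data. The sketch below assumes a hypothetical audit trail holding the original value, whether it passed all edit rules, and the final value; it also assumes that every value which is not "observed" was imputed.

```python
from collections import Counter


def classify_value(original, passes_edits):
    """Assign the status of a value before imputation."""
    if original is None:
        return "non-response"
    if not passes_edits:
        return "inconsistent"
    return "observed"


# Hypothetical audit trail: (original value, passed all edit rules, final value).
audit = [
    (34, True, 34),
    (None, False, 29),
    (7, False, 31),
    (52, True, 52),
]

statuses = [classify_value(original, ok) for original, ok, _ in audit]
# Assumption: every non-response or inconsistent value was imputed.
imputed_flags = [status != "observed" for status in statuses]
summary = Counter(statuses)
print(summary)
```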

Assessing the performance of imputation
Compare the distribution of the observed values with the distribution of the imputed values
- If non-response and inconsistent data are distributed randomly, no difference is expected between the distribution of the observed and the imputed values
- If there are differences between the people who responded and those who did not respond or did not give accurate data, the imputed data should not follow the same distribution as the observed data

Assessing the performance of the imputation
Compare the distribution of the observed values with the distribution of all values including the imputed values
In general, imputed values should have a minimal effect on the distribution of the complete data, unless the non-response rate is particularly high or there is bias for certain characteristics
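The two comparisons above can be sketched with a few lines of code. The example assumes each record carries its final value and a flag saying whether that value was imputed; the variable (tenure) and the counts are invented for illustration.

```python
from collections import Counter


def percent_distribution(values):
    """Return the percentage distribution of the given values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical records: (final value, was_imputed).
records = (
    [("owner", False)] * 6
    + [("renter", False)] * 4
    + [("owner", True)] * 1
    + [("renter", True)] * 3
)

observed = percent_distribution(v for v, imputed in records if not imputed)
imputed = percent_distribution(v for v, imputed in records if imputed)
complete = percent_distribution(v for v, _ in records)

for category in sorted(complete):
    obs = observed.get(category, 0.0)
    imp = imputed.get(category, 0.0)
    change = complete[category] - obs
    print(category, round(obs, 1), round(imp, 1), round(change, 1))
```

In this made-up case the imputed values differ strongly from the observed ones and the imputation rate is high, so the complete distribution moves noticeably away from the observed distribution.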

Understanding data editing and potential errors
Data on deaths in the household: cases where the age of the deceased was hot-decked show a different age pattern of mortality than cases that were not subject to imputation
This indicates that the rules followed by the hot-deck are introducing a bias and are not reliable
Source: Estimation of mortality using the 2001 South Africa census data, Rob Dorrington, Tom Moultrie and Ian Timaeus, Centre for Actuarial Research, University of Cape Town

Understanding data editing and potential errors
[Figure annotations: Boundary of school age; Boundary of working age]
Source: England and Wales, Office for National Statistics, 2011 Census: Item Edit and Imputation: Evaluation Report, June 2012

Source: England and Wales, Office for National Statistics, 2011 Census:Item Edit and Imputation: Evaluation Report, June 2012

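Table 2: Distribution of bedrooms

| Number of bedrooms | Observed responses N (1) | Observed responses % (2) | Imputed responses N (3) | Imputed responses % (4) | Difference (Imputed-Observed) (5)=(4)-(2) | Total including imputed N (6)=(1)+(3) | Total including imputed % (7) | Change (total-observed) (8)=(7)-(2) |
|---|---|---|---|---|---|---|---|---|
| 0 | 62 | 0.3 | 5 | 0.8 | 0.5 | 67 | | 0.014 |
| 1 | 2,378 | 10.7 | 124 | 19.2 | 8.5 | 2,502 | 10.9 | 0.240 |
| 2 | 6,097 | 27.4 | 192 | 29.8 | 2.3 | 6,289 | 27.5 | 0.066 |
| 3 | 9,375 | 42.2 | 228 | 35.3 | -6.8 | 9,603 | 42.0 | -0.192 |
| 4 | 3,279 | 14.7 | 70 | | -3.9 | 3,349 | 14.6 | -0.110 |
| 5 | 809 | 3.6 | 19 | 2.9 | -0.7 | 828 | | -0.020 |
| 6 | 166 | 0.7 | | | 0.0 | 171 | | 0.001 |
| 7 | 39 | 0.2 | | | | 40 | | -0.001 |
| 8 or more | 27 | 0.1 | | | | 28 | | |
| Total | 22,232 | 100 | 645 | | | 22,877 | | 0.000 |
| Max Change | | | | | | | | |

Source: England and Wales, Office for National Statistics, 2011 Census: Item Edit and Imputation: Evaluation Report, June 2012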

Assessing the performance of imputation Maximum change

Assessing the performance of imputation Source: England and Wales, Office for National Statistics, 2011 Census:Item Edit and Imputation: Evaluation Report, June 2012

Assessing the performance of imputation Source: England and Wales, Office for National Statistics, 2011 Census:Item Edit and Imputation: Evaluation Report, June 2012

Assessing the performance of imputation
Summary indexes at the variable level
- Maximum absolute percent change: maximum absolute percent change across all categories for each variable
- Dissimilarity Index: degree of change between two distributions (observed and total including imputed values) at the variable level
- Imputation rate: share of the imputed records in the total records
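As a worked illustration, the maximum absolute percent change and the imputation rate can be computed from the observed counts (column 1) and the total counts including imputed values (column 6) of Table 2 on the distribution of bedrooms. This is a sketch only: the category labels "0" and "5" are inferred from the sequence of rows, and the change is taken, as in column (8), as the difference in percentage points between the total and the observed distribution.

```python
# Counts from Table 2 (distribution of bedrooms).
categories = ["0", "1", "2", "3", "4", "5", "6", "7", "8 or more"]
observed = [62, 2378, 6097, 9375, 3279, 809, 166, 39, 27]   # column (1)
total = [67, 2502, 6289, 9603, 3349, 828, 171, 40, 28]      # column (6)

observed_n = sum(observed)
total_n = sum(total)

# Change per category: total % minus observed %, in percentage points.
changes = [
    100.0 * t / total_n - 100.0 * o / observed_n
    for o, t in zip(observed, total)
]

max_index = max(range(len(changes)), key=lambda i: abs(changes[i]))
max_abs_change = abs(changes[max_index])

# Share of imputed records in the total records, in percent.
imputation_rate = 100.0 * (total_n - observed_n) / total_n

print(categories[max_index], round(max_abs_change, 3))
print(round(imputation_rate, 2))
```

The largest change falls in the one-bedroom category, which agrees with the 0.240 shown in the table.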

Assessing the performance of imputation Maximum absolute percent change between the observed and final (imputed) distributions across all categories within each of the questions Source: England and Wales, Office for National Statistics, 2011 Census:Item Edit and Imputation: Evaluation Report, June 2012

Assessing the performance of imputation Maximum absolute percent change between the observed and final (imputed) distributions across all categories within each of the questions Source: England and Wales, Office for National Statistics, 2011 Census:Item Edit and Imputation: Evaluation Report, June 2012

Index of dissimilarity
To assess the degree of change induced by imputation on the initial distribution of variables
$$ID = \frac{1}{2} \sum_{k} \left| f_k - f^*_k \right|$$
Where:
k: categories of the variable
f: percentage distribution of the variable before imputation
f*: percentage distribution of the variable after imputation

Index of dissimilarity
0 ≤ ID ≤ 100
- ID is 0 when the two distributions before and after imputation are equal
- ID is greater than 0 when they are different
- ID reaches its maximum value of 100 when there is maximum dissimilarity between the two distributions, that is, when each is concentrated in a single category and the two categories differ
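The bounds described above can be checked with a short sketch. It assumes the usual form of the index, half the sum of absolute differences between the two percentage distributions; the function name and the example distributions are illustrative.

```python
def dissimilarity_index(before, after):
    """Half the sum of absolute differences between two percentage distributions.

    Both arguments map category -> percentage (each summing to 100).
    """
    categories = set(before) | set(after)
    return 0.5 * sum(
        abs(before.get(k, 0.0) - after.get(k, 0.0)) for k in categories
    )


identical = dissimilarity_index({"a": 50.0, "b": 50.0}, {"a": 50.0, "b": 50.0})
opposite = dissimilarity_index({"a": 100.0, "b": 0.0}, {"a": 0.0, "b": 100.0})
shifted = dissimilarity_index({"a": 60.0, "b": 40.0}, {"a": 50.0, "b": 50.0})

print(identical, opposite, shifted)
```

Identical distributions give 0, distributions concentrated in two different categories give 100, and a shift of 10 percentage points from one category to another gives 10.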

Index of dissimilarity ID 1.9 Source: England and Wales, Office for National Statistics, 2011 Census:Item Edit and Imputation: Evaluation Report, June 2012

Assessing the performance of imputation Source: Albania, Quality Dimensions of 2011 Population and Housing Census, May 2014

Assessing the performance of imputation Source: Albania, Quality Dimensions of 2011 Population and Housing Census, May 2014

Data validation-3
Checking internal consistency
Objectives:
- Ensuring all records meet the editing rules
- Ensuring there are no unusual/unexpected values

How to validate
- Prepare tables for preliminary analysis of census results
- The list of tables should be prepared based on editing rules and the relations between variables
- Tables should present all possible conditions in the data without eliminating any category, to verify the results. For example: marital status by all age groups; completed level of education by all age groups
- Tables should present missing data

Some examples of tables
Tables for analyzing age difference between members of households
- Age interval between father/mother and children: at least 12-14 years, and at most 65 years for males, 50 years for females
- Age interval between grandparents and grandchildren: at least 30 years
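The parent-child age interval can be turned into a check that feeds such a table. The sketch below uses the upper limits quoted above (65 for fathers, 50 for mothers); the lower limit of 12 is one choice from the 12-14 range and is an assumption, as are the function and role names.

```python
# Maximum age gap between parent and child, as quoted on the slide.
MAX_GAP = {"father": 65, "mother": 50}


def parent_child_gap_ok(parent_age, child_age, parent_role, min_gap=12):
    """Return True when the parent-child age gap lies within the allowed interval."""
    gap = parent_age - child_age
    return min_gap <= gap <= MAX_GAP[parent_role]


# Hypothetical parent-child pairs: (parent age, child age, role).
pairs = [(30, 5, "mother"), (20, 12, "father"), (70, 10, "mother"), (70, 10, "father")]
results = [parent_child_gap_ok(p, c, role) for p, c, role in pairs]
print(results)
```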

Some examples of tables
Distribution of household size
- Accuracy of household size considering the number of persons enumerated on one page, such as 5, 10, …
- There might be errors in combining the census forms belonging to the same household
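One possible diagnostic for this problem, not taken from the presentation, is to compare the number of households at a page-capacity size with the average of the two neighbouring sizes. A ratio well above 1 at sizes such as 5 or 10 would suggest that continuation forms were not linked to their household. The counts below are hypothetical.

```python
def heaping_ratio(size_counts, size):
    """Ratio of households at `size` to the mean of the neighbouring sizes.

    Returns None when both neighbouring sizes have no households.
    """
    neighbours = (size_counts.get(size - 1, 0) + size_counts.get(size + 1, 0)) / 2
    if neighbours == 0:
        return None
    return size_counts.get(size, 0) / neighbours


# Hypothetical number of households by household size.
counts = {3: 900, 4: 1000, 5: 1400, 6: 700, 7: 500, 9: 120, 10: 100, 11: 80}

ratio_at_5 = heaping_ratio(counts, 5)
ratio_at_10 = heaping_ratio(counts, 10)
print(round(ratio_at_5, 2), ratio_at_10)
```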

Some examples of tables
CEB, CS and CD
- Relation between number of children ever born (CEB), number of living children (CS) and number of dead children (CD): CEB = CS + CD
- Relation between age and number of children ever born

Fertility
CEB: quality assessment
Parities wrong?

CEB: quality assessment
Mongolia, 1989 Census (Source: IPUMS)
Parity 15-19 20-24 25-29 30-34 35-39 40-44 45-49
0 105,548 43,676 9,824 2,711 987 865 726
1 4,827 30,834 15,350 5,432 2,185 1,302 1,488
2 896 17,309 23,960 10,659 4,479 2,217 2,053
3 834 5,382 19,279 11,159 4,923 2,663 1,950
4 199 1,828 11,831 11,922 6,974 3,525 2,658
5 68 477 5,730 11,189 7,426 4,933 3,379
6 53 2,161 7,568 6,348 4,442 3,619
7 25 707 3,737 4,551 3,638 2,977
8 15 23 263 2,355 3,879 3,986 3,706
9 61 119 746 2,190 2,747 3,059
10 419 1,300 2,433 3,253
11 147 743 1,183 1,667
12 22 38 262 845 1,299
13 19 161 403 898
14 20 82 242 392
15+ 72 235 629
Unknown 218 65 58 35
Parities wrong?
Implausible parities: following the IUSSP manual, here we will recode them as unknown (which will then be redistributed based on the El Badry method if appropriate)
If imputation or other forms of editing the data were used, the analyst should be aware of this
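Recoding implausible parities as unknown can be sketched as below. The plausibility ceiling used here, at most one live birth every 18 months from age 12 evaluated at the upper bound of the age group, is an assumption made for illustration; the actual rule should be taken from the IUSSP manual cited above. Function names and parameters are hypothetical.

```python
def max_plausible_parity(age_upper_bound, start_age=12, birth_interval_years=1.5):
    """Largest parity treated as plausible by the end of an age group.

    Assumed rule: at most one live birth per `birth_interval_years`
    from `start_age` onwards.
    """
    return int((age_upper_bound - start_age) / birth_interval_years)


def recode_parity(parity, age_upper_bound):
    """Return the parity, or None (unknown) when it is missing or implausible."""
    if parity is None or parity > max_plausible_parity(age_upper_bound):
        return None
    return parity


# Women aged 15-19 (upper bound 20) and 20-24 (upper bound 25).
print(max_plausible_parity(20), max_plausible_parity(25))
print(recode_parity(8, 20), recode_parity(3, 20))
```

Under this assumed ceiling, a reported parity of 8 for a woman aged 15-19 would be recoded as unknown, while a parity of 3 would be kept.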

Quality assessment
Age at death of children (in months) declared by the mother, Nepal 1975

Some examples of tables
Education
- Educational attainment: highest level completed
- Consistency with school attendance
- Relation with age: minimum age for completing school
The minimum age is usually calculated by taking the minimum age for entering school plus the number of years required for completing a school level.
Example: the minimum age for primary education is age 6. If primary education requires 8 years, the minimum age for completing primary school would be age 13.
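The relation between age and educational attainment can be checked record by record. The sketch below takes the minimum completion age for primary school (13) directly from the example above; the lookup table, level names and function name are illustrative assumptions.

```python
# Minimum age at which each level can have been completed.
# The value for "primary" follows the example on the slide.
MIN_COMPLETION_AGE = {"none": 0, "primary": 13}


def education_consistent(age, highest_level_completed):
    """Return True when the person is old enough to have completed the level."""
    return age >= MIN_COMPLETION_AGE[highest_level_completed]


# Hypothetical persons: (age, highest level completed).
persons = [(10, "primary"), (13, "primary"), (4, "none")]
flags = [education_consistent(age, level) for age, level in persons]
print(flags)
```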

School attendance: quality assessment
Expected pattern?