Pilot Census in Poland: Some Quality Aspects. Geneva, 7-9 July 2010. Janusz Dygaszewicz, Central Statistical Office, POLAND
Data processing infrastructure [diagram; components: Registry 1, Registry 2, Registry n; XML and TXT files; ETL Tools; Portal; CAxI; Questionnaires; XML Files; Statistical Files; Operational Microdata Base; Golden Record; Analytical Microdata Base; Metadata server; Metadata; SDMX]
Census Quality. Key elements of the census process in terms of census quality: census planning (scope of the census); data sources; data collecting; data storing; data processing; development of census results; dissemination of census results; Census Metadata System.
CENSUS PLANNING
Census Quality. Census planning. Quality aspects: relevance; accuracy; costs, including the burden on respondents; information security. Determining the data scope defined in the Act, including: compliance with the needs of domestic and EU users; quality of data sources; coherence and comparability of results from the 2011 and 2002 censuses.
DATA ACQUISITION
Data acquisition [diagram: data processing infrastructure]
Data acquisition. File formats: flat files, XML files, local databases. Integration of XML files.
Data acquisition - Portal
Census Quality – data acquisition. Data sources. Quality aspects: accuracy; timeliness and punctuality; comparability and coherence; costs, including the burden on respondents; information security. Assessment of data source quality for the census: analysing the methodological compliance of concept definitions in registers with those adopted in statistics and in the UNECE and EUROSTAT Recommendations for the 2010 Censuses of Population and Housing; developing a methodology for compliance analyses; constructing the IT system PiK for describing, comparing, and assessing the level of coherence.
Census Quality – data acquisition. Registers: developing a methodology for assessing quality (dimensions, quality indicators); evaluation and description of source quality; a MATRIX representing the possibility of obtaining census values from registers, built from census variable compliance indicators (methodology compliance indicator) and register suitability indicators (population coverage indicator for data from the register).
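A minimal sketch of how such a matrix could be computed. The slides do not give the formulas, so both indicators below are assumptions: compliance is taken as the share of census concept definitions that the register matches exactly, and coverage as the share of the census frame population found in the register. All names and sample data are hypothetical.

```python
def coverage_indicator(register_ids, frame_ids):
    """Share of the census frame population found in the register (assumed formula)."""
    frame = set(frame_ids)
    if not frame:
        return 0.0
    return len(frame & set(register_ids)) / len(frame)


def compliance_indicator(register_concepts, census_concepts):
    """Share of census concept definitions matched by the register (assumed formula)."""
    if not census_concepts:
        return 0.0
    matched = sum(
        1
        for name, definition in census_concepts.items()
        if register_concepts.get(name) == definition
    )
    return matched / len(census_concepts)


def suitability_matrix(registers, census_concepts, frame_ids):
    """One row per register: (methodology compliance, population coverage)."""
    return {
        name: (
            compliance_indicator(reg["concepts"], census_concepts),
            coverage_indicator(reg["ids"], frame_ids),
        )
        for name, reg in registers.items()
    }


# Hypothetical example data.
census_concepts = {"sex": "male/female", "age": "completed years"}
registers = {
    "register_a": {
        "concepts": {"sex": "male/female", "age": "completed years"},
        "ids": [1, 2, 3],
    },
    "register_b": {
        "concepts": {"sex": "male/female", "age": "year of birth"},
        "ids": [2, 3, 4, 5],
    },
}
matrix = suitability_matrix(registers, census_concepts, frame_ids=[1, 2, 3, 4])
```

In this toy data, register_a matches both definitions but misses one frame person, while register_b covers the same share of the frame but defines age differently, so the two indicators separate the two kinds of weakness.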
Census Quality – data acquisition. Data sets: developing a methodology for assessing quality; evaluation and description of data set quality; developing a methodology for improving the quality of source data sets, with rules for standardization, normalization, de-duplication, editing, imputation, and calibration.
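Two of the rule types named above, standardization and de-duplication, can be illustrated with a short sketch. The field names, the choice of upper-casing, and the "keep the first record" rule are assumptions made for the example, not the rules used in the pilot census.

```python
import unicodedata


def standardize(record):
    """Normalize Unicode, collapse whitespace, and upper-case every string field."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = unicodedata.normalize("NFC", value)
            value = " ".join(value.split()).upper()
        cleaned[key] = value
    return cleaned


def deduplicate(records, key_fields):
    """Standardize records, then keep the first one for each key combination."""
    seen = set()
    unique = []
    for record in map(standardize, records):
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical source rows: the first two describe the same person.
rows = [
    {"person_id": "001", "surname": " Kowalski ", "city": "warszawa"},
    {"person_id": "001", "surname": "KOWALSKI", "city": "Warszawa"},
    {"person_id": "002", "surname": "Nowak", "city": "Kraków"},
]
clean = deduplicate(rows, key_fields=("person_id", "surname"))
```

Standardizing before comparing keys is what lets the two spellings of the first person collapse into one record.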
CENSUS FRAME PREPARATION
Census Frame preparation. Steps: preparing the lists of citizens, buildings, and dwellings; integrating these lists with statistical data; preparing the Census Frame. Goal: frame preparation and random sample preparation.
Quality of Census Frame. Census frame pre-census revision: checking in the field by enumerators. Census frame preparation: validation and updating in counties.
Enumerator tracking
Census Completeness Monitoring
TRANSFORMATION TO STATISTICAL REGISTER
Source data collection and preparation [diagram: data processing infrastructure]
Source data collection and preparation: loading registers into the data laboratory environment; denormalization; standardization; deduplication; validation; data completion; vocabulary validation and automatic correction; generation of statistical files (registers).
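Vocabulary validation with automatic correction can be sketched as a dictionary lookup with a fuzzy fallback. This is an illustration under assumptions: the dictionary excerpt, the similarity cutoff, and the three status labels are hypothetical, and the fuzzy matching uses Python's standard `difflib`, not any tool named in the slides.

```python
from difflib import get_close_matches

# Hypothetical dictionary excerpt (names of three voivodeships).
VOIVODESHIPS = ["MAZOWIECKIE", "MAŁOPOLSKIE", "ŚLĄSKIE"]


def correct_value(value, vocabulary, cutoff=0.8):
    """Return (value, status), where status is 'valid', 'corrected' or 'rejected'."""
    candidate = value.strip().upper()
    if candidate in vocabulary:
        return candidate, "valid"
    # Automatic correction: accept the closest dictionary entry above the cutoff.
    matches = get_close_matches(candidate, vocabulary, n=1, cutoff=cutoff)
    if matches:
        return matches[0], "corrected"
    # No sufficiently similar entry: leave the value for manual review.
    return value, "rejected"
```

For example, `correct_value("MAZOWIECKE", VOIVODESHIPS)` yields the dictionary spelling with status `"corrected"`, while a value with no close match is returned unchanged and flagged `"rejected"`. A high cutoff keeps automatic correction conservative.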
Census Quality – collection and preparation. Collecting data. Quality aspects: accuracy; costs, including the burden on respondents; information security. Collecting data from information systems: central registers; distributed registers; format and file structure (XSD schemas); data transfer platform; application for encrypted data transfer; application for validation and data set control.
Source data loading and correction: loading data into the Operational Microdata Base; validation; manual and automatic correction (cleaning); deduplication; calculating variables.
CAxI [diagram: data processing infrastructure]
CAxI: CAII - Computer Assisted Internet Interview, CAPI - Computer Assisted Personal Interview, CATI - Computer Assisted Telephone Interview.
Census Quality – data collection by electronic questionnaire. Collecting data from respondents: CAII, CAPI, CATI. CAxI input validation: numerical data validation (answers within boundaries); cross-question arithmetical validation; hints and automatic answer completion; dictionaries and drop-down menus. CAxI logical validation: answers determined by questions; cross-question logical validation; logical paths of data collection.
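The three kinds of check listed above (boundaries, cross-question arithmetic, cross-question logic) can be sketched as follows. The question names, the 0 to 120 age range, and the age-15 employment rule are hypothetical and chosen only to make the example concrete.

```python
def validate_answers(answers):
    """Return a list of error messages for one questionnaire (empty if valid)."""
    errors = []

    # Numerical validation: the answer must lie within assumed boundaries.
    age = answers.get("age")
    if age is None or not 0 <= age <= 120:
        errors.append("age out of range")

    # Cross-question arithmetical validation: parts must sum to the total.
    total = answers.get("household_size", 0)
    parts = answers.get("adults", 0) + answers.get("children", 0)
    if total != parts:
        errors.append("household_size != adults + children")

    # Cross-question logical validation (logical path): in this sketch the
    # employment question is skipped for respondents under 15.
    if age is not None and age < 15 and answers.get("employed") is not None:
        errors.append("employment answered for respondent under 15")

    return errors
```

Returning all errors at once, instead of stopping at the first, suits an electronic questionnaire where the respondent or interviewer can correct several fields in one pass.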
Census Quality. Data storing. Quality aspects: information security. Storing data in the Operational Microdata Base; notifying the General Inspector for Protection of Personal Data of the Operational Microdata Base for registration.
GOLDEN RECORD
Golden Record generation [diagram: data processing infrastructure]
Export to Analytical Microdata Base [diagram: data processing infrastructure]
Golden Record generation: integration with Census Frame and CAxI data; validation; correction; operational imputation; transfer of proper values to the Golden Record. [Diagram: OMB layers: Registers 1..n, CAxI, Golden Record]
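One way to picture the transfer of proper values from the OMB layers into a Golden Record is a per-variable source priority. The slides do not state how the proper value is chosen, so the rule below (first non-missing value in a fixed source order, with CAxI first) is purely an assumption, as are all names.

```python
# Hypothetical source priority: the first source with a non-missing value wins.
SOURCE_PRIORITY = ("caxi", "register_1", "register_2")


def build_golden_record(person_id, layers, variables):
    """Pick one value per variable across layers and record where it came from."""
    golden = {"person_id": person_id}
    provenance = {}
    for variable in variables:
        golden[variable] = None  # stays None if no layer supplies a value
        for source in SOURCE_PRIORITY:
            value = layers.get(source, {}).get(person_id, {}).get(variable)
            if value not in (None, ""):
                golden[variable] = value
                provenance[variable] = source
                break
    return golden, provenance


# Hypothetical layers keyed by source, then by person identifier.
layers = {
    "register_1": {"p1": {"sex": "F", "birth_year": 1980}},
    "register_2": {"p1": {"birth_year": 1981, "education": "tertiary"}},
    "caxi": {"p1": {"education": "", "birth_year": None}},
}
golden, provenance = build_golden_record(
    "p1", layers, ("sex", "birth_year", "education", "occupation")
)
```

Variables left as `None` (here `occupation`) are the natural candidates for the operational imputation step, and the provenance dictionary supports later quality reporting.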
Export to Analytical Microdata Base: preparing transition tables; anonymisation of Golden Records; transfer to the Analytical Microdata Base.
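A minimal sketch of the export step, under stated assumptions: direct identifiers are dropped, the person identifier is replaced by a keyed hash, and the "transition table" is taken to be the mapping from original to pseudonymous identifier (the slides do not define it). Field names and the key handling are hypothetical, and a keyed hash alone is pseudonymisation; a real procedure would likely also need statistical disclosure control.

```python
import hashlib
import hmac

# Assumed list of direct identifiers to drop before export.
DIRECT_IDENTIFIERS = ("name", "surname", "address")


def anonymise(record, secret_key):
    """Return (anonymised record, transition-table row) for one Golden Record."""
    pseudo_id = hmac.new(
        secret_key, record["person_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    anonymised = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "person_id"
    }
    anonymised["pseudo_id"] = pseudo_id
    # The transition row stays on the operational side and is never exported.
    return anonymised, (record["person_id"], pseudo_id)
```

Using a keyed hash (HMAC) instead of a plain hash means the pseudonym cannot be recomputed from a known identifier without the secret key.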
Census Quality. Data processing. Quality aspects: accuracy. Developing quality indicators for data sets at each stage of data processing, and the procedures for calculating their values; developing procedures for bringing data from administrative sources into full compliance, or minimum discrepancy, with the methodology adopted in statistics; developing procedures for normalization and editing of data sets from administrative systems, including imputation of data (administrative data sets); developing procedures for synchronizing data from administrative systems; developing rules for linking data from different administrative systems; developing rules for linking data from administrative systems with data from CAII, CAPI, and CATI; developing rules for calculating Golden Record census variables; developing rules for anonymisation of Golden Record census data.
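As an illustration of a quality indicator calculated at each processing stage, the sketch below uses item completeness: the share of records with a non-missing value per variable. The choice of indicator, the stage names, and the data are assumptions; the slides do not list the indicators actually used.

```python
def item_completeness(records, variables):
    """Share of records with a non-missing value, per variable."""
    n = len(records)
    return {
        variable: (
            sum(1 for r in records if r.get(variable) not in (None, "")) / n
            if n
            else 0.0
        )
        for variable in variables
    }


def indicators_by_stage(stages, variables):
    """Compute the same indicator for the data set at every processing stage."""
    return {
        stage: item_completeness(records, variables)
        for stage, records in stages.items()
    }


# Hypothetical snapshots of the same two records at two stages.
stages = {
    "loaded": [{"sex": "F", "age": None}, {"sex": "", "age": 40}],
    "imputed": [{"sex": "F", "age": 38}, {"sex": "M", "age": 40}],
}
report = indicators_by_stage(stages, ("sex", "age"))
```

Computing one indicator the same way at every stage makes the effect of each step, such as imputation, directly comparable.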
ANALYTICAL MICRODATABASE
Analytical Microdata Base [diagram: data processing infrastructure]
Analytical Microdata Base - process. Process data: load data and metadata; integrate data; classify and code data; edit and validate data; impute; derive new variables; weight; aggregate; create files. Analyse: produce preliminary statistics; check quality; analyse; prepare statistics for dissemination; approve. Disseminate: prepare and modify dissemination systems; prepare products; manage products; promote; monitor. Archive. Manage metainformation. Manage quality.
Analytical Microdatabase functionality: Administration; Information Security Management; Data Processing; Information Analysis; Requirement and Product Management; Dissemination; Metadata; Quality Management.
Census Quality. Development of census results. Quality aspects: relevance; accuracy; comparability and coherence. Developing rules for completing missing data (imputation and calibration); developing rules for creating derived objects (households, families); developing a model or method of data estimation using data from administrative systems and sample surveys; developing rules for calculating data outputs.
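Calibration of a sample survey to known totals can be illustrated with simple post-stratification, where each stratum's weight is the known population total divided by the number of sample units in it. This is a textbook technique used here as an assumption; the slides do not say which calibration method was adopted, and the strata and totals are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_totals):
    """Weight per stratum = known population total / sample count in the stratum."""
    counts = Counter(sample_strata)
    weights = {}
    for stratum, total in population_totals.items():
        if counts[stratum] == 0:
            raise ValueError(f"no sample units in stratum {stratum!r}")
        weights[stratum] = total / counts[stratum]
    return weights


# Hypothetical sample (one stratum label per respondent) and known totals,
# for instance taken from an administrative register.
sample = ["urban", "urban", "rural", "urban", "rural"]
totals = {"urban": 900, "rural": 400}
weights = poststratification_weights(sample, totals)

# After calibration, the weighted sample reproduces the known totals.
estimated_population = sum(weights[stratum] for stratum in sample)
```

The final sum is a quick consistency check: the weighted sample size equals the sum of the known stratum totals.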
DISSEMINATION
Census Quality - dissemination. Dissemination of census results. Quality aspects: relevance; timeliness and punctuality; accessibility and clarity; comparability and coherence; information security. Designing Analytical Microdata Base features, including compliance with users' needs and the accessibility and clarity of census data.
METAINFORMATION MANAGEMENT
Metadata server [diagram: data processing infrastructure]
Metainformation management [diagram; labels: Metainformation; Definition; Business; Referential; Conceptual; Methodical; Quality; Structural; Technical; System; Postprocessing]
Census Quality – metainformation. Census Metadata System. Quality aspects: accessibility and clarity. Developing quality indicators at each stage of the census and the procedures for calculating their values.