Documenting and disseminating census and survey data sets Ilpo Survo, United Nations ESCAP, Bangkok, for UNECE Training Workshop on Census Technology for SPECA member countries, Astana, 7-8 June 2007
Content A. Systematic documenting of census data sets B. Why disseminate microdata? C. Microdata Management Toolkit
A. Systematic documenting of census data sets
A good census dataset is.. Documented clearly Contains no surprises Allows users to –Start working effectively quickly –Find the data they are interested in –Understand what the data are measuring and how the data have been created –Assess the quality of the data
Evolving documentation technology Own documentation standards => International metadata standards; National practices => International good practices; Ad hoc tools => Structuring tools, databases; Text-based codebooks => XML-based codebooks
Maintain metadata in a centralised database Manage definitions, methodology information, variable information, data collection information in one place Ensures consistency across data holdings Approach useful for planning, data collection, processing, analysis and dissemination
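The idea of keeping definitions in one place can be sketched with a small relational store. This is a minimal illustration only, using Python's built-in sqlite3 module; the table names, column names, dataset names and the definition text are all hypothetical and do not come from any particular statistical office's system.

```python
import sqlite3

# Hypothetical schema: each concept definition is stored once and is
# referenced by the variables of every dataset that uses it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL,
    definition TEXT NOT NULL
);
CREATE TABLE variable (
    dataset    TEXT NOT NULL,
    var_name   TEXT NOT NULL,
    concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
    PRIMARY KEY (dataset, var_name)
);
""")

# Illustrative rows (assumed names, not real holdings).
conn.execute(
    "INSERT INTO concept (concept_id, name, definition) VALUES (?, ?, ?)",
    (1, "household", "Illustrative definition text, stored once."),
)
conn.executemany(
    "INSERT INTO variable (dataset, var_name, concept_id) VALUES (?, ?, ?)",
    [("census_2005", "HH_ID", 1), ("mics_2005", "HHNUM", 1)],
)


def definitions_for(concept_name):
    """Return (dataset, variable, definition) rows that share one concept."""
    return conn.execute(
        """SELECT v.dataset, v.var_name, c.definition
           FROM variable v JOIN concept c USING (concept_id)
           WHERE c.name = ?
           ORDER BY v.dataset""",
        (concept_name,),
    ).fetchall()


rows = definitions_for("household")
```

Because both variables point at the same concept row, a correction to the definition is made once and is reflected in every dataset that uses it, which is the consistency benefit the slide describes.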
Good practices in data documentation Explanatory material –Minimum material required to ensure the long-term viability and functionality of a dataset Contextual information –Material about the context in which the data were collected, and how they were put to use –Enables the secondary user to fully understand the background and processes behind the data collection exercise Cataloguing material –Bibliographic record of the dataset, for proper acknowledgement and citation –Basic instrument used for resource discovery
B. Why disseminate microdata?
Untapped potential of microdata for national development Even the best planned tabulations cannot exhaustively bring out all valuable information from census data Diversity, disparities and related causalities are best analysed from microdata, e.g. –Tracking the effects of policy interventions on target groups –Determining dimensions of within-country disparities The quality of research would improve => Return on data collection would increase => National policies could be targeted better => More efficient use of public resources
Factors that might hinder microdata dissemination - Discussion Concerns about data confidentiality Ambiguous or missing national legislation Narrow mandate of statistical agency Concerns about data quality Low demand from data users
International initiatives Marrakech Action Plan for Statistics, an_for_Statistics.pdf International Household Survey Network, IHSN Microdata Management Toolkit ESCAP-World Bank-PARIS21 project on improving access to survey microdata in Asia and the Pacific
ESCAP project on improving access to survey microdata in Asia and the Pacific: Household surveys and population and housing censuses, not establishment surveys Assessment of status of microdata dissemination Regional inventory and data archive of household surveys Regional advocacy and training workshops On-site training and technical advice on documentation and anonymization
C. Microdata Management Toolkit
Microdata Management Toolkit – Summary A set of software tools for the documentation, archiving, dissemination and preservation of microdata 1. Metadata Editor –Documents survey data in accordance with international standards 2. CD-ROM Builder –Generates user-friendly outputs, such as CDs and websites, for dissemination and archiving 3. The Explorer –For viewing metadata –For re-exporting data to various formats
Download and use The Toolkit can be downloaded from documentation&lvl3=toolkit All Toolkit components except the Metadata Editor are available for free Nesstar Editor: One free license for NSOs of the World Bank IDA countries (e.g. Afghanistan, Georgia, Kyrgyz Republic, Moldova, Tajikistan)
Metadata Editor Documents survey data in accordance with international standards Data Documentation Initiative (DDI) Dublin Core Metadata Initiative (DCMI) Data & metadata in one single file Data can be imported from various formats, incl. statistical packages Produces survey documentation in PDF format
Extensible Markup Language (XML) Language to describe data using tags Tags are conceptually the same as fields in databases XML files are regular text files Can be edited with text editors XML files, like databases, can be: Searched and queried Edited Tutorial:
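The points about searching and editing can be illustrated with a short sketch. It assumes Python's standard xml.etree.ElementTree module and a made-up two-survey catalogue; the tag names (catalogue, survey, title, year) and the values are hypothetical and are not taken from the Toolkit or from DDI.

```python
import xml.etree.ElementTree as ET

# A hypothetical catalogue held as ordinary text; each tag plays the role
# of a field in a database record.
CATALOGUE_XML = """<catalogue>
  <survey id="s1">
    <title>Example survey A</title>
    <year>2005</year>
  </survey>
  <survey id="s2">
    <title>Example survey B</title>
    <year>2000</year>
  </survey>
</catalogue>"""

root = ET.fromstring(CATALOGUE_XML)

# Search and query: titles of surveys whose <year> child equals "2005".
titles_2005 = [s.findtext("title") for s in root.findall("survey[year='2005']")]

# Edit: change one field of one record, then serialise back to text.
root.find("survey[@id='s2']/year").text = "2001"
updated = ET.tostring(root, encoding="unicode")
```

The same file could equally be opened and changed in a text editor; the library only makes the search and the edit repeatable.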
XML example Multiple Indicator Cluster Survey 2005; MICS; National Statistics Office (NSO); United Nations Children's Fund; Popstan; National; 5,000 households, stratified two stages; 98 percent
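Wrapped in tags, the values on this slide might look like the record below. This is a sketch under stated assumptions: the tag names are hypothetical descriptive labels rather than the element names of DDI or any other standard, and the assignment of each value to a tag (for example, the NSO as producer and the United Nations Children's Fund as funding agency) is a guess. Only the values themselves come from the slide.

```python
import xml.etree.ElementTree as ET

# Tag names and the value-to-tag mapping are assumptions for illustration.
SURVEY_XML = """<survey>
  <title>Multiple Indicator Cluster Survey 2005</title>
  <abbreviation>MICS</abbreviation>
  <producer>National Statistics Office (NSO)</producer>
  <fundingAgency>United Nations Children's Fund</fundingAgency>
  <country>Popstan</country>
  <geographicCoverage>National</geographicCoverage>
  <sampling>5,000 households, stratified two stages</sampling>
  <responseRate>98 percent</responseRate>
</survey>"""

root = ET.fromstring(SURVEY_XML)

# Each tag behaves like a database field name, each text node like its value.
record = {child.tag: child.text for child in root}
```

Reading the record into a dictionary makes the "tags are like fields" comparison concrete: `record["country"]` returns the value stored under the country tag.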
XML advantages Creation of a comprehensive checklist of useful metadata elements Potential to assess the content of a file by determining whether particular tags are, or are not, within that file Creation of a dataset catalogue which can be queried for key metadata elements Potential to transform the file into more user-friendly formats, such as HTML or PDF XML files can be exchanged across networks or over the Internet using web services or SOAP
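Two of these advantages, checking a file against a checklist of tags and transforming it into HTML, can be sketched in a few lines. The example assumes Python's standard library only; the checklist, the tag names and the helper names `missing_tags` and `to_html` are hypothetical, and this is not how the Toolkit itself generates its output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical checklist of metadata elements every survey file should carry.
REQUIRED_TAGS = ["title", "country", "sampling", "responseRate"]


def missing_tags(xml_text, required=REQUIRED_TAGS):
    """Return the checklist tags that do not occur anywhere in the file."""
    root = ET.fromstring(xml_text)
    present = {element.tag for element in root.iter()}
    return [tag for tag in required if tag not in present]


def to_html(xml_text):
    """Render the top-level fields of the file as a simple HTML table."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in root
    )
    return f"<table>{rows}</table>"


complete = (
    "<survey><title>MICS 2005</title><country>Popstan</country>"
    "<sampling>Stratified two stages</sampling>"
    "<responseRate>98 percent</responseRate></survey>"
)
incomplete = "<survey><title>MICS 2005</title><country>Popstan</country></survey>"

gaps = missing_tags(incomplete)
table = to_html(incomplete)
```

Running `missing_tags` over every file in an archive would give a quick completeness report, and the same loop could feed a queryable catalogue of key elements.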
CD-ROM Builder Integrates with Metadata Editor Generates user-friendly outputs (CD-ROM, website) for dissemination and archiving (HTML format) Allows customization –Branding: look and feel of CD or website –Content: single or multiple surveys
CD-ROM Builder process 1. Create new CD-ROM Project 2. Add a survey to the project and select its type and branding –Select an existing survey by opening the DDI-XML or Nesstar file –The survey branding determines the overall look and feel of the CD –The survey type determines the default metadata content 3. Click the Save button to generate the HTML interface 4. After a few minutes, your CD Project is ready for publishing!
CD-ROM Builder sample outputs
Demonstration of Metadata Editor A live demonstration with the Popstan dataset, on-screen in English and Russian
Thank you!
Discussion, questions, answers