Download presentation
Presentation is loading. Please wait.
Published byDiana Heath Modified over 9 years ago
1
IPUMS to IHSN: Leveraging structured metadata for discovering multi-national census and survey data Wendy L. Thomas 4 th Conference of the European Survey Research Association (ESRA) Session 2.10 The role and benefit of structured metadata in survey research II
2
Outline The Project: Creation of DDI compliant metadata from the contents of a proprietary database Project for the World Bank supported by funding from PARIS21 The Source: IPUMS International database of metadata The Target: IHSN Microdata Toolkit [DDI Codebook based] The Process: Mapping a proprietary database to DDI Codebook standard The Results: Identification of improvements for IPUMS Limitations of ISHN Microdata Toolkit not attributable to DDI Limitations of distribution metadata for supporting on-the-ground processes 2011-07-202W.L. Thomas
3
Project The project was to extract metadata from the IPUMS database system and to create IHSN Microdata Toolkit compatible metadata for the original data files and for the IPUMS extracts of data for developing countries. IPUMS contains metadata and data for 106 censuses from developing countries world wide Goals: Facilitate access to existing full data sets within the IPUMS member country Populate a number of microdata catalogs based on NADA 2011-07-203W.L. Thomas
4
Integrated Public Use Microdata Series, International Census microdata for social and economic research (62 countries, 185 censuses, 397 million person records) IPUMS-International is a project dedicated to collecting and distributing census microdata from around the world. Its purpose is to: Collect and preserve data and documentation Harmonize data Disseminate the data to registered researchers at no cost https://international.ipums.org/international/ 2011-07-204W.L. Thomas
5
IPUMS Metadata Extensive, detailed metadata held as original documents, structured text files, and structured database Basic division is between metadata to: Inform the user Drive the harmonization and/or search and extraction engine Retains the original metadata for all included original variables: Description Source Question Enumerator Instructions Adds information on harmonization process and variables 2011-07-205W.L. Thomas
6
IHSN Microdata Toolkit Developed by the World Bank Data Group for the IHSN to promote the adoption of international standards and best practices for microdata documentation, dissemination and preservation. Consists of 3 modules Microdata Editor: Documents using DDI-Codebook and Dublin Core standards Explorer: Views metadata and exports to common formats CD-ROM Builder: Create user-friendly CD-ROM and web products http://ihsn.org/home/index.php?q=tools/toolkit 2011-07-206W.L. Thomas
7
Process 2011-07-20W.L. Thomas7 The project focused on metadata within the IPUMS database system which could be mapped out to generate a IHSN Microdata Toolkit compatible set of metadata The desired output consisted of A set of Dublin Core descriptions of related technical documents and copies of those documents A set of metadata for the original country data A set of metadata and data for the content of the public extract system (harmonized and country specific variables) All outputs would be delivered to the country NSO for their use Metadata content and document copies would be provided to IHSN for inclusion in their catalog
8
IPUMS Process: Creating the Database 2011-07-20W.L. Thomas8 STEP 1: Incoming data and metadata Prepare data for entry into IPUMS harmonization process and database Data arrives in non-specific formats and reflects the original structure of the data at the NSO or a standard extract format Metadata arrives in non- specific format with inconsistent content coverage (may or may not have English language equivalents)
9
IPUMS Process: Creating the Database 2011-07-20W.L. Thomas9 STEP 1: Incoming data and metadata Prepare data for entry into IPUMS harmonization process and database STEP 2: Intermediary data and metadata Clean and restrict for confidentiality Harmonize Establish metadata links Data is selected and reformatted into household and person records, IPUMS generated variables are added for processing Basic metadata is entered into a spreadsheet including; original name, description, IPUMS name, translations, frequency (generated), and all extant content types (including errors). Creation of database
10
IPUMS Process: Creating the Database 2011-07-20W.L. Thomas10 STEP 1: Incoming data and metadata Prepare data for entry into IPUMS harmonization process and database STEP 2: Intermediary data and metadata Clean and restrict for confidentiality Harmonize Establish metadata links STEP 3: IPUMS Clean, confidential, harmonized data Metadata includes links to original data, questions, instructions, and transformation processes Original data structures have not been retained in the database, includes only the cleaned, restricted content and harmonized variables created by IPUMS
11
Process Constraints 2011-07-20W.L. Thomas11 Working from the IPUMS database we were only able to construct microdata from STEP 2 and STEP 3 Linking back to original metadata and data structure would require individualized processing (entry and linking) Had we been able to construct metadata for STEP 1 it would not have been IHSN Microdata Toolkit compatible, just DDI-Codebook compatible Linking information held as file names Alphabetic codes Person records only (households required construction) Multi-record relational data base structures for data
12
Results 2011-07-20W.L. Thomas12 Modified products delivered Within 24 hours of submission, all metadata files were loaded into the IHSN NADA based catalog and were searchable down to the variable level in that system The MPC is now capable of creating DDI/IHSN/NESSTAR compatible codebooks for all extracts on all microdata systems Countries can create full extracts of their census data to match the intermediary structure (household/person record) found in the metadata, import it to the Microdata Toolkit and explore or create desired output easily Countries can easily compare or create products from the harmonized data from their country or download additional data from the IPUMS International system for cross-national comparison
13
Insights 2011-07-20W.L. Thomas13 The IPUMS database captured all required metadata elements either directly or in related structured files More information should be moved to the database Informational items that could be structured (i.e., dates) should have both unstructured and structured content Database should be expanded to capture input structures and retain the full process metadata from Step 1 to Step 3 (for curation as well as process purposes) Simplifies the shift to DDI Lifecycle output by simple addition of comparative information The IHSN Microdata Toolkit is limited in its ability to support legacy data and legacy data systems on the ground Most limitations are due to software rather than standards constraints
14
Distribution and Process Metadata 2011-07-20W.L. Thomas14 IPUMS, like most metadata databases, contains the majority of required objects for supporting distribution metadata in a DDI format (codebook or lifecycle) The process of mapping to a detailed standard raised specific issues and highlighted possible improvements to the database content and structure Brought to light the need to capture more metadata at an earlier point in the process. Metadata produced outside the database structure was lost Object level version control was missing so intermediary content was lost
15
Conclusions 2011-07-20W.L. Thomas15 The ability to create distribution metadata in a standard format increases utility for users, provides new product options, provides easy means of populating external catalogs The process of mapping to a well formed standard provides an opportunity for content and process improvement Capture of incoming metadata is critical to shifting from a distribution metadata structure to a process metadata structure Process structures not only can capture the development of the data within the system but can be used to drive the process and data and metadata checks
16
Contacts 2011-07-20W.L. Thomas16 Wendy Thomas – Project manager wlt@umn.edu Benjamin Youngdahl – Programmer beny@umn.edu
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.