Data Projects at the Minnesota Population Center Resources for Comparative Population and Health Research Seattle, Washington May 22, 2014 Elizabeth Boyle,

Presentation transcript:

Data Projects at the Minnesota Population Center
Resources for Comparative Population and Health Research
Seattle, Washington, May 22, 2014
Elizabeth Boyle, Miriam King, Matthew Sobek
Minnesota Population Center, University of Minnesota

Integrated Public Use Microdata Series

U.S. Labor Force Participation: Men Women

Steve Ruggles 1995: “King of Quant” President Population Association of America

New U.S. Data From Ancestry.com

Minnesota Population Center
• We build data infrastructure for the research community and specialize in data harmonization.
• World’s largest collection of individual population and health data, across 9 projects.
• 50,000 registered users from over 100 countries.
• Free

MPC Data Dissemination, Gigabytes per week

MPC Data Projects

The Problem
1. Combining data from multiple sources is time-consuming
   • Discovery
   • Data management
2. It’s error-prone
   • Recoding data
   • Overlooked documentation
3. Hard to replicate results
4. Discourages comparative research

Outline
• Harmonization methods
• Dissemination system
• International projects
   • Integrated DHS
   • Terra Populus
   • IPUMS-International

Terminology
Harmonization: Combining datasets collected at different times or places into a single, consistent data series. Also called “integration.”
Metadata: Data about data. Documentation in the broadest sense.

Microdata: Relation to head, Marital status, Education, Occupation

Summary Data

Harmonization Methods
• Metadata
• Data
• Dissemination

Systematize Metadata (record layout file, pdf)

MPC Data Dictionary

Convert Questionnaires to Metadata (Mexico 2000): Water Access

Metadata: Questionnaire Text

XML-Tagged Questionnaire Text: Water access, Bedrooms, Rooms

Data: Variable Harmonization
Marital Status: IPUMS-International
Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divorced/separated
Mexico 1970: 1 = Married, civil & relig; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated

Translation Table: Input
Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divrc or separated
Mexico 1970: 1 = Married, civil & relig; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated

Translation Table: Input and Harmonized (Label, Code)
Input, Mexico 1970: 1 = Married, civil & relig; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
Input, Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divrc or separated
Input, Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated
Harmonized labels: Single; Married or in union; Married, formally; Civil; Religious; Civil and religious; Monogamous; Polygamous; Consensual union; Separated; Divorced; Divorced or separated (code 3); Widowed (code 4)
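The translation-table idea can be sketched in a few lines of Python. This is a minimal illustration, not the MPC's actual tooling: the input codes and harmonized labels come from the slides, but the dictionary layout, the `harmonize` function, the pairing of each input code with a label, and the roll-up of detailed labels into general categories are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical translation table: sample -> {input code: detailed harmonized label}.
# Codes and labels are taken from the slides; the pairing is an assumption.
TRANSLATION = {
    "Bangladesh 2011": {1: "Single", 2: "Married or in union",
                        3: "Widowed", 4: "Divorced or separated"},
    "Mexico 1970": {1: "Civil and religious", 2: "Civil", 3: "Religious",
                    4: "Consensual union", 5: "Widowed", 6: "Divorced",
                    7: "Separated", 8: "Single"},
    "Kenya 1999": {1: "Single", 2: "Monogamous", 3: "Polygamous",
                   4: "Widowed", 5: "Divorced", 6: "Separated"},
}

# Assumed roll-up from detailed labels to general categories.
GENERAL = {
    "Married, formally": "Married or in union",
    "Civil": "Married or in union",
    "Religious": "Married or in union",
    "Civil and religious": "Married or in union",
    "Monogamous": "Married or in union",
    "Polygamous": "Married or in union",
    "Consensual union": "Married or in union",
    "Separated": "Divorced or separated",
    "Divorced": "Divorced or separated",
}


def harmonize(sample, code):
    """Return (detailed label, general label) for one input code."""
    detailed = TRANSLATION[sample][code]
    # Labels that are already general categories map to themselves.
    return detailed, GENERAL.get(detailed, detailed)


# Toy records: (sample, original marital status code).
records = [("Bangladesh 2011", 4), ("Mexico 1970", 2),
           ("Kenya 1999", 3), ("Kenya 1999", 1)]
general_counts = Counter(harmonize(sample, code)[1] for sample, code in records)
```

Keeping both a detailed and a general label lets a sample with coarse categories (Bangladesh) be compared with samples that record more detail (Mexico, Kenya) without discarding that detail.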

Data Dissemination System

Variables Page

238 censuses

Sample Filtering

Variables Page – Filtered

Variable Page: Marital Status

Variable Codes (Marital status)

Variable Page: Marital Status

Variable Comparability Discussion (Marital status)

Variable Page: Documentation

Questionnaire Text (Marital status, Cambodia)

Variables Page

Extract Summary

Case Selection

Attached Characteristics
• Age of spouse
• Employment status of father
• Occupation of father
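One way to picture attached characteristics is as a lookup within the household: each person record carries a pointer to the spouse's or father's record, and the requested value is copied across. The sketch below assumes that layout. The field names (`household`, `person`, `spouse_loc`, `father_loc`) and the convention that a pointer of 0 means "not present in the household" are hypothetical and are not taken from the slides.

```python
def attach(records, pointer, source, target):
    """Copy `source` from the person referenced by `pointer` into `target`.

    All field names are hypothetical. A pointer that matches no person in
    the same household (for example 0) leaves the attached value as None.
    """
    index = {(r["household"], r["person"]): r for r in records}
    attached = []
    for r in records:
        other = index.get((r["household"], r[pointer]))
        new = dict(r)  # leave the input records unmodified
        new[target] = other[source] if other is not None else None
        attached.append(new)
    return attached


# Toy household: two spouses and their child.
people = [
    {"household": 1, "person": 1, "age": 40, "occupation": "farmer",
     "spouse_loc": 2, "father_loc": 0},
    {"household": 1, "person": 2, "age": 38, "occupation": "teacher",
     "spouse_loc": 1, "father_loc": 0},
    {"household": 1, "person": 3, "age": 12, "occupation": None,
     "spouse_loc": 0, "father_loc": 1},
]

with_spouse_age = attach(people, "spouse_loc", "age", "age_spouse")
with_father_occ = attach(with_spouse_age, "father_loc", "occupation",
                         "occupation_father")
```

Because each call returns new records, several characteristics can be attached in sequence, as with the spouse's age and the father's occupation here.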

Extract Summary

Download or Revise Extract

On-line Analysis

The International Projects

Integrated DHS

Demographic and Health Surveys
• Foremost source of health information for the developing world
• Funded by USAID
• Since the 1980s, over 300 surveys in 90 countries
• Topics: fertility, nutrition, HIV, malaria, maternal and child health, etc.

IDHS Project
• 5-year NIH grant (end of year 2)
• Focus on Africa, with India
• Partnership with ICF-International and USAID

Why an Integrated DHS?
Motivation: DHS is incredibly valuable, but it’s hard to capitalize on its full potential.
Problem:
• Data discovery
• Dispersed documentation
• Data management
• Variable changes over time
Not unique to DHS: endemic to any survey that has persisted over decades.

DHS Research Process
Example: Find data on female genital cutting
Survey Search Tool

Just the woman file, for one survey. 61 to go.
• Recode notes
• Data dictionary
Still need:
• Report (377-page PDF): contains questionnaire and sample design information
• Errata file

At least the DHS variables are already harmonized, right?
DHS “Recode Variables” make it more harmonized than most surveys:
• Consistent variable names
• Each DHS phase has a shared model questionnaire
But:
• 6 phases over 25+ years
• Country control over final wording of surveys
• Country-specific variables
• The recode variables can be a two-edged sword

Harmonization: Religion
Variable V130 in four samples: Ghana 1993, Ghana 2008, India 1992, India 2005

Harmonization: Female Circumcision Ever Circumcised

Timeline: 2014 (current)
• 9 countries, 39 samples
• Much of the woman files
• Women of childbearing age as unit of analysis

Timeline: countries, 69 samples
• Complete the woman files
• Children & birth files

Timeline: countries, 94 samples
• Men and couples files

Timeline: Next grant
• 41 African countries, 130+ samples
• 11 Asian countries, 32+ samples

Beta

Terra Populus Goal
Lower barriers to conducting research on population and the environment.
Motivation: The data from different domains have incompatible formats, and few researchers have the skills to combine them.

TerraPop
• 5-year NSF grant
• At midpoint: year 3

Population Microdata
6 countries:
• Argentina
• Brazil
• Malawi
• Spain
• United States
• Vietnam

Area-level Data
Tabulations of census data for administrative units

Environmental Data: Rasters (Grid Cells)
• Land cover from satellite images (Global Land Cover 2000)
• Agricultural use from satellites and government records (Global Landscapes Initiative)
• Climate from weather stations (WorldClim)

Location-Based Integration (Microdata, Area-level data, Rasters)
• Mix and match variables originating in any of the data structures
• Obtain output in the data structure most useful to you

Location-Based Integration (Microdata, Area-level data, Rasters)
Individuals and households with their environmental and social context

Location-Based Integration (Microdata, Area-level data, Rasters)
Summarized environmental and population characteristics for administrative districts
[Table columns: County ID; Mean Ann. Temp.; Max. Ann. Precip.; Rent, Rural; Rent, Urban; Own, Rural; Own, Urban]

Location-Based Integration (Microdata, Area-level data, Rasters)
Rasters of population and environment data

Rasterization of Area-Level Data

Area-Level Summary of Raster Data
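The two conversions named on these slides can be sketched on a toy grid. This is an illustration under simplifying assumptions, not TerraPop's implementation: it assumes a zone grid that already records which administrative unit each cell falls in, whereas a real workflow would derive that from boundary files with GIS software. The function names and the sample values are hypothetical.

```python
from collections import defaultdict


def zonal_mean(zones, values):
    """Area-level summary of raster data: mean cell value per unit.

    `zones` and `values` are equally shaped lists of rows; cells where
    either entry is None (no unit, or no data) are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            if zone is None or value is None:
                continue
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


def rasterize(zones, area_values):
    """Rasterization of area-level data: give each cell its unit's value."""
    return [[area_values.get(zone) for zone in row] for row in zones]


# Toy 2x3 grid covering two hypothetical units, "A" and "B".
zones = [["A", "A", "B"],
         ["A", "B", "B"]]
temperature = [[10.0, 12.0, 20.0],
               [14.0, 22.0, None]]  # None marks a cell with no data

mean_temp = zonal_mean(zones, temperature)
population_grid = rasterize(zones, {"A": 100, "B": 250})
```

Both directions depend on knowing which unit each cell belongs to, so the quality of the result depends directly on the boundary information used to build the zone grid.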

Boundaries are Key
Linkages across data formats rely on administrative unit boundaries.
Particular needs:
• Lower-level boundaries
• Historical boundaries

Geographic Harmonization

Beta Version
• Web interface will change significantly in fall 2014
• Fast microdata tabulator needed

IPUMS-International

Census microdata from around the world
Funded by NSF and NIH
Motivation:
• Provide data access
• Preservation

Khartoum, CBS-Sudan

Dhaka, Bangladesh Bureau of Statistics

IPUMS-International Participating Disseminating

IPUMS Censuses Per Country

Variables Included in Extracts

Top Institutional Users

Millennium Development Goals
Ratio of literate women to men, years old
1990 Census round
Source: Cuesta and Lovatón (2014)

Millennium Development Goals
Colombia: Adolescent Birth Rate, Census 1993 and Census 2005
Source: Cuesta and Lovatón (2014)
Data source: IPUMS-International, Minnesota Population Center

IPUMSI Future
• Data acquisition
• Outreach: developing countries
• Virtual data enclave

Thank you!