Published by Lesley Miller. Modified over 9 years ago.
1
Data Projects at the Minnesota Population Center
Resources for Comparative Population and Health Research
Seattle, Washington, May 22, 2014
Elizabeth Boyle, Miriam King, Matthew Sobek
Minnesota Population Center, University of Minnesota
sobek@umn.edu
3
Integrated Public Use Microdata Series
4
U.S. Labor Force Participation: 1850-2012 Men Women
5
Steve Ruggles
1995: “King of Quant”
President, Population Association of America
6
New U.S. Data From Ancestry.com
7
Minnesota Population Center
We build data infrastructure for the research community.
We specialize in data harmonization.
World’s largest collection of individual population and health data, across 9 projects.
50,000 registered users from over 100 countries.
Free.
8
MPC Data Dissemination, 1993-2012 Gigabytes per week
9
MPC Data Projects
10
The Problem
1. Combining data from multiple sources is time consuming
   Discovery
   Data management
2. It’s error prone
   Recoding data
   Overlooked documentation
3. Hard to replicate results
4. Discourages comparative research
11
Outline
Harmonization methods
Dissemination system
International projects
   Integrated DHS
   Terra Populus
   IPUMS-International
12
Terminology
Harmonization: Combining datasets collected at different times or places into a single, consistent data series. Also called “integration.”
Metadata: Data about data. Documentation in the broadest sense.
13
Microdata: Relation to head, Marital status, Education, Occupation
14
Summary Data
15
Harmonization Methods: Metadata, Data, Dissemination
16
Systematize Metadata (record layout file, pdf)
17
MPC Data Dictionary
18
Convert Questionnaires to Metadata (Mexico 2000): Water Access
19
Metadata: Questionnaire Text
20
XML-Tagged Questionnaire Text: Water access, Bedrooms, Rooms
21
Data: Variable Harmonization
Marital Status: IPUMS-International

Bangladesh 2011
   1 = Unmarried
   2 = Married
   3 = Widowed
   4 = Divorced/separated

Mexico 1970
   1 = Married, civil & relig
   2 = Married, civil
   3 = Married, religious
   4 = Consensual union
   5 = Widowed
   6 = Divorced
   7 = Separated
   8 = Single

Kenya 1999
   1 = Never married
   2 = Monogamous
   3 = Polygamous
   4 = Widowed
   5 = Divorced
   6 = Separated
22
Translation Table: Input

Bangladesh 2011
   1 = Unmarried
   2 = Married
   3 = Widowed
   4 = Divorced or separated

Mexico 1970
   1 = Married, civil & relig
   2 = Married, civil
   3 = Married, religious
   4 = Consensual union
   5 = Widowed
   6 = Divorced
   7 = Separated
   8 = Single

Kenya 1999
   1 = Never married
   2 = Monogamous
   3 = Polygamous
   4 = Widowed
   5 = Divorced
   6 = Separated
23
Translation Table: Input and Harmonized

Code  Harmonized label           Bangladesh 2011            Mexico 1970                 Kenya 1999
100   Single                     1 = Unmarried              8 = Single                  1 = Never married
200   Married or in union        2 = Married
210     Married, formally
211       Civil                                             2 = Married, civil
212       Religious                                         3 = Married, religious
213       Civil and religious                               1 = Married, civil & relig
214       Monogamous                                                                    2 = Monogamous
215       Polygamous                                                                    3 = Polygamous
220     Consensual union                                    4 = Consensual union
300   Divorced or separated      4 = Divorced or separated
310     Separated                                           7 = Separated               6 = Separated
320     Divorced                                            6 = Divorced                5 = Divorced
400   Widowed                    3 = Widowed                5 = Widowed                 4 = Widowed
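The translation-table idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MPC's actual software: the dictionary layout and the names `TRANSLATION_TABLE`, `harmonize`, and `general_code` are hypothetical, the pairing of input codes with harmonized codes is inferred from matching labels, and the codes 300 and 400 are read from partly garbled slide text. The idea that the first digit identifies a general category is likewise an assumption suggested by the code layout.

```python
# Hypothetical translation table: sample -> {input code: harmonized code}.
# Pairings are inferred from the slide's labels, not copied from MPC files.
TRANSLATION_TABLE = {
    "Bangladesh 2011": {1: 100, 2: 200, 3: 400, 4: 300},
    "Mexico 1970": {1: 213, 2: 211, 3: 212, 4: 220,
                    5: 400, 6: 320, 7: 310, 8: 100},
    "Kenya 1999": {1: 100, 2: 214, 3: 215, 4: 400, 5: 320, 6: 310},
}

# Harmonized labels as listed on the slide (300 and 400 reconstructed).
HARMONIZED_LABELS = {
    100: "Single",
    200: "Married or in union",
    210: "Married, formally",
    211: "Civil",
    212: "Religious",
    213: "Civil and religious",
    214: "Monogamous",
    215: "Polygamous",
    220: "Consensual union",
    300: "Divorced or separated",
    310: "Separated",
    320: "Divorced",
    400: "Widowed",
}


def harmonize(sample, input_code):
    """Translate one sample-specific code into the harmonized code."""
    return TRANSLATION_TABLE[sample][input_code]


def general_code(harmonized_code):
    """Collapse a detailed code to its general category (assumed: first digit)."""
    return (harmonized_code // 100) * 100


if __name__ == "__main__":
    for sample, code in [("Mexico 1970", 8), ("Kenya 1999", 3), ("Bangladesh 2011", 4)]:
        detailed = harmonize(sample, code)
        general = general_code(detailed)
        print(sample, code, "->", detailed, HARMONIZED_LABELS[detailed],
              "| general:", HARMONIZED_LABELS[general])
```

Keeping the detail in the later digits means a sample that only records "Married" (Bangladesh 2011) and one that distinguishes monogamous from polygamous unions (Kenya 1999) can still be compared at the general level without discarding the finer categories.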
25
Data Dissemination System
27
Variables Page
28
238 censuses
29
Sample Filtering
30
Variables Page – Filtered
31
Variable Page: Marital Status
32
Variable Codes (Marital status)
33
Variable Codes (Marital status)
34
Variable Codes (Marital status)
35
Variable Page: Marital Status
36
Variable Comparability Discussion (Marital status)
37
Variable Page: Documentation
38
Questionnaire Text
39
(Marital status, Cambodia)
40
Variables Page
41
Extract Summary
42
Case Selection
43
Attached Characteristics
Age of spouse
Employment status of father
Occupation of father
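Attaching a characteristic means copying a variable from a related household member, such as the spouse, onto each person's own record. A minimal sketch follows, assuming each record carries a hypothetical `spouse_pos` pointer that gives the spouse's person number within the household (0 if no spouse is present). The field names and the `_sp` suffix are illustrative and do not come from the slides.

```python
def attach_spouse_age(household):
    """Return copies of the person records with the spouse's age attached.

    `household` is a list of dicts with the hypothetical keys
    'pernum' (person number), 'age', and 'spouse_pos' (0 = no spouse).
    """
    by_pernum = {person["pernum"]: person for person in household}
    result = []
    for person in household:
        spouse = by_pernum.get(person["spouse_pos"])  # None when pointer is 0
        record = dict(person)  # copy so the input records stay unchanged
        record["age_sp"] = spouse["age"] if spouse is not None else None
        result.append(record)
    return result


# Toy household: a married couple and their child (made-up values).
example_household = [
    {"pernum": 1, "age": 40, "spouse_pos": 2},
    {"pernum": 2, "age": 38, "spouse_pos": 1},
    {"pernum": 3, "age": 12, "spouse_pos": 0},
]

attached = attach_spouse_age(example_household)
```

The same pattern would cover the father's employment status or occupation, given a corresponding pointer to the father's record.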
44
Extract Summary
45
Download or Revise Extract
46
On-line Analysis
47
The International Projects
48
Integrated DHS
49
Demographic and Health Surveys
Foremost source of health information for the developing world
Funded by USAID
Since the 1980s: over 300 surveys, 90 countries
Topics: fertility, nutrition, HIV, malaria, maternal and child health, etc.
50
IDHS Project
5-year NIH grant (end of year 2)
Focus on Africa, with India
Partnership with ICF-International and USAID
51
Why an Integrated DHS?
Motivation: DHS is incredibly valuable, but it’s hard to capitalize on its full potential.
Problem:
   Data discovery
   Dispersed documentation
   Data management
   Variable changes over time
Not unique to DHS: endemic to any survey that’s persisted over decades.
52
DHS Research Process
Example: Find data on female genital cutting
Survey Search Tool
55
Recode notes
Data dictionary
Just the woman file, for one survey. 61 to go.
Still need:
   Report (377-page PDF): contains questionnaire and sample design information
   Errata file
56
At least the DHS variables are already harmonized, right?
DHS “Recode Variables” make it more harmonized than most surveys
   Consistent variable names
   Each DHS phase has a shared model questionnaire
But:
   6 phases over 25+ years
   Country control over final wording of surveys
   Country-specific variables
The recode variables can be a two-edged sword
57
Harmonization: Religion
Variable V130 in four samples: Ghana 1993, Ghana 2008, India 1992, India 2005
58
Harmonization: Female Circumcision (Ever Circumcised)
59
Timeline: 2014 (current)
9 countries, 39 samples
Much of the woman files
Women of childbearing age as unit of analysis
60
Timeline: 2015
15 countries, 69 samples
Complete the woman files
Children & birth files
61
Timeline: 2017
21 countries, 94 samples
Men and couples files
62
Timeline: Next grant
41 African countries, 130+ samples
11 Asian countries, 32+ samples
63
Beta
64
Terra Populus Goal
Lower barriers to conducting research on population and the environment.
Motivation: The data from different domains have incompatible formats, and few researchers have the skills to combine them.
65
TerraPop
5-year NSF grant
At mid-point: year 3
66
Population Microdata
6 countries: Argentina, Brazil, Malawi, Spain, United States, Vietnam
67
Area-level Data
Tabulations of census data for administrative units
68
Environmental Data: Rasters (Grid Cells)
Land cover from satellite images (Global Land Cover 2000)
Agricultural use from satellites and government records (Global Landscapes Initiative)
Climate from weather stations (WorldClim)
69
Location-Based Integration
Data structures: Microdata, Area-level data, Rasters
Mix and match variables originating in any of the data structures
Obtain output in the data structure most useful to you
70
Location-Based Integration
Data structures: Microdata, Area-level data, Rasters
Individuals and households with their environmental and social context
71
Location-Based Integration
Data structures: Microdata, Area-level data, Rasters
Summarized environmental and population characteristics for administrative districts
[Example tables keyed by County ID (G17003100001 through G17003100007). Columns: Mean Ann. Temp.; Max. Ann. Precip.; Rent, Rural; Rent, Urban; Own, Rural; Own, Urban]
72
Location-Based Integration
Data structures: Microdata, Area-level data, Rasters
Rasters of population and environment data
73
Rasterization of Area-Level Data
74
Area-Level Summary of Raster Data
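The two conversions named on these slides can be illustrated with a toy grid. The sketch below is not TerraPop's implementation. It assumes a hypothetical "zone" grid in which every raster cell has already been assigned the ID of the administrative unit it falls in, and all function names and values are made up for illustration.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Area-level summary of raster data: mean cell value per unit.

    `values` and `zones` are same-shaped 2-D lists; `zones` holds the
    administrative unit ID for each cell, or None outside every unit.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value is None:
                continue  # skip cells with no unit or no data
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


def rasterize(area_values, zones):
    """Rasterization of area-level data: give each cell its unit's value."""
    return [[area_values.get(zone) for zone in row] for row in zones]


# Toy 2 x 3 grid (made-up temperatures) covering two units, "A" and "B".
temperature = [[20.0, 22.0, 24.0],
               [21.0, 23.0, 25.0]]
zones = [["A", "A", "B"],
         ["A", "B", "B"]]

mean_temperature = zonal_mean(temperature, zones)
population_grid = rasterize({"A": 1200, "B": 800}, zones)
```

Both directions depend on knowing which unit each cell belongs to, which is why the boundary files matter so much for linking the data formats.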
75
Boundaries are Key
Linkages across data formats rely on administrative unit boundaries
Particular needs:
   Lower level boundaries
   Historical boundaries
76
Geographic Harmonization
79
Beta Version
Web interface will change significantly in fall 2014
Fast microdata tabulator needed
80
IPUMS-International
81
Census microdata from around the world
Funded by NSF and NIH
Motivation:
   Provide data access
   Preservation
82
Khartoum, CBS-Sudan
83
Dhaka, Bangladesh Bureau of Statistics
84
IPUMS-International Participating Disseminating
85
IPUMS Censuses Per Country
87
Variables Included in Extracts
88
Top Institutional Users
89
Millennium Development Goals
Ratio of literate women to men, 15-24 years old
1990 Census round
Source: Cuesta and Lovatón (2014)
90
Millennium Development Goals
Colombia: Adolescent Birth Rate
Census 1993, Census 2005
Source: Cuesta and Lovatón (2014)
Data Source: IPUMS-International, Minnesota Population Center
91
IPUMSI Future
Data acquisition
Outreach: developing countries
Virtual data enclave
92
Thank you! sobek@umn.edu