Data Projects at the Minnesota Population Center Resources for Comparative Population and Health Research Seattle, Washington May 22, 2014 Elizabeth Boyle,


1 Data Projects at the Minnesota Population Center: Resources for Comparative Population and Health Research
Seattle, Washington, May 22, 2014
Elizabeth Boyle, Miriam King, Matthew Sobek
Minnesota Population Center, University of Minnesota
sobek@umn.edu

2

3 Integrated Public Use Microdata Series

4 U.S. Labor Force Participation: 1850-2012 (chart series: Men, Women)

5 Steve Ruggles 1995: “King of Quant” President Population Association of America

6 New U.S. Data From Ancestry.com

7 Minnesota Population Center
• We build data infrastructure for the research community. We specialize in data harmonization.
• World’s largest collection of individual population and health data, across 9 projects.
• 50,000 registered users from over 100 countries.
• Free

8 MPC Data Dissemination, 1993-2012 (gigabytes per week)

9 MPC Data Projects

10 The Problem
1. Combining data from multiple sources is time-consuming
   • Discovery
   • Data management
2. It’s error-prone
   • Recoding data
   • Overlooking documentation
3. Hard to replicate results
4. Discourages comparative research

11 Outline
• Harmonization methods
• Dissemination system
• International projects
   • Integrated DHS
   • Terra Populus
   • IPUMS-International

12 Terminology
Harmonization (“integration”): Combining datasets collected at different times or places into a single, consistent data series.
Metadata: Data about data. Documentation in the broadest sense.

13 Microdata: Relation to head, Marital status, Education, Occupation

14 Summary Data

15 Harmonization Methods
• Metadata
• Data
• Dissemination

16 Systematize Metadata (record layout file, pdf)

17 MPC Data Dictionary

18 Convert Questionnaires to Metadata (Mexico 2000): Water Access

19 Metadata: Questionnaire Text

20 XML-Tagged Questionnaire Text: Water access, Bedrooms, Rooms

21 Data: Variable Harmonization
Marital Status: IPUMS-International
Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divorced/separated
Mexico 1970: 1 = Married, civil & relig; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated

22 Translation Table: Input
Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divorced or separated
Mexico 1970: 1 = Married, civil & relig; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated

23 Translation Table: Harmonized codes and labels, with input codes by sample

Code Harmonized label        Mexico 1970                 Bangladesh 2011         Kenya 1999
100  Single                  8 = Single                  1 = Unmarried           1 = Never married
200  Married or in union                                 2 = Married
210    Married, formally
211      Civil               2 = Married, civil
212      Religious           3 = Married, religious
213      Civil and religious 1 = Married, civil & relig
214      Monogamous                                                              2 = Monogamous
215      Polygamous                                                              3 = Polygamous
220    Consensual union      4 = Consensual union
300  Divorced or separated                               4 = Divorced/separated
310    Separated             7 = Separated                                       6 = Separated
320    Divorced              6 = Divorced                                        5 = Divorced
400  Widowed                 5 = Widowed                 3 = Widowed             4 = Widowed
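
The translation table on slides 22-23 can be read as a lookup from each sample's input code to a harmonized code. The following is a minimal Python sketch, not the MPC's actual software. The pairings are inferred by matching the labels on the slides. The dictionary layout, the function names `harmonize` and `general_code`, and the reading of the first digit as the broad category are all illustrative assumptions.

```python
# Illustrative only: input code -> harmonized code for each sample,
# paired by matching the labels shown on the slides.
TRANSLATION_TABLE = {
    "Bangladesh 2011": {1: 100, 2: 200, 3: 400, 4: 300},
    "Mexico 1970": {1: 213, 2: 211, 3: 212, 4: 220, 5: 400, 6: 320, 7: 310, 8: 100},
    "Kenya 1999": {1: 100, 2: 214, 3: 215, 4: 400, 5: 320, 6: 310},
}

# Harmonized codes and labels as listed on the slide.
HARMONIZED_LABELS = {
    100: "Single",
    200: "Married or in union",
    210: "Married, formally",
    211: "Civil",
    212: "Religious",
    213: "Civil and religious",
    214: "Monogamous",
    215: "Polygamous",
    220: "Consensual union",
    300: "Divorced or separated",
    310: "Separated",
    320: "Divorced",
    400: "Widowed",
}


def harmonize(sample: str, input_code: int) -> int:
    """Look up the harmonized code for one input code of one sample."""
    return TRANSLATION_TABLE[sample][input_code]


def general_code(harmonized: int) -> int:
    """Assumed convention: the first digit is the broad category."""
    return harmonized // 100


if __name__ == "__main__":
    for sample, code in [("Mexico 1970", 4), ("Kenya 1999", 3), ("Bangladesh 2011", 2)]:
        h = harmonize(sample, code)
        print(sample, code, "->", h, HARMONIZED_LABELS[h], "| general:", general_code(h))
```

Keeping the detail in the trailing digits means a sample that only records "Married" (Bangladesh 2011) and one that distinguishes civil from religious marriage (Mexico 1970) still share the same first digit, so they can be compared at the broad level without discarding the finer categories.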


25 Data Dissemination System

26

27 Variables Page

28 238 censuses

29 Sample Filtering

30 Variables Page – Filtered

31 Variable Page: Marital Status

32 Variable Codes (Marital status)

33 Variable Codes (Marital status)

34 Variable Codes (Marital status)

35 Variable Page: Marital Status

36 Variable Comparability Discussion (Marital status)

37 Variable Page: Documentation

38 Questionnaire Text

39 (Marital status, Cambodia)

40 Variables Page

41 Extract Summary

42 Case Selection

43 Attached Characteristics
Age of spouse
Employment status of father
Occupation of father
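
Attaching a characteristic means copying a value from a related household member (a spouse or father, for example) onto a person's own record. The following is a minimal Python sketch, assuming each record carries its own person number and a pointer to the spouse's person number. The field names `pernum`, `sploc`, and `age_sp`, and the use of 0 for "no spouse present", are hypothetical choices for illustration and are not taken from the slides.

```python
def attach_spouse_age(household):
    """Return copies of the person records with the spouse's age attached.

    household: list of dicts with keys "pernum", "age", and "sploc"
    (all hypothetical field names). "sploc" holds the spouse's pernum,
    or 0 when no spouse is present in the household.
    """
    by_pernum = {person["pernum"]: person for person in household}
    result = []
    for person in household:
        spouse = by_pernum.get(person["sploc"])  # None when sploc is 0
        age_sp = spouse["age"] if spouse is not None else None
        result.append({**person, "age_sp": age_sp})
    return result


if __name__ == "__main__":
    household = [
        {"pernum": 1, "age": 40, "sploc": 2},
        {"pernum": 2, "age": 38, "sploc": 1},
        {"pernum": 3, "age": 10, "sploc": 0},
    ]
    for row in attach_spouse_age(household):
        print(row)
```

The same pattern would apply to the father's employment status or occupation, given a pointer to the father's record instead of the spouse's.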

44 Extract Summary

45 Download or Revise Extract

46 On-line Analysis

47 The International Projects

48 Integrated DHS

49 Demographic and Health Surveys
• Foremost source of health information for the developing world
• Funded by USAID
• Since the 1980s, over 300 surveys in 90 countries
• Topics: fertility, nutrition, HIV, malaria, maternal and child health, etc.

50 IDHS Project
• 5-year NIH grant (end of year 2)
• Focus on Africa, with India
• Partnership with ICF-International and USAID

51 Why an Integrated DHS?
Motivation: DHS is incredibly valuable, but it’s hard to capitalize on its full potential.
Problem:
• Data discovery
• Dispersed documentation
• Data management
• Variable changes over time
Not unique to DHS: endemic to any survey that’s persisted over decades.

52 DHS Research Process
Example: Find data on female genital cutting
Survey Search Tool

53

54

55 Just the woman file, for one survey. 61 to go.
• Recode notes
• Data dictionary
• Errata file
Still need the Report (377-page pdf), which contains questionnaire and sample design information.

56 At least the DHS variables are already harmonized, right?
DHS “Recode Variables” make it more harmonized than most surveys
• Consistent variable names
• Each DHS phase has a shared model questionnaire
But:
• 6 phases over 25+ years
• Country control over final wording of surveys
• Country-specific variables
• The recode variables can be a two-edged sword

57 Harmonization: Religion
Variable V130 in Ghana 1993, Ghana 2008, India 1992, India 2005

58 Harmonization: Female Circumcision (Ever Circumcised)

59 Timeline: 2014 (current)
9 countries, 39 samples
Much of the woman files
Women of childbearing age as unit of analysis

60 Timeline: 2015
15 countries, 69 samples
Complete the woman files
Children & birth files

61 Timeline: 2017
21 countries, 94 samples
Men and couples files

62 Timeline: Next grant
41 African countries, 130+ samples
11 Asian countries, 32+ samples

63 Beta

64 Terra Populus
Goal: Lower barriers to conducting research on population and the environment.
Motivation: The data from different domains have incompatible formats, and few researchers have the skills to combine them.

65 TerraPop
• 5-year NSF grant
• At mid-point: year 3

66 Population Microdata
6 countries:
• Argentina
• Brazil
• Malawi
• Spain
• United States
• Vietnam

67 Area-level Data
Tabulations of census data for administrative units

68 Environmental Data: Rasters (Grid Cells)
Land cover from satellite images (Global Land Cover 2000)
Agricultural use from satellites and government records (Global Landscapes Initiative)
Climate from weather stations (WorldClim)

69 Location-Based Integration
Data structures: Microdata, Area-level data, Rasters
Mix and match variables originating in any of the data structures
Obtain output in the data structure most useful to you

70 Location-Based Integration
Data structures: Microdata, Area-level data, Rasters
Individuals and households with their environmental and social context

71 Location-Based Integration
Data structures: Microdata, Area-level data, Rasters
Summarized environmental and population characteristics for administrative districts

County ID     Mean Ann. Temp.  Max. Ann. Precip.  Rent, Rural / Rent, Urban / Own, Rural / Own, Urban (digits run together in source)
G17003100001  21.2             768                31291063637365
G17003100002  23.4             589                294910751469717
G17003100003  24.3             867                341815891108617
G17003100004  21.5             943                1882425202142
G17003100005  24.1             867                2416572426197
G17003100006  24.4             697                2560934950563
G17003100007  25.6             701                2126653321215
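
One way to picture location-based integration is as a join on a shared geographic identifier: area-level values are copied onto every record that falls in that area. The following is a minimal Python sketch under stated assumptions. The county IDs come from the slide, but the temperature and precipitation values are only one reading of its run-together digits. The person records, the field names, and the function `attach_context` are hypothetical.

```python
# Area-level table keyed by county ID. IDs are from the slide; the values
# are one reading of its fused digits and should be treated as illustrative.
AREA_DATA = {
    "G17003100001": {"mean_ann_temp": 21.2, "max_ann_precip": 768},
    "G17003100002": {"mean_ann_temp": 23.4, "max_ann_precip": 589},
}

# Hypothetical person records that carry the same county ID.
PERSONS = [
    {"person_id": 1, "age": 34, "county_id": "G17003100001"},
    {"person_id": 2, "age": 61, "county_id": "G17003100002"},
    {"person_id": 3, "age": 12, "county_id": "G17003100099"},  # no area match
]


def attach_context(persons, area_data):
    """Return person records with area-level variables merged in by county ID.

    Records whose county ID has no entry in area_data are returned unchanged.
    """
    merged = []
    for person in persons:
        context = area_data.get(person["county_id"], {})
        merged.append({**person, **context})
    return merged


if __name__ == "__main__":
    for row in attach_context(PERSONS, AREA_DATA):
        print(row)
```

The join only works when both sides use the same boundary definitions, which is why the identifiers and the boundaries behind them matter so much for this kind of linkage.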

72 Location-Based Integration
Data structures: Microdata, Area-level data, Rasters
Rasters of population and environment data

73 Rasterization of Area-Level Data

74 Area-Level Summary of Raster Data
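
An area-level summary of raster data is commonly computed as a zonal statistic: each grid cell is assigned to an administrative unit, and the cell values are aggregated per unit. The following is a minimal pure-Python sketch of a zonal mean. It assumes the raster and the zone grid are already aligned cell for cell. The function name `zonal_mean` and the use of `None` for cells with no data or no zone are illustrative assumptions, not a description of TerraPop's implementation.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Mean raster value per zone.

    values: 2-D list of numbers (None marks a cell with no data).
    zones:  2-D list of zone IDs with the same shape (None marks a cell
            outside every administrative unit).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None or zone is None:
                continue  # skip cells that cannot contribute
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


if __name__ == "__main__":
    temperature = [[20.0, 22.0], [24.0, 26.0]]
    districts = [["A", "A"], ["B", None]]
    print(zonal_mean(temperature, districts))
```

Rasterizing area-level data (slide 73) runs in the opposite direction: each cell takes a value derived from the unit it falls in.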

75 Boundaries are Key
Linkages across data formats rely on administrative unit boundaries
Particular needs
• Lower level boundaries
• Historical boundaries

76 Geographic Harmonization

77

78

79 Beta Version
Web interface will change significantly in fall 2014
Fast microdata tabulator needed

80 IPUMS-International

81 Census microdata from around the world
Funded by NSF and NIH
Motivation:
• Provide data access
• Preservation

82 Khartoum, CBS-Sudan

83 Dhaka, Bangladesh Bureau of Statistics

84 IPUMS-International (map legend: Participating, Disseminating)

85 IPUMS Censuses Per Country

86

87 Variables Included in Extracts

88 Top Institutional Users

89 Millennium Development Goals
Ratio of literate women to men, 15-24 years old (1990 Census round)
Source: Cuesta and Lovatón (2014)

90 Millennium Development Goals
Colombia: Adolescent Birth Rate (Census 1993, Census 2005)
Source: Cuesta and Lovatón (2014)
Data Source: IPUMS-International, Minnesota Population Center

91 IPUMSI Future
• Data acquisition
• Outreach: developing countries
• Virtual data enclave

92 Thank you! sobek@umn.edu

