Published by Dwain Stafford. Modified over 6 years ago.
2
TerraPop Goals
Lower barriers to conducting interdisciplinary research on human-environment interactions by making data in different formats from different scientific domains easily interoperable.
Provide an organizational and technical framework to preserve, integrate, disseminate, and analyze global-scale spatiotemporal data describing population and the environment.
3
Source Data Domains & Formats
Population microdata; area-level data
4
Terra Populus Data Domains
Population: microdata on individuals and households
Environment: land cover, land use, climate
Disparate scientific domains, interrelated processes
Multiple data formats: microdata, areal data
5
Population Microdata Structure
Rows: household records (geographic and housing characteristics) and person records within households
Columns: variables, such as age, sex, relationship, race, occupation, birthplace, and mother’s birthplace
Key points: preserve the richness of microdata; location is based on administrative units
Trent Alexander, CIC Conference Presentation on IPUMS, 10/11/07
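The hierarchical layout described on this slide (household records, person records nested within them, one column per variable) can be sketched in Python. This is a minimal illustration only: the `Household` and `Person` classes, their field names, and the `geo_code` attribute are hypothetical and are not the IPUMS file layout. The `flatten` helper shows how nesting lets each person row inherit household-level geography and housing characteristics.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    # Hypothetical person-level variables (one column each in the microdata).
    age: int
    sex: str
    relationship: str

@dataclass
class Household:
    # Hypothetical household-level variables: geography and housing.
    serial: int
    geo_code: str   # administrative unit code locating the household
    ownership: str  # e.g. "own" or "rent"
    persons: list[Person] = field(default_factory=list)

def flatten(households: list[Household]) -> list[dict]:
    """Return one rectangular row per person, carrying household columns along."""
    rows = []
    for hh in households:
        for pernum, p in enumerate(hh.persons, start=1):
            rows.append({
                "serial": hh.serial,
                "pernum": pernum,
                "geo_code": hh.geo_code,
                "ownership": hh.ownership,
                "age": p.age,
                "sex": p.sex,
                "relationship": p.relationship,
            })
    return rows

households = [
    Household(1, "G001", "own", [Person(34, "F", "head"), Person(7, "M", "child")]),
    Household(2, "G002", "rent", [Person(61, "M", "head")]),
]
rows = flatten(households)
for row in rows:
    print(row)
```

Keeping the hierarchy until the last step means location is stored once per household, which matches the slide's point that location comes from the administrative unit rather than from each person.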
6
Microdata Availability
Thanks to attending partners: Czech Republic*, Mexico*, Brazil, University of Barcelona, Slovenia*, Netherlands*, Spain*, Italy*, Poland*
*Anticipating 2010 or 2011 data
Croatia: legislation approved?
Norway and Sweden: NAPP only
Bulgaria: no data provided
Finland, Slovakia, Kosovo
Denmark?
7
Area-level Data Sources
Census tables, especially where microdata is unavailable
Other types of surveys and data: agricultural surveys; economic surveys and data; election data; legal information
8
Environmental Data (Rasters): TerraPop Prototype
Land cover data from satellite images (Global Land Cover 2000)
Agricultural land use data from satellites and government records (Global Landscapes Initiative)
Climate data from weather stations (WorldClim)
9
Location-Based Integration
Microdata, area-level data, rasters
10
Location-Based Integration
Microdata, rasters, and area-level data
Integration across domains and formats hinges on geography
Users get any type of data in a format useful to them
Requires boundary files, with boundaries harmonized over time
11
Location-Based Integration
Individuals and households with their environmental and social context
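In the simplest case, attaching environmental and social context to individuals and households is a lookup keyed on the administrative unit code that each record already carries. Here is a minimal sketch, assuming the area-level values have already been summarized per unit. The dictionary layout, field names, codes, and values are all hypothetical. Records whose code has no match are kept and flagged rather than silently dropped, because an unmatched code usually points to a code-matching problem.

```python
def attach_context(persons, unit_context, key="geo_code"):
    """Left-join area-level attributes onto each microdata record by unit code.

    persons: list of dicts, each holding an administrative unit code under `key`.
    unit_context: dict mapping unit code -> dict of area-level attributes.
    Returns new dicts; records with no matching unit get context_matched=False.
    Context attributes overwrite record fields with the same name.
    """
    out = []
    for rec in persons:
        ctx = unit_context.get(rec[key])
        merged = dict(rec)
        if ctx is not None:
            merged.update(ctx)
        merged["context_matched"] = ctx is not None
        out.append(merged)
    return out

# Hypothetical records and unit summaries.
persons = [
    {"pernum": 1, "geo_code": "G001", "age": 34},
    {"pernum": 2, "geo_code": "G001", "age": 7},
    {"pernum": 1, "geo_code": "G999", "age": 61},  # code absent from the context table
]
unit_context = {
    "G001": {"mean_ann_temp": 20.5, "max_ann_precip": 700},
    "G002": {"mean_ann_temp": 23.0, "max_ann_precip": 590},
}
enriched = attach_context(persons, unit_context)
for row in enriched:
    print(row)
```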
12
Location-Based Integration
Summarized environmental and population characteristics for administrative districts
County ID | Mean Ann. Temp. | Max. Ann. Precip. | Rent, Rural | Rent, Urban | Own, Rural | Own, Urban
G | 21.2 | 768 | 3129 | 1063 | 637 | 365
G | 23.4 | 589 | 2949 | 1075 | 1469 | 717
G | 24.3 | 867 | 3418 | 1589 | 1108 | 617
G | 21.5 | 943 | 1882 | 425 | 202 | 142
G | 24.1 | | 2416 | 572 | 426 | 197
G | 24.4 | 697 | 2560 | 934 | 950 | 563
G | 25.6 | 701 | 2126 | 653 | 321 | 215
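A district table like this one combines two summaries keyed on the same unit ID: raster cells summarized over each district and microdata records tabulated by district. The sketch below uses plain Python and assumes that the raster and a same-shaped grid of district IDs are already aligned. A real workflow would first rasterize or overlay the boundary file. All names and values are hypothetical, and only a zonal mean is shown for the environmental side.

```python
from collections import Counter, defaultdict

def zonal_mean(values, zones, nodata=None):
    """Mean raster value per zone ID; values and zones are same-shape 2-D lists."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for v, z in zip(value_row, zone_row):
            if z is None or v == nodata:
                continue  # skip cells outside every district or without data
            sums[z] += v
            counts[z] += 1
    return {z: sums[z] / counts[z] for z in sums}

def tabulate(households, unit_key, *category_keys):
    """Count households per unit for each combination of category values."""
    table = defaultdict(Counter)
    for hh in households:
        combo = tuple(hh[k] for k in category_keys)
        table[hh[unit_key]][combo] += 1
    return table

# Hypothetical aligned grids and household records.
temperature = [[20.0, 22.0, 24.0],
               [21.0, 23.0, 25.0]]
district_ids = [["A", "A", "B"],
                ["A", "B", "B"]]
households = [
    {"district": "A", "tenure": "own", "area": "rural"},
    {"district": "A", "tenure": "rent", "area": "urban"},
    {"district": "B", "tenure": "own", "area": "rural"},
    {"district": "B", "tenure": "own", "area": "rural"},
]

temp = zonal_mean(temperature, district_ids)
counts = tabulate(households, "district", "tenure", "area")
for d in sorted(temp):
    print(d, temp[d], dict(counts[d]))
```

The zonal mean gives every cell equal weight. On a latitude-longitude grid, cell area shrinks toward the poles, so an area-weighted mean may be preferable there.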
13
Location-Based Integration
Rasters of population and environment data
14
Boundaries are Key
Linkages across data formats rely on administrative unit boundaries
Particular needs: lower-level boundaries; historical boundaries
15
Administrative Unit Boundary Processing
Obtaining boundary data; linking to microdata; temporal harmonization; regionalization
16
Obtaining Boundary Data
Potential sources of digital data: National Statistical Offices; Global Administrative Areas data (e.g., SALB, GAUL); digitizing from images or paper maps
Challenges: lower-level and historical data; date mismatches with census data; code matching to microdata
17
Digitizing Boundaries: Leveraging Available Digital Data
Script input: existing digital data; rough digitized boundaries
Script output: relevant boundaries from the digital data; relationship between digital and digitized units
Advantages: preserves accuracy and detail; flags areas needing more work
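As described, the core of the digitizing script relates existing digital units to rough hand-digitized boundaries and flags unclear cases. The sketch below is a simplification built on assumptions and is not the TerraPop script. Both layers are represented as sets of grid cells instead of polygons. Each digital unit is assigned to the digitized unit that covers the largest share of its cells, and assignments below a hypothetical `min_share` threshold are flagged for review. Because the output is rebuilt from the digital geometry, the detailed boundaries are preserved and the rough digitization only decides the grouping.

```python
def relate_units(digital, digitized, min_share=0.8):
    """Assign each digital unit to the rough digitized unit that covers most of it.

    digital, digitized: dict of unit ID -> set of (row, col) grid cells.
    Returns (assignment, flagged): assignment maps digital ID -> digitized ID
    (None if there is no overlap); flagged lists digital IDs whose best overlap
    share is below min_share and therefore need more work.
    """
    assignment, flagged = {}, []
    for d_id, d_cells in digital.items():
        best_id, best_overlap = None, 0
        for r_id, r_cells in digitized.items():
            overlap = len(d_cells & r_cells)
            if overlap > best_overlap:
                best_id, best_overlap = r_id, overlap
        assignment[d_id] = best_id
        share = best_overlap / len(d_cells) if d_cells else 0.0
        if share < min_share:
            flagged.append(d_id)
    return assignment, flagged

def build_boundaries(digital, assignment):
    """Rebuild each digitized unit from the detailed digital units assigned to it."""
    out = {}
    for d_id, r_id in assignment.items():
        if r_id is not None:
            out.setdefault(r_id, set()).update(digital[d_id])
    return out

# Hypothetical layers on a 3 x 4 grid.
digital = {
    "d1": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "d2": {(0, 2), (0, 3), (1, 2), (1, 3)},
    "d3": {(2, 0), (2, 1), (2, 2), (2, 3)},
}
digitized = {
    "R1": {(0, 0), (0, 1), (1, 0), (1, 1), (0, 2)},  # rough line strays into d2
    "R2": {(1, 2), (1, 3), (0, 3), (2, 2), (2, 3)},
}
assignment, flagged = relate_units(digital, digitized)
boundaries = build_boundaries(digital, assignment)
print(assignment, flagged)
```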
18
Code Matching
Codes link boundaries to microdata records, connecting people to places
Boundary data may or may not include codes
Approach: name matching, when possible; map observations (the digitizing script captures codes); research on boundary changes
Linking boundary shape attributes to IPUMS microdata
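Name matching, the first approach listed, usually needs some normalization before boundary names and microdata geography labels agree. The sketch below uses only the standard library and assumes both sides are available as name-to-code mappings. The normalization rules (case folding, accent stripping, collapsing punctuation and spaces), the place names, and the codes are all illustrative assumptions. When several labels on the microdata side normalize to the same key, the match is treated as ambiguous and left for the other approaches.

```python
import re
import unicodedata

def normalize(name):
    """Case-fold, strip accents, and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.casefold()).strip()

def match_codes(boundary_names, microdata_codes):
    """Link boundary unit IDs to microdata geography codes by normalized name.

    boundary_names: dict of boundary unit ID -> place name.
    microdata_codes: dict of microdata place label -> geography code.
    Returns (matches, unmatched): matches maps boundary ID -> microdata code;
    unmatched lists boundary IDs with no unique normalized-name match.
    """
    index = {}
    for label, code in microdata_codes.items():
        index.setdefault(normalize(label), []).append(code)
    matches, unmatched = {}, []
    for unit_id, name in boundary_names.items():
        candidates = index.get(normalize(name), [])
        if len(candidates) == 1:
            matches[unit_id] = candidates[0]
        else:
            unmatched.append(unit_id)  # no candidate, or ambiguous duplicates
    return matches, unmatched

# Hypothetical names and codes.
boundary_names = {"b1": "São Paulo", "b2": "Mogi-Mirim", "b3": "Santa Cruz", "b4": "Nova Localidade"}
microdata_codes = {"SAO PAULO": "35001", "Mogi Mirim": "35002", "Santa Cruz": "35003", "SANTA CRUZ": "35004"}
matches, unmatched = match_codes(boundary_names, microdata_codes)
print(matches, unmatched)
```

The `[^a-z0-9]` pattern assumes names in the Latin script. Names in other scripts would normalize to empty strings, so a transliteration step would be needed first.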
19
Temporal Harmonization
Purpose: create consistent units for time-series analysis
Top-down strategy: start with first-level administrative units; harmonize second-level units within first-level “containers”
Script to create “least common denominator” units: applicable when maps from multiple years are available; creates aggregate units encompassing areas with boundary changes; constructs a source-harmonized crosswalk
20
“Erase” interior boundaries applicable to only one census
Apply harmonized codes (this also aids in code matching)
Crosswalk
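The “least common denominator” step can be treated as a search for connected groups of units. Whenever a unit from one census year overlaps a unit from another year, both must end up in the same harmonized unit, so every chain of overlaps collapses into one aggregate whose interior boundaries are erased. The sketch below uses a union-find structure and assumes the overlapping pairs have already been derived from the boundary maps; the overlay itself is not shown. The unit codes and the `H1`, `H2`, ... harmonized codes are hypothetical. The result is a crosswalk from each (year, source code) to its harmonized code.

```python
def harmonize(units, overlaps):
    """Build least-common-denominator units from cross-year overlaps.

    units: iterable of (year, code) source units.
    overlaps: iterable of ((year_a, code_a), (year_b, code_b)) pairs sharing territory.
    Returns a crosswalk dict: (year, code) -> harmonized code.
    """
    parent = {u: u for u in units}

    def find(u):
        # Path halving keeps the union-find trees shallow.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number groups in sorted unit order so harmonized codes are reproducible.
    labels, crosswalk = {}, {}
    for u in sorted(parent):
        root = find(u)
        if root not in labels:
            labels[root] = f"H{len(labels) + 1}"
        crosswalk[u] = labels[root]
    return crosswalk

# Hypothetical example: unit B split in 2000, and a strip of C moved into B2.
units = [("1990", "A"), ("1990", "B"), ("1990", "C"),
         ("2000", "A"), ("2000", "B1"), ("2000", "B2"), ("2000", "C")]
overlaps = [
    (("1990", "A"), ("2000", "A")),
    (("1990", "B"), ("2000", "B1")),
    (("1990", "B"), ("2000", "B2")),
    (("1990", "C"), ("2000", "C")),
    (("1990", "C"), ("2000", "B2")),
]
crosswalk = harmonize(units, overlaps)
for unit, code in crosswalk.items():
    print(unit, "->", code)
```

In practice, small slivers caused by digitizing error would merge units spuriously. A real workflow would probably ignore overlaps below some area tolerance before building the groups.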
21
Regionalization
Confidentiality concerns require a minimum population of 20,000 in each disseminated unit
REDCAP tool:
Constructs regions by combining units
Regions meet the minimum population threshold
Contiguity constrained
Combines units that are similar in terms of a selected variable
Currently in testing phase
REDCAP algorithms and parameters; optimization variables (e.g., population density, education, occupation)
Testing on Malawi TAs and Brazil 2000 municipios
Guo’s NSF grant number is , in case they ask
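A much simpler greedy heuristic can illustrate the constraints listed for REDCAP: a minimum population per region, contiguity, and similarity on a selected variable. This sketch is not REDCAP's algorithm. At each step it merges the smallest under-threshold region into the adjacent region whose population-weighted mean of the chosen variable is closest. The unit IDs, populations, adjacency, and density values are hypothetical.

```python
def regionalize(pop, value, neighbors, min_pop=20_000):
    """Greedily merge contiguous units until every region meets min_pop.

    pop: unit -> population; value: unit -> optimization variable (e.g. density);
    neighbors: unit -> set of adjacent units (must be symmetric).
    Returns a dict region_id -> set of member units.
    """
    regions = {u: {u} for u in pop}
    r_pop = dict(pop)
    r_val = {u: float(value[u]) for u in pop}  # population-weighted mean per region
    r_nbrs = {u: set(neighbors[u]) for u in pop}

    while True:
        # Regions with no neighbors cannot grow and are left as they are.
        small = [r for r in regions if r_pop[r] < min_pop and r_nbrs[r]]
        if not small:
            break
        r = min(small, key=lambda x: (r_pop[x], x))
        # Merging only with an adjacent region keeps every region contiguous.
        target = min(r_nbrs[r], key=lambda n: (abs(r_val[n] - r_val[r]), n))
        total = r_pop[r] + r_pop[target]
        if total:
            r_val[target] = (r_val[target] * r_pop[target] + r_val[r] * r_pop[r]) / total
        r_pop[target] = total
        regions[target] |= regions.pop(r)
        # Rewire adjacency: r's former neighbors now border target.
        for n in r_nbrs.pop(r):
            r_nbrs[n].discard(r)
            if n != target:
                r_nbrs[n].add(target)
                r_nbrs[target].add(n)
        del r_pop[r], r_val[r]
    return regions

# Hypothetical units in a line: a - b - c - d - e
pop = {"a": 5_000, "b": 30_000, "c": 8_000, "d": 15_000, "e": 25_000}
density = {"a": 100, "b": 90, "c": 400, "d": 380, "e": 50}
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
regions = regionalize(pop, density, neighbors)
print(regions)
```

Here unit c joins d instead of b because their densities are closer, which shows how the choice of optimization variable changes the aggregation. A unit below the threshold with no neighbors, such as an island, remains unresolved and would need separate handling.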
22
Regionalization - Lilongwe, Malawi
Units with populations under 20K are combined with neighbors to meet the threshold
Specific aggregation depends on the optimization variable and the algorithm
Need to check this map on the projector... may be tough to see
23
Beyond Administrative Boundaries
Arbitrary boundaries; rasterization
24
Arbitrary Boundaries: watersheds, buffers around features, etc.
Near-term: summarize rasters to user-supplied boundaries; identify administrative units intersecting user-supplied boundaries
Future: reallocation based on a uniform distribution assumption; reallocation based on other assumptions
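Reallocation under a uniform distribution assumption is usually called areal weighting. A count for an administrative unit is split among user-supplied zones in proportion to the share of the unit's area that falls in each zone. The sketch below assumes the unit-by-zone intersection areas have already been computed by a GIS overlay, which is not shown. All unit IDs, zone names, areas, and counts are hypothetical. The `intersecting_units` helper covers the near-term step of identifying which administrative units touch a user-supplied boundary.

```python
from collections import defaultdict

def areal_weighting(counts, unit_area, intersections):
    """Reallocate unit counts to target zones, assuming uniform density within units.

    counts: unit -> count (e.g. population); unit_area: unit -> total area.
    intersections: list of (unit, zone, area) overlap pieces.
    Returns zone -> estimated count. The share of a unit lying outside every
    zone is not allocated, and each zone is estimated independently.
    """
    estimates = defaultdict(float)
    for unit, zone, area in intersections:
        estimates[zone] += counts[unit] * area / unit_area[unit]
    return dict(estimates)

def intersecting_units(zone, intersections):
    """Administrative units that overlap a given user-supplied zone."""
    return {unit for unit, z, area in intersections if z == zone and area > 0}

# Hypothetical overlay results.
counts = {"U1": 1000, "U2": 600}
unit_area = {"U1": 10.0, "U2": 20.0}
intersections = [("U1", "watershed", 4.0), ("U2", "watershed", 5.0), ("U2", "buffer", 10.0)]
estimates = areal_weighting(counts, unit_area, intersections)
print(estimates, intersecting_units("watershed", intersections))
```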
25
Rasterization Prototype: All cells in a unit get the same value
Use the lowest-level units available
Rates only, not counts
Future: distribute based on ancillary data, which requires research on available methods
May provide as a service, with users selecting ancillary data, weights, and spatial distribution parameters
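The prototype rule, in which every cell in a unit gets the unit's value, is simple to express. It also shows why the prototype is limited to rates. Copying a rate into every cell preserves its meaning, while copying a count would multiply the unit's total by its number of cells. The second function sketches the ancillary-data direction listed as future work. It splits a count across cells in proportion to user-chosen weights, a generic dasymetric approach and not a TerraPop method. The zone grid, rates, counts, and weights are hypothetical.

```python
def rasterize_rates(zone_grid, unit_rates, nodata=None):
    """Prototype rule: each cell takes the rate of the unit it falls in."""
    return [[unit_rates.get(z, nodata) for z in row] for row in zone_grid]

def distribute_counts(zone_grid, unit_counts, weights):
    """Split each unit's count across its cells in proportion to ancillary weights.

    weights: same-shape grid of non-negative ancillary values.
    If a unit's weights sum to zero, its cells get equal shares instead.
    """
    totals, ncells = {}, {}
    for zrow, wrow in zip(zone_grid, weights):
        for z, w in zip(zrow, wrow):
            totals[z] = totals.get(z, 0.0) + w
            ncells[z] = ncells.get(z, 0) + 1
    out = []
    for zrow, wrow in zip(zone_grid, weights):
        row = []
        for z, w in zip(zrow, wrow):
            count = unit_counts.get(z)
            if count is None:
                row.append(None)  # cell outside any unit with data
            elif totals[z] > 0:
                row.append(count * w / totals[z])
            else:
                row.append(count / ncells[z])
        out.append(row)
    return out

# Hypothetical inputs on a 2 x 3 grid.
zones = [["A", "A", "B"],
         ["A", "B", "B"]]
rate_grid = rasterize_rates(zones, {"A": 0.25, "B": 0.40})
count_grid = distribute_counts(zones, {"A": 300, "B": 90}, [[1, 2, 0], [3, 1, 2]])
print(rate_grid)
print(count_grid)
```

Because the weights are normalized within each unit, the cells of each unit still sum to the unit's original count.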
26
Acknowledgements
IPUMS-International participating countries: Brazil, Bulgaria, Czech Republic, Germany, Italy, Ireland, Mexico, Netherlands, Poland, Slovenia, Spain
IPUMS-International supporters & partners: Eurostat, Universitat Autònoma de Barcelona