Presentation is loading. Please wait.

Presentation is loading. Please wait.

CyberGIS: Reston, VA, September 22, 2018

Similar presentations


Presentation on theme: "CyberGIS: Reston, VA, September 22, 2018"— Presentation transcript:

1 CyberGIS: Reston, VA, September 22, 2018
TerraPopulus is a relatively new project at the Minnesota Population Center. The project is led by the MPC in collaboration with our partners at the University of Minnesota Libraries, the Institute on the Environment at the University of Minnesota, CIESIN at Columbia University, and ICPSR at the University of Michigan.

2 Mission Statement Enabling research, learning, and policy analysis by providing integrated spatiotemporal data describing people and their environment.

3 Overview Future Collaborators Big Heterogeneous Data
Location Integration Future Paragon Dynamic Tabulator Terra Explorer TerraPop API Collaborators

4 Big Heterogeneous Data

5 TerraPop Data Formats Microdata:
Characteristics of individuals and households Area-level data: Characteristics of places defined by boundaries Raster data: Values tied to spatial coordinates

6 Summarization Tabulation Join Contextual data Dasymetric Mapping
Zonal Statistics Spatial Reallocation Join Contextual data Area-Level Data Microdata Rasters Summarization Tabulation Dasymetric Mapping

7 Location-Based Integration
Microdata  Area-level  Raster

8 Location-Based Integration
Microdata Mix and match variables originating in any of the data structures Obtain output in the data structure most useful to you Integration across domains, formats hinges on geography Users get any type of data in format useful to them Requires boundary files, boundaries harmonized over time Rasters Area-level data

9 Location-Based Integration
Microdata Summarized environmental and population County ID G G G G G G G County ID Avg. Ann. Temp. Avg. Ann. Precip. Rent, Rural Rent, Urban Own, Rural Own, Urban G 21.2 768 3129 1063 637 365 G 23.4 589 2949 1075 1469 717 G 24.3 867 3418 1589 1108 617 G 21.5 943 1882 425 202 142 G 24.1 2416 572 426 197 G 24.4 697 2560 934 950 563 G 25.6 701 2126 653 321 215 County ID Mean Ann. Temp. Max. Ann. Precip. G 21.2 768 G 23.4 589 G 24.3 867 G 21.5 943 G 24.1 G 24.4 697 G 25.6 701 characteristics for administrative districts Integration across domains, formats hinges on geography Users get any type of data in format useful to them Requires boundary files, boundaries harmonized over time Rasters Area-level data

10 Swap this out for a Latin American country

11 Location-Based Integration
Microdata Individuals and households with their environmental and social context Integration across domains, formats hinges on geography Users get any type of data in format useful to them Requires boundary files, boundaries harmonized over time Rasters Area-level data

12 Location-Based Integration
Microdata Rasters of population and environment data Integration across domains, formats hinges on geography Users get any type of data in format useful to them Requires boundary files, boundaries harmonized over time Rasters Area-level data

13 Current Work Data Paragon Tabulation Geovisualization

14 Data Aggregate census data Gridded Population of the World
Historical data (48 countries) Variables in addition to population by sex (65 countries) Gridded Population of the World Environmental data CRU monthly time series – precipitation & temperature Vegetation characteristics – NDVI, greenness Elevation and derived characteristics Soils Species distribution (GBIF)

15 Raster Data MODIS Land Data Earth Science Climate Datasets
Yearly land cover data derived from the MODIS Terra and Aqua satellites, available for 2001 – 2013 5 land cover classifications, 240 Gigabytes Earth Science Aster 30 Meter DEM resolution Gigabytes TAUDEM derivatives: slope, solar radiance, wetness index will result in about 6-8 more Terabytes of data Climate Datasets NetCDF Format Climate Research Unit – 40 Gigabytes

16 Paragon

17 Joins in Distributed Databases
Create a temporary table TMP Reconstitute area on each node as TMP Join TMP with the two local partition of line line1 line2 area2 TMP TMP area1 … nodeN node1 node2

18 Spatial Join Paragon Query: select a.gid , b.gid from edges_merge_ca_shall as a, arealm_merge_ca_shall as b where st_crosses(a.geom, b.geom) ; PostgreSQL (standalone): 463 seconds Stado-Spatial (2 nodes): 96 seconds

19 Tabulation

20 Tabulator Generates area-level data from microdata using geographic level codes National, First Level (e.g. State), Second Level (e.g. County) Parquet on Apache Spark High Compression Ratio 8 Gigabytes gzip compressed 3 Gigabytes parquet compressed Columnar Storage (3,000+)

21 Query Performance 1 12 million 5 seconds 9 seconds 10 seconds 10
Number of datasets Number of records Time to aggregate by 1 column Time to aggregate by 2 columns Time to aggregate by 3 columns 1 12 million 5 seconds 9 seconds 10 seconds 10 25 million 7 seconds 18 seconds 20 82 million 12 seconds 27 seconds 56 128 million 7.5 seconds 20 seconds 30 seconds

22 Visualization

23 Landing Page

24 Terra Populus Software Stack
Geospatial Data Processing Web Application Geospatial Server

25


Download ppt "CyberGIS: Reston, VA, September 22, 2018"

Similar presentations


Ads by Google