CyberGIS: Reston, VA, September 22, 2018

Presentation transcript:

Terra Populus (TerraPop) is a relatively new project at the Minnesota Population Center (MPC). The project is led by the MPC in collaboration with our partners at the University of Minnesota Libraries, the Institute on the Environment at the University of Minnesota, CIESIN at Columbia University, and ICPSR at the University of Michigan.

Mission Statement
Enabling research, learning, and policy analysis by providing integrated spatiotemporal data describing people and their environment.

Overview
Big Heterogeneous Data
Location Integration
Future
Paragon
Dynamic Tabulator
Terra Explorer
TerraPop API
Collaborators

Big Heterogeneous Data

TerraPop Data Formats
Microdata: Characteristics of individuals and households
Area-level data: Characteristics of places defined by boundaries
Raster data: Values tied to spatial coordinates

[Diagram: transformations among Microdata, Area-Level Data, and Rasters. Arrow labels: Summarization, Tabulation, Join Contextual Data, Dasymetric Mapping, Zonal Statistics, Spatial Reallocation.]

Location-Based Integration
Microdata, Area-level, Raster

Location-Based Integration
Mix and match variables originating in any of the data structures
Obtain output in the data structure most useful to you
Integration across domains and formats hinges on geography
Users get any type of data in the format useful to them
Requires boundary files, with boundaries harmonized over time
[Diagram labels: Microdata, Rasters, Area-level data]

Location-Based Integration
Summarized environmental and population characteristics for administrative districts

County ID     Avg. Ann. Temp.  Avg. Ann. Precip.  Rent, Rural  Rent, Urban  Own, Rural  Own, Urban
G17003100001  21.2             768                3129         1063         637         365
G17003100002  23.4             589                2949         1075         1469        717
G17003100003  24.3             867                3418         1589         1108        617
G17003100004  21.5             943                1882         425          202         142
G17003100005  24.1                                2416         572          426         197
G17003100006  24.4             697                2560         934          950         563
G17003100007  25.6             701                2126         653          321         215

County ID     Mean Ann. Temp.  Max. Ann. Precip.
G17003100001  21.2             768
G17003100002  23.4             589
G17003100003  24.3             867
G17003100004  21.5             943
G17003100005  24.1
G17003100006  24.4             697
G17003100007  25.6             701

[Diagram labels: Microdata, Rasters, Area-level data]
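
The temperature and precipitation columns in the county table are the kind of output that zonal statistics produce: raster cell values are summarized within each administrative unit. A minimal sketch, assuming a tiny temperature grid and a matching grid of zone codes. The function name `zonal_mean`, the grids, and the zone codes are invented for illustration and are not taken from TerraPop.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Average raster cell values within each zone.

    values and zones are equally shaped 2-D lists; zones holds the
    code of the area that each cell falls in.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None:  # skip cells with no data
                continue
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical 3x4 temperature raster and the zone each cell belongs to.
temperature = [
    [20.0, 22.0, 24.0, 26.0],
    [20.0, 22.0, 24.0, None],
    [21.0, 23.0, 25.0, 25.0],
]
zones = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]
zone_means = zonal_mean(temperature, zones)
```

Each entry of `zone_means` would become one row of an area-level table keyed by the zone code.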

Swap this out for a Latin American country

Location-Based Integration
Individuals and households with their environmental and social context
[Diagram labels: Microdata, Rasters, Area-level data]
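
Attaching environmental and social context to individuals amounts to a join on a shared geography code. A minimal sketch, assuming each person record carries a district code; the field names, codes, and the helper `attach_context` are hypothetical and only illustrate the idea.

```python
def attach_context(microdata, area_table, key="geo_code"):
    """Return copies of microdata records with area-level variables appended."""
    joined = []
    for record in microdata:
        # Records whose code has no match keep only their own fields.
        context = area_table.get(record[key], {})
        joined.append({**record, **context})
    return joined


# Hypothetical person records and district-level context variables.
people = [
    {"person_id": 1, "geo_code": "D01", "tenure": "rent"},
    {"person_id": 2, "geo_code": "D02", "tenure": "own"},
    {"person_id": 3, "geo_code": "D01", "tenure": "own"},
]
district_context = {
    "D01": {"mean_temp": 21.2, "precip": 768},
    "D02": {"mean_temp": 23.4, "precip": 589},
}
people_with_context = attach_context(people, district_context)
```

The output stays in microdata form: one row per person, now with the district's variables alongside the person's own.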

Location-Based Integration
Rasters of population and environment data
[Diagram labels: Microdata, Rasters, Area-level data]
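
Producing a raster from area-level counts means spreading each district's total over its cells. One common approach is dasymetric mapping, where an ancillary weight raster (for example, a land cover suitability score) decides how much of the total each cell receives. The sketch below assumes that approach; the function `allocate_to_cells`, the grids, and the totals are invented, and nothing here is confirmed as TerraPop's method.

```python
def allocate_to_cells(zone_totals, zones, weights):
    """Spread each zone's total over its cells in proportion to weights."""
    # Sum of weights per zone, used to normalise each cell's share.
    weight_sums = {}
    for zone_row, weight_row in zip(zones, weights):
        for zone, weight in zip(zone_row, weight_row):
            weight_sums[zone] = weight_sums.get(zone, 0.0) + weight

    surface = []
    for zone_row, weight_row in zip(zones, weights):
        out_row = []
        for zone, weight in zip(zone_row, weight_row):
            total = zone_totals.get(zone, 0.0)
            share = weight / weight_sums[zone] if weight_sums[zone] else 0.0
            out_row.append(total * share)
        surface.append(out_row)
    return surface


# Hypothetical zone grid, ancillary weights, and district populations.
zones = [["A", "A", "B"],
         ["A", "A", "B"]]
weights = [[1.0, 0.0, 2.0],
           [3.0, 0.0, 2.0]]
population = {"A": 400.0, "B": 100.0}
population_surface = allocate_to_cells(population, zones, weights)
```

Cells with zero weight receive no population, and the cell values within each zone still sum to that zone's original total.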

Current Work
Data
Paragon
Tabulation
Geovisualization

Data
Aggregate census data
  Historical data (48 countries)
  Variables in addition to population by sex (65 countries)
  Gridded Population of the World
Environmental data
  CRU monthly time series – precipitation & temperature
  Vegetation characteristics – NDVI, greenness
  Elevation and derived characteristics
  Soils
  Species distribution (GBIF)

Raster Data
MODIS Land Data
  Yearly land cover data derived from the MODIS Terra and Aqua satellites, available for 2001 – 2013
  5 land cover classifications, 240 gigabytes
Earth Science
  ASTER DEM at 30-meter resolution, 500 gigabytes
  TauDEM derivatives (slope, solar radiance, wetness index) will add about 6-8 terabytes of data
Climate Datasets
  NetCDF format
  Climate Research Unit, 40 gigabytes

Paragon

Joins in Distributed Databases
Create a temporary table TMP
Reconstitute area on each node as TMP
Join TMP with the local partition of line on each node
[Diagram: node1, node2, ... nodeN; line partitions line1, line2, ...; area partitions area1, area2, ...; a copy of TMP on each node.]
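
The steps on this slide can be sketched in plain Python by treating each node as a dictionary holding its local partitions. This is an illustration under stated assumptions, not Paragon or Stado code: the function `broadcast_join`, the row layout, and the equality predicate standing in for a spatial predicate are all hypothetical.

```python
def broadcast_join(nodes, predicate):
    """Join partitioned 'line' rows against every 'area' row.

    nodes is a list of dicts, each with local 'line' and 'area' partitions.
    """
    # Reconstitute the full area table from all partitions and place a
    # copy of it on every node as TMP.
    tmp = [row for node in nodes for row in node["area"]]
    for node in nodes:
        node["TMP"] = list(tmp)

    # Each node joins TMP with its own partition of line only.
    results = []
    for node in nodes:
        for line_row in node["line"]:
            for area_row in node["TMP"]:
                if predicate(line_row, area_row):
                    results.append((line_row["gid"], area_row["gid"]))
    return sorted(results)


# Two hypothetical nodes; 'cell' is a toy stand-in for geometry.
nodes = [
    {"line": [{"gid": 1, "cell": "x"}, {"gid": 2, "cell": "y"}],
     "area": [{"gid": 10, "cell": "y"}]},
    {"line": [{"gid": 3, "cell": "x"}],
     "area": [{"gid": 20, "cell": "x"}]},
]
pairs = broadcast_join(nodes, lambda l, a: l["cell"] == a["cell"])
```

Because every node sees the whole area table, matches that span partitions (line 1 on the first node, area 20 on the second) are still found, while the line table itself is never moved.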

Spatial Join in Paragon
Query:
    select a.gid, b.gid
    from edges_merge_ca_shall as a, arealm_merge_ca_shall as b
    where st_crosses(a.geom, b.geom);
PostgreSQL (standalone): 463 seconds
Stado-Spatial (2 nodes): 96 seconds
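
As worked arithmetic on the two timings reported here, the speedup is the standalone time divided by the distributed time. The variable names are illustrative; only the two timings and the node count come from the slide.

```python
standalone_seconds = 463   # PostgreSQL (standalone), from the slide
distributed_seconds = 96   # Stado-Spatial (2 nodes), from the slide
node_count = 2

speedup = standalone_seconds / distributed_seconds
speedup_per_node = speedup / node_count
print(f"speedup: {speedup:.2f}x, per node: {speedup_per_node:.2f}x")
```

That is a speedup of roughly 4.8 times on two nodes.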

Tabulation

Tabulator
Generates area-level data from microdata using geographic level codes
  National, First Level (e.g. State), Second Level (e.g. County)
Parquet on Apache Spark
  High compression ratio
    8 gigabytes gzip compressed
    3 gigabytes Parquet compressed
  Columnar storage (3,000+)
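
Tabulation by geographic level code can be pictured as counting microdata records per geographic unit and variable value. The sketch below does this in memory with the standard library, whereas the slide describes Parquet files on Apache Spark. The record fields, place names, and the function `tabulate` are hypothetical.

```python
from collections import Counter


def tabulate(records, level_field, variable):
    """Count microdata records per (geographic unit, variable value)."""
    return Counter((rec[level_field], rec[variable]) for rec in records)


# Hypothetical household records carrying codes for three geographic levels.
households = [
    {"national": "US", "first": "MN", "second": "Hennepin", "tenure": "rent"},
    {"national": "US", "first": "MN", "second": "Hennepin", "tenure": "own"},
    {"national": "US", "first": "MN", "second": "Ramsey", "tenure": "rent"},
    {"national": "US", "first": "WI", "second": "Dane", "tenure": "rent"},
]
by_state = tabulate(households, "first", "tenure")
by_county = tabulate(households, "second", "tenure")
```

Switching `level_field` from the first-level to the second-level code changes the geographic resolution of the resulting area-level table without touching the records themselves.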

Query Performance

Number of datasets  Number of records  Time to aggregate by 1 column  Time to aggregate by 2 columns  Time to aggregate by 3 columns
1                   12 million         5 seconds                      9 seconds                       10 seconds
10                  25 million         7 seconds                      18 seconds
20                  82 million         12 seconds                     27 seconds
56                  128 million        7.5 seconds                    20 seconds                      30 seconds

Visualization

Landing Page

Terra Populus Software Stack
Geospatial Data Processing
Web Application
Geospatial Server