Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler

Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler
Michael P. Finn
U.S. Department of the Interior, U.S. Geological Survey
High Performance Computing and Geospatial Analytics Workshop, Argonne National Laboratory, 29 – 30 Apr 2014

Collaborators
- Shaowen Wang, Anand Padmanabhan, Yan Liu: University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory
- David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel: USGS, Center of Excellence for Geospatial Information Science (CEGIS)
- Kristina H. Yamamoto: USGS, National Geospatial Technical Operations Center
- Babak Behzad: UIUC, Department of Computer Science
- Eric Shook: Kent State University, Department of Geography
- Qingfeng (Gene) Guan: China University of Geosciences

Where Do We Want to Go?
- Geospatial Analytics
  - Spatial Modeling
  - Geovisualization (GeoViz / Visual Analytics)
- For Decision Makers (agencies / citizens)
  - Protect natural resources
  - Empower cultures
  - Provide for our future

Geospatial Analytics Spatial Modeling/ Geovisualization

Data / Software
Geospatial Methods, Technologies, and Applications
GIScience and Cyberinfrastructure
Geospatial Toolkits
Geospatial Analytics (Spatial Modeling / GeoViz)
So:
- Where have we been?
- Where are we now?
- Where do we want to go?

Data
- Analog → Digital
- "Big" Data
- Spatial Data (geometric structure)
- Data: Open? Mostly
  - Findable, Accessible, Exploitable (standard format)
- Example: USGS data holdings
  - 8 layers of The National Map
  - Soon: hyperspectral cubes and LiDAR point clouds

The National Map - Elevation: Quality Levels
Columns: Quality Level; Horizontal Point Spacing (meters); Vertical Accuracy (centimeters); Description
- High accuracy and resolution lidar (example: lidar data collected in the Pacific Northwest)
- Medium-high accuracy and resolution lidar
- Quality level 3; point spacing 1-2 m; vertical accuracy <18.5 cm: Medium accuracy and resolution lidar, analogous to USGS specification v. 13 and most data collected to date
- Early or lower quality lidar, and photogrammetric elevations produced from aerotriangulated NAIP imagery
- Lower accuracy and resolution, primarily from IfSAR

Big Spatial Data
- High-resolution geographic data covering large areas create big spatial data
- Remotely sensed images
  - One-meter resolution NAIP images for Dent County, Missouri (1,955 km²) require 800 GB of storage (more than 4 PB equivalent for the U.S.)
  - The Atlanta footprint of 0.33 m resolution color images is almost 1 TB of data
  - Satellite images with finer than one-meter resolution
  - LiDAR data of level 1 (8 points per square meter) and level 2 (2 points per square meter)
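The county-to-nation figure on this slide can be checked with simple area scaling. The sketch below is only a back-of-the-envelope check: the total U.S. area of about 9.83 million km² is an assumption that does not appear on the slide, decimal units (1 PB = 10^6 GB) are assumed, and the function name is illustrative.

```python
# Back-of-the-envelope check of the NAIP storage estimate.
COUNTY_AREA_KM2 = 1_955        # Dent County, Missouri (from the slide)
COUNTY_STORAGE_GB = 800        # 1 m NAIP imagery for the county (from the slide)
US_AREA_KM2 = 9_830_000        # ASSUMPTION: approximate total U.S. area, not from the slide


def scale_storage_gb(storage_gb, area_km2, target_area_km2):
    """Scale a storage figure linearly with ground area at a fixed resolution."""
    return storage_gb * target_area_km2 / area_km2


us_storage_gb = scale_storage_gb(COUNTY_STORAGE_GB, COUNTY_AREA_KM2, US_AREA_KM2)
us_storage_pb = us_storage_gb / 1_000_000   # decimal units: 1 PB = 10**6 GB
print(f"Estimated U.S. storage: {us_storage_pb:.2f} PB")
```

Under these assumptions the estimate lands just above 4 PB, which agrees with the slide's "more than 4" figure.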

Big Spatial Data
- USGS 3DEP
  - Level 2 LiDAR for all of the U.S. except Alaska, which is acquiring level 5 IfSAR
  - Data volume for point cloud, intensity images, and bare-earth elevation model: 7 to 9 petabytes
  - Processing and file creation usually double to triple the storage requirements
- Other geospatial data
  - USGS National Hydrography Dataset, based on 1:24,000 scale: about 700 GB (equivalent resolution 12 m; accuracy 25 m RMSE)
- New project to extract hydrography from level 2 lidar
  - How big will the resulting vector dataset (< 1 m resolution) be?

Software
- Computer compiled / scripting languages
  - Manipulate data
- Software
  - Commercial? Open? Modifiable code? Functional?
- Tools: SAS (SPSS) / R / MATLAB, etc.
- GIS software: Esri ArcGIS / QGIS, and image processing software: Imagine / ENVI
  - Libraries: GDAL
- Example software: mapIMG (based on GCTP; open)

Geospatial Methods, Technologies, and Applications
- Analytical Cartography
  - Mathematical Cartography
  - Since roughly the 18th century
- Quantitative Geography
  - Since the 1960s
- GIS (and image processing software)
  - Since about the 1970s
  - Combining data and software → GIS packages
  - Legacy of primarily commercial software
- Open Source Software
  - Since roughly the 1980s
- OpenGIS?
  - Early, widespread but often spotty "open" GIS
  - Foundation for maturity, expansion, and further openness

Here we are / where are we going?
- Open GIS: technology and applications (exploitable)
- Hardware and operating systems evolving
- Data storage trying to keep pace with Big Data
- Advanced GeoViz on the cusp of exploding
- HPC → High-Performance Spatial Computing
- Increasing spatiotemporal fidelity
- Cyberinfrastructure

CyberGIS
- Cyberinfrastructure (eScience)
- HPC & GIScience
- A balance / interaction between theory and data (Rey, 2013)
- Collaborative research
- Standards (for interoperability)

NSF CyberGIS Project
- NSF Software Infrastructure for Sustained Innovation award
  - USGS / CEGIS participation
- Cyberinfrastructure resources
  - XSEDE
  - Blue Waters supercomputer allocation
  - Open Science Grid
- Integration
  - CyberGIS Toolkit
  - CyberGIS Gateway
  - GISolve middleware services

CyberGIS Software Environment From Liu et al. (2014)

CyberGIS Toolkit
- Software components
  - PABM: Parallel Agent-Based Modeling
  - pRasterBlaster: Parallel Map Reprojection
  - Parallel PySAL (Python Spatial Analysis Library)
  - Spatial Text
- An open and reliable software toolbox for high-end users
- Hides compute complexity
- A rigorous software building, testing, packaging, and deployment framework
- Focused on computational intensity, performance, scalability, and portability in various CI environments
- Easy to configure and use

Scalable Raster Processing
- Need for scalable map reprojection in CyberGIS analytics
  - Spatial analysis and modeling: distance calculation on raster cells requires an appropriate projection
  - Visualization: reprojection for faster visualization on Web Mercator base maps
- pRasterBlaster integration in CyberGIS Toolkit and Gateway
  - Software componentization: librasterblaster, pRasterBlaster, MapIMG
  - Build, test, and documentation
  - Gateway user interface
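To make the Web Mercator case concrete, the forward projection for a single coordinate can be sketched as follows. This is the standard spherical Mercator formula and is not taken from pRasterBlaster or librasterblaster; the function name and the rounded latitude cutoff are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_378_137.0   # WGS 84 semi-major axis, used as the sphere radius
MAX_LATITUDE_DEG = 85.0511     # ASSUMPTION: rounded conventional Web Mercator cutoff


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to Web Mercator metres.

    Returns None for latitudes outside the projection's usable range,
    since the y coordinate grows without bound toward the poles.
    """
    if abs(lat_deg) > MAX_LATITUDE_DEG:
        return None
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


print(lonlat_to_web_mercator(-87.98, 41.71))   # a point near Argonne, Illinois
print(lonlat_to_web_mercator(0.0, 89.0))       # outside the usable range
```

A raster reprojection applies a transformation like this (or its inverse) to every cell, so the cost grows with the cell count. Mercator also stretches distances increasingly toward the poles, which is one reason distance calculations need a projection chosen for the analysis rather than for display.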

Performance Profiling
- Performance profiling is an important tool for developing scalable and efficient high-performance applications
- Performance profiling identified computational bottlenecks in pRasterBlaster
- The next slides demonstrate one example of the value of profilers for pRasterBlaster
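The slides do not name the profiler used on pRasterBlaster. As a stand-in, the sketch below shows the general workflow with Python's standard cProfile and pstats modules: run the workload under the profiler, then rank functions by cumulative time to see where the time goes. The workload functions are hypothetical placeholders, not pRasterBlaster code.

```python
import cProfile
import io
import pstats


def slow_transform(n):
    """Hypothetical stand-in for an expensive per-cell coordinate transformation."""
    return sum(i * i for i in range(n))


def process_raster(rows):
    """Hypothetical stand-in for the main loop: one transform call per row."""
    return [slow_transform(2_000) for _ in range(rows)]


def profile_report(func, *args):
    """Run func under cProfile and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_report(process_raster, 200)
print(report)
```

In the report, a function whose cumulative time dominates the run is the first place to look for a bottleneck.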

A Computational Bottleneck: Symptom

A Computational Bottleneck: Cause

A Computational Bottleneck: Analysis
- Spatial data-dependent performance anomaly
  - The anomaly is data dependent
  - The four corners of the raster dataset were processed by processors whose indexes are close to the two ends
- Exception handling in C++ is costly
  - Coordinate transformation on nodata areas was handled as an exception
- Solution
  - Remove the C++ exception handling
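The change in control flow described here can be illustrated with a toy example. The sketch is in Python rather than C++, so it shows only the pattern (test the domain up front instead of raising and catching per cell), not the actual pRasterBlaster code or its C++ cost profile. The projection is a made-up function whose valid domain is the unit disc, and all names are hypothetical.

```python
import math
import time

NODATA = None   # marker for output cells that map to no valid input cell


def inverse_project_raising(x, y):
    """Hypothetical inverse projection that raises on out-of-domain input."""
    r2 = x * x + y * y
    if r2 > 1.0:
        raise ValueError("outside projection domain")
    return math.sqrt(1.0 - r2)


def transform_with_exceptions(cells):
    """Original pattern: every out-of-domain cell pays for a raise and a catch."""
    out = []
    for x, y in cells:
        try:
            out.append(inverse_project_raising(x, y))
        except ValueError:
            out.append(NODATA)
    return out


def transform_with_check(cells):
    """Revised pattern: test the domain first and mark the cell as nodata."""
    out = []
    for x, y in cells:
        r2 = x * x + y * y
        out.append(math.sqrt(1.0 - r2) if r2 <= 1.0 else NODATA)
    return out


# Corner-heavy input: every cell lies outside the valid domain.
corner_cells = [(1.0 + i * 1e-6, 1.0) for i in range(100_000)]

for fn in (transform_with_exceptions, transform_with_check):
    start = time.perf_counter()
    fn(corner_cells)
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
```

Both functions return the same results. The difference is that the cost of an invalid cell no longer depends on the exception machinery, which matters when invalid cells are concentrated on a few processors.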

A Computational Bottleneck: Performance Improvement

A Computational Bottleneck: Summary
- Symptom
  - Processors responsible for polar regions spent more time than those processing the equatorial region
- Cause
  - Corner cells were mapped to invalid input raster cells, generating exceptions
  - C++ exception handling was expensive
- Solution
  - Removed the C++ exception handling
  - Corner cells need not be processed; they now contribute less computation time

pRasterBlaster Component View
- Components: librasterblaster, pRasterBlaster, MapIMG
- Users: Cyberinfrastructure Service Providers, GIS Programmers, End Users
- Other diagram labels: via API, CyberToolkit

Performance Test:
- On an XSEDE supercomputer (Trestles at the San Diego Supercomputer Center)
- Using a parallel file system (Lustre) and MPI I/O (vs. a traditional Network File System (NFS))
- 40 GB of data
- Processor cores were increased from 256 to 1024

Obstacles, Issues, Challenges
- Parallel I/O (particularly raster) is the proverbial long pole in the tent
- Raster decomposes nicely (embarrassingly parallel)
- File I/O (especially output file re-composition) is a huge bottleneck
- Lessons learned; one of our prime contributions to the community (to date): optimized parallel I/O for raster
  - GeoTIFF (SPTW, Simple Parallel TIFF Writer), led by David Mattli, USGS
  - HDF5 parallel work by Babak Behzad, UIUC
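The "decomposes nicely" point and the output re-composition problem can both be seen in a small sketch. It splits raster rows into contiguous blocks, one per process, and computes the byte range each block would occupy in a single shared output file. It assumes an uncompressed, row-major layout with a fixed-size header, which is a simplification and not a description of how SPTW or the HDF5 work lays out data; all names are illustrative.

```python
def partition_rows(n_rows, n_procs):
    """Split n_rows into n_procs contiguous blocks whose sizes differ by at most one.

    Returns a list of (first_row, row_count) tuples, one per process rank.
    """
    base, extra = divmod(n_rows, n_procs)
    blocks = []
    first = 0
    for rank in range(n_procs):
        count = base + (1 if rank < extra else 0)
        blocks.append((first, count))
        first += count
    return blocks


def block_byte_ranges(blocks, n_cols, bytes_per_cell, header_bytes=0):
    """Byte offset and length of each block in an uncompressed, row-major file."""
    row_bytes = n_cols * bytes_per_cell
    return [(header_bytes + first * row_bytes, count * row_bytes)
            for first, count in blocks]


blocks = partition_rows(n_rows=10, n_procs=4)
print(blocks)   # [(0, 3), (3, 3), (6, 2), (8, 2)]
print(block_byte_ranges(blocks, n_cols=100, bytes_per_cell=4))
# [(0, 1200), (1200, 1200), (2400, 800), (3200, 800)]
```

Because the byte ranges are disjoint, each process can in principle write its block directly at its own offset (for example through MPI I/O on a parallel file system) instead of sending results to one process that reassembles the file.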

Computational Challenges
- Converting legacy (linear) code to an HPC (parallel) environment requires a lot of skilled manpower
- Scaling to large-scale analysis using HPC resources is difficult
- Cyberinfrastructure-based computational analysis needs in-depth knowledge and expertise in computational performance profiling and analysis

Geospatial Analytics Spatial Modeling/ Geovisualization
- Solving "Changing World" problems
- Smart decisions
- Protecting natural resources
- Democratizing science
- Empowering cultures
- Products and services for society and its citizens
- Data & Software → Solving (Geospatial) Problems

Geospatial Analytics Spatial Modeling/ Geovisualization

References
- Behzad, Babak, Yan Liu, Eric Shook, Michael P. Finn, David M. Mattli, and Shaowen Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH.
- Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2014). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag.
- Finn, Michael P., Yan Liu, David M. Mattli, Qingfeng (Gene) Guan, Kristina H. Yamamoto, Eric Shook, and Babak Behzad (2012). pRasterBlaster: High-Performance Small-Scale Raster Map Projection Transformation Using the Extreme Science and Engineering Discovery Environment. Abstract presented at the XXII International Society for Photogrammetry & Remote Sensing Congress, Melbourne, Australia.
- Liu, Yan, Michael P. Finn, Babak Behzad, and Eric Shook (2013). High-Resolution National Elevation Dataset: Opportunities and Challenges for High-Performance Spatial Analytics. Abstract presented in the Special Session on "Big Data," American Society for Photogrammetry and Remote Sensing Annual Conference, Baltimore, Maryland.
- Liu, Yan, Anand Padmanabhan, and Shaowen Wang (2014). CyberGIS Gateway for enabling data-rich geospatial research and education. Concurrency Computat.: Pract. Exper., DOI: /cpe
- Rey, S.J. (2014). "Open regional science." Presidential Address, Western Regional Science Association, San Diego. February

Questions?