High-Resolution National Elevation Dataset: CyberGIS Challenges and Opportunities for Scalable Spatial Data Access and Analytics

Presentation transcript:

High-Resolution National Elevation Dataset: CyberGIS Challenges and Opportunities for Scalable Spatial Data Access and Analytics
Yan Liu (1,3,5), Babak Behzad (1,2), Anand Padmanabhan (1,3,5), Eric Shook (1,3), Shaowen Wang (1,2,3,4,5), and Yanli Zhao (1,3)
(1) CyberInfrastructure and Geospatial Information Laboratory (CIGI); (2) Department of Computer Science; (3) Department of Geography and Geographic Information Science; (4) Department of Urban and Regional Planning; (5) National Center for Supercomputing Applications (NCSA); University of Illinois at Urbana-Champaign
Michael P. Finn and E. Lynn Usery, U.S. Geological Survey, U.S. Department of the Interior

Outline: Introduction. NED data access: interfaces and performance issues. Computational challenges: data-intensive spatial analysis. Experience and solutions: CyberGIS; scalable spatial data access and analytics. Concluding discussions.
Overall flow of the presentation: Introduce NED and its broad usage (2 examples: the CyberGIS analytical environment and the "Great Flood" movie). NED data access differs from simple file sharing and downloading, which calls for highly usable download client tools and a different programming pattern for integrating the downloading step into application logic. Once the data are downloaded, using big data in spatial analysis can be computationally prohibitive in memory, I/O, and CPU time. High-performance spatial analysis can reduce CPU time, after which the bottleneck can be I/O: 1) there might be too many intermediate input/output steps during an analysis; 2) a single I/O step on big data can slow down the whole analysis. Solutions: 1) reduce intermediate I/O steps by integrating geospatial data processing libraries with analysis methods; 2) use parallel computing to reduce the time of a single I/O step.

National Elevation Dataset (NED): Digital elevation models (DEM); a product of the USGS National Map. Resolutions: 3-meter, 10-meter, 30-meter. Formats: ArcGrid, GridFloat, IMG. Organized as 1 degree x 1 degree tiles. Sizes (U.S. continent), 10-meter: 936 tiles; 440GB raw files; 1TB with pyramid tiles. http://nationalmap.gov/elevation.html
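The 1 degree x 1 degree tiling above means that a study area maps to a small, computable set of tiles. The following is a minimal sketch of that lookup, not the USGS indexing scheme: the function name `tiles_for_bbox` and the choice to label each tile by the integer latitude/longitude of its south-west corner are assumptions made for illustration. It also works out the average raw tile size implied by the slide's figures (440GB over 936 tiles).

```python
import math


def tiles_for_bbox(west, south, east, north):
    """Return the 1x1 degree tiles that intersect a bounding box.

    Coordinates are in decimal degrees. Each tile is labelled by the
    integer (lat, lon) of its south-west corner, which is an illustrative
    convention and not necessarily the one USGS uses for file names.
    The box is assumed to have non-zero width and height.
    """
    tiles = []
    for lat in range(math.floor(south), math.ceil(north)):
        for lon in range(math.floor(west), math.ceil(east)):
            tiles.append((lat, lon))
    return tiles


# Average raw size per 10-meter tile, from the figures on the slide.
AVERAGE_TILE_GB = 440 / 936  # roughly 0.47 GB per tile

if __name__ == "__main__":
    # Hypothetical bounding box, chosen only to exercise the function.
    tiles = tiles_for_bbox(west=-90.5, south=38.2, east=-88.1, north=40.0)
    print(len(tiles), "tiles, about", round(len(tiles) * AVERAGE_TILE_GB, 1), "GB raw")
```

Because the tile list is known up front, a batch downloader can queue every request before any data arrive.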

NED Access Challenges. Data integration and processing: Data are stored on multiple file/database servers. Data processing is needed to extract subsets of data from the data collection. Downloading becomes complex, involving processing operations such as location, extraction, aggregation, archiving, and transfer among data servers. Computationally intensive. User interface: Usability is crucial to make big data usable. Programmable interface for automatic downloading.

CyberGIS Analytics Based on NED. CyberGIS: high-performance and collaborative GIS based on cyberinfrastructure (http://cybergis.org). Viewshed analysis: http://sandbox.cigi.illinois.edu. Web Mapping Service for online visualization: NED WMS layer built using GeoServer; pre-generated pyramid tiles for 20-level zooming. CyberGIS Gateway.

The Great Flood Project. A 75-minute multimedia work of original music and film inspired by the 1927 Mississippi River floods. http://www.ncsa.illinois.edu/News/Stories/ELLNORAflood/ Contributors include: Bill Frisell, Grammy Award-winning guitarist and composer; Bill Morrison, Obie-winning experimental filmmaker; Illinois Emerging Digital Research and Education in Arts Media Institute (eDream); Advanced Visualization Laboratory (AVL) at the National Center for Supercomputing Applications (NCSA); CyberInfrastructure and Geospatial Information Laboratory (CIGI), University of Illinois at Urbana-Champaign. Used NED: approximately 70GB of 10-meter NED tiles covering the Mississippi river valley were used for creating the 3D landscape animation.

Open YouTube URL http://www.youtube.com/watch?v=Lgy7mDJ_fVI Relevant parts: 0:00 – 0:24, historical maps; 0:25 – 1:16, 3D digital map animation based on 1/3 arc sec NED

NED Data Access

NED Download: User Interface. Download tool web interface: http://cumulus.cr.usgs.gov/webappcontent/neddownloadtool/NEDDownloadToolDMS.html New interface, National Map Viewer: http://viewer.nationalmap.gov/viewer/ This slide and the next one show the inconvenience of the NED downloading tools. Reason: these tools were developed primarily around how the data were produced and hosted, and less around how users need to work with the data.

NED Downloading Process
1. Queue a request
2. Launch data extractor (click each URL)
3. Extract data
4. Archive data files (file list)
5. Notify data readiness
6. User download
Please repeat 936 times to get all 1 degree x 1 degree tiles for the U.S. continent!

NED Downloading Web Service Interface: Start download; Check status; Download; Cleanup. This slide illustrates the trend in programming big data downloads: a single request-response round trip is no longer practical, because it takes too long and blocks the main program. Instead, an asynchronous programming model is used to overlap downloading with processing, and the two must be synchronized.
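The four operations above (start download, check status, download, cleanup) can be sketched as a small asynchronous client. This is an illustration only: `FakeNedService`, its method names, the "pending"/"ready" status strings, and the polling parameters are all hypothetical stand-ins that run in-process; they are not the USGS web service API or the NED Downloader code. The point is the pattern: each request is polled in a worker thread, and the main program processes each tile as soon as it is ready, so downloading overlaps with processing.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


class FakeNedService:
    """In-process stand-in for a download web service (hypothetical)."""

    def __init__(self, polls_until_ready=2):
        self._polls_until_ready = polls_until_ready
        self._polls = {}
        self.cleaned = []

    def start_download(self, tile_id):
        self._polls[tile_id] = 0
        return tile_id  # the handle used by later calls

    def check_status(self, handle):
        self._polls[handle] += 1
        ready = self._polls[handle] >= self._polls_until_ready
        return "ready" if ready else "pending"

    def download(self, handle):
        return f"data-for-{handle}".encode()

    def cleanup(self, handle):
        self.cleaned.append(handle)


def fetch_tile(service, tile_id, poll_interval=0.01, max_polls=100):
    """Start a request, poll until it is ready, download, then clean up."""
    handle = service.start_download(tile_id)
    try:
        for _ in range(max_polls):
            if service.check_status(handle) == "ready":
                return service.download(handle)
            time.sleep(poll_interval)
        raise TimeoutError(f"tile {tile_id} was not ready in time")
    finally:
        service.cleanup(handle)  # always release server-side resources


def process(data):
    """Placeholder for the analysis step; here it only measures the payload."""
    return len(data)


def run(tile_ids, service, workers=4):
    """Overlap downloading and processing: handle tiles as they complete."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_tile, service, t): t for t in tile_ids}
        for future in as_completed(futures):  # synchronization point
            results[futures[future]] = process(future.result())
    return results
```

A thread pool is used here because the work is dominated by waiting on the network; an event loop would serve the same purpose.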

NED Downloader. Goal: Provide an easy-to-use NED downloading utility that supports batch downloads and manages download status transitions automatically. Software: Linux-based; Bash + PHP; open source (MIT license); hosted on CyberGIS SVN, http://svn.cybergis.org/pub/ned-downloader/ Status: Used by the National Science Foundation CyberGIS project team for NED data integration and the Great Flood project. Facts: We used this downloader to keep a copy of the 1/3 arcsec NED dataset, converted it to GeoTIFF format, created pyramid tiles for 20-level zooming, and published it as WMS. We also used this downloader to download 70GB of 1/3 arcsec NED dataset files for the Mississippi river valley area. They were used for making the "Great Flood" movie by the Advanced Visualization Laboratory at NCSA.

Computational Challenges in Related CyberGIS Analytics

Why CyberGIS? Most commonly used GIS software is based on sequential computing. Not scalable for big data analytics. Many runtime input/output (I/O) steps in an analysis workflow. Transfer of big data to and from cyberinfrastructure resources.

Viewshed Analysis. Input DEM: HTTP downloading; data processing using GDAL commands. High-performance viewshed computation: exploiting Graphics Processing Units (GPU). Output transfer: GridFTP, a parallel file transfer protocol. Computational bottlenecks: The test viewshed analysis (see figure) handled 3.9GB of raster data in total: 1.8GB input NED; 436MB output; 1.67GB runtime output. Execution time: 4 minutes 55 seconds. Input data transfer: 21 seconds; input data processing: 114 seconds; computing: 65 seconds; output data processing: 88 seconds; output transfer: 7 seconds. Input/output data processing took 68.4% of analysis time.
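The timing breakdown above can be checked with a few lines of arithmetic. The sketch below uses only the stage timings reported on the slide; the variable names are illustrative. The five stages sum to 295 seconds (4 minutes 55 seconds), and input plus output data processing accounts for 202 of them, a share of about 0.6847, consistent with the slide's 68.4% if that figure was truncated rather than rounded.

```python
# Stage timings in seconds, as reported for the test viewshed analysis.
stage_seconds = {
    "input transfer": 21,
    "input processing": 114,
    "computing": 65,
    "output processing": 88,
    "output transfer": 7,
}

total = sum(stage_seconds.values())
minutes, seconds = divmod(total, 60)

# Share of the run spent converting and reshaping data rather than computing.
io_processing = stage_seconds["input processing"] + stage_seconds["output processing"]
io_share = io_processing / total

if __name__ == "__main__":
    for name, secs in stage_seconds.items():
        print(f"{name:18s} {secs:4d} s  {secs / total:6.1%}")
    print(f"total: {minutes} min {seconds} s; I/O data processing share: {io_share:.4f}")
```

The GPU computation itself is under a quarter of the run, which is why the following slides focus on I/O rather than on faster kernels.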

Resolving Computational Bottlenecks. Reduce the number of runtime I/O steps. Employ high-performance I/O techniques. [Figure: analysis pipeline from input data storage and input files through transfer, input processing, analysis on CPU/GPU, and output processing to output files and output data storage.]

Experience and Solutions

CyberGIS Approach Tightly couple geospatial data processing libraries to eliminate unnecessary I/O operations Exploit parallel I/O for geospatial data processing Integrate high-performance data transfer capability in CyberGIS analytics

Integrated CyberGIS Architecture. CyberGIS software environment: applications; scalable analytical libraries; scalable data libraries; spatial middleware; dependent libraries for geospatial and parallel computing (GRASS, NetCDF, OpenMP, CUDA, GDAL, HDF5, MPI). CyberGIS computational resources: parallel file systems; processors; memory; network.

Highlights. Analytical libraries: pRasterBlaster (a high-performance map reprojection library under joint development by CEGIS and CIGI). Data libraries: Parallel Geospatial I/O library (pGIO) with NetCDF/HDF5 support is to be released soon; GDAL+MPI IO for parallel I/O of GeoTIFF format is under development. Spatial middleware: GridFTP transfer between CyberGIS data source sites and XSEDE sites; CEGIS <-> supercomputer centers (NCSA, SDSC, TACC). CyberGIS computational resources: CEGIS high-performance computers; CIGI cloud infrastructure; key national cyberinfrastructure environments, namely NSF XSEDE (http://xsede.org) and Open Science Grid (http://opensciencegrid.org).

Parallel I/O Strategies: row-wise I/O; column-wise I/O; block-wise I/O. [Figure: for each strategy, processes P0, P1, P2, ..., Pn each access their own portion of the raster on the storage device.]
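The three strategies above differ only in how the raster is divided among processes P0 to Pn. The sketch below computes those divisions as (row_start, row_stop, col_start, col_stop) windows with half-open ranges. It is a plain-Python illustration of the decomposition under the assumption of contiguous, near-equal chunks; it is not the pGIO or pRasterBlaster implementation, and the function names are hypothetical. In a real parallel program each process would read or write only its own window, for example through MPI-IO.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks.

    Chunk sizes differ by at most one; the first `n % parts` chunks
    receive the extra element.
    """
    base, extra = divmod(n, parts)
    chunks = []
    start = 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def row_wise(rows, cols, nprocs):
    """Each process gets a band of full-width rows."""
    return [(r0, r1, 0, cols) for r0, r1 in split_range(rows, nprocs)]


def column_wise(rows, cols, nprocs):
    """Each process gets a band of full-height columns."""
    return [(0, rows, c0, c1) for c0, c1 in split_range(cols, nprocs)]


def block_wise(rows, cols, proc_rows, proc_cols):
    """Processes form a proc_rows x proc_cols grid of rectangular blocks."""
    return [
        (r0, r1, c0, c1)
        for r0, r1 in split_range(rows, proc_rows)
        for c0, c1 in split_range(cols, proc_cols)
    ]


if __name__ == "__main__":
    # Window owned by each of 3 processes for a 10 x 8 raster, row-wise.
    for rank, window in enumerate(row_wise(10, 8, 3)):
        print(f"P{rank}: rows {window[0]}-{window[1]}, cols {window[2]}-{window[3]}")
```

Which layout performs best depends on how the file format stores pixels on disk: row-wise windows match row-major storage, while block-wise windows suit tiled formats.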

High-Performance Data Transfer. [Figure: network map with CEGIS marked.] White lines: high-speed network connections among supercomputer centers. Blue lines: parallel data transfer connections between CEGIS and accessible supercomputer centers. Background image source: https://www.xsede.org/documents/10157/169907/xsedenet.pdf

Data Transfer Service between USGS and XSEDE. Technology: GridFTP, a secure and high-performance data transfer protocol. Data transfer service setup: USGS GridFTP server, usgs-ybother.srv.mst.edu; Globus Toolkit 5. Data transfer capability: parallel data channels for large dataset transfer; data transfer is initiated in the CyberGIS Gateway as a third-party transfer; transfer rate of up to 100MB/second.

Concluding Discussions. The usability of NED can be significantly improved if the data access interface is made more user-friendly. Big data require cyberinfrastructure and significant computational power for scalable data access and analytics. CyberGIS has emerged as a new-generation GIS for resolving these challenges and represents significant opportunities for the National Map communities.


DISCLAIMER & ACKNOWLEDGEMENT. DISCLAIMER: Any use of trade, product, or firm names in this paper is for descriptive purposes only and does not imply endorsement by the U.S. Government. ACKNOWLEDGEMENT: This work is supported in part by the National Science Foundation (NSF) under Grant Numbers BCS-0846655 and OCI-1047916. Computational experiments used the NSF Extreme Science and Engineering Discovery Environment (XSEDE) (Award Number SES090019), which is supported by NSF under Grant Number OCI-1053575.

Comments / Questions? Contact: usery@usgs.gov or shaowen@illinois.edu. University of Illinois at Urbana-Champaign: CyberInfrastructure and Geospatial Information Laboratory; Department of Computer Science; Department of Geography and Geographic Information Science; Department of Urban and Regional Planning; National Center for Supercomputing Applications. U.S. Department of the Interior, U.S. Geological Survey.