Accelerating TauDEM as a Scalable Hydrological Terrain Analysis Service on XSEDE. Ye Fan, Yan Liu, Shaowen Wang, David Tarboton, Ahmet Yildirim.

Presentation transcript:

Accelerating TauDEM as a Scalable Hydrological Terrain Analysis Service on XSEDE
  Ye Fan (1), Yan Liu (1), Shaowen Wang (1), David Tarboton (2), Ahmet Yildirim (2), Nancy Wilkins-Diehr (3)
  (1) University of Illinois at Urbana-Champaign
  (2) Utah State University
  (3) San Diego Supercomputer Center
  XSEDE'14, Atlanta, GA, July 15, 2014

Outline
  Introduction
    o TauDEM software
    o Parallelism
    o ECSS work plan
  Computational Intensity Analysis and Performance Improvement
    o Strategies
    o Findings & results
  TauDEM Gateway Application
    o Data integration
    o Workflow construction
    o XSEDE-enabled execution

Scalable DEM-based Hydrological Information Analysis
  Digital Elevation Models (DEM)
    o Geospatial topographic data
    o Raster and vector representation
  DEM-based Hydrological Information Analysis
    o Use of topographic information in hydrological analysis and modeling
    o Examples
        Derivation of flow directions, contributing area, stream network…
  Impact of High Resolution DEM Data
    o High resolution DEM data sources
        National Elevation Dataset (NED) from the U.S. Geological Survey (USGS)
          o 10-meter resolution: 330GB raw data
          o 1-meter resolution: 4-5 PB raw data
        OpenTopography Lidar-derived DEM data
    o Improved accuracy and reliability of analysis and modeling results
    o Revealing insights that were not possible to obtain before

Example: USGS NED

TauDEM
  TauDEM - A Parallel Computing Solution to DEM-based Terrain Analysis
    o Open source software
    o A suite of DEM tools for the extraction and analysis of hydrologic information from topographic data
    o A growing user community
  Parallel Computing in TauDEM
    o Parallel programming model: Message Passing Interface (MPI)
    o Spatial data decomposition
        Each process reads a sub-region for processing
        MPI communication for exchanging runtime hydrological information
        Each process writes a sub-region defined by output data decomposition
    o Parallel input/output (IO)
        In-house GeoTIFF library (no support for big GeoTIFF)
        MPI IO for DEM read and write
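The spatial data decomposition described above can be sketched as a split of the DEM rows into contiguous stripes, one per MPI process. This is a minimal single-process illustration, not TauDEM's code: the function names `stripe_bounds` and `halo_rows` are hypothetical, the way remainder rows are distributed is an assumption, and the one-row border exchange is only one plausible form of the "MPI communication for exchanging runtime hydrological information".

```python
def stripe_bounds(total_rows: int, num_procs: int) -> list[tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each rank.

    Assumption: remainder rows go to the lowest ranks, one each.
    """
    base, extra = divmod(total_rows, num_procs)
    bounds = []
    start = 0
    for rank in range(num_procs):
        rows = base + (1 if rank < extra else 0)
        bounds.append((start, start + rows))
        start += rows
    return bounds


def halo_rows(bounds, rank):
    """Rows owned by neighbouring ranks that this rank would need to receive.

    Returns (row_above, row_below); None where the stripe touches the
    edge of the domain.
    """
    start, end = bounds[rank]
    above = start - 1 if rank > 0 else None
    below = end if rank < len(bounds) - 1 else None
    return above, below


bounds = stripe_bounds(10, 3)
print(bounds)                # stripe owned by each of the 3 ranks
print(halo_rows(bounds, 1))  # border rows the middle rank needs
```

Each rank then reads and writes only its own stripe, which is what lets input and output proceed in parallel.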

TauDEM Channel Network and Watershed Delineation Software
  Stream and watershed delineation
  Multiple flow direction flow field
  Calculation of flow-based derivative surfaces

Multi-File Input Model
  Number of processes: mpiexec -n 5 pitremove... results in the domain being partitioned into 5 horizontal stripes.
  On input, the data coverage of the files (red rectangles) may be arbitrarily positioned and may overlap or not fill the domain completely.
  All files in the folder are taken to comprise the domain.
  The only limit is that no single file is larger than 4 GB.
  Maximum GeoTIFF file size: 4 GB = about x rows and columns

Multi-File Output Model
  Number of processes: mpiexec -n 5 pitremove... results in the domain being partitioned into 5 horizontal stripes.
  Multifile option -mf 3 2 results in each stripe being output as a tiling of 3 columns and 2 rows of files.
    3 columns of files per stripe
    2 rows of files per stripe
  Maximum GeoTIFF file size: 4 GB = about x rows and columns
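To make the output layout concrete, the sketch below enumerates the files that 5 processes and `-mf 3 2` would produce: 5 stripes, each tiled into 3 columns and 2 rows of files. The grid size (100 rows by 60 columns) and the function names are hypothetical, and the even split with remainders assigned to the first parts is an assumption rather than TauDEM's documented behaviour.

```python
def split(length: int, parts: int) -> list[int]:
    """Return parts + 1 edge positions that divide `length` as evenly as possible."""
    base, extra = divmod(length, parts)
    edges = [0]
    for i in range(parts):
        edges.append(edges[-1] + base + (1 if i < extra else 0))
    return edges


def output_tiles(total_rows, total_cols, num_procs, mf_cols, mf_rows):
    """List (rank, row0, row1, col0, col1) for every output file."""
    tiles = []
    stripe_edges = split(total_rows, num_procs)   # one horizontal stripe per rank
    col_edges = split(total_cols, mf_cols)        # columns of files per stripe
    for rank in range(num_procs):
        r0, r1 = stripe_edges[rank], stripe_edges[rank + 1]
        row_edges = split(r1 - r0, mf_rows)       # rows of files per stripe
        for i in range(mf_rows):
            for j in range(mf_cols):
                tiles.append((rank,
                              r0 + row_edges[i], r0 + row_edges[i + 1],
                              col_edges[j], col_edges[j + 1]))
    return tiles


tiles = output_tiles(total_rows=100, total_cols=60, num_procs=5, mf_cols=3, mf_rows=2)
print(len(tiles))  # 5 stripes x 3 columns x 2 rows of files
```

Splitting each stripe into several files is what keeps every individual GeoTIFF under the 4 GB limit as the domain grows.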

Computational Challenges
  Scalability issues
    o PitRemove step on 2GB DEM
        681 seconds on an 8-core PC
        3,759 seconds on a 64-core cluster
        Not acceptable on XSEDE resources
  Computational challenges
    o Scaling to large-scale analysis using massive computing resources is difficult
    o Cyberinfrastructure-based computational analysis needs in-depth knowledge and expertise on computational performance profiling and analysis
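The two PitRemove timings can be turned into a rough scaling gap. The calculation below uses only the numbers on the slide; the "ideal" time assumes perfect linear scaling from 8 to 64 cores and ignores the hardware differences between the PC and the cluster, so it is an order-of-magnitude indicator only.

```python
pc_seconds, pc_cores = 681.0, 8
cluster_seconds, cluster_cores = 3759.0, 64

# How much slower the 64-core run was than the 8-core run.
slowdown = cluster_seconds / pc_seconds

# Time the 64-core run would take under perfect linear scaling (assumption).
ideal_cluster_seconds = pc_seconds * pc_cores / cluster_cores

# Ratio between the observed and the ideal 64-core time.
gap = cluster_seconds / ideal_cluster_seconds

print(f"slowdown: {slowdown:.2f}x")
print(f"ideal 64-core time: {ideal_cluster_seconds:.3f} s")
print(f"observed / ideal: {gap:.1f}x")
```

With eight times the cores the run took about 5.5 times longer, which points to a bottleneck outside the compute kernel itself.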

Computational Scaling Issues
  Results collected on local cluster with Network File System (NFS) interconnect
  YellowStone dataset (27814x19320)
    o Using more processors reduced compute time but increased overall execution time
  Chesapeake dataset (53248x53248)
    o Execution could not finish on D8FlowDir operation

CyberGIS-OT-TauDEM Collaboration
  [Diagram labels: TauDEM 5.0; TauDEM 5.x; Scalability Enhancement (XSEDE ECSS); CyberGIS; OpenTopography; Lidar-derived DEMs; OT TauDEM Services; CyberGIS-TauDEM App; DEMs; USGS NED; OT User DEMs; TauDEM-enabled Research]

ECSS Goals
  Enhance TauDEM for large-scale terrain analysis on massive computing resources provided on national cyberinfrastructure through rigorous computational performance profiling and analysis

Collaboration Team
  National cyberinfrastructure
    o Extreme Science and Engineering Discovery Environment (XSEDE)
    o XSEDE Extended Collaborative Support Services (ECSS) provides computational science expertise
        Ye Fan, Yan Liu, Shaowen Wang, National Center for Supercomputing Applications (NCSA)
  NSF OpenTopography LiDAR data facility
    o DEM generation services for LiDAR-derived TauDEM analysis
    o Integration of TauDEM in OpenTopography service environment
    o People
        Chaitan Baru, Nancy Wilkins-Diehr, Choonhan Youn, San Diego Supercomputer Center (SDSC)
  NSF CyberGIS project
    o Integration of TauDEM in CyberGIS Gateway
    o Integration of TauDEM in advanced CyberGIS analytical services (workflow)
    o People
        University of Illinois at Urbana-Champaign (UIUC)
          o Yan Liu, Anand Padmanabhan, Shaowen Wang
        San Diego Supercomputer Center (SDSC)
          o Nancy Wilkins-Diehr, Choonhan Youn

Performance Analysis: Challenges
  System-level performance variation is very difficult to identify
    o Computing did not seem to be the reason for the performance slowdown
    o Network issue or file system issue?
        NFS is difficult to debug
  Barriers to performance profiling
    o Deploying performance profiling tools needs system administration skills
    o Using performance profiling libraries may need code changes
    o Configuring profiling parameters and interpreting profiling results are not trivial

Strategies
  Project management
    o Code repository
        TauDEM source code is moved to GitHub to facilitate multi-party development and testing
    o Documentation
        GitHub wiki
        Google Drive
    o Meetings
        Bi-weekly teleconference
  Build and test
    o XSEDE resources:
        for tests using up to 1,024 processors
        for tests using up to 16,384 processors
    o Profiling tools
        Darshan: I/O profiling
  Performance profiling and analysis
    o Computational bottleneck analysis
        Focus on I/O performance
    o Scalability to processors
    o Scalability to data size
    o Performance optimization

Generic I/O Profiling
  Darshan profiling found an anomaly in file read operations
  The finding is confirmed in TauDEM timing data

IO Bottlenecks - Input
  Inefficient File Reading
    o n processes, m files
    o Original version: n x m file reads for getting geo-metadata
    o Fix: 1 x m file reads + MPI_Bcast
  Coding Issue
    o File read deadlock situation caused by too many opened file descriptors
    o File not closed in time
    o Fix: close a file as soon as read operation is done
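The difference between the two metadata-reading schemes can be counted directly. The sketch below simulates the n ranks inside a single Python process, so no MPI library is involved: `CountingReader` and both scheme functions are hypothetical stand-ins, and copying the root's list to every rank plays the role of `MPI_Bcast`.

```python
class CountingReader:
    """Stand-in for the file system: counts every metadata read."""

    def __init__(self):
        self.reads = 0

    def read_metadata(self, name):
        self.reads += 1
        return {"file": name}


def original_scheme(n_procs, files, reader):
    """Every rank opens every file: n x m reads."""
    return [[reader.read_metadata(f) for f in files] for _ in range(n_procs)]


def broadcast_scheme(n_procs, files, reader):
    """Only the root reads (1 x m); the copy per rank stands in for MPI_Bcast."""
    root_copy = [reader.read_metadata(f) for f in files]
    return [list(root_copy) for _ in range(n_procs)]


files = [f"tile_{i}.tif" for i in range(10)]   # m = 10 hypothetical input files

before = CountingReader()
result_before = original_scheme(64, files, before)

after = CountingReader()
result_after = broadcast_scheme(64, files, after)

print(before.reads, after.reads)  # n x m versus m
```

Every rank ends up with the same metadata in both schemes; only the number of file-system requests changes, and that number is what grows with the process count on a shared file system.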

IO Bottleneck - Output
  Inefficient MPI IO
    o Original spatial domain decomposition did not consider IO performance
    o Improvement: domain decomposition strategy is changed to reduce the number of processes needed by an output file
  No Collective IO
  Parallel File System
    o Use as many OSTs as possible on the Lustre file system

Scalability Results
  Scalability Tests
    o Processors: up to 1,024
    o Data sizes: 2GB, 12GB, 36GB DEMs
  IO No Longer a Bottleneck

Results – Resolving I/O Bottlenecks
  [Table columns: #cores, Compute, Header Read, Data Read, Data Write; cells given as before / after]
  Table 1. I/O Time Comparison (before / after; in seconds) (Fan et al. 2014)

Results – Execution Time
  Figure 2. Execution time of the three most costly TauDEM functions on a 36GB DEM dataset. (Fan et al. 2014)

Next Steps
  More Room to Improve
    o 41.6 hours using 1024 cores on 36GB DEM
  Communication Pattern Analysis
  Methodological Investigation on Algorithm Design

CyberGIS-TauDEM Gateway Application
  Streamlined TauDEM Analysis in CyberGIS Gateway
    o Web environment
    o Transparent integration of DEM data sources
    o Customized TauDEM analysis workflow
    o Online visualization
  Status
    o 2 prototypes in April and May, respectively
    o Alpha release in early July
    o Beta release in August

CyberGIS Application Integration Framework
  [Architecture diagram labels: GISolve Middleware; DB; Controller; Job; Data; Visualization; Data Servers; Metadata Servers; Mapping Servers; External Data Sources; CyberGIS Toolkit; Data Storage; Computing Environment; Job Wrappers; Data Retrieval; Geo Data Processing; Execution Setup; Parallel Computing; Post-processing; Geo-visualization; CyberGIS Gateway; Job Panel; Data Selection; Geo-Input Editing; Analysis Input Panels; Workflow; Mapping; Visualization; Sharing]

Data Integration
  Multiple High Resolution DEM Sources
    o USGS NED (10-meter)
        Hosted at UIUC
        Map preview
    o OpenTopography LiDAR-derived DEMs
        Web service API
  Data Retrieval
    o USGS NED: wget
    o OT: Dynamic DEM generation and downloading
    o Data caching
        XWFS?
  Data Processing
    o Study area clipping
    o Multi-file generation
    o Reprojection
    o GDAL library
    o High-performance map reprojection
        Collaborative work with USGS

Analysis Workflow
  Approach
    o 26 TauDEM functions
    o Template-based customization of TauDEM functions
        Pre-defined dependency
        Dynamic workflow construction in Gateway
        Data format: JSON
  Implementation
    o Interactive workflow configuration
        Ext JS + SigmaJS
  Execution
    o Runtime command sequence generation
        On Trestles: command sequence
        On Stampede: a set of jobs linked based on job dependency
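Runtime command sequence generation from a JSON template with pre-defined dependencies can be sketched as a topological sort. The JSON schema below (`steps`, `depends_on`) and the function `command_sequence` are hypothetical and are not the Gateway's actual format. The tool names are TauDEM executables; their arguments are left elided as `...`.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical template: each TauDEM function lists the functions it depends on.
TEMPLATE = json.loads("""
{
  "steps": {
    "pitremove": {"depends_on": []},
    "d8flowdir": {"depends_on": ["pitremove"]},
    "aread8":    {"depends_on": ["d8flowdir"]},
    "threshold": {"depends_on": ["aread8"]}
  }
}
""")


def command_sequence(template, selected, num_procs):
    """Return the commands needed to produce the selected products, in order."""
    steps = template["steps"]

    # Collect the selected functions plus everything they depend on.
    needed = set()
    stack = list(selected)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(steps[name]["depends_on"])

    # Map each node to its predecessors, then order predecessors first.
    graph = {name: set(steps[name]["depends_on"]) for name in needed}
    order = list(TopologicalSorter(graph).static_order())
    return [f"mpiexec -n {num_procs} {name} ..." for name in order]


commands = command_sequence(TEMPLATE, ["aread8"], 16)
for line in commands:
    print(line)
```

The same ordering could feed either execution mode on the slide: emitted as one command sequence, or as separate jobs where each job is submitted with a dependency on its predecessors.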

Visualization
  Visualization Computation
    o Reprojection
    o Pyramid generation for multiple zoom levels
    o Coloring (symbology)
  Online Visualization
    o Each product is a map layer accessible through the OGC-standard Web Map Service (WMS)
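Pyramid generation for multiple zoom levels amounts to repeatedly halving the raster until it fits in a single tile. The sketch below only computes the dimensions of each level; the 256-pixel tile size, the halving factor, and the function name are assumptions, since the slides do not say how the Gateway builds its pyramids.

```python
def pyramid_levels(width: int, height: int, tile: int = 256) -> list[tuple[int, int]]:
    """Return (width, height) per level, full resolution first.

    Each level halves the previous one (rounding up) until both
    dimensions fit within one tile.
    """
    levels = [(width, height)]
    while levels[-1][0] > tile or levels[-1][1] > tile:
        w, h = levels[-1]
        levels.append((max(1, (w + 1) // 2), max(1, (h + 1) // 2)))
    return levels


for level, (w, h) in enumerate(pyramid_levels(4096, 2048)):
    print(level, w, h)
```

A map client then requests the level closest to its current zoom, so only a small part of a large product is read per request.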

DEMO

Concluding Discussions
  Multidisciplinary collaboration is a key to the success so far
  Great potential for further performance improvement
  Performance profiling and analysis at large scale is critical
  Guidelines for future software research and development
    o Explicit computational thinking in software development lifecycle (design, coding, testing)
    o Performance analysis remains challenging
    o Collaboration with computational scientists and conducting performance profiling on cyberinfrastructure are important
    o Cyberinfrastructure provides a set of abundant and diverse computational platforms for identifying computational bottlenecks and scaling code performance
  CyberGIS-TauDEM Gateway application significantly lowers the barrier of conducting large-scale TauDEM analyses by community users

Acknowledgements
  XSEDE (NSF Grant No )
  This material is based in part upon work supported by NSF under Grant Numbers and
  TauDEM team work is supported by the US Army Research and Development Center contract No. W912HZ-11-P-0338 and the NSF Grant Number
  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation

Thanks!