

A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data

Babak Behzad 1,3, Yan Liu 1,2,4, Eric Shook 1,2, Michael P. Finn 5, David M. Mattli 5 and Shaowen Wang 1,2,3,4

1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)
2 Department of Geography and Geographic Information Science
3 Department of Computer Science
4 National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign
5 Center of Excellence for Geospatial Information Science, U.S. Geological Survey (USGS)

AutoCarto'12

Outline
Overview
– Map re-projection
– pRasterBlaster: HPC Solution to Map Re-Projection
Performance Profiling
– pRasterBlaster Computational and Scaling Bottlenecks
Conclusion

Introduction
Map re-projection
– An important cartographic operation
Desktop application: mapIMG
– Challenges exist when scaling to coarse-scale spatial datasets
– Re-projecting a 1 GB raster dataset can take minutes
Parallel computing techniques will help scale to large datasets
– Raster data was born to be parallelized

Parallelizing Map Re-Projection
Map re-projection of a large dataset is too slow, or even impossible, on desktop machines
pRasterBlaster
– mapIMG in an HPC (High-Performance Computing) environment
– Early days
  – Row-wise decomposition
  – I/O occurred directly in the program's inner loop
– Rigorous geometry handling and novel resampling
  – Resampling options for categorical data and population counts (in addition to the standard resampling methods for continuous data)
– Able to project/re-project large maps in a short amount of time

pRasterBlaster
Fast and accurate raster re-projection in three (primary) steps
Step 1: Calculate and partition output space
Step 2: Read input and re-project
Step 3: Combine temporary files

Performance Profiling: Motivation and Objectives
Exploit performance profiling tools to make pRasterBlaster more scalable and efficient
– The early version did not scale to a large number of processors
– Resolve computational bottlenecks so that pRasterBlaster can leverage thousands of processors
Demonstrate techniques for using performance profilers
– Potentially useful for many GIS applications

What is performance profiling?
A form of dynamic program analysis
Measures
– memory footprint of the program
– time complexity of the program
– usage of particular instructions
– frequency and duration of function calls
Aids program optimization

How do profilers work?
Statistical profilers
– Operate by sampling
– Probe the program at regular intervals
– Pros: low overhead
– Cons: typically less numerically accurate and less specific

How do profilers work?
Instrumenting profilers
– Instrument target programs with additional instructions to collect the required information
– Pros: much more accurate than statistical profilers
– Cons: can slow the program down (since new instructions are added)
Different kinds of instrumenting profilers
– Manual instrumenting
  – Done by the programmers
– Automatic profilers
  – Software instruments the program automatically
TAU and IPM were used in this research.

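Manual Instrumenting
The traditional way of instrumenting C code is with the time function, declared in the standard header time.h. Here is a code fragment that demonstrates its use:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        time_t start, finish;
        /* ... */
        time(&start);    /* calendar time before the section */
        /* section to be timed */
        time(&finish);   /* calendar time after the section */
        /* difftime returns the difference in seconds as a double */
        printf("Elapsed time: %.0f s\n", difftime(finish, start));
        /* ... */
        return 0;
    }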

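Manual Instrumenting in Parallel Programs
Instrument the portion of the program running on each individual processor. In the fragment below, my_rank is assumed to be the MPI rank of the process:

    #include <mpi.h>
    #include <stdio.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        time_t start, finish;
        int my_rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);   /* rank of this process */
        /* ... */
        time(&start);
        /* section to be timed */
        time(&finish);
        printf("Elapsed time on Process %d: %.0f s\n", my_rank, difftime(finish, start));
        /* ... */
        MPI_Finalize();
        return 0;
    }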

IPM (Integrated Performance Monitoring)
IPM is a portable profiling infrastructure for MPI programs
– Provides a low-overhead profile of the performance and resource utilization of a parallel program
– Communication, computation, and I/O are the primary focus
– We initially profiled pRasterBlaster with IPM to understand how its run time breaks down among communication, computation, and I/O

TAU (Tuning and Analysis Utilities)
The TAU performance system is a portable profiling and tracing toolkit
– Analysis of parallel programs written in Fortran, C, C++, Java, Python
– TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements
IPM is designed to profile MPI applications, while TAU can be used to profile any kind of parallel application

TAU for pRasterBlaster

TAU for pRasterBlaster

Computational Bottleneck I: Symptom

Cause: Workload Distribution Issue
N rows on P processor cores
Figure panels: when P is small; when P is big

Solution: Load Balancing
N rows on P processor cores
Figure panels: when P is small; when P is big

Computational Bottleneck I: Summary
Symptom
– Load imbalance
– Detected by TAU first
– Verified by manual instrumenting
Cause
– Workload distribution algorithm problem (not obvious on small platforms)
Solution
– Revised algorithm for distributing workload

Computational Bottleneck II: Symptom

Computational Bottleneck II: Symptom

Computational Bottleneck II: Cause

Computational Bottleneck II: Analysis
Spatial data-dependent performance anomaly
– The anomaly is data dependent
– The four corners of the raster were processed by the processors whose indexes are close to the two ends
Exception handling in C++ is costly
– Coordinate transformation in nodata areas was handled as an exception
Solution
– Remove the C++ exception handling

Computational Bottleneck II: Performance Improvement

Computational Bottleneck II: Summary
Symptom
– Processors responsible for polar regions spent more time than those processing equatorial regions
Cause
– Corner cells were mapped to invalid input raster cells, generating exceptions
– C++ exception handling was expensive
Solution
– Removed C++ exception handling
– Corner cells need not be processed
  – They now contribute less computation time

Conclusions
Performance profiling identified computational bottlenecks in pRasterBlaster
We demonstrated the value of profilers for pRasterBlaster
– The techniques are likely valuable for other GIS applications
Performance profiling is an important tool for developing scalable and efficient high-performance applications

Future Work
Identify and resolve remaining performance issues in pRasterBlaster
– I/O was recently identified as the next major roadblock