AstroBEAR Parallelization Options

Areas With Room For Improvement
- Ghost Zone Resolution
- MPI Load-Balancing
- Re-Gridding Algorithm
- Upgrading MPI Library

Ghost Zone Resolution
- Can exceed 30% of total program execution time.
- Affects fixed-grid runs as well as AMR.
- For runs using more than 2 processors, 98-99% of ghost zone execution time is MPI processing.

Ghost Zone Resolution Options: Duplex Transmission
- The old version swaps ghost zone data serially between two processors.
- Duplex transmission would have the two processors handle sending, receiving, and copying concurrently (see the sketch below).
Pros:
- Reduces the amount of duplicated overhead.
- Makes more efficient use of worker processors.
Cons:
- Little reduction in the amount of MPI overhead.
- Still has a high computation cost relative to the number of nodes.
Status: In progress
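A minimal sketch of the contrast, in C with MPI (AstroBEAR itself is Fortran; the buffer names, sizes, and neighbor-rank handling below are hypothetical and for illustration only):

```c
/* Sketch of pairwise ghost-zone exchange between two ranks.
 * "send_ghost" and "recv_ghost" are hypothetical stand-ins for the
 * real ghost-zone buffers; NGHOST is an arbitrary buffer size. */
#include <mpi.h>

#define NGHOST 1024   /* hypothetical ghost-zone buffer length */

/* Old pattern: one rank sends while the other waits, then the roles
 * reverse, so each rank is idle for half of the exchange. */
void exchange_ghost_serial(double *send_ghost, double *recv_ghost,
                           int neighbor, int rank, MPI_Comm comm)
{
    if (rank < neighbor) {
        MPI_Send(send_ghost, NGHOST, MPI_DOUBLE, neighbor, 0, comm);
        MPI_Recv(recv_ghost, NGHOST, MPI_DOUBLE, neighbor, 0, comm,
                 MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(recv_ghost, NGHOST, MPI_DOUBLE, neighbor, 0, comm,
                 MPI_STATUS_IGNORE);
        MPI_Send(send_ghost, NGHOST, MPI_DOUBLE, neighbor, 0, comm);
    }
}

/* Duplex pattern: both ranks send and receive concurrently. */
void exchange_ghost_duplex(double *send_ghost, double *recv_ghost,
                           int neighbor, MPI_Comm comm)
{
    MPI_Sendrecv(send_ghost, NGHOST, MPI_DOUBLE, neighbor, 0,
                 recv_ghost, NGHOST, MPI_DOUBLE, neighbor, 0,
                 comm, MPI_STATUS_IGNORE);
}
```

The duplex version also maps naturally onto non-blocking MPI_Isend/MPI_Irecv pairs if the copy back into the grid should overlap with the transfer itself.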

Alternate Option: Ghost Zone Broadcast
- Use the MPI broadcast routines to have a grid send all of its ghost zones to its neighbors at once; the neighbors then process that data and broadcast their own ghost zones when their turn comes (a rough sketch follows below).
Pros:
- Eliminates the need for pairwise iteration over a level (i.e., each grid's transfer would be done only once).
Cons:
- Potential congestion if all of a grid's neighbors are on the same processor.
- No guarantee that it is an improvement over pairwise duplex transmission.
Status: Speculative
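A rough sketch of the broadcast idea, under a strong assumption: a communicator nbr_comm already exists containing a grid's owning rank plus all of its neighbors' ranks. Building those communicators (e.g., with MPI_Comm_create) would itself be part of the work, so this is as speculative as the slide's status says:

```c
/* Sketch of the broadcast option: the rank owning a grid broadcasts
 * that grid's ghost zones to all of its neighbors in one call, instead
 * of iterating over neighbor pairs.  "nbr_comm" is a hypothetical
 * communicator holding the owner rank plus all neighbor ranks. */
#include <mpi.h>

void broadcast_ghosts(double *ghost_buf, int nghost,
                      int owner_rank, MPI_Comm nbr_comm)
{
    /* Every rank in nbr_comm makes the same call; the owner supplies
     * the data, the neighbors receive it. */
    MPI_Bcast(ghost_buf, nghost, MPI_DOUBLE, owner_rank, nbr_comm);
}
```

Each grid would take its turn as broadcast root; whether this beats pairwise exchange depends on how the neighbors are distributed across processors, which is exactly the congestion concern listed above.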

Load Balancing
Does it need to be done as often?
- The Ramses code only rebalances every ten frames.
- Re-gridding happens locally as usual, but the AMR structure is assumed not to change enough between two iterations to warrant a load rebalance (see the sketch below).
Pros:
- Significant reduction in MPI overhead (BalanceLoads() gets called a lot).
- Non-MPI overhead will likely be reduced as well, since the current load-balancing scheme recalculates the load across the entire Forest.
Cons:
- "Patch-based AMR" vs. "tree-based AMR"; can the approach be adapted to AstroBEAR?
- Requires implementing some Hilbert space-filling-curve ordering—how complex and computationally intensive would that be?
Status: Speculative
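The "rebalance less often" idea reduces to a cheap guard around the existing call. A sketch in C, where BalanceLoads() refers to the routine named above, ReGridLocally() is a hypothetical stand-in for the local re-gridding step, and the interval of ten simply mirrors the Ramses figure:

```c
/* Sketch: invoke the expensive global load balancer only every Nth
 * frame instead of after every local re-grid.  BalanceLoads() is the
 * existing routine (shown here behind a hypothetical C prototype);
 * ReGridLocally() is a hypothetical stand-in for local re-gridding. */
#define REBALANCE_INTERVAL 10     /* mirrors the Ramses example above */

extern void BalanceLoads(void);   /* existing global load-balancing call */
extern void ReGridLocally(void);  /* local re-gridding, done every frame */

void advance_frame(int frame)
{
    ReGridLocally();
    if (frame % REBALANCE_INTERVAL == 0)
        BalanceLoads();           /* global rebalance only every Nth frame */
}
```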

Re-Gridding Parallelization
- Parallelization of re-gridding is handled using MPI and OpenMP.
- Problem: MPI-1 limits thread usage.
  - Only one thread for the worker processors and two for the master processor.
  - Only one thread on each processor is MPI-capable (see the sketch below).
  - Performance bottlenecks happen if one processor gets tied up.
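For reference, MPI-2 introduced MPI_Init_thread, which reports how much threading the library will actually allow. A minimal check looks like the sketch below; with an MPI-1-era, non-thread-safe library the granted level stays below MPI_THREAD_MULTIPLE, which is the limitation described above:

```c
/* Sketch: request full thread support and check what the library
 * actually grants.  If "provided" comes back below MPI_THREAD_MULTIPLE,
 * only one thread per process may make MPI calls. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE)
        printf("Thread level %d granted; MPI calls must be funneled "
               "through a single thread per process.\n", provided);

    MPI_Finalize();
    return 0;
}
```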

Advantage of Multiple Threads
[Diagram comparing MPI with OpenMP, multi-threaded vs. MPI with OpenMP, single-threaded.]

Unfortunately... LAM MPI is not thread-safe.
- You can write multi-threaded applications using LAM MPI, but it is explicitly not thread-safe, so we would be responsible for maintaining MPI exclusion ourselves (see the sketch below).
- In a collaborative development environment like AstroBEAR, this is a bad idea.
- LAM is making noise about supporting this eventually, but they're not there yet.
Alternatives:
- Improve the efficiency of pairwise message passing.
- Offload more re-gridding computation to the worker processors.
Status: We're looking at it.
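A sketch of what maintaining MPI exclusion ourselves would mean in practice: every MPI call made from an OpenMP region has to be serialized by hand, for example with a named omp critical section. exchange_one_grid() below is a hypothetical stub standing in for a per-grid ghost-zone exchange:

```c
/* Sketch: hand-maintained MPI exclusion inside an OpenMP region, as a
 * non-thread-safe MPI library would require.  exchange_one_grid() is a
 * hypothetical stub standing in for a per-grid ghost-zone exchange. */
#include <mpi.h>
#include <omp.h>

static void exchange_one_grid(int grid_id, MPI_Comm comm)
{
    /* Stand-in only; a real version would post sends/receives for this
     * grid's neighbors. */
    int rank;
    MPI_Comm_rank(comm, &rank);
    (void)grid_id;
    (void)rank;
}

void exchange_all_grids(int ngrids, MPI_Comm comm)
{
    #pragma omp parallel for
    for (int i = 0; i < ngrids; i++) {
        /* Non-MPI work (packing, unpacking, copying) can run in
         * parallel across threads... */

        /* ...but every MPI call must be serialized by hand. */
        #pragma omp critical (mpi_exclusion)
        {
            exchange_one_grid(i, comm);
        }
    }
}
```

The fragility is that the guard lives at every call site: one forgotten critical section anywhere in a shared codebase can leave two threads inside the MPI library at once, with undefined behavior, which is why the slide calls this a bad idea for collaborative development.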