AstroBEAR
• Finite volume hyperbolic PDE solver
• Discretizes and solves equations of the form ∂U/∂t + ∇·F(U) = S(U) (see the update sketch below)
• Solves hydrodynamic and MHD equations
• Written in Fortran, with MPI support libraries
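As a concrete illustration of that conservation-law form, here is a minimal 1D finite volume update with a Lax-Friedrichs interface flux for linear advection. This is a hedged sketch in C for orientation only: AstroBEAR itself is Fortran, and the names `flux`, `step`, `N`, and `NG` are hypothetical, not AstroBEAR routines.

```c
#include <math.h>

#define N  100   /* interior cells */
#define NG 1     /* one ghost cell on each side */

/* Physical flux for linear advection, f(u) = a*u. */
static double flux(double u, double a) { return a * u; }

/* One conservative finite volume step,
   u_i <- u_i - dt/dx * (F_{i+1/2} - F_{i-1/2}),
   using a local Lax-Friedrichs interface flux.  Ghost cells
   (u[0] and u[N+1]) must be filled before each call. */
static void step(double u[N + 2*NG], double a, double dt, double dx) {
    double F[N + 1];                      /* interface fluxes */
    for (int i = 0; i <= N; i++) {
        double uL = u[i + NG - 1], uR = u[i + NG];
        F[i] = 0.5 * (flux(uL, a) + flux(uR, a)) - 0.5 * fabs(a) * (uR - uL);
    }
    for (int i = 0; i < N; i++)
        u[i + NG] -= dt / dx * (F[i + 1] - F[i]);
}
```

A source term S(U) would enter as an additional per-cell update; the Strang splitting listed later under Requirements is one standard way to combine the two.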

Adaptive Mesh Refinement
• Method of reducing computation in finite volume calculations
• Starts with a base resolution and overlays grids of greater refinement where higher resolution is needed
• Grids must be properly nested
• For parallelization purposes, only one parent per grid (see the data structure sketch below)
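A sketch of what these nesting rules imply for the grid hierarchy data structure. This is a hypothetical C struct for illustration (AstroBEAR itself is written in Fortran), showing how the single-parent rule makes the hierarchy a forest of trees rooted at the level-0 grids.

```c
#include <stddef.h>

/* One node in the AMR grid hierarchy (illustrative, not AstroBEAR's types). */
typedef struct Grid {
    int level;               /* 0 = base resolution */
    int lo[3], hi[3];        /* index bounds at this level's resolution */
    int owner_rank;          /* MPI rank that owns this grid's data */
    struct Grid *parent;     /* exactly one parent; NULL at level 0 */
    struct Grid **children;  /* refined grids properly nested inside this one */
    int nchildren;
    double *data;            /* cell-centered fields, including ghost zones */
} Grid;
```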

AMR Algorithm (Cunningham et al., 2009)

  AMR(level, dt) {
    if (level == 0) nsteps = 1;
    if (level > 0)  nsteps = refine_ratio;
    for n = 1, nsteps {
      DistributeGrids(level);
      if (level < MaxLevel) {
        CreateRefinedGrids(level + 1);
        SwapGhostData(level + 1);
      }
      Integrate(level, n);
      if (level < MaxLevel) AMR(level + 1, dt/refine_ratio);
    }
    if (level > 1) synchronize_data_to_parent(level);
  }

Parallel Communications
• Grids rely on external “ghost cells” to perform calculations
• Data from neighboring grids needs to be copied into the ghost region (see the sketch below)
• A major source of scaling problems
• The alternate fixed-grid code (AstroCUB) uses a different communication method
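Before the code-specific schemes on the next two slides, a self-contained C/MPI sketch of the basic ghost cell exchange for a 1D uniform decomposition. It is not AstroBEAR or AstroCUB code; the array layout and the name `exchange_ghosts` are assumptions made for illustration.

```c
#include <mpi.h>

#define N  64   /* interior cells owned by this rank */
#define NG 2    /* ghost cells on each side */

/* Exchange ghost zones with left/right neighbors on a 1D domain
   decomposition.  u has layout [NG ghosts | N interior | NG ghosts]. */
static void exchange_ghosts(double u[N + 2*NG], MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my first NG interior cells left and fill my right ghosts, then
       the mirror image.  MPI_PROC_NULL makes the calls no-ops at physical
       boundaries. */
    MPI_Sendrecv(&u[NG],     NG, MPI_DOUBLE, left,  0,
                 &u[N + NG], NG, MPI_DOUBLE, right, 0, comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[N],      NG, MPI_DOUBLE, right, 1,
                 &u[0],      NG, MPI_DOUBLE, left,  1, comm, MPI_STATUS_IGNORE);
}
```

On an AMR hierarchy the same idea is applied per overlap region between grids, which is where the bookkeeping (and the scaling trouble) comes from.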

AstroBEAR Parallel Communication

  TransferOverlapData() {
    TransferWorkerToWorkerOverlaps();
    TransferMasterToWorkerOverlaps();
    TransferWorkerToMasterOverlaps();
  }

  foreach overlap transfer t {
    if (Worker(t.source)) SignalSendingProcessor(t.source);
    if (Worker(t.dest))   SignalReceivingProcessor(t.dest);
    if (Worker(t.source)) SendLocalOverlapRegion(t.source);
    if (Worker(t.dest))   SendLocalOverlapRegion(t.dest);
  }

AstroCUB Parallel Communication

  TransferOverlapData(Grid g) {
    for dim = 1, ndim {
      foreach boundary along dim {
        foreach field_type {
          MPI_ISEND( ... );
          MPI_IRECV( ... );
          MPI_WAIT( ... );
        }
      }
    }
  }
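The MPI_ISEND / MPI_IRECV / MPI_WAIT calls above are nonblocking, so each needs a request handle to wait on before the buffers can be reused. A hedged C sketch of that pattern for a single boundary of a single field (illustrative names, not AstroCUB's actual Fortran):

```c
#include <mpi.h>

/* Post a nonblocking send and receive for one boundary of one field,
   then wait on both requests before the buffers are touched again. */
static void exchange_boundary(double *sendbuf, double *recvbuf, int count,
                              int neighbor, int tag, MPI_Comm comm) {
    MPI_Request reqs[2];
    MPI_Isend(sendbuf, count, MPI_DOUBLE, neighbor, tag, comm, &reqs[0]);
    MPI_Irecv(recvbuf, count, MPI_DOUBLE, neighbor, tag, comm, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}
```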

AstroBEAR/AstroCUB Comparison

AstroBEAR:
• Recalculates overlaps before each synchronization (overlap computation sketched below)
• Each send/receive operation is handled individually
• Groups transfers based on source and destination processor (master or worker)
• 10 MPI calls per grid per timestep in 3D hydro runs

AstroCUB:
• Calculates overlaps once, prior to the first synchronization
• Send/receive operations are handled together
• 6 MPI calls per processor per timestep in 3D hydro runs
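Both codes ultimately rest on the same primitive: intersecting one grid's ghost footprint with a neighbor's interior in index space. A self-contained C sketch of that overlap computation (the names `Box`, `box_intersect`, and `grow` are hypothetical, not taken from either code):

```c
#include <stdbool.h>

typedef struct { int lo[3], hi[3]; } Box;   /* inclusive index bounds */

/* Intersect two boxes; returns false if they do not overlap. */
static bool box_intersect(const Box *a, const Box *b, Box *out) {
    for (int d = 0; d < 3; d++) {
        out->lo[d] = (a->lo[d] > b->lo[d]) ? a->lo[d] : b->lo[d];
        out->hi[d] = (a->hi[d] < b->hi[d]) ? a->hi[d] : b->hi[d];
        if (out->lo[d] > out->hi[d]) return false;
    }
    return true;
}

/* Grow a box by ng cells in every direction: the grid's ghost footprint.
   The region to exchange with a neighbor is then
   box_intersect(grow(mine, ng), neighbor, &overlap). */
static Box grow(Box b, int ng) {
    for (int d = 0; d < 3; d++) { b.lo[d] -= ng; b.hi[d] += ng; }
    return b;
}
```

Computing these intersections once and caching them until the mesh changes, rather than recomputing them before every synchronization, is the transmission caching idea proposed later under Parallelization Improvements.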

Requirements

Physics:
• Hydro/MHD
• Cooling
• Cylindrical source terms
• Self-gravity
• Sink particles

Numerics:
• MUSCL-Hancock, Runge-Kutta
• Strang splitting
• Constrained transport
• Roe and Marquina fluxes

Language Options

Python:
• Pros: good stack traces, flexibility, resource management
• Cons: requires SciPy, GPU, or hybridization for numerics

C:
• Pros: speed, no interfaces required
• Cons: memory and pointer management work falls on the developer

Fortran:
• Pros: fast number-crunching
• Cons: clumsy data structures; more memory and pointer management for the developer

Hybridization
• Not unheard-of in scientific codes, e.g. Cactus (Max Planck Institute)
• We've tried it already (HYPRE)
• Can benefit from the strengths of both scripting and compiled languages
• May result in a steeper learning curve for new developers

Parallelization Improvements
• Transmission caching: each processor stores its ghost zone transmission details until the next regrid
• Message packing: sending big blocks containing many messages (see the packing sketch below)
  [figure: several small messages (msg 1, msg 2, msg 3) packed into one large block]
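A self-contained C/MPI sketch of the message packing idea using MPI_Pack: several small payloads bound for the same rank are packed into one buffer and sent with a single MPI call. The function `send_packed` and its arguments are hypothetical, for illustration only; the receiver would unpack symmetrically with MPI_Unpack.

```c
#include <stdlib.h>
#include <mpi.h>

/* Pack several small payloads for the same destination into one buffer
   and send them with a single MPI call. */
static void send_packed(double *payloads[], const int counts[], int nmsgs,
                        int dest, int tag, MPI_Comm comm) {
    int total = 0, pos = 0;
    for (int m = 0; m < nmsgs; m++) {        /* upper bound on packed size */
        int bytes;
        MPI_Pack_size(counts[m], MPI_DOUBLE, comm, &bytes);
        total += bytes;
    }
    char *buf = malloc(total);
    for (int m = 0; m < nmsgs; m++)
        MPI_Pack(payloads[m], counts[m], MPI_DOUBLE, buf, total, &pos, comm);
    MPI_Send(buf, pos, MPI_PACKED, dest, tag, comm);   /* pos = packed bytes */
    free(buf);
}
```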

Parallelization Improvements, ctd.
• Redundancy in root domains: “stretching” root grids initially to pull in extra data from other grids
• Reduces the need for refined ghost transmissions
  [figure: a core grid next to the same grid stretched to include a layer of neighboring data]

Further Options for Improvement
• Refined grids: can the Berger-Rigoutsos clustering algorithm be further simplified or parallelized? (first step sketched below)
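For reference, a minimal self-contained C sketch of the first step of Berger-Rigoutsos clustering: compute signatures of the flagged cells along one axis and split the patch at a zero-signature plane. The full algorithm also splits at inflection points of the signature and checks fill efficiency, which is omitted here; the names are illustrative, not AstroBEAR's.

```c
#include <stdbool.h>

/* Signature along x: number of flagged cells in each column of an
   nx-by-ny patch of refinement flags (row-major, flags[j*nx + i]). */
static void x_signature(const bool *flags, int nx, int ny, int sig[]) {
    for (int i = 0; i < nx; i++) sig[i] = 0;
    for (int j = 0; j < ny; j++)
        for (int i = 0; i < nx; i++)
            if (flags[j*nx + i]) sig[i]++;
}

/* First Berger-Rigoutsos split rule: cut the patch at an interior column
   with zero signature, i.e. a plane containing no flagged cells.
   Returns the column index, or -1 if no such cut exists. */
static int zero_cut(const int sig[], int nx) {
    for (int i = 1; i < nx - 1; i++)
        if (sig[i] == 0) return i;
    return -1;
}
```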

Concerns for New Code
• Solver modularity: the code should run on either a CPU cluster or a GPU cluster
• Scalability: the code must run effectively on more than 64 CPUs