Stratified Magnetohydrodynamics Accelerated Using GPUs: SMAUG

The Sheffield Advanced Code
The Sheffield Advanced Code (SAC) is a novel, fully non-linear MHD code based on the Versatile Advection Code (VAC), designed for simulations of linear and non-linear wave propagation in gravitationally strongly stratified magnetised plasma. Shelyag, S.; Fedun, V.; Erdélyi, R., Astronomy and Astrophysics, Volume 486, Issue 2, 2008.

Full Perturbed MHD Equations for Stratified Media
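
As a sketch of the formulation (following the SAC approach of Shelyag et al. 2008; the exact form used in SMAUG may differ in detail), each variable is split into a static, gravitationally stratified background and a perturbation, e.g. $\rho = \rho_b + \tilde{\rho}$, $e = e_b + \tilde{e}$, $\mathbf{B} = \mathbf{B}_b + \tilde{\mathbf{B}}$, with the background held in magnetohydrostatic equilibrium. With the hyperdiffusion terms omitted, the perturbed continuity and induction equations then read

\[
\frac{\partial \tilde{\rho}}{\partial t} + \nabla \cdot \left[ (\rho_b + \tilde{\rho})\,\mathbf{v} \right] = 0,
\qquad
\frac{\partial \tilde{\mathbf{B}}}{\partial t} = \nabla \times \left[ \mathbf{v} \times (\mathbf{B}_b + \tilde{\mathbf{B}}) \right],
\]

with analogous perturbed momentum and energy equations in which, because the background is in equilibrium, the gravitational source term involves only the perturbed density, $\tilde{\rho}\,\mathbf{g}$.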

Numerical Diffusion
– Central differencing can generate numerical instabilities
– It is difficult to obtain solutions for shocked systems
– We define a hyperviscosity parameter as the ratio of the third-order forward difference of a variable to its first-order forward difference
– By tracking the evolution of the hyperviscosity we can identify numerical noise and apply smoothing where necessary
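
As an illustration only (a minimal sketch with hypothetical names, not the SMAUG implementation), the hyperviscosity coefficient along one grid direction can be estimated from the ratio of the maximum third-order to the maximum first-order forward difference of a field w:

    #include <math.h>

    /* Sketch: hyperviscosity coefficient from forward differences.
       w: field values along one grid line, n: number of points,
       cmax: local maximum wave speed, nu_hyp: user-chosen hyperviscosity parameter. */
    double hyperviscosity_coeff(const double *w, int n, double cmax, double nu_hyp)
    {
        double max_d1 = 0.0;   /* max |first-order forward difference| */
        double max_d3 = 0.0;   /* max |third-order forward difference| */

        for (int i = 0; i < n - 3; i++) {
            double d1 = fabs(w[i + 1] - w[i]);
            double d3 = fabs(w[i + 3] - 3.0 * w[i + 2] + 3.0 * w[i + 1] - w[i]);
            if (d1 > max_d1) max_d1 = d1;
            if (d3 > max_d3) max_d3 = d3;
        }

        if (max_d1 == 0.0)
            return 0.0;   /* field is smooth here: no extra diffusion needed */

        /* The diffusion coefficient scales with the local wave speed and the
           noise indicator max_d3 / max_d1. */
        return nu_hyp * cmax * max_d3 / max_d1;
    }

The ratio stays small for smooth fields and grows where grid-scale oscillations appear, so extra diffusion is applied only where it is needed.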

Why MHD Using GPUs?
Consider a simplified 2D problem:
– Solving a flux equation on a mesh F(i,j)
– Derivatives computed using central differencing
– Time stepping using Runge-Kutta

    F(i-1,j+1)  F(i,j+1)  F(i+1,j+1)
    F(i-1,j)    F(i,j)    F(i+1,j)
    F(i-1,j-1)  F(i,j-1)  F(i+1,j-1)

Excellent scaling with GPUs, but:
– Central differencing requires numerical stabilisation
– Stabilisation is trickier on GPUs: it requires a reduction/maximum routine and an additional, larger mesh
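
As a sketch of the data-parallel pattern this implies (hypothetical names, not the actual SMAUG kernels), a CUDA kernel can evaluate the central-difference right-hand side of a simple 2D advection equation with one thread per mesh point:

    __global__ void central_diff_rhs(const double *F, double *rhs,
                                     int nx, int ny,
                                     double vx, double vy,
                                     double dx, double dy)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;

        /* Skip the outermost layer; boundary/halo points are handled elsewhere. */
        if (i < 1 || i >= nx - 1 || j < 1 || j >= ny - 1)
            return;

        int idx = j * nx + i;

        double dFdx = (F[idx + 1]  - F[idx - 1])  / (2.0 * dx);   /* central difference in x */
        double dFdy = (F[idx + nx] - F[idx - nx]) / (2.0 * dy);   /* central difference in y */

        /* dF/dt = -(vx dF/dx + vy dF/dy); a Runge-Kutta stage combines
           several such right-hand-side evaluations to advance in time. */
        rhs[idx] = -(vx * dFdx + vy * dFdy);
    }

Each mesh point is updated independently from its stencil neighbours, which maps well onto the GPU; it is the global maximum needed for the hyperviscosity stabilisation that breaks this simple pattern and requires a reduction.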

Halo Messaging
Consider a 2D model: for simplicity, distribute the rows over a line of processes
– N/nproc rows per processor; every processor stores all N columns
Each processor has a "ghost" layer
– Used in the calculation of the update
– Obtained from the neighbouring processors
– Each processor passes its top and bottom layers to its neighbours, where they become the neighbours' ghost layers
SMAUG-MPI implements messaging using a 2D halo model for 2D problems and a 3D halo model for 3D problems (a neighbour-finding sketch follows below)
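
A minimal sketch (not the SMAUG source; function and variable names are assumptions) of how the neighbour ranks for such a halo exchange can be obtained from an MPI Cartesian communicator in the 2D case:

    #include <mpi.h>

    /* Sketch: build a 2D process grid and find the neighbour ranks whose
       boundary layers fill our ghost layers (and vice versa). */
    void find_neighbours(MPI_Comm comm, MPI_Comm *cart,
                         int *below, int *above, int *left, int *right)
    {
        int nprocs;
        MPI_Comm_size(comm, &nprocs);

        int dims[2] = {0, 0};
        MPI_Dims_create(nprocs, 2, dims);     /* e.g. 4x4 for 16 processes */

        int periods[2] = {0, 0};              /* non-periodic domain boundaries */
        MPI_Cart_create(comm, 2, dims, periods, 0, cart);

        /* Neighbours in each direction; MPI_PROC_NULL is returned at the domain
           edges, which MPI_Isend/MPI_Irecv treat as "no communication". */
        MPI_Cart_shift(*cart, 0, 1, below, above);
        MPI_Cart_shift(*cart, 1, 1, left, right);
    }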

Diagram: rows 1 to N+1 distributed over Processors 1-4; each processor sends its top and bottom layers to its neighbours and receives the corresponding layers from them into its ghost rows.

MPI Implementation
Based on the halo messaging technique employed in the SAC code:

    void exchange_halo(vector v)
    {
        gather halo data from v into gpu_buffer1          // pack halo layers on the GPU
        cudaMemcpy(host_buffer1, gpu_buffer1, ...);       // copy packed halo to the host
        MPI_Isend(host_buffer1, ..., destination, ...);
        MPI_Irecv(host_buffer2, ..., source, ...);
        MPI_Waitall(...);
        cudaMemcpy(gpu_buffer2, host_buffer2, ...);       // copy received halo back to the GPU
        scatter halo data from gpu_buffer2 to halo regions in v
    }
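
The gather and scatter steps pack the non-contiguous halo layers into contiguous buffers before the copies. A minimal sketch of such a pack kernel for one row of a 2D field (hypothetical names, assuming row-major storage):

    /* Copy row `row` of an nx-wide, row-major field into a contiguous send buffer. */
    __global__ void pack_row(const double *field, double *buffer, int nx, int row)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < nx)
            buffer[i] = field[row * nx + i];
    }

The matching unpack kernel writes the received buffer into the ghost row on the destination GPU.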

Halo Messaging with GPU Direct

    void exchange_halo(vector v)
    {
        gather halo data from v into gpu_buffer1          // pack halo layers on the GPU
        MPI_Isend(gpu_buffer1, ..., destination, ...);    // MPI reads directly from GPU memory
        MPI_Irecv(gpu_buffer2, ..., source, ...);         // MPI writes directly to GPU memory
        MPI_Waitall(...);
        scatter halo data from gpu_buffer2 to halo regions in v
    }

Simpler, faster call structure: the intermediate host buffers and cudaMemcpy calls are no longer needed.
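
Passing device pointers straight to MPI relies on a CUDA-aware MPI library. With Open MPI, for example, support can be checked roughly as follows (a sketch; the macro and query function are Open MPI extensions):

    #include <stdio.h>
    #include <mpi.h>
    #include <mpi-ext.h>   /* Open MPI extensions: provides MPIX_CUDA_AWARE_SUPPORT */

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);
    #if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
        /* Compiled with CUDA support; also confirm it is enabled at run time. */
        printf("CUDA-aware MPI at run time: %d\n", MPIX_Query_cuda_support());
    #else
        printf("This MPI library does not report CUDA-aware support.\n");
    #endif
        MPI_Finalize();
        return 0;
    }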

Progress with MPI Implementation
Successfully running two-dimensional models under GPU direct
– Wilkes GPU cluster at the University of Cambridge
– N8 GPU facility, Iceberg
The 2D MPI version is verified
Currently optimising communications performance under GPU direct
The 3D MPI version is already implemented but still requires testing

Orszag-Tang Test
200x200 model at t = 0.1, 0.26, 0.42 and 0.58 s
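
For reference, the Orszag-Tang vortex is a standard 2D MHD test problem; in its common normalisation (individual codes, including SMAUG, may scale it differently) the initial state on the periodic unit square is

\[
\rho = \frac{25}{36\pi}, \qquad p = \frac{5}{12\pi}, \qquad \gamma = \frac{5}{3},
\]
\[
\mathbf{v} = \left( -\sin 2\pi y,\; \sin 2\pi x,\; 0 \right), \qquad
\mathbf{B} = B_0 \left( -\sin 2\pi y,\; \sin 4\pi x,\; 0 \right), \qquad
B_0 = \frac{1}{\sqrt{4\pi}} .
\]

The flow quickly develops interacting shocks and current sheets, making it a good stress test for the shock handling and the multi-GPU halo exchange.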

A Model of Wave Propagation in the Magnetised Solar Atmosphere
The model features a flux tube with a torsional driver in a fully stratified quiet solar atmosphere based on VALIIIC
– Grid size is 128x128x128, representing a box in the solar atmosphere of dimensions 1.5x2x2 Mm
– The flux tube has a magnetic field strength of 1000 G
– Driver amplitude: 200 km/s

Timing for Orszag-Tang Using SAC/SMAUG with Different Architectures

Performance Results (Hyperdiffusion disabled)
Timings in seconds for 100 iterations of the Orszag-Tang test, with and without GPU direct, for grid sizes (number of GPUs in brackets): 1000x1000 (1), 1000x1000 (2x2), 1000x1000 (4x4), 2044x2044 (2x2), 2044x2044 (4x4), 4000x4000 (4x4), 8000x8000 (8x8) and 8000x8000 (10x10).

Performance Results (With Hyperdiffusion enabled)
Timings in seconds for 100 iterations of the Orszag-Tang test, without GPU direct, for grid sizes (number of GPUs in brackets): 2044x2044 (2x2), 2044x2044 (4x4), 4000x4000 (4x4), 8000x8000 (8x8) and 8000x8000 (10x10); the 8000x8000 (10x10) run took 163.6 s.

Conclusions
We have demonstrated that we can successfully compute large problems by distributing them across multiple GPUs
For 2D problems the performance of messaging with and without GPU direct is similar
– This is expected to change when 3D models are tested
It is likely that much of the communications overhead arises from the routines used to transfer data within GPU memory
– Performance enhancements are possible through modification of the application architecture
Further work is needed with larger models for comparison with the x86 MPI implementation
The algorithm has been implemented in 3D; testing of 3D models will be undertaken over the forthcoming weeks