A Cell-by-Cell AMR Method for the PPM Hydrodynamics Code

Dennis C. Dinge (dennis@lcse.umn.edu), University of Minnesota, http://www.lcse.umn.edu/~dennis/

Overview
1) The AMR scheme: what we do that is different from most, where we refine, how the boundaries of refined regions are handled, and the ordering of sweeps at different refinement levels.
2) Parallelization method: how the problem is broken up and parallelized, and storage on all scales.
3) Some results and concluding remarks.

Where we refine
1) Walls, shocks, and contact discontinuities. Refined regions are fronts of dimension one less than the problem dimension, and we exploit AMR's ability to capture and follow these fronts. Only two levels of refinement are currently used.
2) We do not use AMR to refine entire regions of the problem for which standard techniques of non-uniform grids and simple grid motion are adequate.
3) The decision to refine is made on a cell-by-cell basis.
4) Cells that were marked for refinement by a previous transverse sweep are refined in the current sweep whether or not the current sweep thinks they should be. A front must go undetected in all directional sweeps before a cell is dropped from refinement (see the sketch below).
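A minimal sketch of this flag logic, not the actual code: each cell keeps a detection flag per sweep direction (mirroring the three detection records in the pointer array described later), and the cell is dropped from refinement only when no directional sweep still sees a front. Names and types here are illustrative assumptions.

```c
#include <stdbool.h>

typedef struct {
    bool detected_x;   /* front detected in the most recent X sweep  */
    bool detected_y;   /* front detected in the most recent Y sweep  */
    bool refined;      /* cell currently carries fine-grid structure */
} RefineFlags;

/* Record the detection result of the sweep that just finished and decide
 * whether the cell keeps its refinement.  A cell is dropped only when the
 * front has gone undetected in every directional sweep. */
static void update_refinement(RefineFlags *f, bool detected_now, bool is_x_sweep)
{
    if (is_x_sweep)
        f->detected_x = detected_now;
    else
        f->detected_y = detected_now;

    f->refined = f->detected_x || f->detected_y;
}
```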

How the boundaries of refined regions are handled
1) Since the refined regions in this scheme are so thin, it is highly desirable to reduce the number of ghost zones at the ends of the refined rows.
2) The number of ghost zones is reduced by retaining information from the coarser calculation. The PPM parabolic coefficients for pressure, velocity, and density are retained from the coarser sweep and are used to construct their finer-grid counterparts, along with left and right edge values for density, velocity, and pressure in the grid end zones.
3) The fluxes for mass, momentum, and energy are also retained and used to recalculate values for the coarse cells at the ends of refined regions. This recalculation is necessary because the fluxes entering the end cells from the refined region will in general differ from their coarser counterparts (see the sketch below).
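A minimal sketch of the end-cell recalculation in item 3, using assumed names and a simple 1D conservative update; it is not the PPM routine itself. The coarse end cell was updated with a coarse flux at the face now bordered by fine cells; that contribution is backed out and replaced with the averaged fine fluxes so the scheme stays conservative across the refinement boundary.

```c
/* u_coarse:    conserved quantity in the end cell after the coarse update
 * flux_coarse: coarse flux that was applied at the face now covered by fine cells
 * flux_fine:   fine-grid fluxes through that same face
 * n_fine:      number of fine faces covering the coarse face
 * dt, dx:      time step and coarse cell width
 * The sign convention assumes the refined region lies to the right of the end
 * cell; the cell at the other end of the row flips the sign of the correction. */
static double correct_end_cell(double u_coarse, double flux_coarse,
                               const double *flux_fine, int n_fine,
                               double dt, double dx)
{
    double flux_avg = 0.0;
    for (int i = 0; i < n_fine; ++i)
        flux_avg += flux_fine[i];
    flux_avg /= (double)n_fine;

    /* Back out the coarse-flux contribution through this face and replace it
     * with the averaged fine flux, keeping the update conservative. */
    return u_coarse + (dt / dx) * (flux_coarse - flux_avg);
}
```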

The ordering of sweeps on different levels
1) For a particular row, a coarse sweep is first done over all cells.
2) If coarse cells which must be refined are detected during the sweep, they are divided evenly into four parts and the resulting subcells are grouped into rows of fine-grid cells, bounded by end cells.
3) Sweeps are then done over the fine-grid cells.

4) If any of the fine-grid sweeps detects the need to refine again, upper and lower finer-grid refinement and sweeps are done in like manner.
5) At the end of each sweep, the end values, be they coarse or fine, are updated to account for the new values of the fluxes.
6) Interior values at the first level of refinement are updated in light of the new information from the second level of refinement, and these new values are used to update the coarse grid.

[Figure: coarse sweep over a row.]

[Figure: upper and lower fine sweeps; end cells sit at the ends of the refined rows.]

[Figure: upper and lower finer (second-level) sweeps, with their own end cells.]

[Figure: the full sequence for one row: coarse sweep, upper and lower fine sweeps, and upper and lower finer sweeps.]

Doing the sweeps in this way, rather than doing the entire coarse sweep followed by an entire fine sweep and then an entire finer sweep, should make better use of the cache. It does mean, however, that the amount of work may differ greatly from row to row. As will be explained below, this does not present a problem for the parallelization scheme we employ.
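A structural sketch of this per-row ordering, with illustrative types and function names rather than the code's actual routines: all levels belonging to a row are swept immediately after the row's coarse sweep, while its data is still in cache.

```c
typedef enum { LOWER, UPPER } Half;
typedef struct Row Row;                   /* one coarse row plus its fine-grid data */

void coarse_sweep(Row *row);              /* marks cells needing refinement        */
void fine_sweep(Row *row, Half half);
void finer_sweep(Row *row, Half half);
int  has_refined_cells(const Row *row);
int  has_finer_cells(const Row *row);
void update_end_cells(Row *row);          /* item 5: end values from new fluxes    */
void update_fine_from_finer(Row *row);    /* item 6: second level -> first level   */
void update_coarse_from_fine(Row *row);   /* item 6: first level -> coarse grid    */

/* All refinement levels for one row are swept before moving on to the next
 * row; the price is that the work per row varies with its refinement. */
void sweep_row(Row *row)
{
    coarse_sweep(row);
    if (!has_refined_cells(row))
        return;

    fine_sweep(row, UPPER);
    fine_sweep(row, LOWER);

    if (has_finer_cells(row)) {
        finer_sweep(row, UPPER);
        finer_sweep(row, LOWER);
        update_fine_from_finer(row);
    }
    update_end_cells(row);
    update_coarse_from_fine(row);
}
```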

Parallelization Method Overview
1) On the largest scale the problem is broken into patches, called tiles in 2D and bricks in 3D. Globally the tiles are stored as large 1D arrays, with interior, edge, and corner values stored contiguously. At the end of a sweep the edge or corner data for one tile is updated with the proper interior data from another tile, or with the appropriate boundary condition. Tiles may be solved in any order, with semaphores assuring that the tiles a given tile depends on are complete before that tile begins (see the sketch below). Parallelization is done with OpenMP.
2) Each tile is solved using a standard X Y Y X sequence of sweeps. The Y sweeps are broken in half, so the sequence is really X, Y lower, Y upper, Y lower, Y upper, X. This allows the code to proceed without waiting on whole X or Y sweeps.
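A minimal sketch, assuming OpenMP, of the tile-level ordering: a per-tile completion flag plays the role of the semaphore, and a spin wait holds a tile back until the tiles it depends on have finished. The type and function names are illustrative, and the sketch additionally assumes each tile's dependencies have lower indices, so that under dynamic scheduling every dependency has already been handed to some thread and the wait always terminates.

```c
#include <omp.h>

typedef struct {
    int  n_deps;
    int *deps;   /* indices of tiles whose results this tile needs (all lower) */
    int  done;   /* 0/1 completion flag, the "semaphore"                       */
} Tile;

void sweep_tile(Tile *t);   /* the X, Y-lower, Y-upper, Y-lower, Y-upper, X sweeps */

void advance_tiles(Tile *tiles, int n_tiles)
{
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < n_tiles; ++i) {
        /* Spin until every prerequisite tile has finished its sweep. */
        for (int d = 0; d < tiles[i].n_deps; ++d) {
            int ready = 0;
            while (!ready) {
                #pragma omp atomic read
                ready = tiles[tiles[i].deps[d]].done;
            }
        }
        sweep_tile(&tiles[i]);

        #pragma omp atomic write
        tiles[i].done = 1;
    }
}
```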

3) For each sweep the half tile is broken into strips. The size of the strips, in rows, is an adjustable parameter. The strips may be solved in any order, with spin waits assuring that the necessary strips from previous sweeps are complete before a particular strip is allowed to proceed.
4) There are many more strips than CPUs. The work within strips may differ greatly because the refinement of the rows within them differs greatly, but as long as the number of strips is large compared to the number of CPUs the overall process will be load balanced (see the sketch below).
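A sketch of the strip loop for one half-tile sweep, again with illustrative names; the strip size in rows is the adjustable parameter mentioned above. With many more strips than threads and dynamic scheduling, uneven refinement from strip to strip does not unbalance the sweep.

```c
typedef struct HalfTile HalfTile;   /* upper or lower half of a tile */
typedef struct Row Row;

int  half_tile_rows(const HalfTile *h);
Row *half_tile_row(HalfTile *h, int r);
void wait_for_prerequisite_strips(HalfTile *h, int strip);  /* spin wait on prior sweeps */
void mark_strip_done(HalfTile *h, int strip);
void sweep_row(Row *row);                                   /* cost varies with refinement */

void sweep_half_tile(HalfTile *h, int strip_rows)
{
    int n_rows   = half_tile_rows(h);
    int n_strips = (n_rows + strip_rows - 1) / strip_rows;

    /* Many more strips than threads; dynamic scheduling lets each thread grab
     * the next available strip, keeping the sweep load balanced. */
    #pragma omp parallel for schedule(dynamic, 1)
    for (int s = 0; s < n_strips; ++s) {
        wait_for_prerequisite_strips(h, s);
        int last = (s + 1) * strip_rows;
        if (last > n_rows) last = n_rows;
        for (int r = s * strip_rows; r < last; ++r)
            sweep_row(half_tile_row(h, r));
        mark_strip_done(h, s);
    }
}
```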

Storage on all scales
1) Coarse values are stored locally in an NVAR by I by J array, where NVAR is the number of variables per cell and I and J are the number of cells in the X and Y directions, respectively, for the tile.
2) Refined information is stored in two arrays. A 3 by I by J integer array points into a compressed 1D array holding the refined data for a cell; a value of zero in the pointer array means the cell has no fine-grid structure, which is the case for most cells. The three values per cell keep track of refinement detections in the last X pass, the last Y pass, and the last pass, be it X or Y. The pointer array points to the location of the first level of refinement; that location in turn holds pointers to the values of the second level of refinement, if one exists, within the same compressed array (see the sketch below).
3) Information for the coarse grid and the pointer array is passed between tiles in chunks of standard size. The amount of compressed-array data passed depends on the amount of refinement in the "ghost zones" of adjacent tiles. The compressed-array information passed to a tile is appended to the end of that tile's fine-grid information for its interior.
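A minimal sketch of this storage layout in C; the struct, field names, and index conventions are assumptions made for illustration, not the code's actual declarations.

```c
#include <stdlib.h>

typedef struct {
    int     nvar, ni, nj;     /* variables per cell; cells in X and Y for the tile  */
    double *coarse;           /* NVAR x I x J coarse-grid values                    */
    int    *refine_ptr;       /* 3 x I x J; 0 = no fine-grid structure, otherwise   */
                              /* an offset into the compressed array                */
    double *compressed;       /* packed fine-grid data; a first-level entry can     */
                              /* itself hold offsets to second-level data           */
    size_t  compressed_used;  /* grows as refined cells are added                   */
} TileStorage;

/* Row-major index helpers for the flattened arrays (layout is illustrative). */
static inline double *coarse_val(TileStorage *t, int v, int i, int j)
{
    return &t->coarse[((size_t)v * t->ni + i) * t->nj + j];
}

/* pass = 0, 1, 2 for the last X pass, the last Y pass, and the last pass of
 * either kind, matching the three detection records described above. */
static inline int *refine_entry(TileStorage *t, int pass, int i, int j)
{
    return &t->refine_ptr[((size_t)pass * t->ni + i) * t->nj + j];
}
```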

Some Results
An AMR run showing the 2D shock tube problem at ten, thirty, and fifty thousand cycles. The top panels show density. The bottom panels show where the zones are being refined by the AMR: red indicates one level of refinement, white indicates two levels of refinement.

Some Results
Comparison of the AMR run (top) with a high-resolution run at ten, thirty, and fifty thousand cycles. The resolution of the high-resolution run was 16 times that of the coarse grid the AMR started with, equivalent to the AMR refining twice everywhere.

Some Results
Comparison of the AMR run (top) with a low-resolution run at ten, thirty, and fifty thousand cycles. The resolution of the low-resolution run was the same as the coarse grid the AMR started with, equivalent to the AMR refining nowhere.

Concluding Remarks
1) A working 2D parallel version is being tested.
2) A 3D parallel version is in preparation.