
1 The ALEGRA Production Application: Strategy, Challenges and Progress Toward Next Generation Platforms
Richard R. Drake, Dept 1443 - Computational Multiphysics, Sandia National Laboratories
Algorithms & Abstractions for Assembly in PDE Codes, May 12-14, 2014

ALEGRA is a large, highly capable, option-rich production application solving coupled multi-physics PDEs modeling magnetohydrodynamics, electromechanics, stochastic damage modeling and detailed interface mechanics in high strain rate regimes on unstructured meshes in an ALE framework. Nearly all the algorithms must accept dynamic, mixed-material elements, which are modified by remeshing, interface reconstruction, and advection components.

Recent trends in computing hardware have forced application developers to think about how to address and improve performance on traditional CPUs and to look forward to next generation platforms. Core to the ALEGRA performance strategy is to improve and rewrite loop bodies to conform to the requirements of high performance kernels, such as accessing data in array form, no pointer dereferencing, no function calls, and thread safety. Achieving this, however, requires changes to the underlying infrastructure.

We report on recent progress in the infrastructure to support array-based data access and on iteration of mesh objects. The effects on performance on traditional platforms will be shown. We also discuss the practical realities and cost estimates for attempting to move an existing full-featured production application like ALEGRA toward running effectively on future platforms while remaining maintainable.

2 ALEGRA: Shock Hydro & MHD
- 20 years of development & evolution
- Operator split, multi-physics
- Includes explicit and implicit PDE solvers
- 2 and 3 spatial dimensions
- Core hydro is multi-material Lagrangian plus remap
- An XFEM capability is maturing
- 650k LOC (not including libraries, such as Trilinos)
- Mix of research, development, and production capabilities
- Extensive material model choices
[Figure panels: Shock hydro; 2D Magnetics; 3D Resistive MHD]

3 Some ALEGRA Core Algorithms
- Mixed material cell treatment
- Remap
- Remesh
- Material interface reconstruction
- Material & field advection
- Dynamic topology
- Extended Finite Element Method (XFEM)
- Spatial refinement/unrefinement
- Flexible set of material models comprising each material
- Central difference and midpoint time integration options
[Figure panels: XFEM requires topological enrichment; Material interface reconstruction; Swept volume & intersection remap]

4 NEVADA Infrastructure (A Framework)
Everything depends on the “Mesh”
[Diagram labels: Field I/O, Load Balancing, Contact, Spatial Adaptivity, XFEM Adaptivity, Halo Comm, In-Situ Processing, In-Situ Viz, Remesh, Interface Reconstruction, Advection, Input Parsing, Physics Algorithms, Unstructured Mesh, Structured Mesh, Materials]

5 Performance
We need to run faster!
- Customer needs
- NW needs
- Optics (marketing)
It has become clear that:
- There is no performance silver bullet
- Application software must change
- This will require a resource shift
Can’t rely on faster CPUs anymore!
[Figure labels: 56%, 60%; Muzia, 2D]

6 The ALEGRA Performance Strategy
Work in the present but aim for the future.
Incrementally reimplement algorithms
- Remesh, interface reconstruction, advection
- Lagrangian step pieces
- Matrix assembly coding
- Time step size computation
Focus on foundational concepts
- Accessing bulk data in array form
- Limit pointer dereferencing
- Limit function calls (non-inlined)
- Minimize data reads/writes
- Thread safety
Refactor support infrastructure
- Enable array-based access
- Enable flat, index-based iteration
- Enable thread safety (colorings?)
Consider new algorithms
- Alternate formulations
- New/different algorithms
[Komatitsch]
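The slide only floats "colorings?" as a possible route to thread safety and gives no details. As a rough illustration of the general idea, and not of anything ALEGRA is known to implement, the sketch below greedily assigns colors so that no two elements sharing a node have the same color. The function name `color_elements` and the `elem_nodes` connectivity layout are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Greedy element coloring (illustrative only, not ALEGRA code).
// elem_nodes[e] lists the node indices touched by element e.
// Returns one color per element such that elements sharing a node differ.
std::vector<int> color_elements(const std::vector<std::vector<int>>& elem_nodes,
                                std::size_t num_nodes) {
    // node_elems[n] holds the already-colored elements that touch node n.
    std::vector<std::vector<int>> node_elems(num_nodes);
    std::vector<int> color(elem_nodes.size(), -1);

    for (std::size_t e = 0; e < elem_nodes.size(); ++e) {
        std::vector<bool> used;  // used[c] is true if a neighbor has color c
        for (int n : elem_nodes[e]) {
            assert(n >= 0 && static_cast<std::size_t>(n) < num_nodes);
            for (int other : node_elems[n]) {
                const std::size_t c = static_cast<std::size_t>(color[other]);
                if (c >= used.size()) used.resize(c + 1, false);
                used[c] = true;
            }
        }
        // Pick the smallest color not used by any neighbor.
        int c = 0;
        while (static_cast<std::size_t>(c) < used.size() && used[c]) ++c;
        color[e] = c;
        for (int n : elem_nodes[e]) node_elems[n].push_back(static_cast<int>(e));
    }
    return color;
}
```

With such a coloring, all elements of one color could scatter contributions to their nodes concurrently without write conflicts, processing the colors one after another.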

7 Progress in Data Layout
“Transpose” the storage
[Diagram: object-based layout (v1 v2 v3 v4 ... stored per object) versus array-based layout (v1 v2 v3 v4 ... held as “double**” arrays, indexed by “obj_idx” 0, 1, 2, ...)]
Common, existing access pattern:
    nd->Vector_Var( CURCOOR )
Becomes, in object layout:
    nd->data[ CURCOOR ]
in array layout:
    nd->data[ CURCOOR ][ nd->obj_idx ]
Object-based layout has more direct access to memory. Array-based layout has better cache & TLB behavior. Depending on the algorithm and problem size, the better memory behavior may or may not offset the extra dereferencing.
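To make the "transpose" concrete, here is a minimal sketch of the two layouts under simplifying assumptions: scalar variables only, and hypothetical names (`NodeObj`, `NodeArrays`, `COORD_X`, `VELOCITY_X`) that do not come from ALEGRA. It shows only the storage idea, with the array layout indexed as `data[var][obj_idx]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variable ids; ALEGRA's real ids (such as CURCOOR) are not modeled.
enum VarId { COORD_X = 0, VELOCITY_X = 1, NUM_VARS = 2 };

// Object-based layout: every node owns a small block with all of its variables.
struct NodeObj {
    double data[NUM_VARS];
};

// Array-based ("transposed") layout: one contiguous array per variable,
// indexed by the object's integer index.
struct NodeArrays {
    std::vector<std::vector<double>> data;  // data[var][obj_idx]
    explicit NodeArrays(std::size_t num_nodes)
        : data(NUM_VARS, std::vector<double>(num_nodes, 0.0)) {}
};

// Copy object-based storage into array-based storage.
NodeArrays transpose(const std::vector<NodeObj>& nodes) {
    NodeArrays out(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i)
        for (int v = 0; v < NUM_VARS; ++v)
            out.data[v][i] = nodes[i].data[v];
    return out;
}

// Sum one variable over all nodes; the loop walks contiguous memory.
double sum_var(const NodeArrays& a, VarId var) {
    assert(var < NUM_VARS);
    double s = 0.0;
    for (double x : a.data[var]) s += x;
    return s;
}
```

A loop that touches a single variable for every node reads one contiguous array in the transposed layout, which is the cache and TLB benefit the slide refers to; the price is the extra index lookup per access.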

8 Speedups: Object- versus Array-Based
Comparisons of unmodified versus array-based code.
Intel chips: RedSky = Nehalem, TLCC2 = SandyBridge.
The memory behavior wins over the extra offset in many cases.

9 Algorithms Should Use the Arrays Directly
(Oversimplified, hypothetical loop)
Object-based access:
    Element * el = 0;
    TOTAL_ELEMENT_LOOP(el) {
      const Vector vara = el->Vector_Var( VARA_IDX );
      Vector & varb = el->Vector_Var( VARB_IDX );
      el->Vector_Var( VARA_IDX ) += varb;
      el->Scalar_Var( VARC_IDX ) = vara * varb;
    }
Array-based access:
    ArrayView vara = mesh->getField( VARA_IDX );
    ArrayView varb = mesh->getField( VARB_IDX );
    ArrayView varc = mesh->getField( VARC_IDX );
    Element * el = 0;
    TOTAL_ELEMENT_LOOP(el) {
      const int ei = el->Idx();
      const Vector va = vara[ei];
      vara[ei] += varb[ei];
      varc[ei] = va * varb[ei];
    }
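The array-based loop on this slide depends on ALEGRA types that are not shown. A self-contained approximation, using `std::vector` as a stand-in for `ArrayView` and a hypothetical two-component `Vec2` in place of `Vector`, might look like the following. It assumes that `*` between two vectors on the slide means a dot product, since the result is stored in a scalar variable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ALEGRA's Vector type.
struct Vec2 { double x, y; };

inline Vec2& operator+=(Vec2& a, const Vec2& b) {
    a.x += b.x;
    a.y += b.y;
    return a;
}

// Assumed meaning of "vara * varb" on the slide: a dot product.
inline double dot(const Vec2& a, const Vec2& b) { return a.x * b.x + a.y * b.y; }

// Array-based kernel: a flat index loop over contiguous field arrays,
// with no element objects dereferenced inside the loop body.
void update_fields(std::vector<Vec2>& vara,
                   const std::vector<Vec2>& varb,
                   std::vector<double>& varc) {
    assert(vara.size() == varb.size() && vara.size() == varc.size());
    for (std::size_t ei = 0; ei < vara.size(); ++ei) {
        const Vec2 va = vara[ei];   // keep the old value before updating it
        vara[ei] += varb[ei];
        varc[ei] = dot(va, varb[ei]);
    }
}
```

Copying `va` before the update mirrors the slide's `const Vector va = vara[ei];`, so the scalar result uses the pre-update value just as the object-based loop does.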

10 Object List & Iteration Improvements
- Index based mesh object storage
- Enables iteration without dereferencing objects
- Performance comparison shows no improvement :(
  - Algorithms would have to take advantage first
Convert to use integer offsets
[Diagram: doubly linked lists (List, Data, Nodes) versus index sets (List, Data; indices 0, 1, 2, ...)]
Can now do this:
    for ( int i=0; i<N; ++i ) {
      int ni = index_list[i];
      vel[ni] = old_vel + dt * accl[ni];
      ...
    }
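The index-set loop above is elided on the slide. A compilable sketch of the same pattern follows; the function name `update_velocity` is made up for this example, and `old_vel` is treated as a per-node array, which is an assumption since the slide writes it without an index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Update velocities for the nodes named in index_list (illustrative only).
// Iteration uses integer offsets into flat arrays; no node objects are touched.
void update_velocity(const std::vector<int>& index_list,
                     const std::vector<double>& old_vel,  // assumed per-node
                     const std::vector<double>& accl,
                     double dt,
                     std::vector<double>& vel) {
    for (std::size_t i = 0; i < index_list.size(); ++i) {
        const int ni = index_list[i];
        assert(ni >= 0 && static_cast<std::size_t>(ni) < vel.size());
        vel[ni] = old_vel[ni] + dt * accl[ni];
    }
}
```

Nodes that do not appear in `index_list` are left untouched, so the same arrays can be shared by several index sets.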

11 Object Ordering Exploration
- Improve cache locality by mesh object ordering
- Hmm? No speedups over default ordering :(
Order elements by space filling curve [wikipedia]
Order nodes by first touch element loop
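The slide does not say which space filling curve was tried. As one simple example of the technique, the sketch below orders 2D cells along a Morton (Z-order) curve, chosen here only because it is short to write; the `Cell` struct and its integer grid coordinates are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Spread the low 16 bits of x so that a zero bit sits between each original bit.
std::uint32_t spread_bits(std::uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Interleave the bits of (ix, iy) into a single Morton key.
std::uint32_t morton2d(std::uint32_t ix, std::uint32_t iy) {
    return spread_bits(ix) | (spread_bits(iy) << 1);
}

// Hypothetical cell record holding integer grid coordinates.
struct Cell { std::uint32_t ix, iy; };

// Return a permutation of cell indices sorted along the Morton curve.
std::vector<int> morton_order(const std::vector<Cell>& cells) {
    std::vector<int> perm(cells.size());
    std::iota(perm.begin(), perm.end(), 0);
    std::stable_sort(perm.begin(), perm.end(), [&](int a, int b) {
        return morton2d(cells[a].ix, cells[a].iy) <
               morton2d(cells[b].ix, cells[b].iy);
    });
    return perm;
}
```

Cells that are close in space tend to receive nearby keys, so storing field data in `perm` order keeps spatial neighbors close in memory. For an unstructured mesh, the integer coordinates would first have to be derived, for instance by quantizing element centroids.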

12 Summary
- ALEGRA has adopted a low risk performance strategy
  - Main concept: incrementally rewrite algorithms towards NGP standards
- Progress made on support infrastructure
  - Array-based field data
  - Integer index set object looping
- 1.4X speedup realized on realistic simulations
- Work continues on infrastructure & algorithms
  - Data: Topology storage, integer field data, material data
  - Algorithms: Remap, Lagrangian step

