Adaptive Multiscale Simulation Infrastructure - AMSI

• Overview:
  o Industry Standards
  o AMSI Goals and Overview
  o AMSI Implementation
  o Supported Soft Tissue Simulation
  o Results

W.R. Tobin, D. Fovargue, D. Ibanez, M.S. Shephard
Scientific Computation Research Center
Rensselaer Polytechnic Institute

Current Industry Standards – Physical Simulations

• The overwhelming majority of numerical simulations conducted in HPC (and elsewhere) are single scale
  o Continuum (e.g. Finite Element, Finite Difference)
  o Discrete (e.g. Molecular Dynamics)
• Phenomena at multiple scales can have profound effects on the eventual solution to a problem (e.g. fine-scale anisotropies)

Current Industry Standards – Physical Simulations

• Typically a physical model or scale is simulated using a Single Program Multiple Data (SPMD) style of parallelism
  o Quantities of interest (mesh, tensor fields, etc.) are distributed across the parallel execution space

[Figure: geometric model, partition model, and distributed mesh]

Current Industry Standards – Physical Simulations

• Interacting physical models and scales introduce a much more complex set of requirements on our use of the parallel execution space
  o Writing a new SPMD code for each new multiscale simulation would require intense reworking of legacy codes used for single-scale simulations (possibly many times over)
• Need an approach which can leverage the work that has gone into creating and perfecting legacy simulations in the context of massively parallel simulations with interacting physical models

[Figure: a primary SPMD code coupled to several auxiliary SPMD codes]
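To make the reuse concrete, here is a minimal sketch (standard MPI only; the split policy and the commented run_* routine names are illustrative assumptions, not AMSI's actual scheme) of how one parallel execution space can host a primary SPMD code alongside auxiliary ones:

    /* Sketch: carve MPI_COMM_WORLD into per-code communicators so each
       legacy SPMD code runs unmodified on its own communicator. */
    #include <mpi.h>

    int main(int argc, char * argv[])
    {
      MPI_Init(&argc, &argv);
      int rank = 0, size = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* illustrative policy: first quarter of processes run the primary code */
      int color = (rank < size / 4) ? 0 : 1;
      MPI_Comm scale_comm;
      MPI_Comm_split(MPI_COMM_WORLD, color, rank, &scale_comm);

      /* each group runs its legacy code on scale_comm, e.g.
         color == 0 ? run_primary(scale_comm) : run_auxiliary(scale_comm); */

      MPI_Comm_free(&scale_comm);
      MPI_Finalize();
      return 0;
    }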

AMSI Goals

• Take advantage of proven legacy codes to address the needs of multimodel problems
  o Minimize the need to rework legacy codes to execute in a more dynamic parallel environment
  o The only desired edit/interaction points are those locations in the code where the values produced by multiscale interactions are needed
• Allow dynamic scale load-balancing and process scale reassignment to reduce process idle time when a scale is blocked or underutilized

AMSI Goals

• Hierarchy of focuses:
  o Abstract level: support for implementing multi-model simulations on massively parallel HPC machines
  o Simulation level: allow dynamic runtime workflow management to implement versatile adaptive simulations
  o Theory level: provide generic control algorithms (and hooks to allow specialization) supported by real-time minimal simulation meta-modeling
  o Developer level: facilitate all of the above while minimizing AMSI system overheads and maintaining robust code

[Figure: adaptive simulation control loop in which simulation goals, physics analysis, scale/physics linking models, and physical attributes feed simulation initialization and state control, while discretization, model, and scale-linking error estimates drive discretization, model, and scale-linking improvement under model hierarchy control limits based on measured parameters]

AMSI Goals

• Variety of end-users targeted
  o Application Experts: simulation end-users who want answers to various problems
  o Modeling Experts: introduce codes expressing new physical models; combine proven physical models in new ways to describe multiscale behavior
  o Computational Experts: introduce new discretization methods; introduce new numerical solution methods; develop new parallel algorithms

AMSI Overview

• General meta-modeling services
  o Support for modeling computational scale-linking operations and data: a model of scale-tasks and the task-relations denoting multiscale data transfer
  o Specializing this support will facilitate interaction with high-level control and decision-making algorithms

[Figure: two scales (scaleX, scaleY), each with explicit and computational domains, math and computational models, and explicit and computational tensor fields, joined by scale linking: geometric interactions, model relationships, and field transformations]

AMSI Overview

• Dynamic management of the parallel execution space
• Process reassignment will use load balancing support for the underlying SPMD distributed data, as well as state-specific entry/exit vectors for scale-tasks
  o Load balancing of scale-coupling data is supported by the meta-model of that data in the parallel space
  o Other data requires support for dynamic load balancing in any underlying libraries
  o Can be thought of as a hierarchy of load-balancing operations (sketched below): multiple scale-task communication/computation balancing above single scale-task load balancing (standard SPMD load balancing operators)
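A conceptual sketch of that two-level hierarchy (all types and names here are hypothetical, for illustration only; this is not AMSI's implementation):

    #include <vector>

    // hypothetical scale-task with its own standard SPMD balancer
    struct ScaleTask { void spmdBalance() { /* e.g. mesh repartitioning */ } };

    struct Simulation {
      std::vector<ScaleTask> & tasks() { return tasks_; }
      double measuredIdleRatio() const { return 0.0; } // measured at runtime
      double idleThreshold() const { return 0.2; }     // tunable policy knob
      void reassignProcesses() { /* migrate processes between scale-tasks */ }
      std::vector<ScaleTask> tasks_;
    };

    // inner level: per-task SPMD balancing; outer level: process
    // reassignment between scale-tasks when a scale is underutilized
    void balance(Simulation & sim)
    {
      for (ScaleTask & t : sim.tasks())
        t.spmdBalance();
      if (sim.measuredIdleRatio() > sim.idleThreshold())
        sim.reassignProcesses();
    }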

AMSI Implementation

• AMSI::ControlService
  o Primary application interaction point for AMSI; tracks the overall state of the simulation
  o Higher-level control logic uses this object to implement control decisions and update the simulation meta-model
• AMSI::TaskManager
  o Maintains the computational meta-model of the parallel execution space and the various simulation models
• AMSI::RelationManager
  o Manages computational scale-linking communication and the load balancing required for dynamic management of the parallel execution space
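A hypothetical C++ skeleton of this division of responsibility (the method names are illustrative assumptions; the real AMSI class interfaces may differ):

    // Hypothetical skeleton; actual AMSI interfaces may differ.
    class TaskManager
    {
    public:
      // declare a scale-task (e.g. "macro") and its process count in
      // the computational meta-model of the execution space
      int declareScaleTask(const char * name, int num_processes);
    };

    class RelationManager
    {
    public:
      // declare a scale-linking relation over which coupling data flows
      int declareRelation(int sending_task, int recving_task);
    };

    class ControlService
    {
    public:
      // primary application interaction point; tracks overall simulation
      // state and lets higher-level control update the meta-model
      TaskManager * getTaskManager();
      RelationManager * getRelationManager();
    };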

AMSI Implementation

• Real-time minimal simulation meta-model
  o Initialization actions: scale-tasks and their scale-linking relations
  o Runtime actions: data distributions representing discrete units of generic scale-linking data; communication patterns determining the distribution of scale-linking communication down to individual data distribution units
  o The shift to more dynamic scale management will require new control data to be reconciled across processes and scales, changing initialization actions into (allowable) runtime actions

[Figure: initialization declares scale-tasks (scaleX, scaleY, scaleZ); at runtime scaleX and scaleY exchange scale-linking data via communication patterns]

AMSI Implementation

• Two forms of parallel communication for control data
  o Assembly is a collective process over a single scale-task
  o Reconciliation is collective over the union of two scale-tasks associated by a communication relation

[Figure: assembly within scaleX and within scaleY; reconciliation between scaleX and scaleY]
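In MPI terms the two forms might look as follows (a sketch assuming each scale-task has its own communicator and a union communicator exists for each relation; the calls are standard MPI, but the mapping to AMSI internals is an assumption):

    #include <mpi.h>

    /* Assembly: collective over one scale-task's communicator, e.g.
       gathering each process's share of a data distribution. */
    void assemble(MPI_Comm scale_comm, int local_count, int * all_counts)
    {
      MPI_Allgather(&local_count, 1, MPI_INT,
                    all_counts, 1, MPI_INT, scale_comm);
    }

    /* Reconciliation: collective over the union of two related
       scale-tasks, e.g. sharing assembled control data across scales. */
    void reconcile(MPI_Comm union_comm, int root, int * counts, int n)
    {
      MPI_Bcast(counts, n, MPI_INT, root, union_comm);
    }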

AMSI Implementation

• Scale-linking communication patterns
  o Constructed via standard distribution algorithms, or
  o Hooks provided for user-implemented pattern construction, unique to each data distribution:

    CommPatternAlgo_Register(relation_id, CommPatternCreate_FuncPtr);
    CommPattern_Create(dataDist_id, owner_scale_id, foreign_scale_id);

[Figure: a communication pattern mapping data units from scaleX to scaleY]
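A usage sketch of these calls (the callback signature is an assumption made for illustration; consult the AMSI headers for the actual CommPatternCreate_FuncPtr type):

    /* assumes the AMSI header declaring the two calls above */

    /* hypothetical user-supplied pattern construction routine: decide,
       per data-distribution unit, which foreign-scale process gets it */
    void myPatternCreate(int dataDist_id, int owner_scale_id,
                         int foreign_scale_id);

    void setupScaleLinking(int relation_id, int dataDist_id,
                           int owner_scale_id, int foreign_scale_id)
    {
      /* register the custom algorithm for this relation, then build
         the pattern for a specific data distribution */
      CommPatternAlgo_Register(relation_id, &myPatternCreate);
      CommPattern_Create(dataDist_id, owner_scale_id, foreign_scale_id);
    }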

AMSI Implementation

• Scale-linking communication is handled, on both sides, via a single function call:

    Communicate(relation_id, pattern_id, buffer, MPI_Datatype);

  o Determines whether the process belongs to the sending or receiving scale-task
  o Communicates scale-linking quantities guided by a communication pattern
  o The buffer is a contiguous memory segment packed with POD data; the MPI_Datatype argument must describe that data
  o At present a data distribution is limited to one POD representation
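For example (a sketch: the Communicate call is from the slide above, the datatype construction is standard MPI, and the function name and packing step are illustrative assumptions):

    #include <mpi.h>
    #include <stdlib.h>
    /* assumes the AMSI header declaring Communicate */

    void sendDisplacements(int relation_id, int pattern_id, int num_units)
    {
      /* describe one data-distribution unit (a displacement 3-vector) */
      MPI_Datatype vec3_t;
      MPI_Type_contiguous(3, MPI_DOUBLE, &vec3_t);
      MPI_Type_commit(&vec3_t);

      /* contiguous POD buffer, one 3-vector per distribution unit */
      double * buffer = (double *)malloc(3 * num_units * sizeof(double));
      /* ... pack displacement values into buffer ... */
      Communicate(relation_id, pattern_id, buffer, vec3_t);

      MPI_Type_free(&vec3_t);
      free(buffer);
    }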

AMSI Implementation

• Shift to phased communication and dynamic scale-task management will introduce new requirements
  o Will reduce the number of explicit control data reconciliations
  o Will require the introduction of implicit control data reconciliations during scale-linking operations, the primary simulation control points

[Figures: scaleX and scaleY performing assemble, reconcile, and communicate as separate steps, then with reconcile/communicate merged alongside compute]
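The intended computation/communication overlap can be pictured with standard nonblocking MPI (illustrative only; AMSI's phased implementation is not shown in the slides, and the commented routine names are placeholders):

    #include <mpi.h>

    /* post the receive early, compute while results are in flight,
       and block only when the coupled results are actually needed */
    void phasedReceive(MPI_Comm comm, int src_rank, int tag,
                       double * micro_results, int n)
    {
      MPI_Request req;
      MPI_Irecv(micro_results, n, MPI_DOUBLE, src_rank, tag, comm, &req);
      /* do_independent_macro_work();  -- overlapped computation */
      MPI_Wait(&req, MPI_STATUS_IGNORE);
      /* consume_micro_results(micro_results); -- results needed here */
    }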

Biotissue

• Multiscale soft-tissue mechanics simulation
  o Engineering scale: macroscale (Finite Element Analysis)
  o Fine scales controlling engineering-scale behavior: microscale fiber-only RVE (quasistatics); microscale fiber-matrix RVE (FEA); (future project) additional cellular scale(s) (FEA), intermediate between the current scales
  o Scale linking: deformations passed down to the RVE; force/displacement passed up to the engineering scale

[Figure: macroscale model and fiber-only RVE]

Biotissue Implementation

• Scalable implementation with parallelized scale-tasks

[Figure: macroscale processes macro0 through macroN linked to microscale processes micro0 through microM]

Biotissue Implementation

• Scalable implementation with parallelized scale-tasks
  o The ratio of macroscale mesh elements per macroscale process to the number of microscale processes determines the neighborhood of scale-linking communication

[Figure: macroscale-to-microscale communication neighborhoods]
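As a sketch of this mapping (a hypothetical block distribution of RVEs over a macro process's neighborhood; the real code may assign them differently):

    /* map a macro process's local element index onto its neighborhood
       of micro processes; all names and the policy are illustrative */
    int micro_rank_for(int macro_rank, int local_elem, int elems_per_macro,
                       int num_micro, int num_macro)
    {
      int micro_per_macro = num_micro / num_macro;  /* neighborhood size  */
      int first = macro_rank * micro_per_macro;     /* neighborhood start */
      return first + (local_elem * micro_per_macro) / elems_per_macro;
    }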

Biotissue Implementation

• Macroscale: parallel Finite Element Analysis
  o Distributed partitioned mesh, distributed tensor fields defined over the mesh, distributed linear algebraic system
  o Stress field values characterize the macro-micro ratio
• Fiber-only microscale: quasistatics code
  o ~1k nodes per RVE
  o Rapid assembly and solve times per RVE in the serial implementation
  o Strong scaling with respect to macroscale mesh size
  o Initial results use fiber-only RVEs at every macroscale integration point to generate stress field values
• Fiber-matrix microscale: parallel FEA
  o An order of magnitude more nodes per RVE (~10k-40k)
  o More complex linear system assembly and longer (nonlinear) solve times necessitate a parallel implementation per RVE

Biotissue Implementation

• Incorporating fiber-and-matrix microscale RVEs
  o Hierarchy of parallelism: macroscale SPMD code; microscale fiber-only code; microscale fiber-matrix SPMD code
  o Nonlinear problem
  o Macroscale-to-auxiliary-scales relation becomes more complex: constitutive relation, fiber-only RVE, or fiber-matrix RVE
  o Adaptive processes allow these relations to change over time
• The intermediate cellular scale will introduce even further complexities to this situation

Results

• The Biotissue simulation was run with a test problem
  o Standard tensile-test macroscale geometry (dogBone)
  o Various discretizations of the geometry: current results are for 20k and 200k elements; working on (microscale) memory issues for 2M elements and higher
  o Holding the macroscale process count fixed, varying microscale
  o Holding the microscale process count fixed, varying macroscale
  o Varying both scales


Results

[Plot: time (s) vs. number of microscale processes; 1st iteration of the multiscale solver; 20k mesh, 2 macro processes]

Results

[Plot: time % vs. number of microscale processes; 1st iteration of the multiscale solver; 20k mesh, 2 macro processes]

Results (varying macroscale while holding micro fixed)

[Plot: time (s) vs. number of macroscale processes; 1st iteration of the multiscale solver; 200k mesh, 7680 micro processes]

Results

[Plot: communication time % vs. number of microscale processes; 1st iteration of the multiscale solver; 200k mesh time ratios; arrows indicate increasing macro size (4, 8, 16, 32, 64)]

Results (weak scaling)

[Plot: time (s) vs. number of macroscale processes]

Closing Remarks

• Results are just starting to come out of the implementation
  o Need to identify critical areas of each scale code to improve the overall performance of the multiscale code
  o The shift to phased communication will allow the macroscale to process microscale results as they arrive, increasing computation/communication overlap
  o The contributing microscale code needs memory footprint improvements to mitigate running out of memory during longer runs (larger meshes)