Parallel Algorithm Oriented Mesh Database


Jean-François Remacle, Mark S. Shephard & Joseph E. Flaherty
Scientific Computation Research Center, Rensselaer Polytechnic Institute
remacle@scorec.rpi.edu
http://www.scorec.rpi.edu/AOMD

Outline
  Algorithm Oriented Mesh Data Structure (AOMD)
  Parallel AOMD
  Technical issues
  Parallel adaptive example (DG)

Motivations of AOMD and PAOMD
  The aim of AOMD is to provide services to mesh users
    Geometry-based analysis: the relation of the mesh to the model is maintained
    Support for dynamic mesh adjacencies
    Parallel services: message passing, adaptivity and load balancing capabilities (callback pattern)
    No MPI calls visible to users; PAOMD hides parallel issues
  AOMD and PAOMD are a toolbox
    Standard C++
    Iterators, generic design
    3000 lines of code for the serial part
    1000 lines of code for the parallel part
    Compiles in 5 minutes with gcc 3.0
    Open source (BSD): http://www.scorec.rpi.edu/AOMD

Basics of the Algorithm Oriented Mesh Database
  A mesh entity is described by a set of lower-dimension entities
    All vertices are always required
    Vertices are atomic mesh entities and must be differentiated (using IDs, coordinates or anything else consistent)
    Two entities are equal if their sets of vertices are equal
    Not absolutely general, but key to a practical implementation
    Allows mesh entities to be compared (<, >, =) independently of their representation
    Some associative containers for mesh entities: add, remove, search
  Minimum information
    Entities classified on model entities of the same dimension must be present: all vertices, all regions, all edges classified on model edges and all faces classified on model faces
    This is a sufficient minimum: no geometrical checks
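The equality rule above (two entities are equal when their vertex sets are equal) can be sketched in a few lines. This is only an illustration under stated assumptions: `Entity`, `EntityLess`, `sameEntity` and `EntitySet` are hypothetical names, not AOMD classes, and vertices are assumed to be differentiated by integer IDs.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mesh entity: it stores only its vertex IDs.
class Entity {
public:
    explicit Entity(std::vector<int> vertexIds) : ids_(std::move(vertexIds)) {
        assert(!ids_.empty());
        // Sorting gives a canonical form, so the order in which the
        // vertices were listed no longer matters.
        std::sort(ids_.begin(), ids_.end());
    }
    const std::vector<int>& vertexIds() const { return ids_; }

private:
    std::vector<int> ids_;
};

// Strict weak ordering: lexicographic comparison of the sorted vertex IDs.
struct EntityLess {
    bool operator()(const Entity& a, const Entity& b) const {
        return a.vertexIds() < b.vertexIds();
    }
};

// Two entities are equal if their vertex sets are equal.
inline bool sameEntity(const Entity& a, const Entity& b) {
    return a.vertexIds() == b.vertexIds();
}

// Associative container supporting add, remove and search.
using EntitySet = std::set<Entity, EntityLess>;
```

With this ordering, a face listed as (3, 1, 2) and the same face listed as (2, 3, 1) land on the same key, so `EntitySet::insert` rejects the duplicate and `find` locates it without any geometrical check.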

Basics of the Parallel AOMD
  Partition boundaries are treated like model boundaries
    Equal order mesh entities must exist on partition boundaries (partition faces, edges and vertices)
    Mesh vertices must be differentiated among partitions: same IDs, same coordinates, ...
    On processor: serial AOMD
  Implementation aspects
    Simplicity: no master, no owner
    Rounds of communication are standardized, no MPI calls are visible, messages are automatically packed

Parallel AOMD - Mesh Adaptation
  Target is transient applications with thousands of mesh adaptation steps
    Want fast and simple adaptation
    Need efficient inter-processor communications
  Mesh refinement
    Apply templates
    Include support for non-conforming meshes and multigrid
    Refined entities with remote copies must be split on all partitions
  Mesh coarsening
    Collect all mesh entities involved onto one partition
    Carry out the operation using serial operators on processor

Dynamic Load Balancing and Mesh Migration
  Dynamic load balancing is needed after mesh adaptation
  Procedures build on the balancing procedures in Zoltan (from Sandia)
  The load balancing procedure indicates which mesh entities are to be migrated to which processor
  PAOMD migrates only the minimum set, unless the user specifically asks to migrate other entities
  [Figure: classification after load balancing and before migration]
  [Figure: configuration after migration]

Mesh Migration
  Steps in process
    Collect the mesh entities to be migrated to another partition
    Determine the higher order mesh entities that must also be migrated (use AOMD to determine the minimal set needed)
    Collect the entities and any user-attached data
    Perform communications to send entities and update links
  Message passing
    At the PAOMD operator level it appears that messages are sent one at a time
    This would lead to unacceptable communication costs
    Message packing is used: AUTOPACK (from Argonne)
    Automatically controls the message packing process
    Includes information and tools to optimize message size for the network architecture used
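The message packing idea above can be illustrated with a small sketch: the caller posts messages one at a time, while the packer silently aggregates them into one buffer per destination. This is not the Autopack API; `MessagePacker`, `post`, `take` and `unpack` are hypothetical names, and the length-prefixed layout is an assumption chosen for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical packer: one growing byte buffer per destination processor.
class MessagePacker {
public:
    // Queue one small message for `dest`. Nothing is sent yet.
    void post(int dest, const std::string& payload) {
        assert(payload.size() <= 0xFFFFFFFFu);
        std::vector<char>& buf = buffers_[dest];
        const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
        const char* p = reinterpret_cast<const char*>(&len);
        buf.insert(buf.end(), p, p + sizeof len);               // length prefix
        buf.insert(buf.end(), payload.begin(), payload.end());  // message body
    }

    // Number of packed buffers, i.e. of actual sends that would be issued.
    std::size_t pendingBuffers() const { return buffers_.size(); }

    // Hand over the packed buffer for `dest`; a real code would pass it
    // to the transport layer in a single send.
    std::vector<char> take(int dest) {
        std::vector<char> out;
        auto it = buffers_.find(dest);
        if (it != buffers_.end()) {
            out.swap(it->second);
            buffers_.erase(it);
        }
        return out;
    }

private:
    std::map<int, std::vector<char>> buffers_;
};

// Receiver side: split a packed buffer back into the original messages.
inline std::vector<std::string> unpack(const std::vector<char>& buf) {
    std::vector<std::string> messages;
    std::size_t pos = 0;
    while (pos + sizeof(std::uint32_t) <= buf.size()) {
        std::uint32_t len = 0;
        std::memcpy(&len, buf.data() + pos, sizeof len);
        pos += sizeof len;
        if (pos + len > buf.size()) break;  // truncated buffer: stop
        messages.emplace_back(buf.data() + pos, len);
        pos += len;
    }
    return messages;
}
```

Posting three messages to two destinations yields two buffers rather than three sends, which is the cost reduction the slide refers to. Tuning the buffer size to the network is left out of this sketch.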

Implementation issues
  Design
    C++ and generic programming
    STL, efficient hashing function
    AOMD::iterators follow the C++ standard; std:: algorithms (>100) may be applied in combination with AOMD::iterators
    AOMD::algorithms available: adjacency creation, building a graph, building a tree in a mesh, edge collapsing...
    Some OO patterns: Singleton, Visitor, Memento...
  Tradeoff between efficiency and flexibility
    We believe there is no tradeoff
    Templates, functors, inlining... C++ can be efficient
    Classical example, quick sort: std::sort is twice as fast as C qsort (with VC6)
  External libraries for parallel
    Autopack: automatic message packing
    Zoltan: dynamic load balancing and partitioning
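The quick sort example can be made concrete by placing the two interfaces side by side. The sketch below only shows why the comparison is plausible: `std::qsort` calls its comparator through a function pointer on untyped memory, while `std::sort` is a template that can inline a functor. The helper names are hypothetical and no timing is claimed here; the factor of two quoted above is the authors' measurement with VC6.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Comparator for qsort: reached through a function pointer, operating on
// untyped memory, which usually prevents inlining.
inline int compareInts(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Sort a copy of `v` with the C library routine.
inline std::vector<int> sortWithQsort(std::vector<int> v) {
    if (!v.empty()) {
        std::qsort(v.data(), v.size(), sizeof(int), compareInts);
    }
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}

// Functor for std::sort: its type is known at compile time, so the
// comparison can be inlined into the sorting loop.
struct IntLess {
    bool operator()(int a, int b) const { return a < b; }
};

// Sort a copy of `v` with the generic algorithm.
inline std::vector<int> sortWithStdSort(std::vector<int> v) {
    std::sort(v.begin(), v.end(), IntLess());
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}
```

Both functions return the same sorted sequence; any speed difference comes from how the comparator is called, not from the results.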

Mesh refinement
  Conformal or not (hanging nodes or mixed meshes)

    class myAOMD_RefCallback {
    public:
      int operator()(const meshEntity *);
      void callback(std::list<meshEntity *> &before,
                    std::list<meshEntity *> &after);
    };

  The Algorithm

    AOMD::RefUnref(theMesh, myAOMD_RefCallback);
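How such a callback might be driven can be sketched on a toy 1-D mesh. Everything here is an assumption for illustration: `Interval`, `refine` and `SplitLongIntervals` are hypothetical, and the reading of the two members (`operator()` decides whether an entity is split, `callback` lets the user move data from the entities before the split to those after it) is inferred from the signatures above, not taken from AOMD documentation.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <utility>
#include <vector>

// Toy "mesh entity": an interval [a, b] carrying one user value.
struct Interval {
    double a;
    double b;
    double value;
};

// Toy driver: bisects every interval the callback marks (return value > 0)
// and lets the callback transfer user data from parent to children.
template <class Callback>
void refine(std::vector<std::unique_ptr<Interval>>& mesh, Callback& cb) {
    std::vector<std::unique_ptr<Interval>> result;
    for (auto& cell : mesh) {
        assert(cell);
        if (cb(cell.get()) > 0) {
            const double mid = 0.5 * (cell->a + cell->b);
            std::unique_ptr<Interval> left(new Interval{cell->a, mid, 0.0});
            std::unique_ptr<Interval> right(new Interval{mid, cell->b, 0.0});
            std::list<Interval*> before{cell.get()};
            std::list<Interval*> after{left.get(), right.get()};
            cb.callback(before, after);  // user moves data before -> after
            result.push_back(std::move(left));
            result.push_back(std::move(right));
        } else {
            result.push_back(std::move(cell));
        }
    }
    mesh.swap(result);
}

// Example user callback: split intervals longer than maxLength and copy
// the parent's value to both children.
struct SplitLongIntervals {
    double maxLength;

    int operator()(const Interval* e) const {
        return (e->b - e->a) > maxLength ? 1 : 0;
    }

    void callback(std::list<Interval*>& before,
                  std::list<Interval*>& after) const {
        for (Interval* child : after) {
            child->value = before.front()->value;
        }
    }
};
```

The driver never inspects the user data itself, which is the point of the callback pattern: the mesh tool owns the topology change and the user owns what happens to the attached fields.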

Communications
  Messages are packed (autopack)

    class myAOMD_RoundOfComm {
    public:
      char *sendBuffer(const meshEntity *, int dest_proc, size_t &sizebuf) const;
      void recvBuffer(const meshEntity *, int src_proc, char *buf) const;
    };

  The Algorithm

    AOMD::roundOfComm(theMesh, myAOMD_RoundOfComm);
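A round of communication built on a send/receive callback pair can be mimicked inside a single process. The sketch below is an assumption-laden toy, not PAOMD: two partitions share a list of boundary vertices, `SumExchanger` and this `roundOfComm` are hypothetical, and `std::vector<char>` replaces the raw `char *` buffers of the slide so that buffer ownership stays unambiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <vector>

// Toy boundary vertex, identified by the same ID on both partitions.
struct Vertex {
    int id;
};

// Hypothetical user callback: adds the remote copy's value to the local one.
class SumExchanger {
public:
    explicit SumExchanger(std::map<int, double>& values) : values_(values) {}

    // Pack the local value attached to vertex v.
    std::vector<char> sendBuffer(const Vertex* v, int /*destProc*/) const {
        const double x = values_.at(v->id);
        std::vector<char> buf(sizeof x);
        std::memcpy(buf.data(), &x, sizeof x);
        return buf;
    }

    // Unpack the remote value and accumulate it on the local copy of v.
    void recvBuffer(const Vertex* v, int /*srcProc*/,
                    const std::vector<char>& buf) const {
        assert(buf.size() == sizeof(double));
        double x = 0.0;
        std::memcpy(&x, buf.data(), sizeof x);
        values_[v->id] += x;
    }

private:
    std::map<int, double>& values_;  // per-partition data, owned by the caller
};

// Toy driver for two partitions (0 and 1) in one process: pack everything
// first, then deliver, so each side receives pre-exchange values.
template <class Exchanger>
void roundOfComm(const std::vector<Vertex>& shared,
                 const Exchanger& part0, const Exchanger& part1) {
    std::vector<std::vector<char>> to1;
    std::vector<std::vector<char>> to0;
    for (const Vertex& v : shared) {
        to1.push_back(part0.sendBuffer(&v, 1));
        to0.push_back(part1.sendBuffer(&v, 0));
    }
    for (std::size_t i = 0; i < shared.size(); ++i) {
        part1.recvBuffer(&shared[i], 0, to1[i]);
        part0.recvBuffer(&shared[i], 1, to0[i]);
    }
}
```

After one round both copies of each shared vertex hold the same summed value, and the user code contains no message-passing calls, only what to pack and what to do on receipt.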

Load balancing
  Messages are packed (autopack)

    class myAOMD_LBCallback {
    public:
      char *sendBuffer(const meshEntity *, int dest_proc, size_t &sizebuf) const;
      void recvBuffer(const meshEntity *, int src_proc, char *buf) const;
    };

  The Algorithm

    AOMD::LB(theMesh, myAOMD_LBCallback);

Demonstration of Load Balancing
  Available at http://www.scorec.rpi.edu/AOMD

Demonstration of Load Balancing

2-D Animation of Instability
  Linear DG elements, 30,000 to 800,000 dof
  Atwood number, A = 1/3
  10 Fourier modes in "random" distribution
  Time for the bubble to reach the top of the window (y = 0.5): 5 sec
  This calculation: a = 0.06
  Experiments: a = 0.058 - 0.065
  Theory (Glimm, et al.): a = 0.045 - 0.06

Refined 3-D Meshes for Rayleigh-Taylor Instability
  Non-conforming hexahedron mesh
  [Figure labels: light fluid, heavy fluid; 24, 72 and 104 steps of refinement]

Conclusions
  PAOMD advantages
    Quite small piece of software, documented
    Focused: mesh management only
    Asks for minimum user knowledge about parallel issues
    Efficient implementation
  Future work
    Terascale computers
    PAOMD concepts are theoretically scalable
    Hardware heterogeneity: machine and network models have to be added to partitioners
  [Figure: 64 procs, 40 GDof]