2D AFEAPI Overview Goals, Design Space Filling Curves Code Structure


2D AFEAPI
- Overview: goals, design
- Space Filling Curves: key generation/indexing scheme, mesh partitioning, ordering, hashtable
- Code Structure: preprocessing, AFEAPI code, postprocessing; node, element, and hashtable classes
Available at: http://wings.buffalo.edu/eng/mae/acm2e/afeapi_download.html

Difficulties of Parallel Adaptive Codes
- Dynamic allocation of memory
- Dynamic creation/deletion of objects (e.g., nodes, elements)
- Dynamic load balancing
- Dealing with on-processor, off-processor, and global information
- Global constraints

Infrastructure Requirements
- Ability to insert and delete objects during simulation: dynamic allocation and de-allocation of memory
- Automatically distribute/redistribute data and computation among processors: dynamic load balancing
- Maintain irregularity and other refinement constraints on distributed data

Data/Computation Management
- Persistent geometric/mesh data
  - Geometry: vertices, edges, faces, regions (Shephard, Flaherty, ...)
  - Mesh: nodes, elements, edges(?), faces(?)
- Dynamic computational data: matrices, intermediate solution vectors
  - Matrices are generated as additive blocks following the distribution of the mesh
  - Vectors follow the distribution of the nodes

Data/Computation Management
Data distribution/scheduling is achieved by:
- Assigning to each object a key that is derived from the data itself
- These keys define a simple ordering scheme for the data
- Partitioning the key space produces a data/computation distribution

Space Filling Curve
A space filling curve h_n maps the unit interval onto the unit n-dimensional hypercube:
- h_n is continuous
- h_n can completely fill the unit n-dimensional hypercube: for any point z in [0,1]^n there exists t in [0,1] with h_n(t) = z

Space Filling Curve
Characteristics of the SFC ordering (important in the case of adaptive meshes):
- Geometric locality: if points in the n-dimensional hypercube are close to each other, then their preimages under h_n are also close to each other in the mean sense -- this can help cache performance too
- Sub-cube property: if the entire domain is split up in a recursive fashion, the curve passes through all points in each sub-cube at a particular level before going through points in a neighboring sub-cube
- Self-similarity: like fractals, the curve can be generated from a basic stencil
- Possible future use for integrating GIS and simulation data

Space Filling Curve
- A unique key (identifier) for an object can be created from the SFC
- The key is stored as an array of keylength unsigned integers
- The location is mapped from [0,1]^d to the key space K = [0, keylength * (maximum unsigned integer)]
- Within a key, the leftmost bit is the most significant and the rightmost bit is the least significant
- Objects whose keys have "close" values in the leading digits are near each other; objects whose keys differ in the leading digits will likely be far from each other
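As a concrete sketch of this key generation, the following uses simple Morton-order bit interleaving and a single 32-bit key; the function name, the interleaving scheme, and the single-word key are illustrative assumptions, not necessarily the curve or key layout AFEAPI actually uses:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of SFC key generation for a 2-D point in [0,1]^2.
// Each coordinate is quantized to bits_per_dim bits, and the bits are
// interleaved so that the leftmost bits of the key are the most
// significant -- nearby points share leading digits, as described above.
uint32_t sfc_key_2d(double x, double y, int bits_per_dim = 16) {
    uint32_t ix = static_cast<uint32_t>(x * ((1u << bits_per_dim) - 1));
    uint32_t iy = static_cast<uint32_t>(y * ((1u << bits_per_dim) - 1));
    uint32_t key = 0;
    for (int b = bits_per_dim - 1; b >= 0; --b) {
        key = (key << 1) | ((ix >> b) & 1u);  // interleave a bit of x
        key = (key << 1) | ((iy >> b) & 1u);  // then a bit of y
    }
    return key;
}
```

A production code would store the result in an array of keylength unsigned integers, as the slide describes, to get more resolution than one machine word allows.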

Mesh Partitioning
To achieve good parallel efficiency, the computational load should be equally distributed (if the computational load changes dynamically, mesh partitioning needs to be done dynamically as well).
A good partitioning is load balanced (all processors are equally used) and minimizes communication (communication takes a lot of time).
Two basic types of partitioning:
- Based on the graph (connectivity) of the mesh: mesh quality high, cost high
- Based on geometric/mesh traversal: mesh quality fair, cost low

Mesh Partitioning and Repartitioning
The Space Filling Curve (SFC) based algorithm is a geometric mesh partitioning algorithm.
Idea: use the SFC ordering because it is easier to order objects and split a sorted list in 1 dimension than in dimensions greater than 1.
1. Determine the key of each element given by the SFC algorithm
2. Sort the list of objects by their keys
3. Divide the list into P equal pieces
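The three steps above can be sketched as follows; this is an illustrative serial version (the struct and function names are invented for this example, and a real parallel implementation would use a parallel sort and exchange only partition boundaries):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One mesh element, reduced to its SFC key and an id.
struct ElemRec { uint32_t key; int id; };

// SFC partitioning sketch: sort elements by SFC key, then cut the sorted
// 1-D list into P nearly equal contiguous pieces, one per processor.
std::vector<std::vector<ElemRec>> sfc_partition(std::vector<ElemRec> elems,
                                                int P) {
    std::sort(elems.begin(), elems.end(),
              [](const ElemRec& a, const ElemRec& b) { return a.key < b.key; });
    std::vector<std::vector<ElemRec>> parts(P);
    const std::size_t n = elems.size();
    for (std::size_t i = 0; i < n; ++i)
        parts[i * P / n].push_back(elems[i]);  // processor owning element i
    return parts;
}
```

Because the list is sorted by SFC key, each contiguous piece corresponds to a geometrically compact region of the mesh, which is what keeps communication low.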

Mesh Partitioning and Repartitioning
Problem: repartitioning causes massive data migration and loss of parallel efficiency.
Solution: predictive load balancing strategies are used to compute incremental modifications. Load balancing is performed before the mesh is adapted, based on the expected amount of work after mesh adaptation.

Predictive Load Balancing

Hashtable
- The hashtable is used to access objects quickly while decreasing memory usage
- The hashtable size should be much larger than the number of objects accessed through it

Hashing
- An object is put in the hashtable according to its SFC key
- An address calculator finds an object's address from the SFC key and the minimum and maximum keys for a given processor
- The minimum and maximum key values are calculated when the hashtable is created; if an object has a key less than the minimum key or greater than the maximum key, it is put at the beginning or end of the hashtable, respectively
- Objects with larger key values are placed after objects with smaller key values
- If a particular place in the hashtable is already occupied, the objects at that slot are stored in a linked list
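A minimal sketch of the address-calculator idea; the function name and the linear mapping from key range to table index are assumptions for illustration, not AFEAPI's actual formula:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Map an SFC key into a hashtable slot in [0, table_size), given this
// processor's minimum and maximum keys. Keys below the minimum go to the
// front of the table, keys above the maximum to the back, as described
// above; in-range keys are placed proportionally, so larger keys land at
// larger addresses (preserving the SFC ordering). Collisions at a slot
// would be handled by chaining objects in a linked list.
std::size_t hash_address(uint64_t key, uint64_t min_key, uint64_t max_key,
                         std::size_t table_size) {
    if (key <= min_key) return 0;               // below range: beginning
    if (key >= max_key) return table_size - 1;  // above range: end
    double frac = static_cast<double>(key - min_key) /
                  static_cast<double>(max_key - min_key);
    return static_cast<std::size_t>(frac * (table_size - 1));
}
```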

AFEAPI Hashtable

Code Structure

Using AFEAPI
Instructions at: http://wings.buffalo.edu/eng/mae/acm2e/afeapi_download.html (use version AFEAPI_VBR_04.tar.gz)
1. Preprocessing
   1. Create mesh file (preferably using Hypermesh)
   2. Run serial preprocessing code, specifying material properties and the number of processors to use
2. Run parallel code
3. Serial postprocessing in Tecplot

Code Structure
1. Set up MPI (initialize, create MPI communication structures)
2. Read in data / create persistent data (e.g., hashtable)
3. Create local and global ordering of dof
4. Solve
   - Calculate sparse storage info (VBR sparse storage scheme)
   - Assemble stiffness matrices (eliminate bubble dof if they exist)
   - Solve global system (reconstruct bubble dof if they were eliminated)
5. Postprocess
   - Calculate constrained nodes
   - Put solution in node objects
   - Calculate error estimate
   - Create results file for use in Tecplot
6. If error estimate is below desired tolerance: end; otherwise, go on to step 7
7. Perform predictive load balancing/mesh partitioning
8. Refine the element size (h-adapt)
9. Refine the element polynomial order (p-adapt)
10. Smooth load balance/mesh partitions
11. Go to step 3

Code Customization
Customization for adaptive static hp-FEM requires providing routines to compute the element stiffness and error, e.g.:
  subroutine elemcom(ifg, nequ, ndff, Nc, Norder, Nelb, bcvalue, Icon, xnod, ek, ef)
  subroutine errest(nequ, Norder, xnod, Utemp, Nelb, bcvalue, Icon, errorsq, solsq)
These can be recycled from old FORTRAN codes!!
For dynamic calculations or other discretization methods, customization may be a little more involved.

Important C++ Classes in AFEAPI
There are 3 major classes: the node class, the element class, and the hashtable class.
- HASHTABLE class (csrc/header/hashtab.h): used for accessing node and element objects
- NODE class (csrc/header/node.h)
  - Key is generated from the node coordinates
  - 3 types of nodes: vertex, edge, and bubble
- ELEMENT class (csrc/header/element2.h)
  - Only quadrilateral elements are used/allowed
  - Geometry is defined by 9 nodes
  - Element key is generated from the bubble node coordinates
  - Node ordering is counterclockwise
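The following outline is hypothetical (field and function names are invented for illustration; see csrc/header/node.h, element2.h, and hashtab.h for the real declarations), but it shows the pattern described above: node and element objects carry SFC keys stored as fixed-size arrays of unsigned integers, compared lexicographically with the leftmost word most significant:

```cpp
#include <cassert>

const int KEYLENGTH = 2;  // keys stored as arrays of unsigned integers

// Illustrative node: SFC key generated from the node coordinates.
struct Node {
    unsigned key[KEYLENGTH];
    double coord[2];
    int type;  // 0 = vertex, 1 = edge, 2 = bubble
};

// Illustrative element: key generated from the bubble-node coordinates;
// geometry defined by 9 nodes, ordered counterclockwise.
struct Element {
    unsigned key[KEYLENGTH];
    unsigned node_key[9][KEYLENGTH];
};

// Compare two keys lexicographically: the leftmost (most significant)
// word decides first. Returns -1, 0, or 1.
int compare_key(const unsigned* a, const unsigned* b) {
    for (int i = 0; i < KEYLENGTH; ++i) {
        if (a[i] < b[i]) return -1;
        if (a[i] > b[i]) return 1;
    }
    return 0;
}
```

A comparison like this is what lets the hashtable and the partitioner treat multi-word keys as a single 1-D ordering.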