Published by Katrina Dixon. Modified over 9 years ago.
1 Application Paradigms: Unstructured Grids
CS433, Spring 2001
Laxmikant Kale
2 Unstructured Grids
Typically arise in the finite element method:
  – E.g., space is tiled with triangles of variable size and shape
  – In 3D: may be tetrahedra or hexahedra
  – Allows one to adjust the resolution in different regions
The base data structure is a graph
  – Often represented as a bipartite graph, e.g. triangles (elements) and nodes
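A minimal sketch of the bipartite element/node structure, assuming a tiny made-up mesh of two triangles over four nodes. The names `elements` and `build_node_to_elements` are illustrative only and do not come from the course material.

```python
from collections import defaultdict

# Hypothetical mesh: element id -> ids of the nodes at its corners.
elements = {0: (0, 1, 2), 1: (1, 3, 2)}

def build_node_to_elements(elements):
    """Invert the element->node lists to get the node->element side of the graph."""
    node_to_elems = defaultdict(list)
    for elem_id, node_ids in elements.items():
        for n in node_ids:
            node_to_elems[n].append(elem_id)
    return dict(node_to_elems)

node_to_elems = build_node_to_elements(elements)
# Nodes 1 and 2 lie on the edge shared by both triangles.
```

Edges in this graph only ever join an element to a node, which is what makes it bipartite.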
3 Unstructured grid computations
Typically:
  – Attributes (stresses, strains, pressure, temperature, velocities) are attached to nodes and elements
  – Programs loop over elements and over nodes, separately
Each time you “visit” an element:
  – Need to access, and possibly modify, all nodes connected to it
Each time you visit a node:
  – Typically, access and modify only node attributes
  – Rarely: access/modify attributes of elements connected to it
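The two loops might look roughly like the sketch below. The attribute names (`elem_load`, `node_force`, `node_mass`, `node_velocity`) and the update rule are assumptions chosen only to show the access pattern, not a real FEM kernel.

```python
# Hypothetical mesh and attributes.
elements = {0: (0, 1, 2), 1: (1, 3, 2)}
elem_load = {0: 3.0, 1: 6.0}            # one value per element
node_force = [0.0, 0.0, 0.0, 0.0]       # accumulated per node
node_mass = [1.0, 2.0, 2.0, 1.0]
node_velocity = [0.0, 0.0, 0.0, 0.0]
dt = 0.1

# Element loop: visiting an element touches every node connected to it.
for elem_id, node_ids in elements.items():
    share = elem_load[elem_id] / len(node_ids)
    for n in node_ids:
        node_force[n] += share

# Node loop: visiting a node touches only that node's own attributes.
for n in range(len(node_force)):
    node_velocity[n] += dt * node_force[n] / node_mass[n]
```

Nodes 1 and 2 receive contributions from both elements, which is the accumulation that later has to cross processor boundaries.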
4 Unstructured grids: parallelization issues
Two concerns:
  – The unstructured grid graph must be partitioned across processors (in general, across virtual processors, or vprocs)
  – Boundary values must be shared
What to partition and what to duplicate (at the boundaries):
  – Partition elements (so each element belongs to exactly one vproc)
  – Share nodes at the boundary
      Each node potentially has several ghost copies
  – Why is this better than partitioning nodes and sharing elements?
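As a rough illustration of "partition elements, share nodes": given an assignment of elements to vprocs, the shared (ghosted) nodes fall out as the nodes touched by more than one vproc. The mesh and the `elem_part` assignment below are made up for the example.

```python
from collections import defaultdict

def shared_nodes(elements, elem_part):
    """Return {node: sorted vprocs holding a copy} for nodes on more than one vproc."""
    node_parts = defaultdict(set)
    for elem_id, node_ids in elements.items():
        for n in node_ids:
            node_parts[n].add(elem_part[elem_id])
    return {n: sorted(parts) for n, parts in node_parts.items() if len(parts) > 1}

# Hypothetical mesh: three triangles in a strip; elements 0 and 1 on vproc 0, element 2 on vproc 1.
elements = {0: (0, 1, 2), 1: (1, 3, 2), 2: (2, 3, 4)}
elem_part = {0: 0, 1: 0, 2: 1}
ghosts = shared_nodes(elements, elem_part)
```

Every element has exactly one home, while nodes 2 and 3 end up with a copy on each side of the cut.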
5 Partitioning unstructured grids
Not as simple as for structured grids
  – “By rows”, “by columns”, “rectangular”, ... don’t work
Geometric?
  – Applicable only if each node has coordinates
  – Even when applicable, may not lead to good performance
What performance metrics to use?
  – Load balance: the number of elements in each partition
  – Communication:
      Number of shared nodes (total)
      Maximum number of shared nodes for any one partition
      Maximum number of “neighbor partitions” for any partition
  – Why? Per-message cost
Geometric: difficult to optimize both
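The metrics listed above can be computed directly from an element-to-partition assignment. This is only a sketch; the function name `partition_metrics` and the example mesh are assumptions.

```python
from collections import defaultdict

def partition_metrics(elements, elem_part):
    """Load and communication metrics for a given element partitioning."""
    load = defaultdict(int)
    node_parts = defaultdict(set)
    for elem_id, node_ids in elements.items():
        p = elem_part[elem_id]
        load[p] += 1
        for n in node_ids:
            node_parts[n].add(p)

    shared_per_part = defaultdict(int)
    neighbors = defaultdict(set)
    total_shared = 0
    for parts in node_parts.values():
        if len(parts) > 1:                     # node sits on a partition boundary
            total_shared += 1
            for p in parts:
                shared_per_part[p] += 1
                neighbors[p] |= parts - {p}

    return {
        "max_load": max(load.values()),
        "avg_load": sum(load.values()) / len(load),
        "total_shared_nodes": total_shared,
        "max_shared_nodes": max(shared_per_part.values(), default=0),
        "max_neighbors": max((len(s) for s in neighbors.values()), default=0),
    }

# Hypothetical strip of four triangles split over three partitions.
elements = {0: (0, 1, 2), 1: (1, 3, 2), 2: (2, 3, 4), 3: (3, 5, 4)}
elem_part = {0: 0, 1: 0, 2: 1, 3: 2}
metrics = partition_metrics(elements, elem_part)
```

The neighbor count matters separately from the shared-node count because each neighbor implies at least one message, and each message carries a fixed cost.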
6 MP issues
Charm++ help:
  – Today (Wed, 2/21), 2:00 pm to 5:30 pm
  – 2504, 2506, 2508 DCL (Parallel Programming Laboratory)
My office hours for this week:
  – Thursday, 10:00 am to 12:00 noon
7 Grid partitioning
When communication costs are relatively low
  – Either because the data set is large or the computation per element is large
  – Geometric methods can be used
Orthogonal Recursive Bisection (ORB)
  – Basic idea: recursively divide sets into two
      Keep shapes squarish as long as possible
  – For each set:
      Find the bounding box (Xmax, Xmin, Ymax, Ymin, ...)
      Find the longer dimension (X or Y or ...)
      Find a cut along the longer dimension that divides the set equally
        – Doesn’t have to be at the midpoint of the section
      Partition the elements into two sets based on the cut
      Repeat for each set
  – Variation: non-power-of-two processors
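The ORB steps above can be sketched as follows, assuming each element is represented by one point (say, its centroid) and that there are at least as many points as parts. The proportional split used for odd part counts is one possible way to handle the non-power-of-two variation, not necessarily the one intended in the lecture.

```python
def orb(points, ids, nparts):
    """Orthogonal recursive bisection; returns a list of nparts lists of ids."""
    if nparts == 1:
        return [ids]
    dims = len(points[ids[0]])
    # Bounding box extent in each dimension; cut across the longest one.
    extents = [max(points[i][d] for i in ids) - min(points[i][d] for i in ids)
               for d in range(dims)]
    axis = extents.index(max(extents))
    ordered = sorted(ids, key=lambda i: points[i][axis])
    # Cut by count, not at the geometric midpoint, so the load is balanced.
    left_parts = nparts // 2
    cut = len(ordered) * left_parts // nparts
    return (orb(points, ordered[:cut], left_parts)
            + orb(points, ordered[cut:], nparts - left_parts))

# Hypothetical 4 x 2 grid of element centroids.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
four_parts = orb(points, list(range(8)), 4)
three_parts = orb(points, list(range(8)), 3)
```

With three parts the first cut puts roughly one third of the points on one side and two thirds on the other, and only the larger side is bisected again.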
8 Grid partitioning: quad/oct trees
Another geometric technique:
At each step, divide the set into 2^D subsets, where D is the number of physical dimensions
  – In 2-D: 4 quadrants
  – Dividing line goes through the geometric midpoint of the box
  – Bounding box is NOT recalculated each time in the recursion
Comparison with ORB
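A minimal 2-D quadtree sketch, for contrast with ORB. The stopping rule (`max_per_leaf`) and the depth cap are assumptions added so the example terminates; the slide itself does not specify them.

```python
def quadtree(points, ids, box, max_per_leaf, max_depth=16):
    """Split box = (xmin, ymin, xmax, ymax) at its midpoint until leaves are small."""
    if not ids:
        return []
    if len(ids) <= max_per_leaf or max_depth == 0:
        return [ids]
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quads = {(0, 0): [], (1, 0): [], (0, 1): [], (1, 1): []}
    for i in ids:
        x, y = points[i]
        quads[(int(x >= xmid), int(y >= ymid))].append(i)
    leaves = []
    for (qx, qy), sub in quads.items():
        # The child box is a fixed quarter of the parent box, never a recomputed bounding box.
        sub_box = (xmid if qx else xmin, ymid if qy else ymin,
                   xmax if qx else xmid, ymax if qy else ymid)
        leaves += quadtree(points, sub, sub_box, max_per_leaf, max_depth - 1)
    return leaves

# Hypothetical points clustered near the lower-left corner of the unit square.
points = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.9, 0.9), (0.6, 0.2)]
leaves = quadtree(points, list(range(5)), (0.0, 0.0, 1.0, 1.0), max_per_leaf=2)
```

Because the cuts are fixed at box midpoints, leaf sizes come out uneven (here 2, 1, 1, 1), whereas ORB places each cut to split the set equally.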
9 Grid partitioning: graph partitioners
CHACO and METIS are well-known programs
Optimize both load balance and communication overhead
  – But often ignore per-message cost, or the maximum-per-partition costs
Earlier algorithm: KL (Kernighan-Lin)
  – METIS first coarsens the graph, applies KL to it, and then refines the graph
      Doing this not just once, but as a k-level coarsening-refining
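To give a feel for swap-based refinement, here is a simplified greedy variant: repeatedly swap the pair of vertices (one from each side) with the largest positive gain in edge cut. It uses the standard pair-swap gain formula, but it is not the full Kernighan-Lin pass (which tentatively swaps and locks all pairs before keeping the best prefix), and it is not what METIS or CHACO do internally. The graph and all function names are made up.

```python
def edge_cut(adj, part):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])

def swap_gain(adj, part, a, b):
    """Reduction in edge cut if a and b (on opposite sides) trade places."""
    def d(v):  # external edges minus internal edges of v
        ext = sum(1 for w in adj[v] if part[w] != part[v])
        return ext - (len(adj[v]) - ext)
    return d(a) + d(b) - 2 * (1 if b in adj[a] else 0)

def greedy_swap_refine(adj, part):
    """Swap the best pair until no swap reduces the cut; partition sizes never change."""
    part = dict(part)
    while True:
        side0 = [v for v in part if part[v] == 0]
        side1 = [v for v in part if part[v] == 1]
        gain, a, b = max(((swap_gain(adj, part, a, b), a, b)
                          for a in side0 for b in side1),
                         default=(0, None, None))
        if gain <= 0:
            return part
        part[a], part[b] = 1, 0

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
start = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
refined = greedy_swap_refine(adj, start)
```

Swapping pairs rather than moving single vertices keeps the two sides the same size, so load balance is preserved while the cut shrinks.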
10 Crack Propagation
Explicit FEM code
Zero-volume cohesive elements inserted near the crack
As the crack propagates, more cohesive elements are added near the crack, which leads to severe load imbalance
Framework handles:
  – Partitioning elements into chunks
  – Communication between chunks
  – Load balancing
Decomposition into 16 chunks (left) and 128 chunks, 8 for each PE (right). The middle area contains cohesive elements. Pictures: S. Breitenfeld and P. Geubelle
11 Crack Propagation
Decomposition into 16 chunks (left) and 128 chunks, 8 for each PE (right). The middle area contains cohesive elements. Both decompositions obtained using METIS. Pictures: S. Breitenfeld and P. Geubelle
12 Unstructured grid: managing communication
Suppose triangles A, B, and C are on different processors
  – Node 1 is shared between all 3 processors
  – Must have a copy on all 3 processors
  – When values need to be added up:
      Option 1 (star): let A (say) be the “owner” of node 1
        – B and C send their copies of node 1 to A
        – A combines them (usually, just adding them up)
        – A sends the updated values to B and C
      Option 2 (symmetric): each sends its copy of node 1 to both the others
Which one is better?
[Figure: triangles A, B, and C, all sharing node 1]
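The two options can be compared with a small simulation that counts messages. This is a sketch with made-up contribution values; `star_combine` and `symmetric_combine` are hypothetical names, and real message passing is replaced by plain dictionary access.

```python
def star_combine(copies, owner):
    """Owner gathers all contributions, sums them, and sends the result back."""
    messages = 0
    total = copies[owner]
    for p, value in copies.items():
        if p != owner:
            total += value
            messages += 1              # p -> owner
    result = {}
    for p in copies:
        result[p] = total
        if p != owner:
            messages += 1              # owner -> p
    return result, messages

def symmetric_combine(copies):
    """Every holder sends its contribution to every other holder and sums locally."""
    messages = 0
    result = {}
    for p in copies:
        total = copies[p]
        for q, value in copies.items():
            if q != p:
                total += value
                messages += 1          # q -> p
        result[p] = total
    return result, messages

# Hypothetical contributions to node 1 from the three processors.
copies = {"A": 1.0, "B": 2.0, "C": 4.0}
star_result, star_msgs = star_combine(copies, owner="A")
sym_result, sym_msgs = symmetric_combine(copies)
```

With k holders the star scheme uses 2(k - 1) messages in two dependent phases, while the symmetric scheme uses k(k - 1) messages in a single phase, so the answer depends on how many processors share a node and on how latency compares with per-message cost.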
13 Unstructured grid: managing communication
In either scheme:
  – Each vproc maintains a list of neighboring vprocs
  – For each neighbor: maintains a list of shared nodes
      Each node has a local index (“my 5th node”)
      The same list works in both directions
        – Send
        – Receive
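A sketch of the per-neighbor shared-node lists, assuming both sides store their lists in the same agreed order so that the k-th entry on each side refers to the same physical node. The `VProc` class and its methods are hypothetical and stand in for real message passing.

```python
class VProc:
    def __init__(self, name, values, shared):
        self.name = name
        self.values = list(values)     # node values, addressed by local index
        self.shared = shared           # {neighbor name: [local indices of shared nodes]}

    def pack(self, nbr):
        """Send side: gather shared-node values in list order."""
        return [self.values[i] for i in self.shared[nbr]]

    def unpack_add(self, nbr, message):
        """Receive side: the same list says where each incoming value goes."""
        for i, v in zip(self.shared[nbr], message):
            self.values[i] += v

def exchange(a, b):
    # Pack both messages before unpacking either, so no contribution is counted twice.
    to_b, to_a = a.pack(b.name), b.pack(a.name)
    b.unpack_add(a.name, to_b)
    a.unpack_add(b.name, to_a)

# A's local nodes 1 and 2 are the same physical nodes as B's local nodes 0 and 3.
a = VProc("A", [10.0, 20.0, 30.0], {"B": [1, 2]})
b = VProc("B", [1.0, 2.0, 3.0, 4.0], {"A": [0, 3]})
exchange(a, b)
```

No global node numbers travel with the message; the shared ordering of the two lists carries that information.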
14 Adaptive variations: structured grids
Suppose you need different levels of refinement at different places in the grid: Adaptive Mesh Refinement
  – Quad and oct trees can be used
  – Neighboring regions may have resolutions that differ by 1 level
      Requiring (possibly complex) interpolation algorithms
  – The fact that you have to do the refinement in the middle of a parallel computation makes a difference
      Again and again, but often not every step
      Adjust your communication list
      Alternatively, put a layer of software in the middle to do the interpolations
        – So each square chunk thinks it has exactly one neighbor on each side
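The "layer in the middle" idea can be sketched as a function that converts a neighbor's boundary strip to the resolution the receiving chunk expects. Pairwise averaging and piecewise-constant copying are used here purely as the simplest possible stand-ins; real AMR codes generally use more careful interpolation, and the function names are assumptions.

```python
def restrict(fine):
    """Fine edge (2n values) -> coarse edge (n values) by averaging adjacent pairs."""
    return [(fine[2 * i] + fine[2 * i + 1]) / 2 for i in range(len(fine) // 2)]

def prolong(coarse):
    """Coarse edge (n values) -> fine edge (2n values) by piecewise-constant copying."""
    return [v for v in coarse for _ in range(2)]

def ghost_edge(neighbor_edge, my_len):
    """Return the neighbor's boundary values at this chunk's own resolution."""
    if len(neighbor_edge) == my_len:
        return list(neighbor_edge)
    if len(neighbor_edge) == 2 * my_len:
        return restrict(neighbor_edge)
    if 2 * len(neighbor_edge) == my_len:
        return prolong(neighbor_edge)
    raise ValueError("neighbors may differ by at most one refinement level")

coarse_view = ghost_edge([1.0, 3.0, 5.0, 7.0], my_len=2)   # finer neighbor
fine_view = ghost_edge([2.0, 6.0], my_len=4)               # coarser neighbor
```

Each chunk then sees one neighbor edge of its own length per side, whatever the actual refinement level next door.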
15 Adaptive variations: unstructured grids
Mesh may be refined in places, dynamically:
  – This is much harder to do (even sequentially) than for structured grids
Think about triangles:
  – Quality restriction: avoid long, skinny triangles
  – From the parallel computing point of view:
      Need to change the list of shared nodes
      Load balance may shift
Load balancing:
  – Abandon the partitioning and repartition
  – Incrementally adjust (typically with virtualization)
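One common way to quantify "skinny" is a normalized shape measure such as 4·sqrt(3)·area divided by the sum of squared edge lengths, which is 1 for an equilateral triangle and approaches 0 as the triangle degenerates. The choice of this particular measure is an assumption for illustration; the slides do not name one.

```python
import math

def triangle_quality(p0, p1, p2):
    """Shape quality in [0, 1]: 1 for equilateral, near 0 for long skinny triangles."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    area = abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2
    edges_sq = ((x1 - x0) ** 2 + (y1 - y0) ** 2
                + (x2 - x1) ** 2 + (y2 - y1) ** 2
                + (x0 - x2) ** 2 + (y0 - y2) ** 2)
    return 4 * math.sqrt(3) * area / edges_sq if edges_sq else 0.0

good = triangle_quality((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))
skinny = triangle_quality((0.0, 0.0), (1.0, 0.0), (0.5, 0.01))
```

A refinement step could reject or re-split any triangle whose quality falls below a chosen threshold.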