Improved Mesh Partitioning for Parallel Substructure Finite Element Computations
Shang-Hsien Hsieh, Yuan-Sen Yang and Po-Liang Tsai
Department of Civil Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Sponsored by the National Science Council of R.O.C.
Objective
• To improve the efficiency of the parallel substructure finite element method by investigating mesh partitioning.
Parallel Substructure Method
• (a) Mesh partitioning (preprocessed by a single processor)
• (b) Concurrent substructure condensation
• (c) Solution of the condensed system equations associated with the interface d.o.f.’s using a single processor
• (d) Concurrent solution of the substructure internal d.o.f.’s
[Figure: panels (a), (b), (c), (d) illustrating the four steps]
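Steps (b) to (d) can be illustrated with a minimal NumPy sketch of static condensation. This is not the authors' implementation: the function names `condense` and `recover_internal`, the three-d.o.f. spring-chain example, and the sequential execution are assumptions made for illustration. In the parallel method each substructure would be condensed on its own processor.

```python
import numpy as np

def condense(K, f, internal, interface):
    """Step (b): condense one substructure onto its interface d.o.f.'s."""
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    Kbi = K[np.ix_(interface, internal)]
    Kbb = K[np.ix_(interface, interface)]
    # Solve Kii X = [Kib | f_i] in a single call
    X = np.linalg.solve(Kii, np.column_stack([Kib, f[internal]]))
    S = Kbb - Kbi @ X[:, :-1]           # condensed stiffness
    g = f[interface] - Kbi @ X[:, -1]   # condensed load
    return S, g

def recover_internal(K, f, internal, interface, u_b):
    """Step (d): back-substitute for the internal d.o.f.'s."""
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    return np.linalg.solve(Kii, f[internal] - Kib @ u_b)

# Hypothetical example: chain of four unit springs, both ends fixed.
# Each substructure holds two springs; local d.o.f. 0 is internal,
# local d.o.f. 1 is the shared interface node (loaded with 1.0).
K_A = np.array([[2.0, -1.0], [-1.0, 1.0]])
K_B = np.array([[2.0, -1.0], [-1.0, 1.0]])
f_A = np.array([0.0, 1.0])   # interface load assigned to substructure A
f_B = np.array([0.0, 0.0])

S_A, g_A = condense(K_A, f_A, [0], [1])
S_B, g_B = condense(K_B, f_B, [0], [1])

# Step (c): assemble and solve the interface system on one processor
u_b = np.linalg.solve(S_A + S_B, g_A + g_B)

u_A = recover_internal(K_A, f_A, [0], [1], u_b)
u_B = recover_internal(K_B, f_B, [0], [1], u_b)
```

The cost of step (b) grows with both the number of internal and interface d.o.f.'s of a substructure, which is why partitions with equal element counts can still be poorly balanced.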
Parallel Substructure Method (Cont’d)
• Major difficulty
 – Workloads are not well balanced.
• Reason
 – Insufficient mesh partitioning criteria
[Figure: model BLD ,480 BC elms, 152,400 D.O.F.’s]
Mesh Partitioning
• Common criteria used by most mesh partitioning algorithms:
 – Balance of the number of elements among substructures
 – Minimization of the total number of interface nodes
Mesh Partitioning (Cont’d)
• New criteria
 – Balance of the total element weights among substructures
 – Minimization of the number of interface nodes
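The quantities these criteria refer to can be measured for any given partition. The sketch below is one possible way to do so; the function name `partition_stats`, the data layout (a list of node tuples per element and a substructure id per element) and the 2 x 2 quadrilateral example mesh are assumptions for illustration only.

```python
from collections import defaultdict

def partition_stats(elements, part, weights=None):
    """Return per-substructure element counts, per-substructure weight sums,
    the total number of interface nodes, and interface nodes per substructure.

    elements : list of node-id tuples, one per element
    part     : substructure id (0-based) of each element
    weights  : optional element weights (defaults to 1.0 each)
    """
    if weights is None:
        weights = [1.0] * len(elements)
    n_sub = max(part) + 1
    elem_count = [0] * n_sub
    weight_sum = [0.0] * n_sub
    node_parts = defaultdict(set)   # node id -> substructures touching it
    for nodes, p, w in zip(elements, part, weights):
        elem_count[p] += 1
        weight_sum[p] += w
        for n in nodes:
            node_parts[n].add(p)
    # A node is an interface node if it belongs to more than one substructure
    interface = [n for n, ps in node_parts.items() if len(ps) > 1]
    per_sub = [0] * n_sub
    for n in interface:
        for p in node_parts[n]:
            per_sub[p] += 1
    return elem_count, weight_sum, len(interface), per_sub

# Hypothetical 2 x 2 mesh of quadrilaterals, nodes numbered row by row (0..8)
elements = [(0, 1, 4, 3), (1, 2, 5, 4), (3, 4, 7, 6), (4, 5, 8, 7)]
part = [0, 0, 1, 1]              # bottom row vs. top row
stats = partition_stats(elements, part)
```

Here the two substructures share the middle row of nodes (3, 4 and 5), so both the total and the per-substructure interface node counts are 3.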
Improved Mesh Partitioning
• An iterative approach
 – Mesh partitioning kernel: METIS (Karypis and Kumar, 1995)
 – Evaluation of performance indicators
 – Adjustment of element weights based on the number of substructure interface nodes
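The loop structure of such an iterative approach might look like the sketch below. Several parts are assumptions rather than the authors' method: METIS is not called (the hypothetical `chain_partitioner` is a toy stand-in that cuts a 1-D chain of elements into contiguous blocks of roughly equal weight), and the weight update (base weight times the ratio of a substructure's interface node count to the smallest such count) is only one plausible reading of "adjustment of element weights based on the number of substructure interface nodes".

```python
from collections import defaultdict

def interface_counts(elements, part, n_sub):
    """Number of interface nodes touched by each substructure."""
    node_parts = defaultdict(set)
    for nodes, p in zip(elements, part):
        for n in nodes:
            node_parts[n].add(p)
    counts = [0] * n_sub
    for ps in node_parts.values():
        if len(ps) > 1:
            for p in ps:
                counts[p] += 1
    return counts

def chain_partitioner(weights, n_sub):
    """Toy stand-in for a weighted partitioning kernel (NOT METIS)."""
    total = sum(weights)
    part, acc = [], 0.0
    for w in weights:
        # Assign by the position of the element's weight midpoint
        part.append(min(int((acc + w / 2) / total * n_sub), n_sub - 1))
        acc += w
    return part

def iterate_partitioning(elements, n_sub, partitioner, n_iter):
    """Partition, measure interface nodes, re-weight, repeat."""
    base = [1.0] * len(elements)
    weights = base[:]
    history = []
    for _ in range(n_iter + 1):
        part = partitioner(weights, n_sub)
        counts = interface_counts(elements, part, n_sub)
        history.append((part, counts))
        n_min = max(min(counts), 1)   # guard against division by zero
        # Assumed rule: heavier weights where the interface is larger
        weights = [base[e] * counts[part[e]] / n_min
                   for e in range(len(elements))]
    return history

# Hypothetical chain of 12 two-node elements split into 3 substructures
elements = [(k, k + 1) for k in range(12)]
history = iterate_partitioning(elements, 3, chain_partitioner, n_iter=1)
```

In this toy case the middle substructure has two interface nodes and the end substructures one each, so after one re-weighting the middle substructure receives fewer elements. A full implementation would also evaluate a performance indicator at each iteration and keep the best partition.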
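Improved Mesh Partitioning (Cont’d)
• Tuning factor $F^i_j$ of substructure $j$ at iteration $i$:
 – $F^i_j = N^i_{IN,j} / \min_k ( N^i_{IN,k} )$, with the minimum taken over all substructures $k$
 – $N^i_{IN,j}$: number of interface nodes of substructure $j$ at iteration $i$
• Example values from the figure:
 – $F^i_1 = 6/6 = 1.0$
 – $F^i_2 = 6/6 = 1.0$
 – $F^i_3 = 7/6 = 1.17$
 – Additional figure label: $F^i_j = 8/13$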
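As a quick check of the definition, the tuning factor can be computed directly from the per-substructure interface node counts. The function name `tuning_factors` is hypothetical; the counts 6, 6 and 7 are the ones used in the example values on this slide.

```python
def tuning_factors(n_interface):
    """F_j = N_IN,j / min_k(N_IN,k) for one iteration."""
    n_min = min(n_interface)
    if n_min <= 0:
        raise ValueError("every substructure needs at least one interface node")
    return [n / n_min for n in n_interface]

factors = tuning_factors([6, 6, 7])
rounded = [round(f, 2) for f in factors]   # [1.0, 1.0, 1.17]
```

By construction every factor is at least 1.0, and the substructure with the fewest interface nodes always has a factor of exactly 1.0.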
Improved Mesh Partitioning (Cont’d)
• Indicator E:
 – Indicator of efficiency of iteration $i$
 – $E^i = \max_j ( E^i_{1,j} ) + E^i_2$, with the maximum taken over all substructures $j$
 – $E^i_{1,j}$: condensation time indicator of substructure $j$
 – $E^i_{1,j} = [ (I^i_{1,j})^{2.5} + (I^i_{2,j})^{2.5} ] / [ (I^0_{1,j})^{2.5} + (I^0_{2,j})^{2.5} ]$
 – $I^i_{1,j} = N^i_{ELM,j} / N_{ELM}$
 – $I^i_{2,j} = N^i_{IN,j} / N^0_{IN,j}$
 – $E^i_2$: interface solution time factor
 – $E^i_2 = ( N^i_{IN} / N^0_{IN} )^3$
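The indicator can be evaluated with a few lines of code. The sketch below follows the formulas on this slide under two assumptions that the slide does not state explicitly: the superscript 0 is read as the initial partition (iteration 0), and $N_{ELM}$ is read as the total number of elements in the mesh. The function name `efficiency_indicator` and all numbers in the example are hypothetical.

```python
def efficiency_indicator(n_elm, n_in, n_in_total,
                         n_elm0, n_in0, n_in_total0, p=2.5):
    """E = max_j(E_1,j) + E_2 for one iteration.

    n_elm, n_in   : elements / interface nodes per substructure (iteration i)
    n_in_total    : total number of interface nodes (iteration i)
    n_elm0, n_in0, n_in_total0 : the same quantities at iteration 0
    """
    n_elm_total = sum(n_elm0)   # assumed meaning of N_ELM
    e1 = []
    for j in range(len(n_elm)):
        i1 = n_elm[j] / n_elm_total
        i2 = n_in[j] / n_in0[j]
        i1_0 = n_elm0[j] / n_elm_total
        i2_0 = n_in0[j] / n_in0[j]   # equals 1 by definition
        e1.append((i1 ** p + i2 ** p) / (i1_0 ** p + i2_0 ** p))
    e2 = (n_in_total / n_in_total0) ** 3
    return max(e1) + e2

# Hypothetical counts for four substructures
n_elm0, n_in0, tot0 = [250, 250, 250, 250], [40, 60, 60, 40], 100
n_elm1, n_in1, tot1 = [280, 220, 220, 280], [40, 52, 52, 40], 92

e_0 = efficiency_indicator(n_elm0, n_in0, tot0, n_elm0, n_in0, tot0)
e_1 = efficiency_indicator(n_elm1, n_in1, tot1, n_elm0, n_in0, tot0)
```

With these definitions the indicator of the initial partition is always 2 (each term equals 1), so a later iteration is predicted to be faster when its indicator falls below 2.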
Improved Mesh Partitioning (Cont’d)
• Indicator E vs. total elapsed time T
[Figure: normalized E or T plotted against iteration i. Model: 4E solid(B20) elements, 48,975 D.O.F.’s (Tsai, 1999)]
PC Cluster Computing Environment
• CPU: Intel Pentium II-350
• Memory: NEC 128MB PC100 SDRAM
• Network: ACCTON 10/100 Mbps, D-Link 100 Mbps Hub
• OS: Linux Redhat 5.2
Numerical Experiments
• Improved mesh partitioning iterations
 – Model: BLADE (Wawrzynek, 1991), 944 solid(B20) elements, 18,180 D.O.F.’s
 – $N_{sub}$ (number of substructures) = 4
 – CPU time: 1.6 sec.
Numerical Experiments (Cont’d)
• Model: BLADE, 944 solid(B20) elements, 18,180 D.O.F.’s
• Np = 4; Hardware: PC cluster (P II 350); OS: Linux Redhat 5.2
• Compared: METIS without iteration vs. improved mesh partitioning (with 2 iterations)
• Timing labels: 67.4 sec sec.
• Additional 1.6 sec. for iterative mesh partitioning
Numerical Experiments (Cont’d)
• Improved mesh partitioning iterations
 – Model: ESTORY30, 12,750 BC elements, 28,080 D.O.F.’s
 – $N_{sub}$ (number of substructures) = 4
 – CPU time: 3.6 sec.
Numerical Experiments (Cont’d)
• Model: ESTORY30, 12,750 BC elements, 28,080 D.O.F.’s
• Np = 4; Hardware: PC cluster (P II 350); OS: Linux Redhat 5.2
• Compared: METIS without iteration vs. improved mesh partitioning (with 1 iteration)
• Timing labels: 89.2 sec sec.; 64.5 sec.
• Additional 3.6 sec. for iterative mesh partitioning
Conclusions
• The iterative mesh partitioning approach can effectively improve the efficiency of parallel substructure finite element computations.
• Better mesh partitioning is still needed.
• A parallel equation solver becomes more important.