Improved Mesh Partitioning for Parallel Substructure Finite Element Computations
Shang-Hsien Hsieh, Yuan-Sen Yang and Po-Liang Tsai
Department of Civil Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Sponsored by the National Science Council of R.O.C.

Objective
- To improve the efficiency of the parallel substructure finite element method through an investigation of mesh partitioning.

Parallel Substructure Method
- (a) Mesh partitioning (preprocessed on a single processor)
- (b) Concurrent condensation of the substructures
- (c) Solution of the condensed system of equations associated with the interface d.o.f.'s on a single processor
- (d) Concurrent solution for the internal d.o.f.'s of each substructure
(Figure: illustrations of stages (a) through (d))
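Stages (b) to (d) amount to static condensation (a Schur complement) followed by back-substitution. The following is a minimal NumPy sketch of that idea on a toy problem, assuming dense substructure stiffness matrices and a single shared interface d.o.f., so that interface contributions can simply be summed. The function names `condense` and `recover_internal` and the spring-chain model are illustrative, not taken from the slides.

```python
import numpy as np

def condense(K, f, internal, interface):
    """Stage (b): condense one substructure onto its interface d.o.f.'s (Schur complement)."""
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    Kbi = K[np.ix_(interface, internal)]
    Kbb = K[np.ix_(interface, interface)]
    # Solve K_ii X = [K_ib | f_i] in one call instead of forming an explicit inverse
    X = np.linalg.solve(Kii, np.column_stack([Kib, f[internal]]))
    S = Kbb - Kbi @ X[:, :-1]          # condensed stiffness
    g = f[interface] - Kbi @ X[:, -1]  # condensed load
    return S, g

def recover_internal(K, f, internal, interface, u_b):
    """Stage (d): back-substitute the internal d.o.f.'s once interface values are known."""
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    return np.linalg.solve(Kii, f[internal] - Kib @ u_b)

# Toy model: unit springs 0-1-2-3-4, node 0 fixed, unit load at node 4.
# Substructure A holds nodes 1, 2; substructure B holds nodes 2, 3, 4; node 2 is the interface.
K_A = np.array([[2.0, -1.0], [-1.0, 1.0]])
f_A = np.array([0.0, 0.0])
K_B = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
f_B = np.array([0.0, 0.0, 1.0])
subs = [(K_A, f_A, [0], [1]), (K_B, f_B, [1, 2], [0])]  # (K, f, internal, interface), local numbering

# (b) condense each substructure; the calls are independent, so they could run concurrently
condensed = [condense(K, f, i, b) for K, f, i, b in subs]
# (c) assemble and solve the interface system (both substructures share one interface d.o.f.)
S = sum(s for s, _ in condensed)
g = sum(g_j for _, g_j in condensed)
u_b = np.linalg.solve(S, g)
# (d) recover the internal d.o.f.'s of each substructure
u_A = recover_internal(*subs[0], u_b)
u_B = recover_internal(*subs[1], u_b)
```

For this chain the exact displacements are u_n = n, so the interface solve should give u2 = 2 and back-substitution u1 = 1, u3 = 3, u4 = 4. A real implementation would assemble interface contributions through a global interface numbering and factor each K_ii once rather than solving twice.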

Parallel Substructure Method (Cont'd)
- Major difficulty
  – Workloads are not well balanced among substructures.
- Reason
  – Insufficient mesh partitioning criteria
(Figure: BLD model; ,480 BC elements; 152,400 D.O.F.'s)

Mesh Partitioning
- Common criteria used by most mesh partitioning algorithms:
  – Balance of the number of elements among substructures
  – Minimization of the total number of interface nodes
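Both common criteria can be measured directly for any assignment of elements to substructures. A minimal pure-Python sketch, assuming elements are given as tuples of node IDs and `part[e]` is the substructure of element e (a hypothetical data layout); a node is counted as an interface node when it belongs to elements of more than one substructure.

```python
from collections import Counter, defaultdict

def common_criteria(elements, part, n_sub):
    """Return (element count per substructure, element imbalance, number of interface nodes)."""
    counts = Counter(part)
    per_sub = [counts.get(j, 0) for j in range(n_sub)]
    imbalance = max(per_sub) / (len(elements) / n_sub)  # 1.0 means perfectly balanced
    owners = defaultdict(set)  # node id -> substructures whose elements use it
    for e, nodes in enumerate(elements):
        for n in nodes:
            owners[n].add(part[e])
    n_interface = sum(1 for subs in owners.values() if len(subs) > 1)
    return per_sub, imbalance, n_interface

# A strip of four 4-node quads: bottom nodes 0-4, top nodes 5-9
quads = [(0, 1, 6, 5), (1, 2, 7, 6), (2, 3, 8, 7), (3, 4, 9, 8)]
print(common_criteria(quads, [0, 0, 1, 1], 2))  # ([2, 2], 1.0, 2): nodes 2 and 7 are shared
```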

Mesh Partitioning (Cont'd)
- New criteria
  – Balance of the total element weights among substructures
  – Minimization of the number of interface nodes
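The first new criterion replaces element counts with element weights. The sketch below, assuming a hypothetical per-element weight list, shows why the distinction matters: a partition with equal element counts can still be unbalanced in total weight, and a partition with unequal counts can be balanced.

```python
def weight_balance(weights, part, n_sub):
    """Total element weight per substructure and the max/mean imbalance ratio."""
    totals = [0.0] * n_sub
    for w, p in zip(weights, part):
        totals[p] += w
    return totals, max(totals) / (sum(totals) / n_sub)

weights = [1.0, 1.0, 1.0, 3.0]  # hypothetical: the last element is three times as costly
print(weight_balance(weights, [0, 0, 1, 1], 2))  # equal counts (2 and 2), unequal weight
print(weight_balance(weights, [0, 0, 0, 1], 2))  # unequal counts (3 and 1), equal weight
```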

Improved Mesh Partitioning
- An iterative approach
  – Mesh partitioning kernel: METIS (Karypis and Kumar, 1995)
  – Evaluation of performance indicators
  – Adjustment of element weights based on the number of substructure interface nodes
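The slide names the components of the loop but not its control flow. The following sketch shows one plausible arrangement, assuming a fixed iteration budget and keeping the partition with the best indicator. `chain_partition` is a toy stand-in for a partitioning kernel (it does not call METIS), and the `COST` data and the weight-adjustment rule are hypothetical.

```python
COST = [1, 1, 1, 1, 3, 3]  # hypothetical true work per element, unknown to the partitioner

def chain_partition(weights, n_sub=2):
    """Toy kernel: contiguous, weight-balanced split of a 1-D chain of elements."""
    total, acc, part = sum(weights), 0.0, []
    for w in weights:
        # assign each element by where its weight midpoint falls along the chain
        part.append(min(int((acc + w / 2) / total * n_sub), n_sub - 1))
        acc += w
    return part

def part_costs(part):
    costs = {}
    for c, p in zip(COST, part):
        costs[p] = costs.get(p, 0) + c
    return costs

def evaluate(part):
    """Performance indicator (lower is better): the slowest substructure dominates."""
    return max(part_costs(part).values())

def adjust(weights, part):
    """Hypothetical rule: scale each element's weight by its substructure's cost ratio."""
    costs = part_costs(part)
    lowest = min(costs.values())
    return [w * costs[p] / lowest for w, p in zip(weights, part)]

def iterative_partitioning(partition, evaluate, adjust, weights, max_iter=2):
    """Partition, evaluate, adjust weights; keep the best partition (iteration 0 = plain kernel)."""
    best_part, best_score, history = None, float("inf"), []
    for _ in range(max_iter + 1):
        part = partition(weights)
        score = evaluate(part)
        history.append(score)
        if score < best_score:
            best_part, best_score = part, score
        weights = adjust(weights, part)
    return best_part, best_score, history

best_part, best_score, history = iterative_partitioning(
    chain_partition, evaluate, adjust, [1.0] * 6)
```

On this toy chain the plain split ([0, 0, 0, 1, 1, 1]) has a slowest-substructure cost of 7; after one weight adjustment one element moves and the cost drops to 6, which the driver keeps. Keeping the best result rather than the last one guards against an adjustment that overshoots.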

Improved Mesh Partitioning (Cont'd)
- Tuning factor $F^i_j$ of iteration $i$ for substructure $j$:
$$F^i_j = \frac{N^i_{IN,j}}{\min_k N^i_{IN,k}}$$
- Example (figure): $F^i_1 = 6/6 = 1.0$, $F^i_2 = 6/6 = 1.0$, $F^i_3 = 7/6 = 1.17$
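The tuning factor is a simple ratio, computed here for the slide's example with 6, 6 and 7 interface nodes. The slide states that element weights are adjusted based on the number of interface nodes but does not give the update rule; `scale_weights`, which multiplies each element's weight by its substructure's F, is an assumption for illustration only.

```python
def tuning_factors(n_interface):
    """F_j = N_IN,j / min_k N_IN,k for the substructures of one iteration."""
    smallest = min(n_interface)
    return [n / smallest for n in n_interface]

def scale_weights(weights, part, factors):
    """Hypothetical update (not given on the slide): multiply each element's weight
    by the tuning factor of the substructure that currently owns it."""
    return [w * factors[p] for w, p in zip(weights, part)]

F = tuning_factors([6, 6, 7])    # substructures 1-3 of the slide example
print([round(f, 2) for f in F])  # [1.0, 1.0, 1.17]
```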

Improved Mesh Partitioning (Cont'd)
- Indicator E: indicator of the efficiency of iteration $i$
$$E^i = \max_j E^i_{1,j} + E^i_2$$
  – $E^i_{1,j}$: condensation time indicator of substructure $j$
$$E^i_{1,j} = \frac{(I^i_{1,j})^{2.5} + (I^i_{2,j})^{2.5}}{(I^0_{1,j})^{2.5} + (I^0_{2,j})^{2.5}}, \qquad I^i_{1,j} = \frac{N^i_{ELM,j}}{N_{ELM}}, \qquad I^i_{2,j} = \frac{N^i_{IN,j}}{N^0_{IN,j}}$$
  – $E^i_2$: interface solution time factor
$$E^i_2 = \left(\frac{N^i_{IN}}{N^0_{IN}}\right)^3$$
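The indicator can be evaluated directly from partition statistics. A minimal sketch, assuming $N_{ELM}$ is the total element count, $N^i_{IN}$ is the total number of distinct interface nodes at iteration $i$, and substructure $j$ keeps the same label across iterations; these readings of the symbols are inferred from the notation, not stated on the slide.

```python
def indicator_E(n_elm, n_in, n_elm0, n_in0, total_in, total_in0):
    """Efficiency indicator E^i = max_j E^i_{1,j} + E^i_2.

    n_elm[j], n_in[j]   : elements / interface nodes of substructure j at iteration i
    n_elm0[j], n_in0[j] : the same counts at iteration 0
    total_in, total_in0 : total interface nodes N^i_IN and N^0_IN
    """
    n_total = sum(n_elm0)  # N_ELM: every element belongs to exactly one substructure

    def condensation(j):
        I1, I2 = n_elm[j] / n_total, n_in[j] / n_in0[j]
        I1_0, I2_0 = n_elm0[j] / n_total, 1.0  # I^0_{2,j} = N^0_IN,j / N^0_IN,j = 1
        return (I1 ** 2.5 + I2 ** 2.5) / (I1_0 ** 2.5 + I2_0 ** 2.5)

    E1 = max(condensation(j) for j in range(len(n_elm)))
    E2 = (total_in / total_in0) ** 3  # interface solution time factor
    return E1 + E2

# At iteration 0 every ratio is 1, so E^0 = 1 + 1 = 2
print(indicator_E([250, 250], [10, 10], [250, 250], [10, 10], 10, 10))  # 2.0
```

Because every term is normalized by iteration 0, $E^0 = 2$ for any mesh, and a later iteration with $E^i < 2$ is predicted to be faster than the plain partition.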

Improved Mesh Partitioning (Cont'd)
- Indicator E vs. total elapsed time T
(Figure: normalized E or T plotted against iteration i; model with 4E solid (B20) elements, 48,975 D.O.F.'s (Tsai, 1999))

PC Cluster Computing Environment
- CPU: Intel Pentium II 350
- Memory: NEC 128 MB PC100 SDRAM
- Network: ACCTON 10/100 Mbps; D-Link 100 Mbps hub
- OS: Red Hat Linux 5.2

Numerical Experiments
- BLADE: 944 solid (B20) elements, 18,180 D.O.F.'s
- Improved mesh partitioning iterations (Wawrzynek, 1991)
- N_sub (number of substructures) = 4
- CPU time: 1.6 sec.

Numerical Experiments (Cont'd)
- BLADE: 944 solid (B20) elements, 18,180 D.O.F.'s; Np = 4
- Hardware: PC cluster (Pentium II 350); OS: Red Hat Linux 5.2
- METIS without iteration vs. improved mesh partitioning (with 2 iterations)
- Elapsed times: 67.4 sec, … sec
- Additional 1.6 sec. for iterative mesh partitioning

Numerical Experiments (Cont'd)
- ESTORY30: 12,750 BC elements, 28,080 D.O.F.'s
- Improved mesh partitioning iterations
- N_sub (number of substructures) = 4
- CPU time: 3.6 sec.

Numerical Experiments (Cont'd)
- ESTORY30: 12,750 BC elements, 28,080 D.O.F.'s; Np = 4
- Hardware: PC cluster (Pentium II 350); OS: Red Hat Linux 5.2
- METIS without iteration vs. improved mesh partitioning (with 1 iteration)
- Elapsed times: 89.2 sec, … sec, 64.5 sec
- Additional 3.6 sec. for iterative mesh partitioning

Conclusions
- The iterative mesh partitioning approach can effectively improve the efficiency of parallel substructure finite element computations.
- Further improvement in mesh partitioning is still needed.
- A parallel equation solver is becoming more important.