Published by Griffin Hancock. Modified over 6 years ago.
Parallel Iterative Solvers for Ill-Conditioned Problems with Reordering
Kengo Nakajima
Department of Earth & Planetary Science, The University of Tokyo

Overview
Parallel preconditioned iterative solvers for FEM-type applications.
Optimization on the Earth Simulator (ES) and other SMP-cluster architectures.
Programming models: flat MPI and OpenMP/MPI hybrid.
Reordering for parallel/vector processing: multicoloring (MC), RCM (Reverse Cuthill-McKee), and CM-RCM (Cyclic Multicolor RCM).

Effect of the number of colors
In general, convergence is faster when the number of colors is larger. However, more colors mean a smaller vector length, which gives poor performance on vector processors, and more OpenMP synchronization overhead.
DJDS provides data locality as the number of colors increases.
On scalar processors, performance may improve as the number of colors increases (for both flat MPI and hybrid).

[Figure: Effect of color number and matrix storage (PGA model, 1 SMP node, OpenMP). Panels: ES, Hitachi SR11000, IBM SP3. Reorderings: MC, RCM, CM-RCM. Legend: ● DJDS, ■ DCRS.]

Matrix storage
DJDS: Descending-order Jagged Diagonal Storage
DCRS: Descending-order Compressed Row Storage

DCRS (reduction-type inner loop over the nonzeros of one row):

    do i= 1, N
      k1= index(i-1)+1
      k2= index(i)
      do k= k1, k2
        kk= item(k)
        Y(i)= Y(i)+A(k)*X(kk)
      enddo
    enddo

DJDS (long innermost loop over the rows of one jagged diagonal):

    do j= 1, NCON
      do i= 1, NN(j)
        k = index(j-1) + i
        kk= item(k)
        Y(i)= Y(i)+A(k)*X(kk)
      enddo
    enddo

DJDS, with its long innermost loops, is suitable for vector processors. The reduction-type loop of DCRS is better suited to cache-based scalar processors because its operations are localized.

Well-conditioned problems
For well-conditioned problems, the differences among MC, RCM, and CM-RCM and the effect of the number of colors are not significant.
Test case: 3D elastic cube with uniform meshes (10^6 nodes, 3x10^6 DOF), single SMP node with OpenMP, DJDS.

[Figure: Effect of Reordering on ES for 3D Linear-Elastic Problem (1 SMP node, OpenMP). Panels: ES, Hitachi SR11000. Legends: ● DJDS, ■ DCRS, ▲ No Reordering; ● MC, ● CM-RCM, ▲ RCM.]

Ill-conditioned problems
The differences are significant, however, for ill-conditioned problems such as the complicated PGA (Pin Grid Array) model.
PGA model: 61 pins, 956,128 elements, 1,012,354 nodes (3,037,062 DOF). Max. ratio: 30:1. Single SMP node with OpenMP, DJDS.
The number of independent sets for RCM is 2,985. The minimum number of colors for independent CM-RCM is 2,381.

[Figure: PGA model results. Panels: ES, Hitachi SR11000. Legend: ● MC, ● CM-RCM, ▲ RCM.]

Summary
CM-RCM provides the most robust and efficient convergence on both vector and scalar processors.

[Slide labels: complicated geometries; Hitachi SR11000.]
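To make the contrast between the two loop structures on the slides concrete, here is a minimal Python sketch of a sparse matrix-vector product in compressed row storage and in plain jagged diagonal storage. It is an illustration under assumptions, not the author's implementation: indexing is 0-based, the names `spmv_crs`, `to_jds`, and `spmv_jds` are hypothetical, and the per-color and per-thread structure of DJDS and DCRS is omitted.

```python
def spmv_crs(index, item, a, x):
    """y = A x in compressed row storage (0-based).

    Row i owns a[index[i]:index[i + 1]]; item[k] is the column of a[k].
    The inner loop is a short reduction into the single value y[i].
    """
    n = len(index) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(index[i], index[i + 1]):
            y[i] += a[k] * x[item[k]]
    return y


def to_jds(index, item, a):
    """Convert CRS arrays to jagged diagonal storage (illustrative helper).

    Rows are sorted by descending nonzero count; the j-th nonzero of every
    row that has one is stored contiguously as "jagged diagonal" j.
    """
    n = len(index) - 1
    # Stable sort, longest rows first.
    perm = sorted(range(n), key=lambda i: index[i] - index[i + 1])
    max_len = max((index[i + 1] - index[i] for i in range(n)), default=0)
    jd_index, jd_item, jd_a = [0], [], []
    for j in range(max_len):
        for i in perm:
            if index[i + 1] - index[i] > j:
                jd_item.append(item[index[i] + j])
                jd_a.append(a[index[i] + j])
        jd_index.append(len(jd_a))
    return perm, jd_index, jd_item, jd_a


def spmv_jds(perm, jd_index, jd_item, jd_a, x):
    """y = A x in jagged diagonal storage.

    The inner loop runs over all rows touched by one jagged diagonal, so it
    is long and has no dependence between iterations.
    """
    n = len(perm)
    y_sorted = [0.0] * n
    for j in range(len(jd_index) - 1):
        start = jd_index[j]
        for i in range(jd_index[j + 1] - start):
            k = start + i
            y_sorted[i] += jd_a[k] * x[jd_item[k]]
    # Undo the row permutation so the result matches the original ordering.
    y = [0.0] * n
    for pos, row in enumerate(perm):
        y[row] = y_sorted[pos]
    return y
```

Calling `spmv_jds(*to_jds(index, item, a), x)` should give the same vector as `spmv_crs(index, item, a, x)`; only the loop order and memory layout differ, which is the property the slides relate to vector versus cache-based scalar performance.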
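The trade-off between the number of colors and vector length can be illustrated with a small greedy multicoloring sketch in Python. This is an assumption-laden illustration: a simple first-fit greedy coloring stands in for the MC ordering, and it is not claimed to be the algorithm used in the slides. The names `greedy_multicolor` and `color_sets` are hypothetical.

```python
def greedy_multicolor(adj):
    """Assign each node the smallest color not used by its neighbors.

    adj[v] lists the neighbors of node v in the matrix graph.
    Returns a list of 0-based colors, one per node.
    """
    color = [-1] * len(adj)
    for v in range(len(adj)):
        used = {color[u] for u in adj[v] if color[u] >= 0}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def color_sets(color):
    """Group nodes by color; each group is an independent set."""
    sets = [[] for _ in range(max(color, default=-1) + 1)]
    for v, c in enumerate(color):
        sets[c].append(v)
    return sets


# A 1D chain of six nodes, each connected to its immediate neighbors.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
sets = color_sets(greedy_multicolor(chain))
```

Nodes within one set have no mutual connections, so they can be updated in parallel or as one vector operation. With N nodes and C colors the average set size is N/C, which is why a larger number of colors shortens the vector length.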