Download presentation
Presentation is loading. Please wait.
Published byStuart Norris Modified over 9 years ago
1
1 Eigenvalue Problems in Nanoscale Materials Modeling Hong Zhang Computer Science, Illinois Institute of Technology Mathematics and Computer Science, Argonne National Laboratory
2
2 Collaborators: Barry Smith Mathematics and Computer Science, Argonne National Laboratory Michael Sternberg, Peter Zapol Materials Science, Argonne National Laboratory
3
3 Modeling of Nanostructured Materials * System sizeAccuracy
4
4 Density-Functional based Tight-Binding (DFTB)
5
5 Matrices are large: ultimate goal 50,000 atoms with electronic structure ~ N=200,000 sparse: non-zero density -> 0 as N increases dense solutions are requested: 60% eigenvalues and eigenvectors Dense solutions of large sparse problems!
6
6 DFTB implementation (2002)
7
7 Two classes of methods: Direct methods (dense matrix storage): - compute all or almost all eigensolutions out of dense matrices of small to medium size - Tridiagonal reduction + QR or Bisection - Time = O(N 3 ), Memory = O(N 2 ) - LAPACK, ScaLAPACK Iterative methods (sparse matrix storage) : - compute a selected small set of eigensolutions out of sparse matrices of large size - Lanczos - Time = O(nnz*N) <= O(N 3 ), Memory = O(nnz) <= O(N 2 ) - ARPACK, BLZPACK,…
8
8 DFTB-eigenvalue problem is distinguished by (A, B) is large and sparse Iterative method A large number of eigensolutions (60%) are requested Iterative method + multiple shift-and-invert The spectrum has - poor average eigenvalue separation O(1/N), - cluster with hundreds of tightly packed eigenvalues - gap >> O(1/N) Iterative method + multiple shift-and-invert + robusness The matrix factorization of (A- B)=LDL T : not-very-sparse(7%) <= nonzero density <= dense(50%) Iterative method + multiple shift-and-invert + robusness + efficiency Ax= Bx is solved many times (possibly 1000’s) Iterative method + multiple shift-and-invert + robusness + efficiency + initial approximation of eigensolutions
9
9 Lanczos shift-and-invert method for Ax = Bx: Cost: - one matrix factorization: - many triangular matrix solves: Gain: - fast convergence - clustering eigenvalues are transformed to well-separated eigenvalues - preferred in most practical cases
10
10 Multiple Shift-and-Invert Parallel Eigenvalue Algorithm
11
11 Multiple Shift-and-Invert Parallel Eigenvalue Algorithm
12
12 Idea: distributed spectral slicing compute eigensolutions in distributed subintervals Example: Proc[1] Assigned Spectrum: ( [0], [2] ) shrink Computed Spectrum: { [1] } expand Proc[0] Proc[1] Proc[2] min i min max i max [0] [1] [2]
13
13 Software Structure MPI PETSc SLEPc MUMPS ARPACK Shift-and-Invert Parallel Spectral Transforms (SIPs) Select shifts Bookkeep and validate eigensolutions Balance parallel jobs Ensure global orthogonality of eigenvectors Subgroup of communicators
14
14 Software Structure ARPACK www.caam.rice.edu/software/ARPACK/ SLEPc Scalable Library for Eigenvalue Problem Computations www.grycap.upv.es/slepc/ MUMPS MUltifrontal Massively Parallel sparse direct Solver www.enseeiht.fr/lima/apo/MUMPS/ PETSc Portable, Extensible Toolkit for Scientific Computation www.mcs.anl.gov/petsc/ MPI Message Passing Interface www.mcs.anl.gov/mpi/
15
15 Select shifts: - robustness: be able to compute all the desired eigenpairs under extreme pathological conditions - efficiency: reduce the total computation cost (matrix factorization and Lanczos runs)
16
16 Select shifts: ii e.g., extension to the right side of i : = k + 0.45( k – 1 ) mid = ( i + max )/2 i+1 = min( , mid ) k 1 i+1 max mid
17
17 Eigenvalue clusters and gaps Gap detection Move shift outside of a gap
18
18 Bookkeep eigensolutions COMPUT COMPUT COMPUT DONE UNCOMPUT Multiple eigenvalues aross processors: proc[0] proc[1] Overlap & Match 00 11
19
19 Bookkeep eigensolutions
20
20 Balance parallel jobs
21
21 SIPs Proc[0] Proc[1] Proc[2] min i min max i max [0] [1] [2]
22
22 d) pick next shift ; update computed spectrum [ min, max ] and send to neighboring processes e) receive messages from neighbors update its assigned spectrum ( min, max ) Proc[0] Proc[1] Proc[2] min max [0] [1] [2] [0] 1 [1] 1
23
23 Accuracy of the Eigensolutions Residual norm of all computed eigenvalues is inherited from ARPACK Orthogonality of the eigenvectors computed from the same shift is inherited from ARPACK Orthogonality between the eigenvectors computed from different shifts? –Each eigenvalue singleton is computed through a single shift –Eigenvalue separation between two singletons satisfying eigenvector orthogonality
24
24 Subgroups of communicators: when a single process cannot store matrix factor or distributed eigenvectors [0]id = [3] idEps = 1 idMat = 0 [6][9] [1][4] idEps = 1 idMat = 1 [7][10] [2][5] idEps = 1 idMat = 2 [8][11] min max commEps commMat
25
25 Numerical Experiments on Jazz Jazz, Argonne National Laboratory: Compute – 350 nodes, each with a 2.4 GHz Pentium Xeon Memory – 175 nodes with 2 GB of RAM, 175 nodes with 1 GB of RAM Storage – 20 TB of clusterwide disk: 10 TB GFS and 10 TB PVFS Network – Myrinet 2000, Ethernet
26
26 Tests Diamond (a diamond crystal) Grainboundary-s13, Grainboundary-s29, Graphene, MixedSi, MixedSiO2, Nanotube2 (a single-wall carbon nanotube) Nanowire9 Nanowire25 (a diamond nanowire) …
27
27 Numerical results: Nanotube2 (a single-wall carbon nanotube) Non-zero density of matrix factor: 7.6%, N=16k
28
28 Numerical results: Nanotube2 (a single-wall carbon nanotube) Myrinet Ethernet
29
29 Numerical results: Nanowire25 (a diamond nanowire) Non-zero density of matrix factor: 15%, N=16k
30
30 Numerical results: Nanowire25 (a diamond nanowire) Myrinet Ethernet
31
31 Numerical results: Diamond (a diamond crystal) Non-zero density of matrix factor: 51%, N=16k
32
32 Numerical results: Diamond (a diamond crystal) * * npMat=4 * Myrinet Ethernet
33
33 Summary SIPs: a new multiple Shift-and-Invert Parallel eigensolver. Competitive computational speed: - matrices with sparse factorization: SIPs: (O(N 2 )); ScaLAPACK: (O(N 3 )) - matrices with dense factorization: SIPs outperforms ScaLAPCK on slower network (fast Ethernet) as the number of processors increases Efficient memory usage: SIPs solves much larger eigenvalue problems than ScaLAPACK, e.g., nproc=64, SIPs: N>64k; ScaLAPACK: N=19k Object-oriented design: - developed on top of PETSc and SLEPc. PETSc provides sequential and parallel data structure; SLEPc offers built-in support for eigensolver and spectral transformation. - through the interfaces of PETSc and SLEPc, SIPs easily uses external eigenvalue package ARPACK and parallel sparse direct solver MUMPS. The packages can be upgraded or replaced without extra programming effort.
34
34 Challenges ahead … Memory Execution time Numerical difficulties!!! eigenvalue spectrum (-1.5, 0.5)=O(1) -> huge eigenvalue clusters -> large eigenspace with extremely sensitive vectors Increase or mix arithmetic precision? Eigenspace replaces individual eigenvectors? Use previously computed eigenvectors as initial guess? Adaptive residual tol? New model? … Matrix Size 6k 32k 64k <- We are here200k
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.