
1 Eigenvalue Problems in Nanoscale Materials Modeling Hong Zhang Computer Science, Illinois Institute of Technology Mathematics and Computer Science, Argonne National Laboratory

2 Collaborators: Barry Smith Mathematics and Computer Science, Argonne National Laboratory Michael Sternberg, Peter Zapol Materials Science, Argonne National Laboratory

3 Modeling of Nanostructured Materials [figure: methods placed on a system size vs. accuracy trade-off]

4 Density-Functional based Tight-Binding (DFTB)

5 Matrices are
- large: ultimate goal 50,000 atoms with electronic structure, ~ N=200,000
- sparse: non-zero density -> 0 as N increases
- dense solutions are requested: 60% of eigenvalues and eigenvectors
Dense solutions of large sparse problems!

6 DFTB implementation (2002)

7 Two classes of methods:
Direct methods (dense matrix storage):
- compute all or almost all eigensolutions of dense matrices of small to medium size
- tridiagonal reduction + QR or bisection
- Time = O(N^3), Memory = O(N^2)
- LAPACK, ScaLAPACK
Iterative methods (sparse matrix storage):
- compute a selected small set of eigensolutions of sparse matrices of large size
- Lanczos
- Time = O(nnz*N) <= O(N^3), Memory = O(nnz) <= O(N^2)
- ARPACK, BLZPACK, ...
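As an illustration of the direct path, the dense generalized symmetric problem can be handed to LAPACK's dsygv (ScaLAPACK provides the parallel analogue). The following is a minimal sketch, not the DFTB code; the 2x2 matrix entries are purely illustrative.

/* Sketch of the dense direct path: LAPACK's generalized symmetric eigensolver
 * dsygv (itype=1 solves A x = lambda B x). Hypothetical 2x2 data. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    double A[4] = {2.0, 1.0,
                   1.0, 3.0};   /* symmetric A (row-major) */
    double B[4] = {1.0, 0.0,
                   0.0, 1.0};   /* symmetric positive definite B */
    double w[2];                /* eigenvalues, returned in ascending order */

    /* jobz='V' also overwrites A with the B-orthonormal eigenvectors */
    lapack_int info = LAPACKE_dsygv(LAPACK_ROW_MAJOR, 1, 'V', 'U',
                                    2, A, 2, B, 2, w);
    if (info != 0) { fprintf(stderr, "dsygv failed: %d\n", (int)info); return 1; }
    printf("eigenvalues: %g %g\n", w[0], w[1]);
    return 0;
}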

8 The DFTB eigenvalue problem is distinguished by:
- (A, B) is large and sparse -> iterative method
- a large number of eigensolutions (60%) are requested -> iterative method + multiple shift-and-invert
- the spectrum has poor average eigenvalue separation O(1/N), clusters with hundreds of tightly packed eigenvalues, and gaps >> O(1/N) -> iterative method + multiple shift-and-invert + robustness
- the matrix factorization (A - σB) = LDL^T has non-zero density ranging from not-very-sparse (7%) to dense (50%) -> iterative method + multiple shift-and-invert + robustness + efficiency
- Ax = λBx is solved many times (possibly 1000's) -> iterative method + multiple shift-and-invert + robustness + efficiency + initial approximation of eigensolutions

9 Lanczos shift-and-invert method for Ax = λBx:
Cost:
- one matrix factorization of (A - σB)
- many triangular matrix solves
Gain:
- fast convergence
- clustered eigenvalues are transformed to well-separated eigenvalues
- preferred in most practical cases
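In SLEPc terms, one such shift-and-invert run around a shift σ can be expressed with the EPS and ST objects. The sketch below uses the modern PETSc/SLEPc C API (which may differ in detail from the version used in SIPs); A and B are assumed to be already assembled parallel Mats, and sigma/nev are placeholders.

/* Sketch: one shift-and-invert Krylov run around a shift sigma with SLEPc.
 * Assumes A, B are assembled parallel Mats; most error handling omitted. */
#include <slepceps.h>

PetscErrorCode solve_near_shift(Mat A, Mat B, PetscScalar sigma, PetscInt nev)
{
  EPS      eps;
  ST       st;
  PetscInt i, nconv;

  PetscFunctionBeginUser;
  PetscCall(EPSCreate(PETSC_COMM_WORLD, &eps));
  PetscCall(EPSSetOperators(eps, A, B));
  PetscCall(EPSSetProblemType(eps, EPS_GHEP));   /* symmetric A, SPD B */

  /* Spectral transform (A - sigma B)^{-1} B: one factorization of A - sigma B,
   * then many triangular solves inside the Krylov iteration. */
  PetscCall(EPSGetST(eps, &st));
  PetscCall(STSetType(st, STSINVERT));
  PetscCall(EPSSetTarget(eps, sigma));
  PetscCall(EPSSetWhichEigenpairs(eps, EPS_TARGET_MAGNITUDE));
  PetscCall(EPSSetDimensions(eps, nev, PETSC_DEFAULT, PETSC_DEFAULT));
  PetscCall(EPSSetFromOptions(eps));  /* e.g. -st_pc_type cholesky
                                              -st_pc_factor_mat_solver_type mumps */

  PetscCall(EPSSolve(eps));
  PetscCall(EPSGetConverged(eps, &nconv));
  for (i = 0; i < nconv; i++) {
    PetscScalar kr, ki;
    PetscCall(EPSGetEigenpair(eps, i, &kr, &ki, NULL, NULL));
    PetscCall(PetscPrintf(PETSC_COMM_WORLD, "lambda[%d] = %g\n",
                          (int)i, (double)PetscRealPart(kr)));
  }
  PetscCall(EPSDestroy(&eps));
  PetscFunctionReturn(PETSC_SUCCESS);
}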

10 Multiple Shift-and-Invert Parallel Eigenvalue Algorithm

11 Multiple Shift-and-Invert Parallel Eigenvalue Algorithm

12 Idea: distributed spectral slicing, i.e. compute eigensolutions in distributed subintervals.
Example: Proc[1]
- assigned spectrum (σ[0], σ[2]): shrinks
- computed spectrum {σ[1]}: expands
[figure: Proc[0], Proc[1], Proc[2] partitioning the spectrum (λ_min, λ_max) with shifts σ[0], σ[1], σ[2]]
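A minimal sketch of the starting point of this idea: each rank takes an even slice of (λ_min, λ_max) as its initially assigned spectrum and places its first shift at the midpoint. This is only the static part; in SIPs the assigned interval then shrinks and the computed spectrum expands as work proceeds. The bounds (-1.5, 0.5) are taken from the spectrum quoted on the last slide; variable names are illustrative.

/* Sketch: even initial partition of the spectrum across MPI ranks.
 * SIPs subsequently adjusts these intervals dynamically. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int    rank, size;
    double lmin = -1.5, lmax = 0.5;   /* spectrum bounds (illustrative) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double width = (lmax - lmin) / size;
    double my_lo = lmin + rank * width;    /* assigned subinterval */
    double my_hi = my_lo + width;
    double sigma = 0.5 * (my_lo + my_hi);  /* first shift: midpoint */

    printf("proc[%d]: assigned (%g, %g), first shift %g\n",
           rank, my_lo, my_hi, sigma);

    MPI_Finalize();
    return 0;
}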

13 Software structure: Shift-and-Invert Parallel Spectral Transforms (SIPs), built on MPI, PETSc, SLEPc, MUMPS and ARPACK. SIPs itself:
- selects shifts
- bookkeeps and validates eigensolutions
- balances parallel jobs
- ensures global orthogonality of eigenvectors
- manages subgroups of communicators

14 Software structure:
- ARPACK
- SLEPc: Scalable Library for Eigenvalue Problem Computations
- MUMPS: MUltifrontal Massively Parallel sparse direct Solver
- PETSc: Portable, Extensible Toolkit for Scientific Computation
- MPI: Message Passing Interface

15 Select shifts:
- robustness: be able to compute all the desired eigenpairs under extreme pathological conditions
- efficiency: reduce the total computation cost (matrix factorizations and Lanczos runs)

16 Select shifts, e.g., extension to the right side of σ_i:
σ' = λ_k + (λ_k - λ_1)
σ_mid = (σ_i + λ_max)/2
σ_{i+1} = min(σ', σ_mid)
[figure: λ_1, λ_k, σ_{i+1}, σ_mid, λ_max marked on the real axis]
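A small sketch of this update rule as reconstructed above; the reconstruction itself is uncertain, so treat this only as an illustration. Here lambda[0..k-1] are assumed to be the eigenvalues converged at the current shift sigma_i, sorted in ascending order.

/* Sketch of the reconstructed next-shift rule: extend to the right of sigma_i
 * by the span of the eigenvalues computed there, but never jump past the
 * midpoint between sigma_i and lambda_max. Illustrative only. */
#include <math.h>

double next_shift_right(double sigma_i, double lambda_max,
                        const double *lambda, int k)
{
    double span    = lambda[k - 1] - lambda[0];     /* spread of the computed set */
    double sigma_e = lambda[k - 1] + span;          /* extrapolated shift */
    double sigma_m = 0.5 * (sigma_i + lambda_max);  /* midpoint cap */
    return fmin(sigma_e, sigma_m);
}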

17 Eigenvalue clusters and gaps:
- gap detection
- move a shift outside of a gap

18 Bookkeep eigensolutions [figure: subintervals marked COMPUTED / UNCOMPUTED / DONE]
Multiple eigenvalues across processors (proc[0], proc[1]): overlap & match the spectra computed at the neighboring shifts σ_0 and σ_1

19 Bookkeep eigensolutions

20 Balance parallel jobs

21 SIPs [figure: Proc[0], Proc[1], Proc[2] with shifts σ[0], σ[1], σ[2] over the spectrum (λ_min, λ_max)]

22 d) pick the next shift σ; update the computed spectrum bounds and send them to the neighboring processes
e) receive messages from neighbors; update the assigned spectrum bounds
[figure: Proc[0], Proc[1], Proc[2] with shifts σ[0], σ[1], σ[2] and the next shifts σ[0]_1, σ[1]_1]
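A hedged sketch of steps d) and e): each process exchanges its computed-spectrum bounds with its left and right neighbors and tightens its own assigned interval accordingly. The function name, the two-double message layout, and the update logic are illustrative; the real SIPs bookkeeping also exchanges the eigenvalues and their counts.

/* Sketch: exchange computed-spectrum bounds [mu_min, mu_max] with the left
 * and right neighbor processes, then shrink the locally assigned interval. */
#include <mpi.h>

void exchange_bounds(MPI_Comm comm, double my_bounds[2],
                     double *assigned_lo, double *assigned_hi)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* defaults leave the assigned interval unchanged if a neighbor is absent */
    double from_left[2]  = { *assigned_lo, *assigned_lo };
    double from_right[2] = { *assigned_hi, *assigned_hi };

    /* send my computed bounds both ways; receive the neighbors' bounds */
    MPI_Sendrecv(my_bounds,  2, MPI_DOUBLE, left,  0,
                 from_right, 2, MPI_DOUBLE, right, 0, comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(my_bounds,  2, MPI_DOUBLE, right, 1,
                 from_left,  2, MPI_DOUBLE, left,  1, comm, MPI_STATUS_IGNORE);

    /* neighbors' computed spectrum shrinks my assigned spectrum */
    if (from_left[1]  > *assigned_lo) *assigned_lo = from_left[1];
    if (from_right[0] < *assigned_hi) *assigned_hi = from_right[0];
}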

23 Accuracy of the Eigensolutions
- Residual norm of all computed eigenvalues is inherited from ARPACK
- Orthogonality of the eigenvectors computed from the same shift is inherited from ARPACK
- Orthogonality between the eigenvectors computed from different shifts?
– Each eigenvalue singleton is computed through a single shift
– Eigenvalue separation between two singletons -> satisfying eigenvector orthogonality

24 Subgroups of communicators: used when a single process cannot store the matrix factor or the distributed eigenvectors
[figure: 12 ranks arranged in a grid over (λ_min, λ_max); commMat groups ranks that share one matrix factor (idMat = 0, 1, 2), commEps connects ranks across factor groups (idEps)]
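A minimal sketch of such a split using plain MPI_Comm_split: npMat ranks jointly store and factor one (A - σB) through commMat, while commEps coordinates work on different shifts across the factor groups. The rank-to-group mapping and names below are illustrative; the actual SIPs layout may differ.

/* Sketch: split MPI_COMM_WORLD into
 *   commMat - ranks that jointly store/factor one shifted matrix, and
 *   commEps - one rank per factor group, used to coordinate shifts.
 * npMat is the number of ranks per matrix factor (e.g. 4 in the Diamond runs). */
#include <mpi.h>

void split_comms(int npMat, MPI_Comm *commMat, MPI_Comm *commEps)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int idMat = rank / npMat;   /* which factor group this rank belongs to */
    int idEps = rank % npMat;   /* position of this rank inside its group  */

    /* ranks with the same idMat share a distributed matrix factor */
    MPI_Comm_split(MPI_COMM_WORLD, idMat, idEps, commMat);

    /* ranks with the same idEps form a cross-group "spectrum" communicator */
    MPI_Comm_split(MPI_COMM_WORLD, idEps, idMat, commEps);
}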

25 Numerical Experiments on Jazz (Argonne National Laboratory):
- Compute: 350 nodes, each with a 2.4 GHz Pentium Xeon
- Memory: 175 nodes with 2 GB of RAM, 175 nodes with 1 GB of RAM
- Storage: 20 TB of clusterwide disk (10 TB GFS and 10 TB PVFS)
- Network: Myrinet 2000, Ethernet

26 Test systems:
- Diamond (a diamond crystal)
- Grainboundary-s13, Grainboundary-s29
- Graphene
- MixedSi, MixedSiO2
- Nanotube2 (a single-wall carbon nanotube)
- Nanowire9, Nanowire25 (a diamond nanowire)
- ...

27 Numerical results: Nanotube2 (a single-wall carbon nanotube) Non-zero density of matrix factor: 7.6%, N=16k

28 Numerical results: Nanotube2 (a single-wall carbon nanotube) [figure: performance on Myrinet and on Ethernet]

29 Numerical results: Nanowire25 (a diamond nanowire) Non-zero density of matrix factor: 15%, N=16k

30 Numerical results: Nanowire25 (a diamond nanowire) [figure: performance on Myrinet and on Ethernet]

31 Numerical results: Diamond (a diamond crystal) Non-zero density of matrix factor: 51%, N=16k

32 Numerical results: Diamond (a diamond crystal) [figure: performance on Myrinet and on Ethernet; runs marked * use npMat=4]

33 Summary
SIPs: a new multiple Shift-and-Invert Parallel eigensolver.
Competitive computational speed:
- matrices with sparse factorization: SIPs O(N^2) vs. ScaLAPACK O(N^3)
- matrices with dense factorization: SIPs outperforms ScaLAPACK on a slower network (fast Ethernet) as the number of processors increases
Efficient memory usage:
- SIPs solves much larger eigenvalue problems than ScaLAPACK, e.g., with nproc=64, SIPs: N > 64k; ScaLAPACK: N = 19k
Object-oriented design:
- developed on top of PETSc and SLEPc. PETSc provides sequential and parallel data structures; SLEPc offers built-in support for eigensolvers and spectral transformations.
- through the interfaces of PETSc and SLEPc, SIPs easily uses the external eigenvalue package ARPACK and the parallel sparse direct solver MUMPS. These packages can be upgraded or replaced without extra programming effort.

34 Challenges ahead ...
Memory, execution time, and numerical difficulties: the eigenvalue spectrum (-1.5, 0.5) = O(1) -> huge eigenvalue clusters -> large eigenspaces with extremely sensitive vectors
- Increase or mix arithmetic precision?
- Eigenspace replaces individual eigenvectors?
- Use previously computed eigenvectors as initial guess?
- Adaptive residual tolerance?
- New model? ...
[figure: matrix size axis 6k, 32k, 64k (we are here), 200k]