Published by Cecil Mitchell. Modified over 6 years ago.
1
05/23/11
Evaluation and Benchmarking of Highly Scalable Parallel Numerical Libraries
Christos Theodosiou
User and Application Support, Scientific Computing, AUTH
2
Presentation Outline
  Problem Description
  Serial Implementation
  Parallel Implementation
  Numerical Results
  Conclusions
3
Problem Description
4
Linear Algebra
  Linear System Solution (LAPACK/SCALAPACK): Ax=b
  Matrix-Matrix Multiplication (BLAS/PBLAS): Ax-b=0
5
Linear Algebra Libraries
  Serial Implementation
    BLAS (Basic Linear Algebra Subprograms)
    LAPACK (Linear Algebra PACKage)
  Parallel Implementation
    BLACS (Basic Linear Algebra Communication Subprograms)
    PBLAS (Parallel BLAS)
    SCALAPACK (Scalable LAPACK)
6
Example Case
7
Example Case
[Figure: matrix dimensions labeled N and M]
8
Example Case
[Figure: matrix dimensions labeled N, NRHS, and M]
9
Serial Implementation (Ax=b)
DGESV: Computes the solution to a real system of linear equations A*X=B
  DGESV( N, NRHS, A, LDA, IPIV, B, LDB, INFO )
N: The number of linear equations.
NRHS: The number of right-hand sides.
A: On entry, the N-by-N matrix A. On exit, the factors L and U of A.
LDA: The leading dimension of the array A.
IPIV: The pivot indices that define the permutation matrix P.
B: On entry, the N-by-NRHS right-hand-side matrix B. On exit, the N-by-NRHS solution matrix X.
LDB: The leading dimension of the array B.
INFO: If equal to zero, the solve was successful.
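To make the DGESV argument list concrete, here is a minimal pure-Python sketch of the same idea: LU factorization with partial pivoting followed by substitution. It is not the LAPACK routine. The function name gesv is hypothetical, matrices are plain lists of lists (so LDA and LDB have no counterpart), and it stops at the first zero pivot, which is a simplification.

```python
def gesv(a, b):
    """Solve A*X = B in place, mirroring the outputs DGESV documents.

    a: N-by-N list of lists, overwritten with the L and U factors.
    b: N-by-NRHS list of lists, overwritten with the solution X.
    Returns (ipiv, info). ipiv holds 1-based pivot rows; info is 0 on
    success, or k > 0 if the k-th pivot is exactly zero.
    """
    n = len(a)
    nrhs = len(b[0])
    ipiv = [0] * n
    for k in range(n):
        # Partial pivoting: choose the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        ipiv[k] = p + 1
        if a[p][k] == 0.0:
            return ipiv, k + 1  # singular: stop early (simplification)
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]  # multiplier, stored in place as L(i,k)
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
            for j in range(nrhs):
                b[i][j] -= a[i][k] * b[k][j]  # forward elimination on B
    # Back substitution with the upper triangle U.
    for j in range(nrhs):
        for i in range(n - 1, -1, -1):
            s = b[i][j] - sum(a[i][c] * b[c][j] for c in range(i + 1, n))
            b[i][j] = s / a[i][i]
    return ipiv, 0
```

Calling gesv(a, b) with a = [[1.0, 2.0], [3.0, 4.0]] and b = [[5.0], [6.0]] leaves the solution in b and the factors in a, in the same way the A and B arguments are overwritten on exit.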
10
Serial Implementation (Ax-b=0)
DGEMM: Performs one of the matrix-matrix operations C = alpha*op(A)*op(B) + beta*C, where op(X) is X or its transpose
  DGEMM( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC )
TRANSA: Specifies whether the "normal" or "transpose" form of matrix A is used.
TRANSB: Specifies whether the "normal" or "transpose" form of matrix B is used.
M: The number of rows of the matrices op(A) and C.
N: The number of columns of the matrices op(B) and C.
K: The number of columns of the matrix op(A) and rows of the matrix op(B).
ALPHA: The scalar alpha.
A: The M-by-K matrix A (K-by-M if transposed).
LDA: The leading dimension of the array A.
B: The K-by-N matrix B (N-by-K if transposed).
LDB: The leading dimension of the array B.
BETA: The scalar beta.
C: The M-by-N matrix C.
LDC: The leading dimension of the array C.
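The residual check Ax-b=0 maps onto the DGEMM operation by choosing alpha = 1, beta = -1 and passing b as C. The following is a reference sketch in pure Python, not the BLAS routine. The function name gemm is hypothetical, and the leading-dimension arguments are dropped because lists of lists carry their own shape.

```python
def gemm(transa, transb, alpha, a, b, beta, c):
    """Reference C = alpha*op(A)*op(B) + beta*C on lists of lists.

    transa, transb: 'N' (use as is) or 'T' (use the transpose).
    c is updated in place and also returned.
    """
    def op(x, trans):
        # Return x or its transpose, depending on the flag.
        return [list(row) for row in zip(*x)] if trans.upper() == "T" else x

    a, b = op(a, transa), op(b, transb)
    m, k, n = len(a), len(b), len(b[0])
    if len(a[0]) != k or len(c) != m or len(c[0]) != n:
        raise ValueError("incompatible matrix shapes")
    for i in range(m):
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            c[i][j] = alpha * acc + beta * c[i][j]
    return c


def residual(a, x, b):
    """Return A*x - b without modifying b (alpha = 1, beta = -1)."""
    return gemm("N", "N", 1.0, a, x, -1.0, [row[:] for row in b])
```

A residual whose entries are all close to zero indicates that x satisfies the system to within rounding error.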
11
Example Case (Parallel Implementation)
12
Example Case (Parallel Implementation)
[Figure: matrix dimensions labeled N, NRHS, and M]
13
Example Case (Parallel Implementation)
[Figure: matrix dimensions labeled N, NRHS, and M; 2 x 2 = 4 CPUs]
14
Example Case (Parallel Implementation)
[Figure: matrix dimensions labeled N, NRHS, and M; 3 x 2 = 6 CPUs]
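ScaLAPACK spreads a dense matrix over a process grid such as 2 x 2 or 3 x 2 using a two-dimensional block-cyclic layout. The sketch below shows which grid position owns a given global element. The block sizes mb and nb, the 0-based indexing, and the assumption that the first block sits on process (0, 0) are illustrative choices, not values taken from the slides, and the function names are hypothetical.

```python
def owner(i, j, mb, nb, nprow, npcol):
    """Grid coordinates (prow, pcol) that own global element (i, j).

    Indices are 0-based; blocks are mb-by-nb; the first block is
    assumed to live on process (0, 0).
    """
    return (i // mb) % nprow, (j // nb) % npcol


def layout(m, n, mb, nb, nprow, npcol):
    """Owner of every element of an m-by-n matrix, as a nested list."""
    return [[owner(i, j, mb, nb, nprow, npcol) for j in range(n)]
            for i in range(m)]


if __name__ == "__main__":
    # Print the owning process of each element for a 3 x 2 grid.
    for row in layout(6, 4, 1, 1, 3, 2):
        print(" ".join(f"{p}{q}" for p, q in row))
```

Because blocks are dealt out cyclically, each process receives pieces from across the whole matrix, which keeps the work balanced as the factorization shrinks the active part of the matrix.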
15
Parallel Implementation (Ax=b)
PDGESV: Computes the solution to a real system of linear equations A*X=B
  DGESV ( N, NRHS, A, LDA, IPIV, B, LDB, INFO )
  PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )
IA: The row index in the global array A of the first row of the submatrix to operate on.
JA: The column index in the global array A of the first column of the submatrix to operate on.
DESCA: The array descriptor for the distributed matrix A.
IB: The row index in the global array B of the first row of the submatrix to operate on.
JB: The column index in the global array B of the first column of the submatrix to operate on.
DESCB: The array descriptor for the distributed matrix B.
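The descriptor arguments replace the leading dimensions of the serial call. As a rough illustration, the sketch below builds a nine-entry dense-matrix descriptor in the field order ScaLAPACK documents (DTYPE, CTXT, M, N, MB, NB, RSRC, CSRC, LLD) and computes local sizes with the same arithmetic as the NUMROC tool routine. Both Python functions are stand-ins written for this example; the helper make_desc is hypothetical and, unlike DESCINIT, derives LLD itself.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows or columns of a block-cyclic dimension held by one process.

    n: global extent, nb: block size, iproc: this process coordinate,
    isrcproc: coordinate holding the first block, nprocs: grid extent.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                    # number of full blocks
    local = (nblocks // nprocs) * nb     # full rounds every process gets
    extrablks = nblocks % nprocs         # leftover full blocks
    if mydist < extrablks:
        local += nb                      # one extra full block
    elif mydist == extrablks:
        local += n % nb                  # the trailing partial block
    return local


def make_desc(m, n, mb, nb, myrow, nprow, ictxt=0, rsrc=0, csrc=0):
    """Hypothetical helper: descriptor list for an m-by-n dense matrix."""
    lld = max(1, numroc(m, mb, myrow, rsrc, nprow))  # local leading dim
    return [1, ictxt, m, n, mb, nb, rsrc, csrc, lld]
```

In a real program the context handle comes from BLACS and the descriptor is filled in by DESCINIT; the value 0 used for ictxt here is only a placeholder.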
16
Parallel Implementation (Ax-b=0)
PDGEMM: Performs one of the matrix-matrix operations C = alpha*op(A)*op(B) + beta*C, where op(X) is X or its transpose
  DGEMM ( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC )
  PDGEMM( TRANSA, TRANSB, M, N, K, ALPHA, A, IA, JA, DESCA, B, IB, JB, DESCB, BETA, C, IC, JC, DESCC )
IA: The row index in the global array A of the first row of the submatrix to operate on.
JA: The column index in the global array A of the first column of the submatrix to operate on.
DESCA: The array descriptor for the distributed matrix A.
IB: The row index in the global array B of the first row of the submatrix to operate on.
JB: The column index in the global array B of the first column of the submatrix to operate on.
DESCB: The array descriptor for the distributed matrix B.
IC: The row index in the global array C of the first row of the submatrix to operate on.
JC: The column index in the global array C of the first column of the submatrix to operate on.
DESCC: The array descriptor for the distributed matrix C.
17
Serial Implementation
  Standard BLAS
  Goto BLAS
  ATLAS BLAS
  ACML (AMD Core Math Library)
  Intel MKL
18
Serial Implementation Results (Ax-b=0)
[Figure: results plot]
* Intel Xeon E5345 @ 2.33GHz
19
Serial Implementation Results (Ax=b)
[Figure: results plot]
* Intel Xeon E5345 @ 2.33GHz
20
Parallel Implementation Results (Ax-b=0)
[Figure: results plot]
* Intel Xeon E5345 @ 2.33GHz
21
Parallel Implementation Results (Ax=b)
[Figure: results plot]
* Intel Xeon E5345 @ 2.33GHz
22
Conclusions
  Optimized linear algebra libraries improve performance.
  Scaling improves as the problem size grows.
  Distributed-memory libraries can handle larger problems than shared-memory libraries.
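The last point is largely a matter of memory arithmetic. As a back-of-the-envelope sketch, the storage for an N-by-N double-precision matrix plus NRHS right-hand sides can be estimated as below. The function names are hypothetical, the even split across the process grid is an idealization, and workspace and communication buffers are ignored.

```python
def matrix_bytes(n, nrhs=1, bytes_per_elem=8):
    """Bytes needed for an n-by-n matrix A and an n-by-nrhs matrix B."""
    return bytes_per_elem * (n * n + n * nrhs)


def per_process_bytes(n, nrhs, nprow, npcol):
    """Idealized share per process when data is split evenly over the grid."""
    return matrix_bytes(n, nrhs) / (nprow * npcol)


if __name__ == "__main__":
    # Compare one node against two grid shapes for the same problem.
    n = 100_000
    for nprow, npcol in [(1, 1), (2, 2), (3, 2)]:
        gib = per_process_bytes(n, 1, nprow, npcol) / 2**30
        print(f"{nprow} x {npcol} grid: about {gib:.1f} GiB per process")
```

Storage grows with the square of N, so a problem that does not fit in the memory of one node can still fit once it is divided among the processes of a grid.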