05/23/11 Evaluation and Benchmarking of Highly Scalable Parallel Numerical Libraries Christos Theodosiou (ctheodos@grid.auth.gr) User and Application Support.

Christos Theodosiou
User and Application Support
Scientific Computing AUTH

Presentation Outline
Problem Description
Serial Implementation
Parallel Implementation
Numerical Results
Conclusions

Problem Description

Linear Algebra
Linear System Solution (LAPACK/SCALAPACK): Ax=b
Matrix-Matrix Multiplication (BLAS/PBLAS): Ax-b=0

Linear Algebra Libraries
Serial Implementation: BLAS (Basic Linear Algebra Subprograms), LAPACK (Linear Algebra PACKage)
Parallel Implementation: BLACS (Basic Linear Algebra Communication Subprograms), PBLAS (Parallel BLAS), SCALAPACK (Scalable LAPACK)

Example Case
[Figure: example matrices with dimensions labeled N, M, and NRHS]

Serial Implementation (Ax=b)
DGESV: Computes the solution to a real system of linear equations A*X=B.
DGESV( N, NRHS, A, LDA, IPIV, B, LDB, INFO )
N: The number of linear equations.
NRHS: The number of right-hand sides.
A: On entry, the N-by-N matrix A. On exit, the factors L and U of A.
LDA: The leading dimension of the array A.
IPIV: The pivot indices that define the permutation matrix P.
B: On entry, the N-by-NRHS right-hand-side matrix B. On exit, the N-by-NRHS solution matrix X.
LDB: The leading dimension of the array B.
INFO: If equal to zero, the solve was successful.
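To make the roles of A, IPIV, B, and INFO concrete, here is a minimal pure-Python sketch of the computation DGESV performs: LU factorization with partial pivoting followed by triangular solves. It is an illustration, not LAPACK code. The name `gesv` and the list-of-rows storage are assumptions made for readability, and the sketch stops at the first zero pivot, whereas the library routine finishes the factorization before reporting it.

```python
def gesv(a, b):
    """Solve A*X = B in place by LU factorization with partial pivoting.

    a: N-by-N matrix as a list of rows; overwritten with the factors L and U.
    b: N-by-NRHS matrix as a list of rows; overwritten with the solution X.
    Returns (ipiv, info): 1-based pivot rows (as in LAPACK) and a status code.
    """
    n = len(a)
    nrhs = len(b[0])
    ipiv = [0] * n
    for k in range(n):
        # Partial pivoting: largest entry in column k, on or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        ipiv[k] = p + 1
        if a[p][k] == 0.0:
            return ipiv, k + 1  # zero pivot: the matrix is singular
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]  # multiplier, stored where L(i, k) lives
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
            for j in range(nrhs):
                b[i][j] -= a[i][k] * b[k][j]  # forward substitution on B
    # Back substitution with the upper-triangular factor U.
    for k in range(n - 1, -1, -1):
        for j in range(nrhs):
            s = b[k][j] - sum(a[k][c] * b[c][j] for c in range(k + 1, n))
            b[k][j] = s / a[k][k]
    return ipiv, 0


A = [[2.0, 1.0], [4.0, 3.0]]
B = [[3.0], [7.0]]
ipiv, info = gesv(A, B)
print(ipiv, info, B)
```

After the call, `B` holds the solution and `A` holds the packed L and U factors, which mirrors the "on exit" behavior described for the arguments above.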

Serial Implementation (Ax-b=0)
DGEMM: Performs one of the matrix-matrix operations C = alpha*op(A)*op(B) + beta*C, where op(X) is X or its transpose.
DGEMM( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC )
TRANSA: Specifies whether the "normal" or "transpose" form of matrix A is used.
TRANSB: Specifies whether the "normal" or "transpose" form of matrix B is used.
M: The number of rows of the matrices op(A) and C.
N: The number of columns of the matrices op(B) and C.
K: The number of columns of the matrix op(A) and rows of the matrix op(B).
ALPHA: The scalar alpha.
A: The M-by-K matrix A.
LDA: The leading dimension of the array A.
B: The K-by-N matrix B.
LDB: The leading dimension of the array B.
BETA: The scalar beta.
C: The M-by-N matrix C.
LDC: The leading dimension of the array C.
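The argument list is easier to follow next to a reference loop. The sketch below is a plain triple loop over column-major flat lists, which is the storage BLAS assumes, so that the leading dimensions have a visible role. It is not how an optimized BLAS computes the product, and the function name `gemm` is an assumption made for illustration.

```python
def gemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Reference C := alpha*op(A)*op(B) + beta*C on column-major flat lists."""

    def at(x, ldx, trans, i, j):
        # Element (i, j) of op(X); element X(r, s) is stored at x[r + s*ldx].
        if trans in ("T", "t"):
            return x[j + i * ldx]
        return x[i + j * ldx]

    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += at(a, lda, transa, i, p) * at(b, ldb, transb, p, j)
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]


# Residual check A*x - b for A = [[2, 1], [4, 3]], x = [1, 1], b = [3, 7]:
# load b into C, then call with alpha = 1 and beta = -1.
a = [2.0, 4.0, 1.0, 3.0]  # column-major storage of A
x = [1.0, 1.0]
r = [3.0, 7.0]            # holds b on entry, A*x - b on exit
gemm("N", "N", 2, 1, 2, 1.0, a, 2, x, 2, -1.0, r, 2)
print(r)
```

Choosing alpha = 1 and beta = -1 with C preloaded with b is one way to obtain the residual Ax-b from a single matrix-matrix call.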

Example Case (Parallel Implementation)
[Figure: example matrices with dimensions labeled N, M, and NRHS, shown for process grids of 2 x 2 = 4 CPUs and 3 x 2 = 6 CPUs]
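ScaLAPACK distributes a matrix over the process grid in a two-dimensional block-cyclic layout. The sketch below shows that mapping under stated assumptions: 0-based indices, the first block on process (0, 0), and block sizes `mb` and `nb` chosen for illustration, since the slides do not state them. The names `owner` and `local_index` are hypothetical helpers, not ScaLAPACK routines.

```python
def owner(i, j, mb, nb, nprow, npcol):
    """Grid coordinates (row, col) of the process that owns global entry (i, j)."""
    return (i // mb) % nprow, (j // nb) % npcol


def local_index(g, blk, nprocs):
    """Local index, on the owning process, of global index g along one dimension."""
    return (g // (blk * nprocs)) * blk + g % blk


# Ownership map of an 8-by-8 matrix in 2-by-2 blocks on a 2 x 2 process grid.
for i in range(8):
    print(" ".join("%d%d" % owner(i, j, 2, 2, 2, 2) for j in range(8)))
```

Switching the last two arguments of `owner` to 3 and 2 gives the corresponding map for the 3 x 2 = 6 CPU grid.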

Parallel Implementation (Ax=b)
PDGESV: Computes the solution to a real system of linear equations A*X=B.
DGESV ( N, NRHS, A, LDA, IPIV, B, LDB, INFO )
PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )
Each leading dimension in the serial call is replaced by a pair of global indices and an array descriptor.
IA: The row index in the global array A of the first row of the submatrix to use.
JA: The column index in the global array A of the first column of the submatrix to use.
DESCA: The array descriptor for the distributed matrix A.
IB: The row index in the global array B of the first row of the submatrix to use.
JB: The column index in the global array B of the first column of the submatrix to use.
DESCB: The array descriptor for the distributed matrix B.
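An array descriptor such as DESCA is a nine-element integer array that records the global size, the block sizes, and the local leading dimension of a distributed matrix. The sketch below mirrors the logic of the ScaLAPACK tool routine NUMROC, which counts the rows or columns a process holds, and packs a dense-matrix descriptor in the documented field order. It is a pure-Python illustration: `make_desc` is a hypothetical stand-in for what DESCINIT fills in, and the context value 0 in the example is a placeholder rather than a real BLACS context.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of a block-cyclic dimension held by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                  # number of complete blocks
    count = (nblocks // nprocs) * nb   # blocks every process receives
    extra = nblocks % nprocs           # leftover complete blocks
    if mydist < extra:
        count += nb                    # one more complete block
    elif mydist == extra:
        count += n % nb                # the trailing partial block, if any
    return count


def make_desc(ctxt, m, n, mb, nb, rsrc, csrc, lld):
    """Descriptor fields: DTYPE, CTXT, M, N, MB, NB, RSRC, CSRC, LLD."""
    return [1, ctxt, m, n, mb, nb, rsrc, csrc, lld]  # DTYPE 1 marks a dense matrix


# Process row 1 of a 3 x 2 grid, for a 10-by-10 matrix in 2-by-2 blocks.
local_rows = numroc(10, 2, 1, 0, 3)
desca = make_desc(0, 10, 10, 2, 2, 0, 0, max(1, local_rows))
print(local_rows, desca)
```

The local leading dimension must be at least the number of locally held rows, which is why it is derived from `numroc` here.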

Parallel Implementation (Ax-b=0)
PDGEMM: Performs one of the matrix-matrix operations C = alpha*op(A)*op(B) + beta*C, where op(X) is X or its transpose.
DGEMM ( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC )
PDGEMM( TRANSA, TRANSB, M, N, K, ALPHA, A, IA, JA, DESCA, B, IB, JB, DESCB, BETA, C, IC, JC, DESCC )
IA: The row index in the global array A of the first row of the submatrix to use.
JA: The column index in the global array A of the first column of the submatrix to use.
DESCA: The array descriptor for the distributed matrix A.
IB: The row index in the global array B of the first row of the submatrix to use.
JB: The column index in the global array B of the first column of the submatrix to use.
DESCB: The array descriptor for the distributed matrix B.
IC: The row index in the global array C of the first row of the submatrix to use.
JC: The column index in the global array C of the first column of the submatrix to use.
DESCC: The array descriptor for the distributed matrix C.

Serial Implementation
Standard BLAS
Goto BLAS
ATLAS BLAS
ACML (AMD Core Math Library)
Intel MKL

Serial Implementation Results (Ax-b=0)
* Intel Xeon E5345 @ 2.33GHz

Serial Implementation Results (Ax=b)
* Intel Xeon E5345 @ 2.33GHz
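The result charts are not part of this transcript, so the metric they used is unknown. One common way to compare libraries on these two kernels is to convert a measured run time into a floating-point rate using the standard leading-order operation counts. The sketch below assumes that approach; the function names are illustrative and the timing in the example is a made-up placeholder, not a result from the presentation.

```python
def gemm_flops(m, n, k):
    """Leading-order operation count of C = alpha*A*B + beta*C."""
    return 2.0 * m * n * k


def gesv_flops(n, nrhs):
    """Leading-order count: LU factorization plus two triangular solves."""
    return (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2 * nrhs


def gflops(flops, seconds):
    """Achieved rate in GFLOP/s for a measured run time."""
    return flops / seconds / 1e9


# Hypothetical timing, for illustration only: a 4000 x 4000 x 4000 product in 16 s.
print(gflops(gemm_flops(4000, 4000, 4000), 16.0))
```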

Parallel Implementation Results (Ax-b=0)
* Intel Xeon E5345 @ 2.33GHz

Parallel Implementation Results (Ax=b)
* Intel Xeon E5345 @ 2.33GHz

Conclusions
Optimized linear algebra libraries improve performance.
Scaling improves as the problem size grows.
Distributed-memory libraries can handle larger problems than shared-memory libraries.
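Scaling of this kind is usually quantified with speedup and parallel efficiency. The sketch below uses the standard definitions; the function names are illustrative and the timings in the example are hypothetical placeholders, not measurements from this study.

```python
def speedup(t_serial, t_parallel):
    """Ratio of the one-process run time to the p-process run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nprocs):
    """Speedup per process; 1.0 corresponds to ideal scaling."""
    return speedup(t_serial, t_parallel) / nprocs


# Hypothetical timings, for illustration only: 120 s on 1 process, 40 s on 4.
print(speedup(120.0, 40.0), efficiency(120.0, 40.0, 4))
```

Under these definitions, scaling that improves with problem size shows up as efficiency moving closer to 1.0 for larger matrices at a fixed process count.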
