1 Introduction to ScaLAPACK

2 ScaLAPACK
ScaLAPACK is a library for solving dense linear systems and computing eigenvalues of dense matrices.

3 ScaLAPACK
Scalable Linear Algebra PACKage
Development team from the University of Tennessee, University of California Berkeley, ORNL, Rice University, UCLA, UIUC, etc.
Supported in commercial packages:
–NAG Parallel Library
–IBM PESSL
–CRAY Scientific Library
–VNI IMSL
–Fujitsu, HP/Convex, Hitachi, NEC

4 Description
A Scalable Linear Algebra Package (ScaLAPACK)
A library of linear algebra routines for MPP and NOW machines
Language: Fortran
Dense matrix problem solvers:
–Linear Equations
–Least Squares
–Eigenvalue

5 Technical Information
ScaLAPACK web page
–http://www.netlib.org/scalapack
ScaLAPACK User’s Guide
Technical Working Notes
–http://www.netlib.org/lapack/lawn

6 Packages
LAPACK
–Single-PE linear algebra solvers, utilities, etc.
BLAS
–Single-PE workhorse routines (vector, matrix-vector & matrix-matrix)
PBLAS
–Parallel counterparts of the BLAS. Relies on the BLACS for blocking and moving data.

7 Packages (cont.)
BLACS
–Constructs 1-D & 2-D processor grids for matrix decomposition and nearest-neighbor communication patterns. Uses message-passing libraries (MPI, PVM, MPL, NX, etc.) for data movement.

8 Dependencies & Parallelism of Component Packages
(figure: component packages grouped into Serial and Parallel layers)

9 Data Partitioning
Initialize the BLACS routines:
–Set up the 2-D mesh (mp, np)
–Set (CorR) column- or row-major ordering
–The call returns a context handle (icon)
–Get the row & column position through gridinfo (myrow, mycol)
blacs_gridinit( icon, CorR, mp, np )
blacs_gridinfo( icon, mp, np, myrow, mycol )
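A minimal sketch of the call sequence listed on this slide, assuming MPI is already initialized; the 2 x 3 grid shape is only an illustration:

      INTEGER :: icon, myrow, mycol
      INTEGER, PARAMETER :: mp = 2, np = 3            ! assumed 2 x 3 process grid
      CALL BLACS_GET( 0, 0, icon )                    ! obtain the default system context
      CALL BLACS_GRIDINIT( icon, 'R', mp, np )        ! build the grid, row-major PE ordering
      CALL BLACS_GRIDINFO( icon, mp, np, myrow, mycol )  ! my (row, column) position in the grid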

10 Data Partitioning (cont.)
Block-cyclic array distribution:
–Block: groups of sequential elements
–Cyclic: round-robin assignment of blocks to PEs
Example 1-D partitioning:
–A global 9-element array distributed on a 3-PE grid with 2-element blocks: block(2)-cyclic(3)
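Applying that block(2)-cyclic(3) layout to the 9-element array: elements 1-2 go to PE 0, 3-4 to PE 1, 5-6 to PE 2, 7-8 wrap back to PE 0, and 9 to PE 1. A small sketch of the owner computation (illustrative only, not a ScaLAPACK routine):

      INTEGER :: i, owner
      INTEGER, PARAMETER :: nb = 2, npes = 3          ! block size and number of PEs
      DO i = 1, 9
         owner = MOD( (i-1)/nb, npes )                ! block-cyclic owner of global element i
         PRINT *, 'element', i, 'lives on PE', owner
      END DO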

11 Block-Cyclic Partitioning (figure)

12 2-D Partitioning
Example 2-D partitioning:
–A(9,9) mapped onto a PE(2,3) grid with block(2,2)
–Use the 1-D example to distribute the column elements within each row: a block(2)-cyclic(3) distribution.
–Assign the rows to the first PE dimension by a block(2)-cyclic(2) distribution.
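A sketch of the corresponding 2-D owner computation for the A(9,9) example, assuming 2 x 2 blocks on the 2 x 3 PE grid with the first block owned by PE (0,0); the variable names are illustrative:

      INTEGER :: i, j, pr, pc
      INTEGER, PARAMETER :: mb = 2, nb = 2            ! row and column block sizes
      INTEGER, PARAMETER :: nprow = 2, npcol = 3      ! 2 x 3 PE grid
      DO i = 1, 9
         DO j = 1, 9
            pr = MOD( (i-1)/mb, nprow )               ! PE row owning global row i
            pc = MOD( (j-1)/nb, npcol )               ! PE column owning global column j
         END DO
      END DO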

13 (figure slide, no transcript text)

14 (figure slide, no transcript text)

15 Syntax for descinit
descinit( idesc, m, n, mb, nb, i, j, icon, mla, ier )
IorO  arg    Description
OUT   idesc  Descriptor
IN    m      Row Size (Global)
IN    n      Column Size (Global)
IN    mb     Row Block Size
IN    nb     Column Block Size
IN    i      Starting Grid Row
IN    j      Starting Grid Column
IN    icon   BLACS context
IN    mla    Leading Dimension of Local Matrix
OUT   ier    Error number
The starting grid location is usually (i=0, j=0).
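As a concrete illustration (mirroring the example program later in this presentation), a descriptor for a global 1000 x 1000 matrix with 32 x 32 blocks starting at grid position (0,0) would be set up roughly as follows; ictxt is the BLACS context and lda_x the local leading dimension obtained from NUMROC:

      INTEGER :: DESC_A(9), info, ictxt, lda_x
      ! ...ictxt from BLACS_GRIDINIT and lda_x from NUMROC, as shown in the example below...
      CALL DESCINIT( DESC_A, 1000, 1000, 32, 32, 0, 0, ictxt, lda_x, info )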

16 Application Program Interfaces (API)
Drivers
–Solve a complete problem
Computational components
–Perform tasks: LU factorization, etc.
Auxiliary routines
–Scaling, matrix norms, etc.
Matrix redistribution/copy routines
–Matrix on PE grid1 -> matrix on PE grid2

17 API (cont.)
Prepend the LAPACK equivalent names with P: PXYYZZZ
–X = data type, YY = matrix type, ZZZ = computation performed
Data type (X):
–S  real
–D  double
–C  complex
–Z  double complex
For example, PSGESV (used later in this presentation) is the parallel (P), single-precision real (S), general-matrix (GE) linear-system solver (SV).

18 Matrix Types (YY) PXYYZZZ
DB  General Band (diagonally dominant-like)
DT  General Tridiagonal (diagonally dominant-like)
GB  General Band
GE  GEneral matrices (e.g., unsymmetric, rectangular, etc.)
GG  General matrices, Generalized problem
HE  Complex HErmitian
OR  ORthogonal Real
PB  Positive definite Banded (symmetric or Hermitian)
PO  POsitive definite (symmetric or Hermitian)
PT  Positive definite Tridiagonal (symmetric or Hermitian)
ST  Symmetric Tridiagonal Real
SY  SYmmetric
TR  TRiangular (possibly quasi-triangular)
TZ  TrapeZoidal
UN  UNitary complex

19 Routine Types: Drivers (ZZZ) PXYYZZZ
SL   Linear Equations (SVX*)
SV   LU Solver
VD   Singular Value
EV   Eigenvalue (EVX**)
GVX  Generalized Eigenvalue
*  Estimates condition numbers, provides error bounds, equilibrates the system.
** Selected eigenvalues

20 Rules of Thumb
–Use a square processor grid, mp = np.
–Use block size 32 (Cray) or 64 for efficient matrix multiply.
–Number of PEs to use = MxN/10**6 (e.g., about 4 PEs for a 2000 x 2000 matrix).
–Bandwidth/node (MB/sec) > peak rate (MFLOPS)/10.
–Latency < 500 usecs.
–Use the vendor's BLAS.

21 Example program: Linear Equation Solution Ax=b (page 1)

      program Sample_ScaLAPACK_program
      implicit none
      include "mpif.h"
      REAL :: A( 1000, 1000 )
      REAL, DIMENSION(:,:), ALLOCATABLE :: A_local
      INTEGER, PARAMETER :: M_A = 1000, N_A = 1000
      INTEGER :: DESC_A(1:9)
      REAL, DIMENSION(:,:), ALLOCATABLE :: b_local
      REAL :: b( 1000, 1 )
      INTEGER, PARAMETER :: M_B = 1000, N_B = 1
      INTEGER :: DESC_b(1:9), i, j
      INTEGER :: ldb_y
      INTEGER :: ipiv(1:1000), info
      INTEGER :: myrow, mycol     ! row and column # of this PE on the PE grid
      INTEGER :: ictxt            ! context handle...like an MPI communicator
      INTEGER :: lda_x, lda_y     ! leading dimensions of the local array A_local
      ! row and column # of the PE grid
      INTEGER, PARAMETER :: nprow = 2, npcol = 2
      !
      ! RSRC...the PE row that owns the first row of A
      ! CSRC...the PE column that owns the first column of A

22 Example program (page 2)

      INTEGER, PARAMETER :: RSRC = 0, CSRC = 0
      INTEGER, PARAMETER :: MB = 32, NB = 32
      !
      ! iA...the global row index of A, which points to the beginning
      ! of the submatrix that will be operated on.
      !
      INTEGER, PARAMETER :: iA = 1, jA = 1
      INTEGER, PARAMETER :: iB = 1, jB = 1
      REAL :: starttime, endtime
      INTEGER :: taskid, numtask, ierr
      !**********************************
      ! NOTE: ON THE NPACI MACHINES YOU DO NOT NEED TO CALL INITBUFF;
      ! ON OTHER MACHINES YOU MAY NEED TO CALL INITBUFF
      !*************************************
      !
      ! This is where you should define any other variables
      !
      INTEGER, EXTERNAL :: NUMROC
      !
      ! Initialization of the processor grid. This routine
      ! must be called.
      !
      ! start MPI initialization routines

23 Example program (page 3)

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, taskid, ierr)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numtask, ierr)
      ! start timing call on the (0,0) processor
      IF (taskid == 0) THEN
         starttime = MPI_WTIME()
      END IF
      CALL BLACS_GET( 0, 0, ictxt )
      CALL BLACS_GRIDINIT( ictxt, 'r', nprow, npcol )
      !
      ! This call returns processor grid information that will be
      ! used to distribute the global array
      !
      CALL BLACS_GRIDINFO( ictxt, nprow, npcol, myrow, mycol )
      !
      ! define lda_x and lda_y with NUMROC
      ! lda_x...the leading dimension of the local array storing the
      !         local blocks of the distributed matrix A
      !
      lda_x = NUMROC( M_A, MB, myrow, 0, nprow )
      lda_y = NUMROC( N_A, NB, mycol, 0, npcol )
      !
      ! size A_local exactly so we don't waste space
      !

24 Example program (page 4)

      ALLOCATE( A_local(lda_x, lda_y) )
      if (N_B < NB) then
         ALLOCATE( b_local(lda_x, N_B) )
      else
         ldb_y = NUMROC( N_B, NB, mycol, 0, npcol )
         ALLOCATE( b_local(lda_x, ldb_y) )
      end if
      !
      ! defining the global matrix A
      !
      do i = 1, 1000
         do j = 1, 1000
            if (i == j) then
               A(i,j) = 1000.0*i*j
            else
               A(i,j) = 10.0*(i+j)
            end if
         end do
      end do
      ! defining the global array b
      do i = 1, 1000
         b(i,1) = 1.0
      end do
      !

25 Example program (page 5)

      ! subroutine SETARRAY maps the global arrays to the local arrays
      !
      ! YOU MUST HAVE DEFINED THE GLOBAL MATRIX BEFORE THIS POINT
      !
      CALL SETARRAY( A, A_local, b, b_local, myrow, mycol, lda_x, lda_y )
      !
      ! initialize the descriptor vectors for ScaLAPACK
      !
      CALL DESCINIT( DESC_A, M_A, N_A, MB, NB, RSRC, CSRC, ictxt, lda_x, info )
      CALL DESCINIT( DESC_b, M_B, N_B, MB, N_B, RSRC, CSRC, ictxt, lda_x, info )
      !
      ! call the ScaLAPACK routine
      !
      CALL PSGESV( N_A, 1, A_local, iA, jA, DESC_A, ipiv, b_local, iB, jB, DESC_b, info )
      !
      ! check that ScaLAPACK returned ok
      !

26 Example program (page 6)

      if (info .ne. 0) then
         print *, 'unsuccessful return...something is wrong! info=', info
      else
         print *, 'Successful return'
      end if
      CALL BLACS_GRIDEXIT( ictxt )
      ! continue = 1 so that MPI can still be used for timing and MPI_FINALIZE below
      CALL BLACS_EXIT( 1 )
      ! call timing routine on processor (0,0) to get endtime
      IF (taskid == 0) THEN
         endtime = MPI_WTIME()
         print *, 'wall clock time in seconds = ', endtime - starttime
      END IF
      ! end MPI
      CALL MPI_FINALIZE(ierr)
      !
      ! end of main
      !
      END

27 Example program (page 7)

      SUBROUTINE SETARRAY( AA, A, BB, B, myrow, mycol, lda_x, lda_y )
      ! distributes the global arrays AA and BB
      ! into the local arrays A and B.
      !
      implicit none
      REAL :: AA(1000,1000)
      REAL :: BB(1000,1)
      INTEGER :: lda_x, lda_y
      REAL :: A(lda_x, lda_y)
      REAL :: ll, mm, Cr, Cc
      INTEGER :: ii, jj, I, J, myrow, mycol, pr, pc, h, g
      INTEGER, PARAMETER :: nprow = 2, npcol = 2
      INTEGER, PARAMETER :: N = 1000, M = 1000, NB = 32, MB = 32, RSRC = 0, CSRC = 0
      REAL :: B(lda_x, 1)
      INTEGER, PARAMETER :: N_B = 1

28 Example program (page 8)

      do I = 1, M
         do J = 1, N
            ! find out which PE gets this (I,J) element
            Cr = real( (I-1)/MB )
            h = RSRC + aint(Cr)
            pr = mod( h, nprow )
            Cc = real( (J-1)/NB )
            g = CSRC + aint(Cc)
            pc = mod( g, npcol )
            !
            ! check if on this PE and then set A
            !
            if (myrow == pr .and. mycol == pc) then
               ! (ii,jj) coordinates of the local array element
               !   ii = x + l*MB
               !   jj = y + m*NB
               ll = real( (I-1)/(nprow*MB) )
               mm = real( (J-1)/(npcol*NB) )
               ii = mod(I-1, MB) + 1 + aint(ll)*MB
               jj = mod(J-1, NB) + 1 + aint(mm)*NB
               A(ii,jj) = AA(I,J)
            end if
         end do
      end do

29 Example Program (page 9)

      ! reading in and distributing the B vector
      !
      do I = 1, M
         J = 1
         ! find out which PE gets this (I,J) element
         Cr = real( (I-1)/MB )
         h = RSRC + aint(Cr)
         pr = mod( h, nprow )
         Cc = real( (J-1)/NB )
         g = CSRC + aint(Cc)
         pc = mod( g, npcol )
         ! check if on this PE and then set B
         if (myrow == pr .and. mycol == pc) then
            ll = real( (I-1)/(nprow*MB) )
            mm = real( (J-1)/(npcol*NB) )
            jj = mod(J-1, NB) + 1 + aint(mm)*NB
            ii = mod(I-1, MB) + 1 + aint(ll)*MB
            B(ii,jj) = BB(I,J)
         end if
      end do
      end subroutine
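A hedged note on building and running the example: the compiler wrapper, library names, and link order are system-dependent and are not part of the original slides, but with the Netlib reference libraries a build typically looks something like this:

      mpif90 -o sample sample_scalapack.f90 -lscalapack -llapack -lblas
      mpirun -np 4 ./sample       ! nprow*npcol = 4 processes, matching the 2 x 2 grid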

30 PxGEMM performance (Mflops) on the Cray T3E, IBM SP2, and Berkeley NOW, for N = 2000, 4000, 6000, 8000, 10000 on 2x2, 4x4, and 8x8 process grids (figure)

31 PxGESV performance (Mflops) on the Cray T3E, IBM SP2, and Berkeley NOW, for N = 2000, 5000, 7500, 10000, 15000 on 1x4, 2x8, and 4x16 process grids (figure)

