
2 Tuning LINPACK NxN for HP Platforms
Hsin-Ying Lin [lin@rsn.hp.com], Piotr Luszczek [luszczek@utk.edu]
MLIB team/HEPS/SCL/TCD, Hewlett Packard Company
HiPer '01, Bremen, Germany, October 8, 2001
Slide footer: LINPACK TOP500, Technical Systems Division * Scalable Computing Lab, Hsin-Ying Lin lin@rsn.hp.com (T)(972)497-4897, hiper01.ppt, Printed: 2/4/2016 6:43:31 AM

3 Why tune LINPACK N*N
- Customers use the TOP500 list as one of the criteria to purchase machines
- HP wants to increase the number of computers on the TOP500 list and to help demonstrate HP's commitment to high performance computing
- See http://www.top500.org/

4 What is LINPACK NxN
- LINPACK NxN benchmark
  - Solves a system of linear equations by some method
  - Allows the vendors to choose the size of the problem for the benchmark
  - Measures execution time for each problem size
- LINPACK NxN report
  - N_max: the size of the chosen problem run on a machine
  - R_max: the performance in Gflop/s for the chosen problem size run on the machine
  - N_1/2: the problem size at which half of the R_max execution rate is achieved
  - R_peak: the theoretical peak performance in Gflop/s for the machine
- LINPACK NxN is used to rank the TOP500 fastest computers in the world
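The reported quantities above can be sketched in a few lines of Python. This is an illustration only: it assumes the conventional LINPACK operation count of 2/3 n^3 + 2 n^2, approximates N_max as the measured size with the highest rate, and approximates N_1/2 as the smallest measured size reaching half of R_max. The function names and the timings in the usage note are hypothetical, not measurements from the talk.

```python
def linpack_flops(n):
    """Conventional LINPACK operation count for an n-by-n solve."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n, seconds):
    """Execution rate in Gflop/s for a run of size n taking `seconds`."""
    return linpack_flops(n) / seconds / 1e9


def summarize(runs):
    """runs: list of (n, seconds) pairs. Returns (n_max, r_max, n_half)."""
    rates = [(n, gflops(n, t)) for n, t in runs]
    # Take the best-performing size as N_max and its rate as R_max.
    n_max, r_max = max(rates, key=lambda pair: pair[1])
    # Approximate N_1/2 by the smallest measured size reaching R_max / 2.
    n_half = min(n for n, rate in rates if rate >= r_max / 2)
    return n_max, r_max, n_half


def efficiency(r_max, r_peak):
    """Fraction of the theoretical peak that the benchmark achieved."""
    return r_max / r_peak
```

For example, `summarize([(1000, 2.0), (2000, 8.0), (4000, 40.0)])` picks the largest run as N_max in this made-up data set, since its rate is the highest of the three.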

5 TOP500: Past, Present, and Future
- June 2000: 47 HP systems. Cut-off: 43.82 Gflop/s (performance of the 500th computer)
- November 2000: 5 HP systems. Cut-off: 55.1 Gflop/s (26% increase from June 2000)
- June 2001: 41 HP systems. Cut-off: 67.78 Gflop/s (23% increase from November 2000)
- November 2001: ??? HP systems. Cut-off: 83-92 Gflop/s (23-36% estimated increase from June 2001)
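The percentages and the projected November 2001 range follow directly from the cut-off values on this slide. A short check, using only the numbers quoted above (the variable and function names are illustrative):

```python
# Cut-off values in Gflop/s, as quoted on the slide.
cutoffs = [("June 2000", 43.82), ("November 2000", 55.1), ("June 2001", 67.78)]


def pct_increase(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100.0


# Growth between consecutive lists.
growth = [
    pct_increase(cutoffs[i][1], cutoffs[i + 1][1])
    for i in range(len(cutoffs) - 1)
]

# Projected November 2001 cut-off for a 23% to 36% increase over June 2001.
low = cutoffs[-1][1] * 1.23
high = cutoffs[-1][1] * 1.36

print([round(g) for g in growth], round(low), round(high))
```

The rounded results reproduce the 26% and 23% increases and the 83-92 Gflop/s estimate.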

6 HP list in TOP500 (June 2001)

7 HP's TOP500 Status and Goals
- About 30 systems missed the entry threshold of 55.1 Gflop/s by 1 Gflop/s on Nov. 1, 2000
  - Goal for Nov. 1, 2001: Ensure all 64-CPU Superdome systems are listed in the TOP500
- Lack of excellent MPI-based LINPACK N*N algorithms despite relatively good single-node LINPACK N*N performance
  - Goal for Nov. 1, 2001: Develop a more scalable algorithm for multiple-node systems

8 The Road to Highly Scalable LINPACK NxN Algorithm
Studied the public domain software HPL (High Performance LINPACK benchmark):
Q: Why HPL?
A: Other vendors use HPL for their LINPACK N*N benchmark and show good scalability.
See: http://www.netlib.org/benchmark/hpl

9 HPL (High Performance LINPACK)
- MPI implementation of the LINPACK NxN benchmark
- Algorithm keywords
  - One- and two-dimensional block-cyclic data distribution
  - Right-looking variant of the LU factorization
  - Row partial pivoting
  - Multiple look-ahead depths
  - Recursive panel factorization
- Highly tunable (matrix dimension, blocking factor, grid topology, broadcast/factorization algorithms, data alignment)
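The block-cyclic distribution named in the keywords can be illustrated with the standard index mapping. This is a minimal sketch, not HPL's code: it assumes a blocking factor `nb`, a `p`-by-`q` process grid, and a distribution that starts at process 0. All function names are hypothetical.

```python
def owner_1d(i, nb, p):
    """Process that owns global index i with block size nb over p processes."""
    return (i // nb) % p


def local_index_1d(i, nb, p):
    """Position of global index i inside its owner's local storage."""
    return (i // (nb * p)) * nb + i % nb


def owner_2d(i, j, nb, p, q):
    """(process row, process column) owning matrix entry (i, j)."""
    return owner_1d(i, nb, p), owner_1d(j, nb, q)


# Example: blocks of 2 indices dealt out to 3 processes.
print([owner_1d(i, 2, 3) for i in range(12)])
```

With `nb = 2` and `p = 3`, process 0 holds global indices 0, 1, 6, 7 and so on, which is why the load stays balanced as the factorization shrinks the active part of the matrix.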

10 HPL (High Performance LINPACK)
HPL solves a linear system of order n of the form: A x = b
- Compute the LU factorization with partial pivoting of the n-by-(n+1) matrix: [A,b] = [[L,U],y]
- Since the lower triangular factor L is applied to b as the factorization progresses, the solution x is obtained by solving the upper triangular system: U x = y
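The idea of factoring the augmented matrix [A, b] and then solving U x = y can be sketched with an unblocked, single-process elimination. This is an illustration under simplifying assumptions (dense lists, no blocking, no MPI), not HPL's implementation. The function name is hypothetical.

```python
def solve_augmented(a, b):
    """Solve A x = b by eliminating on the n-by-(n+1) matrix [A, b]."""
    n = len(a)
    # Build the augmented matrix so that b is transformed along with A.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Row partial pivoting: pick the largest entry in column k.
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[piv] = m[piv], m[k]
        # Right-looking update of the trailing rows, including column n (b).
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    # Column n now holds y; solve the upper triangular system U x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

Note that the multipliers are discarded rather than stored as L, so this sketch, like the scheme on the slide, yields the solution for one right-hand side only.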

11 Caveat of HPL
- The lower triangular matrix L is left unpivoted and the array of pivots is not returned.
- Array b is part of matrix A.
- These imply that HPL is not general-purpose LU factorization software and cannot be used to solve multiple right-hand sides simultaneously.

12 Cyclic 1D division of matrix into 8 panels with 4 processors
Figure: panels 0 1 2 3 4 5 6 7, with processor labels P0 P3 P2 P1 P0 P3 P2 P1
- Factor panel 0
- Update panels 1-7 using panel 0
- Factor panel 1
- Update panels 2-7 using panel 1
- Factor panel 2
- ...
- Factor panel 7
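The factor/update sequence on this slide can be generated programmatically. The sketch below assumes that panel k belongs to process k mod P; the processor labels in the transcript appear in a different order, so the assignment in the original figure may differ. The function names are hypothetical.

```python
def panel_owner(panel, num_procs):
    """Assumed cyclic assignment: panel k belongs to process k mod P."""
    return panel % num_procs


def right_looking_schedule(num_panels):
    """List of (operation, panel, source_panel) steps without look-ahead."""
    steps = []
    for k in range(num_panels):
        steps.append(("factor", k, None))
        # Every panel to the right is updated with the freshly factored one.
        for j in range(k + 1, num_panels):
            steps.append(("update", j, k))
    return steps


owners = [panel_owner(k, 4) for k in range(8)]
schedule = right_looking_schedule(8)
print(owners)
print(schedule[:3], "...", schedule[-1])
```

In this ordering, every process waits for the current panel to be factored before it can update anything, which is the idle time that look-ahead is meant to reduce.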

13 Look Ahead Algorithm
Figure: panels 0 1 2 3 4 5 6 7, with processor labels P0 P3 P2 P1 P0 P3 P2 P1
- Factor panel 0
- Update panel 1 using panel 0
- Factor panel 1
- Mark panel 1 as factored
- Update panel 5 using panel 0
- Update panel 5 using panel 1
- ...
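The reordering that look-ahead introduces can be sketched from the point of view of a single process. The sketch assumes a look-ahead depth of 1 and a process that owns panels 1 and 5, as in the steps on this slide. It only produces the order of operations, with no numerical work or communication, and the function name is hypothetical.

```python
def lookahead_steps(my_panels, num_panels):
    """Order of work on one process, assuming look-ahead depth 1."""
    mine = sorted(set(my_panels))
    steps = []
    if 0 in mine:
        steps.append("factor 0")
    for k in range(num_panels - 1):
        nxt = k + 1
        if nxt in mine:
            # Bring the next panel up to date first and factor it at once,
            # so other processes can receive it while this one keeps updating.
            steps.append(f"update {nxt} using {k}")
            steps.append(f"factor {nxt}")
        for j in mine:
            if j > nxt:
                steps.append(f"update {j} using {k}")
    return steps


for step in lookahead_steps([1, 5], 8):
    print(step)
```

The key design point is that factoring panel 1 is moved ahead of the remaining updates with panel 0, so the next panel is already available when the other processes finish their own updates.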

14 Characteristics of HPL
- Is most suitable for cluster systems, i.e. relatively many low-performance CPUs connected by a relatively low-speed network.
- Is not suitable for SMPs, as MPI incurs overhead that substantially degrades the performance of a benchmark code.
- When the look-ahead technique is used with MPI, additional memory must be allocated on each CPU for a communication buffer. In an SMP system, such a buffer is unnecessary due to the shared memory mechanism.

15 Approach for Tuning LINPACK NxN
- Leverage algorithms in HPL
  - Use pthreads instead of MPI for a single node
  - Use a hybrid of MPI and pthreads for a multi-node (Constellation) system: MPI across nodes and pthreads within the node
- Leverage HP MLIB's BLAS routines to improve single-CPU performance. See http://www.hp.com/go/mlib
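The shared-memory half of this approach can be illustrated with threads that update disjoint rows of one matrix in place, with no message buffers. The sketch uses Python's thread pool as a stand-in for pthreads and shows only the structure: Python threads will not give the speedup of native threads, pivoting is omitted, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def eliminate_rows(m, k, rows):
    """Apply elimination step k to the given rows of the shared matrix m."""
    pivot_row = m[k]  # read-only for all threads during this step
    for r in rows:
        factor = m[r][k] / pivot_row[k]
        for c in range(k, len(pivot_row)):
            m[r][c] -= factor * pivot_row[c]


def threaded_step(m, k, num_threads=4):
    """One elimination step with rows dealt cyclically to worker threads."""
    n = len(m)
    chunks = [range(k + 1 + t, n, num_threads) for t in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(eliminate_rows, m, k, rows) for rows in chunks]
        for future in futures:
            future.result()  # re-raise any exception from a worker


matrix = [[2.0, 1.0, 3.0], [4.0, 5.0, 6.0], [6.0, 7.0, 8.0]]
threaded_step(matrix, 0, num_threads=2)
print(matrix)
```

Each thread writes only its own rows and reads the pivot row directly from shared memory, which is the property that makes a separate communication buffer unnecessary within a node.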

16 SD PA8600 vs. other machines
Note: Smaller is better for the number under "Ratio"

17 Constellation PA8600 Performance
Figure labels: 1.9x, 3.8x, 3.9x. G: Gigabit Ethernet, H: Hyper Fabric

18 Summary
- We believe that we reached our first goal.
- Accomplished our second goal of having more scalable code for the HP Constellation system.
- 4x32 CPUs SD PA8600 could be ranked close to the TOP 100, based on the TOP500 list of June 2001.
- 1x64 CPUs SD PA8600 could be ranked within the TOP 250, based on the TOP500 list of June 2001.
- Performance/CPU of SD PA8600 is about 1.5x, 1.9x, and 2.5x that of IBM Power3, SGI O3000, and Sun HPC1000, respectively.

