1
Benchmarks for Parallel Systems

Sources/Credits:
- "Performance of Various Computers Using Standard Linear Equations Software", Jack Dongarra, University of Tennessee, Knoxville TN 37996, Computer Science Technical Report CS-89-85, April 8, 2004. http://www.netlib.org/benchmark/performance.ps
- Top500 (courtesy: Jack Dongarra): http://www.top500.org
- LINPACK FAQ: http://www.netlib.org/utk/people/JackDongarra/faq-linpack.html
- "The LINPACK Benchmark: Past, Present, and Future", Jack Dongarra, Piotr Luszczek, and Antoine Petitet
- NAS Parallel Benchmarks: http://www.nas.nasa.gov/Software/NPB/
2
LINPACK (Dongarra, 1979)
Dense system of linear equations
Initially appeared as part of the user's guide for the LINPACK package
LINPACK (1979): the N=100 benchmark, the N=1000 benchmark, and the Highly Parallel Computing benchmark
3
LINPACK benchmark
Implemented on top of BLAS1
2 main operations – DGEFA (Gaussian elimination, O(n³)) and DGESL (solving Ax = b with the factors, O(n²))
Major operation (97%) – DAXPY: y = y + α·x
Executed n³/3 + n² times, hence approximately 2n³/3 + 2n² flops
64-bit floating point arithmetic
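To make the dominant operation concrete, here is a minimal C sketch of DAXPY; the benchmark itself calls the tuned BLAS1 routine, so this loop is illustrative only:

```c
/* DAXPY: y = y + alpha*x, the operation accounting for ~97% of the
 * LINPACK benchmark's work. Each element costs one multiply and one
 * add, which is where the factor 2 in 2n^3/3 + 2n^2 flops comes from. */
void daxpy(int n, double alpha, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```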
4
LINPACK
N=100: a 100x100 system of equations. No change to the code is allowed; the user supplies only a timing routine called SECOND, and no compiler optimizations are permitted.
N=1000: a 1000x1000 system. The user may implement any code, provided it delivers the required accuracy: Towards Peak Performance (TPP). The driver program always credits 2n³/3 + 2n² operations.
"Highly Parallel Computing" benchmark: any software may be used and the matrix size can be chosen. Used in the Top500.
All based on 64-bit floating point arithmetic
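Because the driver always credits the fixed count 2n³/3 + 2n² whatever algorithm is actually run, the reported rate is simply that count divided by the measured time. A small sketch (the function name and the 0.5 s timing are made up for illustration):

```c
#include <stdio.h>

/* Rate credited by the LINPACK driver: fixed operation count / time. */
double linpack_mflops(double n, double seconds)
{
    double ops = 2.0 * n * n * n / 3.0 + 2.0 * n * n;
    return ops / seconds / 1.0e6;
}

int main(void)
{
    /* e.g. a hypothetical 1000x1000 solve finishing in 0.5 seconds */
    printf("%.1f Mflop/s\n", linpack_mflops(1000.0, 0.5));
    return 0;
}
```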
5
LINPACK
100x100 – inner-loop optimization
1000x1000 – three-loop / whole-program optimization
Scalable parallel program – largest problem that can fit in memory
6
HPL (High Performance LINPACK)
7
HPL Algorithm
2-D block-cyclic data distribution
Right-looking LU factorization
Panel factorization: various options
- Crout, left-looking, or right-looking recursive variants based on matrix multiply
- number of sub-panels
- recursive stopping criteria
- pivot search and broadcast by binary exchange
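A 2-D block-cyclic distribution assigns block (I, J) of the NB×NB-blocked matrix to process (I mod P, J mod Q) on a P×Q process grid. A sketch of that mapping (the names are illustrative, not HPL's internal routines):

```c
/* Map a global element (gi, gj) of a matrix blocked into nb x nb
 * tiles to its owning process on a P x Q grid. */
typedef struct { int prow, pcol; } Owner;

Owner owner_of(int gi, int gj, int nb, int P, int Q)
{
    Owner o;
    o.prow = (gi / nb) % P;   /* block rows cycle over process rows    */
    o.pcol = (gj / nb) % Q;   /* block columns cycle over process cols */
    return o;
}
```

Cycling blocks over the grid keeps every process busy as the factorization shrinks the trailing matrix, which is why HPL uses this layout rather than a simple block partition.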
8
HPL algorithm
Panel broadcast: various options
Update of trailing matrix: look-ahead pipeline
Validity check: a scaled residual that should be O(1)
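One common form of the check is the scaled residual: with r = b − Ax, the quantity ‖r‖∞ / (ε · (‖A‖∞‖x‖∞ + ‖b‖∞) · n) should be O(1) for a numerically correct solve. A dense, single-process sketch (row-major storage assumed; HPL's own check runs on the distributed matrix):

```c
#include <math.h>
#include <float.h>

/* Scaled residual of a computed solution x for Ax = b; a result of
 * order 1 indicates a numerically valid solve. A is row-major, n x n. */
double scaled_residual(int n, const double *A,
                       const double *x, const double *b)
{
    double rnorm = 0.0, anorm = 0.0, xnorm = 0.0, bnorm = 0.0;
    for (int i = 0; i < n; i++) {
        double ax = 0.0, rowsum = 0.0;
        for (int j = 0; j < n; j++) {
            ax     += A[i * n + j] * x[j];
            rowsum += fabs(A[i * n + j]);
        }
        if (fabs(b[i] - ax) > rnorm) rnorm = fabs(b[i] - ax); /* ||r||  */
        if (rowsum > anorm)          anorm = rowsum;          /* ||A||  */
        if (fabs(b[i]) > bnorm)      bnorm = fabs(b[i]);      /* ||b||  */
    }
    for (int j = 0; j < n; j++)
        if (fabs(x[j]) > xnorm) xnorm = fabs(x[j]);           /* ||x||  */
    return rnorm / (DBL_EPSILON * (anorm * xnorm + bnorm) * n);
}
```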
9
Top500 (www.top500.org)
Started in 1993
Published twice a year – June and November
Reports Nmax, Rmax, N1/2, and Rpeak for each system
10
TOP500 list – Data shown

Manufacturer – manufacturer or vendor
Computer – type indicated by manufacturer or vendor
Installation Site – customer
Location – location and country
Year – year of installation / last major update
Installation Type – Academic, Research, Industry, Vendor, Classified, Government
Installation Area – e.g. Research: Energy / Industry: Finance
# Processors – number of processors
Rmax – maximal LINPACK performance achieved
Rpeak – theoretical peak performance
Nmax – problem size for achieving Rmax
N1/2 – problem size for achieving half of Rmax
Nworld – position within the TOP500 ranking
11
24th List: The TOP 5 (Rmax and Rpeak in Gflop/s)

1. BlueGene/L DD2 beta-System (0.7 GHz PowerPC 440), IBM – IBM/DOE, United States/2004 – 32768 processors – Rmax 70720, Rpeak 91750
2. Columbia: SGI Altix 1.5 GHz, Voltaire Infiniband, SGI – NASA/Ames Research Center/NAS, United States/2004 – 10160 processors – Rmax 51870, Rpeak 60960
3. Earth-Simulator, NEC – The Earth Simulator Center, Japan/2002 – 5120 processors – Rmax 35860, Rpeak 40960
4. MareNostrum: eServer BladeCenter JS20 (PowerPC970 2.2 GHz), Myrinet, IBM – Barcelona Supercomputer Center, Spain/2004 – 3564 processors – Rmax 20530, Rpeak 31363
5. Thunder: Intel Itanium2 Tiger4 1.4 GHz, Quadrics, California Digital Corporation – Lawrence Livermore National Laboratory, United States/2004 – 4096 processors – Rmax 19940, Rpeak 22938
12
24th List: India (Rmax and Rpeak in Gflop/s)

267. Integrity Superdome, 1.5 GHz, HPlex, HP – Tech Pacific Exports C, India/2004 – 288 processors – Rmax 1210, Rpeak 1728
289. xSeries Cluster Xeon 2.4 GHz, Gig-E, IBM – Semiconductor Company (F), India/2003 – 574 processors – Rmax 1196.41, Rpeak 2755.2
435. xSeries Xeon 3.06 GHz, Gig-E, IBM – Geoscience (C), India/2004 – 256 processors – Rmax 961.28, Rpeak 1566.72
438. KABRU: Pentium Xeon Cluster 2.4 GHz, SCI 3D, IMSc-Netweb-Summation – Institute of Mathematical Sciences, C.I.T Campus, India/2004 – 288 processors – Rmax 959, Rpeak 1382.4
445-448. BladeCenter Xeon 3.06 GHz, Gig-Ethernet, IBM – Geoscience (B), India/2004 – 252 processors each – Rmax 946.26, Rpeak 1542.24 (four identical systems)
15
Manufacturer
16
Architecture
17
Processor Generation
18
System Processor Count
19
NAS Parallel Benchmarks (NPB)
Also used for evaluation of supercomputers
A set of 8 programs derived from CFD: 5 kernels and 3 pseudo-applications
NPB 1 – original benchmarks
NPB 2 – NAS's MPI implementation; NPB 2.4 Class D has more work and more I/O
NPB 3 – versions based on OpenMP, HPF, and Java
GridNPB3 – for computational grids
NPB 3 multi-zone – for hybrid parallelism
20
NPB 1.0 (March 1994)
Defines Class A and Class B versions
"Paper and pencil" algorithmic specifications
Generic benchmarks, in contrast to the MPI-based LINPACK
General rules for implementations – Fortran 90 or C, 64-bit arithmetic, etc.
Sample implementations provided
21
Kernel Benchmarks
EP – embarrassingly parallel (structure sketched below)
MG – multigrid; regular communication
CG – conjugate gradient; irregular long-distance communication
FT – a 3-D PDE solved using FFTs; a rigorous test of long-distance communication
IS – large integer sort
Detailed rules are given regarding:
- a brief statement of the problem
- the algorithm to be used
- validation of results
- where to insert timing calls
- the method for generating random numbers
- submission of results
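A sketch of the idea behind EP: each task independently generates uniform pseudo-random pairs and transforms accepted pairs into Gaussian deviates, with no communication needed until the final tally. Here rand() stands in for NPB's specified linear congruential generator, so this is illustrative only:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    long accepted = 0;
    double gsum = 0.0;
    for (long k = 0; k < 1000000; k++) {
        double x = 2.0 * rand() / (double)RAND_MAX - 1.0;  /* in [-1, 1] */
        double y = 2.0 * rand() / (double)RAND_MAX - 1.0;
        double t = x * x + y * y;
        if (t <= 1.0 && t > 0.0) {               /* acceptance test      */
            double f = sqrt(-2.0 * log(t) / t);  /* polar transform      */
            gsum += x * f + y * f;               /* two Gaussian deviates */
            accepted++;
        }
    }
    printf("accepted pairs: %ld, deviate sum: %f\n", accepted, gsum);
    return 0;
}
```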
22
Pseudo-applications / Synthetic CFDs
Benchmark 1 – perform a few iterations of the approximate factorization algorithm (SP)
Benchmark 2 – perform a few iterations of the diagonal form of the approximate factorization algorithm (BT)
Benchmark 3 – perform a few iterations of SSOR (LU)
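To give a flavor of the LU benchmark's method, here is a generic SSOR (symmetric successive over-relaxation) sweep on a dense system; the benchmark itself applies SSOR to the block-sparse system arising from the CFD discretization, so this is only a structural sketch (omega is the relaxation factor, 0 < omega < 2):

```c
/* One SSOR iteration for Ax = b: a forward relaxed Gauss-Seidel sweep
 * followed by a backward sweep, which makes the iteration symmetric.
 * A is row-major n x n with nonzero diagonal. */
void ssor_sweep(int n, const double *A, const double *b,
                double *x, double omega)
{
    for (int i = 0; i < n; i++) {            /* forward sweep  */
        double s = b[i];
        for (int j = 0; j < n; j++)
            if (j != i) s -= A[i * n + j] * x[j];
        x[i] += omega * (s / A[i * n + i] - x[i]);
    }
    for (int i = n - 1; i >= 0; i--) {       /* backward sweep */
        double s = b[i];
        for (int j = 0; j < n; j++)
            if (j != i) s -= A[i * n + j] * x[j];
        x[i] += omega * (s / A[i * n + i] - x[i]);
    }
}
```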
23
Class A and Class B Sample Code (sample-code figures not preserved in this transcript)
24
NPB 2.0 (1995)
MPI and Fortran 77 implementations
2 parallel kernels (MG, FT) and 3 simulated applications (LU, SP, BT)
Class C – bigger problem size
Benchmark rules classify submissions by the amount of source code changed: 0%, 5%, or >5%
25
NPB 2.2 (1996), 2.4 (2002), 2.4 I/O (Jan 2003)
EP and IS added
FT rewritten
NPB 2.4 – Class D and the rationale for Class D sizes
2.4 I/O – a new benchmark problem based on BT (BTIO) to test output capabilities
An MPI implementation of the same (using MPI-IO) – different options, e.g. with or without collective buffering
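A sketch of the collective-I/O style BTIO exercises: every rank writes its slab of the solution with a single collective call, letting the MPI-IO layer aggregate requests (collective buffering). The file name and layout here are illustrative, not the actual BTIO format:

```c
#include <mpi.h>

/* Each rank writes `count` doubles at a rank-dependent offset. The
 * _at_all variant is collective, enabling collective buffering;
 * MPI_File_write_at would be the independent (non-collective) form. */
void write_solution(MPI_Comm comm, const double *slab, int count)
{
    int rank;
    MPI_File fh;
    MPI_Comm_rank(comm, &rank);
    MPI_File_open(comm, "solution.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);
    MPI_File_write_at_all(fh, offset, slab, count, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}
```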