
Slide 1: High Performance Computing: A Look Behind and Ahead. Jack Dongarra, Computer Science Department, University of Tennessee

Slide 2: High Performance Computers
- From the beginning of the digital age, supercomputers have been time machines that let researchers peer into the future, both intellectually and temporally.
  - Intellectually, they bring to life models of complex phenomena when economics and other constraints preclude experimentation.
  - Temporally, they reduce the time to solution by enabling us to evaluate larger and more complex models than would be possible on more conventional systems.

Slide 3: High Performance Computers
- ~25 years ago: 1x10^6 floating point ops/sec (Mflop/s)
  - Scalar based
- ~15 years ago: 1x10^9 floating point ops/sec (Gflop/s)
  - Vector & shared-memory computing, bandwidth aware
  - Block partitioned, latency tolerant
- Today: 1x10^12 floating point ops/sec (Tflop/s)
  - Highly parallel, distributed processing, message passing, network based (a minimal sketch of this style follows below)
  - Data decomposition, communication/computation
- ~5 years away: 1x10^15 floating point ops/sec (Pflop/s)
  - Many more levels of memory hierarchy; combinations of grids & HPC
  - More adaptive, latency tolerant and bandwidth aware, fault tolerant, extended precision, attention to SMP nodes
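To make the Tflop/s-era programming model concrete, here is a minimal sketch of message passing with data decomposition: each MPI rank owns one block of a global vector, computes a partial sum locally, and the partial results are combined with a reduction. This is an illustrative example, not code from the talk, and it assumes the global size divides evenly among the ranks.

```c
/* Minimal message-passing / data-decomposition sketch (illustrative). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int N = 1 << 20;            /* global problem size            */
    int local_n = N / nprocs;         /* block decomposition; assumes   */
                                      /* N is divisible by nprocs       */
    double *x = malloc(local_n * sizeof(double));
    for (int i = 0; i < local_n; i++)
        x[i] = 1.0;                   /* each rank fills only its block */

    double local_sum = 0.0, global_sum = 0.0;
    for (int i = 0; i < local_n; i++)
        local_sum += x[i];            /* purely local computation       */

    /* communication step: combine the partial results on rank 0 */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %g (expected %d)\n", global_sum, N);

    free(x);
    MPI_Finalize();
    return 0;
}
```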

Slide 4: Architecture/Systems Continuum (from custom and tightly coupled to commodity and loosely coupled, with hybrids in between)
- Custom processor with custom interconnect:
  - Cray X1
  - NEC SX-7
  - IBM Regatta
  - IBM Blue Gene/L
- Commodity processor with custom interconnect:
  - SGI Altix (Intel Itanium 2)
  - Cray Red Storm (AMD Opteron)
- Commodity processor with commodity interconnect:
  - Clusters (Pentium, Itanium, Opteron, Alpha; GigE, Infiniband, Myrinet, Quadrics)
  - NEC TX7
  - IBM eServer
  - Dawning
Trade-offs:
- Custom: best processor performance for codes that are not "cache friendly", good communication performance, simplest programming model, most expensive.
- Commodity: good communication performance, good scalability, best price/performance (for codes that work well with caches and are latency tolerant), more complex programming model.

Slide 5: Top 500 Computers
- Listing of the 500 most powerful computers in the world.
- Yardstick: Rmax from the LINPACK MPP benchmark: solve Ax=b for a dense problem (a small sketch of the measurement follows below).
- Updated twice a year:
  - SC'xy in the States in November
  - Meeting in Germany in June
[Figure: LINPACK performance (rate) versus problem size, with TPP performance marked]
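For orientation, this is a minimal sketch of what an Rmax-style measurement does: factor a dense n-by-n matrix and divide the nominal 2/3 n^3 flop count by the elapsed time. It is illustrative only; real Top500 runs use HPL, which adds partial pivoting, blocking, and a residual check, none of which appear here.

```c
/* LINPACK-flavored flop-rate sketch: unpivoted Gaussian elimination on a
 * diagonally dominant random matrix, timed against the nominal flop count. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    const int n = 1000;
    double *A = malloc((size_t)n * n * sizeof(double));
    srand(1);
    for (int i = 0; i < n * n; i++)                /* random matrix, made   */
        A[i] = (double)rand() / RAND_MAX           /* diagonally dominant   */
             + (i % (n + 1) == 0 ? n : 0);         /* so no pivoting needed */

    clock_t t0 = clock();
    for (int k = 0; k < n - 1; k++)                /* elimination steps */
        for (int i = k + 1; i < n; i++) {
            double m = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; j++)
                A[i * n + j] -= m * A[k * n + j];
        }
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    double flops = 2.0 / 3.0 * n * (double)n * n;  /* nominal 2/3 n^3 */
    printf("n=%d: %.2f s, %.2f Mflop/s\n", n, secs, flops / secs / 1e6);
    free(A);
    return 0;
}
```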

Slide 6: What is a Supercomputer? Why do we need them?
- A supercomputer is a hardware and software system that provides close to the maximum performance that can currently be achieved.
- Over the last 10 years the range of the Top500 has grown faster than Moore's Law (see the arithmetic below):
  - 1993: #1 = 59.7 GFlop/s, #500 = 422 MFlop/s
  - 2004: #1 = 70 TFlop/s, #500 = 850 GFlop/s
- Almost all of the technical areas that are important to the well-being of humanity use supercomputing in fundamental and essential ways: computational fluid dynamics, protein folding, climate modeling, and national security (in particular cryptanalysis and the simulation of nuclear weapons), to name a few.
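A quick back-of-the-envelope check of the faster-than-Moore claim, using the list values above and taking Moore's Law as a doubling every 1.5 to 2 years:

```latex
% Implied doubling times over the ~11 years from 1993 to 2004
\[
\frac{70\ \mathrm{TFlop/s}}{59.7\ \mathrm{GFlop/s}} \approx 1173 \approx 2^{10.2}
\quad\Rightarrow\quad
T_{\#1} \approx \frac{11\ \mathrm{yr}}{10.2} \approx 1.1\ \mathrm{yr}
\]
\[
\frac{850\ \mathrm{GFlop/s}}{422\ \mathrm{MFlop/s}} \approx 2014 \approx 2^{11.0}
\quad\Rightarrow\quad
T_{\#500} \approx \frac{11\ \mathrm{yr}}{11.0} \approx 1.0\ \mathrm{yr}
\]
% Both doubling times beat the 1.5--2 yr usually quoted for Moore's Law.
```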

Slide 7: 24th List: The TOP10

Rank | Manufacturer | Computer                                 | Rmax [TF/s] | Installation Site                       | Country | Year | #Proc
1    | IBM          | BlueGene/L β-System                      | 70.72       | DOE/IBM                                 | USA     | 2004 | 32768
2    | SGI          | Columbia (Altix, Infiniband)             | 51.87       | NASA Ames                               | USA     | 2004 | 10160
3    | NEC          | Earth-Simulator                          | 35.86       | Earth Simulator Center                  | Japan   | 2002 | 5120
4    | IBM          | MareNostrum (BladeCenter JS20, Myrinet)  | 20.53       | Barcelona Supercomputer Center          | Spain   | 2004 | 3564
5    | CCD          | Thunder (Itanium2, Quadrics)             | 19.94       | Lawrence Livermore National Laboratory  | USA     | 2004 | 4096
6    | HP           | ASCI Q (AlphaServer SC, Quadrics)        | 13.88       | Los Alamos National Laboratory          | USA     | 2002 | 8192
7    | Self-made    | X (Apple XServe, Infiniband)             | 12.25       | Virginia Tech                           | USA     | 2004 | 2200
8    | IBM/LLNL     | BlueGene/L DD1 (500 MHz)                 | 11.68       | Lawrence Livermore National Laboratory  | USA     | 2004 | 8192
9    | IBM          | pSeries 655                              | 10.31       | Naval Oceanographic Office              | USA     | 2004 | 2944
10   | Dell         | Tungsten (PowerEdge, Myrinet)            | 9.82        | NCSA                                    | USA     | 2003 | 2500

399 systems > 1 TFlop/s; 294 machines are clusters; the top 10 average ~8K processors.

Slide 8: Performance Development [Figure: Top500 performance development over time, with "My Laptop" marked for comparison]

Slide 9: Performance Projection [Figure: projection of Top500 performance trends]

Slide 10: Performance Projection [Figure: same projection, with "My Laptop" marked for comparison]

Slide 11: IBM BlueGene/L, 131,072 Processors ("Fastest Computer")
Packaging hierarchy, built around the BlueGene/L Compute ASIC (the multiplication behind the paired peak figures is sketched below):
- Chip (2 processors): 2.8/5.6 GF/s, 4 MB cache
- Compute card (2 chips, 2x1x1 = 4 processors): 5.6/11.2 GF/s, 1 GB DDR
- Node card (32 chips, 4x4x2; 16 compute cards = 64 processors): 90/180 GF/s, 16 GB DDR
- Rack (32 node boards, 8x8x16 = 2,048 processors): 2.9/5.7 TF/s, 0.5 TB DDR
- System (64 racks, 64x32x32 = 131,072 processors): 180/360 TF/s, 32 TB DDR
As ranked: BG/L, 700 MHz, 32K processors (16 racks). Peak: 91.7 Tflop/s; Linpack: 70.7 Tflop/s (77% of peak).
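The per-level numbers multiply up from the chip. Here is a small sketch that reproduces them, under the assumption (mine, not the slide's) that 2.8 GF/s corresponds to one 700 MHz processor retiring 4 flops per cycle, and the doubled figure to using both processors on a chip:

```c
/* Reproduce the BlueGene/L per-level peak figures from the slide.
 * Assumption: 700 MHz x 4 flops/cycle = 2.8 GF/s per processor. */
#include <stdio.h>

int main(void)
{
    const double gf_per_proc = 0.7 * 4;      /* 700 MHz, 4 flops/cycle */
    struct { const char *name; long procs; } level[] = {
        { "Chip (2 processors)", 2      },
        { "Compute card",        4      },
        { "Node card",           64     },
        { "Rack",                2048   },
        { "System (64 racks)",   131072 },
    };
    for (int i = 0; i < 5; i++)
        printf("%-22s %9.1f / %9.1f GF/s\n", level[i].name,
               level[i].procs * gf_per_proc / 2,   /* one proc per chip  */
               level[i].procs * gf_per_proc);      /* both procs per chip */

    /* The 16-rack system as ranked: 32768 processors */
    printf("16-rack peak: %.1f TF/s; Linpack 70.7 TF/s = %.0f%% of peak\n",
           32768 * gf_per_proc / 1000,
           70.7 / (32768 * gf_per_proc / 1000) * 100);
    return 0;
}
```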

Slide 12: Customer Segments / Systems [Figure: Top500 systems by customer segment]

Slide 13: Manufacturers / Systems [Figure: Top500 systems by manufacturer]

Slide 14: Processor Types [Figure: Top500 systems by processor type]

Slide 15: Interconnects / Systems [Figure: Top500 systems by interconnect]

Slide 17: Clusters (NOW) / Systems [Figure: number of cluster (NOW) systems in the Top500]

Slide 18: Power: Watts/Gflop (smaller is better) [Figure: Watts/Gflop for the Top 20 systems, based on the processor power rating only; see the formula below]
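The metric in the chart is simply the rated processor power divided by its peak rate; for example (the 100 W and 6 Gflop/s figures here are hypothetical, not read off the chart):

```latex
% Watts/Gflop from the processor power rating and peak rate
\[
\text{Watts/Gflop} = \frac{P_{\mathrm{proc}}\ [\mathrm{W}]}{R_{\mathrm{peak}}\ [\mathrm{Gflop/s}]},
\qquad \text{e.g.}\quad
\frac{100\ \mathrm{W}}{6\ \mathrm{Gflop/s}} \approx 16.7\ \mathrm{W/Gflop}.
\]
```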

Slide 19: Top500 Computers in Australia

Rank | Site                                                         | Manufacturer | Computer                                     | Area     | Year | Rmax [GF/s] | Procs | Peak [GF/s]
130  | PGS                                                          | IBM          | xSeries Xeon 3.06 GHz, Gig-E                 | Industry | 2004 | 1923        | 512   | 3133
224  | Consumer Goods                                               | IBM          | eServer pSeries 690 (1.7 GHz Power4+, GigE)  | Industry | 2004 | 1305        | 352   | 2394
317  | Bureau of Meteorology / CSIRO HPCCC                          | NEC          | SX-6/144M18                                  | Research | 2004 | 1130        | 144   | 1152
330  | Australian Centre for Advanced Computing and Communications  | Dell         | PowerEdge 1750, Pentium4 Xeon 3.06 GHz, GigE | Academic | 2003 | 1095        | 304   | 1860
455  | University of Queensland                                     | SGI          | SGI Altix 1.3 GHz                            | Academic | 2003 | 937         | 208   | 1082

Slide 20: Top500 Conclusions
- Microprocessor-based supercomputers have brought a major change in accessibility and affordability.
- Clusters continue to account for more than half of all installed high-performance computers worldwide.

Slide 21: Future: Petaflops (10^15 fl pt ops/s)
- A Pflop/s for 1 second ≈ a laptop computing for 1 year (unpacked below).
- Issues from an algorithmic standpoint:
  - concurrency
  - data locality
  - latency & synchronization
  - floating point accuracy
  - dynamic redistribution of workload
  - new languages and constructs
  - role of numerical libraries
  - algorithm adaptation to hardware failure
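Unpacking the first bullet (a year is about 3.15 x 10^7 seconds; the sustained laptop rate the equivalence implies is the interesting part):

```latex
% What "a Pflop for 1 second = a laptop for 1 year" implies
\[
\frac{10^{15}\ \mathrm{flops}}{3.15\times 10^{7}\ \mathrm{s}}
\approx 3.2\times 10^{7}\ \mathrm{flop/s},
\]
% i.e. the equivalence assumes a sustained rate of roughly 32 Mflop/s.
```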

Slide 22: Petaflop (10^15 flop/s) Computers Within the Next 5 Years
- Five basic design points:
  - Conventional technologies: 4.8 GHz processors, 8,000 nodes, each with 16 processors (see the arithmetic below)
  - Processing-in-memory (PIM) designs: reduce the memory access bottleneck
  - Superconducting processor technologies: digital superconductor technology, Rapid Single-Flux-Quantum (RSFQ) logic, and the hybrid technology multi-threaded (HTMT) architecture
  - Special-purpose hardware designs: specific applications, e.g. the GRAPE Project in Japan for gravitational force computations
  - Schemes utilizing the aggregate computing power of processors distributed on the web: SETI@home (~35 Tflop/s)
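The conventional design point lands at a petaflop if each processor retires 2 flops per cycle (that rate is my assumption; the slide does not state it):

```latex
% The "conventional" design point with an assumed 2 flops/cycle
\[
4.8\times 10^{9}\ \tfrac{\mathrm{cycles}}{\mathrm{s}}
\times 2\ \tfrac{\mathrm{flops}}{\mathrm{cycle}}
\times 8000 \times 16\ \mathrm{procs}
\approx 1.2\times 10^{15}\ \mathrm{flop/s}.
\]
```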

Slide 23: Petaflops (10^15 flop/s) Computer Today?
Built from 2 GHz processors (O(10^9) ops/s each), it would take:
- 0.5 million PCs (see the arithmetic below)
- $0.5B ($1K each)
- 100 MWatts
- 5 acres
- 0.5 million Windows licenses!!
- a PC failure every second
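The PC count is just the ratio of the two rates; the power total then follows if each PC draws about 200 W (the 200 W per PC is my assumption, chosen to be consistent with the slide's total):

```latex
\[
\frac{10^{15}\ \mathrm{flop/s}}{2\times 10^{9}\ \mathrm{flop/s\ per\ PC}}
= 5\times 10^{5}\ \mathrm{PCs},
\qquad
5\times 10^{5}\ \mathrm{PCs} \times 200\ \mathrm{W} = 100\ \mathrm{MW}.
\]
```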

Slide 24: The Real Crisis With HPC Is With the Software
- Programming is stuck: arguably it hasn't changed since the 60's.
- It's time for a change:
  - Complexity is rising dramatically: highly parallel and distributed systems, from 10 to 100 to 1,000 to 10,000 to 100,000 processors!!
  - Multidisciplinary applications
- A supercomputer application and its software are usually much longer-lived than the hardware:
  - Hardware life is typically five years at most.
  - Fortran and C are the main programming models.
- Software is a major cost component of modern technologies, yet the tradition in HPC system procurement is to assume that the software is free.
- We don't have many great ideas about how to solve this problem.

Slide 25: Some Current Unmet Needs
- Performance / portability
- Fault tolerance
- Better programming models:
  - Global shared address space
  - Visible locality
- Maybe coming soon (incremental, yet offering real benefits):
  - Global Address Space (GAS) languages: UPC, Co-Array Fortran, Titanium
    - "Minor" extensions to existing languages (see the UPC sketch below)
  - More convenient than MPI
    - Performance transparency via explicit remote memory references
- The critical cycle of prototyping, assessment, and commercialization must be a long-term, sustaining investment, not a one-time crash program.
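For a flavor of what a GAS language looks like, here is a minimal UPC sketch (UPC being the C member of the family named above; the example is mine, not the talk's). The shared array lives in a global address space, and the reads by thread 0 may touch remote memory; that remoteness is explicit in the types rather than hidden behind message calls:

```c
/* Minimal UPC sketch of a global shared address space (illustrative). */
#include <upc.h>
#include <stdio.h>

#define N (100 * THREADS)       /* portable shared-array dimension      */

shared double x[N];             /* cyclically distributed across threads */

int main(void)
{
    int i;
    /* each iteration runs on the thread with affinity to x[i] */
    upc_forall (i = 0; i < N; i++; &x[i])
        x[i] = 1.0;

    upc_barrier;                /* make all writes visible */

    if (MYTHREAD == 0) {        /* explicit remote reads: no messages coded */
        double sum = 0.0;
        for (i = 0; i < N; i++)
            sum += x[i];
        printf("sum = %g (threads = %d)\n", sum, THREADS);
    }
    return 0;
}
```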

Slide 26: Collaborators / Support
- Top500 Team:
  - Erich Strohmaier, NERSC
  - Hans Meuer, Mannheim
  - Horst Simon, NERSC

