Published by Blaze Peters. Modified over 9 years ago.
1 High Performance Computing: A Look Behind and Ahead
Jack Dongarra
Computer Science Department
University of Tennessee
2 High Performance Computers
- From the beginning of the digital age, supercomputers have been time machines that let researchers peer into the future, both intellectually and temporally. Intellectually they bring to life models of complex phenomena when economics and other constraints preclude experimentation. Temporally they reduce the time to solution by enabling us to evaluate larger and more complex models than would be possible on more conventional systems.
3 High Performance Computers
- ~ 25 years ago: 1x10^6 Floating Point Ops/sec (Mflop/s)
  » Scalar based
- ~ 15 years ago: 1x10^9 Floating Point Ops/sec (Gflop/s)
  » Vector & shared memory computing, bandwidth aware
  » Block partitioned, latency tolerant
- ~ Today: 1x10^12 Floating Point Ops/sec (Tflop/s)
  » Highly parallel, distributed processing, message passing, network based
  » Data decomposition, communication/computation
- ~ 5 years away: 1x10^15 Floating Point Ops/sec (Pflop/s)
  » Many more levels of memory hierarchy (MH), combination of grids & HPC
  » More adaptive, latency tolerant (LT) and bandwidth aware, fault tolerant, extended precision, attention to SMP nodes
4 Architecture/Systems Continuum
The continuum runs from tightly coupled to loosely coupled systems.
- Custom processor with custom interconnect (Custom)
  » Systems: Cray X1, NEC SX-7, IBM Regatta, IBM Blue Gene/L
  » Best processor performance for codes that are not "cache friendly"
  » Good communication performance
  » Simplest programming model
  » Most expensive
- Commodity processor with custom interconnect (Hybrid)
  » Systems: SGI Altix (Intel Itanium 2), Cray Red Storm (AMD Opteron)
  » Good communication performance
  » Good scalability
- Commodity processor with commodity interconnect (Commod)
  » Systems: Clusters (Pentium, Itanium, Opteron, Alpha; GigE, Infiniband, Myrinet, Quadrics), NEC TX7, IBM eServer, Dawning
  » Best price/performance (for codes that work well with caches and are latency tolerant)
  » More complex programming model
5 Top 500 Computers
- Listing of the 500 most powerful computers in the world
- Yardstick: Rmax from LINPACK MPP (Ax=b, dense problem)
- Updated twice a year: at SC'xy in the States in November, and at a meeting in Germany in June
[Figure: TPP performance; axes: Size, Rate]
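The yardstick above can be illustrated with a minimal sketch: solve a dense system Ax=b and divide a nominal operation count by the elapsed time. This is not the benchmark code used for the list. It is a pure-Python toy that assumes the conventional count of 2/3 n^3 + 2 n^2 operations for an n x n solve, and the function names (`solve_dense`, `linpack_flops`, `measure_rate`) are hypothetical.

```python
import random
import time


def solve_dense(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_flops(n):
    """Operation count conventionally credited for an n x n dense solve (assumption)."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def measure_rate(n, seed=0):
    """Return (flop/s, max residual) for one random n x n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return linpack_flops(n) / elapsed, residual
```

Calling `measure_rate(200)` returns a rate in flop/s for the machine it runs on, along with a residual that checks the solution. Interpreted Python will report a rate far below what the hardware can deliver, so the number only illustrates how a rate is derived from problem size and time.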
6 What is a Supercomputer?
- A supercomputer is a hardware and software system that provides close to the maximum performance that can currently be achieved.
- Over the last 10 years the range for the Top500 has grown faster than Moore's Law.
  » 1993: #1 = 59.7 GFlop/s, #500 = 422 MFlop/s
  » 2004: #1 = 70 TFlop/s, #500 = 850 GFlop/s
Why do we need them?
- Almost all of the technical areas that are important to the well-being of humanity use supercomputing in fundamental and essential ways: computational fluid dynamics, protein folding, climate modeling, and national security (in particular cryptanalysis and simulating nuclear weapons), to name a few.
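The comparison with Moore's Law can be checked with a short calculation from the 1993 and 2004 figures on this slide. The sketch assumes steady exponential growth over the 11 years and reads Moore's Law as a doubling every 18 months; both are assumptions, not statements from the slide.

```python
import math


def doubling_time_months(start_rate, end_rate, years):
    """Months needed to double, assuming steady exponential growth."""
    doublings = math.log2(end_rate / start_rate)
    return years * 12.0 / doublings


# Figures from the slide, converted to flop/s.
top1 = doubling_time_months(59.7e9, 70e12, years=2004 - 1993)
top500 = doubling_time_months(422e6, 850e9, years=2004 - 1993)

# Assumption: Moore's Law read as a doubling every 18 months.
MOORE_MONTHS = 18.0

print(f"#1 doubles every {top1:.1f} months")
print(f"#500 doubles every {top500:.1f} months")
```

Both entries come out at roughly 12 to 13 months per doubling, which is shorter than the assumed 18 months.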
7 24th List: The TOP10
Rank | Manufacturer | Computer | Rmax [TF/s] | Installation Site | Country | Year | #Proc
1 | IBM | BlueGene/L β-System | 70.72 | DOE/IBM | USA | 2004 | 32768
2 | SGI | Columbia (Altix, Infiniband) | 51.87 | NASA Ames | USA | 2004 | 10160
3 | NEC | Earth-Simulator | 35.86 | Earth Simulator Center | Japan | 2002 | 5120
4 | IBM | MareNostrum (BladeCenter JS20, Myrinet) | 20.53 | Barcelona Supercomputer Center | Spain | 2004 | 3564
5 | CCD | Thunder (Itanium2, Quadrics) | 19.94 | Lawrence Livermore National Laboratory | USA | 2004 | 4096
6 | HP | ASCI Q (AlphaServer SC, Quadrics) | 13.88 | Los Alamos National Laboratory | USA | 2002 | 8192
7 | Self Made | X (Apple XServe, Infiniband) | 12.25 | Virginia Tech | USA | 2004 | 2200
8 | IBM/LLNL | BlueGene/L DD1 500 MHz | 11.68 | Lawrence Livermore National Laboratory | USA | 2004 | 8192
9 | IBM | pSeries 655 | 10.31 | Naval Oceanographic Office | USA | 2004 | 2944
10 | Dell | Tungsten (PowerEdge, Myrinet) | 9.82 | NCSA | USA | 2003 | 2500
399 systems > 1 TFlop/s; 294 machines are clusters; top10 average 8K proc
8 Performance Development [chart; annotation: "My Laptop"]
9 Performance Projection
10 Performance Projection [chart; annotation: "My Laptop"]
11 IBM BlueGene/L: 131,072 Processors
- Chip (2 processors): 2.8/5.6 GF/s, 4 MB (cache)
- Compute Card (2 chips, 2x1x1), 4 processors: 5.6/11.2 GF/s, 1 GB DDR
- Node Card (32 chips, 4x4x2), 16 Compute Cards, 64 processors: 90/180 GF/s, 16 GB DDR
- Rack (32 Node boards, 8x8x16), 2048 processors: 2.9/5.7 TF/s, 0.5 TB DDR
- System (64 racks, 64x32x32), 131,072 procs: 180/360 TF/s, 32 TB DDR
"Fastest Computer": BG/L 700 MHz, 32K proc, 16 racks
- Peak: 91.7 Tflop/s
- Linpack: 70.7 Tflop/s (77% of peak)
BlueGene/L Compute ASIC. Full system total of 131,072 processors.
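The packaging numbers can be cross-checked by multiplying down the hierarchy. The sketch below assumes 4 floating point operations per cycle per processor, a value inferred from the 2.8 GF/s quoted at 700 MHz rather than stated on the slide. The names are illustrative only.

```python
# Packaging hierarchy from the slide: (level, units of the previous level it contains).
HIERARCHY = [
    ("chip", 2),          # 2 processors per chip
    ("compute card", 2),  # 2 chips per compute card
    ("node card", 16),    # 16 compute cards per node card
    ("rack", 32),         # 32 node boards per rack
    ("system", 64),       # 64 racks in the full system
]

# Assumption: 4 flops per cycle per processor, inferred from 2.8 GF/s at 700 MHz.
CLOCK_HZ = 700e6
FLOPS_PER_CYCLE = 4
PEAK_PER_PROC = CLOCK_HZ * FLOPS_PER_CYCLE  # 2.8e9 flop/s


def processors_per_level():
    """Return {level: processor count} by multiplying down the hierarchy."""
    counts = {}
    total = 1
    for level, factor in HIERARCHY:
        total *= factor
        counts[level] = total
    return counts


def peak_tflops(processors):
    """Peak rate in Tflop/s if every processor computes at PEAK_PER_PROC."""
    return processors * PEAK_PER_PROC / 1e12


counts = processors_per_level()
beta_procs = 32768  # the 16-rack, 32K-processor system quoted on the slide
efficiency = 70.7 / peak_tflops(beta_procs)  # Linpack result over peak
```

Under these assumptions `counts["system"]` is 131,072, the 32K-processor peak is about 91.75 Tflop/s, and `efficiency` is about 0.77, matching the slide's "77% of peak".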
12 Customer Segments / Systems
13 Manufacturers / Systems
14 Processor Types
15 Interconnects / Systems
16
17 Clusters (NOW) / Systems
18 Power: Watts/Gflop (smaller is better) Top 20 systems Based on processor power rating only
19 Top500 Computers in Australia
Rank | Site | Manufacturer | Computer | Area | Year | Rmax [GF/s] | N procs | Peak [GF/s]
130 | PGS | IBM | xSeries Xeon 3.06 GHz - Gig-E | Industry | 2004 | 1923 | 512 | 3133
224 | Consumer Goods | IBM | eServer pSeries 690 (1.7 GHz Power4+, GigE) | Industry | 2004 | 1305 | 352 | 2394
317 | Bureau of Meteorology / CSIRO HPCCC | NEC | SX-6/144M18 | Research | 2004 | 1130 | 144 | 1152
330 | Australian Centre for Advanced Computing and Communications | Dell | PowerEdge 1750, Pentium4 Xeon 3.06 GHz, GigE | Academic | 2003 | 1095 | 304 | 1860
455 | University of Queensland | SGI | SGI Altix 1.3 GHz | Academic | 2003 | 937 | 208 | 1082
20 Top500 Conclusions
- Microprocessor-based supercomputers have brought a major change in accessibility and affordability.
- Clusters continue to account for more than half of all installed high-performance computers worldwide.
21 Future: Petaflops (10^15 fl pt ops/s)
- A Pflop for 1 second is roughly equivalent to a laptop computing for 1 year.
- From an algorithmic standpoint:
  » concurrency
  » data locality
  » latency & sync
  » floating point accuracy
  » dynamic redistribution of workload
  » new language and constructs
  » role of numerical libraries
  » algorithm adaptation to hardware failure
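The laptop comparison can be turned into a number: the operations done in one second at 1 Pflop/s, spread over a year, give the sustained rate the laptop would need. This is a back-of-the-envelope sketch using a 365-day year. The slide does not state the laptop's rate.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years

PFLOP = 1e15  # floating point operations performed in one second at 1 Pflop/s


def sustained_rate_for_one_year(total_ops):
    """Rate (flop/s) a machine needs to finish total_ops in one year."""
    return total_ops / SECONDS_PER_YEAR


rate = sustained_rate_for_one_year(PFLOP)
print(f"{rate / 1e6:.1f} Mflop/s sustained for a year")  # prints 31.7 Mflop/s ...
```

So the comparison corresponds to a laptop sustaining about 32 Mflop/s for the whole year, and is best read as an order-of-magnitude statement.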
22 Petaflop (10^15 flop/s) Computers Within the Next 5 Years
- Five basic design points:
  - Conventional technologies
    » 4.8 GHz processor, 8000 nodes, each w/16 processors
  - Processing-in-memory (PIM) designs
    » Reduce memory access bottleneck
  - Superconducting processor technologies
    » Digital superconductor technology, Rapid Single-Flux-Quantum (RSFQ) logic & hybrid technology multi-threaded (HTMT)
  - Special-purpose hardware designs
    » Specific applications, e.g. GRAPE Project in Japan for gravitational force computations
  - Schemes utilizing the aggregate computing power of processors distributed on the web
    » SETI@home ~35 Tflop/s
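For the conventional design point, a quick calculation shows how many floating point operations per cycle each processor would need to reach 1 Pflop/s. This is a sketch from the three numbers on the slide. The slide does not give the per-cycle rate, so that value is derived here, not quoted.

```python
TARGET = 1e15          # 1 Pflop/s
NODES = 8000
PROCS_PER_NODE = 16
CLOCK_HZ = 4.8e9

processors = NODES * PROCS_PER_NODE          # 128,000 processors
cycles_per_second = processors * CLOCK_HZ    # aggregate clock cycles per second
flops_per_cycle_needed = TARGET / cycles_per_second
```

The result is about 1.6 operations per cycle per processor, so a processor completing 2 operations per cycle would give a peak slightly above 1 Pflop/s.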
23 Petaflops (10^15 flop/s) Computer Today?
- 2 GHz processor (O(10^9) ops/s)
- .5 Million PCs
- $.5B ($1K each)
- 100 Mwatts
- 5 acres
- .5 Million Windows licenses!!
- PC failure every second
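The figures on this slide follow from simple arithmetic, sketched below. It assumes one operation per clock cycle on a 2 GHz PC. The per-PC power and failure interval are derived values that assume power and failures are spread evenly across machines, and are not numbers from the slide.

```python
TARGET = 1e15          # 1 Pflop/s
PC_RATE = 2e9          # assumption: one operation per cycle at 2 GHz
PC_PRICE_USD = 1_000   # $1K each, from the slide
TOTAL_POWER_W = 100e6  # 100 MW, from the slide

pcs = TARGET / PC_RATE                  # 500,000 PCs
cost_usd = pcs * PC_PRICE_USD           # $0.5B
watts_per_pc = TOTAL_POWER_W / pcs      # implied power budget per PC

# "PC failure every second": if failures are spread evenly, each PC fails
# once every `pcs` seconds on average.
mtbf_days_per_pc = pcs / 86_400
```

Under these assumptions each PC draws 200 W and fails about once every 5.8 days, which is what one failure per second across half a million machines amounts to.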
24 Real Crisis With HPC Is With The Software
- Programming is stuck
  - Arguably hasn't changed since the 60's
- It's time for a change
  - Complexity is rising dramatically
    » Highly parallel and distributed systems: from 10 to 100 to 1000 to 10000 to 100000 processors!!
    » Multidisciplinary applications
- A supercomputer application and its software are usually much longer-lived than the hardware
  - Hardware life is typically five years at most.
  - Fortran and C are the main programming models
- Software is a major cost component of modern technologies.
  - The tradition in HPC system procurement is to assume that the software is free.
- We don't have many great ideas about how to solve this problem.
25 Some Current Unmet Needs
- Performance / Portability
- Fault tolerance
- Better programming models
  - Global shared address space
  - Visible locality
- Maybe coming soon (incremental, yet offering real benefits):
  - Global Address Space (GAS) languages (UPC, Co-Array Fortran, Titanium)
    » "Minor" extensions to existing languages
    » More convenient than MPI
    » Have performance transparency via explicit remote memory references
- The critical cycle of prototyping, assessment, and commercialization must be a long-term, sustaining investment, not a one time, crash program.
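The ideas of a global address space with visible locality can be illustrated with a toy model. The sketch below is plain single-process Python, not UPC, Co-Array Fortran, or Titanium. The class `BlockDistributedArray` and its methods are hypothetical, and a counter stands in for network traffic.

```python
class BlockDistributedArray:
    """Toy global array split into contiguous blocks, one block per rank."""

    def __init__(self, length, num_ranks):
        self.length = length
        self.num_ranks = num_ranks
        self.block = -(-length // num_ranks)  # ceiling division
        # Each rank's local storage; a real system would place these on separate nodes.
        self.parts = [
            [0.0] * max(0, min(self.block, length - r * self.block))
            for r in range(num_ranks)
        ]
        self.remote_refs = 0

    def owner(self, i):
        """Visible locality: which rank holds global index i."""
        return i // self.block

    def get(self, i, my_rank):
        """Read global index i on behalf of my_rank, counting remote references."""
        r = self.owner(i)
        if r != my_rank:
            self.remote_refs += 1  # stands in for a network transfer
        return self.parts[r][i - r * self.block]

    def put(self, i, value, my_rank):
        """Write global index i on behalf of my_rank, counting remote references."""
        r = self.owner(i)
        if r != my_rank:
            self.remote_refs += 1
        self.parts[r][i - r * self.block] = value
```

Every rank addresses the array by the same global index, while `owner` makes it explicit which accesses are local and which are remote. That is the sense in which remote memory references stay visible to the programmer without explicit message passing.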
26 Collaborators / Support
- Top500 Team
  » Erich Strohmaier, NERSC
  » Hans Meuer, Mannheim
  » Horst Simon, NERSC