Achieving Usability and Efficiency in Large-Scale Parallel Computing Systems
Kei Davis and Fabrizio Petrini
Performance and Architectures Lab (PAL), CCS-3
Computer and Computational Sciences Division, Los Alamos National Laboratory

Kei Davis and Fabrizio Petrini, Euro-Par 2004, Pisa, Italy

Schedule
• Introduction
• Break
• Existing Systems
• Break
• Case Study
• Break
• A New Approach

Part 1: Introduction
1. The need for more capability
2. The big issues
3. A taxonomy of systems in three dimensions

The Need for More Capability
“The most constant difficulty in contriving the engine has arisen from the desire to reduce the time in which the calculations were executed to the shortest which is possible.” Charles Babbage
Our interest is in scientific computing: large-scale, numerical, parallel applications run on large-scale parallel machines.

Definitions
• Computing capacity: total deliverable computing power from a system or set of systems. (Power here means rate of delivery.)
• Computing capability: computing power available to a single application.
Highest-end computing is primarily concerned with capability; why else build such machines?
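
The distinction between the two definitions can be made concrete with a small sketch. The machine names and rates below are hypothetical, and the sketch assumes that a single application cannot span more than one system:

```python
# Hypothetical machine room: peak rate (Tflop/s) of each system.
# Names and numbers are illustrative assumptions, not measurements.
systems_tflops = {"machine_a": 20.0, "machine_b": 10.0, "machine_c": 5.0}


def capacity(systems):
    """Total deliverable computing power across all systems."""
    return sum(systems.values())


def capability(systems):
    """Computing power available to a single application,
    assuming one application runs on exactly one system."""
    return max(systems.values())


print("capacity  :", capacity(systems_tflops), "Tflop/s")
print("capability:", capability(systems_tflops), "Tflop/s")
```

Under these assumptions, adding more small machines raises capacity but leaves capability unchanged, which is why capability drives the construction of single very large machines.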

The Need for Large-Scale Parallel Machines
• The insatiable demand for ever more computational capability has driven the creation of many Tflop-scale parallel machines (Earth Simulator, LANL’s ASCI Q, LLNL’s Thunder and BlueGene/L, etc.).
• Petaflop machines are on the horizon, for example through the DARPA HPCS (High Productivity Computing Systems) program.

One-upmanship?
Is this merely one-upmanship with the Japanese?
From The Roadmap for the Revitalization of High-End Computing, Computing Research Association:
“[…] there is a growing recognition that a new set of scientific and engineering discoveries could be catalyzed by access to very-large-scale computer systems, those in the 100 teraflop to petaflop range.”

Requirements for ASCI
In our own arena, Advanced Simulation and Computing (ASC) for stockpile stewardship; climate, ocean, and urban infrastructure modeling, etc.:
Within 10 years, estimates of the demand for Capability and general physics arguments indicate a machine of 1000 TF = 1 PetaFlop (PF) will be needed to execute the most demanding jobs. Such demand is inevitable; it should not be viewed, however, as some plateau in required Capability: there are sound technical reasons to expect even greater Capability demand in the future.

Large Component Count
• Increases in performance will be achieved through single-processor improvements and increases in component count.
• For example, BlueGene/L will have 133,120 processors and 608,256 memory modules.
• The large component count will make any assumption of complete reliability unrealistic.

Sensitivity to Failures
• In a large-scale machine, the failure of a single component usually causes a significant fraction of the system to fail because:
  1. Components are strongly coupled (e.g., the failure of a fan will lead to other failures due to overheating).
  2. The state of the application is not stored redundantly, and loss of any state is catastrophic.
  3. In capability mode, many processing nodes are running the same application and are tightly coupled together.

The Need for Transparent Fault-Tolerance
• System software must be resilient to failures, so that applications can continue executing in the presence of failures.
• Most of the investment is in the application software ($250M/year for MPI software in the ASCI TriLabs).
• Economic constraints impose a limited level of redundancy.
• Other considerations include cost of development, scalability, and efficiency.

The JASON Report
• A recent report from the JASONs, a committee of distinguished scientists chartered by the US government, raised the sensitive question of whether ASCI machines can be used as capability engines.
• For that to be possible, major advances in fault tolerance are needed.
• The report recommends skipping one generation of supercomputers because of the lack of good technical and scientific solutions.

Figure: MTBF as a Function of System Size
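
The general trend of MTBF against system size can be illustrated with a minimal sketch. It is not the data behind the slide's figure; it assumes identical components that fail independently with exponentially distributed lifetimes, and that any single component failure brings down the application. The per-component MTBF of one million hours is a hypothetical value.

```python
import math


def system_mtbf_hours(component_mtbf_hours, n_components):
    """MTBF of the whole system under the stated assumptions:
    failure rates add, so MTBF shrinks in proportion to 1/N."""
    return component_mtbf_hours / n_components


def prob_no_failure(component_mtbf_hours, n_components, run_hours):
    """Probability that a run of the given length sees no failure at all."""
    return math.exp(-run_hours * n_components / component_mtbf_hours)


COMPONENT_MTBF = 1_000_000  # hours; hypothetical value for illustration

for n in (1_000, 10_000, 100_000):
    mtbf = system_mtbf_hours(COMPONENT_MTBF, n)
    p = prob_no_failure(COMPONENT_MTBF, n, run_hours=24)
    print(f"{n:>7} components: system MTBF {mtbf:7.1f} h, "
          f"P(24 h run survives) = {p:.3f}")
```

Under these assumptions, a hundredfold increase in component count cuts the system MTBF by the same factor, so a long-running capability job on a very large machine should expect to encounter failures.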

Figure: Failure Distribution (ASCI Blue Mountain)

State of the Art in Large-Scale Supercomputers
• We can assemble large-scale systems by wiring together hardware and “bolting together” software components.
• But we have almost no control over the machine: we face not only faults but also performance anomalies.

1.2 The Big Issues
From the DoE Office of Science:
By the end of this decade petascale computers with thousands of times more computational power than any in current use will be vital tools for expanding the frontiers of science and for addressing vital national priorities. These systems will have tens to hundreds of thousands of processors, an unprecedented level of complexity, and will require significant new levels of scalability and fault management. [Emphasis added]

Office of Science cont’d
Current and future large-scale parallel systems require that such services be implemented in a fast and scalable manner so that the OS/R does not become a performance bottleneck. Without reliable, robust operating systems and runtime environments the computational science research community will be unable to easily and completely employ future generations of extreme-scale systems for scientific discovery.

DARPA
Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) mission:
Provide economically viable high productivity systems for the national security and industrial user communities with the following design attributes in the latter part of this decade:
• Performance
• Programmability
• Portability
• Robustness

Our Translation
• Performance: achieving achievable performance (not, e.g., some percentage of theoretical peak)
• Programmability/portability: standard interfaces, transparency of mechanisms for fault tolerance
• Robustness: graceful failover

1.3 A Taxonomy of Systems
Q: Is it a supercomputer or just a cluster?
A: It is a continuum along multiple dimensions.
A taxonomy of systems in three dimensions:
• Degree of integration of the compute node;
• Collective primitives provided by the network interface, programmability, global address space;
• Degree of integration of system software.

Note
This taxonomy is useful for our explication, but we make no claim that it
• is canonical, or
• captures highly specialized architectures (for example custom-designed special-purpose digital processors, vector processors, floating-point processors).
We are concerned with the big ‘general-purpose’ parallel machines.

Compute Node
Degree of integration of the compute node: between processors, memory, and network interface
• Processors: single processor, SMP, or multiple CPU cores per chip
• Number of levels of cache, proximity of caches to CPU core
• Proximity of network interface to CPU core: on-chip, off-chip with a direct connection, or separated by an I/O interface

Network Interface
• Collective primitives provided by the network interface: from none to functionally rich
• Programmability of the network interface: from none to general purpose
• Provision of a virtual global address space

System Software
• Degree of integration of system software
Much more about this later…
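
One way to picture the three dimensions is as a record with one group of fields per dimension. The sketch below is only an illustration: the type names, the coarse three-point `Level` scale, and the example machine are all hypothetical, whereas the taxonomy itself describes continua rather than discrete levels.

```python
from dataclasses import dataclass
from enum import Enum


class NicPlacement(Enum):
    """Proximity of the network interface to the CPU core."""
    ON_CHIP = "on-chip"
    OFF_CHIP_DIRECT = "off-chip, direct connection"
    BEHIND_IO_INTERFACE = "separated by I/O interface"


class Level(Enum):
    """Hypothetical coarse scale standing in for a continuum."""
    NONE = 0
    PARTIAL = 1
    RICH = 2


@dataclass(frozen=True)
class SystemProfile:
    name: str
    # Dimension 1: integration of the compute node
    cores_per_chip: int
    cache_levels: int
    nic_placement: NicPlacement
    # Dimension 2: network interface
    nic_collectives: Level
    nic_programmability: Level
    global_address_space: bool
    # Dimension 3: integration of system software
    system_software_integration: Level


# A made-up commodity cluster, placed near the "loosely integrated" end.
example = SystemProfile(
    name="hypothetical_cluster",
    cores_per_chip=1,
    cache_levels=2,
    nic_placement=NicPlacement.BEHIND_IO_INTERFACE,
    nic_collectives=Level.NONE,
    nic_programmability=Level.NONE,
    global_address_space=False,
    system_software_integration=Level.NONE,
)
print(example)
```

A tightly integrated machine would differ from this example along every field, which is the sense in which "supercomputer or cluster" is a position in a space rather than a yes/no question.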