High Performance Computing: Past Highlights and Future Trends
David W. Walker
Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, U.S.A.
Outline of Talk
- Trends in hardware performance.
- Advances in algorithms.
- Obstacles to efficient parallel programming.
- Successes and disappointments from the past two decades of parallel computing.
- Future trends:
  – problem-solving environments
  – petascale computing
  – alternative algorithmic approaches
- Concluding remarks.
Moore's Law: A Dominant Trend
[Chart: peak performance growth from 1 KFlop/s to 1 TFlop/s, tracking machines from the EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, CDC 7600, and IBM 360/195 through the Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, and Cray T3D to ASCI Red.]
Era of Modern Supercomputing
- In 1976 the introduction of the Cray 1 ushered in the era of modern supercomputing.
  – ECL chip technology
  – Shared memory, vector processing
  – Good software environment
  – About 100 Mflop/s peak
  – Cost about $5 million
- The Intel iPSC/1 was introduced in 1985.
  – Distributed memory
  – More scalable hardware
  – 8 Mflop/s peak for a 64-processor machine
  – Explicit message passing
Competing Paradigms
- Shared memory vs. distributed memory
- Scalar vs. vector processing
- Custom vs. commodity processors
- Cluster vs. stand-alone system?
Recent Trends
- The Top500 list provides statistics on high-performance computers, based on the performance of the LINPACK benchmark.
- Before 1995 the Top500 list was dominated by systems at US government research sites.
- Since 1995 commercial and industrial sites have figured more prominently in the Top500. Reasons:
  – In 1994 companies such as SGI and Sun began selling symmetric multiprocessor (SMP) systems.
  – IBM SP2 systems were also popular with industrial sites.
  – Dedicated database systems are important, as are web servers.
[Chart: Top500 architectures: 91 constellations, 14 clusters, 275 MPPs, 120 SMPs.]
Top500 CPU Technology [chart]
Performance in the Top500 [chart]
Top500 Performance [chart]
Top500 Application Areas [chart]
Top500 Application Areas by Rmax [chart]
Top500 Systems Installed by Area [chart]
Top500 Data by Continent [chart]
Top500 Systems Installed by Continent [chart]
Top500 Rmax by Continent [chart]
Top500 Systems Installed by Manufacturer [chart]
Future Extrapolations from Top500 Data [chart]
Some Conclusions from Top500 Data
- Rapid turnover in architectures, vendors, and technologies.
- But long-term performance trends appear steady. How long will this continue?
- Moderately parallel systems are now in widespread commercial use.
- The highest-performance systems are still found mostly at government-funded sites working on Grand and National Challenges - mostly numerically intensive simulations.
Advances in Algorithms
- Advances in algorithms have led to performance improvements of several orders of magnitude in certain areas.
- The obvious example is the FFT, which reduces the cost of the discrete Fourier transform from O(N^2) to O(N log N).
- Other examples include:
  – fast multipole methods
  – wavelet-based algorithms
  – sparse matrix solvers
  – etc.
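To make the gap concrete, here is a minimal Python sketch (NumPy assumed; naive_dft is an illustrative name, not a library routine) comparing a direct O(N^2) evaluation of the DFT with the O(N log N) FFT:

import numpy as np

# Minimal sketch: a direct O(N^2) DFT versus NumPy's O(N log N) FFT.
def naive_dft(x):
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)  # full N x N matrix of twiddle factors
    return W @ x                                  # N^2 multiply-adds

x = np.random.rand(1024)
assert np.allclose(naive_dft(x), np.fft.fft(x))   # same answer, very different cost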
Problems with High Performance Computing
- HPC is "difficult." There is often a large disparity between peak performance and actual performance.
- Application developers must be aware of the memory hierarchy, and program accordingly.
- The lack of a standard software environment and tools has been a problem. There are not many commercial products.
- Platforms quickly become obsolete, so it costs a lot of money to stay at the forefront of HPC technology. A Cray Y-MP C90, purchased in 1993 when the list price was $35M, is being sold on the eBay auction web site. "If there are no takers, we'll have to pay Cray about $30,000 to haul it away for salvage." (Mike Schneider, Pittsburgh Supercomputing Center)
Successes of Parallel Computing
- Portability. In the early days of HPC each machine came with its own application programming interface, and a number of competing research projects offered "compatibility" interfaces. Standardised APIs are now available:
  – MPI for message-passing machines
  – OpenMP for shared-memory machines
- Libraries. Some success has been achieved in developing parallel libraries. For example:
  – ScaLAPACK for dense and banded numerical linear algebra (Dongarra et al.).
  – SPRNG for parallel random number generation (NCSA).
  – FFTW, developed at MIT by Matteo Frigo and Steven G. Johnson, for parallel fast Fourier transforms.
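As a flavor of what a standardised API buys, here is a minimal message-passing sketch. It uses the mpi4py Python bindings purely for illustration (the MPI standard itself defines C and Fortran interfaces), and the ring pattern is an invented example:

# Run with at least two processes, e.g.: mpiexec -n 4 python ring.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Pass a counter around a ring; each process increments it once.
if rank == 0:
    comm.send(0, dest=1 % size)
    total = comm.recv(source=size - 1)
    print(f"token visited {total} other processes")
else:
    token = comm.recv(source=rank - 1)
    comm.send(token + 1, dest=(rank + 1) % size)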
Application Successes
The development of scalable massively parallel computers was motivated largely by a set of Grand Challenge applications:
- Climate modelling. The Climate Change Prediction Program seeks to understand the processes that control the Earth's climate and to predict future changes in climate due to natural and human influences.
- Tokamak design. The main goal of the Numerical Tokamak Turbulence Project is to develop realistic fluid and particle simulations of tokamak plasma turbulence in order to optimize the performance of fusion devices.
- Rational drug design. The goal is to discover and design new drugs based on computer simulations of macromolecular structure.
- Computational fluid dynamics. This is important in the design of aerospace vehicles and cars.
- Quantum chromodynamics. Lattice QCD simulations allow first-principles calculations of hadronic properties.
Difficulties and Disappointments
- Automatic parallelizing compilers.
  – Automatic vectorization was successful on vector machines.
  – Automatic parallelization worked quite well on shared-memory machines.
  – Automatic parallelization has been less successful on distributed-memory machines, where the compiler must decide how to partition the data and assign it to processes so as to maximize the number of local memory accesses and minimize communication.
- Software tools for parallel computers. There are few standard software tools for parallel computers. Those that exist are mostly research projects; there are few commercial products.
- High Performance Fortran. Compilers and tools for HPF were slow in appearing, and HPF was not well suited to irregular problems.
Future Trends in High Performance Computing
- Metacomputing using distributed clusters and the Grid
  – multidisciplinary applications
  – collaborative applications
  – advanced visualization environments
- Ultra-high performance computing
  – quantum computing
  – macro-molecular computing
  – petascale computing
- Problem-solving environments: Grid portals to computational resources
- A different algorithmic emphasis
  – "keep it simple": cellular automata and particle-based methods
  – automatic performance tuning and "intelligent algorithms"
  – interval arithmetic techniques
Metacomputing and the Grid
- Metacomputing refers to the use of multiple platforms (or nodes) to seamlessly construct a single virtual computer.
- In general the nodes may be arbitrarily distant from one another, and some may be specialised for a particular task.
- The nodes themselves may be sequential or parallel computers. A software component running on a single node may make use of MPI or OpenMP.
- Interaction between nodes is mediated by a software layer such as CORBA, Globus, or Legion.
- In a common model we view the nodes as offering different sets of computing services with known interfaces.
Metacomputing Limitations
- This type of distributed metacomputing is limited by the bandwidth of the communication infrastructure connecting the nodes.
- It is of limited use for compute-intensive applications; tasks must be loosely coupled.
- It may be useful for some multi-disciplinary applications.
Important Metacomputing Issues
- Resource discovery: how do nodes publicise their services?
- Resource scheduling: how to optimise resource use when there are multiple users?
- Resource monitoring: we need to be able to monitor the bandwidth between nodes and the load on each.
- Code mobility: in data-intensive applications it often makes more sense to send the code to the data, rather than the other way round.
- What is the appropriate software infrastructure for interaction between nodes?
Tele-Presence and Metacomputing
- More generally, the nodes of the metacomputer do not have to be computers: they may be people, experimental instruments, satellites, etc.
- The remote control of instruments, such as astronomical telescopes or electron microscopes, often involves several collaborators who interact with the instrument and with each other through a thin client interface.
- In recent work at Cardiff University, researchers have developed a WAP interface that allows an MPI application running on a network of workstations to be controlled from a mobile phone.
Collaborative Immersive Visualization
- Essential feature: the observer appears to be in the same space as the visualized data.
- The observer can navigate within the visualization space relative to the data.
- Several observers can co-exist in the same visualization space, which is ideal for remote collaboration.
Hardware Options
- CAVE: a fully immersive environment. The ORNL system has stereoscopic projections onto three walls and the floor.
- ImmersaDesk: projects stereoscopic images onto a single flat-panel display.
- Stereoscopic workstation: a stereoscopic viewing device, such as CrystalEyes, can be used on workstations and PCs. Stereo-ready graphics cards are becoming increasingly available.
CAVE [image]
Immersive Visualization and Terascale Computing
- Scientific simulations, experiments, and observations generate vast amounts of data that often overwhelm data management, analysis, and visualization capabilities.
- Collaborative immersive visualization is becoming important in interpreting and extracting insight from this wealth of data.
- An immersive visualization capability is essential in a credible terascale computing facility.
Collaborative Framework for Simulation-Enabled Processes
[Diagram: participants (program managers, developers, users, cost analysts, producers, testers, logistics/support analysts, system developers, subsystem/technology developers, training), tools, processes, and data, joined through web-centric data integration, model integration (CORBA & HLA), and a web-based collaborative environment with visualization support.]
Research Issues in Collaborative Immersive Visualization
- Collaborative use of immersive visualization across a range of hardware platforms within a distributed computing environment requires "resource-aware" middleware: data management, analysis, rendering, and visualization should be tailored to the resources available.
- Make the visualization system resource-aware, so that the tasks of data extraction, processing, rendering, and communication across the network can be optimized.
- Permit a wide range of platforms, from CAVEs to laptops, to be used for collaborative data exploration and navigation.
Videmus Prototype
- Develop a collaborative immersive environment for navigating and analysing very large numerical data sets.
- Provide a suite of resource-aware visualization tools for 3D scalar and vector fields.
- Support steering, and the retrieval and annotation of data.
- Permit collaborators to interact in the immersive space by audio and gestures.
- Make the visualization adapt to network bandwidth: if bandwidth is low, data may be compressed or a lower resolution used.
- Use server-side processing to lessen the load on the client and the network.
- Use software agents in the implementation.
Videmus Architecture
[Diagram: CAVE, ImmersaDesk, and workstation clients connect through a Data Request Agent and a Data Dispatch Agent to server-side resources: a data server, a compute server, and a rendering server.]
Other Collaborative Visualization Projects
- The Electronic Visualization Laboratory at UIC is a world leader, but is not particularly focused on scientific visualization.
- NCSA has done a lot of work on middleware for advanced visualization, and on human-factors research.
- The Virtual Environments Lab at Old Dominion University: good potential collaborators.
- COVISE from the University of Stuttgart.
- SNL has projects in VR for mesh generation, and the "Navigating Science" project to develop a method of exploring and analysing scientific literature using virtual reality.
Massive Data Processing and Visualization
- Recent acquisition of 9000 CDs of digital data:
  – vector maps
  – scanned maps
  – elevation data
  – imagery
- Visualization: from the desktop to immersive VR environments.
- Storage strategies, data exchange issues, and the collaborative environment.
[Diagram: Digital Earth Observatory; HPAC data; climate and groundwater data; transportation and energy data; Probe; ESnet3.]
Motivation for Problem-Solving Environments
The aim is scientific insight:
- a better understanding of the fundamental laws of the universe and how they interact to produce complex phenomena;
- new technologies for economic competitiveness and a cleaner and safer environment.
Aspects of Scientific Computing
We use computers to advance science through:
- Prediction: as in traditional computational science.
- Abstraction: the recognition of patterns and inter-relationships.
  – Visualization for steering, navigation, and immersion.
  – Data mining.
- Collaboration: brings a wide range of expertise to bear on a problem.
Innovative Environments
We seek to support prediction, abstraction, and collaboration in an integrated computing environment, a problem-solving environment (PSE), that:
- gives transparent access to heterogeneous distributed resources;
- supports all aspects of software creation and use;
- seamlessly incorporates new hardware and software.
Problem-Solving Environments
- are specific to an application domain, e.g., a PSE for climate modeling, a PSE for materials science, etc.;
- provide easy access to distributed computing resources, so the end user can focus on the science rather than on computer issues;
- deliver state-of-the-art problem-solving power to the end user;
- increase research productivity.
PSEs and Complexity
- Modeling complex physical systems requires complex computer hardware (hierarchical memory, parallelism) and complex computer software (numerical methods, message passing, etc.).
- PSEs handle this complexity for the end user.
Vision for PSEs
- PSEs herald a new era in scientific computing, both in raw power and in how resources are accessed and used.
- PSEs will become the main gateway for scientists to access terascale computing resources, and will allow users to reach these resources from any web connection. They act as web portals to the Grid.
- PSE support for collaborative computational science will change the prevailing research culture, making it more open and accountable.
Synergies
[Diagram: research hardware ("better = bigger & faster"), research software (distributed, immersive, collaborative), and research culture ("better = more open & accountable") reinforce one another.]
Software Technologies for PSEs
- XML is used for interface specification and for defining the component model.
- Java is used for platform-independent programming.
- CORBA provides transparent interaction between distributed resources.
- Agents are used for user support, and for resource monitoring and discovery.
- Wherever possible, use accepted software standards and component-based software engineering.
Main Features of a PSE
- A collaborative code development environment.
- An intelligent resource management system.
- Expert assistance in application design and input data specification.
- Electronic notebooks for recording and sharing the results of research.
Collaborative Code Development Environment
- The collaborative code development environment uses a visual programming tool to seamlessly integrate code from multiple sources.
- Applications are created by plugging together software components.
- Legacy code in any major scientific programming language can be handled.
Novel Ideas for PSE Research
- Intelligence is important in PSEs for efficient use and management of resources, for ease of use, and for user support. The PSE must be able to learn what works best.
- Living documents are a novel way of electronically publishing research results: readers can replay simulations and experiment with changing the input parameters.
- Resource-aware visualization refers to the ability of a visualization to adapt to the hardware platform, which may range from a PC to a CAVE.
Why PSEs?
- Need: enhanced scientific insight; reduced development costs; improved product quality and industrial efficiency.
- Need: a transparent means of integrating distributed computers, instruments, sensors, and people.
- Need: improved software productivity, to extract the maximum benefit from advances in computers, networks, and algorithms.
Why Now?
- A confluence of complementary technologies:
  – faster networks and communications;
  – network software technologies such as CORBA, Java, and XML.
- "Big Science" is inherently distributed and collaborative, and needs to migrate to WAN environments to progress.
What's the Problem?
- High-level problem-specification languages, often coupled with an expert system: for example, PDE solvers, numerical integration, etc.
- Problem composition in the form of a dataflow graph, using a GUI.
- Typically used in the modelling and simulation of physical systems.
PSE Requirements
- Expert assistance in problem specification and input.
- Transparent access to distributed heterogeneous resources.
- Interactivity and computational steering.
- Advanced/immersive visualisation.
- Integration with other knowledge repositories and databases.
Technologies for PSEs: Hardware
- Increasingly powerful computers.
- Increasingly fast networks: gigabit Ethernet, vBNS, etc.
- Immersive visualisation platforms: CAVEs, ImmersaDesks, etc.
Technologies for PSEs: Software
- CORBA for transparent interaction between distributed resources.
- Java for platform-independent programming.
- XML for interface specification.
- MPI for message passing in SPMD codes.
An Example PSE Architecture
The main PSE sub-systems are:
- the Visual Program Composition Environment (VPCE), for graphically composing applications;
- the Intelligent Resource Management System (IRMS), for scheduling applications on distributed resources.
VPCE Overview
- A GUI is used to build an application from software components, each either a Java or a CORBA object with its interface specified in XML.
- Each component may have a performance model and a help file.
- An annotated dataflow graph is produced and passed to the IRMS.
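As an illustration only, a component interface of the kind just described might be specified in XML roughly as follows; every element and attribute name here is invented for this sketch and is not taken from an actual VPCE schema:

<!-- Hypothetical component descriptor: names are illustrative only. -->
<component name="FlowSolver" kind="corba">
  <inPort  name="mesh"     type="Mesh"/>
  <inPort  name="boundary" type="BoundaryConditions"/>
  <outPort name="field"    type="VectorField"/>
  <performanceModel href="models/flowsolver.pm"/>
  <helpFile href="docs/flowsolver.html"/>
</component>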
IRMS Overview
- The IRMS locates software and hardware resources through information servers.
- The IRMS then schedules components on appropriate resources, based on performance models and a database of experience from previous runs.
- Genetic and neural network algorithms may be used.
The PSE Research Community
- The European Research Conference on PSEs took place in June 1999 in Spain; the next one is in the summer.
- The EuroTools SIG on PSEs.
- The Cardiff PSE project web site.
US Software Infrastructure
- The Grid is a computational and network infrastructure providing pervasive, uniform, and reliable access to distributed resources.
- Globus: provides core services for Grid-enabled computing.
- Legion: an object-based metacomputing project.
European Software Infrastructure
- UNICORE (Uniform Interface to Computing Resources): aimed at providing uniform, secure, batch access to distributed resources.
- POLDER: a more ambitious metacomputing project.
European Software Infrastructure
- CODINE: a resource management system targeted at optimal use of all software and hardware resources in a heterogeneous networked environment.
- CCS (Computing Centre Software): resource management for networked high-performance computers (paderborn.de/pc2/projects/ccs/).
European Software Infrastructure
- GRD (Global Resource Director): for distributed environments, featuring policy management and dynamic scheduling.
- NWIRE (Netwide Resources): a management system for WAN-based resources.
COVISE Visualisation Environment
- The Collaborative Visualisation and Simulation Environment is a distributed software environment that seamlessly integrates simulations, post-processing, and visualisation.
- COVISE supports collaborative working, and is available commercially.
Ctadel and PDE Problems
- Ctadel (a Code-generation Tool for Applications based on Differential Equations using high-Level language specifications) is an environment for the automatic generation of efficient Fortran or HPF programs for PDE-based problems.
- It is used in the HIRLAM numerical weather forecast system.
An Environment for Cellular Automata
- CAMEL is a CA environment designed for message-passing parallel computers. It hides parallelism issues from the user.
- With CARPET, a high-level cellular language, the user specifies only the transition function of a single cell of the system.
A PSE for Numerical General Relativity
- CACTUS is a collaborative software environment for composing applications for the solution of general relativity problems.
- It has been used in distributed computing experiments using Globus.
- Interactive visualisation is important.
JACO3: Industrial Design PSE
- A Java- and CORBA-based high-performance distributed computing environment for coupling simulation codes.
- Targets the optimal design of complex and expensive products such as airplanes, satellites, and cars.
A PSE for Stochastic Analysis
- Promenvir (Probabilistic Mechanical Design Environment): a metacomputing tool for stochastic analysis.
- It can automatically generate a series of stochastic computational experiments and run them on the available resources.
- It has been used for optimal design problems in the automobile industry.
PSE for Engineering Simulations
- JULIUS (Joint Industrial Interface for End-User Simulations): an integrated HPC environment for multi-disciplinary engineering simulations.
- Aimed at reducing the design time for industrial products. The end users are engineers.
Ultra-High Performance Computing
- Quantum computing. Based on the principle of superposition, which says that a particle can be in multiple quantum states simultaneously. In theory this allows a quantum computer to perform enormous numbers of operations in parallel, and thus achieve very high performance. But not yet: there are many problems, such as how to couple a quantum computer to its environment.
- Molecular computing. Macro-molecules such as DNA can be used to perform massively parallel searches and to provide ultra-high-capacity storage devices. But not yet: a stable molecular computer is still decades away.
- Superconductor-based logic. This is the basis of the Hybrid Technology Multi-Threaded (HTMT) architecture, and may lead to petaflop computing (10^15 floating-point operations per second).
Future of Semiconductor Chips
- Due to fundamental physical constraints, the clock speed of silicon-based semiconductor chips is expected to level out at about 6 GHz.
- A huge number of these chips would be needed to reach 1 Pflop/s, and a small power plant to provide the energy!
- For most of us Moore's Law will no longer hold. We can look forward to cheap, long-lasting computers with stable software.
The HTMT Architecture and Petascale Computing
The HTMT project aims to build a petascale computer by 2010, based on a number of innovative technologies:
- superconductor Rapid Single-Flux Quantum (RSFQ) logic as the basis of 150 GHz processing elements;
- Processing-In-Memory (PIM) chips to reduce memory access bottlenecks;
- an optical interconnection network with a communication speed of 500 Gbps per channel;
- hardware support for latency management.
Superconductor RSFQ Logic
- RSFQ devices running at up to 700 GHz have been tested.
- RSFQ processing elements are high-speed but have low power consumption.
- Digital bits are encoded by a single quantum of magnetic flux, and data are transferred in picosecond SFQ pulses.
- Uses superconducting niobium, so processors must be cooled to liquid-helium temperatures; the costs are acceptable.
- This technology is not for the mass market!
Processing-In-Memory Chips
- A central issue in the design and use of HPC systems has been the fact that peak processing speeds have been increasing at a faster rate than memory access speeds. This has led to systems with complex memory hierarchies.
- PIM designs seek to reduce the memory access bottleneck by integrating processing logic and memory on a single chip.
- Commercial and research PIM chips already exist.
HTMT and Hierarchical Memory
The HTMT architecture has a deep memory hierarchy, with four levels:
- superconductor processing elements (SPELLs), each with 1 Mbyte of memory cooled by liquid helium: cryogenic RAM (CRAM);
- SRAM PIMs, cooled by liquid nitrogen;
- DRAM PIMs, connected to the SRAM PIMs by an optical network;
- holographic 3/2 memory (HRAM).
Latency Management
- It takes many processor cycles for a SPELL to access memory in the PIM or HRAM levels of the memory hierarchy. Latency management is crucial to the effective use of the HTMT.
- Multithreading is the basis of the HTMT execution model. The PIM-based levels are used for thread context management, to keep the SPELLs supplied with work.
HTMT Challenges
- The HTMT architecture presents many technical challenges in almost every aspect of its design.
- It will require a new approach to programming and algorithm design. To make the best use of the memory hierarchy, simple regular algorithms (which may have a higher operation count) will be favored over more complex, inhomogeneous algorithms. Applications will require a high degree of concurrency to keep the SPELLs busy.
- Given the requirements of high concurrency and latency tolerance, highly tuned software libraries and PSEs are likely to be important in using HTMT computers.
- The project needs a long-term funding commitment; currently it is not funded.
Alternative Algorithmic Approaches
- Algorithms with a regular structure but a higher operation count may be better than those with an irregular structure.
- Slower algorithms that are more accurate and stable may be better than faster algorithms that are less so.
- Cellular automata (CA) appear very well suited to future-generation HPC machines.
- Interval-based algorithms may play a greater role in the future.
- Automatic tuning of numerical libraries.
- Intelligent algorithms and "algorithmic bombardment."
- The gap between processor speeds and memory access speeds is expected to widen, so latency-tolerant algorithms will continue to be important.
Cellular Automata
Cellular automata offer an alternative to classical PDE-based techniques for solving certain problems.
- CAs are highly parallel, very regular in structure, can handle complex geometries, and are numerically stable.
- The dynamics of the CA mimics the fine-grain dynamics of the actual physical system being modeled: complex collective global behavior arises from simple components obeying simple interaction rules.
- Over the next 10 years CA will play an increasing role in the simulation of physical (and social) phenomena.
CA for Surface Reactions
- A cellular automaton is used to model the reaction of carbon monoxide and oxygen to form carbon dioxide: CO + O → CO2.
- Reactions take place on the surface of a crystal, which serves as a catalyst. This is used in models of catalytic converters.
The Problem Domain
- The problem domain is a periodic square lattice representing the crystal surface.
- CO and O2 are adsorbed onto the crystal surface from the gas phase.
- The parameter y is the fraction of CO in the gas phase, and 1-y is the fraction of O2.
Interaction Rules
- Choose a lattice site at random and attempt to place a CO or an O2 there, with probabilities y and 1-y respectively.
- If the site is occupied, the CO or O2 bounces off and a new trial begins.
- O2 dissociates, so two adjacent sites must be found for its two O atoms.
- The following rules determine what happens next (a code sketch follows the rules).
Interaction Rules for CO
1. CO is adsorbed.
2. Check the 4 neighbors for O.
3. CO and O react.
4. CO2 desorbs.
Interaction Rules for O2
1. O2 is adsorbed.
2. O2 dissociates.
3. Check the 6 neighbors for CO.
4. O and CO react.
5. CO2 desorbs.
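A minimal serial Python sketch of these adsorption and reaction rules (essentially the Ziff-Gulari-Barshad model); the lattice size, trial count, and value of y are illustrative choices, not values from the talk:

import random

N = 128                                  # periodic N x N lattice
EMPTY, CO, O = 0, 1, 2
grid = [[EMPTY] * N for _ in range(N)]

def neighbors(i, j):
    return [((i - 1) % N, j), ((i + 1) % N, j),
            (i, (j - 1) % N), (i, (j + 1) % N)]

def react_with(i, j, partner):
    # If a neighbor holds the reaction partner, both sites desorb as CO2.
    for a, b in neighbors(i, j):
        if grid[a][b] == partner:
            grid[i][j] = EMPTY
            grid[a][b] = EMPTY
            return

def trial(y):
    i, j = random.randrange(N), random.randrange(N)
    if grid[i][j] != EMPTY:
        return                           # adsorbate bounces off; new trial
    if random.random() < y:              # CO arrives with probability y
        grid[i][j] = CO
        react_with(i, j, O)
    else:                                # O2 arrives; needs two adjacent empty sites
        empty = [(a, b) for a, b in neighbors(i, j) if grid[a][b] == EMPTY]
        if empty:
            a, b = random.choice(empty)
            grid[i][j], grid[a][b] = O, O
            react_with(i, j, CO)         # each O atom checks its own neighbors
            react_with(a, b, CO)         # (a simplification of the 6-neighbor rule)

y = 0.45                                 # inside the steady-state window
for _ in range(200_000):
    trial(y)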
Steady State Reaction
For y1 < y < y2 we get a steady state, with y1 ≈ 0.39 and y2 ≈ 0.53.
CO Poisoning: y > y2 [image]
Oxygen Poisoning: y < y1 [image]
Load Imbalance
- Load imbalance is smaller for smaller block sizes.
- Load imbalance becomes large as CO poisoning occurs.
- (512 x 512 lattice, y = 0.53)
Interval-Based Algorithms
- Each quantity is represented by a lower and an upper bound between which it is guaranteed to lie.
- The interval representation provides rigorous accuracy information that is absent from the point representation.
- The interval approach provides a way to keep track of initial uncertainties in the input data, errors in analytic approximations, rounding errors, etc.
- This is important in critical design processes: the space shuttle, aircraft, nuclear reactors, etc.
- Interval methods have existed for a long time, but performed badly because they were implemented in software. Recently, compiler and hardware support for interval methods has become available.
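A toy Python sketch of the idea; a real interval library would also control the floating-point rounding mode so that each endpoint is rounded outward:

from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

x = Interval(1.9, 2.1)      # a quantity known only to within +/- 0.1
y = Interval(0.9, 1.1)
print(x * y + x)            # a guaranteed enclosure of the true result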
Automatically Tuned Numerical Software Libraries
- This idea is exemplified by the Automatically Tuned Linear Algebra Software (ATLAS) project of Dongarra et al.
- Numerical routines are developed with a large design space spanned by many tunable parameters, such as:
  – blocking size
  – loop nesting permutations
  – loop unrolling depths
  – pipelining strategies
  – register allocation
  – instruction schedules
- When ATLAS is installed on a new platform, a set of runs automatically determines the best parameter values for that platform.
- Software must be able to dynamically explore its computational environment, and intelligently adapt as resource availability changes.
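The following toy Python sketch shows the flavor of the empirical approach: time each point in a small design space and keep the winner for the current machine. Real ATLAS searches a far larger space at install time; the routine and candidate block sizes here are illustrative:

import time
import numpy as np

def blocked_matmul(A, B, bs):
    # Blocked matrix multiply; bs is the tunable blocking size.
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, bs):
        for k in range(0, n, bs):
            for j in range(0, n, bs):
                C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
    return C

n = 512
A, B = np.random.rand(n, n), np.random.rand(n, n)
timings = {}
for bs in (16, 32, 64, 128):             # the design space being searched
    t0 = time.perf_counter()
    blocked_matmul(A, B, bs)
    timings[bs] = time.perf_counter() - t0
print("best block size on this machine:", min(timings, key=timings.get))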
Poly-Algorithmic Approaches
- On some advanced-architecture computers the detailed order of arithmetic operations may not be pre-determined.
- For some problems it may not be possible to predict a priori which solution method is best, or even which will converge.
- Algorithmic bombardment applies several algorithms concurrently, in the hope that at least one will converge to a solution.
- One could also first try a fast but unreliable method, and then, if a problem occurred, use a slower but more reliable method to fix it.
- Poly-algorithmic methods could be made available as black boxes, or could offer the user varying degrees of control over the methods used.
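A sketch of algorithmic bombardment, using SciPy's iterative solvers as illustrative stand-ins; on a parallel machine each method would run on its own processors and the losers would be cancelled:

from concurrent.futures import ThreadPoolExecutor, as_completed
import numpy as np
from scipy.sparse.linalg import bicgstab, cg, gmres

def run(method, A, b):
    # Each solver returns (x, info); info == 0 signals convergence.
    x, info = method(A, b)
    if info != 0:
        raise RuntimeError(f"{method.__name__} failed to converge")
    return method.__name__, x

A = np.diag(np.arange(1.0, 101.0))        # a simple SPD test system
b = np.ones(100)

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(run, m, A, b) for m in (bicgstab, cg, gmres)]
    for f in as_completed(futures):       # accept the first method to succeed
        try:
            name, x = f.result()
            print("first solver to converge:", name)
            break
        except RuntimeError:
            pass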
Summary: the Past and Present
- Parallel computing has transformed computational simulation and modeling, enabling a new paradigm of scientific investigation termed computational science.
- Parallel computing is becoming more widespread in commercial and industrial organizations.
- But software environments and tools for supporting parallel computing are disappointing and not widely used. Software reusability needs to be improved.
- OpenMP and MPI are the de facto standards for shared- and distributed-memory platforms, respectively.
Summary: the Present and the Future
- Metacomputing in a distributed environment is attracting a lot of interest, but appears to be of limited use for compute-intensive applications. This may change as network bandwidth improves.
- Application-specific problem-solving environments address software reusability, transparent access to distributed computing resources, and data visualization, exploration, and analysis within an integrated software environment.
- Multi-disciplinary applications, support for collaboration, and advanced visualization interfaces are becoming more important.
- The performance of conventional chips will level off in a few years, but radical new technologies offer further dramatic increases in compute power.
- New algorithmic approaches will be needed to exploit future HPC platforms effectively.