Commodity Computing Clusters - next generation supercomputers? Paweł Pisarczyk, ATM S. A.
Agenda Introduction Supercomputer classification Architecture and implementations Commodity clusters Processors Operating systems Summary
Supercomputer "A supercomputer is a device for turning compute-bound problems into I/O-bound problems" - Seymour Cray A supercomputer is a computer system that leads the world in terms of processing capacity, particularly speed of calculations, at the time of its introduction. source:
Supercomputer History (1) Manchester Mark I MIT Whirlwind IBM KFLOPS CDC MFLOPS CDC MFLOPS CDC Cyber 76
Supercomputer History (2) Cray MFLOPS Cray X-MP MFLOPS Cray Y-MP GFLOPS Fujitsu Numerical Wind Tunnel GFLOPS Intel ASCI Red TFLOPS IBM ASCI White, SP Power3 375 MHz TFLOPS NEC Earth Simulator - 35 TFLOPS
Supercomputer Classes (1) General-purpose supercomputers: –vector processing machines - the same operation carried out on a large amount of data simultaneously –tightly connected cluster computers (NUMA) - communication-oriented architectures engineered from the ground up, based on high-speed interconnects and a large number of processors –commodity clusters - a collection of a large number of commodity PCs (COTS) interconnected by a high-bandwidth, low-latency network
Supercomputer Classes (2) Special-purpose supercomputers - high-performance computing devices with a hardware architecture dedicated to solving a single problem (equipped with custom ASICs or FPGA chips) Examples: –Deep Blue –GRAPE for astrophysics
Flynn taxonomy (1) SISD - Single Instruction Single Data (DEC, Sun Microsystems, PC) SIMD - Single Instruction Multiple Data –computers with a large number of processing units (i.e. ALUs) - CPP DAP Gamma II, Quadrics Apemille –vector processing machines - NEC SX6, IA32 MMX MISD - Multiple Instruction Single Data –theoretical model, no practical implementation
Flynn taxonomy (2) MIMD - Multiple Instruction Multiple Data –SM-MIMD - Shared Memory MIMD: global address space; SMP systems and ccNUMA systems –DM-MIMD - Distributed Memory MIMD: many nodes with local address spaces; high-bandwidth, low-latency communication; common NUMA architectures (Non-Uniform Memory Access); operating systems have to be communication-oriented (Mach project)
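As a rough illustration of the SISD/SIMD distinction, the Python sketch below adds two arrays twice: once with one addition per step, and once lane-wise with several additions per step. The 4-lane width and the function names are assumptions made for this example. Real SIMD hardware executes the lane-wise step as a single machine instruction, which pure Python can only model by counting steps.

```python
LANES = 4  # assumed vector width, chosen only for illustration


def add_sisd(a, b):
    """One addition per step: a single instruction operates on a single datum."""
    out, steps = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        steps += 1
    return out, steps


def add_simd(a, b, lanes=LANES):
    """One step per group of `lanes` elements: models one vector instruction."""
    out, steps = [], 0
    for i in range(0, len(a), lanes):
        # On real SIMD hardware this whole slice would be one instruction.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return out, steps


a = list(range(8))
b = [10] * 8
scalar_result, scalar_steps = add_sisd(a, b)
vector_result, vector_steps = add_simd(a, b)
```

Both calls produce the same sums; only the number of modelled instruction steps differs (8 scalar steps versus 2 vector steps for 8 elements and 4 lanes).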
SM-MIMD implementations S-COMA - Simple Cache-Only Memory Architecture –common SMP systems ccNUMA - Cache Coherent NUMA –SGI Origin 3000 –SGI Altix 3000 –HP SuperDome
S-COMA (SMP) [Diagram labels: CPU 0, CPU 1, CPU N, L2 cache, RAM]
ccNUMA [Diagram labels: CPU 0, CPU 1, CPU N-1, CPU N, L2 cache per CPU, L3 cache, RAM 0, RAM K]
ccNUMA implementation SGI Altix 3000 (ccNUMA) 64 Itanium 2 (IA64) processors C-brick modules with 2 CPUs and ASIC SHUB NUMAflex, NUMAlink interconnects (6.4 GB/s, 2.4 GB/s) Modified Linux kernel (2.6 NUMA support)
DM-MIMD implementations Massively parallel systems (NUMA) –communication-oriented architecture –low-latency, high-bandwidth interconnects –topologies: hypercube, torus, tree –butterfly networks, Omega networks –communication engineered from the ground up
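To make the hypercube and torus topologies concrete, the following Python sketch computes which nodes are directly linked to a given node. The node numbering (integer IDs for the hypercube, coordinate tuples for the torus) and the function names are assumptions for illustration, not taken from any particular machine.

```python
def hypercube_neighbors(node, dim):
    """Neighbors in a `dim`-dimensional hypercube: flip one bit of the node ID."""
    return [node ^ (1 << bit) for bit in range(dim)]


def torus_neighbors(coord, shape):
    """Neighbors in a torus: step -1 and +1 along each axis, with wraparound."""
    neighbors = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap around at the edges
            neighbors.append(tuple(n))
    return neighbors


cube_links = hypercube_neighbors(0, 3)
torus_links = torus_neighbors((0, 0, 0), (4, 4, 4))
```

In this model a node of a d-dimensional hypercube has d links, and a node of a 3D torus has six, two per axis, with the wraparound links closing each axis into a ring.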
DM-MIMD implementations Commodity clusters –a cluster is a collection of connected, independent computers working in unison to solve a problem –COTS technology –nodes are interconnected by Ethernet LAN, Myrinet, QsNet ELAN etc. –computation can be performed by using popular programming toolkits and frameworks: OpenMP, MPI –clusters require dedicated management software
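The message-passing style used on such clusters can be sketched in plain Python. In the sketch below, threads and queues stand in for cluster nodes and the interconnect; this is an assumption made so the example runs on a single machine, and it is not the MPI API. Each node sees only the data it receives as a message, computes a partial result, and sends it back to be combined (a scatter followed by a reduce).

```python
import queue
import threading


def node(rank, inbox, outbox):
    """A simulated cluster node: it works only on data received as a message."""
    chunk = inbox.get()                             # receive work from the root
    outbox.put((rank, sum(x * x for x in chunk)))   # send back a partial result


def parallel_sum_of_squares(data, n_nodes):
    inboxes = [queue.Queue() for _ in range(n_nodes)]
    results = queue.Queue()
    workers = [
        threading.Thread(target=node, args=(r, inboxes[r], results))
        for r in range(n_nodes)
    ]
    for w in workers:
        w.start()
    # Scatter: node r gets every n_nodes-th element, starting at index r.
    for r in range(n_nodes):
        inboxes[r].put(data[r::n_nodes])
    # Reduce: combine exactly one partial result per node.
    total = sum(results.get()[1] for _ in range(n_nodes))
    for w in workers:
        w.join()
    return total


total = parallel_sum_of_squares(list(range(1, 101)), n_nodes=4)
```

On a real cluster the scatter and reduce steps would be carried out by the MPI library over the network, and each node would be a separate process on a separate machine.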
NUMA implementations Cray T3E-1350 Processor: Alpha MHz Number of CPUs: D Torus topology Operating system: UNICOS/mk - microkernel based Peak performance: 3 TFLOPS
Commodity cluster implementation (1) Linux Networx/Quadrics Processor: Intel Xeon 2.4 GHz CPUs: 2304 Interconnect: QsNet ELAN3 Operating system: Linux + management tools + Lustre Cluster File System Peak performance: 7.6 TFLOPS 3rd computer on TOP500 list Developed for Lawrence Livermore National Laboratory in 2002
Commodity cluster implementation (2) HP XC6000 Cluster (XC3000 Cluster) Processor: Intel Itanium 2 6M 1.5 GHz (Intel Xeon 3 GHz) Node: HP Integrity rx2600 (HP ProLiant DL380) Number of processors: Interconnections: QsNet ELAN3 (Myricom Myrinet XP) Operating system: Linux + SSI Middleware + management tools + Lustre Cluster File System Peak performance: 34 CPUs GFLOPS, 512 CPUs - 3 TFLOPS
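The 3 TFLOPS figure for 512 CPUs can be checked with the usual peak-performance arithmetic: number of CPUs times clock frequency times floating-point operations per cycle. The Python sketch below assumes 4 floating-point operations per cycle for the Itanium 2; this value is not stated on the slide and is chosen because it reproduces the 6 GFLOPS per-processor peak quoted for the Itanium 2 elsewhere in this deck.

```python
def peak_flops(n_cpus, clock_hz, flops_per_cycle):
    """Theoretical peak: every CPU retires `flops_per_cycle` on every cycle."""
    return n_cpus * clock_hz * flops_per_cycle


ITANIUM2_CLOCK_HZ = 1.5e9      # 1.5 GHz, from the slide
ASSUMED_FLOPS_PER_CYCLE = 4    # assumption, not stated on the slide

per_cpu_gflops = peak_flops(1, ITANIUM2_CLOCK_HZ, ASSUMED_FLOPS_PER_CYCLE) / 1e9
cluster_tflops = peak_flops(512, ITANIUM2_CLOCK_HZ, ASSUMED_FLOPS_PER_CYCLE) / 1e12
```

Under that assumption a single processor peaks at 6 GFLOPS and 512 processors at about 3.07 TFLOPS, which is consistent with the rounded 3 TFLOPS on the slide. Peak figures are upper bounds; sustained performance on real workloads is lower.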
Commodity Clusters - software Operating system - Linux or SSI Linux (Single System Image) Platform for specialized applications for science, engineering and business (simulation, modeling, data mining) Distributed computation environments are used for software development (OpenMP, MPI) Common supercomputer applications require porting to clusters
Performance Scaling Scale-Out (Cluster) Scale-Up (SMP, ccNUMA) Scale Right
Processors (1) Many types of existing processors are used in supercomputers Microprocessor development directions: –Increasing clock frequency and the speed of instruction stream processing –Processing large collections of data in a single processor instruction - SIMD –Control path multiplication – multithreading
Processors (2) Vector processors –NEC SX-6 –Cray (Cray X1) RISC processors –MIPS –IBM Power4 –Alpha CISC processors –IA32 –AMD x86-64 VLIW processors –IA64
Intel Itanium 2 features State-of-the-art unconventional 64-bit architecture New programming model implementing the VLIW paradigm EPIC technology – Explicitly Parallel Instruction Computing – the compiler determines instruction dependencies and tells the processor how to process the instruction stream in parallel Many registers ( bit), register stack management 6 GFLOPS peak performance The full advantages of the processor can be exploited by a dedicated compiler
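The EPIC idea that the compiler, not the processor, works out which instructions may run in parallel can be illustrated with a toy dependency pass. The Python sketch below is a simplified model under stated assumptions: instructions are hypothetical (destination, sources) tuples, and a new group starts whenever an instruction reads or overwrites a register written earlier in the current group. It ignores bundle formats, instruction latencies, and memory dependencies, so it is not how a real IA-64 compiler is implemented.

```python
def group_independent(instructions):
    """Split a program-ordered instruction list into groups that contain no
    read-after-write or write-after-write dependency inside a group."""
    groups, current, written = [], [], set()
    for dest, sources in instructions:
        if written & set(sources) or dest in written:
            # Dependency on an earlier instruction of this group: close it.
            groups.append(current)
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r10", "r11")),  # r1 = r10 op r11
    ("r2", ("r12", "r13")),  # independent of the first instruction
    ("r3", ("r1", "r2")),    # needs r1 and r2, so it starts a new group
    ("r4", ("r14", "r15")),  # independent of r3, so it joins r3's group
]
groups = group_independent(program)
```

Here the four instructions fall into two groups of two. The instructions inside a group have no mutual dependencies, so hardware could issue them together without checking for conflicts at run time.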
Operating systems Monolithic kernel based OSs - UNIX (modification of existing solutions) –BSD –Solaris –Irix –Linux Microkernel based OSs –Mach
Microkernel architecture [Diagram labels: Task A, Task B, Kernel, Task C, Kernel, Hardware]
Summary Today there are many supercomputer architectures Both vector processors and common RISC, CISC and VLIW chips are used in supercomputers Commodity clusters running Linux are an attractive way to implement a supercomputer
TOP 500 list (1) 1. Earth Simulator, NEC TFLOPS 2. HP Alphaserver SC, HP TFLOPS 3. Linux Networx / Quadrics IA TFLOPS
Top 500 list (2) Source: