Technologies for Cluster Computing
Oren Laadan, Columbia University
ECI, July 2005

Course Overview (contd)
- What is cluster computing? Parallel computing, enabling technologies, definition of a cluster, taxonomy
- Middleware: SSI, operating system support, software support
- Virtualization & process migration
- Resource sharing: job assignment, load balancing, information dissemination
- Grids

Motivation
- Demanding applications:
  - Modeling and simulation (physics, weather, CAD, aerodynamics, finance, pharmaceuticals)
  - Business and e-commerce (eBay, Oracle)
  - Internet (Google, eAnything)
  - Number crunching (encryption, data mining)
  - Entertainment (animation, simulators)
- CPUs are reaching physical limits:
  - Dimensions
  - Heat dissipation

How to Run Applications Faster?
Three ways to improve performance:
- Work harder
- Work smarter
- Get help
And in computers:
- Use faster hardware
- Use optimized algorithms and techniques
- Use multiple computers to solve a particular task

Parallel Computing
Hardware (Flynn's taxonomy: instructions vs. data):
- SISD: classic CPU
- SIMD: vector computers
- MISD: pipelined computers
- MIMD: general-purpose parallelism
Software adjustments:
- Parallel programming: multiple processes collaborating, with communication and synchronization between them (see the sketch below)
- Operating systems, compilers, etc.

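To make "multiple processes collaborating, with communication and synchronization" concrete, here is a minimal sketch in C (my illustration, not code from the slides): a parent and a forked child split a summation; the child sends its partial result through a pipe, and the parent synchronizes via wait().

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2];                      /* pipe: the communication channel */
        if (pipe(fd) < 0) { perror("pipe"); exit(1); }

        if (fork() == 0) {              /* child: compute one half */
            long partial = 0;
            for (long i = 0; i < 1000000; i++)
                partial += i;
            write(fd[1], &partial, sizeof(partial));
            exit(0);
        }

        long mine = 0, theirs;          /* parent: compute the other half */
        for (long i = 1000000; i < 2000000; i++)
            mine += i;
        read(fd[0], &theirs, sizeof(theirs));  /* blocks: synchronization */
        wait(NULL);                            /* reap the child */
        printf("total = %ld\n", mine + theirs);
        return 0;
    }

On a cluster the same pattern holds, except the processes run on different nodes and the pipe is replaced by network messages (e.g., MPI, shown later).
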
Parallel Computer Architectures
Taxonomy of MIMD:
- SMP: Symmetric Multi-Processing
- MPP: Massively Parallel Processors
- CC-NUMA: Cache-Coherent Non-Uniform Memory Access
- Distributed systems
- COTS: Commodity Off-The-Shelf
- NOW: Network of Workstations
- Clusters

Taxonomy of MIMD (contd)
SMP:
- 2-64 processors today
- Everything shared
- Single copy of the OS
- Scalability issues (hardware, software)
MPP:
- Nothing shared
- Several hundred nodes
- Fast interconnect
- Inferior cost/performance ratio

Taxonomy of MIMD (contd)
CC-NUMA:
- Scalable multiprocessor system
- Global view of memory at each node
Distributed systems:
- Conventional networks of independent nodes
- Multiple system images, one OS per node
- Each node can be of any type (SMP, MPP, etc.)
- Difficult to use and to extract performance from

Taxonomy of MIMD (contd)
Clusters:
- Nodes connected by a high-speed network
- Operate as an integrated collection of resources
- Single system image
- High-performance computing: commodity supercomputing
- High-availability computing: mission-critical applications

Taxonomy of MIMD - summary

Enabling Technologies
Performance of individual components:
- Microprocessors (x2 every 18 months)
- Memory capacity (x4 every 3 years)
- Storage (same capacity trend!): SAN, NAS
- Networks (scalable gigabit networks)
- OS, programming environments
- Applications
The rate of performance improvement exceeds that of specialized systems.

The “killer” Workstation
Traditional usage:
- Workstations with Unix for science & industry
- PCs for administrative work & word processing
Recent trend:
- Rapid convergence in processor performance and kernel-level functionality of PCs vs. workstations
- Killer CPU, killer memory, killer network, killer OS, killer applications...

Computer Food Chain

Towards Commodity HPC
- Link together multiple computers to jointly solve a computational problem
- Ubiquitous availability of commodity high-performance components
- Out: expensive, specialized, proprietary parallel computers
- In: cheaper clusters of loosely coupled workstations

History of Cluster Computing
(timeline figure; last entry: 2000+ PDA clusters)

Why PC/WS Clustering Now?
- Individual PCs/workstations are becoming increasingly powerful
- The development cycle of supercomputers is too long
- Commodity network bandwidth is increasing and latency is decreasing
- Clusters are easier to integrate into existing networks
- User utilization of PCs/WSs is typically low (< 10%)
- Development tools for PCs/WSs are more mature
- PC/WS clusters are cheap and readily available
- Clusters can leverage future technologies and can easily be grown

What is a Cluster?
A cluster is a parallel or distributed processing system consisting of a collection of interconnected stand-alone computers that cooperatively work together as a single, integrated computing resource.
Each node in the cluster is:
- A UP/MP system with memory, I/O facilities, and an OS
- Connected via a fast interconnect or LAN
The nodes appear as a single system to users and applications.

Cluster Architecture
Layered view, top to bottom:
- Sequential and parallel applications
- Parallel programming environment
- Cluster middleware (single system image and availability infrastructure)
- PC/workstation nodes, each running communications software over its network interface hardware
- Cluster interconnection network/switch

A Winning Symbiosis
- Parallel processing: create MPP- or DSM-like parallel processing systems
- Network RAM: use cluster-wide available memory to aggregate a substantial cache in RAM
- Software RAID: use arrays of workstation disks to provide cheap, highly available and scalable storage and parallel I/O (see the striping sketch below)
- Multi-path communication: use multiple networks for parallel file transfer

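To illustrate the software RAID idea, here is a deliberately simplified RAID-0 (striping-only) sketch in C; the file names and chunk size are my own choices, and real software RAID adds redundancy and operates below the file system:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NDISKS 4          /* "disks" simulated by plain files */
    #define CHUNK  4096       /* stripe unit in bytes */

    /* Write buf round-robin across the disks, one chunk at a time;
     * on real hardware the NDISKS writes can proceed in parallel. */
    static void stripe_write(FILE *disk[], const char *buf, size_t len)
    {
        for (size_t off = 0, i = 0; off < len; off += CHUNK, i++) {
            size_t n = len - off < CHUNK ? len - off : CHUNK;
            fwrite(buf + off, 1, n, disk[i % NDISKS]);
        }
    }

    int main(void)
    {
        FILE *disk[NDISKS];
        char name[32];
        for (int i = 0; i < NDISKS; i++) {
            snprintf(name, sizeof(name), "disk%d.img", i);
            if (!(disk[i] = fopen(name, "wb"))) { perror(name); exit(1); }
        }
        char buf[64 * 1024];
        memset(buf, 'x', sizeof(buf));        /* dummy payload */
        stripe_write(disk, buf, sizeof(buf));
        for (int i = 0; i < NDISKS; i++)
            fclose(disk[i]);
        return 0;
    }
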
Design Issues
- Cost/performance ratio
- Increased availability
- Single system image (the look-and-feel of one system)
- Scalability (physical, size, performance, capacity)
- Fast communication (networks and protocols)
- Resource balancing (CPU, network, memory, storage)
- Security and privacy
- Manageability (administration and control)
- Usability and applicability (programming environment, cluster-aware applications)

Cluster Objectives
High performance:
- Usually dedicated clusters for HPC
- Partitioning between users
High throughput:
- Steal idle cycles (cycle harvesting)
- Maximum utilization of available resources
High availability:
- Fail-over configuration
- Heartbeat connections (see the sketch below)
Combined: HP and HA

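What a heartbeat connection amounts to can be shown with a minimal UDP sketch in C (port, period, and timeout are arbitrary choices of this illustration): the monitored node sends a small datagram every second, and the peer declares a failure, and would trigger fail-over, if it hears nothing within the timeout.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/select.h>

    #define HB_PORT    9999   /* arbitrary heartbeat port */
    #define HB_PERIOD  1      /* seconds between beats */
    #define HB_TIMEOUT 3      /* silence that counts as failure */

    static void sender(const char *peer_ip)   /* run on monitored node */
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in to = { .sin_family = AF_INET,
                                  .sin_port = htons(HB_PORT) };
        inet_pton(AF_INET, peer_ip, &to.sin_addr);
        for (;;) {
            sendto(s, "beat", 4, 0, (struct sockaddr *)&to, sizeof(to));
            sleep(HB_PERIOD);
        }
    }

    static void monitor(void)                 /* run on the peer */
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in me = { .sin_family = AF_INET,   /* any addr */
                                  .sin_port = htons(HB_PORT) };
        bind(s, (struct sockaddr *)&me, sizeof(me));
        for (;;) {
            fd_set rd; FD_ZERO(&rd); FD_SET(s, &rd);
            struct timeval tv = { HB_TIMEOUT, 0 };
            if (select(s + 1, &rd, NULL, NULL, &tv) == 0) {
                fprintf(stderr, "peer silent: initiate fail-over\n");
                continue;      /* real middleware takes over services here */
            }
            char buf[16];
            recv(s, buf, sizeof(buf), 0);     /* heartbeat received */
        }
    }

    int main(int argc, char **argv)
    {
        if (argc > 1) sender(argv[1]);   /* ./hb <peer-ip>  -> send beats */
        else          monitor();         /* ./hb            -> watch peer */
        return 0;
    }
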
Example: MOSIX at HUJI

Example: Berkeley NOW

Cluster Components
- Nodes
- Operating system
- Network interconnects
- Communication protocols & services
- Middleware
- Programming models
- Applications

Cluster Components: Nodes
Multiple high-performance computers:
- PCs
- Workstations
- SMPs (CLUMPs)
Processors:
- Intel/AMD x86
- IBM PowerPC
- Digital Alpha
- Sun SPARC

Cluster Components: OS
Basic services:
- Easy access to hardware
- Share hardware resources seamlessly
- Concurrency (multiple threads of control)
Operating systems:
- Linux (Beowulf, and many more)
- Microsoft NT (Illinois HPVM, Cornell Velocity)
- Sun Solaris (Berkeley NOW, C-DAC PARAM)
- Mach µ-kernel (CMU)
- Cluster OSs (Solaris MC, MOSIX)
- OS gluing layers (Berkeley GLUnix)

Cluster Components: Network
High-performance networks/switches:
- Ethernet (10 Mbps), Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps)
- SCI (Scalable Coherent Interface; 12 µs latency)
- ATM (Asynchronous Transfer Mode)
- Myrinet (1.2 Gbps)
- QsNet (5 µs latency for MPI messages)
- FDDI (Fiber Distributed Data Interface)
- Digital Memory Channel
- InfiniBand

Cluster Components: Interconnects
Standard Ethernet:
- 10 Mbps; cheap, easy to deploy
- Bandwidth & latency don't match CPU capabilities
Fast Ethernet and Gigabit Ethernet:
- Fast Ethernet: 100 Mbps
- Gigabit Ethernet: 1000 Mbps
Myrinet:
- 1.28 Gbps full-duplex interconnect, 5-10 µs latency
- Programmable on-board processor
- Leverages MPP technology
A simple latency/bandwidth cost model follows below.

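A standard first-order cost model (textbook material, not from the slides) makes these numbers comparable. Sending an n-byte message costs roughly

    T(n) = \alpha + \frac{n}{\beta}

where \alpha is the latency and \beta the bandwidth. With the Myrinet figures above (\alpha ≈ 10 µs, \beta ≈ 1.28 Gbps ≈ 160 MB/s), a 1 KB message takes about 10 µs + 1024 B / 160 MB/s ≈ 10 + 6.4 ≈ 16 µs: for small messages the latency term dominates, which is why low-latency interconnects matter as much as raw bandwidth.
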
Interconnects (contd)
InfiniBand:
- Latency < 7 µs
- Industry standard based on VIA
- Connects components within a system
SCI (Scalable Coherent Interface):
- Interconnection technology for clusters
- Directory-based cache coherence scheme
VIA (Virtual Interface Architecture):
- Standard for a low-latency communication software interface

Cluster Interconnects: Comparison

Criteria                  Gigabit Ethernet   Giganet cLAN          InfiniBand   Myrinet               SCI
Bandwidth (MB/s)          < 100              ?                     ?            ?                     < 320
Latency (µs)              ?                  ?                     < 7          5-10                  ~12
Hardware availability     Now                Now                   Now          Now                   Now
Linux support             Now                Now                   Now          Now                   Now
Max # of nodes            1000's             1000's                > 1000's     1000's                1000's
Protocol implementation   Hardware           Firmware on adaptor   Hardware     Firmware on adaptor   Firmware on adaptor
VIA support               NT / Linux         NT / Linux            Software     Linux                 Software
MPI support               MVICH              3rd party             MPI/Pro      3rd party             3rd party

(? marks values missing from the source)

Cluster Components: Communication Protocols
Fast communication protocols (user-level communication):
- Standard TCP/IP; zero-copy TCP/IP
- Active Messages (Berkeley)
- Fast Messages (Illinois)
- U-Net (Cornell)
- XTP (Virginia)
- Virtual Interface Architecture (VIA)

Cluster Components: Communication Services
Communication infrastructure:
- Bulk data transport
- Streaming data
- Group communications
Provides important QoS parameters:
- Latency, bandwidth, reliability, fault tolerance
Wide range of communication methodologies:
- RPC
- DSM
- Stream-based and message passing (e.g., MPI, PVM); see the MPI sketch below

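For a concrete taste of message passing, here is a minimal MPI program in C (the MPI calls are the standard API; the work-splitting is my own illustration): every rank sums its slice of 1..N and MPI_Reduce combines the partial sums on rank 0.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?       */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many of us? */

        const long N = 1000000;
        long lo = rank * (N / size) + 1;        /* my slice of 1..N */
        long hi = (rank == size - 1) ? N : lo + N / size - 1;

        long partial = 0, total = 0;
        for (long i = lo; i <= hi; i++)
            partial += i;

        /* Combine all partial sums onto rank 0. */
        MPI_Reduce(&partial, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum 1..%ld = %ld\n", N, total);
        MPI_Finalize();
        return 0;
    }

Typical usage: compile with mpicc and launch one process per node with, e.g., mpirun -np 4 ./sum.
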
Cluster Components: Middleware
Resides between the OS and the applications.
Provides infrastructure to transparently support:
- Single System Image (SSI): makes the collection appear as a single machine
- System Availability (SA): monitoring, checkpoint, restart, migration (see the sketch below)
- Resource Management and Scheduling (RMS)

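Real middleware checkpoints unmodified processes transparently (registers, memory, open files); the core save/resume cycle, though, can be shown with an application-level sketch in C (the file name and the one-variable "state" are hypothetical):

    #include <stdio.h>

    #define CKPT_FILE "app.ckpt"    /* hypothetical checkpoint file */

    static void checkpoint(long i)  /* persist the state to disk */
    {
        FILE *f = fopen(CKPT_FILE, "wb");
        if (f) { fwrite(&i, sizeof(i), 1, f); fclose(f); }
    }

    static long restart(void)       /* resume from checkpoint, if any */
    {
        long i = 0;
        FILE *f = fopen(CKPT_FILE, "rb");
        if (f) {
            if (fread(&i, sizeof(i), 1, f) != 1)
                i = 0;              /* corrupt checkpoint: start over */
            fclose(f);
        }
        return i;
    }

    int main(void)
    {
        for (long i = restart(); i < 1000000000L; i++) {
            /* ... real work goes here ... */
            if (i % 10000000 == 0)
                checkpoint(i);      /* survives a crash or migration */
        }
        printf("done\n");
        return 0;
    }
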
Cluster Components: Programming Models
- Threads (PCs, SMPs, NOW...): POSIX threads, Java threads
- OpenMP
- MPI (Message Passing Interface)
- PVM (Parallel Virtual Machine)
- Software DSMs (Shmem)
- Compilers: parallel code generators, C/C++/Java/Fortran
- Performance analysis tools
- Visualization tools
A POSIX threads sketch follows below.

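As a shared-memory counterpart to the MPI example above, here is a minimal POSIX threads program in C (standard pthreads API; the per-thread result slots are my own illustration to avoid locking):

    #include <stdio.h>
    #include <pthread.h>

    #define NTHREADS 4
    #define N 1000000L

    static long part[NTHREADS];      /* one result slot per thread */

    static void *worker(void *arg)   /* sum this thread's slice of 1..N */
    {
        long id = (long)arg;
        long lo = id * (N / NTHREADS) + 1;
        long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS - 1;
        for (long i = lo; i <= hi; i++)
            part[id] += i;
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (long id = 0; id < NTHREADS; id++)
            pthread_create(&t[id], NULL, worker, (void *)id);
        long total = 0;
        for (long id = 0; id < NTHREADS; id++) {
            pthread_join(t[id], NULL);    /* synchronize with worker */
            total += part[id];
        }
        printf("sum 1..%ld = %ld\n", N, total);
        return 0;
    }

Build with cc -pthread. Note the difference from MPI: the threads share part[] directly instead of exchanging messages.
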
Cluster Components: Applications
- Sequential
- Parametric modeling: embarrassingly parallel (see the sketch below)
- Parallel / distributed, cluster-aware:
  - Grand Challenge applications
  - Web servers, data mining

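Parametric modeling is embarrassingly parallel because each parameter value is evaluated independently; a minimal sketch in C (the simulate() function is a hypothetical stand-in, and fork() is used for brevity where a real cluster would farm the runs out to different nodes):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    static double simulate(double param)   /* hypothetical model */
    {
        return param * param;              /* stand-in for real work */
    }

    int main(void)
    {
        /* One child per parameter value; the runs never communicate,
         * which is exactly what "embarrassingly parallel" means. */
        for (int i = 0; i < 8; i++) {
            if (fork() == 0) {
                double p = 0.5 * i;
                printf("param %.1f -> result %.2f\n", p, simulate(p));
                exit(0);
            }
        }
        while (wait(NULL) > 0)             /* collect all runs */
            ;
        return 0;
    }
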
Clusters Classification (I)
Application target:
- High Performance (HP) clusters: Grand Challenge applications
- High Availability (HA) clusters: mission-critical applications

Clusters Classification (II)
Node ownership:
- Dedicated clusters
- Non-dedicated clusters:
  - Adaptive parallel computing
  - Communal multiprocessing

Clusters Classification (III)
Node hardware:
- Clusters of PCs (CoPs), Piles of PCs (PoPs)
- Clusters of Workstations (COWs)
- Clusters of SMPs (CLUMPs)

Clusters Classification (IV)
Node operating system:
- Linux clusters (e.g., Beowulf)
- Solaris clusters (e.g., Berkeley NOW)
- NT clusters (e.g., HPVM)
- AIX clusters (e.g., IBM SP2)
- SCO/Compaq clusters (UnixWare)
- Digital VMS clusters
- HP-UX clusters
- Microsoft Wolfpack clusters

Clusters Classification (V)
Node configuration:
- Homogeneous clusters: all nodes have similar architectures and run the same OS
- Semi-homogeneous clusters: similar architectures and OSs, varying performance capabilities
- Heterogeneous clusters: nodes have different architectures and run different OSs

Clusters Classification (VI)
Levels of clustering:
- Group clusters (#nodes: 2-99)
- Departmental clusters (#nodes: 10s to 100s)
- Organizational clusters (#nodes: many 100s)
- National metacomputers (WAN/Internet-based)
- International metacomputers (Internet-based; #nodes: 1000s to many millions)
  - Grid computing
  - Web-based computing
  - Peer-to-peer computing

Summary: Key Benefits
- High performance: with cluster-aware applications
- High throughput: resource balancing and sharing
- High availability: redundancy in hardware, OS, and applications
- Expandability and scalability: expand on demand by adding hardware
