Classification of Cluster Computers
Clusters Classification (1)
• Based on Focus (in Market)
  – High Performance (HP) Clusters: grand challenge applications
  – High Availability (HA) Clusters: mission-critical applications
HA Cluster: Server Cluster with a "Heartbeat" Connection
Clusters Classification (2)
• Based on Workstation/PC Ownership
  – Dedicated Clusters
  – Non-dedicated Clusters: adaptive parallel computing, also called communal multiprocessing
Clusters Classification (3)
• Based on Node Architecture
  – Clusters of PCs (CoPs)
  – Clusters of Workstations (COWs)
  – Clusters of SMPs (CLUMPs)
Building Scalable Systems: Clusters of SMPs (CLUMPs)
(Chart: performance of SMP systems vs. four-processor servers in a cluster)
Clusters Classification (4)
• Based on Node OS Type
  – Linux Clusters (Beowulf)
  – Solaris Clusters (Berkeley NOW)
  – NT Clusters (HPVM)
  – AIX Clusters (IBM SP2)
  – SCO/Compaq Clusters (UnixWare)
  – Digital VMS Clusters, HP clusters, ...
Clusters Classification (5)
• Based on node component architecture & configuration (processor architecture, node type: PC/workstation, and OS: Linux/NT):
  – Homogeneous Clusters: all nodes have a similar configuration
  – Heterogeneous Clusters: nodes based on different processors and running different OSes
Clusters Classification (6a)
• Dimensions of Scalability & Levels of Clustering
(Figure: three dimensions of scalability - (1) platform: uniprocessor, SMP, cluster, MPP; (2) network technology; (3) CPU / I/O / memory / OS - and levels of clustering: workgroup, department, campus, enterprise, public metacomputing)
Clusters Classification (6b)
• Group Clusters (#nodes: 2-99)
  – a set of dedicated/non-dedicated computers, mainly connected by a SAN such as Myrinet
• Departmental Clusters (#nodes: )
• Organizational Clusters (#nodes: many 100s), using ATM networks
• Internet-wide Clusters = Global Clusters (#nodes: 1000s to many millions)
  – Metacomputing
  – Web-based Computing
  – Agent-based Computing
  – Java plays a major role in web- and agent-based computing
Cluster Middleware and Single System Image
Contents
• What is Middleware?
• What is Single System Image?
• Benefits of Single System Image
• SSI Boundaries
• SSI Levels
• Relationship between Middleware Modules
• Strategy for SSI via OS
• Solaris MC: an example OS supporting SSI
• Cluster Monitoring Software
What is Cluster Middleware?
• An interface between user applications and the cluster hardware and OS platform.
• Middleware packages support each other at the management, programming, and implementation levels.
• Middleware layers:
  – SSI Layer
  – Availability Layer: enables cluster services such as checkpointing, automatic failover, recovery from failure, and fault-tolerant operation among all cluster nodes.
Middleware Design Goals
• Complete Transparency
  – Lets the user see a single cluster system: single entry point, ftp, telnet, software loading, ...
• Scalable Performance
  – Easy growth of the cluster: no change of API, and automatic load distribution.
• Enhanced Availability
  – Automatic recovery from failures: employ checkpointing and fault-tolerance technologies
  – Handle consistency of data when replicated.
What is Single System Image (SSI)?
• A single system image is the illusion, created by software or hardware, that a collection of computing elements appears as a single computing resource.
• SSI makes the cluster appear like a single machine to the user, to applications, and to the network.
• A cluster without an SSI is not a cluster.
Benefits of Single System Image
• Transparent use of system resources
• Improved reliability and higher availability
• Simplified system management
• Reduction in the risk of operator errors
• Users need not be aware of the underlying system architecture to use these machines effectively
SSI vs. Scalability (design space of competing architectures)
Desired SSI Services
• Single Entry Point
  – telnet cluster.my_institute.edu
  – telnet node1.cluster.my_institute.edu
• Single File Hierarchy: xFS, AFS, Solaris MC Proxy
• Single Control Point: management from a single GUI
• Single Virtual Networking
• Single Memory Space - DSM
• Single Job Management: GLUnix, Codine, LSF
• Single User Interface: like a workstation/PC windowing environment (CDE in Solaris/NT); it may use Web technology
A sketch of the single-entry-point idea follows.
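To make the single entry point concrete, here is a minimal sketch (hypothetical host names; real clusters usually achieve this with round-robin DNS or a login redirector rather than application code): users always connect to one cluster name, and a small dispatcher hands each session to some node.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Toy single-entry-point dispatcher: the user only ever sees "cluster.my_institute.edu";
// each new login is handed to the next node in round-robin order.
class EntryPoint {
public:
    explicit EntryPoint(const std::vector<std::string>& nodes) : nodes_(nodes), next_(0) {}
    std::string route_login() {
        std::string node = nodes_[next_];
        next_ = (next_ + 1) % nodes_.size();
        return node;   // the session is actually served by this node
    }
private:
    std::vector<std::string> nodes_;
    std::size_t next_;
};

int main() {
    std::vector<std::string> nodes;
    nodes.push_back("node1.cluster.my_institute.edu");
    nodes.push_back("node2.cluster.my_institute.edu");
    EntryPoint cluster(nodes);
    std::cout << "telnet cluster.my_institute.edu -> " << cluster.route_login() << "\n";
    std::cout << "telnet cluster.my_institute.edu -> " << cluster.route_login() << "\n";
}
```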
Availability Support Functions
• Single I/O Space (SIO)
  – Any node can access any peripheral or disk device without knowledge of its physical location.
• Single Process Space (SPS)
  – Any process on any node can create processes cluster-wide, and they communicate through signals, pipes, etc., as if they were on a single node.
• Checkpointing and Process Migration (PM)
  – Saves the process state and intermediate results in memory or to disk to support rollback recovery when a node fails. PM also supports load balancing (see the sketch below).
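A stripped-down sketch of checkpoint/rollback (illustrative only; real cluster checkpointing saves the full process image, not a single variable, and the file name here is made up): periodically write the computation state to stable storage so the job can resume from the last checkpoint after a node failure.

```cpp
#include <fstream>
#include <iostream>

// Save the loop index (our whole "process state" in this sketch) so the job can resume.
void checkpoint(long i) {
    std::ofstream f("job.ckpt");
    f << i;
}

// Restore the last checkpoint if one exists; otherwise start from 0.
long restore() {
    std::ifstream f("job.ckpt");
    long i = 0;
    if (f) f >> i;
    return i;
}

int main() {
    long start = restore();                       // after a failure, we resume from here
    for (long i = start; i < 1000000; ++i) {
        // ... one step of the real computation would go here ...
        if (i % 100000 == 0) checkpoint(i);       // periodic checkpoint to disk
    }
    std::cout << "done (resumed from " << start << ")\n";
}
```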
SSI Levels
• SSI is a computer-science notion of levels of abstraction (a house is at a higher level of abstraction than walls, ceilings, and floors).
  – Application and Subsystem Level
  – Operating System Kernel Level
  – Hardware Level
Cluster Computing - Research Projects
• Beowulf (CalTech and NASA) - USA
• CCS (Computing Centre Software) - Paderborn, Germany
• Condor - University of Wisconsin, USA
• DJM (Distributed Job Manager) - Minnesota Supercomputing Center, USA
• DQS (Distributed Queuing System) - Florida State University, USA
• EASY - Argonne National Lab, USA
• HPVM (High Performance Virtual Machine) - UIUC & now UCSB, USA
• far - University of Liverpool, UK
• Gardens - Queensland University of Technology, Australia
• Generic NQS (Network Queuing System) - University of Sheffield, UK
• NOW (Network of Workstations) - UC Berkeley, USA
• NIMROD - Monash University, Australia
• PBS (Portable Batch System) - NASA Ames and LLNL, USA
• PRM (Prospero Resource Manager) - Univ. of Southern California, USA
• QBATCH - Vita Services Ltd., USA
Cluster Computing - Commercial Software
• Codine (Computing in Distributed Network Environment) - GENIAS GmbH, Germany
• LoadLeveler - IBM Corp., USA
• LSF (Load Sharing Facility) - Platform Computing, Canada
• NQE (Network Queuing Environment) - Craysoft Corp., USA
• OpenFrame - Centre for Development of Advanced Computing, India
• RWPC (Real World Computing Partnership), Japan
• UnixWare (SCO - Santa Cruz Operation), USA
• Solaris MC (Sun Microsystems), USA
Representative Cluster Systems
1. Solaris MC
2. Berkeley NOW
3. Their comparison with Beowulf & HPVM
Next Generation Distributed Computing: The Solaris MC Operating System
Why New Software?
• Without new software, a cluster is:
  – Just a network of machines
  – Requires specialized applications
  – Hard to administer
• With a cluster operating system:
  – The cluster becomes a scalable, modular computer
  – Users and administrators see a single large machine
  – Runs existing applications
  – Easy to administer
• New software makes the cluster better for the customer
Cluster Computing and Solaris MC
• Goal: use computer clusters for general-purpose computing
• Support existing customers and applications
• Solution: the Solaris MC (Multi-Computer) operating system, a distributed operating system (OS) for multi-computers
What is the Solaris MC OS?
• Solaris MC extends standard Solaris
• Solaris MC makes the cluster look like a single machine
  – Global file system
  – Global process management
  – Global networking
• Solaris MC runs existing applications unchanged
  – Supports the Solaris ABI (Application Binary Interface)
Applications
• Ideal for:
  – Web and interactive servers
  – Databases
  – File servers
  – Timesharing
• Benefits for vendors and customers
  – Preserves investment in existing applications
  – Modular servers with low entry-point price and low cost of ownership
  – Easier system administration
  – Solaris could become a preferred platform for clustered systems
Solaris MC is a Running Research System
• Designed, built, and demonstrated the Solaris MC prototype
  – Cluster of SPARCstations connected with a Myrinet network
  – Runs an unmodified commercial parallel database, a scalable Web server, and parallel make
• Next: Solaris MC Phase II
  – High availability
  – New I/O work to take advantage of clusters
  – Performance evaluation
Advantages of Solaris MC
• Leverages continuing investment in Solaris
  – Same applications: binary-compatible
  – Same kernel, device drivers, etc.
  – As portable as base Solaris - will run on SPARC, x86, PowerPC
• State-of-the-art distributed systems techniques
  – High availability designed into the system
  – Powerful distributed object-oriented framework
• Ease of administration and use
  – Looks like a familiar multiprocessor server to users, system administrators, and applications
Solaris MC Details
• Solaris MC is a set of C++ loadable modules on top of Solaris
  – Very few changes to the existing kernel
• A private Solaris kernel per node provides reliability
• Object-oriented system with well-defined interfaces
Solaris MC Components
• Object and communication support
• High availability support
• PXFS global distributed file system
• Process management
• Networking
Object Orientation
• Better software maintenance, change, and evolution
  – Well-defined interfaces
  – Separate implementation from interface
  – Interface inheritance
• Solaris MC uses:
  – IDL: a better way to define interfaces
  – CORBA object model: a better RPC (Remote Procedure Call)
  – C++: a better C
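A minimal C++ sketch of the idea (hypothetical interface and class names, not the actual Solaris MC IDL output): the interface is an abstract class, the implementation hides behind it, and a derived interface inherits operations - which is roughly what an IDL compiler generates.

```cpp
#include <iostream>
#include <string>

// Hypothetical interface, as an IDL compiler might emit it as an abstract C++ class.
class FileServer {
public:
    virtual ~FileServer() {}
    virtual std::string read_block(long block_no) = 0;   // interface only, no implementation
};

// Interface inheritance: a caching server is usable wherever a FileServer is expected.
class CachingFileServer : public FileServer {
public:
    virtual void invalidate(long block_no) = 0;
};

// One possible implementation, hidden behind the interface.
class LocalCachingFileServer : public CachingFileServer {
public:
    std::string read_block(long block_no) { return "block " + std::to_string(block_no); }
    void invalidate(long) {}
};

int main() {
    LocalCachingFileServer impl;
    FileServer& fs = impl;                 // clients see only the interface
    std::cout << fs.read_block(42) << "\n";
}
```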
Object and Communication Framework
• Mechanism for nodes and modules to communicate
  – Inter-node and intra-node interprocess communication
  – Optimized protocols for the trusted computing base
  – Efficient, low-latency communication primitives
• Object communication is independent of the interconnect
  – Ethernet, Fast Ethernet, FibreChannel, and Myrinet are used
  – Allows interconnect hardware to be upgraded
High Availability Support
• A node failure doesn't crash the entire system
  – Unaffected nodes continue running
  – Better than an SMP
  – A requirement for the mission-critical market
• Well-defined failure boundaries
  – Separate kernel per node - the OS does not use shared memory
• The object framework provides support
  – Delivers failure notifications to servers and clients
  – A group membership protocol detects node failures
• Each subsystem is responsible for its own recovery
  – File system, process management, networking, applications
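A toy sketch of the failure-detection idea (not the actual Solaris MC membership protocol; names and timeouts are made up): a monitor tracks heartbeat timestamps per node and notifies registered subsystems when a node has been silent too long.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <vector>

// Toy membership monitor: tracks last-heard-from times and notifies subscribers of failures.
class MembershipMonitor {
public:
    void heartbeat(int node, long now) { last_seen_[node] = now; }
    void subscribe(std::function<void(int)> on_failure) { subscribers_.push_back(on_failure); }

    // Declare any node silent for longer than 'timeout' as failed and notify subsystems.
    void check(long now, long timeout) {
        for (std::map<int, long>::iterator it = last_seen_.begin(); it != last_seen_.end(); ++it) {
            if (now - it->second > timeout) {
                for (size_t i = 0; i < subscribers_.size(); ++i) subscribers_[i](it->first);
            }
        }
    }
private:
    std::map<int, long> last_seen_;
    std::vector<std::function<void(int)> > subscribers_;
};

int main() {
    MembershipMonitor m;
    m.subscribe([](int node) { std::cout << "PXFS: recover files owned by node " << node << "\n"; });
    m.heartbeat(1, 100); m.heartbeat(2, 100);
    m.heartbeat(1, 200);                 // node 2 goes silent
    m.check(400, 150);                   // node 2 is reported as failed
}
```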
PXFS: Global File System
• Single-system image of the file system
• The backbone of Solaris MC
• Coherent access to and caching of files and directories
  – Caching provides high performance
• Access to I/O devices
PXFS: An Object-Oriented VFS
• PXFS builds on existing Solaris file systems
• Uses the vnode/virtual file system (VFS) interface externally
• Uses object communication internally
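A rough sketch of the client-side proxy idea (hypothetical class and method names, not the real PXFS code): a vnode-style read() is satisfied from a local cache when possible and otherwise invokes the server object through the object framework.

```cpp
#include <iostream>
#include <map>
#include <string>

// Hypothetical server-side file object reached through the object framework.
class RemoteFile {
public:
    std::string read_page(long page) { return "data for page " + std::to_string(page); }
};

// Client-side proxy: reads served from a local cache, falling back to the server object.
class ProxyFile {
public:
    explicit ProxyFile(RemoteFile& server) : server_(server) {}
    std::string read_page(long page) {
        std::map<long, std::string>::iterator hit = cache_.find(page);
        if (hit != cache_.end()) return hit->second;          // cache hit: no remote call
        std::string data = server_.read_page(page);           // cache miss: object invocation
        cache_[page] = data;
        return data;
    }
private:
    RemoteFile& server_;
    std::map<long, std::string> cache_;
};

int main() {
    RemoteFile server;
    ProxyFile file(server);
    std::cout << file.read_page(7) << "\n";   // remote fetch
    std::cout << file.read_page(7) << "\n";   // served from the local cache
}
```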
Process Management
• Provides a global view of processes on any node
  – Users, administrators, and applications see the global view
  – Supports existing applications
• Uniform support for local and remote processes
  – Process creation/waiting/exiting (including remote execution)
  – Global process identifiers, groups, sessions
  – Signal handling
  – procfs (/proc)
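One common way to build global process identifiers (a sketch of the general technique, not necessarily the encoding Solaris MC uses; the bit split here is an illustrative assumption) is to pack a node number together with the node-local PID:

```cpp
#include <cstdio>

// Pack a node id and a node-local pid into one cluster-wide process identifier.
// The 32/32-bit split is an illustrative assumption, not the Solaris MC layout.
typedef unsigned long long gpid_t;

gpid_t make_gpid(unsigned node, unsigned local_pid) {
    return (static_cast<gpid_t>(node) << 32) | local_pid;
}
unsigned gpid_node(gpid_t g)  { return static_cast<unsigned>(g >> 32); }
unsigned gpid_local(gpid_t g) { return static_cast<unsigned>(g & 0xffffffffULL); }

int main() {
    gpid_t g = make_gpid(3, 1234);       // process 1234 on node 3
    // A cluster-wide kill() would route the signal to gpid_node(g) and deliver it to gpid_local(g).
    std::printf("node=%u local_pid=%u\n", gpid_node(g), gpid_local(g));
}
```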
Process Management Benefits
• Global process management helps users and administrators
• Users see the familiar single-machine process model
  – Can run programs on any node
  – The location of a process in the cluster doesn't matter
  – Use existing commands and tools: unmodified ps, kill, etc.
Networking Goals
• The cluster appears externally as a single SMP server
  – Familiar to customers
  – Access the cluster through a single network address
  – Multiple network interfaces supported but not required
• Scalable design
  – Protocol and network application processing on any node
  – Parallelism provides high server performance
Networking: Implementation
• A programmable "packet filter"
  – Packets are routed between the network device and the correct node
  – Efficient, scalable, and supports parallelism
• Supports multiple protocols with existing protocol stacks
  – Parallelism of protocol processing and applications
  – Incoming connections are load-balanced across the cluster
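A simplified sketch of how such a filter might spread incoming connections (hashing on the connection tuple is a common load-balancing technique; the real Solaris MC packet filter is programmable and more general than this):

```cpp
#include <cstdio>

// Toy connection record: just enough of the TCP/IP header to identify a flow.
struct Flow {
    unsigned src_ip, dst_ip;
    unsigned short src_port, dst_port;
};

// Hash the flow so every packet of a connection goes to the same node,
// while different connections spread across the cluster.
int pick_node(const Flow& f, int num_nodes) {
    unsigned h = f.src_ip ^ (f.dst_ip * 31u) ^ (f.src_port << 16) ^ f.dst_port;
    return static_cast<int>(h % num_nodes);
}

int main() {
    Flow f1 = {0x0a000001, 0x0a0000ff, 40001, 80};
    Flow f2 = {0x0a000002, 0x0a0000ff, 40002, 80};
    std::printf("flow1 -> node %d, flow2 -> node %d (4-node cluster)\n",
                pick_node(f1, 4), pick_node(f2, 4));
}
```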
Status
• 4-node, 8-CPU prototype with Myrinet demonstrated:
  – Object and communication infrastructure
  – Global file system (PXFS) with coherency and caching
  – Networking: TCP/IP with load balancing
  – Global process management (ps, kill, exec, wait, rfork, /proc)
  – Monitoring tools
  – Cluster membership protocols
• Demonstrated applications:
  – Commercial parallel database
  – Scalable Web server
  – Parallel make
  – Timesharing
• The Solaris MC team is working on high availability
Summary of Solaris MC
• Clusters are likely to be an important market
• Solaris MC preserves customer investment in Solaris
  – Uses existing Solaris applications
  – Familiar to customers
  – Looks like a multiprocessor, not a special cluster architecture
  – Ease of administration and use
• Clusters are ideal for important applications
  – Web servers, file servers, databases, interactive services
• State-of-the-art object-oriented distributed implementation
• Designed for future growth
Berkeley NOW Project
Berkeley NOW
• Design & implementation of higher-level system software
  – Global OS (GLUnix)
  – Parallel file system (xFS)
  – Fast communication (HW support for Active Messages)
  – Application support
• Overcoming technology shortcomings
  – Fault tolerance
  – System management
• NOW goal: faster for parallel AND sequential workloads
NOW Software Components
(Figure: each Unix (Solaris) workstation runs an Active Messages (AM) layer, an L.C.P., and a VN segment driver; GLUnix, the global layer over Unix, spans the nodes across a Myrinet scalable interconnect. Large sequential apps and parallel apps run on top through Sockets, Split-C, MPI, HPF, and vSM over Active Messages, together with a name server and scheduler.)
Active Messages: Lightweight Communication Protocol
• Key idea: a network process ID is attached to every message, and the HW checks it upon receipt
  – Net PID match: as fast as before
  – Net PID mismatch: interrupt and invoke the OS
• Can mix LAN messages and MPP messages; invoke the OS & TCP/IP only when not cooperating (if everyone uses the same physical-layer format)
MPP Active Messages
• Key idea: associate a small user-level handler directly with each message
  – The sender injects the message directly into the network
  – The handler executes immediately upon arrival
  – It pulls the message out of the network and integrates it into the ongoing computation, or replies
• No buffering (beyond transport), no parsing, no allocation, primitive scheduling
Active Message Model
• Every message carries in its header the address of a user-level handler, which is executed immediately at user level on arrival
• No receive-side buffering of messages
• Supports protected multiprogramming of a large number of users onto finite physical network resources
• Active message operations, communication events, and threads are integrated in a simple and cohesive model
• Provides naming and protection
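A bare-bones sketch of the model (an in-process simulation under assumed names, not a real NIC-level implementation): each message carries a handler pointer plus a small payload, and the receive loop simply invokes the handler on the data.

```cpp
#include <cstdio>
#include <queue>

// An active message: the header is the handler's address, followed by a small payload.
struct ActiveMessage {
    void (*handler)(int* counter, int value);   // user-level handler to run on arrival
    int value;
};

// Example handler: integrates the arriving data into the ongoing computation.
void add_to_sum(int* counter, int value) { *counter += value; }

int main() {
    std::queue<ActiveMessage> network;           // stands in for the interconnect
    int sum = 0;                                 // receiver's computation state

    // Sender injects messages directly into the "network".
    ActiveMessage m1 = {add_to_sum, 5};
    ActiveMessage m2 = {add_to_sum, 7};
    network.push(m1);
    network.push(m2);

    // Receiver: no buffering or parsing, just run the handler named in each message.
    while (!network.empty()) {
        ActiveMessage m = network.front(); network.pop();
        m.handler(&sum, m.value);
    }
    std::printf("sum = %d\n", sum);
}
```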
Active Message Model (contd.)
(Figure: an active message travels across the network carrying a handler PC and data; on arrival the handler runs and feeds the data into the primary computation's data structures on the receiving node.)
xFS: File System for NOW
• Serverless file system: all data lives with the clients
• Uses MP cache-coherency techniques to reduce traffic
• Files are striped for parallel transfer
• Large file cache ("cooperative caching" - network RAM); see the sketch below

                Miss Rate   Response Time
  Client/Server   10%          1.8 ms
  xFS               4%          1.0 ms
  (42 WS, 32 MB/WS, 512 MB/server, 8 KB/access)
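A simplified sketch of the cooperative-caching lookup order (illustrative only; the real xFS protocol also handles coherence, striping, and ownership): check the local cache, then another client's memory over the fast network, and only then go to disk.

```cpp
#include <iostream>
#include <map>
#include <string>

// Toy three-level lookup: local cache -> peer memory ("network RAM") -> disk.
class CoopCache {
public:
    std::string read(long block) {
        if (local_.count(block)) return local_[block];             // fastest: local hit
        if (peers_.count(block)) {                                 // next: fetch from a peer's cache
            local_[block] = peers_[block];
            return local_[block];
        }
        std::string data = "disk block " + std::to_string(block);  // slowest: disk read
        local_[block] = data;
        return data;
    }
    std::map<long, std::string> local_, peers_;   // public for brevity in this sketch
};

int main() {
    CoopCache fs;
    fs.peers_[9] = "cached on another workstation";
    std::cout << fs.read(9) << "\n";    // satisfied from peer memory, not disk
    std::cout << fs.read(10) << "\n";   // falls through to disk
}
```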
GLUnix: Gluing Unix
• Built on top of Solaris
• Glues together the Solaris instances running on the cluster nodes
• Supports transparent remote execution and load balancing, and allows existing applications to run
• Provides a globalized view of system resources, like Solaris MC
• Gang-schedules parallel jobs so the cluster is as good as a dedicated MPP for parallel jobs
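A toy sketch of transparent remote execution with load balancing (hypothetical node names and load values; ssh stands in for GLUnix's own remote-execution machinery, which tracks cluster-wide load itself): pick the least-loaded node and run the command there.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

struct Node { std::string host; double load; };

// Pick the least-loaded node, then launch the command there.
int run_remote(const std::vector<Node>& nodes, const std::string& cmd) {
    size_t best = 0;
    for (size_t i = 1; i < nodes.size(); ++i)
        if (nodes[i].load < nodes[best].load) best = i;
    std::string full = "ssh " + nodes[best].host + " " + cmd;
    std::cout << "running on " << nodes[best].host << ": " << cmd << "\n";
    return std::system(full.c_str());
}

int main() {
    std::vector<Node> nodes;
    nodes.push_back({"node1", 0.9});    // hypothetical load averages
    nodes.push_back({"node2", 0.1});
    run_remote(nodes, "make -j4");      // lands on node2, the least-loaded node
}
```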
3 Paths for Applications on NOW
• Revolutionary (MPP style): write new programs from scratch using MPP languages, compilers, libraries, ...
• Porting: port programs from mainframes, supercomputers, MPPs, ...
• Evolutionary: take a sequential program and use
  1) Network RAM: first use the memory of many computers to reduce disk accesses; if not fast enough, then:
  2) Parallel I/O: use many disks in parallel for accesses not in the file cache; if not fast enough, then:
  3) Parallel program: change the program until it sees enough processors that it is fast
  => Large speedup without a fine-grained parallel program
Comparison of 4 Cluster Systems
Clusters Revisited
Summary
• We have discussed clusters:
  – Enabling technologies
  – Architecture & its components
  – Classifications
  – Middleware
  – Single System Image
  – Representative systems
Conclusions
• Clusters are promising:
  – They solve the parallel processing paradox
  – They offer incremental growth and match funding patterns
  – New trends in hardware and software technologies are likely to make clusters even more promising, so that
  – Cluster-based supercomputers can be seen everywhere!