
1 Clustering: The Next Wave in PC Computing

2 Cluster Concepts 101
- This section is about clusters in general; we'll get to Microsoft's Wolfpack cluster implementation in the next section.

3 Why Learn About Clusters
- Today clusters are a niche Unix market
- But Microsoft will bring clusters to the masses
  - Last October, Microsoft announced NT clusters
  - SCO announced UnixWare clusters
  - Sun announced Solaris / Intel clusters
  - Novell announced Wolf Mountain clusters
- In 1998, 2M Intel servers will ship; 100K in clusters
- In 2001, 3M Intel servers will ship; 1M in clusters (IDC's forecast)
- Clusters will be a huge market, and RAID is essential to clusters

4 What Are Clusters?
- A group of independent systems that
  - Function as a single system
  - Appear to users as a single system
  - And are managed as a single system
- Clusters are "virtual servers"

5 Why Clusters
- #1. Clusters Improve System Availability
  - This is the primary value in Wolfpack-I clusters
- #2. Clusters Enable Application Scaling
- #3. Clusters Simplify System Management
- #4. Clusters (with Intel servers) Are Cheap

6 Why Clusters - #1
- #1. Clusters Improve System Availability
  - When a networked server fails, the service it provided is down
  - When a clustered server fails, the service it provided "fails over" and downtime is avoided
[Diagram: separate networked Mail and Internet servers vs. clustered servers sharing the Mail & Internet workload]

7 Why Clusters - #2
- #2. Clusters Enable Application Scaling
  - With networked SMP servers, application scaling is limited to a single server
  - With clusters, applications scale across multiple SMP servers (typically up to 16 servers)

8 Why Clusters - #3
- #3. Clusters Simplify System Management
  - Clusters present a Single System Image; the cluster looks like a single server to management applications
  - Hence, clusters reduce system management costs
[Diagram: three management domains vs. one management domain]

9 Why Clusters - #4
- #4. Clusters (with Intel servers) Are Cheap
  - Essentially no additional hardware costs
  - Microsoft charges an extra $3K per node
    - Windows NT Server: $1,000
    - Windows NT Server, Enterprise Edition: $4,000
- Note: Proprietary Unix cluster software costs $10K to $25K per node.

10 An Analogy to RAID
- RAID Makes Disks Fault Tolerant
  - Clusters make servers fault tolerant
- RAID Increases I/O Performance
  - Clusters increase compute performance
- RAID Makes Disks Easier to Manage
  - Clusters make servers easier to manage

11 Two Flavors of Clusters
- #1. High Availability Clusters
  - Microsoft's Wolfpack 1
  - Compaq's Recovery Server
- #2. Load Balancing Clusters (a.k.a. Parallel Application Clusters)
  - Microsoft's Wolfpack 2
  - Digital's VAXClusters
- Note: Load balancing clusters are a superset of high availability clusters.

12 High Availability Clusters
- Two node clusters (node = server)
- During normal operations, both servers do useful work
- Failover
  - When a node fails, applications fail over to the surviving node, and it assumes the workload of both nodes
[Diagram: Mail and Web nodes; after failover, one node runs Mail & Web]

13 High Availability Clusters
- Failback
  - When the failed node is returned to service, the applications fail back
[Diagram: the Mail application fails back to the repaired node]
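To make the failover and failback sequence concrete, here is a minimal Python sketch (not from the original deck; the Node and HACluster classes, and the Mail/Web app names, are illustrative) of a two-node high availability cluster moving applications between nodes:

```python
# Minimal two-node high-availability cluster sketch (illustrative only).
class Node:
    def __init__(self, name, apps):
        self.name = name
        self.apps = list(apps)   # applications this node normally runs
        self.alive = True

class HACluster:
    def __init__(self, node_a, node_b):
        self.nodes = [node_a, node_b]

    def fail(self, dead):
        """Failover: the surviving node assumes the workload of both nodes."""
        dead.alive = False
        survivor = next(n for n in self.nodes if n is not dead)
        survivor.apps.extend(dead.apps)     # e.g. Mail moves to the Web node
        dead.apps = []

    def repair(self, node, home_apps):
        """Failback: when the failed node returns to service, its applications move home."""
        node.alive = True
        survivor = next(n for n in self.nodes if n is not node)
        for app in home_apps:
            if app in survivor.apps:
                survivor.apps.remove(app)
                node.apps.append(app)

# Usage: Mail fails over to the Web node, then fails back after repair.
mail_node, web_node = Node("A", ["Mail"]), Node("B", ["Web"])
cluster = HACluster(mail_node, web_node)
cluster.fail(mail_node)                 # web_node.apps == ["Web", "Mail"]
cluster.repair(mail_node, ["Mail"])     # back to ["Mail"] and ["Web"]
```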

14 Load Balancing Clusters
- Multi-node clusters (two or more nodes)
- Load balancing clusters typically run a single application (e.g., a database) distributed across all nodes
- Cluster capacity is increased by adding nodes (but like SMP servers, scaling is less than linear)
[Diagram: cluster throughput of 3,000 TPM vs. 3,600 TPM]

15 Load Balancing Clusters
- The cluster rebalances the workload when a node dies
- If different apps are running on each server, they fail over to the least busy server or as directed by predefined failover policies
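A minimal sketch of that placement decision, assuming a simple "least busy unless a policy says otherwise" rule (the function, dictionaries, and node/app names below are hypothetical, not from the deck):

```python
# Illustrative rebalancing sketch: when a node dies, each of its apps fails
# over either to the node named by a predefined failover policy or, absent a
# policy, to the least busy surviving node.
def rebalance(nodes, dead, load, policy=None):
    """nodes: {name: [apps]}, load: {name: current load}, policy: {app: preferred node}."""
    policy = policy or {}
    survivors = [n for n in nodes if n != dead]
    for app in nodes[dead]:
        target = policy.get(app)
        if target not in survivors:
            target = min(survivors, key=lambda n: load[n])  # least busy survivor
        nodes[target].append(app)
        load[target] += 1
    nodes[dead] = []
    return nodes

# Usage: node n1 dies; its database app lands on n2, the least busy survivor.
nodes = {"n1": ["db"], "n2": ["mail"], "n3": ["web"]}
load = {"n1": 5, "n2": 1, "n3": 3}
print(rebalance(nodes, "n1", load))
```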

16 Two Cluster Models
- #1. "Shared Nothing" Model
  - Microsoft's Wolfpack Cluster
- #2. "Shared Disk" Model
  - VAXClusters

17 #1. "Shared Nothing" Model
- At any moment in time, each disk is owned and addressable by only one server
- "Shared nothing" terminology is confusing
  - Access to disks is shared -- on the same bus
  - But at any moment in time, disks are not shared

18 #1. "Shared Nothing" Model
- When a server fails, the disks that it owns "fail over" to the surviving server, transparently to the clients
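A minimal sketch of the ownership rule, assuming a simple disk-to-owner map (the SharedNothingCluster class and the node/disk names are illustrative, not Wolfpack internals):

```python
# Illustrative "shared nothing" sketch: every disk has exactly one owner at a
# time; when a server fails, its disks fail over to the survivor.
class SharedNothingCluster:
    def __init__(self, owners):
        self.owners = dict(owners)          # disk -> owning server

    def access(self, server, disk):
        # Only the current owner may address the disk.
        if self.owners[disk] != server:
            raise PermissionError(f"{server} does not own {disk}")
        return f"{server} reads/writes {disk}"

    def failover(self, failed, survivor):
        # Transfer ownership of every disk the failed server owned.
        for disk, owner in self.owners.items():
            if owner == failed:
                self.owners[disk] = survivor

# Usage: nodeA fails; nodeB takes ownership of disk0 and can now address it.
cluster = SharedNothingCluster({"disk0": "nodeA", "disk1": "nodeB"})
cluster.failover("nodeA", "nodeB")
print(cluster.access("nodeB", "disk0"))
```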

19 #2. "Shared Disk" Model
- Disks are not owned by servers but shared by all servers
- At any moment in time, any server can access any disk
- Distributed Lock Manager arbitrates disk access so apps on different servers don't step on one another (corrupt data)
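A toy sketch of the arbitration idea, using one in-process lock per disk as a stand-in for a real distributed lock manager (this is illustrative only, not VAXcluster code; all names are hypothetical):

```python
# Illustrative "shared disk" sketch: any server may access any disk, but a
# lock manager serializes access so apps on different servers don't corrupt data.
import threading
from contextlib import contextmanager

class DistributedLockManager:
    def __init__(self, disks):
        self._locks = {disk: threading.Lock() for disk in disks}

    @contextmanager
    def exclusive(self, disk):
        lock = self._locks[disk]
        lock.acquire()            # block until no other server holds the disk
        try:
            yield disk
        finally:
            lock.release()

dlm = DistributedLockManager(["disk0", "disk1"])

def update(server, disk):
    with dlm.exclusive(disk):     # any server may ask; access is arbitrated
        print(f"{server} updates {disk}")

# Usage: two "servers" contend for the same disk; updates are serialized.
threads = [threading.Thread(target=update, args=(s, "disk0"))
           for s in ("nodeA", "nodeB")]
for t in threads: t.start()
for t in threads: t.join()
```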

20 Cluster Interconnect
- This is about how servers are tied together and how disks are physically connected to the cluster

21 Cluster Interconnect
- Clustered servers always have a client network interconnect, typically Ethernet, to talk to users
- And at least one cluster interconnect to talk to other nodes and to disks
[Diagram: client network and cluster interconnect; each node has an HBA connected to a RAID array]

22 Cluster Interconnects (cont'd)
- Or They Can Have Two Cluster Interconnects
  - One for nodes to talk to each other -- "Heartbeat Interconnect"
    - Typically Ethernet
  - And one for nodes to talk to disks -- "Shared Disk Interconnect"
    - Typically SCSI or Fibre Channel
[Diagram: NICs on the cluster (heartbeat) interconnect; HBAs on the shared disk interconnect to the RAID array]

23 Microsoft's Wolfpack Clusters

24 Clusters Are Not New
- Clusters Have Been Around Since 1985
  - Most UNIX systems are clustered
- What's New is Microsoft Clusters
  - Code named "Wolfpack"
  - Named Microsoft Cluster Server (MSCS)
  - Software that provides clustering
  - MSCS is part of Windows NT, Enterprise Server

25 Microsoft Cluster Rollout
- Wolfpack-I
  - In Windows NT, Enterprise Server, 4.0 (NT/E 4.0) [also includes Transaction Server and Reliable Message Queue]
  - Two node "failover cluster"
  - Shipped October 1997
- Wolfpack-II
  - In Windows NT, Enterprise Server 5.0 (NT/E 5.0)
  - "N" node (probably up to 16) "load balancing cluster"
  - Beta in 1998 and ship in 1999

26 MSCS (NT/E 4.0) Overview
- Two Node "Failover" Cluster
- "Shared Nothing" Model
  - At any moment in time, each disk is owned and addressable by only one server
- Two Cluster Interconnects
  - "Heartbeat" cluster interconnect: Ethernet
  - Shared disk interconnect: SCSI (any flavor) or Fibre Channel (SCSI protocol over Fibre Channel)
- Each Node Has a "Private System Disk" (boot disk)

27 MSCS (NT/E 4.0) Topologies
- #1. Host-based (PCI) RAID Arrays
- #2. External RAID Arrays

28 NT Cluster with Host-Based RAID Array
- Each node has
  - Ethernet NIC -- Heartbeat
  - Private system disk (generally on an HBA)
  - PCI-based RAID controller -- SCSI or Fibre
- Nodes share access to data disks but do not share data
[Diagram: two nodes with NIC, HBA, and RAID controller; "Heartbeat" interconnect and shared disk interconnect to the RAID array]

29 NT Cluster with SCSI External RAID Array
- Each node has
  - Ethernet NIC -- Heartbeat
  - Multi-channel HBAs connecting the boot disk and the external array
- Shared external RAID controller on the SCSI bus -- DAC SX
[Diagram: two nodes with NIC and HBA; "Heartbeat" interconnect and shared SCSI disk interconnect to the external RAID array]

30 NT Cluster with Fibre External RAID Array
- DAC SF or DAC FL (SCSI to disks)
- DAC FF (Fibre to disks)
[Diagram: two nodes with NIC and HBA; "Heartbeat" interconnect and shared Fibre Channel disk interconnect to the external RAID array]

31 MSCS -- A Few of the Details
[Diagram: MSCS managers]

32 Cluster Interconnect & Heartbeats
- Cluster Interconnect
  - Private Ethernet between nodes
  - Used to transmit "I'm alive" heartbeat messages
- Heartbeat Messages
  - When a node stops getting heartbeats, it assumes the other node has died and initiates failover
  - In some failure modes both nodes stop getting heartbeats (NIC dies or someone trips over the cluster cable)
    - Both nodes are still alive
    - But each thinks the other is dead
    - Split brain syndrome
    - Both nodes initiate failover
    - Who wins?
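A minimal sketch of the heartbeat-timeout idea behind "assumes the other node has died." The interval, missed-heartbeat limit, and class name are hypothetical values for illustration, not MSCS defaults:

```python
# Illustrative heartbeat sketch: each node tracks when it last heard an
# "I'm alive" message from its peer; missing several heartbeats triggers
# failover. If only the cluster link dies, both nodes reach this conclusion
# at once (split brain), which is why a quorum disk is needed (next slide).
import time

HEARTBEAT_INTERVAL = 1.0     # seconds between "I'm alive" messages
MISSED_LIMIT = 3             # heartbeats missed before declaring the peer dead

class HeartbeatMonitor:
    def __init__(self):
        self.last_heard = time.monotonic()

    def on_heartbeat(self):
        self.last_heard = time.monotonic()

    def peer_presumed_dead(self):
        return time.monotonic() - self.last_heard > MISSED_LIMIT * HEARTBEAT_INTERVAL

# Usage: right after a heartbeat the peer is presumed alive.
monitor = HeartbeatMonitor()
monitor.on_heartbeat()
print(monitor.peer_presumed_dead())   # False
```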

33 Quorum Disk
- Special cluster resource that stores the cluster log
- When a node joins a cluster, it attempts to reserve the quorum disk
  - If the quorum disk does not have an owner, the node takes ownership and forms a cluster
  - If the quorum disk has an owner, the node joins the cluster
[Diagram: two nodes with HBAs on the shared disk interconnect and the cluster "Heartbeat" interconnect; the quorum disk sits in the shared RAID array]

34 Quorum Disk
- If Nodes Cannot Communicate (no heartbeats)
  - Then only one is allowed to continue operating
  - They use the quorum disk to decide which one lives
  - Each node waits, then tries to reserve the quorum disk
  - The last owner waits the shortest time and, if it's still alive, will take ownership of the quorum disk
  - When the other node attempts to reserve the quorum disk, it will find that it's already owned
  - The node that doesn't own the quorum disk then fails over
- This is called the Challenge / Defense Protocol
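A toy sketch of the challenge/defense idea: when heartbeats stop, each node waits (the previous quorum owner waits the shortest time) and then tries to reserve the quorum disk; whoever wins keeps running and the loser steps down. The timings and the in-process lock standing in for a SCSI reservation are illustrative assumptions, not MSCS internals:

```python
# Illustrative challenge/defense sketch for quorum arbitration.
import threading, time

quorum = threading.Lock()        # stand-in for the reservation on the quorum disk
survivors = []

def challenge(node, was_owner):
    time.sleep(0.1 if was_owner else 0.5)   # the previous owner defends first
    if quorum.acquire(blocking=False):      # try to reserve the quorum disk
        survivors.append(node)              # this node continues operating
    # else: the reservation is already held, so this node fails over

a = threading.Thread(target=challenge, args=("nodeA", True))
b = threading.Thread(target=challenge, args=("nodeB", False))
a.start(); b.start(); a.join(); b.join()
print(survivors)                            # ['nodeA'] -- only one node survives
```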

35 Microsoft Cluster Server (MSCS)
- MSCS Objects
  - There are lots of MSCS objects, but only two we care about: Resources and Groups
- Resources
  - Applications, data files, disks, IP addresses, ...
- Groups
  - An application and its related resources, such as data on disks

36 Microsoft Cluster Server (MSCS)
- When a server dies, groups fail over
- When a server is repaired and returned to service, groups fail back
- Since data on disks is included in groups, disks fail over and fail back
[Diagram: Mail and Web groups, each bundling its resources, placed on the two nodes]

37 Groups Failover
- Groups are the entities that fail over
- And they take their disks with them
[Diagram: the Mail group and its resources move to the node running the Web group]
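A minimal sketch of groups as the unit of failover: a group bundles an application with its related resources (IP address, disk, data files), and the whole group, disks included, moves between nodes. The class and field names below are hypothetical, not the MSCS API:

```python
# Illustrative sketch of MSCS-style resources and groups.
from dataclasses import dataclass, field

@dataclass
class Group:
    name: str
    resources: list = field(default_factory=list)   # app, IP address, disk, ...

@dataclass
class ClusterNode:
    name: str
    groups: list = field(default_factory=list)

def move_group(group_name, src, dst):
    """Failover or failback: the whole group, with its disks, changes nodes."""
    group = next(g for g in src.groups if g.name == group_name)
    src.groups.remove(group)
    dst.groups.append(group)

# Usage: node1 dies, so Mail (and its disk) fails over; after repair it fails back.
n1 = ClusterNode("node1", [Group("Mail", ["mail app", "IP 10.0.0.5", "disk1"])])
n2 = ClusterNode("node2", [Group("Web",  ["web app",  "IP 10.0.0.6", "disk2"])])
move_group("Mail", n1, n2)        # failover
move_group("Mail", n2, n1)        # failback
```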

38 Microsoft Cluster Certification
- Two Levels of Certification
- Cluster Component Certification
  - HBAs and RAID controllers must be certified
  - When they pass:
    - They're listed on the Microsoft web site www.microsoft.com/hwtest/hcl/
    - They're eligible for inclusion in cluster system certification
- Cluster System Certification
  - Complete two node cluster
  - When they pass:
    - They're listed on the Microsoft web site
    - They'll be supported by Microsoft
- Each Certification Takes 30 - 60 Days

39 Mylex NT Cluster Solutions

40 Target Markets
[Diagram: Mylex product positioning by market segment -- commodity PC market (including mobile), performance desktop PCs, PC-based workstations, entry level servers, and mid range / enterprise servers -- with products including AcceleRAID 150, AcceleRAID 200, AcceleRAID 250, DAC-PG, DAC-PJ, eXtremeRAID-1100, DAC-SX, DAC-SF, DAC-FL, and DAC-FF]

41 Internal vs External RAID Positioning
- Internal RAID
  - Lower cost solution
  - Higher performance in read-intensive applications
  - Proven TPC-C performance enhances cluster performance
- External RAID
  - Higher performance in write-intensive applications
    - Write-back cache is turned off in PCI-RAID controllers
  - Higher connectivity
    - Attach more disk drives
  - Greater footprint flexibility
    - Until PCI-RAID implements fibre

42 Why We're Better -- External RAID
- Robust Active-Active Fibre Implementation
  - Shipping active-active for over a year
  - It works in NT (certified) and Unix environments
  - Will have Fibre on the back-end soon
- Mirrored Cache Architecture
  - Without mirrored cache, data is inaccessible or dropped on the floor when a controller fails
  - Unless you turn off the write-back cache, which degrades write performance by 5x to 30x
- Four to Six Disk Channels
  - I/O bandwidth and capacity scaling
- Dual Fibre Host Ports
  - NT expects to access data over pre-configured paths
  - If it doesn't find the data over the expected path, then I/Os don't complete and applications fail

43 SX Active/Active Duplex
[Diagram: nodes with HBAs, a cluster interconnect, and DAC SX controllers on an Ultra SCSI disk interconnect]

44 SF (or FL) Active/Active Duplex
[Diagram: nodes with HBAs and FC HBAs, DAC SF controllers, and a single FC array interconnect]

45 SF (or FL) Active/Active Duplex
[Diagram: nodes with FC HBAs, DAC SF controllers, a dual FC array interconnect, and an FC disk interconnect]

46 FF Active/Active Duplex
[Diagram: nodes with HBAs and FC HBAs, DAC FF controllers, and a single FC array interconnect]

47 FF Active/Active Duplex
[Diagram: nodes with FC HBAs, DAC FF controllers, and a dual FC array interconnect]

48 Why We'll Be Better -- Internal RAID
- Deliver Auto-Rebuild
- Deliver RAID Expansion
  - MORE-I: Add Logical Units On-line
  - MORE-II: Add or Expand Logical Units On-line
- Deliver RAID Level Migration
  - 0 -> 1, 1 -> 0
  - 0 -> 5, 5 -> 0
  - 1 -> 5, 5 -> 1
- And (of course) Award Winning Performance

49 NT Cluster with Host-Based RAID Array
- Nodes have:
  - Ethernet NIC -- Heartbeat
  - Private system disks (HBA)
  - PCI-based RAID controller
[Diagram: two nodes with NIC, HBA, and eXtremeRAID controllers; "Heartbeat" interconnect and shared disk interconnect]

50 Why eXtremeRAID & DAC960PJ Clusters
- Typically four or fewer processors
- Offers a less expensive, integrated RAID solution
- Can combine clustered and non-clustered applications in the same enclosure
- Uses today's readily available hardware

51 TPC-C Performance for Clusters (DAC960PJ)
- Two external Ultra channels at 40 MB/sec
- Three internal Ultra channels at 40 MB/sec
- 32-bit PCI bus between the controller and the server, providing burst data transfer rates up to 132 MB/sec
- 66 MHz i960 processor off-loads RAID management from the host CPU

52 eXtremeRAID™: Blazing Clusters
- eXtremeRAID™ achieves a breakthrough in RAID technology, eliminates storage bottlenecks, and delivers scalable performance for NT clusters
- 64-bit PCI bus doubles data bandwidth between the controller and the server, providing burst data transfer rates up to 266 MB/sec
- 3 Ultra2 SCSI LVD channels at 80 MB/sec for up to 42 shared storage devices, with connectivity up to 12 meters
- 233 MHz StrongARM RISC processor off-loads RAID management from the host CPU
- Mylex's new firmware is optimized for performance and manageability
- eXtremeRAID™ supports up to 42 drives per cluster, as much as 810 GB of capacity per controller; performance increases as you add drives
[Diagram: board callouts -- LEDs, serial port, CPU, NVRAM, SCSI channels (Ch 0, Ch 1, Ch 2), SCSI-PCI bridge (BASS), DAC memory module with BBU]
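The burst figures quoted here and on the previous slide follow directly from the PCI bus width, assuming the standard 33 MHz PCI clock (the clock rate itself is not stated on the slides):

$$4\ \text{bytes} \times 33\ \text{MHz} \approx 132\ \text{MB/s} \ \ (\text{32-bit PCI}), \qquad 8\ \text{bytes} \times 33\ \text{MHz} \approx 266\ \text{MB/s} \ \ (\text{64-bit PCI})$$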

53 eXtremeRAID™ 1100 NT Clusters
- Nodes have:
  - Ethernet NIC -- Heartbeat
  - Private system disks (HBA)
  - PCI-based RAID controller
- Nodes share access to data disks but do not share data
[Diagram: two nodes with NIC, HBA, and eXtremeRAID controllers; "Heartbeat" interconnect and 3 shared Ultra2 interconnects]

54 Cluster Support Plans
- Internal RAID
  - Windows NT 4.0: 1998
  - Windows NT 5.0: 1999
  - Novell Orion: Q4 '98
  - SCO: TBD
  - Sun: TBD
- External RAID
  - Windows NT 4.0: 1998
  - Windows NT 5.0: 1999
  - Novell Orion: TBD
  - SCO: TBD

55 Plans For NT Cluster Certification
- Microsoft Clustering (submission dates)
  - DAC SX: Completed (Simplex)
  - DAC SF: Completed (Simplex)
  - DAC SX: July (Duplex)
  - DAC SF: July (Duplex)
  - DAC FL: August (Simplex)
  - DAC FL: August (Duplex)
  - DAC960 PJ: Q4 '99
  - eXtremeRAID™ 1164: Q4 '99
  - AcceleRAID™: Q4 '99

56 What RAID Arrays Are Right for Clusters
- eXtremeRAID™ 1100
- AcceleRAID™ 200
- AcceleRAID™ 250
- DAC SF
- DAC FL
- DAC FF

