Clustering Technology In Windows NT Server, Enterprise Edition Jim Gray Microsoft Research Gray@Microsoft.com research.Microsoft.com/~gray.



Today’s Agenda
- Windows NT® clustering
- MSCS (Microsoft Cluster Server)
- Demo
- MSCS background
  - Design goals
  - Terminology
  - Architectural details
- Setting up a MSCS cluster
  - Hardware considerations
- Cluster application issues
- Q&A

Extra Credit
Included in your presentation materials but not covered in this session:
- Reference materials
- SCSI primer
- Speaker's notes included
- Hardware certification

MSCS In Action

High Availability Versus Fault Tolerance
- High availability: mask outages through service restoration
- Fault tolerance: mask local faults
  - RAID disks
  - Uninterruptible power supplies
  - Cluster failover
- Disaster tolerance: mask site failures
  - Protects against fire, flood, sabotage, ...
  - Redundant system and service at remote site

Windows NT Clusters
What is clustering to Microsoft?
- Group of independent systems that appear as a single system
- Managed as a single system
  - Common namespace
  - Services are “cluster-wide”
- Ability to tolerate component failures
- Components can be added transparently to users
- Existing client connectivity is not affected by clustered applications

Microsoft Cluster Server
- 2-node available 97Q3
- Commoditize fault tolerance (high availability)
  - Commodity hardware (no special hardware)
  - Easy to set up and manage
  - Lots of applications work out of the box
- Multi-node scalability in NT5 timeframe

MSCS Initial Goals
- Manageability
  - Manage nodes as a single system
  - Perform server maintenance without affecting users
  - Mask faults, so repair is non-disruptive
- Availability
  - Restart failed applications and servers
  - Unavailability ~ MTTR / MTBF, so repair must be quick
  - Detect failures and warn administrators
- Reliability
  - Accommodate hardware and software failures
  - Redundant system without mandating a dedicated “stand by” solution
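
The relation between unavailability, MTTR, and MTBF can be made concrete with a short calculation. The sketch below is illustrative only: the failure and repair figures are assumed, not taken from the talk, and the function name is hypothetical.

```python
def unavailability(mttr_hours: float, mtbf_hours: float) -> float:
    """Fraction of time the service is down: MTTR / (MTBF + MTTR).

    When MTBF is much larger than MTTR this is approximately MTTR / MTBF,
    the approximation used on the slide.
    """
    return mttr_hours / (mtbf_hours + mttr_hours)


HOURS_PER_YEAR = 24 * 365

# Assumed numbers: one failure every 1000 hours in both cases.
slow_repair = unavailability(mttr_hours=4.0, mtbf_hours=1000.0)   # manual repair
fast_repair = unavailability(mttr_hours=0.05, mtbf_hours=1000.0)  # ~3 minute failover

for label, u in (("manual repair", slow_repair), ("failover", fast_repair)):
    print(f"{label}: availability {1 - u:.5f}, "
          f"about {u * HOURS_PER_YEAR:.1f} hours down per year")
```

With the failure rate held fixed, shortening the repair time is what moves availability, which is the argument for automatic restart and failover.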

MSCS Cluster
[Diagram: client PCs; Server A and Server B, linked by heartbeat and cluster management; disk cabinet A and disk cabinet B]

Failover Example
[Diagram: a browser; Server 1 and Server 2 hosting the Web site and the database; Web site files and database files]

Basic MSCS Terms
- Resource - basic unit of failover
- Group - collection of resources
- Node - Windows NT® Server running cluster software
- Cluster - one or more closely coupled nodes, managed as a single entity

MSCS Namespace: Cluster view
[Diagram: one cluster name, two node names, four virtual server names]

MSCS Namespace: Outside world view
Entity            IP address   Network name
Cluster           1.1.1.1      WHECCLUS
Node 1            1.1.1.2      WHECNode1
Node 2            1.1.1.3      WHECNode2
Virtual server 1  1.1.1.4      WHEC-VS1
Virtual server 2  1.1.1.5      WHEC-VS2
Virtual server 3  1.1.1.6      WHEC-VS3
Services shown: Internet Information Server, SQL, MTS, “Falcon”, Microsoft Exchange
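
One way to picture the outside-world view is as a table from virtual server names to the node currently hosting them. The following is a minimal sketch, not MSCS code: the network names and addresses come from the slide, while the initial owners, the `VirtualServer` class, and `fail_over` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VirtualServer:
    network_name: str
    ip_address: str
    owner: str  # node currently hosting this virtual server


# Names and addresses are from the slide; the initial owners are assumed.
virtual_servers = {
    vs.network_name: vs
    for vs in (
        VirtualServer("WHEC-VS1", "1.1.1.4", owner="WHECNode1"),
        VirtualServer("WHEC-VS2", "1.1.1.5", owner="WHECNode1"),
        VirtualServer("WHEC-VS3", "1.1.1.6", owner="WHECNode2"),
    )
}


def fail_over(failed_node: str, surviving_node: str) -> list[str]:
    """Move every virtual server owned by failed_node to surviving_node."""
    moved = []
    for vs in virtual_servers.values():
        if vs.owner == failed_node:
            vs.owner = surviving_node
            moved.append(vs.network_name)
    return moved


before = {name: vs.ip_address for name, vs in virtual_servers.items()}
moved = fail_over("WHECNode1", "WHECNode2")
after = {name: vs.ip_address for name, vs in virtual_servers.items()}
print("moved:", moved)
print("client-visible addresses unchanged:", before == after)
```

Only the owner changes on failover. The name and address that clients use stay the same, which is why clients address virtual servers rather than nodes.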

Windows NT Clusters: Target applications
- Application and database servers
- E-mail, groupware, and productivity application servers
- Transaction processing servers
- Internet Web servers
- File and print servers

MSCS Design Philosophy
- Shared nothing
- Simplified hardware configuration
- Remoteable tools
- Windows NT manageability enhancements
- Never take a “cluster” down: shell game rolling upgrade
- Microsoft® BackOffice™ product support
- Provide clustering solutions for all levels of customer requirements
- Eliminate cost and complexity barriers

MSCS Design Philosophy
- Availability is core for all releases
  - Single server image for administration and client interaction
  - Failover provided for unmodified server applications and unmodified clients (cluster-aware server applications get richer features)
  - Failover for file and print is the default
- Scalability is the phase 2 focus

Non-Features Of MSCS
- Not lock-step/fault-tolerant
- Not able to “move” running applications
  - MSCS restarts applications that are failed over to other cluster members
- Not able to recover shared state between client and server (e.g., file position)
  - All client/server transactions should be atomic
  - Standard client/server development rules still apply
  - ACID always wins

Setting Up MSCS Applications

Attributes Of Cluster-Aware Applications
- A persistence model that supports orderly state transition
  - Database example: ACID transactions, database log recovery
- Client application support
  - IP clients only
  - How are retries supported?
- No name service location dependencies
- Custom resource DLL is a good thing
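
Because MSCS restarts a failed-over application rather than moving it, a client has to resend any request that was in flight. The sketch below shows one assumed way to do that with whole, idempotent requests. `FakeVirtualServer`, `ServiceUnavailable`, and `submit_with_retry` are hypothetical stand-ins, not part of any MSCS API.

```python
import time


class ServiceUnavailable(Exception):
    """Raised while the virtual server is being restarted on another node."""


class FakeVirtualServer:
    """Stand-in for a service behind a virtual server name (hypothetical)."""

    def __init__(self, failures_before_recovery: int):
        self.remaining_failures = failures_before_recovery
        self.applied = set()  # request ids already committed

    def submit(self, request_id: str, payload: str) -> str:
        # payload is ignored here; a real service would apply it atomically.
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ServiceUnavailable("failover in progress")
        # Idempotent: a retried request id is not applied twice.
        self.applied.add(request_id)
        return f"committed {request_id}"


def submit_with_retry(server, request_id, payload, attempts=5, delay_s=0.01):
    """Resend the whole request until it commits or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return server.submit(request_id, payload)
        except ServiceUnavailable:
            if attempt == attempts:
                raise
            time.sleep(delay_s * attempt)  # simple linear backoff


server = FakeVirtualServer(failures_before_recovery=2)
result = submit_with_retry(server, "txn-001", "debit 10")
print(result)
```

The request id is what makes the retry safe: if the first attempt had committed just before the failure, resending it would not apply the change twice.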

MSCS Services For Application Support
- Name service mapper
  - GetComputerName resolves to the virtual server name
- Registry replication
  - A key and its underlying keys and values are replicated to the other node
  - Atomic
  - Logged to ensure partitions in time are handled

Application Deployment Planning
- System configuration is crucial
  - Adequate hardware configuration: you can’t run Microsoft BackOffice on a 32-MB, 75-MHz Pentium
  - Planning of preferred group owners
- Good understanding of single-server performance is critical
  - See the Windows NT Resource Kit performance planning section
  - Understand working set size
  - What is acceptable performance to the business units?

Evolution Of Cluster-Aware Applications
- Active/passive - general out-of-the-box applications
- Active/active - applications that can run simultaneously on multiple nodes
- Highly scalable - extending active/active through I/O shipping, process groups, and other techniques

Application Evolution
[Table: Application versus Node 1 and Node 2. Rows: Microsoft SQL Server; Microsoft Transaction Server (MTS); Internet Information Server (IIS); Microsoft Exchange Server]

Evolution Of Cluster-Aware Applications
[Table: Application versus Node 1, Node 2, Node 3, and Node 4. Rows: Microsoft SQL Server; Microsoft Transaction Server (MTS); Internet Information Server (IIS); Microsoft Exchange Server]

Resources: What are they?
- Resources are basic system components such as physical disks, processes, databases, and IP addresses that provide a service to clients in a client/server environment
- They are online in only one place in the cluster at a time
- They can fail over from one system in the cluster to another

Resources
- MSCS includes resource DLL support for:
  - Physical and logical disk
  - IP address and network name
  - Generic service or application
  - File share
  - Print queue
  - Internet Information Server virtual roots
  - Distributed Transaction Coordinator (DTC)
  - Microsoft Message Queue (MSMQ)
- Supports resource dependencies
- Controlled via a well-defined interface
- Group: offers a “virtual server”

Cluster Service To Resource
[Diagram: the Windows NT cluster service initiates changes through the resource monitor and receives resource events back; the resource monitor loads the physical disk, IP address, generic app, and database resource DLLs, which control the disk, network, app, and database]

Cluster Abstractions
- Resource: program or device managed by a cluster
  - e.g., file service, print service, database server
  - can depend on other resources (startup ordering)
  - can be online, offline, paused, failed
- Resource Group: a collection of related resources
  - hosts resources; belongs to a cluster
  - unit of co-location; involved in naming resources
- Cluster: a collection of nodes, resources, and groups
  - cooperation for authentication, administration, naming

Resources
Resources have...
- Type: what it does (file, DB, print, Web…)
- An operational state (online/offline/failed)
- Current and possible nodes
- Containing Resource Group
- Dependencies on other resources
- Restart parameters (in case of resource failure)
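
The restart parameters and the list of possible nodes together decide what happens when a resource fails. The slide does not spell out the policy, so the following is only an assumed sketch: restart in place up to a threshold, then fail over to another possible node. The function and parameter names are hypothetical.

```python
def handle_failure(failures_so_far: int, restart_threshold: int,
                   current_node: str, possible_nodes: list[str]) -> tuple[str, str]:
    """Decide what to do after a resource fails.

    Returns (action, node): restart on the current node while under the
    threshold, otherwise fail over to another possible node, or give up.
    """
    if failures_so_far < restart_threshold:
        return ("restart", current_node)
    others = [n for n in possible_nodes if n != current_node]
    if others:
        return ("failover", others[0])
    return ("failed", current_node)


nodes = ["Node1", "Node2"]
decisions = [handle_failure(n, restart_threshold=3, current_node="Node1",
                            possible_nodes=nodes) for n in range(5)]
for count, decision in enumerate(decisions):
    print(count, decision)
```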

Resource
- Fails over (moves) from one machine to another
  - Logical disk
  - IP address
  - Server application
  - Database
- May depend on another resource
- Well-defined properties controlling its behavior

Resource Dependencies
- A resource may depend on other resources
- A resource is brought online after any resources it depends on
- A resource is taken offline before any resources it depends on
- All dependent resources must fail over together
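
The online and offline rules amount to a topological ordering of the dependency graph and its reverse. Here is a minimal sketch using Python's standard `graphlib` module. The resource names echo the dependency example in this deck, but the edges between them are assumed for illustration.

```python
from graphlib import TopologicalSorter

# resource -> resources it depends on. The edges are assumed, not taken
# from the slides.
depends_on = {
    "Database": {"Drive E:", "Drive F:", "IP address"},
    "Generic application": {"Database"},
    "Drive E:": set(),
    "Drive F:": set(),
    "IP address": set(),
}

# Dependencies come first when bringing resources online...
online_order = list(TopologicalSorter(depends_on).static_order())
# ...and last when taking them offline.
offline_order = list(reversed(online_order))

print("online: ", online_order)
print("offline:", offline_order)
```

Resources with no dependency between them (the two drives and the IP address here) may come up in any relative order. Only the constraints along the edges are guaranteed.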

Dependency Example
[Diagram: database resource DLL, generic application, IP address resource DLL, Drive E: resource DLL, Drive F: resource DLL]

Group Example
[Diagram: the Payroll group, containing the database resource DLL, generic application, IP address resource DLL, Drive E: resource DLL, and Drive F: resource DLL]

MSCS Architecture
[Diagram components]
- Management tools: Cluster administrator, Cluster.Exe, Cluster API DLL
- Cluster service (behind the Cluster API stub): Log Manager, Global Update Manager, Database Manager, Membership Manager, Event Processor, Checkpoint Manager, Object Manager, Node Manager, Failover Manager, Resource Manager
- Resource monitors (Resource API): physical resource DLL, logical resource DLL, application resource DLLs
- Reliable cluster transport + heartbeat
- Network

MSCS Architecture
The cluster service comprises the following objects:
- Failover Manager (FM)
- Resource Manager (RM)
- Node Manager (NM)
- Membership Manager (MM)
- Event Processor (EP)
- Database Manager (DM)
- Object Manager (OM)
- Global Update Manager (GUM)
- Log Manager (LM)
- Checkpoint Manager (CM)
More about these in the next session

Setting Up An MSCS Cluster

MSCS Key Components
- Two servers
  - Multi versus uniprocessor
  - Heterogeneous servers
- Shared SCSI bus
  - SCSI HBAs, SCSI RAID HBAs, HW RAID boxes
- Interconnect
  - Many types can be supported
  - Remember, two NICs per node
  - PCI for cluster interconnect
- Complete MSCS HCL configuration

MSCS Setup
Most common problems:
- Duplicate SCSI IDs on adapters
- Incorrect SCSI cabling
- SCSI card order on PCI bus
- Configuration of SCSI firmware
Let’s walk through getting a cluster operational

Test Before You Build
Bring each system up independently:
- Network adapters
  - Cluster interconnect
  - Organization interconnect
- SCSI and disk function
- NTFS volume(s)

Top Ten Setup “Concerns”
10. SCSI is not well known. Please use the MSCS and IHV setup documentation. Consider the SCSI book reference for this session.
9. Build a support model that will support clustering requirements. For example, in clustering, components are paired exactly (e.g., SCSI BIOS revision levels). Include this in your plans.
8. Build extra time into your deployment planning to accommodate cluster setup, both for hardware and software. Hardware examples include SCSI setup. Software issues include installation across cluster nodes.
7. Know the certification process and its support implications.

Top Ten Setup “Concerns”
6. Applications will become more cluster-aware through time. This will include better setup, diagnostics, and documentation. In the meantime, plan and test accordingly.
5. Clustering will impact your server maintenance and upgrade methodologies. Plan accordingly.
4. Use multiple network adapters and hubs to eliminate single points of failure (everywhere possible).
3. Today’s clustering solutions are more complex to install and configure than single servers. Plan your deployments accordingly.
2. Make sure that your cabinet solutions and peripherals both fit and function well. Consider the serviceability implications.
1. Cabling is a nightmare. Color-coded, heavily documented, Y-cable-inclusive, maintenance-designed products are highly desirable.

Cluster Management Tools
- Cluster administrator
  - Monitor and manage cluster
- Cluster CLI/COM
  - Command line and COM interface
- Minor modifications to existing tools
  - Performance monitor: add ability to watch entire cluster
  - Disk administrator: add understanding of shared disks
  - Event logger: broadcast events to all nodes

MSCS Reference Materials
- In Search of Clusters: The Coming Battle in Lowly Parallel Computing, Gregory F. Pfister, ISBN 0-13-437625-0
- The Book of SCSI, Peter M. Ridge, ISBN 1-886411-02-6

The Basics Of SCSI
- Why SCSI?
- Types of interfaces?
- Caching and performance…
- RAID
- The future…

Why SCSI?
- Faster than IDE - intelligent card/drive
  - Uses less processor time
  - Can transfer data at up to 100 MB/sec
- More devices on a single chain - up to 15
- Wider variety of devices
  - DASD
  - Scanners
  - CD-ROM writers and optical drives
  - Tape drives

Types Of Interfaces
- SCSI and SCSI II
  - 50-pin, 8-bit, max transfer = 10 MB/s (early: 1.5 to 5 MB/s)
  - Internal transfer rate = 4 to 8 MB/s
- Wide SCSI
  - 68-pin, 16-bit, max transfer = 20 MB/s
  - Internal transfer rate = 7 to 15.5 MB/s
- Ultra SCSI
  - 50-pin, 8-bit, higher transfer rate, max transfer = 20 MB/s
- Ultra wide
  - 68-pin, 16-bit, max transfer = 40 MB/s
  - Internal transfer rate = 7 to 30 MB/s
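
The maximum transfer figures follow from bus width times transfer rate. The short check below uses the standard clock rates (10 million transfers per second for Fast SCSI-2, 20 million for Ultra). Those rates are not stated on the slide and are added here as an assumption, as is the function name.

```python
def peak_mb_per_s(bus_width_bits: int, mega_transfers_per_s: int) -> int:
    """Peak bus rate: one bus-width word moves per transfer."""
    return (bus_width_bits // 8) * mega_transfers_per_s


# (bus width in bits, million transfers per second); rates are assumed.
interfaces = {
    "SCSI II (Fast)": (8, 10),
    "Wide SCSI": (16, 10),
    "Ultra SCSI": (8, 20),
    "Ultra wide": (16, 20),
}

rates = {name: peak_mb_per_s(*spec) for name, spec in interfaces.items()}
for name, rate in rates.items():
    print(f"{name}: {rate} MB/s")
```

These are bus peaks. The lower "internal transfer rate" figures on the slide are the rates the drives themselves sustain, so a single drive does not saturate the bus.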

Performance Factors
- Cache on the drive or controller
- Caching in the OS
- Different variables
  - Seek time
  - Transfer rates

Redundant Array Of Inexpensive Disks (RAID)
- Developed from a paper published in 1987 at the University of California, Berkeley
- The idea is to combine multiple inexpensive drives (eliminate SLED - single large expensive drive)
- Provides redundancy by storing parity information
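
Parity-based redundancy can be shown in a few lines: the parity block is the bytewise XOR of the data blocks, so any single missing block is the XOR of everything that survives. This is a toy sketch with made-up block contents, not a model of any particular RAID level's layout.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"payroll!", b"database", b"web site"]  # blocks on three data drives
parity = xor_blocks(data)                        # stored on a parity drive

# The drive holding data[1] fails: rebuild its block from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```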

The Future For SCSI
- Faster interfaces - why?
- Fibre Channel
  - Optical standard
  - Proposed as part of SCSI III (not final)
  - Up to 100 MB/s transfer
  - Still using ultra-wide SCSI inside enclosures
  - Drives with optical interfaces not yet available in quantity, and higher cost than SCSI

The Future Of SCSI
- Fibre Channel arbitrated loop
  - Ring instead of bus architecture
  - Can support up to 126 devices/hosts
  - Hot pluggable through the use of a port bypass circuit
  - No disruption of the loop as devices are added/removed
  - Generally implemented using a backplane design

HCL List For MSCS
- Servers on normal Windows NT HCL
  - Self-test of MP machines soon
- MSCS SCSI component HCL
  - Tested by WHQL
  - Must pass Windows NT HCT as well
- MSCS interconnect HCL
  - Not required to pass 100% of HCT
  - i.e., point-to-point adapters

MSCS System Certification Process Windows NT 4.0+ Server HCL Complete MSCS configuration ready for self-test Windows NT 4.0+ SCSI HCL Windows NT 4.0+ MSCS SCSI HCL Windows NT 4.0+ Network HCL

Testing Phases
- HW compatibility (24 hours)
- One-node testing (24 hours)
  - SCSI and interconnect testing
- One-node testing (24 hours)
  - Eight clients
- Two-node with failover (72 hours)
  - Eight-client with asynchronous failovers
- Stress testing (24 hours)
  - Dual initiator I/O, split-brain problems
  - Simultaneous reboots

Final MSCS HCL
- Only complete configurations are supported
- Self-test results sent to Microsoft
- Logs checked and configuration reviewed
- HCL updated on the Web and for the next major Windows NT release
- For more details see the MSCS Certification document