1
Module 2: Concepts of Server Clusters
2
Introduction to Server Clusters: Clustering Techniques. Availability and Scalability. Introduction to Microsoft Windows 2000 Cluster Service.
Key Concepts of a Server Cluster: Cluster Disks. Quorum Resource. Cluster Communications. Groups and Resources. Resource Dependency Trees. Virtual Servers. Virtual Server Name Resolution. Failover and Failback.
3
Cluster Concepts
Choosing a Server Cluster Configuration: Active/Passive Configuration. Active/Active Configuration. Hybrid Configuration. Single Node Virtual Server.
Applications and Services on Server Clusters: Applications. Services. File and Print Shares. Identifying Performance Limitations.
4
Overview Introduction to Server Clusters
Key Concepts of a Server Cluster Choosing a Server Cluster Configuration Applications and Services on Server Clusters
5
This module provides an explanation of server cluster terms and key concepts. Topics include considerations for choosing cluster configuration options and determining which applications and services will be included in the server cluster. Information that is unique to the installation of Microsoft® Cluster service is covered, such as naming and addressing conventions and how resources and groups function within a server cluster.
6
After completing this module, you will be able to:
Explain the features of clustering technologies. Define the key terms and concepts of a server cluster. Choose a server cluster configuration. Describe how Cluster service supports applications and services.
7
Introduction to Server Clusters
8
A server cluster is a group of computers and storage devices that work together and can be accessed by clients as a single system. There are two types of network communications in a server cluster. The nodes communicate with each other over a high-performance, reliable network, and share one or more common storage devices. Clients communicate with logical servers, referred to as virtual servers, to gain access to grouped resources, such as file or print shares, services such as Windows Internet Name Service (WINS), and applications like Microsoft Exchange Server.
9
When a client connects to the virtual server, the server routes the request to the node controlling the requested resource, service, or application. If the controlling node fails, any clustered services or applications running on the failed node will restart on a surviving designated node. There are three types of clustering techniques commonly used: shared everything, mirrored servers, and shared nothing. Microsoft Cluster Service uses the shared nothing model.
10
You can configure server clusters to address both availability and scalability issues. The failover capability of Microsoft Cluster Service makes resources more available than in a non-clustered environment. It is also an economical way to scale up when you need greater performance.
11
Clustering Techniques
Shared Everything Model Mirrored Servers Shared Nothing Model
12
There are a variety of cluster implementation models that are used widely in the computer industry. Common models are shared everything, mirrored servers, and shared nothing. It is possible for a cluster to support both the shared everything model and the shared nothing model. Typically, applications that require only limited shared access to data work best in the shared everything model. Applications that require maximum scalability will benefit from the shared nothing cluster model.
13
Shared Everything Model
In the shared everything, or shared device model, software running on any computer in the cluster can gain access to any hardware resource connected to any computer in the cluster (for example, a hard drive, random access memory (RAM), and CPU). The shared everything server clusters permit every server to access every disk. Allowing access to all of the disks originally required expensive cabling and switches, plus specialized software and applications. If two applications require access to the same data, much like a symmetric multiprocessor (SMP) computer, the cluster must synchronize access to the data. In most shared device cluster implementations, a component called a Distributed Lock Manager (DLM) is used to handle this synchronization.
14
The Distributed Lock Manager (DLM)
The Distributed Lock Manager (DLM) is a service running on the cluster that keeps track of resources within the cluster. If multiple systems or applications attempt to reference a single resource, the DLM recognizes and resolves the conflict. However, using a DLM introduces a certain amount of overhead into the system in the form of additional message traffic between nodes of the cluster, in addition to the performance loss due to serialized access to hardware resources. Shared everything clustering also has inherent limits on scalability, because DLM contention grows geometrically as you add servers to the cluster.
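To make the serialization overhead concrete, here is a minimal Python sketch of the idea behind a lock manager: a request for a resource that is already held is queued, so access is serialized. It is a conceptual model only, with hypothetical node and resource names, and it does not represent the DLM of any particular cluster product.

    class DistributedLockManager:
        """Conceptual sketch only: a real DLM coordinates lock state across nodes
        with message passing; this models a single shared lock table."""
        def __init__(self):
            self.holders = {}   # resource -> node currently holding the lock
            self.waiters = {}   # resource -> nodes queued for the lock

        def request(self, node, resource):
            if resource not in self.holders:
                self.holders[resource] = node              # grant immediately
                return "granted"
            self.waiters.setdefault(resource, []).append(node)
            return "queued"                                # serialized access is the overhead

        def release(self, node, resource):
            if self.holders.get(resource) != node:
                return
            queue = self.waiters.get(resource, [])
            if queue:
                self.holders[resource] = queue.pop(0)      # next waiter gets the lock
            else:
                del self.holders[resource]

    dlm = DistributedLockManager()
    print(dlm.request("NodeA", "disk1"))   # granted
    print(dlm.request("NodeB", "disk1"))   # queued until NodeA releases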
15
Mirrored Servers An alternative to the shared everything and shared nothing models is to run software that copies the operating system and the data to a backup server. This technique mirrors every change from one server to a copy of the data on at least one other server. This technique is commonly used when the locations of the servers are too far apart for the other cluster solutions. The data is kept on a backup server at a disaster recovery site and is synchronized with a primary server. However, a mirrored server solution cannot deliver the scalability benefits of clusters. Mirrored servers may never deliver as high a level of availability and manageability as shared-disk clustering, because there is always a finite amount of time during the mirroring operation in which the data at both servers is not identical.
16
Shared Nothing Model The shared nothing model, also known as the partitioned data model, is designed to avoid the overhead of the DLM in the shared everything model. In this model, each node of the cluster owns a subset of the hardware resources that make up the cluster. As a result, only one node can own and access a hardware resource at a time. A shared-nothing cluster has software that can transfer ownership to another node in the event of a failure. The other node takes ownership of the hardware resource so that the cluster can still access it. The shared nothing model is asymmetric: the cluster workload is broken down into functionally separate units of work that different systems perform independently. For example, Microsoft SQL Server™ may run on one node at the same time as Exchange is running on the other.
17
Shared Nothing Model (continued)
In this model, requests from client applications are automatically routed to the system that owns the resource. This routing extends to server applications that are running on a cluster. For example, if a cluster application such as Internet Information Services (IIS) needs to access a SQL Server database on another node, the node it is running on passes the request for the data to the other node. Remote procedure call (RPC) provides the connectivity between processes that are running on different nodes.
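The routing behavior can be sketched as a lookup in an ownership table: a node serves requests for resources it owns and forwards the rest to the owning node. The node names and table contents below are hypothetical, and a plain return value stands in for the RPC hand-off described above.

    # Hypothetical ownership table: resource -> node that currently owns it.
    OWNERS = {"sql_database": "NodeB", "web_site": "NodeA"}

    def handle_request(local_node, resource, operation):
        owner = OWNERS[resource]
        if owner == local_node:
            return f"{local_node}: performing {operation} on {resource} locally"
        # Cluster service uses RPC for this hand-off; a return value stands in here.
        return f"{local_node}: forwarding {operation} on {resource} to {owner}"

    print(handle_request("NodeA", "sql_database", "read"))   # forwarded to NodeB
    print(handle_request("NodeA", "web_site", "read"))       # served locally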
18
Shared Nothing Model (continued)
A shared nothing cluster provides the same high level of availability as a shared everything cluster and potentially higher scalability, because it does not have the inherent bottleneck of a DLM. An added advantage is that it works with standard applications because there are no special disk access requirements. Examples of shared nothing clustering solutions include Tandem NonStop, Informix Online/XPS, and Microsoft Windows 2000 Cluster service. Note: Cluster service uses the shared nothing model. By default, Cluster service does not allow simultaneous access from both nodes to the shared disks or any resource. Cluster service can support the shared device model as long as the application supplies a DLM.
19
Availability and Scalability
Availability: Cluster Service Improves Availability of Applications and Services. Scalability: Cluster Service Improves Scalability by Adding More Computers to the Cluster.
20
Microsoft Cluster service makes resources, such as services and applications, more available by providing for restart and failover of the resource. Another benefit of Cluster service is that it provides greater scalability of the resource because you can separate applications and services to run on different servers.
21
Availability When a system or component in the cluster fails, the cluster software responds by dispersing the work from the failed system to the remaining systems in the cluster. Cluster service improves the availability of client/server applications by increasing the availability of server resources. Using Cluster service, you can set up applications on multiple nodes in a cluster. If one node fails, the applications on the failed node are available on the other node. Throughout this process, client communications with applications usually continue with little or no interruption. In most cases, the interruption in service is detected in seconds, and services can be available again in less than a minute (depending on how long it takes to restart the application).
22
Availability (continued)
Clustering provides high availability with static load balancing, but it is not a fault tolerant solution. Fault tolerant solutions offer error-free, nonstop availability, usually by keeping a backup of the primary system. This backup system remains idle and unused until a failure occurs, which makes this an expensive solution.
23
Scalability When the overall load exceeds the capabilities of the systems in the cluster, instead of replacing an existing computer with a new one with greater capacity, you can add additional hardware components to increase the node’s performance, while maintaining availability of applications that are running on the cluster. Using Microsoft clustering technology, it is possible to incrementally add smaller, standard systems to the cluster as needed to meet overall processing power requirements.
24
Scalability (continued)
Clusters are highly scalable; you can add CPU, input/output (I/O), storage, and application resources incrementally to efficiently expand capacity. A highly scalable solution creates reliable access to system resources and data, and protects your investment in both hardware and software resources. Server clusters are affordable because they can be built with commodity hardware (high-volume components that are relatively inexpensive).
25
Multimedia: Microsoft Windows 2000 Cluster Service
26
Key Concepts of a Server Cluster
[Diagram: a two-node server cluster (Node A and Node B) attached to shared storage containing the quorum and Disk 1, with a private network between the nodes, a public network to the client, and a group of resources exposed as a virtual server with file and print shares]
27
Server cluster architecture consists of physical cluster components and logical cluster resources. Microsoft Cluster service is the software that manages all of the cluster-specific activity. Physical components provide data storage and processing for the logical cluster resources. Physical components are nodes, cluster disks, and communication networks. Logical cluster resources are groups of resources, such as Internet Protocol (IP) addresses and virtual server names, and services such as WINS. Clients interact with the logical cluster resources.
28
Nodes Nodes are the units of management for the server cluster. They are also referred to as systems, and the terms are used interchangeably. A node can be online or offline, depending on whether it is currently in communication with the other cluster nodes. Note: Windows 2000 Advanced Server supports two-node server clusters. Windows 2000 Datacenter Server supports four-node server clusters.
29
Cluster Disks Cluster disks are shared hard drives to which both server cluster nodes attach by means of a shared bus. You store data for file and print shares, applications, resources, and services on the shared disks.
30
Quorum Resource The quorum resource plays a vital role in allowing a node to form a cluster and in maintaining consistency of the cluster configuration for all nodes. The quorum resource holds the cluster management data and recovery log, and arbitrates between nodes to determine which node controls the cluster. The quorum resource resides on a shared disk. It is best to use a dedicated cluster disk for the quorum resource, so that it will not be affected by the failover policies of other resources, or by the space that other applications require. It is recommended that the quorum be on a disk partition of at least 500 MB.
31
Cluster Communications
A server cluster communicates on a public, private, or mixed network. The public network is used for client access to the cluster. The private network is used for intracluster communications, also referred to as node-to-node communications. The mixed network can be used for either type of cluster communications. One of the types of communications on the private network monitors the health of each node in the cluster. Each node periodically exchanges IP packets with the other node in the cluster to determine if both nodes are operational. This process is referred to as sending heartbeats.
32
Resources Resources are the basic unit that Cluster service manages.
Examples of resources are physical hardware devices, such as disk drives, or logical items, such as IP addresses, network names, applications, and services. A cluster resource can run only on a single node at any time, and is identified as online when it is available for a client to use.
33
Groups A group is a collection of resources that Cluster service manages as a single unit for configuration purposes. Operations that are performed on groups, such as taking a group offline or moving it to another node, affect all of the resources that are contained within that group. Ideally, a group will contain all of the elements that are needed to run a specific application, and for client systems to connect to the application.
34
Virtual Servers Virtual servers have server names that appear as physical servers to clients. Cluster service uses a physical server to host one or more virtual servers. Each virtual server has an IP address and a network name that are published to clients on the network. Users access applications or services on virtual servers in the same way that they would if the application or service were on a physical server.
35
Failover and Failback Failover is the process of moving a group of resources from one node to another in case of a failure of a node, or one of the resources in the group. Failback is the process of returning a group of resources to the node on which it was running before the failover occurred.
36
Cluster Disks
[Diagram: Node A and Node B attached to shared storage containing the quorum disk and Disks 1 through 4]
37
Each node must have a connection to a shared storage area where shared cluster data, such as configuration data, is stored. This shared storage area is referred to as the cluster disk. The cluster can gain access to a cluster disk through a Small Computer System Interface (SCSI) bus or a Fibre Channel bus. In addition, services and applications that the cluster provides should keep shared data, such as Web pages, on the cluster disk on the shared bus. Cluster service is based on the shared nothing model of clustering. The shared nothing model allows the Windows 2000 cluster file system model to support the native NTFS file system, rather than requiring a dedicated cluster file system. Note: The cluster disks must be formatted with NTFS and configured as basic disks.
38
A single cluster member controls each file system partition at any instant in time. However, because a node places a SCSI reserve on a cluster disk rather than a partition, the same node must own all of the partitions on the same physical disk at any given time. Each node can reserve a separate disk on the same shared bus, so you can divide the cluster disks on the bus between the nodes in the cluster. For high-end configurations, you can achieve additional I/O scaling through distributed striping technology such as RAID 5. Using distributed striping technology means that a file system partition presented to a single node can actually be a stripe set that spans multiple physical disks. Such striping must be hardware RAID. Cluster service does not support any software fault tolerant RAID arrays.
39
Quorum Resource Data Storage Arbitration Quorum Ownership
Updates for Nodes Coming Online
40
Each cluster has a special resource known as the quorum resource. You specify an initial location for the quorum resource when you install the first node of a cluster. You can use the cluster administration tools to change the quorum location to a different storage resource. The quorum resource contains cluster configuration files and provides two vital functions: data storage and arbitration. Only one node at a time controls the quorum. Upon startup of the cluster, Cluster service uses the quorum resource recovery logs for node updates.
41
For example: If Node B is offline and Node A makes a change to the cluster, the change is saved in the registry of Node A and also to the cluster configuration files on the quorum. If Node A goes offline and Node B starts, Node B will be updated from the cluster configuration files on the quorum.
42
Data Storage The quorum resource is vital to the successful operation of a cluster because it stores cluster management data, such as the configuration database and recovery logs for changes that are made to cluster data. It must be available when you form the cluster, and whenever you change the configuration database. All of the nodes of the cluster have access to the quorum resource by means of the owning node. Note: To ensure the availability of the cluster, it is recommended that the quorum be on a Redundant Array of Independent Disks (RAID) 5 array.
43
Arbitration The Cluster service uses the quorum resource to decide which node owns the cluster. Arbitration refers to the decision-making function of the quorum resource if both cluster nodes independently try to take control of the cluster. Consider the following situation in a two-node cluster. The networks that are providing communication between Nodes A and B fail. Each node assumes that the other node has failed, and attempts to operate the cluster as the remaining node. Arbitration determines which node owns the quorum. The node that does not own the quorum must take its resources offline. The node that controls the quorum resource then brings all of the cluster resources online.
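A rough Python sketch of that arbitration decision, from the point of view of one node, follows. The won_quorum_reservation parameter is a stand-in for whatever mechanism wins control of the quorum disk (such as a SCSI reserve); the sketch only illustrates the outcome described above, not the actual Cluster service algorithm.

    def arbitrate(reached_other_node, won_quorum_reservation):
        """Decision one node makes when it can no longer see its partner.
        won_quorum_reservation stands in for winning control of the quorum disk."""
        if reached_other_node:
            return "communication intact: continue normally"
        if won_quorum_reservation:
            return "own the quorum: bring all cluster resources online"
        return "lost arbitration: take local resources offline"

    print(arbitrate(reached_other_node=False, won_quorum_reservation=True))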
44
Quorum Ownership Only one node can control the quorum. When a node restarts, Cluster service determines whether the owner of the quorum is online. If there is no owner of the quorum, Cluster service assigns ownership to the starting node. If Cluster service finds that another node is online and owns the quorum resource, it will join the starting node to the cluster, and will not assign the ownership of the quorum to this node.
45
Updates for Nodes Coming Online
When a node comes online and joins the cluster, Cluster service uses the recovery log and the cluster configuration files on the quorum resource to update that node with any configuration changes that were made while it was offline.
46
Caution: Do not modify the access permissions on the disk that contains the quorum resource. Cluster service must have full access to the quorum log. Cluster service uses the quorum log file to record all of the cluster state and configuration changes that cannot be applied to the other node while it is offline. For this reason, you should never restrict either node's access to the \MSCS folder on the quorum disk, which contains the quorum log.
47
Cluster Communications
Private Network Public Network Mixed Network
48
It is strongly recommended that a cluster have more than one network connection. A single network connection threatens the cluster with a single point of failure. There are three options for network configuration: private, public, and mixed. Each network configuration requires its own dedicated network card.
49
Private Network Cluster nodes need to be consistently in communication over a network to ensure that both nodes are online. Cluster service can utilize a private network that is separate from client communications. Once a connection is configured as a private network, it can be used only for internal cluster communication; this connection is also known as the interconnect. The private network will be the default route for node-to-node communication. The cluster cannot use a private network for client-to-node communication.
50
Private Network (continued)
Heartbeats Each node in a cluster periodically exchanges sequenced User Datagram Protocol (UDP) datagrams with the other node in the cluster to determine if it is up and running correctly, and to monitor the health of a network link. This process is referred to as sending heartbeats.
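As an illustration, the following Python sketch shows one way heartbeats could be exchanged over UDP and how a run of missed heartbeats leads a node to presume its partner has failed. The port number, peer address, interval, and miss threshold are arbitrary values chosen for the example, not Cluster service's actual settings; each function would run on its own node.

    import socket
    import time

    HEARTBEAT_PORT = 3343        # assumed port for this sketch; use any free UDP port
    PEER_ADDRESS = "10.10.10.2"  # hypothetical private-network address of the other node
    INTERVAL = 1.2               # seconds between heartbeats (illustrative value)
    MISS_LIMIT = 5               # consecutive misses before the peer is presumed down

    def send_heartbeats():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sequence = 0
        while True:
            sock.sendto(str(sequence).encode(), (PEER_ADDRESS, HEARTBEAT_PORT))
            sequence += 1
            time.sleep(INTERVAL)

    def monitor_heartbeats():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", HEARTBEAT_PORT))
        sock.settimeout(INTERVAL)
        missed = 0
        while missed < MISS_LIMIT:
            try:
                sock.recvfrom(64)
                missed = 0             # heartbeat arrived; peer is healthy
            except socket.timeout:
                missed += 1            # another interval passed with no heartbeat
        print("Peer presumed failed: begin failover of its groups")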
51
Public Network The public network connection is used as a dedicated client-to-node communication network. The cluster cannot use the public network for node-to-node communication.
52
Mixed Network Another configuration option is to create a network that is used for both private and public communication. This is called a mixed network. Using a mixed network does not change the recommendation for two networks. Important: The recommended configuration for server clusters is a dedicated private network for node-to-node communication and a mixed network. The mixed network acts as a backup connection for node-to-node communication should the private network fail. This configuration avoids having any single point of network failure.
53
Groups and Resources
[Diagram: server cluster \\Cluster1 (10.0.0.3) with Node A and Node B; the cluster contains Groups 1 through 3, Disks 1 through 3, virtual servers \\Server1 and \\Server2, a file share, a printer share, and an application]
54
A Microsoft clustered solution can contain many resources. For administrative purposes, you can logically assign resources to groups. Some examples of resources are applications, services, disks, file shares, print shares, Transmission Control Protocol/Internet Protocol (TCP/IP) addresses, and network names. You may create multiple groups within the cluster so that you can distribute resources among nodes in the cluster. The ability to distribute groups independently allows more than one cluster node to handle the workload.
55
Groups A group can contain many resources, but each resource, including a physical disk, can belong to only one group. Any node in the cluster can own and manage groups of resources. A group can be online on only one node at any time. All resources in a group will therefore move between nodes as a unit. Groups are the basic units of failover and failback. The node that is hosting a group must have sufficient capacity to run all of the resources in that group.
56
Groups (continued) If you wish to set up several server applications, for example SQL Server, Exchange, and IIS, to run on the same cluster, you should consider having one group for each application, complete with its own virtual server. Otherwise, if all of the applications are in the same group, they have to run on the same node at the same time, so no load distribution across the cluster is possible. In the event of a failure within a group, the cluster software transfers the entire group of resources to a remaining node in the cluster. The network name, address, and other resources for the moved group remain within the group after the transfer. Therefore, clients on the network may still access the same resources by the same network name and IP address.
57
Resources A resource represents certain functionality that is offered on the cluster. It may be physical, for example a hard disk, or logical, for example an IP address. Resources are the basic management and failure units of Cluster service. Resources may, under control of Cluster service, migrate to another node as part of a group failover. If Cluster service detects that a single resource has failed on a node, it may then move the whole group to the other node. Cluster service uses resource monitors to track the status of the resources. Cluster service will attempt to restart or migrate resources when they fail or when one of the resources that they depend on fails.
58
Resource States Cluster service uses five resource states to manage the health of the cluster resources. The resource states are as follows: Offline – A resource is unavailable for use by a client or another resource. Online – A resource is available for use by a client or another resource. Online Pending – The resource is in the process of being brought online. Offline Pending – The resource is in the process of being brought offline. Failed – The service has tried to bring the resource online but it will not start.
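These states behave like a small state machine. The sketch below models the five states and a simplified set of legal transitions in Python; the transition table is an illustrative assumption rather than an exhaustive list of what Cluster service allows.

    from enum import Enum

    class ResourceState(Enum):
        OFFLINE = "Offline"
        ONLINE = "Online"
        ONLINE_PENDING = "Online Pending"
        OFFLINE_PENDING = "Offline Pending"
        FAILED = "Failed"

    # Simplified transition table (an assumption for illustration).
    TRANSITIONS = {
        ResourceState.OFFLINE:         {ResourceState.ONLINE_PENDING},
        ResourceState.ONLINE_PENDING:  {ResourceState.ONLINE, ResourceState.FAILED},
        ResourceState.ONLINE:          {ResourceState.OFFLINE_PENDING, ResourceState.FAILED},
        ResourceState.OFFLINE_PENDING: {ResourceState.OFFLINE},
        ResourceState.FAILED:          {ResourceState.ONLINE_PENDING, ResourceState.OFFLINE},
    }

    def change_state(current, target):
        if target not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition: {current.value} -> {target.value}")
        return target

    state = ResourceState.OFFLINE
    state = change_state(state, ResourceState.ONLINE_PENDING)
    state = change_state(state, ResourceState.ONLINE)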
59
Resource States (continued)
Resource state changes can occur either manually (when you use the administration tools to make a state transition) or automatically (during the failover process). When a group is failed over, Cluster service alters the states of each resource according to their dependencies on the other resources in the group.
60
Resource Dependencies
[Diagram: resource dependency tree containing FS-1 File Share, NN-1 Network Name, IP-1 IP Address, and PD-1 Physical Disk, contrasting a recommended vertical dependency with a not-recommended forking dependency]
61
A dependency is a relationship between two resources in which one resource depends upon the other to be online before it can be brought online. For example, a network name cannot be brought online before an IP address. This relationship requires that dependent resources reside in the same group on the same node. The administrator establishes resource dependencies within a group to ensure availability of specific resources before other resources attempt to go online. For troubleshooting purposes, it is recommended that you create vertical dependencies for all of the cluster resources. Forking dependencies are not recommended because they create multiple paths to troubleshoot when a resource does not come online. A vertical dependency requires that all of the dependent resources come online in sequence, starting with the resource that is at the bottom of the dependency tree.
62
Resource Dependency Tree
The dependency tree is a useful diagram for visualizing the dependency relationships between resources and determining how they will interact. For example, if a resource must be taken offline for an administrative task, its location in the dependency tree will show what other resources will be affected. The dependency tree indicates the relative order in which resources will be taken offline and brought online. Note that dependency tree diagrams are a useful tool for designing and documenting a cluster configuration, but you are not required to create these diagrams to manage the server cluster. This is an optional planning activity.
63
Resource Dependencies in Groups
The resources belonging to a dependency tree must all be contained in the same Cluster service group. All of the resources in a group move between nodes as a unit. Dependent or related resources never span a group boundary, because they cannot be dependent on resources in other groups. If this were possible, then all of the groups contained in a dependency tree would have to fail over to the other node as a unit. Because groups are the basic unit of failover in Cluster service, resources in a dependency tree will always be online on the same node and will fail over together. Note: It is recommended that you do not use the cluster network name and IP address resources, which are created automatically during installation, as part of a user-defined dependency tree or group. These two resources should be left in the default cluster group that is created on installation of Cluster service.
64
Dependency Rules The only dependency relationships that Cluster service recognizes are relationships between resources. The following rules govern dependency relationships: The resources of a dependency tree are wholly contained in one, and only one, cluster group. A resource can depend on any number of other resources in the same group. Resources in the same dependency tree must all be online on the same node of a cluster. A resource can be active or online on only one node in the cluster at a given time. A resource is brought online only after all of the resources on which it depends are brought online. A resource is taken offline before any resource on which it depends is taken offline.
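Because a resource is brought online only after everything it depends on is online, the bring-online order is effectively a topological sort of the dependency tree, and the take-offline order is its reverse. The Python sketch below demonstrates this with a hypothetical group modeled on the dependency tree slide shown earlier.

    # Hypothetical group: the file share depends on the network name and the physical
    # disk, and the network name depends on the IP address.
    depends_on = {
        "FS-1 File Share": ["NN-1 Network Name", "PD-1 Physical Disk"],
        "NN-1 Network Name": ["IP-1 IP Address"],
        "IP-1 IP Address": [],
        "PD-1 Physical Disk": [],
    }

    def online_order(deps):
        """Return an order in which every resource comes online after its dependencies."""
        order, done = [], set()
        def visit(resource):
            if resource in done:
                return
            for dependency in deps[resource]:
                visit(dependency)
            done.add(resource)
            order.append(resource)
        for resource in deps:
            visit(resource)
        return order

    print(online_order(depends_on))                    # bring-online order
    print(list(reversed(online_order(depends_on))))    # take-offline order is the reverse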
65
Virtual Servers Client Access to Virtual Servers
Virtual Server Environment Virtual Server Naming Named Pipe Remapping Registry Replication
66
Applications run as a resource within a virtual server environment. A virtual server environment masks the physical server name to the application that is running on a virtual server. Masking the name of the physical server provides the application with virtual services, a virtual registry, and a virtual name space. When an application migrates to another node, it appears to the application that it restarted on the same virtual server.
67
The virtual server environment provides applications, administrators, and clients with the illusion of a single, stable environment, even if the resource group migrates to the other node of the cluster. One benefit of virtual servers is that many instances of an application can be executed on a single node, each within its own virtual server environment. The ability to execute many instances of an application allows two SQL Servers or two SAP environments to execute as two virtual servers on one physical node.
68
Client Access to Virtual Servers
A virtual server resource group requires a network name resource (NetBIOS and Domain Name System (DNS)), and an IP address resource. Together, these provide clients with a consistent name for accessing the virtual server. The virtual server name and virtual IP address migrate among several physical nodes. The client connects to the virtual server by using the virtual server name, without regard to the physical location of the server.
69
Virtual Server Environment
Each virtual server environment provides a namespace and configuration space that is separated from other virtual servers running on the same node: registry access, service control, named pipe communication, and RPC endpoints. Having a separate namespace and configuration space prevents a conflict over access to configuration data or internal communication patterns between two instances of an application service that are running on the same node but in separate virtual service environments.
70
Three features provide virtual server transparency:
Virtual server naming Named pipe remapping Registry replication
71
Virtual server naming. System services (such as GetComputerName) return the network name that is associated with the virtual server instead of the host node name.
72
Named pipe remapping. When an application service consists of several components that use interprocess communication to access each other’s services, the communication endpoints must be named relative to the virtual server. To achieve named pipe remapping, named pipe names are translated by Cluster service from \\virtual_node\service to \\host\$virtual_node\service.
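A trivial Python sketch of that renaming follows, using the path format exactly as shown above; the pipe and node names are hypothetical.

    def remap_pipe_name(pipe_path, host_node):
        r"""Rewrite \\virtual_node\service as \\host\$virtual_node\service
        (path format simplified to exactly what the slide shows)."""
        virtual_node, service = pipe_path.lstrip("\\").split("\\", 1)
        return rf"\\{host_node}\${virtual_node}\{service}"

    print(remap_pipe_name(r"\\ACCOUNTING\sqlpipe", "NodeA"))   # \\NodeA\$ACCOUNTING\sqlpipe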
73
Registry replication. The Windows 2000 registry stores most application configuration data. To allow applications that run in separate virtual servers to run on the same host node, you must map registry trees to separate virtual server registry trees and store them in the cluster registry on the node. Each unique tree represents a single virtual server and is internally dependent on the name associated with the virtual server. The trees are also stored in registry file format on the quorum device. When the virtual server migrates, the local tree is rebuilt from the registry files on the quorum device.
74
When encapsulating applications on virtual servers, it is difficult to know all of the dependencies on node-specific resources, as applications often make use of dynamic-link libraries (DLLs) in ways that introduce new naming dependencies. Cluster service masks the complexity of resource dependencies and allows for seamless failovers of virtual servers.
75
Virtual Server Name Resolution
[Diagram: virtual server name resolution; WINS resolves \\VirtualServer, DNS resolves \\VirtualServer.nwtraders.msft, and Active Directory publishes the share as \\VirtualServer\Share or \\VirtualServer.nwtraders.msft\Share; Group 1 can be hosted by Node A or Node B, with Disk 1 and the quorum on shared storage]
76
To support proper failover, it is critical that clients connect by using the virtual server names only, rather than directly to the cluster nodes. You must devise naming conventions to differentiate the different types of server names. Network names associated with virtual servers are registered with WINS and the browser service in the same way as physical servers. For most applications, it is impossible to distinguish between servers that are virtual and servers that are physical. Clients can access a virtual server by using a NetBIOS name, a DNS name, or an IP address. Clients can also access the virtual server by querying Active Directory™ directory service.
77
Important: You need to publish the virtual server file share in Active Directory in the same manner as a file share from a physical server.
78
WINS In a WINS environment, Microsoft Cluster service registers the virtual server names and IP addresses with a WINS server. Clients that are using the virtual server's NetBIOS name will query a WINS server. The WINS server answers the query with the IP address of the virtual server, just as it would for a physical server.
79
DNS In a Windows 2000 environment that supports the DNS dynamic update protocol, Cluster service registers the virtual server names of the cluster in the same zone as the server cluster nodes. Clients querying a common DNS server will resolve the virtual server's IP address.
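From the client's side, resolving a virtual server name looks exactly like resolving a physical server name. The Python sketch below uses an example name in the nwtraders.msft domain used throughout these slides; it would only resolve inside such an environment.

    import socket

    VIRTUAL_SERVER = "virtualserver.nwtraders.msft"   # example name from the slide domain

    # The client resolves the virtual server name exactly as it would a physical server;
    # which node currently hosts the address is invisible to it.
    ip_address = socket.gethostbyname(VIRTUAL_SERVER)
    print(f"\\\\{VIRTUAL_SERVER} resolves to {ip_address}")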
80
Active Directory You can publish clustered resources, such as shared files, in Active Directory. Publish shared folders in an organizational unit by either the NetBIOS name of the virtual server (\\VirtualServer\share) or by the fully qualified domain name (\\VirtualServer.nwtraders.msft\share). Clients can browse or query Active Directory to gain access to file share resources. When Active Directory responds, the client will then need to perform the name resolution to find the IP address of the virtual server where Cluster service has stored the requested resource. Note: When clients access the virtual server directly by an IP address, and this address changes, you must notify all of the clients who want access to the virtual server. When clients access a virtual server by name, if the server’s IP address or subnet changes, the client would still be able to resolve the name to the IP address by using WINS or DNS.
81
Failover and Failback
[Diagram: Group 1 and Disk 1 fail over from Node A to Node B when Node A fails (marked with an X); failback returns the group to Node A; the quorum is on shared storage]
82
Microsoft Cluster service provides your system with the ability to reassign control of groups of logical resources in the event of a failure. If a resource fails, Cluster service will attempt to restart that service. You configure the failover and failback policies to determine when groups should transfer ownership from one node to another.
83
Failover Failover occurs when a resource or a node fails. All resources are configured for automatic failover by default. In the case of a node failure, the resources and groups that this node controls fail over to the other node in the cluster. For example, in a cluster where file and print resources are running on Node A, and Node A fails, these services will fail over to Node B of the cluster.
84
Failover (continued) If a resource fails, Cluster service will attempt to restart the resource. If the resource does not start, it will fail over (with its group) to the other node. If the resource will not start after failover, it will fail over again to the original node and try to start. This process will be repeated six times within ten hours by default. If the resource still does not start, Cluster service fails over the resource and all of the resources that depend on the failed resource.
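The restart-then-fail-over behavior can be sketched as a retry loop that alternates between the two nodes until a retry limit is reached. The function below is a simplified model of that policy, not the actual Cluster service algorithm; try_start is a hypothetical callable that attempts to start the resource on a given node.

    def recover_resource(resource, try_start, restart_limit=6):
        """Simplified model: restart, fail over with the group, and retry,
        alternating nodes until the limit is reached."""
        nodes = ["NodeA", "NodeB"]
        current = 0
        for attempt in range(restart_limit):
            if try_start(resource, nodes[current]):
                return f"{resource} online on {nodes[current]}"
            current = 1 - current       # fail over (with its group) to the other node
        return f"{resource} left in the Failed state; dependent resources are affected"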
85
Failback Failback occurs when a node has failed and its resources have failed over to the other node. Failback is the process of returning a group of resources to the node on which it was running before a failover occurred. Failback is not configured by default. You must set a preferred owner for the group. Using the preceding example, when Node A comes back online, the file and print services can fail back to Node A if Node A is set as the preferred owner. This process can be performed automatically or manually.
86
Failback (continued) The administrator determines when and if a group should fail back to the preferred owner. You might not want the application to fail back during peak load times. For example, if Node A fails, the resources in a group could take five minutes or more to restart on Node B. To avoid additional delays in responding to client requests, the administrator can choose to fail back this group to Node A during off-peak hours, or leave the ownership of the group with Node B.
87
Demonstration: Cluster Concepts
88
In this demonstration, you will view different name resolution capabilities for clients accessing resources from the cluster. The steps of the demonstration are: View the Cluster Group Owner. Create a public folder share from a Terminal Services session. Create a file share resource. Test WINS name resolution for the public share. Test DNS name resolution. Publish a shared folder in Active Directory. Demonstrate a failover of the public share. Test WINS name resolution after failover. Test DNS name resolution after failover. Test Active Directory shared folders after failover.
89
Choosing a Server Cluster Configuration
[Decision table: cluster configurations (no cluster needed, single node virtual server, active/passive, active/active) compared by virtual server use, failover capability, and performance considerations]
90
You can configure server clusters to meet specific requirements. The configuration that you choose depends on the scalability features of your application and your availability objective for the resources. Each configuration also has a failover policy that dictates when the resources should return to their preferred owner after the failed node has been restored.
91
The most common configurations are:
Active/Passive configuration Active/Active configuration Hybrid configuration Single node virtual server Note: In this section we will not be discussing active/active or active/passive software configurations.
92
Active/Passive Configuration
[Diagram: Node A manages virtual server \\ACCOUNTING (Group 1) with Disk 1 and the quorum on shared storage; Node B is configured as a hot spare and will take ownership of \\ACCOUNTING if Node A goes offline]
93
The active/passive configuration contains one node that is actively providing resources to clients. The other node is passive and standing by in case of a failover. This configuration can provide high performance and failover capacity. One node of the cluster makes all of the resources available. To achieve optimal performance for any failed over group, it is recommended that the passive node have the same specifications, such as CPU speed, as the active node that controlled the resource.
94
The disadvantage of this configuration is that it is an expensive allocation of hardware. One of the two servers will not be servicing any clients at any time. The advantage of this configuration is that after failover, the applications running on the group that fails over do not interfere with any other applications running on the node, and therefore the application can run at maximum capacity.
95
In the slide, Node A has control of Group 1. The administrator has configured Node B as the hot spare with the capability to control Group 1. If Node A goes offline, Node B will control Group 1. When Node A returns to an online state, Node A becomes the passive system and Group 1 remains with Node B. Because failback does not occur, this configuration provides maximum availability by reducing the time that the service or application is unavailable. If Node B does not have equal capacity to Node A, you may need to configure a failback for Group 1 during nonpeak load times.
96
Considerations for Choosing this Configuration
Choose this configuration when you need to provide critical applications and resources with high availability. For example, an organization that is selling products on the World Wide Web could justify the expense of having an idle server by guaranteeing continuous high performance access to customers.
97
Considerations for Choosing this Configuration (continued)
Choose the active/passive configuration if your needs meet the following: You only require one group for all of the applications and services. You want failover capability for the applications and services. To avoid additional downtime after failover, the applications and services should not fail back to the other node. The applications and services support a cluster environment. Note: If you want the applications and services to run at maximum capacity on either node, both nodes will need to provide the same capacity.
98
Availability This configuration provides very high availability by not failing back when the nodes are of equal capacity, with the added benefit of no performance degradation during failover.
99
Failover Policy If the passive system provides identical performance to the failed node, you do not need to configure the failback policy. If the passive system does not provide identical performance, you can configure the group to fail back when the preferred node is online.
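One way to think about the policy is as a small set of per-group settings. The dictionaries below are a hypothetical, illustrative representation of an active/passive group's failover and failback settings, not an actual Cluster service configuration format.

    # Hypothetical per-group policy settings for an active/passive cluster (illustrative only).
    equal_capacity_policy = {
        "group": "Group 1",
        "possible_owners": ["NodeA", "NodeB"],   # either node can host the group
        "preferred_owner": None,                 # identical nodes, so no failback is needed
        "failback": "disabled",
    }

    # If the passive node is less capable, name a preferred owner and fail back off-peak.
    unequal_capacity_policy = dict(equal_capacity_policy,
                                   preferred_owner="NodeA",
                                   failback="allowed between 22:00 and 06:00")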
100
Active/Active Configuration
[Diagram: active/active cluster; Node A owns Group 1 (\\ENGINEERING) and Node B owns Group 2 (\\ACCOUNTING); Disk 1, Disk 2, and the quorum are on shared storage; each node has the capacity to fail over the other node's group]
101
The active/active configuration contains two nodes that are providing resources to cluster clients. This configuration provides optimum performance because it balances resources across both nodes in the cluster. Each node controls a different resource group. The active/active configuration can also provide static load balancing, which refers to a failback policy. If one node fails, the other node will temporarily take on all of the groups. When the failed node comes back online, the group fails back to the original node, allowing performance to return to normal. In general, this configuration is the one most often used in server clusters.
102
Depending on the resources and the capacity of the nodes, performance may suffer when a failover occurs and a single node must run all of the resources. In this slide, Node A is the primary owner of Group 1, and Node B is the primary owner of Group 2. If Node B goes offline, Group 2 will fail over to Node A. When Node B goes back online, Group 2 will fail back to Node B. Performance is restored when the failed node comes back online and the group fails back to its original node.
103
Considerations for Choosing this Configuration
Choose this configuration when you need to provide multiple resources simultaneously from a single cluster, provided that you can accept reduced performance during a failover. Choose the active/active configuration if your needs meet the following: You require multiple groups for applications and services. You want failover capability for the applications and services. You want all the groups to fail back to their preferred owners when the failed node returns online to redistribute the load. The applications and services support a cluster environment.
104
Availability This configuration provides high performance until failover. When a single node runs all of the resources, performance will degrade.
105
Failover Policy Configure all of the groups to fail over, and then to fail back when the original node owner is back online.
106
Hybrid Configuration
[Diagram: Node A provides DNS outside the cluster and Node B provides file and print services outside the cluster; within the cluster, Node A owns Group 1 (\\ENGINEERING) with Disk 1 and the quorum on shared storage, and Node B has the capacity to fail over Group 1]
107
The hybrid configuration allows either node of a server cluster to perform server duties that are independent of the cluster. In this slide, Nodes A and B are both performing services outside of the cluster. Node A is running Microsoft Domain Name System (DNS) service and Node B is configured as a file/print server. These services will not fail over if their respective nodes fail. But those services within the cluster would fail over if their respective node fails. In the slide example, Group 1 will fail over to Node B if Node A fails. Note: The hybrid configuration could run as an active/passive or active/active configuration.
108
Considerations for Choosing this Configuration
Choose this configuration when you must install applications or services that Cluster service does not support on one or more nodes of the cluster. Choose the hybrid configuration if your needs meet the following: You need to run applications on a node independent of the cluster. You want failover capability for the clustered applications and services. You can configure a failback policy if it meets your requirements.
109
Availability The server cluster resources that you configure for failover have high availability. The applications or services that are running independently of Cluster service will not fail over, and therefore do not have high availability.
110
Failover Policy The failover policy for a hybrid configuration depends on whether it is an active/passive or active/active configuration. Note: Although you can run some services and applications outside of the cluster, it is recommended that you specialize the cluster servers to the applications that run within the cluster. Consider a hybrid solution when you need to perform domain controller functionality from both nodes of a cluster.
111
Single Node Virtual Server
[Diagram: single node virtual server; Node A hosts Group 1 (\\ENGINEERING) and Group 2 (\\ACCOUNTING) on Disk 1, Disk 2, and the quorum; the disks are not shared, and clients can access any virtual server in the cluster]
112
The single node configuration allows clients to access resources through a virtual server. Because this configuration uses only one node, resources cannot be failed over to another node. Administrators use this configuration to group resources for organizational or administrative purposes. It is not intended to provide the availability that other server configurations have. The advantage of this configuration is that services grouped together are easier to manage and easier for clients to access. If the administrator or the clients want higher levels of availability, the administrator can add another node to create a two-node cluster. Because the administrator has already created the groups of resources, they will need to configure only the failover policies.
113
A common use of this configuration is for server consolidation. For example, if one server has the capacity to replace four existing servers, you can migrate the services and applications running on the old servers to their respective virtual servers. Clients will access the virtual servers without any apparent changes. In this slide, \\engineering is a virtual server in Group 1, and \\accounting is a virtual server in Group 2. Even though users in the Engineering and Accounting departments think that they are accessing different servers, both resources are housed on the same physical server. This configuration provides the flexibility for adding another node to the cluster, which would provide fault tolerance for these applications and services.
114
Considerations for Choosing this Configuration
Consider choosing this configuration when you need to manage resources by grouping them together with virtual servers. You might also configure a single server as a virtual server if you anticipate adding another server to create a server cluster.
115
Considerations for Choosing this Configuration (continued)
Choose the single node virtual server configuration if your needs meet the following: You need one or more virtual servers. You do not require failover capability for the applications and services. The applications and services support a cluster environment.
116
Availability This configuration does not provide high availability, because there is no second node for resources to fail over to.
117
Failover Policy There is no failover policy because the cluster has only one node.
118
Applications and Services on Server Clusters
File and Print Shares Identifying Performance Limitations
119
Microsoft Cluster service can provide failover capabilities for file and print services, applications, such as Microsoft Exchange, and services, such as WINS. An administrator in an enterprise environment will need to decide when server clusters are an appropriate solution. For example, if an organization considers Microsoft Exchange to be mission-critical, Microsoft Cluster service's failover capabilities can provide a high degree of availability. A service, such as Active Directory, has built-in redundancy, and therefore would not benefit from failover capability. An administrator also needs to consider how the applications and services will impact the node during a failover condition. Resources can change ownership dynamically, so you need to consider performance capacity when looking for performance limits.
120
Applications Cluster-Aware Applications Cluster-Unaware Applications
121
Applications are either cluster-aware or cluster-unaware. Cluster service can more efficiently manage applications that are cluster-aware. Cluster-unaware applications can run on a cluster if they are configured as generic resource types. Cluster service can manage generic applications, but not to the same level of detail as with the cluster-aware applications. Important: For an application to run on a server cluster, the application must use TCP/IP as a network protocol.
122
Cluster-Aware Applications
You select cluster-aware applications to obtain the best performance and reliability from your system. Common uses of cluster-aware applications are database applications, transaction processing applications, and file and print server applications. You can configure other groupware applications to be cluster-aware. Cluster-aware applications can take advantage of features that Cluster service offers through the cluster application programming interface (API).
123
Cluster-Aware Applications (continued)
An application is capable of being cluster-aware if it: Maintains data in a configurable location. Supports transaction processing. Supports the Cluster service API. Reports health status upon request to the Resource Monitor. Responds to requests from Cluster service to be brought online or be taken offline without data loss.
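The following Python class is a conceptual sketch of that contract: state kept in a configurable location, clean online and offline transitions, and a health check the Resource Monitor can call. A real cluster-aware application implements this through the Cluster service API and a resource DLL rather than a class like this.

    class ClusterAwareApp:
        """Conceptual sketch of the requirements listed above; a real cluster-aware
        application implements them through the Cluster service API and a resource DLL."""
        def __init__(self, data_path):
            self.data_path = data_path   # data kept in a configurable location (a cluster disk)
            self.running = False

        def online(self):
            # Start from the state recorded on the shared disk; no separate local copy.
            self.running = True

        def offline(self):
            # Stop cleanly, without data loss, so the other node can bring the app online.
            self.running = False

        def is_alive(self):
            # Report health status when the Resource Monitor asks.
            return self.running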
124
Cluster-Aware Applications (continued)
A cluster-aware application runs on a node in the cluster and can take advantage of administration and availability features of Cluster service. Typically, a cluster-aware application communicates with Cluster service by using the cluster API or a cluster application-specific resource DLL. Cluster-aware applications must use the cluster API registry functions and control codes, not the standard Microsoft Win32® API functions. To take advantage of the registry replication of the cluster hive between the nodes of the cluster, cluster-aware applications benefit by placing their registry settings in the cluster hive of the registry instead of the System Registry.
125
Cluster-Aware Applications (continued)
When a cluster-aware application restarts on another server following a failure, it does not restart from a completely separate copy of the application. The new server starts the application from the same physical disks as the original server. Ownership of the application's disks on the shared SCSI bus was moved from the failed server to the new owner as one of the first steps in the failover process. This approach assures that the application always restarts from its last known state as recorded on its installation drive, and, optionally, its registry keys.
126
Cluster-Unaware Applications
Applications that do not use the cluster or resource APIs and cluster control code functions are unaware of clustering and have no knowledge that Cluster service is running. Their relationship with Cluster service is solely through the resource DLL. You must configure cluster-unaware applications as generic resource types if Cluster service is to manage them. Cluster service can poll these generic applications and services to determine whether they are running or have failed, and can fail over the application or resource to the other node if it detects a failure.
127
Cluster-Unaware Applications (continued)
Cluster-unaware applications use the System Registry instead of the local cluster registry. If an application is configured as a generic resource type, Cluster service can replicate its keys in the System Registry of the other node in the cluster.
128
Cluster-Unaware Applications (continued)
Using the generic application resource type for cluster-unaware applications has some limitations: When the resource goes offline, Cluster service terminates the process without performing any clean shutdown operations. The application is not configurable via the Cluster Administrator tools. The Cluster service writes a registry checkpoint of changes made to the cluster. Any changes made via a separate administration tool when the resource is not online will not be propagated. Cluster service can monitor only up to 30 registry subtrees. Generic applications and services can report only a limited amount of information to Cluster service because Resource Monitor can report only whether the application is running, not whether it is running properly.
129
Services DFS DHCP WINS
130
Services, like applications, are either cluster-aware or cluster-unaware. You can also configure cluster-unaware services as generic resource types. You need to install a cluster-unaware service on both nodes and set it as active in the registry before you can configure it to run in the cluster. Cluster-aware and cluster-unaware services have the same advantages and disadvantages as cluster-aware and cluster-unaware applications. The following services included in Windows 2000 Advanced Server are cluster-aware.
131
Distributed File System (DFS)
When you install DFS in a server cluster, the DFS root is fault tolerant. Having a DFS root that is fault tolerant will allow clients to access data that is stored on multiple systems through a \\VirtualServer\Share mapping that either of the nodes in the Windows 2000 server cluster can host. If the node that is currently hosting the DFS root fails, the other node will host the DFS root. Failover is a significant advantage when many enterprise clients need to continuously access data that DFS hosts. Windows 2000 provides a domain DFS root that can provide fault tolerance by replicating the data to other computers in the domain. A nonclustered server running Windows 2000 can provide a stand-alone DFS root. However, if that server becomes inactive, the stand-alone DFS root cannot be accessed by clients. A Windows 2000 clustered DFS root provides a stand-alone DFS root with fault tolerance, making the DFS root available from a virtual server with failover capability.
132
Dynamic Host Configuration Protocol (DHCP)
You can use the Windows 2000 (Advanced Server only) Cluster service for DHCP servers to provide higher availability, easier manageability, and greater scalability. Windows Clustering allows you to install a DHCP server as a virtual server so that if one of the clustered nodes fails, the DHCP service is transferred to the second node. Failing over the DHCP service means clients can still receive and renew TCP/IP addresses from the virtual server.
133
Dynamic Host Configuration Protocol (DHCP) (continued)
Clustering uses IP addresses efficiently by removing the need to split scopes. A database stored on a remote disk tracks address assignment and other activity so that if the active cluster node goes down, the second node becomes the DHCP server, using the same database as the original node. Only one node at a time runs as a DHCP server, with the Windows 2000 clustering database providing transparent transition when needed.
134
Windows Internet Name Service (WINS)
By maintaining and assigning secondary WINS servers for clients, you can reduce, if not fully eliminate, the effects of a single WINS server being offline. In addition, clustering can provide further fault tolerance. In an enterprise WINS environment, you can reduce the number of redundant WINS servers and rely on Microsoft Cluster service for fault tolerance. WINS running on a Microsoft Cluster service cluster will also eliminate any WINS replication traffic. Note: You should not create static network name to IP address mappings for any cluster names in a WINS database. WINS is the only name resolution method that will cause problems when using static mappings, because WINS static mappings use the MAC address of the network card as part of the static mapping.
135
File and Print Shares
136
If mission-critical file and print shares are on a physical server, the environment has a potential single point of failure. Using Microsoft Cluster service, you can locate file and print queues on one of the nodes in the cluster and access it by means of a virtual server. The failover feature of Cluster service will provide for continuous file and print service to clients.
137
The following considerations apply when file and print services are on a virtual server:
Both nodes of the cluster must be members of the same domain for the permissions to be available when either node has the resource online. The user account that Cluster service uses must have at least read access to the directory. If the user account does not have at least read access, the Cluster service will be unable to bring the file share online. The administrator must set share permissions so that they fail over with the resource.
138
Identifying Performance Limitations
[Diagram: Node A runs Group 1 (Exchange 2000) and Node B runs Group 2 (File/Print); Disk 1 and the quorum are on shared storage]
139
Windows 2000 Advanced Server uses an adaptable architecture and is largely self-tuning when it comes to performance. Additionally, Advanced Server is able to allocate resources dynamically as needed to meet changing usage requirements. The goal in identifying performance limits is to look at the applications that are running on the cluster to determine which hardware resource will experience the greatest demand, and then adjust the configuration to handle that demand and maximize total throughput.
140
You need to consider all of the nodes in a cluster when looking for performance limitations. You must consider how a node will run when resources with different performance requirements fail over from the other node. For example, you would fail over Group 2 from Node B to Node A and run a performance benchmark on Node A. You would check RAM, CPU, disk utilization, and network utilization to see if they are beyond capacity limitations. You repeat these steps to check the performance limitations of Node B.
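A simple way to model this check is to add up each group's demands as if a single node had to run everything after a failover. The figures below are made-up, illustrative loads expressed as fractions of one node's capacity, not measurements.

    # Made-up demands per group, expressed as fractions of one node's capacity.
    demands = {
        "Group 1 (Exchange 2000)": {"cpu": 0.45, "ram": 0.50, "disk_io": 0.30, "network": 0.25},
        "Group 2 (File/Print)":    {"cpu": 0.15, "ram": 0.20, "disk_io": 0.40, "network": 0.35},
    }

    def check_failover_capacity(groups):
        """Add up every group's demands as if one node ran them all after a failover."""
        totals = {}
        for group in groups.values():
            for metric, load in group.items():
                totals[metric] = totals.get(metric, 0.0) + load
        return {metric: ("over capacity" if load > 1.0 else f"{load:.0%} used")
                for metric, load in totals.items()}

    print(check_failover_capacity(demands))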
141
File and Print Services
If the primary role of a cluster is to provide high availability of file and print services, high disk use will be incurred due to the large number of files being accessed. File and print services also cause a heavy load on network adapters because of the large amount of data that is being transferred. It is important to make sure that your network adapter and cluster subnet can handle the load. In this example, RAM typically does not carry a heavy load, although memory usage can be heavy if a large amount of RAM is allocated to the file system cache. Processor utilization is also typically low in this environment. In such cases, memory and processor utilization usually do not need the optimizing that other components need. You can often use memory effectively to reduce the disk utilization, especially for disk read operations.
142
Applications A server-application environment (such as Microsoft Exchange) is much more processor-intensive and RAM- intensive than a typical file or print server environment, because much more processing is taking place at the server. In these cases, it is best to use high-end multiprocessor servers. Server cluster solutions use little of the system resources, either for host-to-host communications, or for the operation of the cluster itself.
143
Review Introduction to Server Clusters
Key Concepts of a Server Cluster Choosing a Server Cluster Configuration Applications and Services on Server Clusters