An Empirical Examination of Current High-Availability Clustering Solutions’ Performance Jeffrey Absher DePaul University Research Symposium Presentation.

Presentation transcript:

An Empirical Examination of Current High-Availability Clustering Solutions’ Performance
Jeffrey Absher
DePaul University Research Symposium Presentation, November 2003
See the actual paper for bibliographical and procedural information and for appropriate academic references.

HA and Related Technology
- Distributed OS
- Load Balancing
- Disaster Recovery
- Fault Tolerance
- HA clustering

HA’s defining traits
- SPOF avoided by using redundancy
- Single image to the outside world using a single virtual IP address and hostname
- Automated fault management and recovery
- Multiple access paths from each cluster node to each resource group (set of HA services)
- Simple abstraction for applications and administrators
- Undisrupted (or minimally disrupted) services during failover
“If a computer breaks down, the functions performed by that computer will be handled by some other computer in the cluster.”
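
As a rough illustration of the "automated fault management and recovery" and "single virtual IP" traits, the sketch below models a two-node cluster in which the resource group moves to the surviving node when the owner's heartbeat goes stale. It is a toy model under stated assumptions, not how any of the tested products works: the class name, the node names, and the 5-second timeout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TwoNodeCluster:
    """Toy model (hypothetical): one resource group owned by one healthy node."""
    heartbeat_timeout: float  # seconds of silence before a node counts as failed
    last_heartbeat: Dict[str, float] = field(default_factory=dict)
    owner: Optional[str] = None  # node currently holding the virtual IP

    def heartbeat(self, node: str, now: float) -> None:
        """Record a heartbeat; the first node to report takes the resource group."""
        self.last_heartbeat[node] = now
        if self.owner is None:
            self.owner = node

    def check(self, now: float) -> Optional[str]:
        """Fail the resource group over if the owner's heartbeat is stale."""
        def alive(node: str) -> bool:
            return now - self.last_heartbeat[node] <= self.heartbeat_timeout

        if self.owner is not None and not alive(self.owner):
            survivors = [n for n in self.last_heartbeat
                         if n != self.owner and alive(n)]
            # With no healthy survivor the service stays down (owner is None).
            self.owner = survivors[0] if survivors else None
        return self.owner


cluster = TwoNodeCluster(heartbeat_timeout=5.0)  # assumed timeout
cluster.heartbeat("node1", now=0.0)
cluster.heartbeat("node2", now=0.0)
cluster.heartbeat("node2", now=8.0)  # node1 has been silent since t=0
print(cluster.check(now=8.0))        # node2 takes over the resource group
```

Clients keep using the same virtual IP address throughout; only the node answering for it changes.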

A cluster and tester topology

| Event/Failure | What does it simulate? |
|---|---|
| Baseline | No events |
| Kill process on primary server | A simple fault that causes an abend to the HA process but does not take out the server. |
| Kill process on primary server and hold the process down for 30 seconds | A core dump that takes a long time or a more complex fault. |
| Kill process on primary, hold down for 30 seconds, and fail to start on second node | A core dump or more complex fault, as well as a misconfiguration on the secondary server. |
| Kill the cluster/watchdog process on the primary server | A bug in the cluster programming that causes an abend or a mistaken shutdown of the cluster processes. |
| Short power failure on primary node | A single-node power failure, technician error, a loose power cable, etc. |
| Simultaneous power failure on both nodes, primary/secondary recovers first | A datacenter power failure with the two possible recovery orders. |
| For AIX and Linux, loss of serial communication for 60 seconds. For Windows, the Virtual Shared Disk processes were killed and disabled for 60 seconds. | A loose serial cable or a technician error such as a cable disconnect, a port misconfiguration, or a mistaken command such as echo hello> /dev/tty0. |
| Primary/secondary server public network loss for 60 seconds | A loose network cable or a technician error such as a cable disconnect, a card misconfiguration, or a mistaken command such as ifconfig en0 down. |
| Public/private network down for 60 seconds | A power failure on the public hub or MAU, a network storm, or a technician’s error such as a VLAN misconfiguration. |
| IP address clash on the public network for 60 seconds | A situation where another machine on the same VLAN is accidentally brought online with an incorrect IP address. |
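
One way a tester machine could turn such trials into outage figures is to probe the service at a fixed interval and add up the runs of failed probes. The sketch below assumes this approach; the slides do not describe the actual measurement procedure, and the function names and the sample trace are hypothetical.

```python
from typing import List, Tuple


def outage_intervals(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (start, end) for each run of failed probes.

    samples holds (timestamp_seconds, service_responded) pairs in time order.
    An outage starts at the first failed probe and ends at the next successful
    one; an outage still open at the end of the trace is closed at the last
    timestamp.
    """
    intervals = []
    start = None
    for timestamp, ok in samples:
        if not ok and start is None:
            start = timestamp
        elif ok and start is not None:
            intervals.append((start, timestamp))
            start = None
    if start is not None:
        intervals.append((start, samples[-1][0]))
    return intervals


def total_outage(samples: List[Tuple[float, bool]]) -> float:
    """Sum the lengths of all outages observed in the trace."""
    return sum(end - start for start, end in outage_intervals(samples))


# Hypothetical trace: one probe per second, service down from t=2 until t=5.
trace = [(0, True), (1, True), (2, False), (3, False), (4, False), (5, True), (6, True)]
print(outage_intervals(trace))  # [(2, 5)]
print(total_outage(trace))      # 3
```

The resolution of the result is limited by the probe interval, so short failovers need a correspondingly short interval.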

Inter-OS Comparison

| | AIX | Win2K | Linux |
|---|---|---|---|
| Configuration | most difficult | reasonable | simplest |
| Scripting required? | some | none | much |
| Features | many | many | few |
| OS integration | medium | high | low/none |
| Installation | Interdependent | Independent | Independent |
| Trials with HA resulting in a longer outage | 4/14 | 2/14 | 3/14 |
| Trials requiring manual intervention | 0 | 1 | 1 |
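
The "longer outage" row is easier to compare as percentages. The short calculation below uses only the counts from the table; the helper name is illustrative.

```python
def share_percent(hits: int, trials: int) -> float:
    """Percentage of trials, rounded to one decimal place."""
    return round(100 * hits / trials, 1)


# Trials (out of 14) in which HA resulted in a longer outage, from the table.
longer_outage = {"AIX": 4, "Win2K": 2, "Linux": 3}

for system, hits in longer_outage.items():
    print(f"{system}: {share_percent(hits, 14)}%")
# AIX: 28.6%
# Win2K: 14.3%
# Linux: 21.4%
```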

Subjective Observations
- HA clustering is difficult to configure properly, and the available documentation is lacking.
- Multiple machines must be configured simultaneously; often packages and software must be installed and configured in a specific order.
- For what should be a loosely coupled system, there are many interdependencies.
- Youn et al. suggest that the design of “administration of clusters…needs improvement.” I agree.
- Vogels et al. state, “Users find it difficult to configure clusters with the desired management … properties. It is difficult to configure applications to be automatically launched in an appropriate order. Lacking solutions to these problems, clusters will remain awkward and time-consuming tools.” I agree.

Objective Conclusions Based on Empirical Evidence
- HA is not a perfect solution for every environment, and may be a bad solution for some, depending on the expected faults.
- High failover time for some systems contributes to lower-than-expected performance of HA systems when compared to non-HA systems.
- Failover times need to be significantly smaller than the time required for a reboot or even a restart of a slow-to-start process.
- Primary-node negotiation time at boot contributes to poor performance during power outages.
- There were cases where clustering was shown to actually decrease the uptime of a service or site.
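
The failover-time conclusion can be made concrete with simple arithmetic: for a fault that a plain restart would also clear, HA only reduces downtime if failover finishes sooner than the restart would. The sketch below is illustrative only; the fault rate and both recovery times are hypothetical and are not taken from the study.

```python
def downtime_per_year(faults_per_year: float, recovery_seconds: float) -> float:
    """Expected yearly downtime in seconds for a given per-fault recovery time."""
    return faults_per_year * recovery_seconds


def ha_reduces_downtime(failover_seconds: float, restart_seconds: float) -> bool:
    """HA helps for this fault type only if failover beats a plain restart."""
    return failover_seconds < restart_seconds


# Hypothetical figures, not taken from the study.
faults = 12          # faults per year
restart_s = 90.0     # non-HA server: restart the failed process
failover_s = 120.0   # HA cluster: detect the fault and fail over

print(downtime_per_year(faults, restart_s))        # 1080.0 seconds without HA
print(downtime_per_year(faults, failover_s))       # 1440.0 seconds with HA
print(ha_reduces_downtime(failover_s, restart_s))  # False
```

With these assumed numbers the cluster adds six minutes of downtime per year, which is the kind of case the last conclusion describes.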

Q & A