Introduction to High Availability H6487S I.02 Module 1
What Causes a System to Go Down? [Diagram labels: Up, Down; downtime is either Planned or Unplanned] H6487S I.02 © 2003 Hewlett-Packard Development Company, L.P.
Causes of Failures [Pie chart; segment labels: Application Failure, Hardware, IT Processes, Operator Errors; segment values: 20%, 40%, 40%] Source: Gartner Group, October 1999
Not a Big Deal? You Tell Me!
Average Cost per Hour of Downtime
  Financial - Brokerage Operations: $6.45 Million
  Financial - Credit Card Sales: $2.6 Million
  Media - Pay per view
  Retail - Home Shopping (TV)
  Retail - Home Catalog Sales
  Transportation - Airline reservation
  Media - Teleticket sales
  Transportation - Package shipping
  Finance - ATM fees
  [Bar chart; axis ticks: $100,000, $200,000, $300,000, Millions]
Source: Dataquest Perspective, Sept. 1996
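To put the hourly figures in perspective, the cost of a single outage can be estimated by scaling the hourly cost by the outage duration. The sketch below is illustrative only: the function name `outage_cost` is hypothetical, and it assumes cost accrues linearly with time, which the slide does not state.

```python
def outage_cost(cost_per_hour: float, outage_minutes: float) -> float:
    """Estimate the cost of one outage.

    Assumption (not from the slide): cost grows linearly with duration.
    """
    if cost_per_hour < 0 or outage_minutes < 0:
        raise ValueError("cost and duration must be non-negative")
    return cost_per_hour * outage_minutes / 60.0


if __name__ == "__main__":
    # Hourly figures taken from the slide (Dataquest Perspective, Sept. 1996).
    brokerage_per_hour = 6_450_000
    credit_card_per_hour = 2_600_000

    # The outage durations below are example inputs, not data from the slide.
    print(f"Brokerage, 5-minute outage:    ${outage_cost(brokerage_per_hour, 5):,.0f}")
    print(f"Credit card, 30-minute outage: ${outage_cost(credit_card_per_hour, 30):,.0f}")
```

Under this linear assumption, even a five-minute interruption to brokerage operations costs roughly half a million dollars.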
What Is High Availability?
A system is highly available if a single component or resource failure interrupts the system for only a brief time.
  What is a system? (Computer? Network? Application?)
  What is a resource? (Hardware? Software? OS? Database?)
  What is a failure? (Disk crash? Too many packets? Full file system?)
  What is an interruption? (Reboot? User reconnect? Poor performance?)
  What is a brief time? (Minutes? Hours? Days?)
HIGH AVAILABILITY IS A DESIGN! Depends on the viewpoint . . .
This definition of High Availability (HA) is very general and requires expanding the terms system, resource, failure, interrupts, and brief. Their meaning varies with the viewpoint taken. For example, system and resource tend to mean ‘hardware’ to an administrator. To an application user, however, the ‘killing’ of a database process is a failure that results in the loss of availability of a ‘soft’ resource. The ‘system’ is more than the hardware: it includes the operating system, application processes, data, AND the ability of users to connect to and use the resources.
Some failures can be handled transparently, without any interruption (e.g. a disk failure in a RAID array, or a single-bit memory error); others can result in complete loss of service and a restart. The priority of an HA system is to minimise the duration of the interruption (typically tens of minutes) and, by design, to reduce it to zero for many types of failure. The focus is TIME.
A well-designed HA configuration can also provide an excellent control environment for applications, helping to reduce the interruptions to service caused by system upgrades and backups.
Computer System Availability
  System: Computer
  Resources: CPU, memory, disk
  Failures: System crash, disk failure
  Interruption: System reboots, replace failed hardware
  Outage Time: Minutes to days
Network Availability
  System: Network
  Resources: Computers, routers, hubs, LAN cables, backbone, modems, phone lines
  Failures: Failed network hardware, bad cables, high packet collision rate
  Interruption: Slow user response, user reconnects, replace failed hardware
  Outage Time: Minutes to days
Application Availability
  System: Application
  Resources: Computers, networks, operating system resources
  Failures: System crash, network component failure, full file system, performance paralysis
  Interruption: Slow response time, system reboots, replace failed hardware
  Outage Time: Minutes to days
Three Pillars of High Availability [Diagram labels: High Availability Alliances, Support Partnerships, IT Processes and People, Technology Infrastructure]
5nines Support Partnerships
High Availability Terms Downtime Unplanned Outage Availability Fault Tolerant
High Availability Percentages
  Availability (%)    Total Down Time
  99.999              5 minutes
  99.99               50 minutes
  99.95               4.3 hours
  99.90               8.8 hours
  99.86               12 hours
  99.73               24 hours
  99.00               3.6 days
  98                  7.2 days
  97                  10.8 days
  96                  14.4 days
  95                  18 days
  Type of System categories: Fault Tolerant, Top High Availability, Median High Avail, HP Standard Avail, Most Standard Avail
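The downtime figures on this slide appear to be totals accumulated over one year of continuous (24x7) operation, rounded for presentation. The following sketch shows the underlying arithmetic. The function names and the 365-day year are assumptions made for illustration, so its results differ slightly from the rounded slide values.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, service required 24x7


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the hours of downtime per year implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


def describe(hours: float) -> str:
    """Format a duration in the most readable unit (minutes, hours, or days)."""
    if hours < 1.0:
        return f"{hours * 60.0:.1f} minutes"
    if hours < 48.0:
        return f"{hours:.1f} hours"
    return f"{hours / 24.0:.1f} days"


if __name__ == "__main__":
    # Availability levels listed on the slide.
    levels = (99.999, 99.99, 99.95, 99.90, 99.86, 99.73, 99.00, 98, 97, 96, 95)
    for pct in levels:
        print(f"{pct:>7}%  ->  {describe(annual_downtime_hours(pct))} per year")
```

For example, 99.999% availability ("five nines") works out to about 5.3 minutes of downtime per year, which the slide rounds to 5 minutes.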
Availability Continuum Hierarchy [Diagram; axes: Cost ($$) and Availability; levels shown: continuously available systems, highly available systems, highly resilient systems, reliable systems]