High Availability Design
Ram Dantu
Slides are adapted from various sources, including Cisco and Interwork Inc.
Agenda
- Definitions
- Concepts / calculations
- Examples
- Challenges
Availability as a percentage (1 year = 525,960 minutes, i.e. 365.25 days)
Availability    Unscheduled downtime per year
99%             3 days 15 hours 40 minutes
99.9%           8 hours 46 minutes
99.99%          52 minutes 36 seconds
99.999%         5 minutes 16 seconds
99.9999%        32 seconds
Getting downtime from availability
1 year = 525,960 minutes
Uptime = Availability × Time
Annual uptime = Availability × 525,960 minutes
Annual downtime = 525,960 − Annual uptime
Example, availability = 0.9999:
Annual uptime = 0.9999 × 525,960 = 525,907.404 minutes
Annual downtime = 525,960 − 525,907.404 = 52.596 minutes (about 52 minutes 36 seconds)
In general: Downtime = (1 − Availability) × Time
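A minimal Python sketch of the downtime formula, assuming the 525,960-minute year used on these slides. The helper names `annual_downtime_minutes` and `format_minutes` are illustrative, not from the source. It reproduces the 0.9999 example and the rows of the availability table.

```python
MINUTES_PER_YEAR = 525_960  # 365.25 days x 1,440 minutes, as on the slides

def annual_downtime_minutes(availability: float) -> float:
    """Downtime = (1 - Availability) x Time, with Time = one year in minutes."""
    return (1.0 - availability) * MINUTES_PER_YEAR

def format_minutes(minutes: float) -> str:
    """Break a duration in minutes into days, hours, minutes and seconds."""
    total_seconds = round(minutes * 60)
    days, rem = divmod(total_seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    mins, secs = divmod(rem, 60)
    return f"{days}d {hours}h {mins}m {secs}s"

# Rebuild the availability table row by row.
for a in (0.99, 0.999, 0.9999, 0.99999, 0.999999):
    print(f"{a:.4%}  {format_minutes(annual_downtime_minutes(a))}")
```

Working in minutes keeps the arithmetic identical to the slide; the formatting helper only exists to make the table readable.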
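Availability vs. reliability
Availability reflects the users' perception: can they use the service?
Reliability describes how often individual components fail.
Poor reliability raises maintenance costs but does not necessarily reduce availability, for example when a redundant component takes over.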
Defects per million (DPM)
Availability    DPM
99%             10,000
99.9%           1,000
99.99%          100
99.999%         10
99.9999%        1
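Each DPM value in the table is just the unavailability scaled to one million. A small Python sketch of that conversion, with `defects_per_million` as a hypothetical helper name:

```python
def defects_per_million(availability: float) -> float:
    """DPM = (1 - Availability) x 1,000,000."""
    return (1.0 - availability) * 1_000_000

for a in (0.99, 0.999, 0.9999, 0.99999, 0.999999):
    print(f"{a:.4%} -> {defects_per_million(a):,.0f} DPM")
```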
Calculating availability
Availability = MTBF / (MTBF + MTTR)
(MTBF = mean time between failures, MTTR = mean time to repair)
Example: 6500 chassis
MTBF = 369,897 hours (about 42 years)
MTTR = 4 hours
Availability = 369,897 / (369,897 + 4) = 369,897 / 369,901 = 0.9999892 = 99.99892%
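The MTBF/MTTR formula translates directly into code. This is a sketch only; `availability` is an illustrative function name, and the inputs are the 6500 chassis figures from the slide.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

chassis = availability(369_897, 4)  # 6500 chassis: MTBF 369,897 h, MTTR 4 h
print(f"{chassis:.7f} = {chassis:.5%}")  # 0.9999892 = 99.99892%
```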
6500 availability by module
Module                              Availability
Chassis                             99.99892%
Power supply                        99.99873%
Supervisor (including software)     99.99516%
GBIC line card                      99.99577%
GBIC                                99.99907%
10/100 line card                    99.99577%
Availability formulas: serial availability
(Diagram: components A, B and C in series, each 99.999%)
Availability = AvailA × AvailB × AvailC = 99.999% × 99.999% × 99.999% = 99.997%
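In a serial chain every component must be up, so the availabilities multiply. A minimal Python sketch, where `serial` is an illustrative name; like the slide's formula, it assumes the components fail independently.

```python
from math import prod

def serial(*avails: float) -> float:
    """Every component must be up: multiply the availabilities (independent failures assumed)."""
    return prod(avails)

print(f"{serial(0.99999, 0.99999, 0.99999):.3%}")  # three 99.999% components -> 99.997%
```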
Availability formulas: parallel availability
(Diagram: components A and B in parallel, each 99.9%)
Availability = 1 − ((1 − AvailA) × (1 − AvailB)) = 1 − ((1 − 99.9%) × (1 − 99.9%)) = 99.9999%
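A parallel group is down only when every member is down, so the unavailabilities multiply. A Python sketch under the same independence assumption as the formula above; `parallel` is an illustrative name.

```python
from math import prod

def parallel(*avails: float) -> float:
    """Down only if every redundant component is down (independent failures assumed)."""
    return 1.0 - prod(1.0 - a for a in avails)

print(f"{parallel(0.999, 0.999):.4%}")  # two 99.9% components -> 99.9999%
```

The independence assumption matters: redundant units that share a power feed, a software bug or a fiber path fail together and do better than a single unit by much less than this formula suggests.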
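Availability formulas: parallel-series availability
(Diagram: six components A through F, each 99.9%, grouped into three parallel pairs that are connected in series)
Each parallel pair = 1 − ((1 − 99.9%) × (1 − 99.9%)) = 99.9999%
Availability = 99.9999% × 99.9999% × 99.9999% = 99.9997%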
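Availability formulas: series-parallel availability
(Diagram: the same six 99.9% components A through F, arranged as two chains of three components in series, with the two chains in parallel)
Each chain = 99.9% × 99.9% × 99.9% ≈ 99.7%
Availability = 1 − ((1 − 99.7%) × (1 − 99.7%)) = 99.9991%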
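The two previous arrangements use the same six 99.9% components. Composing the serial and parallel rules in Python shows why redundancy at the component level (parallel-series) beats duplicating whole chains (series-parallel). This is a sketch that assumes independent failures; the helper names are illustrative.

```python
from math import prod

def serial(*avails):
    return prod(avails)

def parallel(*avails):
    return 1.0 - prod(1.0 - a for a in avails)

A = 0.999  # every component A..F is 99.9%, as on the slides

# Parallel-series: three redundant pairs, connected in series.
parallel_series = serial(*(parallel(A, A) for _ in range(3)))

# Series-parallel: two chains of three components, the chains in parallel.
series_parallel = parallel(serial(A, A, A), serial(A, A, A))

print(f"parallel-series: {parallel_series:.4%}")  # 99.9997%
print(f"series-parallel: {series_parallel:.4%}")  # 99.9991%
```

In the series-parallel layout, a single failure takes out a whole chain, so a second failure anywhere in the surviving chain brings the system down.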
Network hierarchy: core, distribution and access layers (diagram)
One core/distribution 6500
Chassis: 99.99892%
Dual power supplies: 99.99999%
Supervisor + software: 99.99516%
Dual GBIC line cards
Dual GBICs
Single switch availability: 99.99405%
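A Python sketch of the single-switch calculation, using the per-module values from the 6500 module table and treating each "dual" item as a parallel pair and the switch as a serial chain. Independent failures are assumed, and the constant names are illustrative. The result agrees with the slide's 99.99405% to within about 3 × 10⁻⁷; the last digits depend on how the published module figures were rounded.

```python
from math import prod

def serial(*avails):
    return prod(avails)

def parallel(*avails):
    return 1.0 - prod(1.0 - a for a in avails)

# Per-module availabilities from the 6500 module table.
CHASSIS = 0.9999892
POWER_SUPPLY = 0.9999873
SUPERVISOR = 0.9999516       # includes software
GBIC_LINE_CARD = 0.9999577
GBIC = 0.9999907

one_switch = serial(
    CHASSIS,
    parallel(POWER_SUPPLY, POWER_SUPPLY),      # dual power
    SUPERVISOR,                                # single supervisor
    parallel(GBIC_LINE_CARD, GBIC_LINE_CARD),  # dual GBIC line cards
    parallel(GBIC, GBIC),                      # dual GBICs
)
print(f"{one_switch:.5%}")
```

The non-redundant chassis and supervisor account for almost all of the unavailability; the duplicated modules contribute only on the order of 10⁻⁹.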
Pair of core/distribution 6500s: series-parallel availability
Each switch: 99.99405% (series availability)
Two switches in parallel: 1 − ((1 − 99.99405%) × (1 − 99.99405%)) = 99.999999%
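To put the pair's availability in perspective, this sketch converts both the single switch and the redundant pair into expected downtime per year. It assumes the 525,960-minute year from the earlier slides and independent switch failures; the names are illustrative.

```python
MINUTES_PER_YEAR = 525_960

def parallel(a: float, b: float) -> float:
    """Two redundant units: down only if both are down (independence assumed)."""
    return 1.0 - (1.0 - a) * (1.0 - b)

single = 0.9999405  # one core/distribution switch, from the slide
pair = parallel(single, single)

for label, a in (("single switch", single), ("switch pair", pair)):
    downtime_s = (1.0 - a) * MINUTES_PER_YEAR * 60
    print(f"{label}: {a:.7%} available, ~{downtime_s:.1f} s downtime per year")
```

One switch works out to roughly half an hour of downtime per year, while the pair is down for well under a second, on paper. In practice, shared failure modes and convergence time dominate at that level.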
Access layer 6500
Chassis: 99.99892%
Dual power supplies: 99.99999%
Dual supervisors
Dual GBIC line cards
Dual GBICs
10/100 line card: 99.99577%
Switch availability: 99.99888%
Access port availability (switch × 10/100 line card): 99.99465%
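The access-layer numbers follow the same composition, except that a host's port sits on a single, non-redundant 10/100 line card. A Python sketch using the module-table values, assuming independent failures (names illustrative). It lands within about 4 × 10⁻⁷ of the slide's switch and port figures.

```python
from math import prod

def serial(*avails):
    return prod(avails)

def parallel(*avails):
    return 1.0 - prod(1.0 - a for a in avails)

CHASSIS, POWER_SUPPLY, SUPERVISOR = 0.9999892, 0.9999873, 0.9999516
GBIC_LINE_CARD, GBIC, LINE_CARD_10_100 = 0.9999577, 0.9999907, 0.9999577

switch = serial(
    CHASSIS,
    parallel(POWER_SUPPLY, POWER_SUPPLY),
    parallel(SUPERVISOR, SUPERVISOR),          # dual supervisors here
    parallel(GBIC_LINE_CARD, GBIC_LINE_CARD),
    parallel(GBIC, GBIC),
)
# A host port depends on one switch AND one non-redundant 10/100 line card.
access_port = serial(switch, LINE_CARD_10_100)
print(f"switch: {switch:.5%}, access port: {access_port:.5%}")
```

The single 10/100 line card costs far more availability than everything else in the switch combined, which is why the next slide turns to server-side redundancy.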
Challenges
Improving availability at the access layer:
- "NIC teaming" in servers
- Reduce MTTR
MTBF = 369,897 hours
Availability with MTTR of 4 hours = 99.99892%
Availability with MTTR of 2 hours = 99.99946%
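Because unavailability is roughly MTTR / MTBF when MTBF is much larger than MTTR, halving the repair time roughly halves the annual downtime. A Python sketch using the slide's MTBF and the 525,960-minute year (names illustrative):

```python
MTBF_HOURS = 369_897
MINUTES_PER_YEAR = 525_960

def availability(mtbf: float, mttr: float) -> float:
    return mtbf / (mtbf + mttr)

for mttr in (4, 2):
    a = availability(MTBF_HOURS, mttr)
    downtime_min = (1.0 - a) * MINUTES_PER_YEAR
    print(f"MTTR {mttr} h: {a:.5%} available, ~{downtime_min:.1f} min downtime per year")
```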
Challenges: long convergence times
Spanning tree
- Eliminate layer-2 links where possible
- Avoid layer-2 loops
- Use STP enhancements where appropriate
Routing protocols
- Use a link-state (or EIGRP) routing protocol
- Use routing convergence enhancements
- Minimize routing table sizes
- Limit convergence scope