Availability of IP/MPLS networks

Presentation transcript:

Availability of IP/MPLS networks Sanjay Kalra October 2002

Agenda: Introduction; How to measure Availability; Network Design example; One Router vs. Two Routers; Software Dependability; Summary

Definition of Availability. Availability is the probability that an item will be able to perform its designed functions at the stated performance level, within the stated conditions and in the stated environment, when called upon to do so.
$$\text{Availability} = \frac{\text{Reliability}}{\text{Reliability} + \text{Recovery}}$$
11/13/2018

Quantification
Percent Availability   N-Nines   Downtime (Minutes/Year, rounded)
99%                    2-Nines   5,000 Min/Yr
99.9%                  3-Nines   500 Min/Yr
99.99%                 4-Nines   50 Min/Yr
99.999%                5-Nines   5 Min/Yr
99.9999%               6-Nines   .5 Min/Yr
To deploy dependable networks and devices, it is important to define a mechanism for quantifying dependability. The “9’s” terminology is the most familiar to the industry and is widely used to measure the availability of network devices specifically. The 9’s imply the amount of inherent downtime. Downtime is typically specified in Telcordia requirements such as GR-1110-CORE, Broadband Switching System (BSS) Generic Requirements. The 9’s provide an operational target to which networks and devices can be managed.
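The downtime column can be reproduced from the percent availability. The following is a minimal sketch, assuming a 365-day year (525,600 minutes); the function name is illustrative and not from the slides. It shows that the slide's figures are rounded (99% works out to about 5,256 minutes per year).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected yearly downtime implied by a percent availability."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * MINUTES_PER_YEAR


# Walk the same rows as the table: 2-Nines through 6-Nines.
for nines, pct in enumerate([99.0, 99.9, 99.99, 99.999, 99.9999], start=2):
    minutes = downtime_minutes_per_year(pct)
    print(f"{nines}-Nines ({pct}%): {minutes:.2f} min/yr")
```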

PSTN: The Yardstick? Individual elements have an availability of 99.99%. One cut-off call in 8,000 calls (3 min for an average call). Five ineffective calls in every 10,000 calls. PSTN end-to-end availability: 99.94%.
[Figure: end-to-end PSTN path with NI, AN, Facility Entrance and LE on each side and LD in between, annotated with per-segment unavailability: NI 0.005% and 0.005%; AN 0.01% and 0.01%; remaining segments 0.005%, 0.005% and 0.02%. These sum to 0.06%, which gives the 99.94% end-to-end figure.]
NI: Network Interface. LE: Local Exchange. LD: Long Distance. AN: Access Network.
Source: http://www.packetcable.com/downloads/specs/pkt-tr-voipar-v01-001128.pdf

Services' Effect on Network Availability. In an IP network, availability is a function of the service being offered. Source: www.t1.org

IP Network Expectations
Columns: Service, Delay, Jitter, Loss, Availability (L: Low, M: Medium, H: High)
Real Time Interactive (VoIP, Cell Relay ..): L H
Layer 2 & Layer 3 VPNs (FR/Ethernet/AAL5): M
Internet Service:
Video Services: L L H

Agenda: Introduction; How to measure Network Availability; Network Design example; One Router vs. Two Routers; Software Dependability; Summary

The Port Method. Based on the port count in the network. Does not take into account the bandwidth of ports, e.g. OC-192 and 64k are both ports. Good for dedicated access service because ports are tied to customers.
$$\text{Availability (\%)} = \frac{(\text{total number of ports} \times \text{sample period}) - (\text{number of impacted ports} \times \text{outage duration})}{\text{total number of ports} \times \text{sample period}} \times 100$$

The Port Method Example. Network with 10,000 active access ports. An access router with 100 access ports fails for 30 minutes. Total available port-hours = 10,000*24 = 240,000. Total down port-hours = 100*.5 = 50. Availability for a single day = ((240,000-50)/240,000)*100 = 99.9792%.
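The port-method arithmetic above can be checked with a few lines of code. This is a small sketch only; the function and parameter names are illustrative and do not come from the slides.

```python
def port_availability(total_ports: int, sample_hours: float,
                      impacted_ports: int, outage_hours: float) -> float:
    """Percent availability by the port method (port-hours up / port-hours total)."""
    total_port_hours = total_ports * sample_hours
    down_port_hours = impacted_ports * outage_hours
    return (total_port_hours - down_port_hours) / total_port_hours * 100.0


# Slide example: 10,000 ports over one day, 100 ports down for half an hour.
daily = port_availability(10_000, 24, 100, 0.5)
print(f"Availability for a single day: {daily:.4f} %")
```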

The Bandwidth Method. Based on the amount of bandwidth available in the network. Takes into account the bandwidth of ports. Good for core routers.
$$\text{Availability (\%)} = \frac{(\text{total amount of BW} \times \text{sample period}) - (\text{amount of BW impacted} \times \text{outage duration})}{\text{total amount of BW} \times \text{sample period}} \times 100$$

The Bandwidth Method Example. Total capacity of network: 100 Gigabits/sec. An access router with 1 Gigabit/sec of BW fails for 30 minutes. Total BW available in network for a day = 100*24 = 2,400 Gigabit/sec-hours. Total BW lost in outage = 1*.5 = 0.5 Gigabit/sec-hours. Availability for a single day = ((2,400-0.5)/2,400)*100 = 99.9792%.
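The port and bandwidth methods agree when all ports carry the same bandwidth, and diverge when they do not. The sketch below illustrates this with a hypothetical mixed network (one OC-192 port at about 9,953.28 Mbit/s plus 99 ports at 64 kbit/s); this network, and the function names, are assumptions made for illustration and are not from the slides.

```python
def port_method(rates, failed, sample_hours, outage_hours):
    """Percent availability counting every port equally."""
    total = len(rates) * sample_hours
    down = len(failed) * outage_hours
    return (total - down) / total * 100.0


def bandwidth_method(rates, failed, sample_hours, outage_hours):
    """Percent availability weighting every port by its bandwidth."""
    total = sum(rates) * sample_hours
    down = sum(rates[i] for i in failed) * outage_hours
    return (total - down) / total * 100.0


# Slide example modelled as 100 equal ports of 1 Gbit/s, one failing for 30 min.
uniform = [1.0] * 100
print(bandwidth_method(uniform, [0], 24, 0.5))  # matches the port method here

# Hypothetical mixed network, rates in Mbit/s: one OC-192 and 99 x 64k ports.
mixed = [9953.28] + [0.064] * 99
by_port = port_method(mixed, [0], 24, 0.5)
by_bandwidth = bandwidth_method(mixed, [0], 24, 0.5)
print(f"port method: {by_port:.4f} %, bandwidth method: {by_bandwidth:.4f} %")
```

When the single high-speed port fails, the port method barely registers the outage while the bandwidth method reports a much lower availability, which is consistent with the slides' advice to use the bandwidth method for core routers.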

Defects Per Million. Used in PSTN networks, defined as the number of blocked calls per one million calls, averaged over one year.
$$\text{DPM} = \frac{\text{number of impacted customers} \times \text{outage duration}}{\text{total number of customers} \times \text{sample period}} \times 10^{6}$$

Defects Per Million Example. Network with 10,000 active access ports. An access router with 100 access ports fails for 30 minutes. Total available port-hours = 10,000*24 = 240,000. Total down port-hours = 100*.5 = 50. Daily DPM = (50/240,000)*1,000,000 = 208.
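DPM and percent availability are two views of the same ratio. The following sketch, with illustrative names not taken from the slides, reproduces the daily DPM above and converts it back to the percent availability obtained with the port method.

```python
def defects_per_million(impacted: float, outage_hours: float,
                        total: float, sample_hours: float) -> float:
    """Impacted unit-hours as a fraction of total unit-hours, scaled to one million."""
    return (impacted * outage_hours) / (total * sample_hours) * 1_000_000


dpm = defects_per_million(100, 0.5, 10_000, 24)

# One DPM is one millionth, i.e. 0.0001 percentage points of unavailability.
availability_percent = 100.0 - dpm / 10_000

print(f"Daily DPM: {dpm:.0f}")
print(f"Equivalent availability: {availability_percent:.4f} %")
```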

Agenda: Introduction; How to measure Availability; Network Design example; One Router vs. Two Routers; Software Dependability; Summary

Calculating Availability: Series. Multiplicative method: $A_s = E_1 \times E_2 \times E_3$, e.g. .999999 x .999999 x .999991 = .9999890. Additive method of UA (unavailability): .000001 + .000001 + .000009 = .0000110. The total availability of a series system ($A_s$) is never higher than that of its least available element. One weak link significantly weakens this chain! This calculation shows that fewer elements in the system means higher availability. This is why collapsing the layers out of a PoP makes it a more reliable system. Juniper uses the Markov model to calculate failures: “Markov Model A probability model that uses state transit diagrams to support reliability prediction calculations of complex relationships and dependencies. Is memoryless and uses exponential distribution.”
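Both series methods can be sketched in a few lines. The function names below are illustrative; the additive method is an approximation that drops the tiny cross terms, which is why the two results agree to the digits shown on the slide.

```python
from math import prod


def series_availability(elements):
    """Multiplicative method: every element must be up, so availabilities multiply."""
    return prod(elements)


def series_availability_additive(elements):
    """Additive method: sum the unavailabilities (approximation for small values)."""
    return 1.0 - sum(1.0 - e for e in elements)


elements = [0.999999, 0.999999, 0.999991]
print(f"multiplicative: {series_availability(elements):.7f}")
print(f"additive:       {series_availability_additive(elements):.7f}")
```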

Calculating Availability: Parallel. For 1-out-of-2 redundancy. Additive rule: $A_s = E_1 + E_2 - E_1 E_2$, e.g. $A_s$ = .999999 + .999999 - (.999999 * .999999) = .999999999999. Multiplicative rule: $A_s = 1 - [(1 - E_1)(1 - E_2)]$. Not for parallel systems where both elements are required. The assumption is that switchover time is zero. This is a probability calculation, giving the probability that at least one element is working when the other fails. It assumes that the two elements fail independently of each other.
References. Standard methods: Mil-HDBK-217; Telcordia SR-332. Other methods and databases: NSWC-94/L07: Navy mechanical reliability method; CNET 93 / 98: France Telecom mechanical reliability method; HRD5: British Telecom mechanical reliability method; IEEE 1413: New standard for reliability prediction, 1998 rev. 1; NPRD/EPRD: Reliability Analysis Center (RAC) failure rate data.
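The two parallel rules are algebraically the same, which a short sketch can confirm. The function name is illustrative, and the code carries the slide's assumptions: independent failures and zero switchover time.

```python
def parallel_availability(elements):
    """1-out-of-N redundancy: the system is down only if every element is down.

    Assumes independent failures and zero switchover time.
    """
    all_down = 1.0
    for e in elements:
        all_down *= (1.0 - e)
    return 1.0 - all_down


e1 = e2 = 0.999999
multiplicative = parallel_availability([e1, e2])
additive = e1 + e2 - e1 * e2  # additive rule for two elements

print(f"multiplicative rule: {multiplicative:.12f}")
print(f"additive rule:       {additive:.12f}")
```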

System Calculation: Series. Simple E-3 network, with one E-3 trunk.
[Figure: series chain from the Server through numbered elements 1 to 5 and two ATM nodes over an E-3 trunk. Availabilities (%) shown: 99.98, 99.99, 99.992, 99.992, 99.95, and 99.9959 (five times).]
Availability: 99.8835%. Yearly downtime = (1 - Availability) * 525,600 minutes/year.
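The end-to-end figure and the yearly downtime can be reproduced as follows. This sketch assumes that all ten availability values on the slide belong to elements in series; the mapping of values to individual elements in the original diagram is not recoverable, but it does not affect a series product.

```python
from math import prod

MINUTES_PER_YEAR = 525_600

# Ten availabilities (%) from the slide, assumed to be in series.
availabilities_percent = [99.98, 99.99, 99.992, 99.992, 99.95] + [99.9959] * 5

availability = prod(a / 100.0 for a in availabilities_percent)
yearly_downtime = (1.0 - availability) * MINUTES_PER_YEAR

print(f"End-to-end availability: {availability * 100:.4f} %")
print(f"Yearly downtime: {yearly_downtime:.0f} minutes")
```

The product agrees with the slide's 99.8835% to within rounding and corresponds to roughly ten hours of downtime per year.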

System Calculation: Parallel (1).
[Figure: two parallel systems between the data centre and the customer. Labels: Internet Gateway, Data Centre, Core, Edge, ATM Hub, Server, CPE; links E-3, STM-1, STM-16. Element availabilities (%) shown: 99.9750, 99.9563, 99.9831, 99.9845, 99.8200, 99.9932, 99.95, 99.975, 99.82.]
System 1 availability: 99.6341%. System 2 availability: 99.4311%. S1 & S2 network: 99.9979%. Availability, Data Centre to Customer CPE: 99.9661%.
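As a check, the combined S1 & S2 figure follows from the parallel (multiplicative) rule applied to the two system availabilities, assuming the two systems fail independently:

$$A_{S1 \parallel S2} = 1 - (1 - 0.996341)(1 - 0.994311) = 1 - (0.003659)(0.005689) \approx 1 - 2.08 \times 10^{-5} \approx 0.999979$$

That is 99.9979%, matching the slide. The lower Data Centre to Customer CPE figure (99.9661%) is consistent with additional elements sitting in series with the parallel pair.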

System Calculation: Parallel (2).
[Figure: the same two parallel systems after redesign. Labels: Internet Gateway, Data Center, Core, Edge, Server, CPE; links NxE-1, E-3, STM-1, STM-16. Element availabilities (%) shown include 99.9831, 99.9845, 99.9932, 99.999, 99.9850, 99.975 and 99.82.]
System 1 availability: 99.6958% (was 99.6341%). System 2 availability: 99.4828% (was 99.4311%). Availability, Data Centre to Customer CPE: 99.9974% (was 99.9661%). 3 9’s to 4 9’s!

Agenda: Introduction; How to measure Availability; Network Design example; One Router vs. Two Routers; Software Dependability; Summary

Router Redundancy. Typical network designs have 2 routers for: redundancy; capacity planning. Redundancy in routers: power supply; fans; routing engines; switching planes; forwarding plane. Do we still need two routers, or is one enough?

One Router Versus Two Routers.
One router with full internal redundancy (redundant control plane, forwarding plane, power supply, fan, line card): router availability 99.99979%. Link availability = router availability = 99.99979%.
Two routers with no redundancy at router level (99.99015% each), OC-48 LH links: link availability = parallel system availability = 99.999999%.
HW cost of the two-router configuration is 110% of the one-router configuration.
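The comparison can be put in terms of expected downtime. The sketch below uses the two availability figures from the slide and the parallel rule; it assumes independent router failures, zero switchover time and a 525,600-minute year, and the variable names are illustrative.

```python
MINUTES_PER_YEAR = 525_600


def parallel(a: float, b: float) -> float:
    """Availability of two independent elements where either one suffices."""
    return 1.0 - (1.0 - a) * (1.0 - b)


def yearly_downtime(availability: float) -> float:
    """Expected downtime in minutes per year."""
    return (1.0 - availability) * MINUTES_PER_YEAR


single_redundant = 0.9999979                    # one router, full internal redundancy
pair_nonredundant = parallel(0.9999015, 0.9999015)  # two routers, no internal redundancy

print(f"one router:  {single_redundant * 100:.6f} %, "
      f"{yearly_downtime(single_redundant):.3f} min/yr")
print(f"two routers: {pair_nonredundant * 100:.6f} %, "
      f"{yearly_downtime(pair_nonredundant):.3f} min/yr")
```

On these numbers the two-router design has the lower expected downtime, which is the trade-off the slide sets against its 110% hardware cost.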

One Router Advantages. Cost savings. Lower OPEX. Faster convergence. For some PE routers a single router might be the only option: service state is maintained on a per-flow basis for some network-based services (e.g. firewall, NAT); TDM links are usually connected to a single edge router; a lot of customers terminate on a single router.

One Router Disadvantages. Single point of failure. Configuration and upgrade have to be exact. Capacity management has to be exact. The main cost of a router is the line cards, not the chassis. What if there is a DoS attack against the router?

One Router Disadvantages (continued). Physical maintenance is not possible without downtime (e.g. a location change). Protection against link failure is still needed. Physical separation to protect against natural disasters is not possible. Networks have always been designed with two routers!

Agenda: Introduction; How to measure Availability; Network Design example; One Router vs. Two Routers; Software Dependability; Summary

SW to HW Reliability Differences. Software reliability is not a function of manufacturing. Software does not degrade over time. Physical environmental changes have no effect. All software failures are the result of design/user errors.

SW to HW Reliability Differences (continued). Software can only be repaired by redesign. MTTR is not measurable since code must be rewritten to fix a bug. Software bugs can be highly contagious. The science of software correctness is still immature and is difficult to apply to software as complex and quickly changing as IP routing.

Agenda: Introduction; How to measure Availability; Network Design example; One Router vs. Two Routers; Software Dependability; Summary

Summary. There is no standard way to measure IP availability. Availability in IP networks depends on the service being offered. The one vs. two routers choice depends on requirements. A lot of development is happening in IP networks to improve availability: Graceful Restart, NSF, Fast Reroute …
IP dependability is a broad subject and there are challenges to its implementation, including:
Conflict between device availability and network availability. Devices still fail the single-point-of-failure analysis, so device-level availability alone is not enough. Network-level considerations (such as VRRP, MPLS fast reroute, etc.) are important.
Conflict between availability and reliability. Simple devices are inherently more reliable, but a system with redundancies, backup, alternate paths and switching is not inherently simple. This means redundant systems require design expertise.
Conflict between cost of downtime and cost of availability. Downtime is expensive. Available systems and networks can also be expensive. Where is the intersection between these two?