DS - IX - NFT - 0 HUMBOLDT-UNIVERSITÄT ZU BERLIN, INSTITUT FÜR INFORMATIK
DEPENDABLE SYSTEMS
LECTURE 9: NETWORK FAULT TOLERANCE
WINTER SEMESTER 99/00
LECTURER: Prof. Dr. Miroslaw Malek

DS - IX - NFT - 1 NETWORK FAULT TOLERANCE
OBJECTIVES:
  – TO INTRODUCE FAULT TOLERANCE TECHNIQUES USED IN COMPUTER NETWORKS
CONTENTS:
  – COMPUTER NETWORKS
  – BASIC TECHNIQUES
  – EXAMPLE: MULTISTAGE NETWORKS

DS - IX - NFT - 2 COMPUTER NETWORKS
PACKET SWITCHING VS. CIRCUIT SWITCHING
POINT-TO-POINT VS. INDIRECT
STATIC VS. DYNAMIC
SINGLE PATH VS. MULTIPATH
EXAMPLES:
  – BUS
  – RING
  – MULTISTAGE (e.g., BANYAN)
  – CUBE
  – STAR
  – TREE

DS - IX - NFT - 3 BASIC TECHNIQUES
RETRY (RETRANSMISSION)
COMPLEMENTED RETRY WITH CORRECTION
REPLICATION (e.g., DUAL BUS)
CODING
SPECIAL PROTOCOLS (SINGLE HANDSHAKE, DOUBLE HANDSHAKE, ETC.)
TIMING CHECKS
REROUTING
RETRANSMISSION WITH SHIFT (INTELLIGENT RETRY)
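The slide only names complemented retry with correction, so the following is a minimal sketch of one common reading of it, not the lecture's own design. It assumes a 9-bit word (8 data bits plus one even-parity bit) and a channel with a single line stuck at 0 or 1. All function names are hypothetical. On a parity error the sender retransmits the bitwise complement. The stuck line then happens to carry the correct value, so inverting the second copy restores the original word.

```python
WIDTH = 9                    # assumed: 8 data bits plus 1 parity bit
MASK = (1 << WIDTH) - 1


def add_parity(data: int) -> int:
    """Append an even-parity bit to an 8-bit value, giving a 9-bit codeword."""
    parity = bin(data & 0xFF).count("1") & 1
    return ((data & 0xFF) << 1) | parity


def parity_ok(word: int) -> bool:
    """A codeword is valid when it holds an even number of 1 bits."""
    return bin(word & MASK).count("1") % 2 == 0


def stuck_line(word: int, bit: int, value: int) -> int:
    """Hypothetical channel with one line permanently stuck at 0 or 1."""
    if value:
        return (word | (1 << bit)) & MASK
    return word & ~(1 << bit) & MASK


def receive_with_complemented_retry(data: int, bit: int, value: int) -> int:
    """Send a codeword; on a parity error, resend it complemented and invert."""
    codeword = add_parity(data)
    first = stuck_line(codeword, bit, value)
    if parity_ok(first):
        return first >> 1                  # first attempt arrived intact
    second = stuck_line(~codeword & MASK, bit, value)
    recovered = ~second & MASK             # undo the complement
    if not parity_ok(recovered):
        raise IOError("unrecoverable: more than one faulty line?")
    return recovered >> 1
```

Under the single stuck-line assumption, exactly one of the two copies is correct, which is why a plain parity check is enough to pick the right one. A plain retry would fail here because the same line corrupts the same bit again.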

DS - IX - NFT - 4 EXAMPLE: MULTICOMPUTER NETWORKS (1)
OBJECTIVE:
  – RELIABLE, TIMELY, HIGH-BANDWIDTH DATA TRANSFER
ISSUES:
  – FAULT IMPACT
  – RELIABILITY EVALUATION
  – TESTING
  – FAULT DIAGNOSIS
  – RECOVERY
  – FAULT TOLERANCE

DS - IX - NFT - 5 EXAMPLE: MULTICOMPUTER NETWORKS (2)
LEVEL:
  – SWITCH LEVEL: CODES, PROTOCOLS, CONTROL, DATA, TIME
  – SYSTEM LEVEL: CODES, PROTOCOLS, CONTROL, DATA, TIME

DS - IX - NFT - 6 MULTICOMPUTER NETWORK FAULT CLASSES AND THEIR IMPACT
FAULT CLASS I - DATA LINK OR DATA REGISTERS
  – STUCK-AT-0
  – STUCK-AT-1
  – OR-BRIDGE
  – AND-BRIDGE
FAULT CLASS II - CONTROL LINES
  – (DATA VALID LINE) STUCK AT VALID
  – (REQUEST/ACK) STUCK-AT-0, STUCK-AT-1
  – (DATA STROBE) STUCK-AT-1, STUCK-AT-0
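The Class I data-line faults can be illustrated with a small bit-level model. This is a sketch under the usual textbook definitions: a stuck-at line always reads a constant, and a bridge makes two shorted lines both carry the OR (or AND) of their intended values. The function names and the integer word representation are assumptions for illustration.

```python
def stuck_at(word: int, bit: int, value: int) -> int:
    """Force one line of the word to a constant 0 or 1."""
    return word | (1 << bit) if value else word & ~(1 << bit)


def bridge(word: int, a: int, b: int, kind: str) -> int:
    """Short lines a and b; both then carry the OR (or AND) of the pair."""
    bit_a = (word >> a) & 1
    bit_b = (word >> b) & 1
    shared = (bit_a | bit_b) if kind == "or" else (bit_a & bit_b)
    cleared = word & ~((1 << a) | (1 << b))    # remove the original two bits
    return cleared | (shared << a) | (shared << b)
```

For the word 0b0101, a stuck-at-1 fault on line 1 and an OR-bridge between lines 0 and 1 both yield 0b0111, which shows that different physical faults can produce the same observed error.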

DS - IX - NFT - 7 FAULT IMPACT
DATA BIT ERROR
  – NO IMMEDIATE IMPACT, BUT THE ERROR WILL SHOW UP AT HIGHER LEVELS LATER. IT MAY BE OUTSIDE THE SPHERE OF CONTROL WHEN DETECTED.
ADDRESS TAG ERROR
  – THE DATA PACKET CANNOT REACH THE INTENDED DESTINATION. THIS MAY CAUSE WRONG DATA TO BE RETRIEVED.
STUCK AT SOME VALID CONFIGURATION
  – THE DATA PACKET WILL BE MISDIRECTED.
OPEN CONNECTION
  – COMPLETE DATA LOSS.
SHORT CONNECTION
  – MAY CAUSE A BROADCASTING EFFECT; THE DATA PACKET IS MISDIRECTED.

DS - IX - NFT - 8 THE FAULT IMPACTS CAN BE GROUPED INTO:
1. CORRUPTED DATA
2. LOST DATA
3. UNEXPECTED DATA
THESE FAULTS CAN BE EXTRACTED FROM THE SWITCH AND MODELED BY A FAULTY CHANNEL THAT WILL CORRUPT, LOSE, OR DELAY DATA TRANSMITTED THROUGH IT.
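The faulty-channel abstraction described above can be sketched as a small simulation object. Everything here is an illustrative assumption rather than the lecture's model: the class and enum names, the single fixed fault mode per channel, the XOR mask used for corruption, and time measured in calls to `receive`.

```python
from collections import deque
from enum import Enum
from typing import Optional


class Fault(Enum):
    NONE = "none"
    CORRUPT = "corrupt"
    LOSE = "lose"
    DELAY = "delay"


class FaultyChannel:
    """Channel that corrupts, loses, or delays every word sent through it."""

    def __init__(self, fault: Fault = Fault.NONE,
                 flip_mask: int = 0x01, delay_slots: int = 2) -> None:
        self.fault = fault
        self.flip_mask = flip_mask        # bits inverted by a CORRUPT fault
        self.delay_slots = delay_slots    # extra time slots for a DELAY fault
        self._queue = deque()             # entries are (ready_time, word)
        self._now = 0

    def send(self, word: int) -> None:
        if self.fault is Fault.LOSE:
            return                        # the word silently disappears
        if self.fault is Fault.CORRUPT:
            word ^= self.flip_mask
        latency = self.delay_slots if self.fault is Fault.DELAY else 0
        self._queue.append((self._now + latency, word))

    def receive(self) -> Optional[int]:
        """Advance one time slot and return a word that is due, if any."""
        result = None
        if self._queue and self._queue[0][0] <= self._now:
            result = self._queue.popleft()[1]
        self._now += 1
        return result
```

A receiver built on top of such a model needs a code to catch corruption and a timeout to catch loss, and it has to tolerate late arrivals, which correspond to the "unexpected data" group.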

DS - IX - NFT - 9 WHERE TO DETECT AND RECOVER
THERE ARE THREE LEVELS WHERE WE CAN PERFORM ERROR DETECTION AND RECOVERY:
1. SWITCH LEVEL
2. PME LEVEL
3. SOFTWARE LEVEL

DS - IX - NFT - 10 SWITCH LEVEL
COSTS THE LEAST (IN TERMS OF COMPUTATION) TO RECOVER
HAS THE HIGHEST COVERAGE; MOST ERRORS ARE WITHIN THE "SPHERE OF CONTROL"
NEEDS EXTRA HARDWARE
THE DESIGN OF THE DETECTION/CORRECTION MECHANISM NEEDS TO CONSIDER IMPLEMENTATION LIMITS SUCH AS LOGIC COMPLEXITY AND I/O PIN USAGE

DS - IX - NFT - 11 LOCALIZED RECOVERY
SINCE 99 PERCENT OF ERRORS ARE "SOFT", RETRY IS AN EFFECTIVE WAY TO RECOVER FROM FAULTS
100 PERCENT COVERAGE OF SINGLE MESSAGE LOSS
REQUIRES ONLY A MODEST NUMBER OF PINS
ERROR CORRECTING CODES HAVE PROHIBITIVE PINOUT (62% OVERHEAD FOR AN 8-BIT DATA CHANNEL)
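The slide does not say which code the 62% figure refers to. One plausible reading, assumed here, is a Hamming code extended to single-error correction and double-error detection (SEC-DED). The sketch below computes the check-bit count under that assumption. The function names are hypothetical.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_overhead(data_bits: int) -> float:
    """Extra lines as a fraction of data lines: Hamming bits plus one parity bit."""
    return (hamming_check_bits(data_bits) + 1) / data_bits
```

For an 8-bit channel this gives 4 Hamming bits plus 1 overall parity bit, so 5 extra lines per 8 data lines, or 62.5%. The same calculation for a 64-bit channel gives 12.5%, so the overhead weighs most heavily on narrow switch datapaths.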

DS - IX - NFT - 12 FAULT TOLERANCE TECHNIQUES FOR GLOBAL RECOVERY
1. DYNAMIC FULL ACCESS (DFA)
  – IF THE NETWORK GRAPH IS MAXIMALLY CONNECTED, RECOVERY IS FEASIBLE
2. MULTIPLE NETWORKS (FAULT TOLERANCE + IMPROVED PERFORMANCE)
  – WITH OR WITHOUT BRIDGES
3. REDUNDANT SWITCHES
4. EXTRA STAGE
5. CODING

DS - IX - NFT - 13 PME LEVEL
THERE ARE 8 BYTES IN ONE REQUEST; THEREFORE, 3 EXTRA BITS MAY BE NEEDED FOR SEQUENCING. ON A 4X4 UNIDIRECTIONAL SWITCH, THIS MEANS 24 MORE PINS.
FOR REQUESTS WHOSE RELATIVE ORDER NEEDS TO BE KEPT, SOME EXTRA BITS ARE NEEDED, OR ELSE SEQUENTIAL CONSISTENCY MAY BE VIOLATED.
ANOTHER WAY TO GET AROUND THIS IS TO ALLOW ONLY ONE OUTSTANDING REQUEST FOR SHARED DATA. HOWEVER, NOT ALL SHARED DATA MAY BE USED FOR SYNCHRONIZATION, SO A FENCE COUNTER SHOULD BE PROVIDED TO LET THE PROGRAMMER DECIDE ON THE NUMBER OF ALLOWED OUTSTANDING REQUESTS.
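The pin count above can be reproduced with a short calculation. This sketch assumes that each of the 8 bytes of a request carries its own sequence number, and that the extra bits are routed through every one of the 4 input and 4 output ports of the switch. Both assumptions are interpretations of the slide, and the function names are hypothetical.

```python
def sequence_bits(units_per_request: int) -> int:
    """Bits needed to number each unit of a request uniquely."""
    return (units_per_request - 1).bit_length()


def extra_pins(seq_bits: int, inputs: int, outputs: int) -> int:
    """Every input and output port carries the extra sequencing lines."""
    return seq_bits * (inputs + outputs)
```

With 8 bytes per request, `sequence_bits(8)` is 3, and `extra_pins(3, 4, 4)` is 24, matching the figures on the slide.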

DS - IX - NFT - 14 SOFTWARE LEVEL
WHEN AN ERROR IS DETECTED, IT MAY BE TOO LATE TO RECOVER.
EVEN IF IT IS STILL POSSIBLE, IT IS OFTEN EXPENSIVE (IN TERMS OF COMPUTATION REQUIRED).
TO BE ABLE TO ROLL BACK, CHECKPOINT INFORMATION HAS TO BE SAVED FREQUENTLY. THIS INCREASES SYSTEM OVERHEAD.
RESTART (OR GLOBAL RESET) IS VERY EXPENSIVE IN TERMS OF TIME.
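Checkpointing and rollback at the software level can be sketched as follows. This is a minimal single-process illustration and not the mechanism of any particular system. The class name is hypothetical, and taking a deep copy of the whole state at every checkpoint is an assumption that makes the overhead easy to see.

```python
import copy


class CheckpointedState:
    """Keeps one saved copy of the state so that work can be rolled back."""

    def __init__(self, state: dict) -> None:
        self.state = state
        self._saved = copy.deepcopy(state)   # initial checkpoint
        self.checkpoints_taken = 1

    def checkpoint(self) -> None:
        """Save the current state; the cost grows with the state size."""
        self._saved = copy.deepcopy(self.state)
        self.checkpoints_taken += 1

    def rollback(self) -> None:
        """Discard everything done since the last checkpoint."""
        self.state = copy.deepcopy(self._saved)
```

Each call to `checkpoint` copies the full state, which is the overhead the slide refers to. Checkpointing less often lowers that cost but means more work is lost on each rollback.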

DS - IX - NFT - 15 OBSERVATIONS
THE IMPACT OF A FAULT ON A MULTISTAGE NETWORK MAY BE SEVERE.
THE FAULT IMPACT DEPENDS ON THE FAULT LOCATION (LEVEL).
A SWITCH FAULT IS OBVIOUSLY MORE SEVERE THAN A LINE FAULT.
AN EXTRA STAGE WILL NOT HELP IF INSTANTANEOUS RECOVERY IS NOT ASSURED.
USE RETRY FOR TRANSIENT AND INTERMITTENT FAULTS.
USE LOCALIZED REROUTING FOR PERMANENT FAULTS.
DFA AND EXTRA STAGE COMBINED MAY PROVIDE A VERY EFFECTIVE SOLUTION IN THE CASE OF MULTIPLE FAULTS.
A FAULT-TOLERANT SWITCHING ELEMENT PROTOCOL AND MINIMIZATION OF ERROR LATENCY ARE CRUCIAL TO SATISFACTORY SYSTEM OPERATION.
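The advice to retry for transient and intermittent faults and to reroute for permanent ones can be combined into a single policy. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the `send` callback returns True on success, `paths` is an ordered list of alternative routes, and a fault counts as permanent once a fixed number of retries has failed.

```python
from typing import Callable, Optional, Sequence


def send_with_recovery(send: Callable[[str, bytes], bool],
                       paths: Sequence[str],
                       message: bytes,
                       max_retries: int = 3) -> Optional[str]:
    """Retry on the current path first; reroute only when retries run out."""
    for path in paths:
        for _attempt in range(1 + max_retries):
            if send(path, message):
                return path            # delivered; report the path used
        # Retries exhausted: treat the fault on this path as permanent.
    return None                        # every path failed
```

Retrying before rerouting keeps recovery local and cheap for the common soft errors, while the alternative paths are used only when a path keeps failing.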