Network Management Lecture 3. Network Faults Hardware Software.


Network Management Lecture 3

Network Faults Hardware Software

Gathering Information to Identify a Problem
The two methods are:
- Event notification: critical network events are transmitted by a network device when a fault condition occurs, e.g. failure of a link or restart of a device. A completely failed device cannot send critical network events.
- Polling: occasional polling of network devices can help find faults in a timely manner. There is a tradeoff between the bandwidth used for polling and the notification time.
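
The polling tradeoff can be put into rough numbers. The sketch below is only illustrative: the device count, the bytes exchanged per poll, and the function names are assumptions, not figures from the lecture.

```python
def polling_load_bps(devices, bytes_per_poll, interval_s):
    """Average bandwidth (bits/s) used when every device is polled once per interval."""
    return devices * bytes_per_poll * 8 / interval_s


def worst_case_detection_s(interval_s):
    """A fault occurring just after a poll goes unnoticed until the next poll."""
    return interval_s


# Hypothetical network: 500 devices, 200 bytes per request/response exchange.
for interval in (10, 60, 300):
    load = polling_load_bps(500, 200, interval)
    delay = worst_case_detection_s(interval)
    print(f"interval={interval}s load={load:.0f} bit/s worst-case detection={delay}s")
```

Shorter intervals shrink the notification time but raise the polling load in direct proportion.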

Fault Management on a Network Management System
- A simple tool (e.g. ping) can point out the existence of a problem but cannot indicate its cause.
- A more complex tool can inform you when it detects a problem, by logging network events or by polling, provided the network devices are sophisticated enough to report network events.

Fault Management on a Network Management System (continued)
- An advanced tool performs quite a bit of fault management, but it doesn’t perform the final step: correcting the problem.
- If the basic steps do not find the fault, then we have to isolate the issue ourselves.
- Example: a mail message not reaching its destination.

Impact of a Fault on the Network
- A fault management tool must be capable of analyzing how a fault can affect other areas of the data network. Only then can it provide a complete fault analysis.
- E.g. “LINK FAILURE between Europe Node and United States Node.”

Impact of a Fault on the Network
- E.g. “LINK FAILURE between Europe Node and United States Node. STOPS DECnet and IBM SNA traffic between Europe and United States.”
- E.g. “LINK FAILURE between Europe Node and United States Node. IMPACT ON DECnet and IBM SNA traffic between Europe and United States.”

Fault Management Process
1. Collect alarms / Detection
2. Filter and correlate alarms / Verification
3. Diagnose faults / Isolation
4. Restoration and repair
5. Evaluate effectiveness

1. Collect Alarms
Types of alarms:
- Physical: failure in communication, e.g. loss of signal, CRC failure
- Logical: statistical values exceed a threshold, e.g. number of packets dropped
Communication with components:
- Control protocol: Simple Network Management Protocol (SNMP)
- Data format: Management Information Base (MIB-II, 1990), which has ~170 manageable objects

Fault Detection
CC (Continuity Check):
- Heartbeat message sent periodically
- Sender does not expect an acknowledgement
- Receiver starts a timer to expect periodic CCs from the sender
- Loss of n consecutive CCs results in failure detection
- Failures detected include hard and soft failures
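
The receiver side of a continuity check might be sketched as follows. This is a minimal illustration, not a protocol implementation: the class name, the parameters, and the use of caller-supplied timestamps instead of a real timer are all assumptions.

```python
class ContinuityCheckMonitor:
    """Receiver-side view of a heartbeat stream (hypothetical helper)."""

    def __init__(self, interval_s, max_missed):
        self.interval_s = interval_s    # expected spacing between CC messages
        self.max_missed = max_missed    # the "n" consecutive losses that mean failure
        self.last_seen = None           # timestamp of the most recent CC

    def on_cc(self, now):
        """Record that a CC message arrived at time `now` (seconds)."""
        self.last_seen = now

    def missed(self, now):
        """Number of whole CC intervals that have elapsed without a message."""
        if self.last_seen is None:
            return 0
        return int((now - self.last_seen) // self.interval_s)

    def failed(self, now):
        """True once n consecutive CCs have been lost."""
        return self.missed(now) >= self.max_missed


monitor = ContinuityCheckMonitor(interval_s=1.0, max_missed=3)
monitor.on_cc(0.0)
print(monitor.failed(2.5))  # two intervals missed so far
print(monitor.failed(3.5))  # three intervals missed
```

Because the sender never waits for an acknowledgement, all of the detection logic lives in the receiver.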

2. Filter and Correlate Alarms
Filter:
- Eliminate redundant alarms
- Suppress noncritical alarms
- Inhibit low-priority alarms in the presence of high-priority alarms
Correlate:
- Analyze and interpret multiple alarms to assign new meaning (derived alarm)
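
The filtering and correlation steps can be sketched in a few lines. The alarm fields, the priority scale (1 is highest), the per-source inhibition rule, and the "both ends of a link report link_down" correlation rule are assumptions chosen for illustration.

```python
from collections import namedtuple

# Hypothetical alarm record; priority 1 is the most urgent.
Alarm = namedtuple("Alarm", "source kind priority")


def filter_alarms(alarms, min_priority=3):
    """Drop duplicates, suppress noncritical alarms, inhibit lower priorities per source."""
    seen, kept = set(), []
    for alarm in alarms:
        if alarm in seen:                   # redundant alarm
            continue
        seen.add(alarm)
        if alarm.priority > min_priority:   # noncritical alarm
            continue
        kept.append(alarm)
    best = {}
    for alarm in kept:
        best[alarm.source] = min(best.get(alarm.source, alarm.priority), alarm.priority)
    # Keep only the highest-priority alarms reported by each source.
    return [a for a in kept if a.priority == best[a.source]]


def correlate(alarms, links):
    """Derived alarm: a link has failed if both of its endpoints report link_down."""
    down = {a.source for a in alarms if a.kind == "link_down"}
    return [f"link {x}-{y} failed" for x, y in links if x in down and y in down]


raw = [
    Alarm("A", "link_down", 1), Alarm("A", "link_down", 1),
    Alarm("A", "high_temp", 2), Alarm("B", "link_down", 1),
    Alarm("C", "fan_noise", 5),
]
filtered = filter_alarms(raw)
print(filtered)
print(correlate(filtered, links=[("A", "B")]))
```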

Fault Verification
Non-intrusive unicast loopback:
- Verifies the detected fault
- Sender sends a request to the receiver and expects a response
- Receiver will typically be the one from whom CCs stopped
- Verification is done via the response

3. Diagnose Faults
- May require additional tests/diagnostics on circuits or components
- Automated or manual
- Analyze all information from alarms, tests, and performance monitoring
- Identify the smallest system module that needs to be repaired or replaced

4. Restoration and Repair
Restoration: continue service in the presence of a fault
- Switch over to spares
- Reroute around the trouble spot
- Restore software or data from backup
Repair:
- Replace parts
- Repair cables
- Debug software
- Retest to verify the fault is eliminated

5. Evaluate Effectiveness
Questions to answer:
- How often do faults occur?
- How many faults affect service?
- How long is service interrupted?
- How long does repair take?
Provides an assessment of:
- Performance of the fault management system
- Reliability of equipment
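
These questions map onto a handful of summary statistics. The sketch below assumes a hypothetical outage log of (start hour, end hour, service affected) tuples; the field layout and the metric names are illustrative choices.

```python
def fault_metrics(outages, period_h):
    """Summarize an outage log covering `period_h` hours of operation."""
    count = len(outages)
    affecting = [o for o in outages if o[2]]
    downtime_h = sum(end - start for start, end, _ in affecting)
    repair_h = sum(end - start for start, end, _ in outages)
    return {
        "faults": count,                                    # how often faults occur
        "service_affecting": len(affecting),                # how many affect service
        "service_downtime_h": downtime_h,                   # how long service is interrupted
        "mean_time_to_repair_h": repair_h / count if count else 0.0,
        "availability": 1 - downtime_h / period_h,
    }


# Hypothetical log: three faults in 1000 hours, two of them service-affecting.
log = [(10, 12, True), (100, 101, False), (200, 205, True)]
print(fault_metrics(log, period_h=1000))
```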

Event Correlation Techniques
Basic elements:
- Detection and filtering of events
- Correlation of observed events using AI
- Localize the source of the problem
- Identify the cause of the problem
Techniques:
- Rule-based reasoning
- Model-based reasoning
- Case-based reasoning
- Codebook correlation model
- State transition graph model
- Finite state machine model

Rule-Based Reasoning

- The rule-based paradigm is an iterative process
- RBR is “brittle” if no precedence exists
- Exponential growth of the knowledge base poses a scalability problem
- Problem with instability near thresholds, e.g. rules that change the alarm level at packet loss of 10% and 15% (alarm red at the top level)
- Solution: use fuzzy logic
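
The instability of a hard threshold, and the way a fuzzy membership softens it, can be illustrated as follows. The 10% and 15% packet-loss figures come from the slide; the linear ramp between them is an assumed membership function, one of many possible choices.

```python
def crisp_red(loss_pct):
    """Hard rule: the alarm is red at 15% packet loss or more."""
    return loss_pct >= 15.0


def fuzzy_red(loss_pct, low=10.0, high=15.0):
    """Degree (0.0 to 1.0) to which the loss counts as 'red' (assumed linear ramp)."""
    if loss_pct <= low:
        return 0.0
    if loss_pct >= high:
        return 1.0
    return (loss_pct - low) / (high - low)


# A measurement hovering around 15% flips the crisp rule on and off,
# while the fuzzy degree changes only slightly.
for loss in (14.9, 15.1, 12.5):
    print(loss, crisp_red(loss), round(fuzzy_red(loss), 2))
```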

Configuration for RBR Example

RBR Example

Model-Based Reasoning
- Object-oriented model
- A model is a representation of the component it models
- A model has attributes and relations to other models
- Relationships between objects are reflected in similar relationships between models

MBR Event Correlator
Example: Hub 1 fails
- The failure is recognized by the Hub 1 model
- The Hub 1 model queries the router model
- If the router model declares a failure, the Hub 1 model declares NO failure
- If the router model declares no failure, the Hub 1 model declares a failure
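
The hub and router exchange can be sketched with two model objects. The class, its attributes, and the rule that an unresponsive upstream model explains a downstream silence are assumptions made for illustration.

```python
class Model:
    """Software model of a network component (hypothetical sketch)."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream   # model of the component this one is reached through
        self.responds = True       # result of the latest poll of the real component

    def diagnose(self):
        """Return the name of the component judged to have failed, or None."""
        if self.responds:
            return None
        if self.upstream is not None:
            culprit = self.upstream.diagnose()
            if culprit is not None:
                return culprit     # the upstream failure explains the silence
        return self.name


router = Model("router")
hub1 = Model("hub1", upstream=router)

hub1.responds = False
print(hub1.diagnose())   # router still answers, so hub1 itself is blamed

router.responds = False
print(hub1.diagnose())   # router is down, so hub1 is not declared failed
```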

Case-Based Reasoning
- Unit of knowledge: a rule in RBR, a case in CBR
- CBR is based on cases experienced before, extended to the current situation by adaptation
- Three adaptation schemes:
  - Parameterized adaptation
  - Abstraction / re-specialization adaptation
  - Critic-based adaptation

CBR: Matching Trouble Ticket Example: File transfer throughput problem

CBR: Parameterized Adaptation
- A = f(F)
- A’ = f(F’)
- The functional relationship f(x) remains the same

CBR: Abstraction / Re-specialization
Two possible resolutions:
- A = f(F): adjust network load level
- B = g(F): adjust bandwidth
The resolution is chosen based on the constraint imposed

CBR: Critic-Based Adaptation
- Human expertise introduces a new case
- N (network load) is an additional parameter added to the functional relationship

CBR-Based Critter

Codebook Correlation Model: Generic Architecture
- Yemini et al. proposed this model
- Monitors capture alarm events
- The configuration model contains the configuration of the network
- The event model represents events and their causal relationships
- The correlator correlates alarm events with the event model and determines the problem that caused the events

Codebook Approach
- Correlation algorithms are based on a coding approach to event correlation
- Problem events are viewed as messages generated by a system and encoded in sets of alarms
- The correlator decodes the problem messages to identify the problems
Approach: two phases
1. Codebook selection phase: the problems to be monitored are identified, and the symptoms they generate are associated with each problem. This generates the codebook (problem-symptom matrix).
2. Correlation phase: the correlator compares alarm events with the codebook and identifies the problem.
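
The second phase can be sketched as a nearest-code lookup. The codebook values below are hypothetical, and choosing the problem whose code has the smallest Hamming distance to the observed alarms is one plausible decoding rule, assumed here because it tolerates a lost or spurious alarm.

```python
# Hypothetical codebook: each problem is encoded by the symptoms it causes.
SYMPTOMS = ["S1", "S2", "S3"]
CODEBOOK = {
    "P1": (1, 1, 0),
    "P2": (0, 1, 1),
    "P3": (1, 0, 1),
}


def hamming(a, b):
    """Number of positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(observed):
    """Return the problem whose code is closest to the observed symptom set."""
    vector = tuple(int(s in observed) for s in SYMPTOMS)
    return min(CODEBOOK, key=lambda p: hamming(CODEBOOK[p], vector))


print(decode({"S1", "S2"}))   # exact match for P1
print(decode({"S2", "S3"}))   # exact match for P2
```

Ties are broken by dictionary order in this sketch; a real correlator would need an explicit policy for ambiguous observations.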

Causality Graph
- Each node is an event
- An event may cause other events
- Directed edges start at a causing event and terminate at a resulting event
- Picture causing events as problems and resulting events as symptoms

Labeled Causality Graph
- Ps are problems and Ss are symptoms
- P1 causes S1 and S2
- Note that the directed edge from S1 to S2 is removed; S2 is caused directly or indirectly (via S1) by P1
- S2 could also be caused by either P2 or P3

Codebook
- The codebook is a problem-symptom matrix
- It is derived from the causality graph after removing the directed edges that represent propagation of symptoms
- Number of symptoms >= number of problems
- 2 rows are adequate to uniquely identify 3 problems

Correlation Matrix
- The correlation matrix is the reduced codebook
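
Reducing a codebook amounts to finding a small set of symptom rows that still gives every problem a distinct code. The sketch below uses a hypothetical four-symptom codebook and a brute-force search; the extra requirement that no problem maps to an all-zero code is an added assumption, so that every problem stays observable.

```python
from itertools import combinations

# Hypothetical codebook: rows are symptoms, columns are problems.
PROBLEMS = ["P1", "P2", "P3"]
ROWS = {
    "S1": {"P1": 1, "P2": 1, "P3": 0},
    "S2": {"P1": 1, "P2": 1, "P3": 1},
    "S3": {"P1": 0, "P2": 1, "P3": 1},
    "S4": {"P1": 0, "P2": 0, "P3": 1},
}


def reduce_codebook(rows, problems):
    """Return the first smallest set of symptoms that still separates all problems."""
    for size in range(1, len(rows) + 1):
        for subset in combinations(rows, size):
            codes = [tuple(rows[s][p] for s in subset) for p in problems]
            distinct = len(set(codes)) == len(codes)
            observable = all(any(code) for code in codes)
            if distinct and observable:
                return list(subset)
    return list(rows)


print(reduce_codebook(ROWS, PROBLEMS))
```

The search is exponential in the number of symptoms, which is acceptable for a small illustration but not for a production codebook.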

Generalized Causality Graph
- The causality graph has 11 events (problems and symptoms)
- Mark all nodes that have only outgoing directed edges as problems: nodes 1, 2, and 11
- The other nodes are symptoms
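
The marking rule is easy to express over an edge list. The edges below are hypothetical, since the slide's figure is not reproduced in the transcript; they are chosen only so that nodes 1, 2, and 11 come out as problems.

```python
def classify(edges):
    """Split nodes into problems (no incoming edges) and symptoms (all other nodes)."""
    nodes = {node for edge in edges for node in edge}
    has_incoming = {dst for _, dst in edges}
    problems = sorted(nodes - has_incoming)
    symptoms = sorted(has_incoming)
    return problems, symptoms


# Hypothetical (cause, effect) edges.
EDGES = [(1, 3), (2, 4), (3, 4), (4, 5), (5, 3), (11, 8), (8, 9)]
print(classify(EDGES))
```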

P-S Causality Graph
To reduce the causality graph to a correlation graph:
- Symptoms 3, 4, and 5 are cyclical: replace them with one symptom, say 3
- S7 and S10 are caused by S3 and S5 and are hence ignored
- S8 causes S9. Keep S9 and eliminate S8; the reason for this would be more obvious if we went through the reduction of the codebook to the correlation matrix

Correlation Graph and Matrix
- Note that problems 1 and 11 produce identical symptoms

State Transition Model
- Used in Seagate’s NerveCenter correlation system
- Integrated in an NMS, such as OpenView
- Used to determine the status of a node

State Transition Model Example
- The NMS pings hubs every minute
- Failure is indicated by the absence of a response
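
A ping-driven state transition model can be sketched as a lookup table. The three states and the rule that two consecutive missed responses mean failure are assumptions for illustration; the slide's own transition graph is not reproduced in the transcript.

```python
# (current state, got a ping response?) -> next state; state names are hypothetical.
TRANSITIONS = {
    ("UP", True): "UP",
    ("UP", False): "SUSPECT",
    ("SUSPECT", True): "UP",
    ("SUSPECT", False): "DOWN",
    ("DOWN", True): "UP",
    ("DOWN", False): "DOWN",
}


def run(responses, state="UP"):
    """Feed one ping result per polling interval and return the final state."""
    for got_response in responses:
        state = TRANSITIONS[(state, got_response)]
    return state


print(run([True, False]))          # one missed ping
print(run([False, False]))         # two missed pings in a row
print(run([False, False, True]))   # the hub answers again
```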

State Transition Graph

Finite State Machine Model
- The finite state machine model is a passive system; the state transition graph model is an active system
- An observer agent is present in each node and reports abnormalities, such as a Web agent
- A central system correlates the events reported by the agents
- Failure is detected by a node entering an illegal state

Reporting
Goals of data collection and reporting:
- Operational management
- Trend analysis of traffic volumes
- Monitoring levels of delivered service
- Monitoring usage patterns

Reporting
- Balance the cost of data collection and analysis against the benefit of the resulting data sets
- Data collection points affect the ability to gather data

Network Reports
- Weekly report of 15-minute link load levels

Network Reports
- Monthly reports
- Quarterly trend reports and projections