Network Monitoring Chu-Sing Yang Department of Electrical Engineering National Cheng Kung University.

Outline Introduction Network monitoring architecture Performance monitoring Fault monitoring Accounting monitoring

Introduction Network monitoring  Observes and analyzes the status and behavior of the end systems, intermediate systems and subnetworks that make up the configuration to be managed Three major design areas for network monitoring  Access to monitored information How to define monitoring information How to get that information from a resource to a manager  Design of monitoring mechanisms How best to obtain information from resources  Application of monitored information How the monitored information is used in various management functional areas

Network-Monitoring Information Static information  Characterizes the current configuration and the elements in the current configuration The number and identification of ports on a router  Is typically generated by the element involved  Is made available to a manager by an agent or a proxy Dynamic information  Is related to events in the network A change of state of a protocol machine Transmission of a packet on a network  Is collected and stored by the network element responsible for the underlying events

Network-Monitoring Information (cont.) Statistical information  Is derived from dynamic information Average no. of packets transmitted per unit time  Is generated by any system that has access to the underlying dynamic information

Monitoring Real-Time System

Network-Monitoring System Monitoring application  Includes the functions of network monitoring that are visible to the user  Performance monitoring, fault monitoring, accounting monitoring Manager function  Is the module at the network monitor  Performs the basic monitoring function of retrieving information from other elements Agent function  Gathers and records management information for one or more network elements  Communicates the information to the monitor Managed objects  Are the management information that represents resources and their activities Monitoring agent  An additional module concerned with statistical information  Generates summaries and statistical analyses of management information

Network Monitoring Configurations

Network monitor  Includes agent software and a set of managed objects To ensure that the monitor continues to perform its function Monitors the load on itself and on the network Monitors the status and behavior of the network monitor  Monitors the amount of network management traffic into and out of the network monitor  External monitors (remote monitors) Include one or more agents that monitor traffic on a network  Proxy agent Used if network elements do not share a common network management protocol with the network monitor

[Figure: Two-Tier Management Communication Model. Labels: Network Management System (Manager, Database); Network (Queries, Unsolicited Events); Network Elements (three Managed Elements, each with an Agent; one Unmanaged Element).]

[Figure: Two-Tier Management Communication, The Real World. Labels: Network Management System (CiscoWorks, HP-OpenView); Network (Queries, Unsolicited Events); Network Elements (Call Manager, Printer, Router, Switch).]

[Figure: Three-Tier Management Communication, The Model. Labels: NMS (Manager, MDB); RMON Probe; Proxy Agent; Network Elements (Managed Element with Agent, Unmanaged Element).]

[Figure: Three-Tier Management Communication, The Real World. Labels: Network Management System (CiscoWorks, Concord eHealth); SwitchProbe; Managed Element (Switch).]

Polling and Event Reporting Information that is useful for network monitoring is collected and stored by agents and made available to one or more manager systems  Polling Is a request-response interaction between a manager and an agent The manager queries any agent and requests the values of various information elements Is used to generate a report on behalf of a user and to respond to specific user queries

Event Reporting Agent may generate a report  Periodically, to give the manager its current status  When a significant or unusual event occurs Manager  Is a listener waiting for incoming information  Preconfigures or sets the reporting period Benefits  Is useful for detecting problems as soon as they occur  Is more efficient than polling for monitoring objects whose states or values change relatively infrequently

Polling Manager  Queries any agent and requests the values of various information elements  Learns about the configuration it is managing  Periodically obtains an update of conditions  Investigates an area in detail after being alerted to a problem Agent  Responds with information from its MIB  Reports information matching certain criteria  Supplies the manager with information about the structure of the MIB at the agent

Polling vs. Event Reporting Factors in the choice  The amount of network traffic generated by each method  Robustness in critical situations  The time delay in notifying the network manager  The amount of processing in managed devices  The tradeoffs of reliable versus unreliable transfer  The network-monitoring applications being supported  The contingencies required in case a notifying device fails before sending a report In general  SNMP approach: polling  Telecommunications management systems: both
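The two interaction styles described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real management protocol: the class names `Agent` and `Manager`, the object name `error_count`, and the threshold mechanism are all assumptions made for the example. Polling is modeled as the manager calling `get` on the agent, and event reporting as the agent calling back a listener when a value crosses a preconfigured limit.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical callback type: (agent name, object name, new value).
Listener = Callable[[str, str, float], None]


class Agent:
    """Holds a tiny MIB-like dictionary for one network element."""

    def __init__(self, name: str, mib: Dict[str, float]) -> None:
        self.name = name
        self.mib = dict(mib)
        self._listener: Optional[Listener] = None
        self._thresholds: Dict[str, float] = {}

    def get(self, object_name: str) -> float:
        """Polling: answer a manager's request with the stored value."""
        return self.mib[object_name]

    def subscribe(self, listener: Listener, thresholds: Dict[str, float]) -> None:
        """Event reporting: the manager preconfigures what counts as an event."""
        self._listener = listener
        self._thresholds = dict(thresholds)

    def update(self, object_name: str, value: float) -> None:
        """Record a new value and send an unsolicited report if it exceeds its limit."""
        self.mib[object_name] = value
        limit = self._thresholds.get(object_name)
        if self._listener is not None and limit is not None and value > limit:
            self._listener(self.name, object_name, value)


class Manager:
    """Polls agents on demand and listens for unsolicited events."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str, float]] = []

    def poll(self, agents: List[Agent], object_name: str) -> Dict[str, float]:
        return {agent.name: agent.get(object_name) for agent in agents}

    def on_event(self, agent_name: str, object_name: str, value: float) -> None:
        self.events.append((agent_name, object_name, value))


if __name__ == "__main__":
    manager = Manager()
    router = Agent("router-1", {"error_count": 0})
    router.subscribe(manager.on_event, {"error_count": 10})

    router.update("error_count", 3)           # below the limit: no event
    print(manager.poll([router], "error_count"))
    router.update("error_count", 25)          # above the limit: event reported
    print(manager.events)
```

In this sketch the manager learns about the second update without asking, which is the benefit of event reporting for values that change infrequently, while the poll still works when the manager wants a fresh reading on its own schedule.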

Performance Indicators Difficulties in selection and use of the indicators  There are too many indicators in use  The meanings of most indicators are not yet clearly understood  Some indicators are supported by some manufacturers only  Most indicators are not suitable for comparison with each other  Indicators are accurately measured but incorrectly interpreted  The calculation of indicators takes too much time, and the final results can hardly be used for controlling the environment

Performance Indicators Service-oriented measures  the highest priority  Availability  Response time  Accuracy Efficiency-oriented measures  Throughput  Utilization

Availability The percentage of time that a network system, a component, or an application is available for a user Availability is based on the reliability of the individual components of a network  MTBF: mean time between failures  MTTR: mean time to repair  Availability = MTBF / (MTBF+MTTR) Availability of a system depends on the availability of its individual components plus the system organization  Redundant components
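As a small illustration of the formula above, the following Python sketch computes availability from MTBF and MTTR. The function name and the sample figures (a 1,000-hour MTBF and a 2-hour MTTR) are assumptions chosen for the example, not values from the slides.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return MTBF / (MTBF + MTTR) as a fraction between 0 and 1."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    total = mtbf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / total


if __name__ == "__main__":
    # Hypothetical component: fails every 1,000 hours on average, 2 hours to repair.
    print(f"{availability(1000.0, 2.0):.4%}")
```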

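Availability of serial and parallel components, with component availability A = 0.98
A(serial) = 0.98 × 0.98 = 0.9604 ≈ 0.96
Unavailability of one component = 1 - A = 0.02
Unavailability of the parallel pair = 0.02 × 0.02 = 0.0004
A(parallel) = 1 - 0.0004 = 0.9996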
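The serial and parallel calculations generalize to any number of components. The sketch below assumes that component failures are independent, which is the assumption behind the arithmetic above; the function names are illustrative.

```python
from math import prod
from typing import Sequence


def series_availability(components: Sequence[float]) -> float:
    """All components must be up: multiply the individual availabilities."""
    return prod(components)


def parallel_availability(components: Sequence[float]) -> float:
    """At least one component must be up: 1 minus the product of unavailabilities."""
    return 1.0 - prod(1.0 - a for a in components)


if __name__ == "__main__":
    pair = [0.98, 0.98]
    print(f"serial:   {series_availability(pair):.4f}")
    print(f"parallel: {parallel_availability(pair):.4f}")
```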

Availability (cont.) Functional availability for a dual-link system  Nonpeak periods account for 40% of requests; either link alone can handle the traffic load  During peak periods, both links are required to handle the full load, but one link can handle 80% of the peak load
A_f = (capability when 1 link is up) × Pr[1 link up] + (capability when 2 links are up) × Pr[2 links up]
A_f(nonpeak) = 1 × [A(1-A) + (1-A)A] + 1 × A^2
A_f(peak) = 0.8 × [A(1-A) + (1-A)A] + 1 × A^2
A_f = 0.6 × A_f(peak) + 0.4 × A_f(nonpeak)
If A = 0.9: A_f(nonpeak) = 0.99, A_f(peak) = 0.954, A_f = 0.9684
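The dual-link calculation can be checked with a short Python sketch. The parameter names are assumptions; the defaults (60% of requests at peak, one link carrying 80% of the peak load) come from the example above, and link failures are assumed to be independent.

```python
from typing import Tuple


def functional_availability(
    a: float,
    peak_fraction: float = 0.6,
    one_link_peak_capability: float = 0.8,
) -> Tuple[float, float, float]:
    """Return (A_f nonpeak, A_f peak, overall A_f) for two identical links."""
    p_one_up = a * (1 - a) + (1 - a) * a   # exactly one link up
    p_two_up = a * a                       # both links up

    # Nonpeak: a single link carries the full load.
    nonpeak = 1.0 * p_one_up + 1.0 * p_two_up
    # Peak: a single link carries only part of the load.
    peak = one_link_peak_capability * p_one_up + 1.0 * p_two_up

    overall = peak_fraction * peak + (1 - peak_fraction) * nonpeak
    return nonpeak, peak, overall


if __name__ == "__main__":
    nonpeak, peak, overall = functional_availability(0.9)
    print(f"nonpeak={nonpeak:.4f} peak={peak:.4f} overall={overall:.4f}")
```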

Base Requirements for Availability Secure facilities Power systems Circuit diversity Intra-chassis redundancy  Dual power supplies  Online Insertion and Removal  Multi-processor design

Response Time The time it takes for a response to appear at a user’s terminal after a user action calls for it The cost of shorter response time  Computer processing power Increased processing power means increased cost  Competing requirements Providing rapid response time to some processes may penalize other processes Productivity increases as rapid response times are achieved  A response time of up to 2 seconds is acceptable for most interactive applications

System Response Time

Elements of Response Time

Accuracy The percentage of time that no errors occur in the transmission and delivery of information  Built-in error correction mechanisms in protocols Data link and TCP protocols  Monitoring the rate of errors A rising error rate may indicate an intermittently faulty line or the existence of a source of noise or interference

Throughput The rate at which application-oriented events occur Is an application-oriented measure  No. of transactions of a given type during a period of time  No. of customer sessions for a given application during a period of time  No. of calls for a circuit-switched environment It is useful to track these measures over time  Performance trouble spots

Utilization The percentage of the theoretical capacity of a resource (e.g., multiplexer, transmission line, switch) that is being used Is a more fine-grained measure than throughput Is used to search for potential bottlenecks and areas of congestion Response time usually grows sharply and nonlinearly as the utilization of a resource approaches its capacity
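One common way to estimate link utilization is to read an octet counter twice and divide the bits transferred by what the link could have carried in that interval. The sketch below assumes a wrapping counter of a fixed width (32 bits by default) and at most one wrap between readings; the function and parameter names are hypothetical and do not correspond to any particular management API.

```python
def utilization(
    octets_start: int,
    octets_end: int,
    interval_seconds: float,
    capacity_bps: float,
    counter_bits: int = 32,
) -> float:
    """Fraction of link capacity used between two counter readings."""
    if interval_seconds <= 0 or capacity_bps <= 0:
        raise ValueError("interval and capacity must be positive")
    # Modular subtraction handles a single counter wrap between readings.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return (delta_octets * 8) / (interval_seconds * capacity_bps)


if __name__ == "__main__":
    # Hypothetical readings taken 60 seconds apart on a 10 Mbps line.
    u = utilization(0, 37_500_000, 60.0, 10_000_000.0)
    print(f"utilization: {u:.1%}")
```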

Simple Efficiency Analysis

Performance-Monitoring Function Three components for performance monitoring  Performance measurement Gathers statistics about network traffic and timing Accomplished by agent modules that observe the behavior of nodes  No. of connections, the traffic per connection External (remote) monitor  Can offload the processing requirement from operational nodes to a dedicated system  Performance analysis Consists of software for reducing and presenting the data  Synthetic traffic generation Permits the network to be observed under a controlled load

Performance Measurement Reports Host communication matrix Group communication matrix Packet type histogram Data packet size histogram Throughput-utilization distribution Packet interarrival time histogram Channel acquisition delay histogram Communication delay histogram Collision count histogram Transmission count histogram

Inquiry Concerns: Possible Errors and Inefficiencies Are there source-destination (S-D) pairs with unusually heavy traffic? Are some packet types of unusually high frequency, indicating an error or an inefficient protocol? What is the distribution of data packet size? What are the channel acquisition and communication delay distributions? Are collisions a factor in getting packets transmitted? What is the channel utilization and throughput?

Inquiry Concerns: Increasing Traffic Load What is the effect of traffic load on utilization, throughput and time delay? When does traffic load start to degrade system performance? What is the tradeoff among stability, throughput and delay? What is the maximum capacity of the channel under normal operating conditions? How many active users are necessary to reach this maximum?

Inquiry Concerns: Varying Packet Sizes Do larger packets increase or decrease throughput and delay? How does constant packet size affect utilization and delay?

Statistical versus Exhaustive Measurement When an agent is monitoring a heavy load of traffic, it may not be practical to collect exhaustive data  Example: counting the total number of packets in a given time period between each S-D pair on the LAN  Alternative: sample the traffic stream to estimate the value of the random variable  Statistical methods: probabilities
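The sampling idea can be illustrated with a short Python sketch that estimates the share of traffic belonging to one S-D pair from a random sample instead of counting every packet. The normal-approximation confidence interval used here is a standard textbook technique chosen for the example, not something stated in the slides, and the packet tuples and host names are synthetic.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Packet = Tuple[str, str]  # hypothetical (source, destination) pair


def estimate_proportion(
    sample: Sequence[Packet],
    predicate: Callable[[Packet], bool],
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (estimate, lower bound, upper bound) for the matching fraction.

    Uses the normal approximation; z = 1.96 corresponds to roughly 95% confidence.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    p_hat = sum(1 for packet in sample if predicate(packet)) / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


if __name__ == "__main__":
    rng = random.Random(0)
    hosts = ["A", "B", "C", "D"]
    # Synthetic traffic stream standing in for packets seen on a LAN.
    stream = [(rng.choice(hosts), rng.choice(hosts)) for _ in range(100_000)]
    sample = rng.sample(stream, 1_000)
    p, low, high = estimate_proportion(sample, lambda pkt: pkt == ("A", "B"))
    print(f"A->B share: {p:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
```

The interval narrows roughly with the square root of the sample size, which is the tradeoff an agent makes between processing cost and precision.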

Fault Monitoring Objective  Identify faults as quickly as possible after they occur and identify the cause of the fault so that remedial action may be taken Problems of fault observation: locating and diagnosing faults  Unobservable faults Certain faults are inherently unobservable locally  The existence of a deadlock between cooperating distributed processes may not be observable locally  Partially observable faults A node failure may be observable, but the observation may be insufficient to pinpoint the problem  The failure of a low-level protocol  Uncertainty in observation Lack of response from a remote device may mean that the device is stuck, the network is partitioned, congestion caused the response to be delayed, or the local timer is faulty

Fault Monitoring (cont.) Problems in fault isolation  Multiple potential causes When multiple technologies are involved, the potential points of failure and the types of failure increase  Too many related observations A single failure may generate many secondary failures  Interference between diagnosis and local recovery procedures Local recovery procedures may destroy important evidence concerning the nature of the fault, disabling diagnosis  Absence of automated testing tools Testing to isolate faults is difficult and costly to administer

Fault Monitoring

Fault-Monitoring Functions Detect faults Agent reports errors independently to one or more managers Agent maintains a log of significant events and errors Criteria for issuing a fault report  Avoids overloading Anticipate faults  Set up thresholds  Packet loss rate An effective user interface
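A threshold on packet loss rate, combined with a simple reporting criterion that avoids flooding the manager, might look like the following Python sketch. The class name, the default of three consecutive breaches, and the rule of reporting once per breach streak are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LossThresholdMonitor:
    """Issues one fault report per streak of over-threshold loss measurements."""

    threshold: float                  # e.g. 0.02 for a 2% packet loss rate
    consecutive_required: int = 3     # breaches in a row before reporting
    _breaches: int = field(default=0, init=False, repr=False)

    def observe(self, sent: int, lost: int) -> bool:
        """Record one measurement interval; return True when a report is due."""
        rate = lost / sent if sent else 0.0
        if rate > self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0
        # Report only at the moment the streak reaches the required length,
        # so a long-lasting problem does not produce a report every interval.
        return self._breaches == self.consecutive_required


if __name__ == "__main__":
    monitor = LossThresholdMonitor(threshold=0.02, consecutive_required=2)
    for lost in (5, 30, 40, 50, 1):
        print(lost, monitor.observe(sent=1000, lost=lost))
```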

Test a Fault Monitoring System Connectivity test Data integrity test Protocol integrity test Data saturation test Connection saturation test Response-time test Loopback test Function test Diagnostic test

Accounting Monitoring Keeps track of users’ usage of network resources  An internal accounting system assesses the overall usage of resources and determines the cost of shared resources to each department  A system that offers a public service charges for usage Resources that may be subject to accounting  Communications facilities LANs, WANs, leased lines, dial-up lines, and PBX systems  Computer hardware Workstations and servers  Software and systems Applications and utility software in servers, a data center, and end-user sites  Services Includes all commercial communication and information services

Collect Accounting Data Based on the requirements of the organization Communications-related accounting data might be gathered and maintained on each user  User identification  Receiver  No. of packets  Security level Identifies the transmission and processing priorities  Time stamps Associated with each transmission and processing event Transaction start and stop times  Network status codes Indicates the nature of any detected errors or malfunctions  Resources used
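The accounting items listed above map naturally onto a record type. The Python sketch below is one possible layout, with hypothetical field names and types, plus a small helper that totals packets per user as an example of how such records might feed a usage report.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class AccountingRecord:
    """One communications accounting entry; field names are illustrative."""

    user_id: str                      # user identification
    receiver: str                     # the other end of the communication
    packets: int                      # number of packets transmitted
    security_level: int               # transmission and processing priority
    start_time: float                 # transaction start (seconds since epoch)
    stop_time: float                  # transaction stop (seconds since epoch)
    status_code: int                  # nature of any detected error or malfunction
    resources_used: Tuple[str, ...]   # resources invoked by the transaction


def packets_per_user(records: Iterable[AccountingRecord]) -> Dict[str, int]:
    """Total packets per user, e.g. as input to departmental cost allocation."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[record.user_id] += record.packets
    return dict(totals)


if __name__ == "__main__":
    log = [
        AccountingRecord("alice", "server-1", 100, 1, 0.0, 2.5, 0, ("lan",)),
        AccountingRecord("bob", "server-2", 20, 2, 1.0, 1.4, 0, ("lan", "wan")),
        AccountingRecord("alice", "server-2", 50, 1, 3.0, 4.0, 0, ("wan",)),
    ]
    print(packets_per_user(log))
```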

Summary Network monitoring is the most fundamental aspect of automated network management  Gathers information about the status and behavior of network elements Static information Dynamic information Statistical information  An agent collects local management information and transmits it to one or more NMSs  Each NMS includes network management application software plus software for communication with agents

Summary Performance monitoring  Availability  Response time  Accuracy  Throughput  Utilization Fault monitoring  Identifies faults as quickly as possible  Identifies the cause of the fault to take corrective action  The fault-monitoring function is complicated Accounting monitoring  Gathers usage information for each resource
