Remote Monitoring (RMON)
The RMON specification is primarily a definition of a MIB
– RFC 1757/2819: Remote network monitoring management information base (RMON)
– RFC 2021: Remote network monitoring management information base II (RMON2)
Remote network monitoring devices, often called monitors or probes, are instruments that exist for the purpose of managing a network.
Goals (RFC 2819, which obsoletes RFC 1757)
Off-line operation
– Reduces polling from the manager
Proactive monitoring
– The monitor can run diagnostics and log network performance (if it has sufficient resources)
Problem detection and reporting
– Active monitoring probes the network and therefore consumes network resources
– Alternatively, the monitor can passively recognize certain error conditions, such as congestion, on the traffic that it observes
– It logs the condition and attempts to notify the management station
Goals (RFC 1757) (cont.)
Value-added data
– The monitor can perform analyses specific to the data collected on its subnetwork
– For example, it can analyse subnetwork traffic to determine which hosts generate the most traffic or errors on the subnetwork
Multiple managers
– The monitor supports more than one manager
– This improves reliability and lets different managers perform different functions
Control of remote monitors RMON MIB contains features that support extensive control from the management station 2 categories of RMON MIB features – Configuration – Action invocation
Configuration & Action invocation
Configuration
– Each MIB group consists of one or more control tables and data tables
– Control table: read/write; contains parameters that describe the data in the data table
– Data table: read-only; contains the information that is defined by the control table
Action invocation
– Some objects in this MIB provide a mechanism to execute an action on the remote monitoring device.
– These objects may execute an action as a result of a change in the state of the object.
Multiple Manager - Problems
Concurrent requests for resources could exceed the capability of the monitor to supply those resources
A management station could capture and hold monitor resources for a long period of time
Resources could be assigned to a management station that crashes without releasing them
Multiple Manager – Solution
An ownership label is attached to each row of a control table
– A management station can recognize resources that it owns and no longer needs
– A network operator can identify the management station that holds a resource and negotiate for it to be freed
– A network operator may have the authority to unilaterally free resources that another network operator has reserved
– If a management station is reinitialized, it can recognize resources it had reserved in the past and free those it no longer needs
Ownership concept
The ownership label contains one or more of the following:
– IP address, management station name, network manager's name, location or phone number
However, the ownership label does not act as a password or access-control mechanism
Therefore, a row can still be read and written by a management station that does not own it
Row Addition (1) A problem can arise when multiple management stations attempt to set configuration information simultaneously using SNMP. To guard against these collisions, each such control entry contains a status object with special semantics that help to arbitrate among the managers. If an attempt is made with the row addition mechanism to create such a status object and that object already exists, an error is returned. When more than one manager simultaneously attempts to create the same conceptual row, only the first can succeed. The others will receive an error.
Row Addition (2) When a manager wishes to create a new control entry, it needs to choose an index for that row. It may choose this index in a variety of ways, hopefully minimizing the chances that the index is in use by another manager. – If the index is in use, the mechanism mentioned previously will guard against collisions. Examples of schemes to choose index values include random selection or scanning the control table looking for the first unused index. – Because index values may be any valid value in the range and they are chosen by the manager, the agent must allow a row to be created with any unused index value if it has the resources to create a new row.
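The index-selection and collision behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ControlTable`, `create_row` and `first_unused_index` are hypothetical names, a dict stands in for the agent's conceptual rows, and "resources" are modelled as a simple row limit.

```python
class ControlTable:
    """Toy stand-in for an RMON control table keyed by row index (hypothetical)."""

    def __init__(self, max_rows=4):
        self.rows = {}            # index -> owner label
        self.max_rows = max_rows  # assumed model of the monitor's resources

    def create_row(self, index, owner):
        """Return True if the row was created, False if it collided or no room."""
        if index in self.rows:
            # The status object already exists, so this create attempt fails.
            return False
        if len(self.rows) >= self.max_rows:
            return False
        self.rows[index] = owner
        return True


def first_unused_index(table, low=1, high=65535):
    """Scan the control table for the first index that is not in use."""
    for index in range(low, high + 1):
        if index not in table.rows:
            return index
    return None


table = ControlTable()
table.create_row(1, "manager-A")   # first manager succeeds
table.create_row(1, "manager-B")   # second manager gets an error (False)
```

A manager that receives the error can simply pick another index, for example with `first_unused_index`, and try again.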
Fig 8.3
Good and Bad Packets (1) RFC 2819 Good packets are error-free packets that have a valid frame length. For example, on Ethernet, good packets are error-free packets that are between 64 octets long and 1518 octets long.
Good and Bad Packets (2) Bad packets are packets that have proper framing and are therefore recognized as packets, but contain errors within the packet or have an invalid length. For example, on Ethernet, bad packets have a valid preamble and SFD, but have a bad CRC, or are either shorter than 64 octets or longer than 1518 octets.
The RMON MIB
The RMON (v1) MIB is incorporated into MIB-II with a subtree identifier of 16 (10 groups)
statistics: maintains low-level utilization and error statistics for each subnetwork monitored by the agent
history: records periodic statistical samples from information available in the statistics group
RMON MIB Group (1)
alarm: allows the management console user to set a sampling interval and alarm threshold for any counter or integer recorded by the RMON probe
host: contains counters for various types of traffic to and from hosts attached to the subnetwork
hostTopN: contains sorted host statistics that report on the hosts that top a list based on some parameter in the host table
RMON MIB Group (2)
matrix: shows error and utilization information in matrix form
filter: allows the monitor to observe packets that match a filter
(packet) capture: governs how data is sent to a management console
event: gives a table of all events generated by the RMON probe
tokenRing: maintains statistics and configuration information for token ring subnetworks
Important note 1
All groups in the RMON MIB are optional, but there are some dependencies
– The alarm group requires the implementation of the event group
– The hostTopN group requires the implementation of the host group
– The packet capture group requires the implementation of the filter group
Important note 2
Collection of traffic statistics for one or more subnetworks
– statistics, history, host, hostTopN, matrix, tokenRing
User-defined alarm conditions and filtering
– alarm, filter, capture, event
Statistics Group (1) Fig 8-6
Statistics Group (2) Table 8.2
Statistics Group (3)
Statistics Group (4) The statistics group provides useful information about the load and overall health of the subnetwork Various error conditions are counted such as CRC or alignment error, collision, undersized and oversized packets
History Group
The history group is used to define sampling functions for one or more of the interfaces of the monitor
2 tables
– historyControlTable: specifies the interface and the details of the sampling function
– etherHistoryTable: records the data
Fig 8.7
historyControlTable
historyControlIndex: index of the entry; the same number is used in etherHistoryTable
historyControlDataSource: identifies the interface to be sampled
historyControlBucketsRequested: the requested number of discrete sampling intervals; the default value is 50
historyControlBucketsGranted: the actual number of discrete sampling intervals
historyControlInterval: sampling interval in seconds; the maximum is 3600 (1 hour) and the default value is 1800
Sampling scheme
Determined by historyControlBucketsGranted and historyControlInterval
Ex. Using the default value of both
– The monitor takes a sample once every 1800 seconds (30 min)
– Each sample is stored in a row of etherHistoryTable
– Only the most recent 50 rows are retained
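The bucket behaviour above can be pictured as a fixed-size ring of samples. The sketch below is a hedged illustration only: `make_history` and `retention_seconds` are hypothetical helper names, and a `collections.deque` with `maxlen` stands in for the rows of etherHistoryTable.

```python
from collections import deque


def make_history(buckets_granted=50):
    """Ring of samples: once full, each new sample pushes out the oldest one."""
    return deque(maxlen=buckets_granted)


def retention_seconds(buckets_granted=50, interval_seconds=1800):
    """How far back in time the retained samples reach."""
    return buckets_granted * interval_seconds


history = make_history()
for sample_number in range(60):      # 60 sampling intervals elapse
    history.append(sample_number)    # only the latest 50 samples survive

print(len(history), retention_seconds() / 3600)  # 50 samples, 25.0 hours
```

With the default values, the table therefore covers the last 25 hours of traffic.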
Utilization
Calculated from two counters: etherStatsOctets and etherStatsPkts
Utilization = 100% x [Packets x (96 + 64) + Octets x 8] / (Interval x 10^7)
– 64 bits: preamble
– 96 bits: interframe gap
– Interval is in seconds; a data rate of 10 Mbps (10^7 bits per second) is assumed
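As a worked example of the utilization formula, the following sketch plugs in concrete numbers. The function name `ethernet_utilization_percent` and the sample values are illustrative assumptions, not part of the RMON MIB.

```python
def ethernet_utilization_percent(packets, octets, interval_seconds,
                                 bits_per_second=10_000_000):
    """Estimate Ethernet utilization (in percent) over one sampling interval."""
    overhead_bits = packets * (96 + 64)  # interframe gap + preamble per packet
    data_bits = octets * 8               # octets counted on the wire
    capacity_bits = interval_seconds * bits_per_second
    return 100.0 * (overhead_bits + data_bits) / capacity_bits


# 1,000 packets carrying 1,000,000 octets observed in a 10-second interval:
# (1000 * 160 + 1000000 * 8) / (10 * 10^7) = 0.0816, i.e. about 8.16 %
print(ethernet_utilization_percent(1000, 1_000_000, 10))
```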
Host Group
Gathers statistics about specific hosts on the LAN by observing the source and destination MAC addresses in good packets
Consists of 3 tables:
– one control table (hostControlTable)
– two data tables (hostTable, hostTimeTable), which hold the same information but are indexed differently
hostControlTable
hostControlIndex:
– identifies a row in the hostControlTable, referring to a unique interface of the monitor
hostControlDataSource:
– identifies the interface (the source of the data)
hostControlTableSize:
– the number of rows in hostTable (and in hostTimeTable)
hostControlLastDeleteTime:
– the last time that an entry in hostTable was deleted
Fig 8.9
A simple RMON configuration Fig8.10
hostTable hostAddress: MAC address of this host hostCreationOrder: an index that defines the relative ordering of the creation time of hosts (index takes on a value 1-N) hostIndex : the same number as hostControlIndex
Counter in hostTable
hostTopN Group
Maintains statistics about the set of hosts on one subnetwork that top a list based on some parameter
The statistics generated for this group are derived from data in the host group
The set of statistics for one host group object, collected during one sampling interval, is referred to as a report
hostTopNControlTable (1) hostTopNControlIndex : – identify row in hostTopNControlTable,defining one top-N report for one interface hostTopNHostIndex: – match the value of hostControlIndex,specifying a particular subnetwork hostTopNRateBase: – specify one of seven variables from hostTable
hostTopNControlTable (2)
Values of hostTopNRateBase
– INTEGER { hostTopNInPkts(1), hostTopNOutPkts(2), hostTopNInOctets(3), hostTopNOutOctets(4), hostTopNOutErrors(5), hostTopNOutBroadcastPkts(6), hostTopNOutMulticastPkts(7) }
hostTopNControlTable (3)
hostTopNTimeRemaining:
– seconds left in the report currently being collected
hostTopNDuration:
– sampling interval
hostTopNRequestedSize:
– maximum number of hosts requested for the top-N report
hostTopNGrantedSize:
– maximum number of hosts granted for the top-N report
hostTopNStartTime:
– the last time this report was started
hostTopNTable
hostTopNReport:
– same value as hostTopNControlIndex
hostTopNIndex:
– uniquely identifies a row
hostTopNAddress:
– MAC address
hostTopNRate:
– the amount of change in the selected variable during the sampling interval
Report preparation (1) A management station creates a row of the control table to specify a new report. This control entry instructs the monitor to measure the difference between the beginning and ending values of a particular host group variable over a specific sampling period The sampling period value is stored in both hostTopNDuration and hostTopNTimeRemaining
Report preparation (2)
The value in hostTopNDuration is static, while the value in hostTopNTimeRemaining counts down in seconds as the report is prepared
When hostTopNTimeRemaining reaches 0
– The monitor calculates the final results and creates a set of N data rows
To generate an additional report for a new time period, retrieve the old report and reset hostTopNTimeRemaining to the value of hostTopNDuration
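The report calculation described above amounts to taking the difference of one host-table counter between the start and end of the sampling period and keeping the N largest. The sketch below assumes the counter snapshots are available as plain dicts; `top_n_report` and the MAC strings are hypothetical, and counter wrap-around is ignored.

```python
def top_n_report(start_counts, end_counts, n):
    """Return the n hosts with the largest counter increase, largest first.

    start_counts / end_counts map a MAC address to the value of the selected
    host-table counter at the beginning and end of the sampling period.
    """
    rates = {
        mac: end_counts[mac] - start_counts.get(mac, 0)  # hosts new in this period start at 0
        for mac in end_counts
    }
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


start = {"00:00:00:00:00:0a": 10, "00:00:00:00:00:0b": 5}
end = {"00:00:00:00:00:0a": 15, "00:00:00:00:00:0b": 50, "00:00:00:00:00:0c": 7}
print(top_n_report(start, end, 2))
```

Each returned pair corresponds to one hostTopNTable row: the address and its rate for the period.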
Fig 8.12
Matrix group
Records information about the traffic between pairs of hosts on a subnetwork
The information is stored in the form of a matrix
Consists of 3 tables
– One control table: matrixControlTable
– Two data tables: matrixSDTable (traffic from one host to all others) and matrixDSTable (traffic from all hosts to one particular host)
matrixControlTable matrixControlIndex: – identify a row in the matrixControlTable matrixControlDataSource: – identify interface matrixControlTableSize: – the number of rows in the matrixSDTable matrixControlLastDeleteTime: – the last time that an entry was deleted
Fig 8.14
matrixSDTable (matrixDSTable)
matrixSDSourceAddress: the source MAC address
matrixSDDestAddress: the destination MAC address
matrixSDIndex: same value as matrixControlIndex
matrixSDPkts: number of packets transmitted from this source address to this destination address, including bad packets
matrixSDOctets: number of octets contained in all of those packets
matrixSDErrors: number of bad packets transmitted from this source address to this destination address
matrixSDTable - operation
matrixSDTable is indexed first by matrixSDIndex, then by source address, then by destination address; in matrixDSTable the destination address comes before the source address
The matrixSDTable contains 2 rows for every pair of hosts that exchange traffic in both directions
– One row per direction
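The two data tables hold the same counters and differ only in index order. A minimal sketch, assuming a hypothetical `TrafficMatrix` class and short strings in place of MAC addresses, makes the one-row-per-direction idea concrete:

```python
class TrafficMatrix:
    """Toy model of matrixSDTable / matrixDSTable for one interface (hypothetical)."""

    def __init__(self):
        self.rows = {}  # (source, destination) -> counters

    def observe(self, source, destination, octets, bad=False):
        """Account for one observed packet."""
        row = self.rows.setdefault(
            (source, destination), {"pkts": 0, "octets": 0, "errors": 0}
        )
        row["pkts"] += 1          # counts good and bad packets
        row["octets"] += octets
        if bad:
            row["errors"] += 1

    def sd_keys(self):
        """Row order of the SD view: by source, then destination."""
        return sorted(self.rows, key=lambda k: (k[0], k[1]))

    def ds_keys(self):
        """Row order of the DS view: by destination, then source."""
        return sorted(self.rows, key=lambda k: (k[1], k[0]))


m = TrafficMatrix()
m.observe("A", "B", 100)
m.observe("B", "A", 60)            # reverse direction gets its own row
m.observe("A", "C", 64, bad=True)
print(m.sd_keys())
print(m.ds_keys())
```

Walking the SD order answers "whom does host A talk to?", while the DS order answers "who talks to host A?".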
RMON (alarms and filtering) W.lilakiatsakun
Alarm group
Used to define a set of thresholds for network performance
If a threshold is crossed in the appropriate direction, an alarm is generated and sent to the central console
Ex. An alarm could be generated if there are more than 500 CRC errors in any 5-minute interval
Alarm table (1)
Each entry specifies a particular variable to be monitored, a sampling interval, and threshold parameters
Each entry holds only the most recently sampled value of its variable (from the last sampling interval)
– When a new value is stored, the old one is lost
Objects in the alarmTable:
alarmIndex: an integer that uniquely identifies a row in alarmTable
– Each row specifies a sample at a particular interval for a particular object in the monitor's MIB
Alarm table (2)
alarmInterval: interval in seconds over which data are sampled and compared with the rising and falling thresholds
alarmVariable: the object identifier of the particular variable in the RMON MIB to be sampled
– Object types: INTEGER, Counter, Gauge, TimeTicks
– Ex. etherStatsUndersizePkts
alarmSampleType: the method of calculating the value to be compared with the thresholds
– absoluteValue(1): the value of the variable is compared directly with the thresholds
– deltaValue(2): the difference (current value – last value) is compared with the thresholds
Alarm table (3)
alarmValue: the value of the statistic during the last sampling period
alarmStartupAlarm: dictates whether an alarm will be generated if the first sample is greater than or equal to the rising threshold, less than or equal to the falling threshold, or both
– risingAlarm(1), fallingAlarm(2), risingOrFallingAlarm(3)
Alarm table (4)
alarmRisingThreshold: the rising threshold for the sampled statistic
alarmFallingThreshold: the falling threshold for the sampled statistic
alarmRisingEventIndex: index of the eventEntry that is used when the rising threshold is crossed
alarmFallingEventIndex: index of the eventEntry that is used when the falling threshold is crossed
Alarm operation (1)
The monitor or a management station can define a new alarm by creating a new row in the alarmTable
The combination of variable, sampling interval and threshold parameters is unique to a given row
Two thresholds are provided: a rising threshold and a falling threshold
– The rising threshold is crossed if the current sampled value is greater than or equal to the threshold and the last sampled value was less than the threshold
Alarm operation (2)
– Similarly, the falling threshold is crossed if the current sampled value is less than or equal to the threshold and the last sampled value was greater than the threshold
Two types of values are calculated for alarms
– absoluteValue: the value of an object at the time of sampling. For a counter, this value never decreases, so it never crosses the falling threshold and crosses the rising threshold at most once
– deltaValue: the difference in the values of the object over two successive sampling periods. For a counter or gauge, this can cross both thresholds any number of times
Rules for rising-alarm generation
1. (a) If the first sampled value is less than the rising threshold, a rising-alarm event is generated the first time that the sampled value becomes greater than or equal to the rising threshold
(b) If the first sampled value is greater than or equal to the rising threshold and the value of alarmStartupAlarm is risingAlarm(1) or risingOrFallingAlarm(3), a rising-alarm event is generated
First alarm event generation
Rules for rising-alarm generation (cont.)
(c) If the first sampled value is greater than or equal to the rising threshold and the value of alarmStartupAlarm is fallingAlarm(2), a rising-alarm event is generated the first time that the sampled value again becomes greater than or equal to the rising threshold after it has fallen below the rising threshold
2. After a rising-alarm event is generated, another such event will not be generated until the sampled value has fallen below the rising threshold, reached the falling threshold, and then reached the rising threshold again
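The rising-alarm rules, together with the mirror-image rules for falling alarms, can be traced with a small simulation. This is a sketch under assumptions: `alarm_events` is a hypothetical function, the samples are already the values to compare (absolute or delta), and the falling rules are assumed to be the exact counterpart of the rising ones.

```python
RISING_ALARM, FALLING_ALARM, RISING_OR_FALLING_ALARM = 1, 2, 3


def alarm_events(samples, rising, falling, startup=RISING_OR_FALLING_ALARM):
    """Return (sample_number, kind) pairs for every alarm event generated."""
    events = []
    last_event = None  # kind of the most recent event; blocks repeats of that kind
    previous = None
    for number, value in enumerate(samples):
        kind = None
        if previous is None:
            # First sample: alarmStartupAlarm decides whether an event fires (rule 1b).
            if value >= rising and startup in (RISING_ALARM, RISING_OR_FALLING_ALARM):
                kind = "rising"
            elif value <= falling and startup in (FALLING_ALARM, RISING_OR_FALLING_ALARM):
                kind = "falling"
        elif value >= rising and previous < rising and last_event != "rising":
            kind = "rising"   # rules 1a, 1c and 2
        elif value <= falling and previous > falling and last_event != "falling":
            kind = "falling"
        if kind is not None:
            events.append((number, kind))
            last_event = kind
        previous = value
    return events


# The second crossing of 10 (sample 3) is suppressed because the value never
# dropped to the falling threshold of 4 in between.
print(alarm_events([5, 12, 9, 12, 3, 12], rising=10, falling=4))
```

The suppressed event at sample 3 shows why small fluctuations around one threshold do not produce a stream of alarms.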
Generation of alarm events Fig 9.2
Hysteresis mechanism The mechanism by which small fluctuations are prevented from causing alarms
Filter Group (1)
Provides a means by which a management station can instruct a monitor to observe selected packets on a particular interface
Data filter
– allows the monitor to screen observed packets on the basis of a bit pattern that a portion of the packet matches (or fails to match)
Status filter
– allows the monitor to screen observed packets on the basis of their status (e.g. CRC error)
These filters can be combined using logical AND and OR operations
Filter Group (2) The stream of packets that pass the test is referred to as a channel. – A count of such packets is maintained In addition, the channel can be configured to generate an event (defined in the event group) Finally, the packets passing through a channel can be captured if the mechanism is defined in the capture group
Filter logic - variables input = the incoming portion of the packet to be filtered filterPktData = the bit pattern to be tested for filterPktDataMask = the relevant bits to be tested for filterPktDataNotMask = indication of whether to test for a match or a mismatch
Ex. 1: match and mismatch
Test for a match:
if ((input ^ filterPktData) == 0) filterResult = match;
– We take the bitwise exclusive OR of input and filterPktData
– If all bits of input and filterPktData are the same, the result is all 0s
Test for a mismatch:
if ((input ^ filterPktData) != 0) filterResult = mismatch;
Ex. 2: match with don't-care bits
if (((input ^ filterPktData) & filterPktDataMask) == 0) filterResult = match_on_relevant_bits;
else filterResult = mismatch_on_relevant_bits;
– The XOR operation produces a result that has a 1-bit in every position where there is a mismatch
– The AND operation clears every position where filterPktDataMask has a 0-bit, so those positions are treated as don't-care
Ex.3 more complex (1) Use filterPktDataNotMask 0-bits in filterPktDataNotMask – indicate the positions where an exact match is required between the relevant bits of input and filterPktData (all bits match) 1-bits in filterPktDataNotMask - indicate the positions where a mismatch is required between the relevant bits of input and filterPktData (at least one bit does not match)
Ex. 3: more complex (2)
Definition of the relevant differing bits:
relevant_bits_different = (input ^ filterPktData) & filterPktDataMask;
Combining with filterPktDataNotMask to test for a match:
if ((relevant_bits_different & ~filterPktDataNotMask) == 0) filterResult = successful_match;
Combining with filterPktDataNotMask to test for a mismatch:
if ((relevant_bits_different & filterPktDataNotMask) != 0) filterResult = successful_mismatch;
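The three examples can be folded into one small Python function operating on integers. This is an illustrative sketch, not the MIB's definition: `filter_passes` is a hypothetical name, and treating an all-zero not-mask as "no mismatch required" is an assumption made here.

```python
def filter_passes(input_bits, pkt_data, data_mask, not_mask):
    """Apply the match and mismatch tests to integer bit patterns."""
    # 1-bits mark relevant positions where input and pattern differ.
    relevant_bits_different = (input_bits ^ pkt_data) & data_mask

    # Positions with a 0 in the not-mask must all match.
    if (relevant_bits_different & ~not_mask) != 0:
        return False

    # Positions with a 1 in the not-mask must differ in at least one bit
    # (assumed vacuous when no relevant bit is selected by the not-mask).
    if (data_mask & not_mask) == 0:
        return True
    return (relevant_bits_different & not_mask) != 0


# Upper nibble must equal 0xA; lower nibble must not be 0x0.
print(filter_passes(0xA5, 0xA0, 0xFF, 0x0F))  # True
print(filter_passes(0xA0, 0xA0, 0xFF, 0x0F))  # False: lower nibble matches
print(filter_passes(0xB5, 0xA0, 0xFF, 0x0F))  # False: upper nibble differs
```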
Filter Operations (1)
TEST1
– The packet must be long enough that it contains at least as many bits as filterPktData (otherwise it fails the filter)
TEST2
– Each bit set to 0 in filterPktDataNotMask indicates a bit position in which the relevant bits of the packet portion should match filterPktData
– If there is a match in every desired bit position, the test is passed; otherwise it is failed
Filter Operations (2)
TEST3
– Each bit set to 1 in filterPktDataNotMask indicates a bit position in which the relevant bit of the packet portion should not match filterPktData
– The test is passed if there is a mismatch in at least one desired bit position
A packet passes this filter if it passes all three tests
Ex. Suppose we wish to accept all Ethernet packets that have a destination address of 0xA5 and do not have a source address of 0xBB
Filter Operations (3)
filterPktDataOffset = 0
filterPktData = 0x0000000000A5 0000000000BB
filterPktDataMask = 0xFFFFFFFFFFFF FFFFFFFFFFFF
filterPktDataNotMask = 0x000000000000 FFFFFFFFFFFF
– filterPktDataOffset indicates that the pattern matching should start with the first bit of the packet
– filterPktData indicates that the pattern of interest consists of 0xA5 and 0xBB
– filterPktDataMask indicates that all of the first 96 bits are relevant
– filterPktDataNotMask indicates that the test is for a match on the first 48 bits and a mismatch on the second 48 bits
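The destination/source example can be checked byte by byte. The sketch below is a hedged illustration: `passes_data_filter` and `frame` are hypothetical helpers, the two addresses are assumed to be the 48-bit values 0x0000000000A5 and 0x0000000000BB, and the pattern and both masks are assumed to have equal length.

```python
def passes_data_filter(packet, offset, data, mask, not_mask):
    """Run TEST1..TEST3 on a packet given as bytes."""
    window = packet[offset:offset + len(data)]
    if len(window) < len(data):                       # TEST1: packet too short
        return False
    diff = [(p ^ d) & m for p, d, m in zip(window, data, mask)]
    if any(x & ~n & 0xFF for x, n in zip(diff, not_mask)):
        return False                                  # TEST2: a required match failed
    wants_mismatch = any(m & n for m, n in zip(mask, not_mask))
    has_mismatch = any(x & n for x, n in zip(diff, not_mask))
    return has_mismatch or not wants_mismatch         # TEST3


DATA = bytes.fromhex("0000000000A5" "0000000000BB")
MASK = bytes.fromhex("FF" * 12)                  # all 96 bits are relevant
NOT_MASK = bytes.fromhex("00" * 6 + "FF" * 6)    # match first 48, mismatch last 48


def frame(dst_last, src_last):
    """Build a toy frame: 6-byte destination, 6-byte source, 50 payload bytes."""
    return bytes(5) + bytes([dst_last]) + bytes(5) + bytes([src_last]) + bytes(50)


print(passes_data_filter(frame(0xA5, 0xCC), 0, DATA, MASK, NOT_MASK))  # True
print(passes_data_filter(frame(0xA5, 0xBB), 0, DATA, MASK, NOT_MASK))  # False
```

A frame is accepted only when the destination equals 0xA5 and the source differs from 0xBB in at least one bit.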
Filter status
Bit #   Error
0       Packet is longer than 1,518 octets
1       Packet is shorter than 64 octets
2       Packet experienced a CRC or alignment error
Ex. An Ethernet fragment (too short, with a CRC error) has bits 1 and 2 set, giving a status value of 6 (2^1 + 2^2)
Channel definition
A channel is defined by a set of filters
We define a pass as a logical 1 and a fail as a logical 0
– Within one filter, the data filter and the status filter must both be passed (AND logic)
– The overall result for a channel is the OR of all of its filters (at least one filter must be passed)
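The status value is just a bit mask built from the three error conditions. A minimal sketch, with hypothetical constant and function names and the Ethernet length limits quoted in the table:

```python
STATUS_TOO_LONG = 1 << 0          # bit 0: longer than 1,518 octets
STATUS_TOO_SHORT = 1 << 1         # bit 1: shorter than 64 octets
STATUS_CRC_OR_ALIGNMENT = 1 << 2  # bit 2: CRC or alignment error


def packet_status(length_octets, crc_ok):
    """Combine the error bits that apply to one observed Ethernet packet."""
    status = 0
    if length_octets > 1518:
        status |= STATUS_TOO_LONG
    if length_octets < 64:
        status |= STATUS_TOO_SHORT
    if not crc_ok:
        status |= STATUS_CRC_OR_ALIGNMENT
    return status


# A fragment is short and has a bad CRC: bits 1 and 2, so 2 + 4 = 6.
print(packet_status(40, crc_ok=False))
```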
Fig 9.5
Channel operation
If the packet is accepted
– The counter channelMatches is incremented
– Several associated controls determine what happens next
channelDataControl – determines whether the channel is on or off; if it is off, no event is generated and no packet is captured
channelEventStatus – indicates whether the channel is enabled to generate an event when a packet is matched
channelEventIndex – specifies the associated event
Filter group (1)
Consists of 2 control tables
– filterTable defines the associated filters
– channelTable defines a unique channel
channelIfIndex – identifies the monitor interface to which the associated filters are applied to allow data into this channel
Fig9.7
Filter group (2)
channelAcceptType – controls the action of the filters associated with this channel
– acceptMatched(1): packets are accepted to this channel if they pass both the packet data match and the packet status match of at least one of the associated filters
– acceptFailed(2): packets are accepted to this channel if they fail either the packet data match or the packet status match of every associated filter
Filter group (3)
channelDataControl
– on(1): data, status and events flow through this channel
– off(2): data, status and events do not flow through this channel
channelEventStatus: the event status of this channel, used if the channel is configured to generate events when packets are matched
Filter group (4)
– eventReady(1): a single event will be generated for a packet match
– eventFired(2): no events are generated
– eventAlwaysReady(3): every packet match generates an event
channelMatches: a counter that records the number of packet matches
channelDescription: a text description of the channel
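The AND-within-a-filter, OR-across-filters logic and the two accept types can be written out directly. This is a sketch with hypothetical names (`channel_accepts`, and per-filter results given as pairs of booleans), not an excerpt from any probe implementation.

```python
ACCEPT_MATCHED, ACCEPT_FAILED = 1, 2


def channel_accepts(filter_results, accept_type=ACCEPT_MATCHED):
    """Decide whether a packet enters the channel.

    filter_results holds one (data_ok, status_ok) pair per associated filter.
    """
    # A filter passes only if both its data test and its status test pass (AND);
    # the channel result is the OR over all of its filters.
    any_filter_passed = any(data_ok and status_ok
                            for data_ok, status_ok in filter_results)
    if accept_type == ACCEPT_MATCHED:
        return any_filter_passed
    return not any_filter_passed  # accept only if every filter failed


results = [(True, False), (True, True)]          # second filter passes fully
print(channel_accepts(results, ACCEPT_MATCHED))  # True
print(channel_accepts(results, ACCEPT_FAILED))   # False
```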
Packet Capture Group (1) It is used to set up a buffering scheme for capturing packets from one of the channels in the filter group bufferControlTable – define one buffer that is used to capture and store packets from one channel captureBufferTable – data buffered
bufferControlTable (1)
bufferControlFullStatus
– spaceAvailable(1): the buffer has room to accept new packets
– full(2): the buffer is full; what happens next depends on the value of bufferControlFullAction
bufferControlFullAction
– lockWhenFull(1): accept no more packets once the buffer is full
– wrapWhenFull(2): act as a circular buffer, deleting the oldest packets to make room
bufferControlTable (2)
bufferControlCaptureSliceSize: the maximum number of octets of each packet that will be saved in this capture buffer
– If a 1500-octet packet is received by the probe and this object is set to 500, then only 500 octets of the packet will be stored
– If this variable is set to 0, the capture buffer will save as many octets as is possible
bufferControlTable (3)
bufferControlDownloadSliceSize: the maximum number of octets of each packet in this capture buffer that will be returned in an SNMP retrieval of that packet
bufferControlCapturedPackets: the number of packets currently in this buffer
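The interaction of the capture slice size with the two full actions can be sketched as follows. This is a simplified illustration: `CaptureBuffer` is a hypothetical class, and buffer capacity is modelled as a packet count rather than the octet-based sizing a real probe would use.

```python
LOCK_WHEN_FULL, WRAP_WHEN_FULL = 1, 2


class CaptureBuffer:
    """Toy capture buffer holding at most max_packets stored packets."""

    def __init__(self, max_packets, slice_size=0, full_action=LOCK_WHEN_FULL):
        self.max_packets = max_packets
        self.slice_size = slice_size      # 0 means "store as much as possible"
        self.full_action = full_action
        self.packets = []

    def capture(self, packet):
        """Store one packet; return False if it was refused."""
        stored = packet if self.slice_size == 0 else packet[:self.slice_size]
        if len(self.packets) >= self.max_packets:
            if self.full_action == LOCK_WHEN_FULL:
                return False              # buffer is locked, packet is dropped
            self.packets.pop(0)           # wrap: discard the oldest packet
        self.packets.append(stored)
        return True


buf = CaptureBuffer(max_packets=2, slice_size=500, full_action=WRAP_WHEN_FULL)
buf.capture(bytes(1500))                  # only the first 500 octets are kept
print(len(buf.packets[0]))                # 500
```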
Event group An event is triggered by a condition located elsewhere in the MIB – Alarm from risingThreshold (alarm group) An event can trigger an action defined elsewhere in the MIB – Trigger turning a channel ON or OFF (filter group) 2 tables – eventTable and logTable
Fig 9.10
eventTable & logTable
eventType: none(1), log(2), snmp-trap(3), log-and-trap(4)
– log: an entry is made in the log table
– snmp-trap: an SNMP trap is sent to one or more management stations
eventCommunity: specifies the community of management stations that receive the trap
logTime: the time when this log entry was created
logDescription: a description of the event
Practical issues
Packet capture overload
– With RMON there is a very real danger of overloading the monitor
– Some tests resulted in bad performance
Network inventory
– RMON is useful for this purpose
Hardware platform
– Dedicated or non-dedicated host
Interoperability
– Unreliable in a multivendor environment
RMON probe performance Fig 9.11