Published by Veronika Hanna Biró. Modified over 5 years ago.
1
The CALgorithm for Detecting Bandwidth Changes
Connie Logg SLAC
2
Motivation
Throughput to a target host decreases dramatically, and questions arise:
Why?
When did it start?
What was the duration?
Is the decrease periodic?
Was it associated with a route change?
Could we have known and avoided being affected?
What other destinations are affected?
3
The Challenge
Lightweight detection techniques: we do not want to consume bandwidth to measure bandwidth.
Quick-response detection requires frequent measurements.
Detection must be automated, and alerts must be generated automatically.
4
Measurement Tools
IEPM-BW framework
ABWE
Traceroute
Ping
5
Methodology - I
Every few minutes (currently 10), run the tests to the target hosts.
Ping
See if we can ping the host. If not, it is not the end of the world unless we get an “unknown host” response.
Results are logged to a flat file.
6
Methodology - II
Traceroute
Run forward and, if possible, reverse traceroutes.
We need to be able to ssh to the target to run the reverse traceroute (not always possible).
Depending upon traceroute restrictions along the route, we may need to run an ICMP traceroute instead of a UDP traceroute.
Record the traceroute results in a flat file.
7
Methodology - III
ABWE runs continuously, once every minute.
The data is put into an Oracle database, one point per minute.
8
Analysis
“triganal” analyzes the ABWE data for decreases in throughput (it can also detect throughput increases, but we are not concerned about those now).
Note that the ABWE data arrives once a minute.
The “triganal” parameters depend on the data frequency and on how long you want a drop to persist before alerting on it.
9
“triganal” Methodology - I
There are two data buffers. The first is a history buffer, histbuf, where the processed data is stored.
histmin is its minimum size: the minimum amount of data required to “prime the pump”. This data is loaded into the history buffer when the algorithm is invoked.
histmax is its maximum size: the maximum amount of data allowed in histbuf. As new values are added to histbuf, if size(histbuf) > histmax, the oldest values are removed.
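The history buffer behaviour described above can be sketched in Python as a bounded queue. This is only an illustration: the original tool is not written in Python, and the function name `prime_history` and the sizes 5 and 8 are hypothetical values chosen for the example.

```python
from collections import deque

HISTMIN = 5  # hypothetical: points needed to "prime the pump"
HISTMAX = 8  # hypothetical: maximum points kept in histbuf


def prime_history(values, histmin=HISTMIN, histmax=HISTMAX):
    """Load the first histmin values into a bounded history buffer."""
    if len(values) < histmin:
        raise ValueError("not enough data to prime the history buffer")
    # A deque with maxlen drops the oldest entry once it is full,
    # which matches "if size(histbuf) > histmax, remove the oldest values".
    return deque(values[:histmin], maxlen=histmax)


histbuf = prime_history([100, 102, 98, 101, 99])
for v in [103, 97, 100, 104]:
    histbuf.append(v)  # the 9th value pushes out the oldest one (100)
```

A `deque` with `maxlen` keeps the eviction rule in one place, so the rest of the loop never has to trim the buffer by hand.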
10
“triganal” – Methodology - II
A trigger buffer, trigbuf, is where data considered possible “trigger data” is stored for ongoing analysis.
trigdur is the number of points which, once loaded into trigbuf, triggers the alert analysis. In this case it also represents the length of time the throughput must be depressed before the trigger buffer data is evaluated to see whether it constitutes an “alert” situation.
11
“triganal” – Methodology - III
The algorithm is controlled by several parameters, which must be tuned to the nature of the data:
histmax, histmin, and trigdur, which were discussed previously
Sensitivity ($sens), a multiplicative factor applied to the standard deviation to determine whether a new data point goes into the trigger buffer or whether it is an outlier
12
“triganal” – Methodology - IV
Threshold parameters are % change values which determine whether the contents of the full trigger buffer constitute a major alert or a minor alert.
Major and minor alerts are implemented to allow for “tuning” of the algorithm.
Generally we set them equal, which disables minor alerts: minorthresh = majorthresh
Assume they are: majorthresh(40%) and minorthresh(40%)
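One way to picture the two thresholds is a small classifier over the percent change. This is a hedged sketch: the function name `classify_alert` and its return values are hypothetical, and the strict `>` comparison is an assumption.

```python
def classify_alert(perchange, minorthresh=40.0, majorthresh=40.0):
    """Return "major", "minor", or None for a percent drop (hypothetical helper)."""
    if perchange > majorthresh:
        return "major"
    if perchange > minorthresh:
        return "minor"
    return None


# With minorthresh == majorthresh the "minor" branch can never fire,
# which is how setting the two equal disables minor alerts.
equal_thresholds = classify_alert(45.0)
split_thresholds = classify_alert(30.0, minorthresh=20.0, majorthresh=40.0)
```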
13
“triganal” – Methodology - V
As the data is processed in time order, two functions (qtrigger and qoutlier) are applied to each new data point to determine which buffer (histbuf or trigbuf) the data is loaded into, and in what state.
“Outlier” data is loaded into the buffer as negative data, and the script which calculates the mean and standard deviation (calcstats) does not include the negative data in its calculations.
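The negative-marking convention can be sketched as follows. This is an illustration rather than the actual calcstats script: the use of the sample standard deviation is an assumption, since the slides do not say which estimator is used.

```python
import statistics


def calcstats(buf):
    """Mean and standard deviation of the non-negative entries only.

    Entries stored as negative numbers are treated as excluded.
    Assumption: sample standard deviation (statistics.stdev).
    """
    valid = [v for v in buf if v >= 0]
    if not valid:
        return 0.0, 0.0
    mean = statistics.fmean(valid)
    sd = statistics.stdev(valid) if len(valid) > 1 else 0.0
    return mean, sd


# -500 is a value stored as negative, so it is ignored by the stats.
histmean, histsd = calcstats([100, -500, 102, 98])
```

Marking by sign keeps the original magnitude available (via `abs`) while excluding the point from the statistics; it does rely on throughput values being strictly positive.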
14
“triganal” – Methodology - VI
qtrigger determines whether a value qualifies for the trigger buffer.
$val = value currently being examined
histmean = mean of the data in the history buffer
histsd = standard deviation of the data in the history buffer
if (($val > histmean+sens*histsd) or ($val < histmean-sens*histsd))
    then qtrigger = true
    else qtrigger = false
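The qtrigger test translates directly into Python. The numbers in the example (mean 100, standard deviation 5, sens 2) are made up for illustration.

```python
def qtrigger(val, histmean, histsd, sens):
    """True if val lies outside histmean +/- sens*histsd."""
    return val > histmean + sens * histsd or val < histmean - sens * histsd


# Illustrative numbers: the band is 100 +/- 2*5, i.e. 90..110.
inside = qtrigger(95, histmean=100, histsd=5, sens=2)
below = qtrigger(85, histmean=100, histsd=5, sens=2)
above = qtrigger(115, histmean=100, histsd=5, sens=2)
```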
15
“triganal” – Methodology - VII
qoutlier determines whether a value is so far out of range that it is an outlier and should not be included in the mean and standard deviation calculations.
if (($val > histmean+sens*histsd*2) or ($val < histmean-sens*histsd*2))
    then qoutlier = true
    else qoutlier = false
(Note that one might use the variance instead of 2*histsd, but we find that the variance does not work well.)
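A Python rendering of qoutlier shows that it uses the same test as qtrigger with a band twice as wide. The example values are illustrative only.

```python
def qoutlier(val, histmean, histsd, sens):
    """True if val lies outside histmean +/- 2*sens*histsd."""
    band = sens * histsd * 2
    return val > histmean + band or val < histmean - band


# Illustrative numbers: mean 100, sd 5, sens 2 gives an outlier band of 80..120.
# 85 is outside the trigger band (90..110) but is not an outlier; 75 is an outlier.
trigger_only = qoutlier(85, histmean=100, histsd=5, sens=2)
outlier = qoutlier(75, histmean=100, histsd=5, sens=2)
```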
16
“triganal” – Startup
Prime histbuf with histmin values and then calculate the histmean and histsd values.
Initialize the data direction: $curdir = none
17
The Master “triganal” Loop - I
Loop over the values in the data set with the following algorithm:
(start of data input loop)
Is $val NOT a trigger value? Then:
    If (abs(($val - histmean)/histmean) < .1), add the value to histbuf but do not include it in the stats ($val = -$val). This is to avoid flatlining the distribution and making histsd very small.
    Calculate the stats histmean and histsd
    If (size(trigbuf) > 0) then remove the oldest trigbuf value. We want to age out the data in the trigger buffer if we are recovering from a bandwidth drop.
    Go to start of data input loop
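The non-trigger branch might look like the sketch below. The slide does not say what happens to a non-trigger value that is 10% or more away from histmean; storing it unchanged is an assumption made here. The function name is hypothetical, and recalculating histmean and histsd is left to the caller.

```python
from collections import deque


def absorb_non_trigger(val, histbuf, trigbuf, histmean):
    """Handle a value that did not qualify for the trigger buffer."""
    if histmean != 0 and abs((val - histmean) / histmean) < 0.1:
        # Very close to the mean: keep it, but negated so the stats skip it
        # and histsd does not collapse towards zero.
        histbuf.append(-val)
    else:
        # Assumption: other non-trigger values are stored as-is.
        histbuf.append(val)
    if trigbuf:
        # Age out pending trigger data while recovering from a drop.
        trigbuf.popleft()


histbuf = deque([100, 110, 90], maxlen=5)
trigbuf = deque([60, 62])
absorb_non_trigger(104, histbuf, trigbuf, histmean=100.0)
```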
18
The Master “triganal” Loop - II
$val is a trigger value: the direction of change is important.
$curdir = current direction, relative to histmean, of the data which is in trigbuf
If ($val > histmean) then “direction” $valdir = up else $valdir = down
If trigbuf is empty, $curdir = $valdir
If qoutlier($val) then $val = -$val
Save $val in trigbuf
19
The Master “triganal” Loop - III
If ($curdir ne “none” and $curdir ne $valdir), the data has changed direction and we abort the trigger state:
    Save the absolute values of the trigbuf data into histbuf
    Calculate new histbuf stats
    Clear the trigger buffer trigbuf
    Increment $aborttrigger
    $curdir = “none”
    Go to start of the data input loop
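The direction bookkeeping and the abort step can be sketched together. The `state` dictionary and the function name are hypothetical conveniences; the order of operations follows the slides, so the new value is saved in trigbuf before the direction check. Recomputing the histbuf stats after an abort is left to the caller.

```python
def add_trigger_value(val, histmean, is_outlier, state):
    """Store a trigger value; return True if the trigger state was aborted."""
    valdir = "up" if val > histmean else "down"
    if not state["trigbuf"]:
        state["curdir"] = valdir
    # Outliers are stored negated so the stats can skip them.
    state["trigbuf"].append(-val if is_outlier else val)
    if state["curdir"] != "none" and state["curdir"] != valdir:
        # Direction changed: flush trigger data into history and reset.
        state["histbuf"].extend(abs(v) for v in state["trigbuf"])
        state["trigbuf"].clear()
        state["aborttrigger"] += 1
        state["curdir"] = "none"
        return True
    return False


state = {"histbuf": [100, 101], "trigbuf": [], "curdir": "none", "aborttrigger": 0}
first = add_trigger_value(60, 100.0, False, state)   # starts a "down" run
second = add_trigger_value(150, 100.0, True, state)  # "up": direction flips
```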
20
The Master “triganal” Loop - IV
If the trigger buffer is not full:
    Go to start of data input loop
If the trigger buffer is full:
    Calculate the trigmean and trigsd of the absolute values of all the trigbuf data
    Calculate the percent change: $perchange = 100*(histmean-trigmean)/histmean
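A worked example of the percent-change step, with made-up numbers and a hypothetical trigdur of 3. The helper name `percent_change` is not from the slides.

```python
TRIGDUR = 3  # hypothetical number of points that fills the trigger buffer


def percent_change(histmean, trigbuf, trigdur=TRIGDUR):
    """Return (perchange, trigmean), or None while trigbuf is not yet full."""
    if len(trigbuf) < trigdur:
        return None
    # Outliers are stored negated, so take absolute values first.
    trigmean = sum(abs(v) for v in trigbuf) / len(trigbuf)
    return 100 * (histmean - trigmean) / histmean, trigmean


not_full = percent_change(100.0, [60, -50])
perchange, trigmean = percent_change(100.0, [60, -50, 70])
```

Here trigmean is (60 + 50 + 70) / 3 = 60, so the drop relative to a history mean of 100 is 40%.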
21
The Master “triganal” Loop - V
If this is NOT a drop in throughput, that is (trigmean >= histmean):
    Add abs(trigbuf values) to histbuf
    Clear trigbuf
    Recalculate the histbuf stats
    Go to start of the data input loop
22
The Master “triganal” Loop - VI
It IS a drop in throughput: does $perchange exceed majorthresh?
Compare it to the previous, still-active alert if there is one ($alertmean ne 0):
    $trigchange = 100*($alertmean-$trigmean)/$alertmean
    If ($trigchange > majorthresh) then generate a major alert
    Add the absolute values of the trigbuf data to histbuf
    Clear trigbuf, reset its stats to 0, and calculate new histbuf stats
    Set $alertmean = trigmean (preserve alert status)
    Go to start of the data input loop
23
The Master “triganal” Loop - VII
There was no previous alert ($alertmean = 0):
    Generate the alert
    $alertmean = trigmean
    Add abs(trigbuf values) to histbuf
    Recalculate histbuf stats
    Clear trigbuf
    Go to start of the data input loop
END OF DATA INPUT LOOP (all data is processed)
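The decision made when the trigger buffer is full can be summarised in one function. This is a sketch under stated assumptions, not the author's code: the function name and return shape are hypothetical; when there is no previous alert and the drop does not exceed majorthresh, it is assumed that no alert is raised and alertmean is unchanged; and with a previous alert, alertmean is assumed to be updated whether or not a new alert fires. Buffer flushing and stats recalculation are left out.

```python
def evaluate_full_trigbuf(histmean, trigmean, alertmean, majorthresh=40.0):
    """Return (alert_raised, new_alertmean) for a full trigger buffer."""
    if trigmean >= histmean:
        return False, alertmean  # not a drop in throughput
    if alertmean != 0:
        # Compare with the previous, still-active alert.
        trigchange = 100 * (alertmean - trigmean) / alertmean
        return trigchange > majorthresh, trigmean
    # No previous alert: compare with the history mean.
    perchange = 100 * (histmean - trigmean) / histmean
    if perchange > majorthresh:
        return True, trigmean
    return False, alertmean  # assumption: small drops leave state unchanged


first_alert = evaluate_full_trigbuf(100.0, 55.0, alertmean=0)
further_drop = evaluate_full_trigbuf(100.0, 30.0, alertmean=55.0)
```

Comparing against alertmean rather than histmean means a sustained drop does not re-alert on every full buffer; a new alert needs a further drop of more than majorthresh relative to the level that was already reported.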
24
Final Steps
Create the plots for the HTML pages.
That’s all, folks.