The CALgorithm for Detecting Bandwidth Changes

The CALgorithm for Detecting Bandwidth Changes
Connie Logg, SLAC

Motivation
Throughput to a target host decreases dramatically, and questions arise: Why? When did it start? How long did it last? Is the decrease periodic? Was it associated with a route change? Could we have known and avoided being affected? What other destinations are affected?

The Challenge
Detection techniques must be lightweight: we do not want to consume bandwidth to measure bandwidth.
Quick-response detection requires frequent measurements.
Detection and alert generation must both be automated.

Measurement Tools
IEPM-BW framework; ABWE; Traceroute; Ping

Methodology - I
Every few minutes (currently 10), run the tests to the target hosts.
Ping: see if we can ping the host. If not, it is not the end of the world unless we get an “unknown host” response.
Results are logged to a flat file.

Methodology - II
Traceroute: run forward and, if possible, reverse traceroutes.
We need to be able to ssh to the target to run the reverse traceroute, which is not always possible.
Depending on traceroute restrictions along the route, we may need to run an ICMP traceroute instead of a UDP traceroute.
The traceroute results are recorded in a flat file.

Methodology - III
ABWE runs continuously, taking a measurement every minute.
The data is put into an Oracle database, one point per minute.

Analysis
“triganal” analyzes the ABWE data for decreases in throughput. (It can also detect throughput increases, but we are not concerned about those now.)
Note that the ABWE data arrives once a minute.
The “triganal” parameters depend on the data frequency and on how long you want a drop to persist before alerting on it.

“triganal” Methodology - I
There are two data buffers. The first is a history buffer, histbuf, where the processed data is stored.
It has a minimum size, histmin, which is the minimum amount of data required to “prime the pump”. This data is loaded into the history buffer when the algorithm is invoked.
It has a maximum size, histmax, which is the maximum amount of data allowed in histbuf. As new values are added to histbuf, if size(histbuf) > histmax, the oldest values are removed.
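
A minimal sketch of the history buffer described above, assuming Python and a deque with a fixed maximum length. The function name make_histbuf and the sample values are hypothetical; only histbuf, histmin, and histmax come from the slides.

```python
from collections import deque

def make_histbuf(seed_values, histmin, histmax):
    """Prime a bounded history buffer with the first histmin values."""
    if len(seed_values) < histmin:
        raise ValueError("not enough data to prime the history buffer")
    # A deque with maxlen drops its oldest entry when a new one would
    # push the size past histmax.
    return deque(seed_values[:histmin], maxlen=histmax)

# Hypothetical throughput samples
histbuf = make_histbuf([100.0, 98.0, 101.0, 99.0], histmin=3, histmax=4)
histbuf.append(97.0)   # buffer now holds histmax (4) values
histbuf.append(102.0)  # the oldest value, 100.0, is removed
```

After the two appends the buffer holds the four most recent values, which is the aging behavior the slide describes.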

“triganal” Methodology - II
The second is a trigger buffer, trigbuf, where data considered possible “trigger data” is stored for ongoing analysis.
trigdur is the number of points which, once loaded into trigbuf, trigger the alert analysis. With one point per minute, it also represents the length of time the throughput must be depressed before the trigger buffer data is evaluated to see if it constitutes an “alert” situation.

“triganal” Methodology - III
The algorithm is controlled by several parameters, which must be tuned to the nature of the data:
histmax, histmin, and trigdur, which were discussed previously.
Sensitivity ($sens), a multiplicative factor applied to the standard deviation to determine whether a new data point goes into the trigger buffer or is an outlier.

“triganal” Methodology - IV
Threshold parameters are percent-change values which determine whether the contents of a full trigger buffer constitute a major alert or a minor alert.
Major and minor alerts are implemented to allow for “tuning” of the algorithm.
Generally we set them equal, which disables minor alerts: minorthresh = majorthresh.
Assume here that majorthresh = 40% and minorthresh = 40%.

“triganal” Methodology - V
As the data is processed in time order, two functions (qtrigger and qoutlier) are applied to each new data point to determine which buffer (histbuf or trigbuf) it is loaded into, and in what state.
“Outlier” data is loaded into the buffer as negative data, and the script which calculates the mean and standard deviation (calcstats) does not include the negative data in its calculations.
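
A hedged sketch of what calcstats might look like in Python. The slides only say that negative entries are skipped; the use of the sample standard deviation and the handling of buffers with fewer than two usable values are assumptions made here.

```python
import statistics

def calcstats(buf):
    """Return (mean, sd) over the non-negative entries of buf.

    Negative entries mark data that was stored but excluded from the stats.
    Assumption: sample standard deviation; the slides do not say which
    estimator the original script uses.
    """
    kept = [v for v in buf if v >= 0]
    if len(kept) < 2:
        # Assumption: too little data gives a zero spread
        return (kept[0] if kept else 0.0), 0.0
    return statistics.mean(kept), statistics.stdev(kept)

# The -500.0 entry is ignored by the calculation
histmean, histsd = calcstats([10.0, 20.0, -500.0, 30.0])
```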

“triganal” Methodology - VI
qtrigger determines whether a value qualifies for the trigger buffer.
$val = value currently being examined
histmean = mean of the data in the history buffer
histsd = standard deviation of the data in the history buffer
    if (($val > histmean+sens*histsd) or ($val < histmean-sens*histsd))
    then qtrigger = true
    else qtrigger = false
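
The qtrigger test translates directly into a small function. This is a Python sketch of the condition on the slide, with the parameter order chosen here for illustration.

```python
def qtrigger(val, histmean, histsd, sens):
    """True if val lies more than sens standard deviations from histmean."""
    return val > histmean + sens * histsd or val < histmean - sens * histsd
```

For example, with histmean = 100, histsd = 10, and sens = 2, the band is 80 to 120, so a value of 70 qualifies for the trigger buffer and a value of 85 does not.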

“triganal” Methodology - VII
qoutlier determines whether a value is so far out of range that it is an outlier and should not be included in the mean and standard deviation calculations.
    if (($val > histmean+sens*histsd*2) or ($val < histmean-sens*histsd*2))
    then qoutlier = true
    else qoutlier = false
(Note that one might use the variance instead of 2*histsd, but we find that the variance does not work well.)
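
The qoutlier test has the same shape with a band twice as wide. A Python sketch of the condition on the slide, with the parameter order chosen here for illustration:

```python
def qoutlier(val, histmean, histsd, sens):
    """True if val lies more than 2*sens standard deviations from histmean."""
    band = 2 * sens * histsd
    return val > histmean + band or val < histmean - band
```

With histmean = 100, histsd = 10, and sens = 2, the outlier band is 60 to 140. A value of 70 is therefore trigger data but not an outlier, while a value of 50 is an outlier.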

“triganal” Startup
Prime histbuf with histmin values, then calculate the histmean and histsd values.
Initialize the data direction: $curdir = none

The Master “triganal” Loop - I
Loop over the values in the data set with the following algorithm.
(start of data input loop)
Is $val NOT a trigger value? Then:
    If (abs(($val - histmean)/histmean) < .1), add the value to histbuf but do not include it in the stats ($val = -$val). This avoids flatlining the distribution and making histsd very small.
    Calculate the stats histmean and histsd.
    If (size(trigbuf) > 0), remove the oldest trigbuf value. We want to age out the data in the trigger buffer if we are recovering from a bandwidth drop.
    Go to start of data input loop.
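
A rough Python sketch of the non-trigger branch. The slide does not say what happens to a non-trigger value that is more than 10% away from histmean; this sketch assumes it is added to histbuf unchanged. The function name and the sample data are hypothetical.

```python
from collections import deque
import statistics

def handle_non_trigger(val, histbuf, trigbuf, histmean):
    """Process a value that did not qualify for the trigger buffer."""
    # Values within 10% of the mean are stored negated so that the stats
    # skip them; this keeps histsd from collapsing toward zero.
    if abs((val - histmean) / histmean) < 0.1:
        val = -val
    histbuf.append(val)  # assumption: other values are stored as-is
    # Age out one trigger value, since the throughput may be recovering.
    if trigbuf:
        trigbuf.popleft()
    # Recompute the stats over the non-negative history entries.
    kept = [v for v in histbuf if v >= 0]
    return statistics.mean(kept), statistics.stdev(kept)

histbuf = deque([100.0, 90.0, 110.0], maxlen=5)
trigbuf = deque([60.0, 62.0])
histmean, histsd = handle_non_trigger(104.0, histbuf, trigbuf, histmean=100.0)
```

Here 104.0 is within 10% of the mean, so it is stored as -104.0 and the stats are unchanged, while the oldest trigger value (60.0) is dropped.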

The Master “triganal” Loop - II
$val is a trigger value, and the direction of change is important.
$curdir is the direction, relative to histmean, of the data currently in trigbuf.
    If ($val > histmean) then the “direction” $valdir = up, else $valdir = down.
    If trigbuf is empty, $curdir = $valdir.
    If qoutlier($val) then $val = -$val.
    Save $val in trigbuf.

The Master “triganal” Loop - III
If ($curdir ne “none” and $curdir ne $valdir), the data has changed direction and we abort the trigger state:
    Save the absolute values of the trigbuf data into histbuf.
    Calculate new histbuf stats.
    Clear the trigger buffer trigbuf.
    Increment $aborttrigger.
    $curdir = “none”
    Go to start of the data input loop.
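
The direction bookkeeping for a trigger value can be sketched in Python as follows. The state dictionary and the function name are hypothetical, the outlier decision is passed in as a flag, and the recalculation of the histbuf stats is left out to keep the sketch short.

```python
def step_trigger(val, histmean, is_outlier, state):
    """Store a trigger value; return True if the trigger state was aborted."""
    valdir = "up" if val > histmean else "down"
    if not state["trigbuf"]:
        state["curdir"] = valdir
    # Outliers are stored negated so that the stats skip them.
    state["trigbuf"].append(-val if is_outlier else val)
    if state["curdir"] != "none" and state["curdir"] != valdir:
        # Direction changed: move the trigger data into history and reset.
        state["histbuf"].extend(abs(v) for v in state["trigbuf"])
        state["trigbuf"].clear()
        state["aborttrigger"] += 1
        state["curdir"] = "none"
        return True
    return False

state = {"histbuf": [100.0, 102.0, 98.0], "trigbuf": [],
         "curdir": "none", "aborttrigger": 0}
first = step_trigger(60.0, 100.0, False, state)    # starts a "down" run
second = step_trigger(140.0, 100.0, False, state)  # reverses, so abort
```

Following the order on the slides, the value that reverses direction is saved in trigbuf before the check, so it moves into histbuf along with the rest of the trigger data.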

The Master “triganal” Loop - IV
If the trigger buffer is not full: go to start of data input loop.
If the trigger buffer is full:
    Calculate the trigmean and trigsd of the absolute values of all the trigbuf data.
    Calculate the percent change: $perchange = 100*(histmean-trigmean)/histmean
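
A short worked example of the percent-change calculation, as a Python sketch with made-up numbers. The function name is hypothetical.

```python
import statistics

def percent_change(histmean, trigbuf):
    """Percent drop of the trigger-buffer mean relative to histmean."""
    # Absolute values are used because outliers are stored negated.
    trigmean = statistics.mean([abs(v) for v in trigbuf])
    return 100 * (histmean - trigmean) / histmean

# The trigger data averages 60 against a history mean of 100
perchange = percent_change(100.0, [60.0, -50.0, 70.0])
```

A positive result is a drop in throughput; with majorthresh = 40%, the 40.0 computed here sits exactly at the threshold and does not exceed it.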

The Master “triganal” Loop - V
If this is NOT a drop in throughput, that is (trigmean >= histmean):
    Add abs(trigbuf values) to histbuf.
    Clear trigbuf.
    Recalculate the histbuf stats.
    Go to start of the data input loop.

The Master “triganal” Loop - VI
It IS a drop in throughput. Does $perchange exceed majorthresh?
Compare it to the previous, still active alert if there is one ($alertmean ne 0):
    $trigchange = 100*($alertmean-$trigmean)/$alertmean
    If ($trigchange > majorthresh) then generate a major alert.
    Add the absolute values of the trigbuf data to histbuf.
    Clear trigbuf, reset its stats to 0, and calculate new histbuf stats.
    Set $alertmean = trigmean (preserve alert status).
    Go to start of the data input loop.

The Master “triganal” Loop - VII
There was no previous alert ($alertmean = 0):
    Generate the alert.
    $alertmean = trigmean
    Add abs(trigbuf values) to histbuf.
    Recalculate histbuf stats.
    Clear trigbuf.
    Go to start of the data input loop.
END OF DATA INPUT LOOP: all data is processed.
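
One possible reading of the alert decision in the last two slides, as a Python sketch. The slides do not spell out what happens when $perchange does not exceed majorthresh; this sketch assumes no alert is raised in that case. The function name is hypothetical, and the buffer updates and the $alertmean update are left out.

```python
def should_alert(perchange, trigmean, alertmean, majorthresh=40.0):
    """Decide whether a full trigger buffer should raise a major alert."""
    if perchange <= majorthresh:
        return False  # assumption: below-threshold drops raise no alert
    if alertmean != 0:
        # An alert is still active: alert again only if throughput has
        # dropped by more than majorthresh relative to that alert's mean.
        trigchange = 100 * (alertmean - trigmean) / alertmean
        return trigchange > majorthresh
    return True  # no previous alert, so this drop is new
```

For instance, a 50% drop with no active alert raises an alert, while the same drop with an active alert at nearly the same level (alertmean = 55, trigmean = 50) does not, which keeps a sustained drop from alerting repeatedly.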

Final Steps
Create the plots for the HTML pages.
That’s all, folks.