Published by Claude Green. Modified over 8 years ago.
1
Multi-domain Internet Performance Measurement: Sampling and Analysis
Prasad Calyam, Ph.D. (PI), pcalyam@osc.edu
Project Website: http://www.oar.net/initiatives/research/projects/multidomain_sampling
Summer ESCC Meeting, Fairbanks, Alaska, July 14th, 2011
2
Topics of Discussion
Project Overview and Research Context
–Multi-domain measurement federations and challenges
“OnTimeDetect” Tool
–Correlated and uncorrelated network anomaly detection and notification in perfSONAR deployments
–Tool experiences with worldwide perfSONAR data sets
“OnTimeSample” Tool
–Meta-scheduling network status sampling for accurate SLA monitoring and network weather forecasting
–Tool relevance for scalability and programmability in perfSONAR
“OnTimeDetect” Anomaly Detection Integration within DOE
–E-Center for DOE enterprise monitoring
–ESnet perfSONAR Nagios-plugin for network operations
3
Context of our Research
[Figure labels: Application demands; ISP delivers; Tools from our Research; Measurement infrastructures that could benefit from tools integration; Application communities that could benefit from tools integration]
4
Multi-domain network status sampling
Applications need precisely timed measurements across multiple network domains for bottleneck troubleshooting and consequent adaptation
–Measurement sampling and analysis requirements (technical issues)
    Strict periodicity for accurate network weather forecasting
    Adaptive random sampling for rapid anomaly detection
    Stratified random sampling for routine network monitoring
–Multi-domain measurement federation requirements (policy issues)
    Sharing measurement topologies, AAA, measurement policies, measurement data exchange formats, … (e.g., ESnet, Internet2, GEANT)
The sampling time interval pattern chosen should depend on the monitoring accuracy objectives
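The three sampling patterns named on this slide can be sketched in a few lines. This is an illustration only, not the OnTimeSample implementation: the function names, the exponentially distributed gaps, and the `suspicious` callback and `speedup` factor are all assumptions made for the example.

```python
import random


def periodic_times(start, end, period):
    """Strict periodicity: one sample every `period` seconds in [start, end)."""
    return list(range(start, end, period))


def stratified_random_times(start, end, period, rng):
    """Stratified random sampling: one uniformly random sample inside each
    consecutive stratum of width `period`."""
    times = []
    for lo in range(start, end, period):
        hi = min(lo + period, end)
        times.append(rng.uniform(lo, hi))
    return times


def adaptive_random_times(start, end, mean_gap, rng,
                          suspicious=lambda t: False, speedup=4.0):
    """Adaptive random sampling (assumed form): exponentially distributed gaps
    whose mean shrinks by `speedup` while `suspicious(t)` reports trouble."""
    times = []
    t = float(start)
    while True:
        gap = mean_gap / speedup if suspicious(t) else mean_gap
        t += rng.expovariate(1.0 / gap)  # expovariate takes the rate, 1/mean
        if t >= end:
            return times
        times.append(t)


if __name__ == "__main__":
    rng = random.Random(42)
    print(periodic_times(0, 60, 10))
    print(stratified_random_times(0, 60, 10, rng))
    # Hypothetical scenario: sample faster between t=1200s and t=1800s.
    print(adaptive_random_times(0, 3600, 300, rng,
                                suspicious=lambda t: 1200 <= t < 1800))
```

Periodic sampling gives evenly spaced inputs for forecasting, stratified sampling guarantees coverage of every interval while avoiding synchronization with periodic network behavior, and the adaptive variant spends more probes where an anomaly is suspected.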
5
perfSONAR Limitations that motivate our Research
[Figure labels: Measurement Points; Data Services; Measurement Archives; Transformations; Service Configuration; AAA Services; Infrastructure; Information Services; Topology; Service Lookup; Analysis/Visualization; User GUIs; Web Pages; NOC Alarms]
Measurement points cannot handle diverse sampling requirements
–Programmability of measurement schedules is needed to control the inter-sampling times desired in applications
Meta-scheduler to control measurement points is not developed
–Current set of 3 tools (Ping, Traceroute, Iperf) will conflict if another tool is added (e.g., pchar, pathload)
–Policies for regulation and semantic priorities cannot be enforced
Measurement archives have large data sets but lack automated sampling and analysis techniques and tools
–Anomaly detection, weather forecasting, SLA monitoring and automated fault diagnosis tools are needed along with easy-to-use GUIs
Integration with other measurement frameworks for important events correlation needs improvement
6
OnTimeDetect Tool
[Figure labels]
Uncorrelated (APD scheme) and Correlated Anomaly Detection (PCA scheme)
gLS/hLS, E-Center DRS
Anomaly annotated graphs (implemented)
(Work-in-progress)
–ESnet dynamic Nagios thresholds
–E-Center “Anomaly Detection Service” that works seamlessly with DRS
–Integration with US ATLAS community
SC10 SCinet Demo Dashboard
7
OnTimeDetect GUI Tool
8
OnTimeDetect Tool (2)
Conducted the “first” study to sample and analyze worldwide perfSONAR measurements (480 paths, 65 sites) to detect network anomaly events
–Developed an adaptive plateau detector (APD) algorithm that is more accurate (fewer false alarms) than existing schemes (e.g., NLANR/SLAC plateau detector)
–Demonstrated how adaptive sampling can reduce anomaly detection times from several days to only a few hours in perfSONAR deployments
–Developing a principal component analysis (PCA) based correlated anomaly detection algorithm to localize events on SCinet paths with common links
–Paper with APD results published in 2010 IEEE MASCOTS conference
Released sampling and analysis algorithms and toolkit for network anomaly notification to perfSONAR users/developers
–GUI tool and command-line tools with web interfaces developed
–Tools have been developed to leverage perfSONAR web-service interfaces for BWCTL and OWAMP measurements
–Twitter interface developed for “ground truth” correlation (e.g., NetAlmanac, logs) with detected network anomaly events in perfSONAR community
Software downloads, demos, and manuals are at http://ontimedetect.oar.net and http://www.perfsonar.net/download.html
9
Correlated Anomaly Detection in OnTimeDetect
Recently developed a network-wide “correlated” anomaly detection scheme using principal component analysis (PCA) in OnTimeDetect
–To localize network events affecting paths that have temporal (common monitoring period) and spatial (common intermediate links) correlation
–Combined Adaptive Plateau Detector (APD) with PCA scheme to detect anomalies on OWAMP measurements collected at SCinet, Supercomputing 2010
Software development status:
–Integrated prototype PCA scheme in latest version of OnTimeDetect Tool (Beta)
–Integrating E-Center’s Data Retrieval Service (DRS) data query mechanisms for correlated anomaly detection with topology information
10
Correlated Anomaly Detection
[Figure: PCA with APD anomaly detection steps]
[Figure: Detection accuracy of correlated and uncorrelated anomalies by SPD scheme]
[Figure: Detection accuracy of correlated and uncorrelated anomalies by APD scheme]
11
OnTimeSample Tool Context in perfSONAR
12
OnTimeSample Tool (2)
OnTimeSample: Meta-scheduler and policy inference services software for orchestrating perfSONAR active measurements
–Benefit is that measurement collection in perfSONAR can be targeted to meet the network monitoring objectives of users (e.g., adaptive sampling)
–Provides scalability to the perfSONAR framework
    If more tools are added, it allows for conflict-free measurements
    On-demand measurement requests are served with low response times
–Provides programmability to the perfSONAR framework
    Enables enforcement of multi-domain policies and semantic priorities to initiate measurements, which mitigates unnecessary oversampling
    –Measurement regulation; e.g., only 1-5% of probing traffic permitted
    –Measurement requests from users with higher credentials (e.g., backbone network engineer) get higher priority than other users (e.g., casual tester)
Developed “OnTimeSample” tool prototype for several use cases
–Routine network monitoring, rapid anomaly detection, accurate network weather forecasting, real-time SLA validation
–Evaluated meta-scheduler in terms of “satisfaction ratio” and “stretch fairness” with a variety of measurement tasks, policies and topologies
–Paper with preliminary results published in 2010 IEEE CNSM conference
13
Meta-scheduler Algorithms in OnTimeSample Tool
Algorithms based on real-time systems scheduling principles that we developed and evaluated (improvements over existing round-robin (RR) schemes):
–HBP: Heuristic bin packing based on execution time e_ij
    Effective for routine network monitoring, but too rigid to handle on-demand measurement requests and diverse sampling patterns
–EDF: Earliest Deadline First based on deadline d_ij
    Caters to measurement periodicity and is flexible for on-demand measurements, but cannot inherently support semantic priorities
–SPS: Semantic Priority p_ij and Deadline based; d_ij + w * f(p_ij)
    Uses ontologies, priorities with weight w and an inference engine
    Recommended scheme for perfSONAR
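The difference between EDF and the semantic-priority key d_ij + w * f(p_ij) can be illustrated with a small ordering sketch. Everything beyond the two sort keys is an assumption: the slide does not define f, so f(p) = -p is used here purely for illustration (larger p meaning more important), and running tasks strictly one after another is a simplification of "conflict-free" scheduling. The `Task` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    exec_time: float  # e_ij: execution time of the measurement
    deadline: float   # d_ij: time by which the measurement should finish
    priority: int     # p_ij: semantic priority (assumed: larger = more important)


def edf_order(tasks):
    """Earliest Deadline First: order by d_ij only."""
    return sorted(tasks, key=lambda t: t.deadline)


def sps_order(tasks, w, f=lambda p: -p):
    """Semantic-priority ordering: order by d_ij + w * f(p_ij).
    f(p) = -p is an assumed form that pulls high-priority tasks earlier."""
    return sorted(tasks, key=lambda t: t.deadline + w * f(t.priority))


def serialize(ordered, start=0.0):
    """Run tasks back to back so no two measurements overlap.
    Returns (name, start, finish, met_deadline) tuples."""
    timeline = []
    now = start
    for t in ordered:
        finish = now + t.exec_time
        timeline.append((t.name, now, finish, finish <= t.deadline))
        now = finish
    return timeline


if __name__ == "__main__":
    tasks = [
        Task("ping", 1, 10, 0),
        Task("iperf", 5, 12, 3),       # e.g. requested by a backbone engineer
        Task("traceroute", 2, 11, 0),
    ]
    print([t.name for t in edf_order(tasks)])
    print([t.name for t in sps_order(tasks, w=1.0)])
    print(serialize(sps_order(tasks, w=1.0)))
```

With w = 0 the semantic-priority ordering reduces to EDF, so the weight controls how strongly credentials can override deadlines.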
14
Use Case of Resource Protection Service in E-Center
15
“OnTimeDetect” Integration in E-Center’s Anomaly Detection Service
To detect anomalies in E-Center’s perfSONAR data cache
REST-based standalone service designed to work with DRS
–ADS request is very similar to DRS request
–Output of DRS (in JSON format) can be directly parsed by ADS
–ADS analyzes OWAMP or BWCTL data for each source/destination pair of IPs separately and returns the results
Different anomaly detection algorithms implemented as individual detectors
–Adaptive Plateau Detection (APD)
–Static Plateau Detection (SPD)
Novice (default) mode and Expert mode implemented
–Expert mode allows user to change anomaly detector’s parameters
Social Interface for ADS: “Anomaly Detection Group”, https://ecenter.fnal.gov/content/anomaly-detection
ADS documentation at https://cdcvs.fnal.gov/redmine/attachments/3487/ADSdoc_1.3.pdf
16
ADS Integration Architecture in E-Center
Figure Authors: Maxim Grigoriev, David Eads, Phil DeMar, Fermilab
17
Example ADS query
GET http://ecenter.fnal.gov:9055/ads/spd.json?sensitivity=2&elevation1=20&data_type=owamp&src_ip=131.243.24.11&dst_ip=198.32.44.130&start=2010-06-05 06:01:02&end=2010-06-05 07:02:01
Callouts:
–spd.json: plateau detector type, i.e., SPD or APD
–sensitivity, elevation1: detector-specific parameters (optional)
–data_type: type of measurement data analyzed
–src_ip, dst_ip: path parameters
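A query like the one on this slide could be assembled programmatically as sketched below. The parameter names and the base URL are taken from the slide; the helper name `build_ads_url`, the `apd.json` endpoint name (the slide only shows `spd.json`), and the choice to percent-encode the timestamps with `urlencode` are assumptions. The sketch only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ADS_BASE = "http://ecenter.fnal.gov:9055/ads"  # host and port as shown on the slide


def build_ads_url(detector, data_type, src_ip, dst_ip, start, end,
                  **detector_params):
    """Assemble an ADS GET URL.

    detector        -- 'spd' or 'apd' (the 'apd' endpoint name is assumed)
    data_type       -- e.g. 'owamp'
    detector_params -- optional detector-specific settings such as
                       sensitivity=2, elevation1=20
    """
    if detector not in ("spd", "apd"):
        raise ValueError("detector must be 'spd' or 'apd'")
    params = dict(detector_params)
    params.update(data_type=data_type, src_ip=src_ip, dst_ip=dst_ip,
                  start=start, end=end)
    # urlencode escapes the spaces and colons in the timestamps.
    return f"{ADS_BASE}/{detector}.json?{urlencode(params)}"


if __name__ == "__main__":
    url = build_ads_url(
        "spd", "owamp", "131.243.24.11", "198.32.44.130",
        "2010-06-05 06:01:02", "2010-06-05 07:02:01",
        sensitivity=2, elevation1=20,
    )
    print(url)
    # Round-trip check: decoding the query recovers the original values.
    print(parse_qs(urlsplit(url).query)["start"])
```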
18
Example ADS Response
{
  192.12.15.23: {
    134.79.104.209: {
      src_hub: "BNL",
      dst_hub: "SLAC",
      metaid: "1234",
      sensitivity: 2,
      status: "OK",
    }
    134.79.104.209: {
      192.12.15.23: {
        src_hub: "SLAC",
        dst_hub: "BNL",
        metaid: "123434",
        sensitivity: 2,
        status: {
          critical: {
            1304479767: { anomaly_type: "plateau", value: 730181000, }
          },
          warning: {
            1304472510: { anomaly_type: "plateau", value: 301539000, }
          },
          elevation1: 0.2,
          elevation2: 0.4,
          plateau_size:
        }
APD/SPD could return multiple anomalies in each dataset
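A client could flatten a response of this shape into one row per anomaly, as in the sketch below. The slide's response is abbreviated and not valid JSON, so the `response` dictionary here is a hand-written, well-formed approximation that assumes the two source addresses are sibling top-level keys and that `status` is either the string "OK" or a mapping of severities to timestamped events. The function name `list_anomalies` is hypothetical.

```python
# Well-formed approximation of the slide's example (assumed structure).
response = {
    "192.12.15.23": {
        "134.79.104.209": {
            "src_hub": "BNL", "dst_hub": "SLAC", "metaid": "1234",
            "sensitivity": 2, "status": "OK",
        },
    },
    "134.79.104.209": {
        "192.12.15.23": {
            "src_hub": "SLAC", "dst_hub": "BNL", "metaid": "123434",
            "sensitivity": 2,
            "status": {
                "critical": {
                    "1304479767": {"anomaly_type": "plateau", "value": 730181000},
                },
                "warning": {
                    "1304472510": {"anomaly_type": "plateau", "value": 301539000},
                },
            },
        },
    },
}


def list_anomalies(resp):
    """Return (src, dst, severity, timestamp, type, value) rows, oldest first."""
    rows = []
    for src, destinations in resp.items():
        for dst, path in destinations.items():
            status = path.get("status")
            if not isinstance(status, dict):
                continue  # e.g. "OK": nothing detected on this path
            for severity in ("critical", "warning"):
                for ts, event in status.get(severity, {}).items():
                    rows.append((src, dst, severity, int(ts),
                                 event["anomaly_type"], event["value"]))
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    for row in list_anomalies(response):
        print(row)
```

Because a detector may report several anomalies per dataset, the rows are collected per timestamp rather than per path.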
19
E-Center User Interface Integration
[Screenshot: ADS Expert Mode]
20
E-Center User Interface Integration (2)
[Screenshot: Anomaly Annotated Graph]
21
Future Work for ADS in E-Center
User selects an end-point pair of a project/community (e.g., US ATLAS) and queries data over a start time and end time
Graph of measurement time series, with any anomalies annotated, will appear along with anomaly statistics
–E.g., multiple metric graphs appear on the same page for visual correlation
Implement both “uncorrelated” and “correlated” anomaly detectors
–Across forward and reverse paths
–Across multiple metrics on a path
–Across multiple paths centered at a hub
Develop an anomaly visualization engine in E-Center for DOE networks
Anomaly event notifications are submitted as “issues” in E-Center
Build a knowledge base in the “Anomaly Detection Group” of E-Center for discussing anomaly events, organized by project/theme
22
“OnTimeDetect” Integration in ESnet’s perfSONAR Nagios Plugins
Two APD-based plugins developed in the prototype module that are compatible with current ESnet Nagios Plugins (easy to deploy!)
–‘check_apd_owdelay.pl’ for OWAMP and ‘check_apd_throughput.pl’ for BWCTL
Plugins produce OK, WARNING and CRITICAL messages
–Information messages are added to the notification outputs if there are multiple anomaly events or impending events; output code is set to CRITICAL if at least one anomaly is detected
–An UNKNOWN message is issued if there is insufficient data for analysis
Plugin features
–Detects plateau anomalies in BWCTL and OWAMP data collected by querying perfSONAR measurement archives
–Option to write analyzed data to files in tuple format for graphing or further analysis
–Options to analyze data in both forward and reverse directions
–Support for expert configuration of APD parameters
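The status rules on this slide map naturally onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is not the plugins' actual Perl logic: the function name, the `min_points` threshold for "insufficient data", and the rule that warning-level events alone yield WARNING are assumptions.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def nagios_status(n_points, n_critical, n_warning, min_points=10):
    """Pick an exit code and message from anomaly counts.

    min_points is a hypothetical cut-off below which the data set is
    considered too small to analyze.
    """
    if n_points < min_points:
        return UNKNOWN, "UNKNOWN - insufficient data for analysis"
    if n_critical > 0:
        code = CRITICAL
    elif n_warning > 0:
        code = WARNING  # assumed: impending events only raise a warning
    else:
        code = OK
    message = (f"{LABELS[code]} - {n_critical} critical, {n_warning} warning "
               f"event(s) in {n_points} samples")
    return code, message


if __name__ == "__main__":
    code, message = nagios_status(n_points=200, n_critical=1, n_warning=1)
    print(message)
    # A real plugin would finish with sys.exit(code) so Nagios sees the status.
```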
23
“OnTimeDetect” Integration in ESnet’s perfSONAR Nagios Plugins (2)
Usage:
check_apd_throughput.pl -u|--url -s|--source -d|--destination -b|--bidirectional -r -z|--sensitivity -W|--swc -w|--elevation1 -c|--elevation2 -a|--algorithm -o|--output-file
Sample Output:
./check_apd_throughput.pl -u http://bnl-pt1.es.net:8085/perfSONAR_PS/services/pSB -r 36000000 -w 0.2 -c 0.5 -s 198.124.238.38 -d 198.129.254.58
PS_CHECK_THROUGHPUT CRITICAL - Metric is Throughput | Source:198.124.238.38 Destination:198.129.254.58 {Critical{1304565397:1.65043e+08Gbps};Warning{1304531930:1.72025e+08Gbps};} | TotalDatum(ForwardDirection)=200;; OK=178;; WARNING=1;; CRITICAL=1;;
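For graphing or further analysis, the plugin's one-line output could be parsed as sketched below. The regular expressions are inferred from the single sample on this slide, with its line-wrapped tokens rejoined, so the exact output grammar is an assumption; `parse_plugin_output` is a hypothetical helper, not part of the plugin.

```python
import re

# Sample output from the slide, with line-wrapped tokens rejoined.
SAMPLE = (
    "PS_CHECK_THROUGHPUT CRITICAL - Metric is Throughput | "
    "Source:198.124.238.38 Destination:198.129.254.58 "
    "{Critical{1304565397:1.65043e+08Gbps};"
    "Warning{1304531930:1.72025e+08Gbps};} | "
    "TotalDatum(ForwardDirection)=200;; OK=178;; WARNING=1;; CRITICAL=1;;"
)


def parse_plugin_output(line):
    """Split one plugin output line into (status, events, counts)."""
    # The status word is the last token before the first " - ".
    status = line.split(" - ", 1)[0].split()[-1]

    # Events look like Critical{<unix time>:<value><unit>}.
    events = {}
    for severity, body in re.findall(r"(Critical|Warning)\{([^{}]*)\}", line):
        pairs = re.findall(r"(\d+):([0-9.eE+-]+)", body)
        events[severity.lower()] = [(int(ts), float(val)) for ts, val in pairs]

    # Summary counters look like NAME=<int>;;
    counts = {name: int(value)
              for name, value in re.findall(r"([\w()]+)=(\d+);;", line)}
    return status, events, counts


if __name__ == "__main__":
    status, events, counts = parse_plugin_output(SAMPLE)
    print(status)
    print(events)
    print(counts)
```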
24
References
Project Website
Presentations
–“Experiences from developing analysis techniques and GUI tools for perfSONAR users”, perfSONAR Workshop, Arlington, VA, 2010.
–“Multi-domain Internet Performance Sampling and Analysis Tools”, Internet2/ESCC Joint Techs, Columbus, OH, 2010.
–“OnTime Detect Tool Tutorial”, Internet2 Spring Member Meeting, Arlington, VA, 2010.
–“Multi-domain Internet Performance Sampling and Analysis”, Internet2/ESCC Joint Techs, Salt Lake City, 2010.
Peer-reviewed Papers
–P. Calyam, J. Pu, W. Mandrawa, A. Krishnamurthy, “OnTimeDetect: Dynamic Network Anomaly Notification in perfSONAR Deployments”, IEEE Symposium on Modeling, Analysis & Simulation of Computer & Telecommunication Systems (MASCOTS), 2010. [Poster]
–P. Calyam, L. Kumarasamy, F. Ozguner, “Semantic Scheduling of Active Measurements for meeting Network Monitoring Objectives”, IEEE Conference on Network and Service Management (CNSM) (Short Paper), 2010. [Poster]
Software Downloads
–OnTimeDetect: Offline and Online Network Anomaly Notification Tool for perfSONAR Deployments [Web-interface Demo] [SC10 Demo] [Twitter Demo]
–OnTimeSample: Meta-scheduler Tool for perfSONAR Deployments (Alpha software version available upon request)
ESnet Blog on our project accomplishments (Link on Homepage of ESnet)