Multi-domain Internet Performance Measurement: Sampling and Analysis Prasad Calyam, Ph.D. (PI) Project Website:

Presentation transcript:

Multi-domain Internet Performance Measurement: Sampling and Analysis
Prasad Calyam, Ph.D. (PI)
Project Website:
Summer ESCC Meeting, Fairbanks, Alaska, July 14th, 2011

Topics of Discussion
Project Overview and Research Context
  – Multi-domain measurement federations and challenges
“OnTimeDetect” Tool
  – Correlated and uncorrelated network anomaly detection and notification in perfSONAR deployments
  – Tool experiences with world-wide perfSONAR data sets
“OnTimeSample” Tool
  – Meta-scheduling network status sampling for accurate SLA monitoring and network weather forecasting
  – Tool relevance for scalability and programmability in perfSONAR
“OnTimeDetect” Anomaly Detection Integration within DOE
  – E-Center for DOE enterprise monitoring
  – ESnet perfSONAR Nagios-plugin for network operations

Context of our Research
[Figure labels: Application demands; ISP delivers; Tools from our research; Measurement infrastructures that could benefit from tools integration; Application communities that could benefit from tools integration]

Multi-domain network status sampling
Applications need precisely timed measurements across multiple network domains for bottleneck troubleshooting and consequent adaptation
  – Measurement sampling and analysis requirements (technical issues)
      Strict periodicity for accurate network weather forecasting
      Adaptive random sampling for rapid anomaly detection
      Stratified random sampling for routine network monitoring
  – Multi-domain measurement federation requirements (policy issues)
      Sharing measurement topologies, AAA, measurement policies, measurement data exchange formats, … (e.g., ESnet, Internet2, GEANT)
The sampling time interval pattern chosen should depend on the monitoring accuracy objectives
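The three sampling patterns named on this slide can be illustrated with a minimal Python sketch. It is not taken from OnTimeSample: the function names are hypothetical, and the adaptive rule (quartering the mean gap while a caller-supplied `suspect` predicate is true) is an assumption chosen only to show how inter-sample times can tighten when an anomaly is suspected.

```python
import random

def periodic_times(start, end, period):
    """Strictly periodic sampling: one sample every `period` seconds."""
    times = []
    t = start
    while t < end:
        times.append(t)
        t += period
    return times

def stratified_random_times(start, end, strata, rng):
    """Stratified random sampling: one uniformly random sample
    inside each of `strata` equal-width time slots."""
    width = (end - start) / strata
    return [start + i * width + rng.uniform(0, width) for i in range(strata)]

def adaptive_random_times(start, end, base_gap, suspect, rng):
    """Adaptive random sampling (assumed rule): exponentially distributed
    gaps whose mean drops to base_gap / 4 while suspect(t) is true."""
    times = []
    t = start
    while True:
        mean_gap = base_gap / 4 if suspect(t) else base_gap
        t += rng.expovariate(1.0 / mean_gap)
        if t >= end:
            return times
        times.append(t)
```

For example, `periodic_times(0, 60, 15)` gives samples at 0, 15, 30 and 45 seconds, while the two random variants need a seeded `random.Random` instance so that a schedule can be reproduced.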

perfSONAR Limitations that motivate our Research
[Figure: perfSONAR architecture diagram. Labels: Measurement Points; Data Services; Measurement Archives; Transformations; Service Configuration; AAA Services; Infrastructure; Information Services; Topology Service; Lookup; Analysis/Visualization; User GUIs; Web Pages; NOC Alarms]
Measurement points cannot handle diverse sampling requirements
Programmability of measurement schedules is needed to control inter-sampling times desired in applications
Meta-scheduler to control measurement points is not developed
Current set of 3 tools (Ping, Traceroute, Iperf) will conflict if another tool is added (e.g., pchar, pathload)
Policies for regulation and semantic priorities cannot be enforced
Measurement archives have large data sets but lack automated sampling and analysis techniques and tools
Anomaly detection, weather forecasting, SLA monitoring and automated fault diagnosis tools are needed along with easy-to-use GUIs
Integration with other measurement frameworks for important events correlation needs improvement

OnTimeDetect Tool
[Figure: OnTimeDetect tool overview. Labels: Uncorrelated (APD scheme) and Correlated Anomaly Detection (PCA scheme); gLS/hLS, E-Center DRS; Anomaly annotated graphs (implemented); (Work-in-progress) ESnet dynamic Nagios thresholds; E-Center “Anomaly Detection Service” that works seamlessly with DRS; Integration with US ATLAS community; SC10 SCinet Demo Dashboard]

OnTimeDetect GUI Tool

OnTimeDetect Tool (2)
Conducted the “first” study to sample and analyze worldwide perfSONAR measurements (480 paths, 65 sites) to detect network anomaly events
  – Developed an adaptive plateau detection (APD) algorithm that is more accurate (fewer false alarms) than existing schemes (e.g., NLANR/SLAC plateau detector)
  – Demonstrated how adaptive sampling can reduce anomaly detection times from several days to only a few hours in perfSONAR deployments
  – Developing a principal component analysis (PCA) based correlated anomaly detection algorithm to localize events on SCinet paths with common links
  – Paper with APD results published in 2010 IEEE MASCOTS conference
Released sampling and analysis algorithms and toolkit for network anomaly notification to perfSONAR users/developers
  – GUI tool and command-line tools with web interfaces developed
  – Tools have been developed to leverage perfSONAR web-service interfaces for BWCTL and OWAMP measurements
  – Twitter interface developed for “ground truth” correlation (e.g., NetAlmanac, logs) with detected network anomaly events in perfSONAR community
Software downloads, demos, manuals are at
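The general idea behind plateau detection can be sketched in Python as follows. This is a simplified, fixed-parameter detector written for illustration, not the APD algorithm from the MASCOTS paper: the function name, the `history` window and the default values are assumptions, and the adaptive tuning of parameters that distinguishes APD is not described on the slides and is not reproduced here.

```python
from collections import deque
from statistics import mean, pstdev

def detect_plateaus(samples, history=20, sensitivity=2.0, plateau_size=5):
    """Return the start index of every run of `plateau_size` consecutive
    samples that deviate from the recent baseline by more than
    `sensitivity` standard deviations."""
    baseline = deque(maxlen=history)  # recent samples considered normal
    run_start = None                  # index where the current run began
    run_len = 0
    anomalies = []
    for i, x in enumerate(samples):
        if len(baseline) < history:
            baseline.append(x)        # still learning the baseline
            continue
        mu = mean(baseline)
        sigma = pstdev(baseline)
        threshold = sensitivity * max(sigma, 1e-9)
        if abs(x - mu) > threshold:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == plateau_size:
                anomalies.append(run_start)  # report each plateau once
        else:
            run_len = 0
            baseline.append(x)        # only normal samples update the baseline
    return anomalies
```

Requiring several consecutive out-of-range samples is what separates a plateau (a sustained level shift) from a single spike, which this sketch deliberately ignores.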

Correlated Anomaly Detection in OnTimeDetect
Recently developed a network-wide “correlated” anomaly detection scheme using principal component analysis (PCA) in OnTimeDetect
  – To localize network events affecting paths that have temporal (common monitoring period) and spatial (common intermediate links) correlation
  – Combined Adaptive Plateau Detector (APD) with PCA scheme to detect anomalies on OWAMP measurements collected at SCinet, Supercomputing 2010
Software development status:
  – Integrated prototype PCA scheme in latest version of OnTimeDetect Tool (Beta)
  – Integrating E-Center’s Data Retrieval Service (DRS) data query mechanisms for correlated anomaly detection with topology information
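One generic way PCA can expose events shared by several paths is sketched below in plain Python. It is an illustration under stated assumptions, not the scheme implemented in OnTimeDetect: the function names, the 3-sigma score threshold and the 0.3 loading cut-off are all hypothetical, and the APD stage that the slides combine with PCA is omitted.

```python
from statistics import pstdev

def first_principal_component(rows, iterations=100):
    """Return (loadings, scores) of the first principal component.
    rows[t][j] is the measurement on path j at time step t."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Covariance matrix between paths (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores

def correlated_events(rows, k=3.0, min_loading=0.3):
    """Flag time steps with an unusually large first-component score and
    list the paths that load heavily on that component."""
    loadings, scores = first_principal_component(rows)
    limit = k * pstdev(scores)
    times = [t for t, s in enumerate(scores) if abs(s) > limit]
    paths = [j for j, w in enumerate(loadings) if abs(w) >= min_loading]
    return times, paths
```

The flagged time steps say when paths moved together, and the loadings say which paths took part, which is the kind of temporal and spatial localization the slide describes.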

Correlated Anomaly Detection
[Figure panels: PCA with APD anomaly detection steps; Detection accuracy of correlated and uncorrelated anomalies by SPD scheme; Detection accuracy of correlated and uncorrelated anomalies by APD scheme]

OnTimeSample Tool Context in perfSONAR

OnTimeSample Tool (2)
OnTimeSample: Meta-scheduler and policy inference services software for orchestrating perfSONAR active measurements
  – Benefit is that measurement collection in perfSONAR can be targeted to meet network monitoring objectives of users (e.g., adaptive sampling)
  – Provides scalability to perfSONAR framework
      If more tools are added, it allows for conflict-free measurements
      On-demand measurement requests served with low response times
  – Provides programmability to perfSONAR framework
      Enables enforcement of multi-domain policies and semantic priorities to initiate measurements, which mitigates unnecessary oversampling
      Measurement regulation; e.g., only 1-5% of probing traffic permitted
      Measurement requests from users with higher credentials (e.g., backbone network engineer) get higher priority than other users (e.g., casual tester)
Developed “OnTimeSample” tool prototype for several use cases
  – Routine network monitoring, rapid anomaly detection, accurate network weather forecasting, real-time SLA validation
  – Evaluated meta-scheduler in terms of “satisfaction ratio” and “stretch fairness” with a variety of measurement tasks, policies and topologies
  – Paper with preliminary results published in 2010 IEEE CNSM conference

Meta-scheduler Algorithms in OnTimeSample Tool
Algorithms based on real-time systems scheduling principles that we developed and evaluated (improvements over existing round-robin (RR) schemes):
  – HBP: Heuristic bin packing based on execution time e_ij
      Effective for routine network monitoring, but too rigid to handle on-demand measurement requests and diverse sampling patterns
  – EDF: Earliest Deadline First based on deadline d_ij
      Caters to measurement periodicity and is flexible for on-demand measurements, but cannot inherently support semantic priorities
  – SPS: Semantic Priority p_ij and Deadline based; d_ij + w * f(p_ij)
      Uses ontologies, priorities with weight w and an inference engine
      Recommended scheme for perfSONAR
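The difference between the EDF and SPS orderings can be illustrated with a short Python sketch. Only the sort keys come from the slide (d_ij for EDF, d_ij + w * f(p_ij) for SPS); the `Task` type, the weight w = 10 and the choice f(p) = -p, where a larger p means a more important request, are assumptions made for the example, and the ontology and inference engine are not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    deadline: float  # d_ij, seconds from now
    priority: int    # p_ij, larger means more important (assumption)

def edf_order(tasks):
    """Earliest Deadline First: sort by deadline only."""
    return sorted(tasks, key=lambda t: t.deadline)

def sps_order(tasks, w=10.0, f=lambda p: -p):
    """Semantic priority scheduling: sort by d_ij + w * f(p_ij).
    With f(p) = -p, a higher priority pulls the effective deadline earlier."""
    return sorted(tasks, key=lambda t: t.deadline + w * f(t.priority))

tasks = [
    Task("casual-iperf", 30, 1),
    Task("engineer-owamp", 45, 5),
    Task("routine-ping", 60, 1),
]
print([t.name for t in edf_order(tasks)])  # deadline order
print([t.name for t in sps_order(tasks)])  # engineer request moves to the front
```

Under EDF the casual tester's request runs first because its deadline is earliest; under SPS the engineer's request overtakes it because its priority term outweighs the 15-second deadline gap.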

Use Case of Resource Protection Service in E-Center

“OnTimeDetect” Integration in E-Center’s Anomaly Detection Service
To detect anomalies in E-Center’s perfSONAR data cache
REST-based stand-alone service designed to work with DRS
  – ADS request is very similar to DRS request
  – Output of DRS (in JSON format) can be directly parsed by ADS
  – ADS analyzes OWAMP or BWCTL data for each source/destination pair of IPs separately and returns the results
Different anomaly detection algorithms implemented as individual detectors
  – Adaptive Plateau Detection (APD)
  – Static Plateau Detection (SPD)
Novice (default) mode and Expert mode implemented
  – Expert mode allows user to change anomaly detector’s parameters
Social Interface for ADS – “Anomaly Detection Group”
ADS documentation at

ADS Integration Architecture in E-Center
[Figure: ADS integration architecture. Figure authors: Maxim Grigoriev, David Eads, Phil DeMar - Fermilab]

Example ADS query
GET vation1=20&data_type=owamp&src_ip= &dst_ip= &start= :01:02&end= :02:01
Query components annotated on the slide:
  – Plateau detector type, i.e., SPD or APD
  – Detector-specific parameters (optional)
  – Path parameters
  – Type of measurement data analyzed
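A query of this shape could be assembled with Python's standard library as sketched below. The parameter names `data_type`, `src_ip`, `dst_ip`, `start`, `end` and `elevation1` appear on the slides; everything else is an assumption for illustration, including the base URL, the `detector` parameter name, the timestamp format and the documentation-range IP addresses, so the real ADS interface should be checked before use.

```python
from urllib.parse import urlencode

def build_ads_query(base_url, src_ip, dst_ip, start, end,
                    data_type="owamp", detector="apd", **detector_params):
    """Build a GET URL for a hypothetical ADS endpoint.
    `detector` and `base_url` are illustrative names, not the real API."""
    params = {
        "detector": detector,    # plateau detector type (assumed name)
        **detector_params,       # optional detector-specific parameters
        "data_type": data_type,  # type of measurement data analyzed
        "src_ip": src_ip,        # path parameters
        "dst_ip": dst_ip,
        "start": start,
        "end": end,
    }
    return base_url + "?" + urlencode(params)

url = build_ads_query(
    "http://ads.example.org/ads",  # placeholder host
    "192.0.2.1", "192.0.2.2",
    "2011-07-01 00:01:02", "2011-07-02 00:02:01",
    elevation1=20,
)
print(url)
```

`urlencode` takes care of escaping the spaces and colons in the timestamps, which is easy to get wrong when the query string is concatenated by hand.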

Example ADS Response
{
  : {
    : {
      src_hub: "BNL", dst_hub: "SLAC", metaid: "1234", sensitivity: 2,
      status: "OK",
    }
  : {
    : {
      src_hub: "SLAC", dst_hub: "BNL", metaid: "123434", sensitivity: 2,
      status: {
        critical: { : { anomaly_type: "plateau", value: , } },
        warning: { : { anomaly_type: "plateau", value: , } },
      elevation1: 0.2, elevation2: 0.4, plateau_size:
    }
APD/SPD could return multiple anomalies in each dataset
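A client might summarize such a response as in the Python sketch below. The nesting it assumes (source key, then destination key, then a result whose `status` is either the string "OK" or a dictionary with `critical` and `warning` entries keyed per anomaly) is a guess based on the fields visible on the slide; the outer keys were lost from the transcript, so the IP-address and timestamp keys used here are hypothetical.

```python
def summarize_ads_response(response):
    """Count (critical, warning) anomalies per (source, destination) pair.
    The response layout is an assumption, see the paragraph above."""
    summary = {}
    for src, by_dst in response.items():
        for dst, result in by_dst.items():
            status = result.get("status")
            if isinstance(status, dict):
                counts = (len(status.get("critical", {})),
                          len(status.get("warning", {})))
            else:
                counts = (0, 0)  # a plain "OK" string means no anomalies
            summary[(src, dst)] = counts
    return summary

# Hypothetical response with made-up keys, for illustration only.
example = {
    "192.0.2.1": {"192.0.2.2": {"src_hub": "BNL", "dst_hub": "SLAC",
                                "status": "OK"}},
    "192.0.2.2": {"192.0.2.1": {"src_hub": "SLAC", "dst_hub": "BNL",
                                "status": {
                                    "critical": {"t1": {"anomaly_type": "plateau"}},
                                    "warning": {"t2": {"anomaly_type": "plateau"},
                                                "t3": {"anomaly_type": "plateau"}},
                                }}},
}
print(summarize_ads_response(example))
```

Counting entries rather than reading a single flag reflects the slide's note that APD/SPD can return multiple anomalies in each dataset.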

E-Center User Interface Integration
[Screenshot: ADS Expert Mode]

E-Center User Interface Integration (2)
[Screenshot: Anomaly Annotated Graph]

Future Work for ADS in E-Center
User selects an end-point pair of a project/community (e.g., US ATLAS) and queries data over a start time and end time
Graph of measurement time series along with annotated anomalies if present will appear, along with anomaly statistics
  – E.g., multiple metric graphs appear on same page for visual correlation
Implement both “uncorrelated” and “correlated” anomaly detectors
  – Across forward and reverse paths
  – Across multiple metrics on a path
  – Across multiple paths centered at a hub
Develop an anomaly visualization engine in E-Center for DOE networks
Anomaly event notifications are submitted as “issues” in E-Center
Build a knowledge base in the “Anomaly Detection Group” of E-Center for discussing anomaly events - project/theme oriented

“OnTimeDetect” Integration in ESnet’s perfSONAR Nagios Plugins
Two APD-based plugins developed in the prototype module that are compatible with current ESnet Nagios plugins and easy to deploy
  – ‘check_apd_owdelay.pl’ for OWAMP and ‘check_apd_throughput.pl’ for BWCTL
Plugins produce OK, WARNING and CRITICAL messages
  – Information messages are added to the notification outputs if there are multiple anomaly events or impending events; output code is set to CRITICAL if at least one anomaly is detected
  – An UNKNOWN message is returned if there is insufficient data for analysis
Plugin features
  – Detects plateau anomalies in BWCTL and OWAMP data collected by querying perfSONAR measurement archives
  – Option to write analyzed data to files in tuple format for graphing or further analysis
  – Options to analyze data in both forward and reverse directions
  – Support for expert configuration of APD parameters
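The status rules above map naturally onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The Python sketch below is not the Perl plugin itself: the function names are hypothetical, the output text only imitates the style of the plugin's messages, and treating warning-level events without any critical event as WARNING is an assumption, since the slide only states when the code becomes CRITICAL or UNKNOWN.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

def plugin_status(total, warnings, criticals):
    """Pick a Nagios exit code from anomaly counts (assumed mapping)."""
    if total == 0:
        return UNKNOWN   # insufficient data for analysis
    if criticals > 0:
        return CRITICAL
    if warnings > 0:
        return WARNING
    return OK

def plugin_line(total, warnings, criticals, metric="Throughput"):
    """Return (exit_code, one-line output) in the usual 'text | perfdata' form."""
    code = plugin_status(total, warnings, criticals)
    if code == UNKNOWN:
        return code, "UNKNOWN - insufficient data for analysis"
    text = f"{LABELS[code]} - Metric is {metric}"
    perfdata = f"TotalDatum={total};; WARNING={warnings};; CRITICAL={criticals};;"
    return code, f"{text} | {perfdata}"
```

A wrapper script would print the returned line and pass the code to `sys.exit`, which is how Nagios learns the service state.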

“OnTimeDetect” Integration in ESnet’s perfSONAR Nagios Plugins (2)
Usage:
check_apd_throughput.pl -u|--url -s|--source -d|--destination -b|--bidirectional -r -z|--sensitivity -W|--swc -w|--elevation1 -c|--elevation2 -a|--algorithm -o|--output-file
Sample Output:
./check_apd_throughput.pl -u pt1.es.net:8085/perfSONAR_PS/services/pSB -r w 0.2 -c 0.5 -s d
PS_CHECK_THROUGHPUT CRITICAL - Metric is Throughput | Source: Destination: {Critical{ : e+08Gbps};Warning{ : e+08Gbps};} | TotalDatum(ForwardDirection)=200;; OK=178;; WARNING=1;; CRITICAL=1;;

References
Project Website
Presentations
  – “Experiences from developing analysis techniques and GUI tools for perfSONAR users”, perfSONAR Workshop, Arlington, VA, 2010.
  – “Multi-domain Internet Performance Sampling and Analysis Tools”, Internet2/ESCC Joint Techs, Columbus, OH, 2010.
  – “OnTime Detect Tool Tutorial”, Internet2 Spring Member Meeting, Arlington, VA, 2010.
  – “Multi-domain Internet Performance Sampling and Analysis”, Internet2/ESCC Joint Techs, Salt Lake City, 2010.
Peer-reviewed Papers
  – P. Calyam, J. Pu, W. Mandrawa, A. Krishnamurthy, “OnTimeDetect: Dynamic Network Anomaly Notification in perfSONAR Deployments”, IEEE Symposium on Modeling, Analysis & Simulation of Computer & Telecommn. Systems (MASCOTS). [Poster]
  – P. Calyam, L. Kumarasamy, F. Ozguner, “Semantic Scheduling of Active Measurements for meeting Network Monitoring Objectives”, IEEE Conference on Network and Service Management (CNSM) (Short Paper). [Poster]
Software Downloads
  – OnTimeDetect: Offline and Online Network Anomaly Notification Tool for perfSONAR Deployments [Web-interface Demo] [SC10 Demo] [Twitter Demo]
  – OnTimeSample: Meta-scheduler Tool for perfSONAR Deployments (Alpha software version available upon request)
ESnet Blog on our project accomplishments (Link on Homepage of ESnet)