mPlane Reasoner(s) & Analysis Modules. Pedro Casas, FTW Vienna. mPlane final workshop, 30 November 2015, Heidelberg.

Presentation transcript:

mPlane Reasoner(s) & Analysis Modules Pedro Casas FTW Vienna mPlane final workshop 30 November 2015, Heidelberg

[Figure: mPlane architecture for iterative measurement. Measurement Layer (WP2): mProbes 1..N and legacy probes 1..N behind the mInterface, producing raw data. Repository and Analysis Layer (WP3): mPlane Repository (DBStream, Blockmon) and legacy DBs 1..N for data collection and processing. WP4: Supervisor, Intelligent Reasoner, and Analysis Modules 1..N for coordination, automation, and analysis.]

Outline
• The Useful – Coordination and Analysis
• The mPlane Reasoner(s)
• Analysis Modules

WP4 Overview
Intelligent Reasoner for Iterative and Adaptive Analysis
• Guides and automates the iterative measurement, exploration, and diagnosis process
Monitoring Data Analysis Modules
• Complex data analysis, high visibility, filtered data accessed at the repositories, very specific data (low volume) from probes
Supervisor
• The glue of the mPlane protocol
• Provides centralized control of the distributed measurement framework

The mPlane Reasoner
The Reasoner is responsible for driving the measurement analysis process, which is by nature iterative and ideally adaptive (learning).
Depending on the use case, the Reasoner has different roles:
• In the case of troubleshooting support → iteratively find the root causes of the associated problems
• In the case of generic measurement analysis → automate the iterative process
Each use case defines/instantiates a specific Reasoner addressing its goals.
Still, the generic design rules of a specific Reasoner can be reused in other use cases.

The Reasoner – Components
The Reasoner consists of 3 different blocks:
The Knowledge Structure:
• The memory or knowledge of the system
• Initially based on expert domain knowledge (diagnosis rules)
• Extended by learning from past experiences (knowledge discovery)
The Reasoning/Diagnosis Process:
• Automates/structures the iterative analysis
The Knowledge Discovery Process:
• Enriches the knowledge structure and the reasoning process
• Based on learning (supervised/unsupervised)

The Reasoner – The Overall Picture
[Figure: the "Knowledge" of the Reasoner (what I know) feeds the Reasoning/Diagnosis Process, which automates the analysis based on what I know; Knowledge Discovery extends the knowledge through (un)supervised learning.]

The Diagnosis Process (1/2)
The Reasoner does not work on raw data, but on events.
An event captures a particular type of network condition
• E.g., link congestion, YouTube throughput drop, overloaded cell, Google CDN load-balancing, etc.
Events are extracted from raw measurements through a retrieval process (actual algorithms at WP2, WP4, queries, etc.)
Events are defined as m-tuples including the following fields:
• event name: e.g., link overload.
• location type: e.g., Gn downlink interface.
• time span: e.g., :30:00, :35:00.
• retrieval process: e.g., Simple Link Congestion Detection Algorithm – SLCDA (with utilization threshold Cth).
• additional diagnosis features: e.g., number of flows, number of bytes, list of server IPs originating the flows, etc.
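The event fields listed on this slide could be held in a small record type. The following is only a minimal sketch: the class name `Event`, the field names, the `overlaps` helper, and the dates in the example are illustrative assumptions, not part of the mPlane code base (the slide's own time-span example is truncated, so the dates here are made up).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Tuple


@dataclass
class Event:
    """One event tuple, mirroring the five fields named on the slide."""
    name: str                              # e.g. "link overload"
    location_type: str                     # e.g. "Gn downlink interface"
    time_span: Tuple[datetime, datetime]   # (start, end) of the event
    retrieval_process: str                 # algorithm or query that produced it
    features: Dict[str, Any] = field(default_factory=dict)  # extra diagnosis features

    def overlaps(self, other: "Event") -> bool:
        """True if the two events share at least one instant in time."""
        return (self.time_span[0] <= other.time_span[1]
                and other.time_span[0] <= self.time_span[1])


# Hypothetical example values (dates are invented for illustration).
congestion = Event(
    name="link overload",
    location_type="Gn downlink interface",
    time_span=(datetime(2015, 1, 1, 10, 30), datetime(2015, 1, 1, 10, 35)),
    retrieval_process="SLCDA",
    features={"flows": 1200, "bytes": 3_500_000},
)
```

A temporal test such as `overlaps` is the kind of primitive a reasoner would need when relating events that occur at the same time.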

The Diagnosis Process (2/2)
Some examples of events related to Root Cause Analysis (RCA):
1. A congested Gn interface in a mobile ISP for 5 minutes:
2. An anomaly detected in YouTube traffic, impacting users' QoE for 5 minutes:

Diagnosis Graph (1/4)
Relates problems/issues with events and root causes, exploring the temporal and spatial relationships between events.
Which type of diagnosis graph reasoning?
• Rule-based reasoning (decision-tree-like graph)
• Easier to implement and configure (easy to add domain knowledge)
• Gives a simple and direct association between the diagnosed root cause and the evidence, for better interpretation
• It is very effective in practice
• Other types of iterative reasoning can be implemented in the same way (not only RCA, but generic iterative measurement processes)
Using per-use-case graphs, the Reasoner looks for the presence of events, and identifies the root cause as the leaf with the highest probability.
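The rule-based traversal described on this slide can be sketched as a walk over a decision-tree-like graph. This is a minimal illustration under stated assumptions: the `Node` structure, the way probabilities are multiplied along a path, and the example graph with its event names and weights are all hypothetical, not the graphs defined in the mPlane deliverables.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    label: str                            # root-cause name for leaves
    required_event: Optional[str] = None  # event that must be present to enter the node
    probability: float = 1.0              # assumed weight of the edge into this node
    children: Optional[List["Node"]] = None


def diagnose(node: Node, events: Set[str], prob: float = 1.0) -> List[Tuple[str, float]]:
    """Return (root cause, probability) for every leaf reachable with the observed events."""
    if node.required_event is not None and node.required_event not in events:
        return []
    prob *= node.probability
    if not node.children:
        return [(node.label, prob)]
    found: List[Tuple[str, float]] = []
    for child in node.children:
        found.extend(diagnose(child, events, prob))
    return found


def best_root_cause(graph: Node, events: Set[str]) -> Optional[Tuple[str, float]]:
    """Pick the reachable leaf with the highest probability, or None if none matches."""
    candidates = diagnose(graph, events)
    return max(candidates, key=lambda c: c[1]) if candidates else None


# Hypothetical graph: event names and weights are invented for illustration.
GRAPH = Node("QoE anomaly detected", "qoe_anomaly", 1.0, [
    Node("ISP link congestion", "link_overload", 0.7),
    Node("CDN server overload", "server_throughput_drop", 0.6),
    Node("End-device issue", "device_anomaly", 0.5),
])

result = best_root_cause(GRAPH, {"qoe_anomaly", "link_overload", "device_anomaly"})
```

Because each leaf is reached through explicit event checks, the path to the selected leaf doubles as the evidence for the diagnosis, which is the interpretability argument made on the slide.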

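Example: Who to blame when YouTube is not working?
[Figure: end-to-end path from the user devices through the ISP network and the Internet (AS 1, AS 2) to the Google CDN (G-CDN). Candidates: Devices? ISP? Internet? YouTube?]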

Diagnosis Graph (2/4)
An example of a Diagnosis Graph (DG) associated with the detection and RCA of QoE-relevant anomalies in YouTube.
In the example, the DG is structured in 5 different macro-blocks:
① QoE-relevant Anomaly Detection block
② End-device Diagnosis block
③ ISP Diagnosis block
④ Internet paths Diagnosis block
⑤ CDN servers Diagnosis block
Example of root causes and the associated rules' description

Diagnosis Graph (3/4)
ISP Diagnosis block
Purpose: detect QoE degradation
Basic process:
1) Continuous passive monitoring
2) Trigger of active monitoring in case of alarms

Diagnosis Graph (4/4)
High-level Diagnosis Graph for ISP (simplified from D4.2)
[Figure: decision graph. Decision nodes: "Alarms from different POPs?", "Alarms from different BRAS?", inter-domain measurements check. Actions: triggers Internet Active Probe, triggers POP Active Probe, triggers DSLAM Active Probe. Outcomes: issue external to SP domain, issue in SP Core Network, issue on BRAS, issue on DSLAM, issue on Access Lines.]

Knowledge Discovery
Domain knowledge and operational experience are incomplete (using only domain-based diagnosis graphs limits the system capabilities).
Therefore, an initial diagnosis graph can under-perform, both in accuracy and completeness.
The role of Automatic Knowledge Discovery → correlate all the events that occur at the same time and are spatially related to the service problem under investigation…
…and learn new diagnosis rules (new knowledge) from past experiences
• Supervised learning in case of labeled data
• Unsupervised learning in the general case
Some mPlane techniques: Automatic Rule Mining, Sub-Space Clustering, Decision-Tree Learning
Final expert intervention to validate the identified diagnosis rules, which are then added to the Knowledge Structure
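As a rough illustration of the supervised case, candidate diagnosis rules can be mined from labeled past incidents by counting how often an event co-occurs with a root cause. This is a generic support/confidence sketch, not the mPlane rule-mining implementation; the function name, the thresholds, and the toy incidents are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str, int, float]  # (event, root cause, support, confidence)


def mine_rules(incidents: Iterable[Tuple[Set[str], str]],
               min_support: int = 2,
               min_confidence: float = 0.8) -> List[Rule]:
    """Propose 'event -> root cause' rules from labeled past incidents.

    Each incident is (set of co-occurring event names, labeled root cause).
    Support is the number of incidents where event and cause appear together;
    confidence is that count divided by the number of incidents with the event.
    """
    event_count: Counter = Counter()
    pair_count: Counter = Counter()
    for events, cause in incidents:
        for ev in events:
            event_count[ev] += 1
            pair_count[(ev, cause)] += 1

    rules: List[Rule] = []
    for (ev, cause), n in pair_count.items():
        confidence = n / event_count[ev]
        if n >= min_support and confidence >= min_confidence:
            rules.append((ev, cause, n, confidence))
    # Highest confidence first, then highest support, then event name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))


# Toy labeled history (invented for illustration).
HISTORY = [
    ({"dns_count_spike", "apple_push_errors"}, "push_service_outage"),
    ({"dns_count_spike", "apple_push_errors"}, "push_service_outage"),
    ({"dns_count_spike", "link_overload"}, "link_congestion"),
    ({"link_overload"}, "link_congestion"),
]

candidate_rules = mine_rules(HISTORY)
```

In line with the slide, the output is only a list of candidates: an expert would still validate each rule before it is added to the Knowledge Structure.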

Multiple mPlane Reasoners
An mPlane Reasoner is an extended mPlane client that performs sequential tasks based on intermediate analysis results, acting through the mPlane Supervisor interfaces.
In practice, we implemented different Reasoners following the aforementioned principles, but tailored to the specific needs of each use case:
1. Reasoner in nodejs: basic mPlane Reasoner
2. Reasoner for Content Popularity Estimation
3. Reasoner for Content Curation
4. Reasoner for Web browsing QoE
5. Reasoner for Mobile Network RCA
6. Reasoner for Anomaly Detection and RCA
7. Reasoner for SLA Verification
8. Reasoner for Multimedia Content Delivery Analysis
9. Reasoner for GLIMPSE

Analysis Modules
Analysis Modules or Algorithms → further evaluate the measurements gathered and pre-analyzed by the lower layers of mPlane
They operate on small amounts of data (compared with the data available at WP3 or eventually gathered at WP2)

Per-use case algorithms
The main Analysis Modules are linked to the proposed use cases:
• Find the cause of Quality of Experience (QoE) degradations
• Estimate the future popularity trends of services and contents for network optimization
• Classify and promote interesting web content to end-users
• Assess and troubleshoot performance and quality of multimedia stream delivery
• Diagnose performance issues in web browsing and identify the segment that is responsible for the QoE degradation
• Find the root cause of problems related to connectivity and poor QoE on mobile devices
• Detect and diagnose anomalies in Internet-scale services (e.g., CDN-based services)
• Verify SLAs
…but there is more

Some Extended Analysis Modules
QoE
• QoE-based monitoring for YouTube: metrics to detect playback stallings
• Relate OWD variation to QoE, for a generic class of applications
Topology
• Detect Anycast Services: determine if a service uses IP anycast
• Reverse Traceroute – DisNETPerf: find probes near some point of interest in the network to launch active measurements
• Topology discovery: identification of middleboxes, TCP proxies and NATs
• MPLS transit tunnel analysis: classification of MPLS tunnels based on their usage/purpose (mono-path, ECMP, multi-FEC, etc.)
Topology/Performance
• Analyze dynamics of forwarding and routing paths: determine whether routing paths follow perturbations experienced by forwarding paths or vice versa
• Prediction of Unmeasured Paths: inference of path properties (RTT, available bandwidth, etc.) on unmeasured network paths

Partial Mapping of Analysis Modules to Use Cases
Reasoners and Analysis Modules (as well as everything presented so far during the day) are available at the mPlane website as software tools.

Selected examples I: YouTube QoE
[Figure: results plotted against seconds of stalling; panels: lab studies, on the real mobile network]
• A single stalling event heavily deteriorates the experience of the end user
• 2 or more stallings already mean bad quality
• The duration of the stallings is less critical, but also has an important impact on QoE
• Stallings are the impairments perceived by the end user (independently of the video resolution, or even DASH)
MOS = F(N, L)

Stallings and Download Throughput
• We introduced a simple KPI to monitor YouTube QoE from passive network measurements
• Buffer depletion generally occurs because the downlink bandwidth is lower than the video bitrate
Ex: standard 360p YouTube videos, VBR = 600 kbps → DBW > 750 kbps
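The throughput rule of thumb on this slide can be written as a one-line check. This is a sketch only: the function name and the default safety margin of 1.25 are assumptions, the latter derived from the slide's single example (750 kbps required for a 600 kbps video); the slide does not state a general margin.

```python
def stalling_risk(dbw_kbps: float, vbr_kbps: float, margin: float = 1.25) -> bool:
    """Flag a flow as at risk of stalling when the downlink bandwidth (DBW)
    falls below the video bitrate (VBR) times a safety margin.

    The 1.25 default is an assumption inferred from the slide's example
    (VBR = 600 kbps -> DBW > 750 kbps).
    """
    if vbr_kbps <= 0:
        raise ValueError("video bitrate must be positive")
    return dbw_kbps < margin * vbr_kbps


# Slide example: a standard 360p video at 600 kbps.
at_risk = stalling_risk(dbw_kbps=700, vbr_kbps=600)
healthy = stalling_risk(dbw_kbps=800, vbr_kbps=600)
```

Both inputs can be estimated passively, which is what makes a check of this kind usable as a network-side KPI.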

Selected examples II: Anomaly Detection and Diagnosis
• We conceived a statistical AD tool which works with full feature distributions
• The AD algorithm consists of two phases:
(1) Reference-set identification: find past traffic distributions which are a suitable reference of normality
(2) AD test: use a normalized variant of the Kullback-Leibler divergence to decide if the current distribution is compatible with the reference set
[Figure: feature CDFs x1, x2, x3. x1 and x2 are similar → L(x1, x2) is small; x1 and x3 are dissimilar → L(x1, x3) is large]
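The AD test can be illustrated with a symmetrized Kullback-Leibler divergence. The slide does not give the exact normalization, so the form below (each KL term divided by the entropy of its first argument, then averaged) is an assumption, as are the smoothing constant, the mean-over-references decision rule, the threshold, and the toy counts.

```python
import math
from typing import Dict, List


def normalized_divergence(p_counts: Dict[str, float],
                          q_counts: Dict[str, float],
                          eps: float = 1e-9) -> float:
    """Symmetrized, entropy-normalized KL divergence between two count histograms.

    Assumed form: L(p, q) = 0.5 * (KL(p||q) / H(p) + KL(q||p) / H(q)).
    Counts are smoothed with eps so that empty bins do not produce log(0).
    """
    keys = set(p_counts) | set(q_counts)
    if len(keys) < 2:
        return 0.0  # both distributions sit on the same single bin
    p_total = sum(p_counts.values()) + eps * len(keys)
    q_total = sum(q_counts.values()) + eps * len(keys)
    kl_pq = kl_qp = h_p = h_q = 0.0
    for k in keys:
        p = (p_counts.get(k, 0.0) + eps) / p_total
        q = (q_counts.get(k, 0.0) + eps) / q_total
        kl_pq += p * math.log(p / q)
        kl_qp += q * math.log(q / p)
        h_p -= p * math.log(p)
        h_q -= q * math.log(q)
    return 0.5 * (kl_pq / h_p + kl_qp / h_q)


def is_anomalous(current: Dict[str, float],
                 reference_set: List[Dict[str, float]],
                 threshold: float = 0.1) -> bool:
    """Flag the current distribution if its mean divergence from the
    reference set exceeds a (hypothetical) threshold."""
    scores = [normalized_divergence(current, ref) for ref in reference_set]
    return sum(scores) / len(scores) > threshold


# Toy histograms (invented): x1 and x2 are similar, x3 is very different.
x1 = {"a": 50, "b": 30, "c": 20}
x2 = {"a": 48, "b": 32, "c": 20}
x3 = {"a": 5, "b": 5, "c": 90}
```

Working on whole distributions rather than on a single scalar is what lets a test like this pick up changes in the shape of a feature even when its total volume is unchanged.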

Using ADTool for Detecting and Diagnosing Anomalies
Many interesting service anomalies are observed as abrupt changes in the DNS counts.
Reasoner approach: correlate observations from multiple metrics revealing service-related and/or device-related anomalies:
• Fully Qualified Domain Name
• Device OS
• Device manufacturer (TAC number in mobile devices)
• HTTP response code
• and so on…
Example: service/device-related real anomaly in mobile devices

Selected examples: Anomaly Detection and Diagnosis
[Figure: DNS query counts in a mobile network, with annotations]
• Periodic spikes → daily synchronization events
• Peak hour utilization
• Traffic anomaly, what's that? → easy to detect, not so easy to diagnose
• Similar behavior in tablets
• The anomaly is only observable for Apple devices
• Domains involved: akadns.net (Akamai DNS), push.apple.com (Apple Push Notification Service)
Diagnosis: connection issues to Apple push notification servers

Selected examples III: Anycast Detection
Problem solved: anycast enumeration and geolocation
Iterative methodology based on geographically distributed VPs:
• Determine if a service uses IP anycast
• Enumerate replicas sharing the same IP address
• Geolocate those replicas
The iterative workflow is lightweight, O(100) pkts, and fast, O(100) ms
Shall support RIPE, mPlane/Planetlab probes (RIPE integration in mPlane)
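The slide does not spell out how the anycast test works. One common way to detect anycast from distributed vantage points is a speed-of-light check: if two vantage points are farther apart than their RTTs to the same IP allow, they cannot be reaching the same physical host. The sketch below illustrates that generic technique under that assumption; the function names, coordinates, and RTT values are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Speed of light in vacuum, in km per millisecond: a safe upper bound on
# how far a packet can travel, so the check never over-reports anycast.
C_KM_PER_MS = 299.792458
EARTH_RADIUS_KM = 6371.0

Measurement = Tuple[Tuple[float, float], float]  # ((lat, lon), rtt_ms)


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_anycast(measurements: List[Measurement]) -> bool:
    """True if some pair of vantage points has non-overlapping latency disks.

    Each RTT bounds the target to a disk of radius (rtt / 2) * c around the
    vantage point. Two disjoint disks imply at least two distinct replicas.
    """
    for (pos_a, rtt_a), (pos_b, rtt_b) in combinations(measurements, 2):
        radius_a = rtt_a / 2 * C_KM_PER_MS
        radius_b = rtt_b / 2 * C_KM_PER_MS
        if haversine_km(pos_a, pos_b) > radius_a + radius_b:
            return True
    return False


# Hypothetical vantage points: Vienna and Tokyo.
VIENNA, TOKYO = (48.21, 16.37), (35.68, 139.69)
anycast_case = is_anycast([(VIENNA, 5.0), (TOKYO, 4.0)])
unicast_case = is_anycast([(VIENNA, 5.0), (TOKYO, 250.0)])
```

The same disks could in principle support the enumeration and geolocation steps, since each disjoint disk must contain at least one replica, but that part is not sketched here.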

Selected examples IV: DisNETPerf
Problem solved → Reverse Traceroute (no IP spoofing nor IP record):
• find the mPlane probe that is closest to a given Point of Interest (PoI)
• to enable troubleshooting on the path from that PoI to some user
• without control on the PoI side (e.g., YouTube server)
Neighborhood model:
• combined topology- and delay-based distance (BGP same AS + min RTT)
Main idea:
• we rely on a large set of widely spread probes (e.g., RIPE Atlas)
• Given IPs (e.g., YouTube) and IPd (e.g., …), locate IPdisnet
• IPdisnet "mimics" IPs in terms of IPs → IPd path similarity
• Run traceroute measurements from IPc to IPd
• Collect data for troubleshooting purposes
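The neighborhood model (same BGP AS, then minimum RTT) suggests a simple probe-selection rule. The sketch below is one possible reading, not the DisNETPerf implementation: the `Probe` type, the field names, and the fallback to all probes when none shares the PoI's AS are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    probe_id: str
    asn: int            # AS number the probe is hosted in
    min_rtt_ms: float   # minimum RTT measured from this probe to the PoI


def select_closest_probe(probes: List[Probe], poi_asn: int) -> Optional[Probe]:
    """Prefer probes in the same AS as the Point of Interest, then take the
    one with the smallest minimum RTT.

    Falling back to all probes when none is in the PoI's AS is an
    assumption made for this sketch.
    """
    if not probes:
        return None
    same_as = [p for p in probes if p.asn == poi_asn]
    candidates = same_as or probes
    return min(candidates, key=lambda p: p.min_rtt_ms)


# Hypothetical candidate probes; AS numbers come from the documentation range.
CANDIDATES = [
    Probe("probe-a", asn=64500, min_rtt_ms=2.0),
    Probe("probe-b", asn=64496, min_rtt_ms=9.0),
    Probe("probe-c", asn=64496, min_rtt_ms=6.5),
]

chosen = select_closest_probe(CANDIDATES, poi_asn=64496)
```

Here `probe-c` is chosen over the lower-latency `probe-a` because the topology criterion is applied before the delay criterion.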

DisNETPerf in a Nutshell
[Figure: reverse traceroute, IPs → IPd? Slide reused from the mPlane 2nd Review Meeting, Brussels, February 10th, 2015]

Backup slides

Selected examples I: Content Popularity
Early detection of contents which will receive attention
[Figure: mPlane and cache]

How mPlane can make it happen
[Figure: workflow. Passive probes observe HTTP requests and feed the Repository; the Analysis Modules (Popularity Modeler, Popularity Predictor) run on top of it; the Reasoner detects devices and caches close to the location; the Supervisor notifies popular contents to the CDN supervisor, which applies caching strategies based on future popular contents.]

Preliminary Results
Popularity Modeler and Predictor modules
• Topic models: GMM + LDA
• Maximum likelihood
Caching policy based on content popularity
• vs. LRU and LFU (Least Recently/Frequently Used)
We improve on the state-of-the-art algorithms by obtaining a similar RMSE with a much smaller observation window (30 minutes vs. 4 hours)
[Figure: RMSE]

Selected examples II: Passive Media Curation
A new way of helping users find relevant content on the web, fast
User clicks are a good measure of interest (users don't click randomly)
[Figure: mPlane turning user clicks into curated (relevant) content]

How can mPlane make it happen
[Figure: pipeline. WP2: user URLs, up to 5M requests/hour; WP2' (active). WP3: scalable data analysis. WP4.1 Analysis Modules: portals vs contents, content popularity statistics, classify contents, elect content to promote. Supervisor: orchestrate, publish content (interesting URLs). Prototype running for a few months.]

Content versus Portal classifier module
Features:
• Hostname
• URL length
• Frequency as hostname
• Request Arrival Process cross-correlation
• 1-day periodicity
"Feed" a naive Bayes classifier
• Tested different combinations
• The best is URL length + period: as accurate as the 5 features together
Accuracy
• Tested on manually verified ground truth traces
• Used 2/3 for training and 1/3 for "prediction"
• Overall 96% accuracy of the classifier
• 94% precision, 100% recall in detecting content-URLs
[Figure: content-URL vs. web portal]
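To make the classifier setup concrete, here is a minimal Gaussian naive Bayes sketch over the two best features named on the slide (URL length and 1-day periodicity), with a 2/3 vs. 1/3 split. The slide does not say which naive Bayes variant was used, so the Gaussian form is an assumption, and the nine samples are synthetic values invented for illustration; they do not reproduce the reported 96% accuracy.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # ((url_length, periodicity), label)


def fit(samples: List[Sample]) -> Dict[str, tuple]:
    """Estimate per-class prior, feature means and feature variances."""
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[y] = (n / len(samples), means, variances)
    return model


def predict(model: Dict[str, tuple], x: Sequence[float]) -> str:
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(prior, means, variances):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=lambda y: log_posterior(*model[y]))


def precision_recall(y_true: List[str], y_pred: List[str], positive: str):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic data: content-URLs are long with weak daily periodicity,
# portals are short with strong daily periodicity.
DATA: List[Sample] = [
    ((95, 0.10), "content"), ((22, 0.85), "portal"), ((110, 0.15), "content"),
    ((18, 0.90), "portal"), ((88, 0.20), "content"), ((30, 0.80), "portal"),
    ((102, 0.12), "content"), ((25, 0.88), "portal"), ((91, 0.18), "content"),
]
split = len(DATA) * 2 // 3          # 2/3 for training, 1/3 for testing
train, test = DATA[:split], DATA[split:]

model = fit(train)
predictions = [predict(model, x) for x, _ in test]
scores = precision_recall([y for _, y in test], predictions, positive="content")
```

Precision and recall are computed for the content-URL class, matching the way the slide reports its results.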

Content promotion module (in progress)
Three types of promoted content-URLs so far:
• Live stream: news/videos/blogs currently attracting the attention of the crowd
• Top: most popular (last day, week, month, etc.) over all content-URLs
• Hot: a "mixture" of popularity and freshness (adapted from reddit's hot algorithm)
[Formula annotations for the Hot score: timestamp of first view; absolute reference: start date of Netcurator; a freshness constant period (12 hours)]
First users like them!
[Figure: user ratings. Extremely relevant 13%, very relevant 42%, relevant 28%, not that relevant 8%, poor 6%]
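The Hot formula itself did not survive in the transcript, only its annotations. As a hedged reconstruction, reddit's hot ranking adds a base-10 logarithm of the score to an age term divided by a time constant. The sketch below adapts that shape using the three annotated quantities; the exact formula used by Netcurator is not known from the source, and the function name, the view counts, and the reference date are assumptions.

```python
import math
from datetime import datetime, timedelta, timezone

# Freshness constant period from the slide: 12 hours, in seconds.
FRESHNESS_PERIOD_S = 12 * 3600


def hot_score(views: int, first_view: datetime, reference: datetime) -> float:
    """Reddit-style hot score: log-popularity plus a freshness bonus.

    Assumed form: log10(max(views, 1)) + (first_view - reference) / period.
    With this form, content first seen one period (12 h) later needs ten
    times fewer views to reach the same score.
    """
    popularity = math.log10(max(views, 1))
    freshness = (first_view - reference).total_seconds() / FRESHNESS_PERIOD_S
    return popularity + freshness


# Hypothetical absolute reference standing in for the Netcurator start date.
REFERENCE = datetime(2015, 1, 1, tzinfo=timezone.utc)

older_popular = hot_score(10_000, REFERENCE, REFERENCE)
newer_smaller = hot_score(1_000, REFERENCE + timedelta(hours=12), REFERENCE)
```

Measuring freshness from a fixed reference date rather than from the current time means a URL's score only changes when its view count changes, so older items sink simply because newer ones start with a higher freshness term.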