mPlane Reasoner(s) & Analysis Modules
Pedro Casas, FTW Vienna
mPlane final workshop, 30 November 2015, Heidelberg


[Figure: mPlane architecture for iterative measurement. Measurement Layer (WP2): mProbe 1 to N and legacyProbe 1 to N, connected through the mInterface, producing raw data. Repository and Analysis Layer (WP3): mPlane Repository, legacyDB 1 to N, DBStream and Blockmon for data collection & processing. WP4: Supervisor, Intelligent Reasoner and Analysis Modules 1 to N, providing coordination, automation and analysis.]

Outline
- The Useful – Coordination and Analysis
- The mPlane Reasoner(s)
- Analysis Modules

WP4 Overview
- Intelligent Reasoner for Iterative and Adaptive Analysis: guides and automates the iterative measurement, exploration and diagnosis process
- Monitoring Data Analysis Modules: complex data analysis with high visibility, operating on filtered data accessed at the repositories and on very specific (low-volume) data from probes
- Supervisor: the glue of the mPlane protocol; provides centralized control of the distributed measurement framework

The mPlane Reasoner
The Reasoner is responsible for driving the measurement analysis process, which is by nature iterative and ideally adaptive (learning). Depending on the use case, the Reasoner plays different roles:
- Troubleshooting support: iteratively find the root causes of the associated problems
- Generic measurement analysis: automate the iterative process
Each use case defines/instantiates a specific Reasoner addressing its goals. Still, the generic design rules of a specific Reasoner can be reused in other use cases.

The Reasoner – Components
The Reasoner consists of 3 different blocks:
- The Knowledge Structure: the memory or knowledge of the system
  - initially based on expert domain knowledge (diagnosis rules)
  - extended by learning from past experiences (knowledge discovery)
- The Reasoning/Diagnosis Process: automates/structures the iterative analysis
- The Knowledge Discovery Process: enriches the knowledge structure and the reasoning process, based on (supervised/unsupervised) learning

The Reasoner – The Overall Picture
[Figure: the Reasoning/Diagnosis Process automates the analysis based on "what I know", i.e., the "Knowledge" of the Reasoner; Knowledge Discovery extends that knowledge through (un)supervised learning.]

The Diagnosis Process (1/2)
The Reasoner does not work on raw data, but on events. An event captures a particular type of network condition, e.g., link congestion, a YouTube throughput drop, an overloaded cell, Google CDN load balancing, etc.
Events are extracted from raw measurements through a retrieval process (actual algorithms at WP2 and WP4, queries, etc.).
Events are defined as m-tuples including the following fields:
- event name: e.g., link overload
- location type: e.g., Gn downlink interface
- time span: e.g., 2013-10-21-12:30:00, 2013-10-21-12:35:00
- retrieval process: e.g., Simple Link Congestion Detection Algorithm – SLCDA (with utilization threshold Cth)
- additional diagnosis features: e.g., number of flows, number of bytes, list of server IPs originating the flows, etc.
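The event m-tuple maps naturally onto a record type. The following is a minimal sketch that assumes hypothetical field names and a hypothetical `detect_link_overload` helper. The helper mimics a threshold detector in the spirit of SLCDA: it flags every 5-minute bin whose utilization exceeds Cth. It is an illustration, not the actual WP2/WP4 retrieval algorithm.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Event:
    """One event as an m-tuple (illustrative field names, not an mPlane schema)."""
    name: str                 # e.g. "link overload"
    location_type: str        # e.g. "Gn downlink interface"
    start: datetime           # beginning of the time span
    end: datetime             # end of the time span
    retrieval_process: str    # algorithm that produced the event
    features: dict = field(default_factory=dict)  # additional diagnosis features


def detect_link_overload(samples, c_th, location, bin_len=timedelta(minutes=5)):
    """Hypothetical SLCDA-like detector.

    samples: list of (bin_start, utilization) pairs, utilization in [0, 1].
    Emits one "link overload" event per bin whose utilization exceeds c_th.
    """
    events = []
    for bin_start, utilization in samples:
        if utilization > c_th:
            events.append(Event(
                name="link overload",
                location_type=location,
                start=bin_start,
                end=bin_start + bin_len,
                retrieval_process=f"SLCDA (Cth={c_th})",
                features={"utilization": utilization},
            ))
    return events


samples = [(datetime(2013, 10, 21, 12, 25), 0.62),
           (datetime(2013, 10, 21, 12, 30), 0.97)]
events = detect_link_overload(samples, c_th=0.9, location="Gn downlink interface")
print(events[0].start, events[0].end)  # only the 12:30 bin exceeds Cth
```

Additional diagnosis features (flows, bytes, server IPs) would simply be added to the `features` dictionary by the retrieval process.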

The Diagnosis Process (2/2)
Some examples of events related to Root Cause Analysis (RCA):
1. A congested Gn interface in a mobile ISP during 5':
2. An anomaly detected in YouTube traffic, impacting users' QoE for 5':

Diagnosis Graph (1/4)
The diagnosis graph relates problems/issues to events and root causes, exploring the temporal and spatial relationships between events.
Which type of diagnosis graph reasoning? Rule-based reasoning (a decision-tree-like graph):
- easier to implement and configure (domain knowledge is easy to add)
- gives a simple and direct association between the diagnosed root cause and the supporting evidence(s), for better interpretation
- very effective in practice
- other types of iterative reasoning can be implemented in the same way (not only RCA, but generic iterative measurement processes)
Using per-use-case graphs, the Reasoner looks for the presence of events and identifies the root cause as the leaf with the highest probability.
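A rule-based diagnosis graph can be sketched as a tree. Inner nodes test for the presence of an event, and leaves carry a root cause with an expert-assigned probability. This minimal sketch uses a hypothetical class, event names and probabilities. It explores every branch whose event is observed and returns the reachable leaf with the highest probability as the diagnosed root cause.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Node of a rule-based diagnosis graph (hypothetical structure)."""
    event: Optional[str] = None        # required event; None means "always true"
    children: list = field(default_factory=list)
    root_cause: Optional[str] = None   # set only on leaves
    probability: float = 0.0           # expert-assigned confidence (leaves only)


def reachable_leaves(node, observed):
    """Collect the leaves reachable through rules whose events are observed."""
    if node.event is not None and node.event not in observed:
        return []                      # rule does not fire: prune this branch
    if node.root_cause is not None:
        return [node]
    leaves = []
    for child in node.children:
        leaves.extend(reachable_leaves(child, observed))
    return leaves


def most_likely_root_cause(root, observed):
    """Return the root cause of the reachable leaf with the highest probability."""
    leaves = reachable_leaves(root, observed)
    if not leaves:
        return None
    return max(leaves, key=lambda leaf: leaf.probability).root_cause


graph = Node(event="youtube_qoe_anomaly", children=[
    Node(event="device_cpu_overload", root_cause="end-device issue", probability=0.6),
    Node(event="gn_link_overload", children=[
        Node(root_cause="ISP congestion", probability=0.8),
    ]),
    Node(event="cdn_server_switch", root_cause="CDN load balancing", probability=0.7),
])
cause = most_likely_root_cause(
    graph, {"youtube_qoe_anomaly", "gn_link_overload", "cdn_server_switch"})
print(cause)  # ISP congestion (0.8 beats CDN load balancing at 0.7)
```

Keeping the evidence path to each leaf is what gives the direct association between root cause and evidence mentioned above.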

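Example: Who to blame when YouTube is not working?
[Figure: path from the user's devices through the ISP network and two ASes (AS 1, AS 2) to the Google CDN (G-CDN); candidate culprits: devices? ISP? Internet? YouTube?]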

Diagnosis Graph (2/4)
An example of a Diagnosis Graph (DG) associated with the detection and RCA of QoE-relevant anomalies in YouTube. In the example, the DG is structured in 5 macro-blocks:
① QoE-relevant Anomaly Detection block
② End-device Diagnosis block
③ ISP Diagnosis block
④ Internet paths Diagnosis block
⑤ CDN servers Diagnosis block
[Figure: example of root causes and the associated rules' description]

Diagnosis Graph (3/4)
ISP Diagnosis block
Purpose: detect QoE degradation
Basic process:
1) Continuous passive monitoring
2) Triggering of active monitoring in case of alarms

Diagnosis Graph (4/4)
High-level Diagnosis Graph for the ISP (simplified from D4.2):
[Figure: decision nodes "Alarms from different POPs?" and "Alarms from different BRAS?"; triggered actions: Internet active probe with inter-domain measurements check, POP active probe, DSLAM active probe; possible outcomes: issue external to the SP domain, issue in the SP core network, issue on BRAS, issue on DSLAM, issue on access lines.]
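The simplified graph lends itself to a small decision function. The branch structure here is our reading of the figure, not a confirmed transcription:
- Alarms spanning several POPs point outside the SP domain.
- Alarms spanning several BRAS behind one POP point to the core network.
- Otherwise, the DSLAM probe localizes the fault.

The alarm format and the `dslam_probe` callback are hypothetical.

```python
def isp_diagnosis(alarms, dslam_probe):
    """Return (triggered active probe, diagnosis) for a set of QoE alarms.

    alarms: list of (pop, bras, dslam) identifiers of the lines raising alarms.
    dslam_probe: hypothetical callback standing for the DSLAM active probe;
        returns "BRAS", "DSLAM" or "access lines".
    """
    if not alarms:
        return None, "no alarm"
    pops = {pop for pop, _, _ in alarms}
    bras = {(pop, b) for pop, b, _ in alarms}
    if len(pops) > 1:
        # Alarms from different POPs: check inter-domain measurements.
        return "Internet active probe", "issue external to SP domain"
    if len(bras) > 1:
        # Same POP, different BRAS: the shared element is the SP core network.
        return "POP active probe", "issue in SP core network"
    # A single BRAS: the DSLAM active probe localizes the faulty segment.
    return "DSLAM active probe", "issue on " + dslam_probe(alarms)


print(isp_diagnosis([("vie", "b1", "d1"), ("vie", "b2", "d4")], lambda a: "DSLAM"))
```

Passive monitoring produces `alarms`. Only the active probe on the selected branch is triggered, which matches the basic process of the ISP Diagnosis block.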

Knowledge Discovery
Domain knowledge and operational experience are incomplete: relying only on domain-based diagnosis graphs limits the system's capabilities. Therefore, an initial diagnosis graph can under-perform in both accuracy and completeness.
The role of automatic knowledge discovery:
- correlate all the events that occur at the same time and are spatially related to the service problem under investigation...
- ...and learn new diagnosis rules (new knowledge) from past experiences: supervised learning in case of labeled data, unsupervised learning in the general case
Some mPlane techniques: Automatic Rule Mining, Sub-Space Clustering, Decision-Tree Learning.
A final expert intervention validates the identified diagnosis rules, which are then added to the Knowledge Structure.
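To illustrate supervised rule mining, this sketch counts how often each event co-occurs with a labeled root cause in past incidents. It keeps the single-event rules with enough support and confidence as candidates for expert validation. It is a generic association-rule sketch with hypothetical thresholds and event names, not the mPlane Automatic Rule Mining module.

```python
from collections import Counter


def mine_rules(incidents, min_support=2, min_confidence=0.8):
    """Mine candidate rules 'event -> root cause' from labeled past incidents.

    incidents: list of (set_of_events, root_cause) pairs.
    Returns (event, root_cause, support, confidence) tuples.
    """
    event_count = Counter()   # how often each event was observed
    pair_count = Counter()    # how often each event co-occurred with each cause
    for events, cause in incidents:
        for event in events:
            event_count[event] += 1
            pair_count[(event, cause)] += 1
    rules = []
    for (event, cause), support in pair_count.items():
        confidence = support / event_count[event]
        if support >= min_support and confidence >= min_confidence:
            rules.append((event, cause, support, confidence))
    # Most reliable rules first: highest confidence, then highest support.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))


history = [
    ({"dns_spike", "ios_devices"}, "push service outage"),
    ({"dns_spike", "ios_devices"}, "push service outage"),
    ({"gn_overload"}, "ISP congestion"),
    ({"gn_overload", "dns_spike"}, "ISP congestion"),
]
for rule in mine_rules(history):
    print(rule)
```

In this toy history, "dns_spike" co-occurs with two different causes, so its confidence (2/3) stays below the threshold and no rule is proposed for it.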

Multiple mPlane Reasoners
An mPlane Reasoner is an extended mPlane client that performs sequential tasks based on intermediate analysis results, acting through the mPlane Supervisor interfaces.
In practice, we implemented different Reasoners following the aforementioned principles, but tailored to the specific needs of each use case:
1. Reasoner in nodejs: basic mPlane Reasoner
2. Reasoner for Content Popularity Estimation
3. Reasoner for Content Curation
4. Reasoner for Web browsing QoE
5. Reasoner for Mobile Network RCA
6. Reasoner for Anomaly Detection and RCA
7. Reasoner for SLA Verification
8. Reasoner for Multimedia Content Delivery Analysis
9. Reasoner for GLIMPSE

Analysis Modules
Analysis Modules (or algorithms) further evaluate the measurements gathered and pre-analyzed by the lower layers of mPlane. They operate on small amounts of data (compared with the data available at WP3 or eventually gathered at WP2).

Per-use-case algorithms
The main Analysis Modules are linked to the proposed use cases:
- Find the cause of Quality of Experience (QoE) degradations
- Estimate the future popularity trends of services and contents for network optimization
- Classify and promote interesting web content to end users
- Assess and troubleshoot the performance and quality of multimedia stream delivery
- Diagnose performance issues in web browsing and identify the segment responsible for the QoE degradation
- Find the root cause of problems related to connectivity and poor QoE on mobile devices
- Detect and diagnose anomalies in Internet-scale services (e.g., CDN-based services)
- Verify SLAs
...but there is more

Some Extended Analysis Modules
QoE
- QoE-based monitoring for YouTube: metrics to detect playback stallings
- Relate OWD variation to QoE, for a generic class of applications
Topology
- Detect Anycast Services: determine whether a service uses IP anycast
- Reverse Traceroute (DisNETPerf): find probes near some point of interest in the network to launch active measurements
- Topology discovery: identification of middleboxes, TCP proxies and NATs
- MPLS transit tunnel analysis: classification of MPLS tunnels based on their usage/purpose (mono-path, ECMP, multi-FEC, etc.)
Topology/Performance
- Analyze the dynamics of forwarding and routing paths: determine whether routing paths follow the perturbations experienced by forwarding paths, or vice versa
- Prediction of Unmeasured Paths: inference of path properties (RTT, available bandwidth, etc.) on unmeasured network paths

Partial Mapping of Analysis Modules to Use Cases
The Reasoners and Analysis Modules (as well as everything presented so far during the day) are available as software tools at the mPlane website: https://www.ict-mplane.eu/public/software

Selected examples I: YouTube QoE
[Figure: MOS vs. stallings, from lab studies and on the real mobile network; example with 4 seconds of stalling]
- A single stalling event heavily deteriorates the experience of the end user
- 2 or more stallings already mean bad quality
- The duration of the stallings is less critical, but it also has an important impact on QoE
- Stallings are the impairments perceived by the end user (independently of the video resolution, or even of DASH)
MOS = F(N, L), where N is the number of stallings and L their duration

Stallings and Download Throughput
- We introduced a simple KPI to monitor YouTube QoE from passive network measurements
- Buffer depletion generally occurs because the downlink bandwidth (DBW) is lower than the video bitrate (VBR)
- Example: standard 360p YouTube videos have VBR = 600 kbps → DBW > 750 kbps
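The 360p example implies a safety margin of 750 / 600 = 1.25 between download throughput and video bitrate. This sketch assumes the KPI is the ratio DBW / VBR compared against that margin; the slide does not give the exact definition. `stall_risk` and `required_bandwidth` are hypothetical helpers.

```python
MARGIN = 750 / 600  # 1.25, derived from the 360p example (VBR 600 kbps -> DBW > 750 kbps)


def required_bandwidth(video_bitrate_kbps, margin=MARGIN):
    """Minimum downlink throughput (kbps) assumed to avoid buffer depletion."""
    return video_bitrate_kbps * margin


def stall_risk(download_kbps, video_bitrate_kbps, margin=MARGIN):
    """True when the download/bitrate ratio does not exceed the safety margin."""
    return download_kbps / video_bitrate_kbps <= margin


print(required_bandwidth(600))  # 750.0
print(stall_risk(900, 600))     # False: 900 kbps comfortably sustains a 600 kbps video
print(stall_risk(700, 600))     # True: the buffer is likely to deplete
```

Because the check only needs per-flow throughput and the video bitrate, it fits passive measurements: no player instrumentation is required.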

Selected examples II: Anomaly Detection and Diagnosis
- We conceived a statistical AD tool that works with full feature distributions
- The AD algorithm consists of two phases:
  (1) Reference-set identification: find past traffic distributions that are a suitable reference of normality
  (2) AD test: use a normalized variant of the Kullback-Leibler divergence to decide whether the current distribution is compatible with the reference set
[Figure: CDFs of a feature for distributions x1, x2, x3; x1 and x2 are similar → L(x1, x2) is small; x1 and x3 are dissimilar → L(x1, x3) is large]
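The two-phase test can be sketched on histograms of a traffic feature. The slide does not specify how its Kullback-Leibler variant is normalized, so this sketch substitutes a plain symmetrized KL divergence on smoothed histograms. It also assumes:
- Phase (1) has already returned `reference_set`, with at least two distributions.
- The threshold is a slack factor over the largest divergence inside the reference set.

```python
import math


def smoothed(hist, eps=1e-6):
    """Turn a histogram (list of counts) into a probability vector without zeros."""
    total = sum(hist) + eps * len(hist)
    return [(h + eps) / total for h in hist]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def divergence(h1, h2):
    """Symmetrized KL between two histograms over the same bins (a stand-in)."""
    p, q = smoothed(h1), smoothed(h2)
    return 0.5 * (kl(p, q) + kl(q, p))


def is_anomalous(current, reference_set, slack=1.5):
    """Phase (2): is the current distribution incompatible with the reference set?

    Threshold (assumption): slack times the largest divergence observed among the
    reference distributions themselves, i.e. their natural variability.
    """
    internal = [divergence(a, b) for i, a in enumerate(reference_set)
                for b in reference_set[i + 1:]]
    threshold = slack * max(internal)
    mean_div = sum(divergence(current, r) for r in reference_set) / len(reference_set)
    return mean_div > threshold


reference = [[100, 50, 10], [95, 55, 12], [105, 48, 9]]
print(is_anomalous([98, 52, 11], reference))   # False: compatible with normality
print(is_anomalous([10, 20, 300], reference))  # True: mass moved to the last bin
```

Working on whole distributions rather than on a single counter is what makes the test sensitive to shape changes that leave the total volume unchanged.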

Using ADTool for Detecting and Diagnosing Anomalies
Many interesting service anomalies are observed as abrupt changes in DNS counts.
Reasoner approach: correlate observations from multiple metrics revealing service-related and/or device-related anomalies:
- Fully Qualified Domain Name
- Device OS
- Device manufacturer (TAC number in mobile devices)
- HTTP response code
- and so on...
Example: a real service/device-related anomaly in mobile devices

Selected examples: Anomaly Detection and Diagnosis
DNS query counts in a mobile network:
- Periodic spikes: daily synchronization events
- Peak-hour utilization
- Traffic anomaly, what's that? Easy to detect, not so easy to diagnose
- Similar behavior in tablets
- The anomaly is only observable for Apple devices
- akadns.net (Akamai DNS), push.apple.com (Apple Push Notification Service)
Diagnosis: connection issues to the Apple push notification servers
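The drill-down in this example can be approximated in a few steps:
1. Split the extra DNS queries of the anomalous interval along each metric (FQDN, device OS, manufacturer).
2. Rank the values that account for the largest share of the increase.

The record format, dimension names and data below are hypothetical.

```python
from collections import Counter


def explain_anomaly(baseline, anomalous, dimensions):
    """Rank (dimension, value) pairs by their share of the extra DNS queries.

    baseline, anomalous: lists of dicts, one per DNS query, e.g.
        {"fqdn": "push.apple.com", "os": "iOS", "manufacturer": "Apple"}.
    """
    increase = len(anomalous) - len(baseline)
    if increase <= 0:
        return []
    scores = []
    for dim in dimensions:
        before = Counter(q[dim] for q in baseline)
        after = Counter(q[dim] for q in anomalous)
        for value in after:
            delta = after[value] - before.get(value, 0)
            if delta > 0:
                scores.append((delta / increase, dim, value))
    scores.sort(reverse=True)  # largest share of the increase first
    return [(dim, value, round(share, 2)) for share, dim, value in scores]


android = {"fqdn": "google.com", "os": "Android", "manufacturer": "Samsung"}
push = {"fqdn": "push.apple.com", "os": "iOS", "manufacturer": "Apple"}
akamai = {"fqdn": "akadns.net", "os": "iOS", "manufacturer": "Apple"}
baseline = [android] * 10 + [push] * 10
anomalous = [android] * 10 + [push] * 30 + [akamai] * 10
result = explain_anomaly(baseline, anomalous, ["fqdn", "os", "manufacturer"])
for row in result:
    print(row)
```

In this toy data, the OS and manufacturer dimensions each explain the whole increase, while the FQDN dimension splits it between push.apple.com and akadns.net. This mirrors the "only Apple devices" observation above.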

Selected examples III: Anycast Detection
Problem solved: anycast enumeration and geolocation.
Iterative methodology based on geographically distributed vantage points (VPs):
- Determine whether a service uses IP anycast
- Enumerate the replicas sharing the same IP address
- Geolocate those replicas
The iterative workflow is lightweight (O(100) packets) and fast (O(100) ms). It shall support RIPE and mPlane/PlanetLab probes (RIPE integration in mPlane).
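Detection and enumeration can be sketched with a common latency-based, speed-of-light argument. Each VP's minimum RTT bounds a disk around that VP, and the disk must contain the replica the VP reached. Two disjoint disks cannot contain the same unicast host. This sketch assumes a propagation speed of about 200 km/ms in fiber, and the coordinates and RTTs are hypothetical. The greedy enumeration only gives a lower bound on the number of replicas.

```python
import math

FIBER_KM_PER_MS = 200.0  # assumed propagation speed in fiber (about 2/3 of c)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def disk_radius_km(rtt_ms):
    """The reached replica is at most (RTT / 2) * propagation speed from the VP."""
    return rtt_ms / 2 * FIBER_KM_PER_MS


def disjoint(m1, m2):
    """True if the two latency disks cannot overlap."""
    (vp1, rtt1), (vp2, rtt2) = m1, m2
    return haversine_km(vp1, vp2) > disk_radius_km(rtt1) + disk_radius_km(rtt2)


def is_anycast(measurements):
    """measurements: list of ((lat, lon) of the VP, min RTT in ms to the target IP)."""
    return any(disjoint(a, b)
               for i, a in enumerate(measurements) for b in measurements[i + 1:])


def enumerate_replicas(measurements):
    """Greedy set of pairwise-disjoint disks, smallest first: one replica per disk."""
    chosen = []
    for m in sorted(measurements, key=lambda meas: meas[1]):
        if all(disjoint(m, c) for c in chosen):
            chosen.append(m)
    return chosen


VIENNA, NEW_YORK = (48.21, 16.37), (40.71, -74.01)
anycast_like = [(VIENNA, 2.0), (NEW_YORK, 3.0)]
unicast_like = [(VIENNA, 12.0), (NEW_YORK, 85.0)]
print(is_anycast(anycast_like), len(enumerate_replicas(anycast_like)))  # True 2
print(is_anycast(unicast_like), len(enumerate_replicas(unicast_like)))  # False 1
```

Each retained disk also serves as a coarse geolocation region for its replica, which a finer step could narrow down.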

Selected examples IV: DisNETPerf
Problem solved: Reverse Traceroute (with neither IP spoofing nor the IP Record Route option):
- find the mPlane probe that is closest to a given Point of Interest (PoI)
- to enable troubleshooting on the path from that PoI to some user
- without control on the PoI side (e.g., a YouTube server)
Neighborhood model: combined topology- and delay-based distance (BGP same AS + min RTT)
Main idea:
- rely on a large set of widely spread probes (e.g., RIPE Atlas)
- given IPs (e.g., YouTube) and IPd (e.g., PoP @Heidelberg), locate IPdisnet
- IPdisnet "mimics" IPs in terms of IPs → IPd path similarity
- run traceroute measurements from IPc to IPd
- collect data for troubleshooting purposes
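The neighborhood model (same BGP AS first, then minimum RTT) can be sketched as a probe-selection function. The probe records, ASNs (from the documentation range) and RTT values are hypothetical. The fallback to all probes when none sits in the PoI's AS is our assumption.

```python
def closest_probe(poi_asn, probes):
    """Pick the probe 'closest' to a PoI: same BGP AS first, then minimum RTT.

    probes: list of dicts {"id": ..., "asn": ..., "rtt_ms": min RTT probe -> PoI}.
    """
    if not probes:
        return None
    same_as = [p for p in probes if p["asn"] == poi_asn]
    candidates = same_as or probes  # assumption: fall back to all probes
    return min(candidates, key=lambda p: p["rtt_ms"])


probes = [
    {"id": 1, "asn": 64500, "rtt_ms": 4.0},
    {"id": 2, "asn": 64501, "rtt_ms": 1.5},
    {"id": 3, "asn": 64500, "rtt_ms": 2.2},
]
print(closest_probe(64500, probes)["id"])  # 3: same AS wins over the lower-RTT probe 2
```

The selected probe would then run the traceroute toward IPd, standing in for the uncontrollable PoI.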

DisNETPerf in a Nutshell
[Figure: reverse traceroute, IPs → IPd?]
mPlane – 2nd Review Meeting, Brussels, February 10th, 2015

Backup slides

Selected examples I: Content Popularity
Early detection of contents that will receive attention
[Figure: mPlane, cache]

How mPlane can make it happen
[Figure: workflow with Probes (passive) capturing HTTP requests, Repository, Analysis Modules (Popularity Modeler, Popularity Predictor), Reasoner (detect devices and caches close to the location), Supervisor (notify popular contents) and CDN supervisor (caching strategies based on future popular contents)]

Preliminary Results
- Popularity Modeler and Predictor modules: topic models (GMM + LDA), maximum likelihood
- Caching policy based on content popularity vs. LRU and LFU (Least Recently/Frequently Used)
- We improve on state-of-the-art algorithms by obtaining a similar RMSE with a much smaller observation window (30 min vs. 4 h)

Selected examples II: Passive Media Curation
A new way of helping users find relevant web content, fast
- User clicks are a good measure of interest (users don't click randomly)
- Curated (relevant) content

How can mPlane make it happen
[Figure: workflow with WP2 (user URLs, up to 5M requests/hour), WP2' (active), WP3 scalable data analysis (content popularity statistics), WP4.1 Analysis Modules (portals vs. contents classification, election of the content to promote, yielding interesting URLs) and the Supervisor (orchestrate, publish content)]
Prototype running for a few months: http://webrowse.polito.it/

Content versus Portal classifier module
Features:
- Hostname
- URL length
- Frequency as hostname
- Request arrival process cross-correlation
- 1-day periodicity
These "feed" a naive Bayes classifier:
- Tested different combinations; the best is URL length + period, as accurate as the 5 features together
Accuracy:
- Tested on manually verified ground-truth traces
- Used 2/3 for training and 1/3 for "prediction"
- Overall 96% accuracy of the classifier
- 94% precision, 100% recall in detecting content-URLs
Example: content-URL www.news.com/region/news1.htm; web portals www.news.com, www.news.com/region/
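A two-feature naive Bayes classifier can be sketched with Gaussian class-conditional densities. It uses URL length and 1-day periodicity, the best combination reported above. The Gaussian variant, the variance floor and the toy training samples are assumptions for illustration. They are not the ground-truth traces or the exact classifier used in the study.

```python
import math
from statistics import mean, pvariance


def train_gnb(samples):
    """samples: list of (feature_vector, label). Returns per-class (prior, means, variances)."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    model = {}
    for y, xs in by_label.items():
        cols = list(zip(*xs))  # one tuple per feature
        model[y] = (len(xs) / len(samples),
                    [mean(c) for c in cols],
                    [pvariance(c) + 1e-3 for c in cols])  # variance floor avoids division by 0
    return model


def predict_gnb(model, x):
    """Return the label with the highest log-posterior (features assumed independent)."""
    def log_posterior(y):
        prior, mus, variances = model[y]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, mus, variances))
    return max(model, key=log_posterior)


# Hypothetical samples: (URL length in characters, 1 if 1-day periodic else 0)
training = [((14, 1), "portal"), ((22, 1), "portal"), ((18, 1), "portal"),
            ((48, 0), "content"), ((63, 0), "content"), ((55, 1), "content")]
model = train_gnb(training)
print(predict_gnb(model, (20, 1)), predict_gnb(model, (58, 0)))  # portal content
```

Short, daily-revisited URLs behave like portals, while long URLs without a daily pattern behave like individual content items. Two cheap features can therefore already separate the classes.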

Content promotion module (in progress)
Three types of promoted content-URLs so far:
- Live stream: news/videos/blogs currently attracting the attention of the crowd
- Top: most popular (last day, week, month, etc.) over all content-URLs
- Hot: a "mixture" of popularity and freshness (adapted from reddit's hot algorithm)
Hot score parameters: timestamp of the first view; absolute reference: start date of Netcurator; a freshness constant period (12 hours)
First users like them! Extremely relevant 13%, Very relevant 42%, Relevant 28%, Not that relevant 8%, Poor 6%
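reddit's hot ranking adds the logarithm of a popularity score to a term proportional to the item's submission time, measured from a fixed epoch. This sketch adapts it as described above. It assumes popularity is the number of views and uses the 12-hour freshness period. Under this formula, a content-URL first seen 12 hours later needs ten times fewer views to tie. `hot_score` and the reference date are hypothetical.

```python
import math
from datetime import datetime, timedelta

FRESHNESS_PERIOD = timedelta(hours=12)  # freshness constant period from the slide


def hot_score(views, first_view, reference):
    """Hot score sketch: log-popularity plus freshness.

    views: popularity of the content-URL (assumed: number of views)
    first_view: timestamp of the first view of the URL
    reference: absolute reference time (start date of the curation service)
    """
    popularity = math.log10(max(views, 1))
    freshness = (first_view - reference) / FRESHNESS_PERIOD  # timedelta / timedelta -> float
    return popularity + freshness


start = datetime(2015, 11, 1)  # hypothetical start date
older = hot_score(1000, start, start)
newer = hot_score(100, start + timedelta(hours=12), start)
print(round(older, 6), round(newer, 6))  # 3.0 3.0: 10x fewer views, 12 h fresher
```

Using the first-view timestamp, rather than the current time, keeps scores stable. An item's score only changes when its view count changes, so rankings can be updated incrementally.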
