Management Plane Analytics
Aaron Gember-Jacobson, Wenfei Wu, Xiujun Li, Aditya Akella, Ratul Mahajan

Presentation transcript:

Important network planes
Data plane: forwards packets
Control plane: computes routes
Analyze these using traceroute, Rocketfuel, pathchar, pathload, etc.
Management plane: defines the network's physical structure and configures the control plane
Analyze this using ???

Why analyze the management plane?
Does a network management practice impact network health (i.e., problem frequency)?
Good management practices are important!

Disagreement among experts
To what extent does a management practice impact the frequency/severity of problems?

Management plane analytics (MPA)
Inputs: configs, tickets, inventory
MPA framework: quantify management practices and network health; analyze relationships
Outputs: practices that cause poor health; predictive model
Applied to 850+ networks from a large online service provider

Outline
Motivation
How do we…
1. Quantify an organization's practices?
2. Identify which practices impact network health?
3. Predict network health given a set of practices?

Classes of management practices
1. Design practices: long-term decisions about network structure
   – # of devices, roles, models
   – routing protocols, size of routing domains, …
2. Operational practices: day-to-day activities that address emerging needs
   – frequency of config changes, fraction automated, types of stanzas changed, …
Practices are not directly logged!

Inferring management practices
Inputs: configs, inventory, tickets
Outputs: practices (28) + health (# of tickets)
Data from 850+ networks for 17 months
Quantify on a monthly basis
Discretize into equal-width bins
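
The discretization step can be sketched as follows. This is a minimal illustration of equal-width binning, not the authors' code: the function name, the choice of five bins, and the sample counts are all assumptions made for the example.

```python
def equal_width_bins(values, num_bins):
    """Map each value to a bin index in [0, num_bins - 1] using equal-width bins."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # The maximum value would land in bin num_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


# Hypothetical monthly change-event counts for six networks (made-up numbers).
monthly_changes = [0, 3, 10, 25, 49, 50]
print(equal_width_bins(monthly_changes, num_bins=5))  # [0, 0, 1, 2, 4, 4]
```

Equal-width bins keep the bin boundaries easy to interpret, at the cost of leaving some bins sparsely populated when a metric is heavily skewed.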

Statistical dependencies
Challenge: identify causal relationships

Experimental design
Question: does a treatment (practice) cause an outcome (health)?
Confounding factors: other practices
Approaches: randomized experiment; quasi-experimental design (QED) [Krishnan et al. IMC '12, IMC '13]

Propensity score matching
Example: treatment = # models; confounding = # roles, # changes; outcome = # tickets
Assignment to untreated/treated: randomized vs. pre-defined (want randomized)
Propensity score = predicted probability (Treatment = yes | Confounding Practices = …)
Compare cases from population samples where the distributions of confounding factor values are similar
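
A minimal sketch of propensity score matching, under stated assumptions. The slide only defines the score as a predicted probability. Using logistic regression to predict it, matching greedily one-to-one within a caliper, and every name and number below are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_propensity_model(confounders, treated, lr=0.5, epochs=2000):
    """Fit logistic regression P(treated = 1 | confounders) by batch gradient descent."""
    n, d = len(confounders), len(confounders[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(confounders, treated):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            for j in range(d):
                grad_w[j] += (p - t) * x[j]
            grad_b += p - t
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def propensity_scores(confounders, weights, bias):
    """Predicted probability of treatment for each case."""
    return [sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            for x in confounders]


def match_pairs(scores, treated, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on score, without replacement."""
    untreated_idx = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i in (i for i, t in enumerate(treated) if t == 1):
        if not untreated_idx:
            break
        j = min(untreated_idx, key=lambda k: abs(scores[k] - scores[i]))
        # Only accept the match if the scores are close enough.
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            untreated_idx.remove(j)
    return pairs


# Hypothetical cases: confounders are (# roles, # changes), scaled to [0, 1].
confounders = [[0.0, 0.2], [0.2, 0.0], [0.4, 0.6],
               [0.6, 0.4], [0.8, 1.0], [1.0, 0.8]]
treated = [0, 0, 1, 0, 1, 1]  # 1 = case is in the "treated" group for # models
weights, bias = fit_propensity_model(confounders, treated)
scores = propensity_scores(confounders, weights, bias)
print(match_pairs(scores, treated, caliper=0.25))
```

Each returned pair holds the index of a treated case and the index of an untreated case with a similar propensity score. Comparing their outcomes approximates what a randomized assignment would have shown.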

Test for causality
Example: treatment = # models; confounding = # roles, # changes; outcome = # tickets
For each matched pair, compute the difference in outcome (e.g., = -1)
Can we reject H0: median = 0?
Sign test over the # of pairs: p-value < ?
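
The sign test above can be sketched as follows. This is a minimal two-sided exact version using only the standard library. The function name, the sign convention for the differences, and the sample values are assumptions for illustration, and no significance threshold is implied.

```python
from math import comb


def sign_test_p_value(differences):
    """Two-sided exact sign test of H0: the median difference is 0."""
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    n = positives + negatives  # ties (d == 0) are discarded
    if n == 0:
        return 1.0
    k = min(positives, negatives)
    # Under H0, each non-zero difference is + or - with probability 1/2,
    # so the count of the rarer sign follows Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-pair differences in # tickets between matched cases.
diffs = [-1, -2, -1, 0, -3, -1, 1, -2, -1, -1]
print(sign_test_p_value(diffs))  # 0.0390625
```

The sign test uses only the direction of each difference, so it makes no assumption about how ticket counts are distributed.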

Causal relationships

Practice                             p-value
No. of change events                 1.05 x 10^…
No. of change types                  5.75 x 10^…
No. of roles                         2.99 x 10^…
Frac. events w/ ACL change           9.10 x 10^…
No. of devices                       1.92 x 10^…
Avg. devices changed per event       3.56 x 10^…
No. of models                        1.31 x 10^…
No. of VLANs                         6.46 x 10^…
Frac. events w/ interface change     5.27 x 10^…
Intra-device complexity              1.53 x 10^…

Comparison with operator opinions: operators had mixed beliefs; discredits belief that impact is low; agrees with operators

Predicting network health
Build decision trees using machine learning
+ Model arbitrary boundaries
+ Easy to understand
Challenge: heavy skew in practices and health (figure annotation: 73%)
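
To make the decision-tree idea concrete, here is a minimal sketch of how a single tree node could choose its split. The slides do not say which learning algorithm or split criterion was used, so Gini impurity, the function names, and the toy data are assumptions. A full tree would apply this search recursively to each side of the split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature, threshold) split with the lowest weighted Gini impurity.

    Returns (impurity, feature_index, threshold), or None if no split exists.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must put examples on both sides
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, f, threshold)
    return best


# Hypothetical binned practices per network-month: (change-events bin, roles bin).
rows = [[0, 1], [1, 1], [2, 0], [3, 0], [4, 1]]
health = ["good", "good", "good", "poor", "poor"]
print(best_split(rows, health))  # (0.0, 0, 2)
```

In this toy data the chosen rule reads "change-events bin <= 2 means good health", which shows why such models are easy to understand.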

Addressing skew
Oversampling: repeat minority class examples during training (figure annotation: x2)
Boosting: in each iteration, increase the weight of examples that were misclassified using the prior model
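
Both techniques can be sketched in a few lines. The slides do not name a specific boosting algorithm, so the AdaBoost-style weight update below is a swapped-in assumption, as are the function names, the oversampling factor of 2, and the toy inputs.

```python
import math


def oversample(examples, labels, minority_label, factor=2):
    """Repeat every minority-class example `factor` times in the training set."""
    out_x, out_y = [], []
    for x, y in zip(examples, labels):
        copies = factor if y == minority_label else 1
        out_x.extend([x] * copies)
        out_y.extend([y] * copies)
    return out_x, out_y


def boost_weights(weights, misclassified):
    """One AdaBoost-style reweighting step (assumed variant).

    Returns (new_weights, model_weight). Misclassified examples gain weight,
    correctly classified examples lose weight, and the result sums to 1.
    """
    total = sum(weights)
    error = sum(w for w, m in zip(weights, misclassified) if m) / total
    if error == 0 or error >= 0.5:
        # Degenerate round (perfect, or no better than chance):
        # leave the relative weights unchanged.
        return [w / total for w in weights], 0.0
    alpha = 0.5 * math.log((1 - error) / error)
    new = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
    norm = sum(new)
    return [w / norm for w in new], alpha


# Toy usage: one "poor" example among three, then one boosting round.
xs, ys = oversample(["a", "b", "c"], ["good", "poor", "good"], "poor")
print(xs, ys)
new_weights, alpha = boost_weights([0.25] * 4, [True, False, False, False])
print(new_weights, alpha)
```

After the update, the misclassified examples together carry half of the total weight, which pushes the next model to focus on them.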

Model accuracy
Models compared: majority predictor; decision tree (DT); DT with oversampling and boosting (MPA)
Overall accuracy: 81%
91% with 2 classes
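
As a point of reference for these numbers, the majority-predictor baseline and the accuracy metric can be sketched as follows. The function names and the label counts are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_predictor(train_labels):
    """Return a predictor that always outputs the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: majority


def accuracy(predict, rows, labels):
    """Fraction of examples for which predict(row) equals the true label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(labels)


# Hypothetical skewed data set: 7 "good" months and 3 "poor" months.
labels = ["good"] * 7 + ["poor"] * 3
rows = [[i] for i in range(len(labels))]  # features are ignored by the baseline
baseline = majority_predictor(labels)
print(accuracy(baseline, rows, labels))  # 0.7
```

With skewed classes the baseline already scores well while never predicting the minority class, which is why it serves as the comparison point for the tree-based models.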

Conclusion
Management plane analysis is important
MPA framework
1) Determine which practices cause a decline in health
2) Construct a predictive model of health based on practices
Results from an OSP with 850+ networks