Management Plane Analytics
Aaron Gember-Jacobson, Wenfei Wu, Xiujun Li, Aditya Akella, Ratul Mahajan
Important network planes. Data plane: forwards packets. Control plane: computes routes. Analyze these using traceroute, Rocketfuel, pathchar, pathload, etc. Management plane: defines the network’s physical structure and configures the control plane. Analyze using ???
Why analyze the management plane? Does a network management practice impact network health (i.e., problem frequency)? Good management practices are important!
Disagreement among experts: To what extent does a management practice impact the frequency/severity of problems?
Management plane analytics (MPA). Inputs: configs, tickets, inventory. MPA framework: quantify management practices and network health; analyze relationships. Outputs: practices that cause poor health; a predictive model. Applied to 850+ networks from a large online service provider.
Outline
Motivation
How do we…
1. Quantify an organization’s practices?
2. Identify which practices impact network health?
3. Predict network health given a set of practices?
Classes of management practices
1. Design practices – long-term decisions about network structure
   – # of devices, roles, models
   – routing protocols, size of routing domains, …
2. Operational practices – day-to-day activities that address emerging needs
   – frequency of config changes, fraction automated, types of stanzas changed, …
Practices are not directly logged!
Inferring management practices. Configs and inventory yield practices (28); tickets yield health (# of tickets). Data from 850+ networks for 17 months. Quantify on a monthly basis. Discretize into equal-width bins.
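The discretization step can be illustrated with a minimal sketch. The number of bins, the function name `equal_width_bins`, and the sample values below are assumptions for illustration only; the slides do not state how many bins were used.

```python
def equal_width_bins(values, num_bins=5):
    """Map each value to a bin index in [0, num_bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # The maximum value would compute to index `num_bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


# Hypothetical monthly counts of config change events for six networks.
monthly_changes = [0, 3, 10, 25, 49, 50]
print(equal_width_bins(monthly_changes, num_bins=5))  # [0, 0, 1, 2, 4, 4]
```

Equal-width bins keep the bin boundaries easy to interpret, at the cost of sparsely populated bins when the metric is heavily skewed.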
Statistical dependencies. Challenge: identify causal relationships
Experimental design. Does the treatment (a practice) cause the outcome (health)? Other practices are confounding factors. Options: randomized experiment, or quasi-experimental design (QED) [Krishnan et al. IMC ’12, IMC ’13].
Propensity score matching. Example: treatment = # models; confounding = # roles, # changes; outcome = # tickets. Cases are split into untreated and treated. Propensity score = predicted probability (Treatment = yes | Confounding Practices = …). Compare cases from population samples where the distributions of confounding factor values are similar. (Diagram labels: Randomized; Pre-defined; Want randomized.)
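A minimal sketch of the matching step follows. It assumes the propensity scores have already been estimated by some classifier (the slides do not say which one), and it uses greedy 1:1 nearest-neighbor matching with a caliper, which is one common variant rather than necessarily the authors' procedure. The function name `match_pairs`, the caliper value, and the sample scores are hypothetical.

```python
def match_pairs(scores, treated, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on propensity score, without replacement.

    scores[i]  -- estimated P(treatment = yes | confounding practices) for case i
    treated[i] -- True if case i actually received the treatment
    Returns a list of (treated_index, untreated_index) pairs.
    """
    treated_idx = [i for i, t in enumerate(treated) if t]
    untreated_idx = [i for i, t in enumerate(treated) if not t]
    used = set()
    pairs = []
    for i in treated_idx:
        best = None  # (distance, untreated_index) of the closest unused candidate
        for j in untreated_idx:
            if j in used:
                continue
            dist = abs(scores[i] - scores[j])
            if dist <= caliper and (best is None or dist < best[0]):
                best = (dist, j)
        if best is not None:
            used.add(best[1])
            pairs.append((i, best[1]))
    return pairs


# Hypothetical propensity scores for five network-months.
scores = [0.80, 0.30, 0.75, 0.35, 0.10]
treated = [True, True, False, False, False]
print(match_pairs(scores, treated))  # [(0, 2), (1, 3)]
```

Each resulting pair holds a treated and an untreated case with similar confounding practices, so the difference in their outcomes can be attributed more plausibly to the treatment.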
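Test for causality. For each matched pair (treatment: # models; confounding: # roles, # changes; outcome: # tickets), record the sign of the difference in outcome (e.g., -1) and count the # of pairs with each sign. Null hypothesis H0: median difference = 0. Can we reject H0? Check whether the sign-test p-value < ?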
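The sign test itself is a small exact binomial computation. The sketch below is a standard two-sided version that drops tied pairs; the function name `sign_test_p_value` and the sample differences are assumptions, and the significance threshold is left to the caller because the slide text does not preserve it.

```python
from math import comb


def sign_test_p_value(differences):
    """Two-sided exact sign test for H0: median difference = 0.

    `differences` holds outcome(treated) - outcome(untreated) for each matched pair.
    Pairs with a zero difference (ties) are dropped, as is conventional.
    """
    pos = sum(1 for d in differences if d > 0)
    neg = sum(1 for d in differences if d < 0)
    n = pos + neg
    if n == 0:
        return 1.0  # no informative pairs, so H0 cannot be rejected
    k = min(pos, neg)
    # Under H0 each sign is equally likely, so the count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical ticket-count differences for ten matched pairs.
diffs = [1, 1, 1, 1, 1, 1, 1, 1, 1, -1]
print(sign_test_p_value(diffs))  # 0.021484375
```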
Causal relationships
Practice                              p-value
No. of change events                  1.05 x
No. of change types                   5.75 x
No. of roles                          2.99 x
Frac. events w/ ACL change            9.10 x
No. of devices                        1.92 x
Avg. devices changed per event        3.56 x
No. of models                         1.31 x
No. of VLANs                          6.46 x
Frac. events w/ interface change      5.27 x
Intra-device complexity               1.53 x
Annotations: Operators had mixed beliefs; Discredits belief that impact is low; Agrees with operators
Predicting network health. Build decision trees using machine learning: + model arbitrary boundaries; + easy to understand. Challenge: heavy skew in practices and health (figure label: 73%).
Addressing skew
Oversampling – repeat minority class examples during training (figure label: x2)
Boosting – in each iteration, increase the weight of examples that were misclassified using the prior model
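Both techniques can be sketched in a few lines. The oversampling factor of 2 mirrors the "x2" label on the slide; the weight update uses the AdaBoost formula, which is an assumption here since the slides only say that misclassified examples gain weight. The function names and sample data are hypothetical.

```python
import math


def oversample(examples, labels, minority_label, factor=2):
    """Repeat each minority-class example `factor` times in the training set."""
    out_x, out_y = [], []
    for x, y in zip(examples, labels):
        copies = factor if y == minority_label else 1
        out_x.extend([x] * copies)
        out_y.extend([y] * copies)
    return out_x, out_y


def boost_weights(weights, misclassified):
    """One AdaBoost-style update; `weights` are assumed to sum to 1.

    Misclassified examples are up-weighted, correct ones down-weighted,
    and the result is renormalized to sum to 1.
    """
    err = sum(w for w, m in zip(weights, misclassified) if m)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(alpha if m else -alpha) for w, m in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]


# Hypothetical training set with one minority-class ("unhealthy") example.
xs, ys = oversample(["a", "b", "c"], ["healthy", "unhealthy", "healthy"], "unhealthy")
print(xs, ys)

# Four equally weighted examples; the prior model misclassified only the first.
print(boost_weights([0.25] * 4, [True, False, False, False]))
```

With the AdaBoost update, the misclassified examples end up holding half of the total weight, which forces the next tree to focus on them.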
Model accuracy. Models compared: majority predictor, decision tree (DT), DT with oversampling and boosting (MPA). Overall accuracy: 81%; 91% with 2 classes.
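For context on the baseline, a majority predictor always outputs the most common training class, so its accuracy equals that class's share of the test set. The sketch below uses hypothetical class names and labels, not the study's data.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Return the majority training class and its accuracy on the test labels."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return majority, correct / len(test_labels)


# Hypothetical, heavily skewed health labels.
train = ["healthy"] * 8 + ["unhealthy"] * 2
test = ["healthy"] * 3 + ["unhealthy"]
print(majority_baseline_accuracy(train, test))  # ('healthy', 0.75)
```

Under heavy skew this baseline can score well overall while never predicting the minority class, which is why it is a useful reference point for the decision-tree results.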
Conclusion
Management plane analysis is important
MPA framework
1) Determine which practices cause a decline in health
2) Construct a predictive model of health based on practices
Results from an OSP with 850+ networks