Published by Rose Anderson. Modified over 9 years ago.
1
Predicting and Bypassing End-to-End Internet Service Degradation
Anat Bremler-Barr, Edith Cohen, Haim Kaplan, Yishay Mansour
Tel-Aviv University, AT&T Labs, Tel-Aviv University
Talk: Omer Ben-Shalom, Tel-Aviv University
2
Outline:
- Degradation: deviation from the "normal" (minimum) RTT.
- Predicting degradation: different predictors.
- Performance evaluation: precision/recall methodology.
- Suggested application: gateway selection.
3
Motivating Application
[Figure: network diagram with AS 56, AS 123, AS 12 and AS 41, peering links, and an intelligent routing device.]
Gateway selection (intelligent routing device): choosing peering links.
4
Data and Measurements: Sources
[Figure: measurement sources ACIRI (CA2), AT&T (CA1), AT&T (NJ1) and Princeton (NJ2).]
Measurements from 4 different locations (ASes) simulate 4 gateways:
- California (CA): AT&T + ACIRI
- New Jersey (NJ): AT&T + Princeton
5
Data and Measurements: Destinations
Obtaining a representative set of web servers, with weights derived from a proxy log.
6
Data and Measurements: RTT
Data: weekly end-to-end RTT measurements (SYN; path + server):
- Hourly measurements to 35,124 servers.
- Once-a-minute measurements to a weighted sample of 100 servers.
7
Degradation: Definition
Degradation is the deviation from the minimum recorded RTT (the propagation delay), discretized into levels 1-6:

Deviation (ms)   Level
+50              1
+100             2
+200             3
+400             4
+800             5
+1600            6
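The level table can be expressed as a small lookup. This is a minimal sketch, assuming that a deviation exactly equal to a threshold reaches that level (the slide does not specify the boundary) and that the minimum RTT per destination is already known; the function names are hypothetical.

```python
# Thresholds from the table: level l means the RTT exceeds the minimum
# recorded RTT by at least 50 * 2**(l - 1) ms (50, 100, ..., 1600).
THRESHOLDS_MS = [50, 100, 200, 400, 800, 1600]


def degradation_level(rtt_ms: float, min_rtt_ms: float) -> int:
    """Return the highest degradation level reached (0 = not degraded)."""
    deviation = rtt_ms - min_rtt_ms
    level = 0
    for lvl, threshold in enumerate(THRESHOLDS_MS, start=1):
        if deviation >= threshold:  # assumed inclusive boundary
            level = lvl
    return level


def is_failure(rtt_ms: float, min_rtt_ms: float, level: int) -> bool:
    """Binary failure indicator for a given degradation level."""
    return degradation_level(rtt_ms, min_rtt_ms) >= level
```

For example, an RTT of 350 ms against a 100 ms minimum is a level-3 degradation, so it counts as a failure at levels 1-3 but not at level 4.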
8
Objective: Avoiding Degradation
Attempt to reroute through a different gateway. Two conditions have to hold:
- We need to be able to predict the failure through a gateway.
- We need a substitute gateway (low correlation between gateways).
Blackout: consecutive degradation through one gateway.
9
Blackout Durations
Longer blackouts are easier to predict.
- The majority of blackouts are short: 1-3 consecutive measurement points.
- However, a considerable fraction of blackouts last longer.
[Figure: blackout durations, highlighting a long-duration blackout.]
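Blackout durations are run lengths in a 0/1 degradation sequence. The sketch below, assuming one entry per measurement slot with 1 meaning "degraded", extracts them and computes the share of short (1-3 point) blackouts; the helper names are hypothetical.

```python
from itertools import groupby


def blackout_durations(failures):
    """Lengths of the maximal runs of consecutive failures (1 = degraded)."""
    return [len(list(run)) for failed, run in groupby(failures) if failed]


def short_blackout_fraction(durations, max_short=3):
    """Fraction of blackouts lasting at most `max_short` measurement points."""
    if not durations:
        return 0.0
    return sum(d <= max_short for d in durations) / len(durations)


failures = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0]
durations = blackout_durations(failures)
print(durations)  # [2, 1, 5]
print(short_blackout_fraction(durations))
```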
10
Gateway Correlation
Gateways are correlated, but the correlation is often not very strong.
11
Gateway Correlation
- Longer blackouts are more likely to be shared by several gateways, indicating a failure closer to the server.
- The majority of blackouts shared by two gateways involved same-coast pairs.
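The slides do not say how gateway correlation was measured. One simple measure, assumed here for illustration, is the conditional probability that gateway B is degraded in a slot given that gateway A is, compared with B's overall degradation rate (a "lift" of 1.0 means the two gateways fail independently). Both sequences are assumed to be aligned 0/1 series for the same destination.

```python
def conditional_failure_rate(a, b):
    """Estimate P(B degraded at t | A degraded at t) from aligned 0/1 series."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned")
    a_fail = sum(a)
    if a_fail == 0:
        return 0.0
    both = sum(1 for x, y in zip(a, b) if x and y)
    return both / a_fail


def lift(a, b):
    """Ratio of P(B | A) to P(B); 1.0 means independence, larger means correlated."""
    p_b = sum(b) / len(b)
    return conditional_failure_rate(a, b) / p_b if p_b else 0.0
```

A weakly correlated pair (lift close to 1) is a good candidate for substitution, which is the second condition on the objective slide.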
12
Building Predictors
For a given degradation level l, we build a prediction per IP address:
- Input: previous RTT measurements for the IP address.
- Output: the probability of a failure.
Predict "failure" if the probability is greater than a threshold Φ.
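The per-IP framework can be sketched as a wrapper that stores each address's 0/1 failure history and thresholds a pluggable score. The class name, the `max_history` cap, and the idea of passing the scoring rule as a function are illustrative assumptions, not the authors' design.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Sequence


class PerIPPredictor:
    """Keeps each IP's recent 0/1 failure history and thresholds a score."""

    def __init__(self, score: Callable[[Sequence[int]], float],
                 threshold: float, max_history: int = 1000):
        self.score = score          # history -> predicted failure probability
        self.threshold = threshold  # the threshold Phi
        # Hypothetical cap on stored history per IP (oldest entries dropped).
        self.history: Dict[str, Deque[int]] = defaultdict(
            lambda: deque(maxlen=max_history))

    def observe(self, ip: str, failed: bool) -> None:
        self.history[ip].append(int(failed))

    def probability(self, ip: str) -> float:
        return self.score(self.history[ip])

    def predict_failure(self, ip: str) -> bool:
        return self.probability(ip) > self.threshold
```

Any of the predictors in this talk can be plugged in as `score`; for instance, the fraction of failures in the stored history.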
13
Precision/Recall Methodology
$$\text{Precision} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Predicted degraded}|}$$
$$\text{Recall} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Actual degraded}|}$$
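The two ratios translate directly into code over aligned 0/1 sequences of actual and predicted degradations. When nothing is predicted (or nothing failed) the ratio is undefined; returning 1.0 in that case is a convention assumed here, not something the slides specify.

```python
def precision_recall(actual, predicted):
    """Precision and recall of 0/1 predictions against 0/1 ground truth."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)  # actual & predicted
    n_pred = sum(1 for p in predicted if p)
    n_act = sum(1 for a in actual if a)
    precision = tp / n_pred if n_pred else 1.0  # assumed convention
    recall = tp / n_act if n_act else 1.0       # assumed convention
    return precision, recall
```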
14
Precision-Recall Curve
Let P(t) be the predicted failure probability at time t; with threshold Φ, a failure is predicted at time t iff P(t) > Φ. Sweeping Φ over [0, 1] yields a precision-recall curve.
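A sketch of the sweep, assuming the probabilities P(t) have already been computed and that the caller supplies the grid of thresholds (for example `[i / 20 for i in range(21)]`). It reuses the same counting as the precision/recall definitions, with the assumed convention that precision is 1.0 when no failure is predicted.

```python
def pr_curve(actual, probs, thresholds):
    """Return (phi, precision, recall) for each threshold phi."""
    n_act = sum(1 for a in actual if a)
    curve = []
    for phi in thresholds:
        predicted = [p > phi for p in probs]  # predict failure iff P(t) > phi
        tp = sum(1 for a, q in zip(actual, predicted) if a and q)
        n_pred = sum(predicted)
        precision = tp / n_pred if n_pred else 1.0  # assumed convention
        recall = tp / n_act if n_act else 1.0
        curve.append((phi, precision, recall))
    return curve
```

Raising Φ trades recall for precision, which is why each predictor in the evaluation appears as a curve rather than a single point.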
15
What Is Important for Prediction?
- Recency principle: more recent RTTs are more important.
- Quantity principle: the more measurements, the higher the accuracy.
16
Recency Principle: Importance
Test case: a single-measurement predictor, which predicts according to the measurement taken x minutes ago; we observe how the quality of the prediction changes with x. There is a 15% difference between using the measurement from the last minute and using the one from 15 minutes ago.

Minutes ago   NJ-1, failure level 3   NJ-2, failure level 6
1             0.52                    0.33
2             0.49                    0.31
4             0.48                    0.29
7             0.46                    0.28
10            0.45                    0.27
15            0.44                    0.26

(Values are recall, which equals precision for this predictor.)
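The single-measurement predictor copies the measurement taken `lag` slots earlier. Because it fires exactly as often as failures occur (up to boundary effects), the number of predicted failures matches the number of actual failures, so precision and recall coincide. This sketch, with hypothetical names, scores only slots that have an earlier measurement.

```python
def single_measurement_scores(failures, lag):
    """Precision and recall of the predictor 'f_t := f_{t-lag}' on a 0/1 series."""
    if not 1 <= lag < len(failures):
        raise ValueError("lag must be in [1, len(failures))")
    actual = failures[lag:]                     # f_t for t >= lag
    predicted = failures[:len(failures) - lag]  # f_{t-lag} for the same t
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    n_pred, n_act = sum(predicted), sum(actual)
    precision = tp / n_pred if n_pred else 1.0
    recall = tp / n_act if n_act else 1.0
    return precision, recall


f = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
print(single_measurement_scores(f, 1))  # (0.6, 0.6)
print(single_measurement_scores(f, 2))  # (0.2, 0.25)
```

On this toy series, the older measurement predicts worse, mirroring the trend in the table.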
17
Quantity Principle: Importance
Test case: Fixed-Window-Count (FWC), where the prediction is the fraction of failures among the W most recent measurements.
Using more measurements achieves better precision at high recall.
[Figure: precision-recall curves for FWC 1, FWC 5, FWC 10 and FWC 50.]
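A minimal FWC sketch, assuming the history is a list of 0/1 failures with the newest last. With W measurements the estimate takes at most W + 1 distinct values (0, 1/W, ..., 1), so FWC 1 is effectively a single operating point, while larger windows give the threshold sweep more points to choose from.

```python
def fwc_probability(history, window):
    """Fraction of failures among the `window` most recent measurements."""
    if window < 1:
        raise ValueError("window must be >= 1")
    recent = history[-window:]  # fewer than `window` items if history is short
    return sum(recent) / len(recent) if recent else 0.0
```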
18
Our Predictors
- Exponential decay
- Polynomial decay
- Model-based predictors:
  - VW-Cover: Variable Window Cover algorithm
  - HMM: Hidden Markov Model
19
Exponential-Decay Predictors
The weight of each measurement decreases exponentially with its age, by a factor λ per time step. For consecutive measurements, the binary variable f_t indicates a failure at time t. In general, a measurement of age a carries weight λ^a.
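One way to realize this, assumed here rather than taken from the slides, is to predict the λ-weighted fraction of failures, P = Σ λ^a f / Σ λ^a over all past measurements. Because every weight shrinks by the same factor as time passes, two running sums suffice, and irregular gaps are handled by decaying by λ raised to the gap length. Once the history is long, the denominator approaches 1/(1-λ) and the update reduces to P ← λP + (1-λ)f_t. The "ExpDecay 0.95" and "ExpDecay 0.99" curves presumably correspond to λ = 0.95 and 0.99.

```python
class ExpDecayPredictor:
    """Failure probability as an exponentially weighted fraction of failures.

    A measurement of age a (in time units) carries weight lam ** a.
    """

    def __init__(self, lam: float):
        if not 0.0 < lam < 1.0:
            raise ValueError("lam must be in (0, 1)")
        self.lam = lam
        self.num = 0.0     # total weight of failed measurements
        self.den = 0.0     # total weight of all measurements
        self.last_t = None

    def observe(self, t: float, failed: bool) -> None:
        if self.last_t is not None:
            # Age every stored weight by the gap since the previous measurement.
            decay = self.lam ** (t - self.last_t)
            self.num *= decay
            self.den *= decay
        self.num += 1.0 if failed else 0.0
        self.den += 1.0
        self.last_t = t

    def probability(self) -> float:
        return self.num / self.den if self.den else 0.0
```

Each update is O(1) in time and memory, which is what makes this predictor so much simpler to deploy than the model-based ones.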
20
Polynomial-Decay Predictors
Here the weight of each measurement decreases polynomially with its age. Exact computation requires maintaining the complete history, so we approximate it.
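The sketch below shows the exact computation and why it needs the whole history. It assumes a weight of (a + 1)^(-α) for age a, with a hypothetical exponent α; the slides give neither the exact weight function nor the approximation used. Unlike λ^a, the ratio between the weights at ages a and a + 1 depends on a, so a new measurement cannot be absorbed by rescaling a running sum, and every prediction is O(n) in the history length.

```python
def poly_decay_probability(history, alpha):
    """Exact polynomial-decay estimate over the complete 0/1 history.

    history[-1] is the newest measurement (age 0); a measurement of age a
    gets the assumed weight (a + 1) ** -alpha.
    """
    num = den = 0.0
    for age, failed in enumerate(reversed(history)):
        w = (age + 1) ** -alpha
        num += w * failed
        den += w
    return num / den if den else 0.0
```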
21
The VW-Cover Predictor
The predictor consists of a list of pairs (a_1, b_1), (a_2, b_2), ..., (a_n, b_n). It predicts a failure if there exists an i such that there are at least b_i failures among the previous a_i measurements.
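The prediction rule is a disjunction of window-count tests. A minimal sketch, assuming the history is a 0/1 list with the newest measurement last; if fewer than a_i measurements exist, this version counts whatever is available (a choice the slides leave open).

```python
def vw_cover_predict(history, pairs):
    """Predict a failure if, for some (a, b), the last a measurements hold >= b failures."""
    if any(a < 1 for a, _ in pairs):
        raise ValueError("window sizes must be >= 1")
    return any(sum(history[-a:]) >= b for a, b in pairs)


pairs = [(1, 1), (10, 4)]  # hypothetical learned pairs
print(vw_cover_predict([0, 0, 1, 1, 0, 1, 1, 0, 0, 0], pairs))  # True
```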
22
VW-Cover Predictor: Building
Build the predictor greedily to cover the failures, using a learning set of measurements:
- Pick (a_1, b_1) to be the pair that maximizes precision.
- Pick (a_i, b_i) to be the pair that maximizes precision among the failures not yet covered.
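A sketch of the greedy construction under stated assumptions: a pair "fires" at time t if the previous a measurements hold at least b failures; each round scores candidates only on times not already predicted by earlier pairs; and construction stops when the best remaining precision drops below a hypothetical `min_precision`, or after `max_pairs` pairs. The candidate set, the stopping rule, and `min_support` (which guards against pairs that fire on a single lucky slot) are illustrative choices, not the authors'.

```python
def build_vw_cover(failures, candidate_pairs, min_precision,
                   max_pairs=10, min_support=1):
    """Greedily select (a, b) pairs on a learning sequence of 0/1 failures."""
    n = len(failures)
    # Prefix sums give the failure count of any window in O(1).
    prefix = [0]
    for f in failures:
        prefix.append(prefix[-1] + f)

    def fire_times(a, b):
        # Times t whose previous a measurements, failures[t-a:t], hold >= b failures.
        return {t for t in range(a, n) if prefix[t] - prefix[t - a] >= b}

    covered = set()  # times already predicted by an earlier pair
    chosen = []
    while len(chosen) < max_pairs:
        best, best_prec, best_new = None, -1.0, set()
        for pair in candidate_pairs:
            if pair in chosen:
                continue
            new = fire_times(*pair) - covered
            if not new or len(new) < min_support:
                continue
            # Precision of this pair restricted to not-yet-covered times.
            prec = sum(failures[t] for t in new) / len(new)
            if prec > best_prec:
                best, best_prec, best_new = pair, prec, new
        if best is None or best_prec < min_precision:
            break
        chosen.append(best)
        covered |= best_new
    return chosen


f = [0, 0, 0, 1, 1, 1] * 3 + [0, 0]
print(build_vw_cover(f, [(1, 1), (2, 2), (3, 1)], min_precision=0.5))  # [(1, 1)]
```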
23
Hidden Markov Model
- A finite set of states S (we use 3 states).
- Output probabilities a_s(0) and a_s(1) for each state s.
- A transition function, which determines the probability distribution of the next state.
The probability of a failure at time t is
$$\Pr[\text{failure at } t] = \sum_{s \in S} p_s(t)\, a_s(1),$$
where p_s(t) is the probability of being in state s at time t. p_s(t) is updated according to the output at time t-1.
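The update of p_s(t) can be sketched as a standard HMM forward (filtering) step: condition the state distribution on the observed output, then push it through the transition matrix. This assumes a_s(0) = 1 - a_s(1) and that the model parameters are already known; the slides do not say how they were trained, and the 3-state parameters in the test are made up.

```python
from typing import List, Sequence


def failure_probability(p: Sequence[float], emit_fail: Sequence[float]) -> float:
    """Pr[failure at t] = sum over s of p_s(t) * a_s(1)."""
    return sum(ps * a1 for ps, a1 in zip(p, emit_fail))


def update_state_distribution(p: Sequence[float], emit_fail: Sequence[float],
                              transition: Sequence[Sequence[float]],
                              observed_failure: bool) -> List[float]:
    """Compute p(t+1) from p(t) after observing the output at time t.

    transition[s][s2] is the probability of moving from state s to state s2.
    """
    # Bayes step: condition the state distribution on the observed output.
    like = [a1 if observed_failure else 1.0 - a1 for a1 in emit_fail]
    post = [ps * l for ps, l in zip(p, like)]
    z = sum(post)
    if z == 0:
        raise ValueError("observation has zero probability under the model")
    post = [x / z for x in post]
    # Prediction step: propagate the posterior through the transition matrix.
    n = len(p)
    return [sum(post[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]
```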
24
Experimental Evaluation
25
Predictor Performance: Level 3
At recall 0.5, precision is close to 0.9.
[Figure: precision-recall curves for FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover and HMM.]
26
Predictor Performance: Level 6
Level-6 degradations are harder to predict: at recall 0.5, precision is about 0.4.
[Figure: precision-recall curves for FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover and HMM.]
27
Predictor Performance: Conclusion
The best predictors at levels 3 and 6 are VW-Cover and HMM, but they only slightly outperform ExpDecay 0.95, which is considerably simpler to implement.
28
Gateway Selection
Degradation rate by selection method:

Level | Static IP-to-gateway | VW-Cover | ExpDecay 0.95 | Optimal | Worst gateway | Best gateway
6     | 0.86%                | 0.49%    | 0.52%         | 0.08%   | 3.29%         | 1.15%
3     | 2.41%                | 1.50%    | 1.56%         | 0.45%   | 5.77%         | 3.45%
29
Gateway Selection: Conclusion
- Active gateway selection resulted in a 50% reduction in the degradation rate relative to the best single gateway.
- Static gateway selection can avoid at most 25% of degradations.
- Again, ExpDecay 0.95 only slightly underperforms the best predictor (VW-Cover).
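A sketch of how active selection can be scored for one destination, assuming that in each slot we route through the gateway with the lowest predicted failure probability (computed only from earlier measurements) and count a degradation if that gateway was degraded. Reading "Optimal" as an oracle that fails only when every gateway fails is an assumption, as are the function names; a full evaluation would also weight destinations by traffic.

```python
def active_selection_rate(failures_by_gw, probs_by_gw):
    """Degradation rate when each slot uses the gateway with the lowest prediction.

    failures_by_gw[g][t]: 1 if gateway g was degraded at slot t.
    probs_by_gw[g][t]: predicted failure probability of g for slot t.
    """
    gateways = list(failures_by_gw)
    n = len(failures_by_gw[gateways[0]])
    failed = 0
    for t in range(n):
        g = min(gateways, key=lambda gw: probs_by_gw[gw][t])
        failed += failures_by_gw[g][t]
    return failed / n


def best_single_gateway_rate(failures_by_gw):
    return min(sum(f) / len(f) for f in failures_by_gw.values())


def optimal_rate(failures_by_gw):
    """Assumed oracle: a slot is degraded only if every gateway is degraded."""
    series = list(failures_by_gw.values())
    n = len(series[0])
    return sum(all(s[t] for s in series) for t in range(n)) / n
```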
30
Performance of gateway selection as a function of recency
31
Correlation Between Coasts
Gateway selection on a same-coast pair resulted in only a 10% reduction, so choose independent gateways.

Level | CA-2 + NJ-2: best predictor | CA-2 + NJ-2: best gateway | NJ-2 + NJ-1: best predictor | NJ-2 + NJ-1: best gateway
6     | 0.54%                       | 1.15%                     | 1.05%                       | 1.15%
3     | 1.78%                       | 3.45%                     | 3.05%                       | 3.45%
32
Controlling Prediction Overhead
Types of measurements:
- Active measurements: initiate probes (SYN, ping, HTTP request). These raise a scalability problem.
- Passive measurements: collected on regular traffic.
Controlling the prediction overhead:
- Use less-recent measurements.
- Send active measurements only to a small set of destinations that cover the majority of traffic.
- Cluster destinations, so that the measurements of one destination can be used to predict another.
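Choosing which destinations to probe actively can be sketched as taking destinations in decreasing order of traffic weight until a target share of traffic is covered; taking the heaviest first yields the fewest destinations for a given coverage. The weights are assumed to come from a proxy log as in the destination-selection slide, and the `coverage` target is a hypothetical parameter.

```python
def destinations_to_probe(weights, coverage=0.8):
    """Fewest destinations (heaviest first) covering `coverage` of total traffic."""
    total = sum(weights.values())
    chosen, acc = [], 0.0
    for dest, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if acc >= coverage * total:
            break
        chosen.append(dest)
        acc += w
    return chosen


print(destinations_to_probe({"a": 50, "b": 30, "c": 15, "d": 5}, coverage=0.75))  # ['a', 'b']
```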
33
Questions?
natali@cs.tau.ac.il, edith@research.att.com, haimk@cs.tau.ac.il, mansour@cs.tau.ac.il