Predicting and Bypassing End-to-End Internet Service Degradation
Anat Bremler-Barr, Edith Cohen, Haim Kaplan, Yishay Mansour
Tel-Aviv University, AT&T Labs, Tel-Aviv University
Talk: Omer Ben-Shalom, Tel-Aviv University

Outline
- Degradation: deviation from the "normal" (minimum) RTT
- Predicting degradation: different predictors
- Performance evaluation: precision/recall methodology
- Suggested application: gateway selection

Motivating Application
- Gateway selection (intelligent routing device)
- Choosing peering links
[Figure: an intelligent routing device choosing among peering links; labels: AS 12, AS 41, AS 56, AS 123]

Data and Measurements: Sources
- Base measurements from 4 different locations (ASes), simulating 4 gateways:
  - California (CA): AT&T (CA1) + ACIRI (CA2)
  - New Jersey (NJ): AT&T (NJ1) + Princeton (NJ2)

Data and Measurements: Destinations
- Obtain a representative set of web servers, with weights derived from a proxy log
[Figure labels: ACIRI (CA2), AT&T (CA1), AT&T (NJ1), Princeton (NJ2)]

Data and Measurements: RTT
- Data: weekly RTT (SYN), end to end (path + server)
  - Hourly measurements: 35,124 servers
  - Once-a-minute weighted sample measurements: 100 servers
[Figure labels: ACIRI (CA2), AT&T (CA1), AT&T (NJ1), Princeton (NJ2)]

Degradation: Definition
- Deviation from the minimum recorded RTT (propagation delay)
- Discrete degradation levels 1-6:
  Time above minimum (ms)   Level
  +50                       1
  +100                      2
  +200                      3
  +400                      4
  +800                      5
  +1600                     6
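
The level table above can be read as a simple threshold lookup. The sketch below is illustrative only: the function name `degradation_level`, the use of level 0 for "no degradation", and the choice to treat each threshold as inclusive are assumptions, not something the slides specify.

```python
from bisect import bisect_right

# Thresholds in ms above the minimum recorded RTT, taken from the slide's table.
LEVEL_THRESHOLDS_MS = [50, 100, 200, 400, 800, 1600]


def degradation_level(rtt_ms: float, min_rtt_ms: float) -> int:
    """Return the highest level (1-6) whose threshold the deviation reaches.

    Returns 0 when the deviation is below the first threshold
    (assumption: the slides do not name a "level 0").
    """
    deviation = rtt_ms - min_rtt_ms
    # bisect_right counts how many thresholds are <= deviation.
    return bisect_right(LEVEL_THRESHOLDS_MS, deviation)


def is_failure(rtt_ms: float, min_rtt_ms: float, level: int) -> bool:
    """Assumed convention: a measurement is a level-l failure if its level is >= l."""
    return degradation_level(rtt_ms, min_rtt_ms) >= level
```

With a minimum RTT of 80 ms, a measurement of 330 ms deviates by 250 ms and so falls in level 3 under these assumptions.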

Objective: Avoiding degradation
- Attempt to reroute through a different gateway
- Two conditions have to hold:
  - We need to be able to predict a failure through a gateway
  - We need a substitute gateway (low correlation between gateways)
[Figure: blackout (consecutive degradation) through one gateway]

Blackout durations
- The longer a blackout lasts, the easier it is to predict
- The majority of blackouts are short: 1-3 consecutive points
- However, a considerable fraction of blackouts last longer
[Figure label: long-duration blackout]
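
A blackout is a run of consecutive degraded measurements, so its duration can be measured as a run length. The following is a minimal sketch, assuming the measurements for one destination through one gateway are already reduced to a 0/1 failure sequence; the function name is hypothetical.

```python
from itertools import groupby


def blackout_durations(failures):
    """Return the length of every run of consecutive failures.

    `failures` is a sequence of 0/1 (or False/True) values in time order.
    """
    return [
        sum(1 for _ in run)                # length of this run
        for is_fail, run in groupby(failures)
        if is_fail                         # keep only runs of failures
    ]
```

For the sequence `[0, 1, 1, 0, 1, 0, 0, 1, 1, 1]` this yields three blackouts of lengths 2, 1 and 3.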

Gateways Correlation
- Gateways are correlated, but the correlation is often not very strong

Gateways Correlation
- Longer blackouts are more likely to be shared (the failure is closer to the server)
- The majority of 2-gateway blackouts involved same-coast pairs

Building predictors
- For a given degradation level l
- Prediction is per IP address
- Input: previous RTT measurements for the IP address
- Output: probability of a failure
- Predict "failure" if the probability > Φ

Precision / Recall Methodology
$$\text{Precision} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Predicted degraded}|}$$
$$\text{Recall} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Actual degraded}|}$$
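
As a concrete reading of the two ratios, here is a small sketch that computes them from aligned sequences of predictions and outcomes. The function name and the convention of returning 1.0 when a denominator is empty are assumptions made for illustration.

```python
def precision_recall(predicted, actual):
    """Compute (precision, recall) for aligned 0/1 sequences.

    predicted[i] is truthy if a failure was predicted at point i;
    actual[i] is truthy if the point was actually degraded.
    """
    both = sum(1 for p, a in zip(predicted, actual) if p and a)
    n_predicted = sum(1 for p in predicted if p)
    n_actual = sum(1 for a in actual if a)
    # Assumed convention: an empty denominator gives a score of 1.0.
    precision = both / n_predicted if n_predicted else 1.0
    recall = both / n_actual if n_actual else 1.0
    return precision, recall
```

With predictions `[1, 1, 0, 0, 1]` and outcomes `[1, 0, 0, 1, 1]`, two of the three predicted failures are real and two of the three real failures are predicted, so both values are 2/3.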

Precision-recall curve
- Sweep the threshold Φ over [0,1] to obtain a precision-recall curve.
- In other words, let P(t) be the predicted failure probability at time t; a failure is predicted at time t if P(t) > Φ.
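
The sweep can be sketched as follows. This is an illustrative helper, not the authors' evaluation code: the function name, the explicit list of thresholds, and the decision to skip thresholds at which nothing is predicted are all assumptions.

```python
def precision_recall_curve(probs, actual, thresholds):
    """Return a list of (threshold, precision, recall) points.

    probs[t] is the predicted failure probability P(t);
    actual[t] is truthy if time t was actually degraded.
    """
    n_actual = sum(1 for a in actual if a)
    points = []
    for phi in thresholds:
        predicted = [p > phi for p in probs]   # predict "failure" if P(t) > phi
        n_predicted = sum(predicted)
        if n_predicted == 0 or n_actual == 0:
            continue                           # precision or recall undefined here
        both = sum(1 for p, a in zip(predicted, actual) if p and a)
        points.append((phi, both / n_predicted, both / n_actual))
    return points
```

Raising the threshold generally trades recall for precision, which is why a single predictor traces out a curve rather than a single point.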

What is important for prediction?
- Recency principle: more recent RTTs are more important.
- Quantity principle: the more measurements, the higher the accuracy.

Recency Principle: Importance
- Test case: single-measurement predictor, which predicts according to the measurement taken x minutes ago
  - Observe the change in the quality of the prediction
- About 15% difference between using the measurement from 1 minute ago and the one from 15 minutes ago
  Minutes ago   NJ-1, failure level 3,   NJ-2, failure level 6,
                recall (= precision)     recall (= precision)
  1             0.52                     0.33
  2             0.49                     0.31
  4             0.48                     0.29
  7             0.46                     0.28
  10            0.45                     0.27
  15            0.44                     0.26
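
The single-measurement predictor is simple enough to sketch directly: predict a failure at time t exactly when the measurement `lag` steps earlier was a failure. The function name and the 0/1 input format are assumptions; the sketch reports precision only.

```python
def single_measurement_precision(failures, lag):
    """Precision of predicting failures[t] from failures[t - lag].

    `failures` is a 0/1 sequence sampled at a fixed interval
    (one sample per minute in the slide's setting).
    """
    pairs = [(failures[t - lag], failures[t]) for t in range(lag, len(failures))]
    n_predicted = sum(1 for p, _ in pairs if p)
    hits = sum(1 for p, a in pairs if p and a)
    return hits / n_predicted if n_predicted else 0.0
```

On a long sequence the number of predicted failures is almost equal to the number of actual failures, which is consistent with the slide reporting a single value for recall and precision.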

Quantity Principle: Importance
- Test case: Fixed-Window-Count (FWC), where the prediction is the fraction of failures in the W most recent measurements
- With more measurements we can achieve better precision at high recall
[Figure legend: FWC 1, FWC 5, FWC 10, FWC 50]
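
A minimal sketch of a Fixed-Window-Count predictor follows. The class name and the handling of the warm-up period (dividing by the number of measurements seen so far when fewer than W are available) are assumptions.

```python
from collections import deque


class FixedWindowCount:
    """Predict the failure probability as the failure fraction in the last W samples."""

    def __init__(self, window: int):
        # deque with maxlen drops the oldest sample automatically.
        self.history = deque(maxlen=window)

    def update(self, failed: bool) -> None:
        self.history.append(1 if failed else 0)

    def predict(self) -> float:
        if not self.history:
            return 0.0  # assumed default before any measurement arrives
        return sum(self.history) / len(self.history)
```

With W = 1 this reduces to the single-measurement predictor; larger windows give more distinct probability values, so the threshold Φ has more operating points to choose from.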

Our predictors
- Exponential decay
- Polynomial decay
- Model-based predictors:
  - VW-Cover: Variable Window Cover algorithm
  - HMM: Hidden Markov Model

Exponential-decay predictors
- The weight of each measurement decreases exponentially with its age, by a factor λ.
- The binary variable f_t represents a failure at time t.
- For consecutive measurements: [equation missing from the transcript]
- In general: [equation missing from the transcript]
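
One common way to realize exponentially decaying weights for evenly spaced measurements is the recurrence P_t = λ·P_{t-1} + (1 − λ)·f_t. The slide's own formulas did not survive in this transcript, so the recurrence, the normalization by (1 − λ), and the class name below are assumptions rather than the authors' exact definition.

```python
class ExpDecayPredictor:
    """Exponentially weighted failure estimate (assumed form, see lead-in)."""

    def __init__(self, lam: float = 0.95):
        if not 0.0 <= lam < 1.0:
            raise ValueError("lam must be in [0, 1)")
        self.lam = lam
        self.estimate = 0.0  # assumed starting value before any measurement

    def update(self, failed: bool) -> float:
        f_t = 1.0 if failed else 0.0
        # A measurement of age k ends up with weight (1 - lam) * lam**k.
        self.estimate = self.lam * self.estimate + (1.0 - self.lam) * f_t
        return self.estimate
```

The appeal of this form is that it needs only one number of state per destination, which fits the later remark that ExpDecay 0.95 is considerably simpler to implement than the model-based predictors.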

Polynomial-decay predictors
- Exact computation requires maintaining the complete history.
- We approximate it.

The VW-Cover predictor
- Consists of a list of pairs (a_1, b_1), (a_2, b_2), ..., (a_n, b_n)
- Predict a failure if there exists an i such that at least b_i of the previous a_i measurements are failures
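
The prediction rule can be written in a few lines. The function name and the list-of-0/1 history format are assumptions; the rule itself follows the slide.

```python
def vw_cover_predict(pairs, history):
    """Return True if any pair (a, b) sees at least b failures in the last a samples.

    `pairs` is a list of (a, b) with 1 <= b <= a;
    `history` is a 0/1 list of past measurements, most recent last.
    """
    for a, b in pairs:
        if a < 1:
            raise ValueError("window size a must be at least 1")
        if sum(history[-a:]) >= b:   # failures among the previous a measurements
            return True
    return False
```

For example, with pairs `[(1, 1), (5, 2)]` the predictor fires either when the latest measurement failed or when at least two of the last five did.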

VW-Cover predictor: Building
- Build the predictor greedily to cover the failures, using a learning set of measurements:
  - Pick (a_1, b_1) to be the pair that maximizes precision
  - Pick (a_i, b_i) to be the pair that maximizes precision among the uncovered failures
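
The greedy construction might look like the sketch below. Several details are assumptions, since the slide gives only the outline: the candidate set (all pairs up to `max_window`), the stopping rule (`min_precision`), the tie-break by number of covered failures, and the choice to evaluate each new pair only on time points not already flagged by earlier pairs.

```python
def build_vw_cover(failures, max_window=5, min_precision=0.5):
    """Greedily pick (a, b) pairs from a 0/1 learning sequence."""
    n = len(failures)
    prefix = [0]                      # prefix[t] = failures among the first t samples
    for f in failures:
        prefix.append(prefix[-1] + f)

    def fires(a, b, t):
        # At least b failures among the (up to) a measurements before time t.
        return prefix[t] - prefix[max(0, t - a)] >= b

    candidates = [(a, b) for a in range(1, max_window + 1) for b in range(1, a + 1)]
    uncovered = set(range(1, n))      # time points not flagged by any chosen pair
    pairs = []
    while True:
        best = None
        for a, b in candidates:
            flagged = [t for t in uncovered if fires(a, b, t)]
            hits = sum(failures[t] for t in flagged)
            if hits == 0:
                continue              # pair covers no remaining failure
            key = (hits / len(flagged), hits)   # precision, then coverage
            if best is None or key > best[0]:
                best = (key, (a, b), flagged)
        if best is None or best[0][0] < min_precision:
            return pairs
        pairs.append(best[1])
        uncovered -= set(best[2])     # these points are now covered
```

Each round removes at least one time point from `uncovered`, so the loop terminates. The returned list can be passed to any routine that applies the rule "predict a failure if some pair (a_i, b_i) is satisfied".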

Hidden Markov Model
- Finite set of states S (we use 3 states)
- Output probabilities a_s(0), a_s(1) for each state s
- Transition function: determines the probability distribution of the next state
- The probability of a failure: [equation missing from the transcript], where p_s(t) is the probability of being in state s at time t
- p_s(t) is updated according to the output at time t-1
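
A standard way to maintain p_s(t) is forward filtering: condition the state distribution on the latest output, then apply the transition probabilities. The sketch below assumes that output 1 means "failure", that the failure probability is the sum over states of p_s(t)·a_s(1), and that the model parameters are already known; how the authors trained them is not stated on the slide. All names are hypothetical.

```python
class HMMPredictor:
    """Forward-filtering failure predictor for a small HMM (illustrative sketch)."""

    def __init__(self, transition, fail_prob, initial):
        self.transition = transition      # transition[i][j] = P(next = j | current = i)
        self.fail_prob = fail_prob        # fail_prob[s] = a_s(1), assumed P(failure | s)
        self.state_probs = list(initial)  # p_s(t) for the upcoming time step

    def predict(self) -> float:
        # Assumed form of the missing formula: sum over s of p_s(t) * a_s(1).
        return sum(p * a for p, a in zip(self.state_probs, self.fail_prob))

    def update(self, failed: bool) -> None:
        # 1. Condition on the observed output at the current time step.
        like = [a if failed else 1.0 - a for a in self.fail_prob]
        post = [p * l for p, l in zip(self.state_probs, like)]
        total = sum(post)
        if total > 0.0:
            post = [x / total for x in post]
        else:
            post = list(self.state_probs)  # observation impossible under the model
        # 2. Advance one step with the transition probabilities.
        n = len(post)
        self.state_probs = [
            sum(post[i] * self.transition[i][j] for i in range(n)) for j in range(n)
        ]
```

The example uses two states for brevity; a three-state model as on the slide only changes the sizes of `transition`, `fail_prob` and `initial`.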

Experimental Evaluation

Predictor Performance: Level 3
- At recall 0.5, precision is close to 0.9
[Figure legend: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]

Predictor Performance: Level 6
- Level-6 degradations are harder to predict: at recall 0.5, precision is about 0.4
[Figure legend: FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, HMM]

Predictor Performance: Conclusion
- The best predictors at levels 3 and 6 are VW-Cover and HMM
- But they only slightly outperform ExpDecay 0.95, which is considerably simpler to implement

Gateway Selection
  Method                 Level 6   Level 3
  Static: IP → Gateway   0.86%     2.41%
  VW-Cover               0.49%     1.50%
  ExpDecay 0.95          0.52%     1.56%
  Optimal                0.08%     0.45%
  Worst gateway          3.29%     5.77%
  Best gateway           1.15%     3.45%

Gateway Selection: Conclusion
- Active gateway selection resulted in a reduction of about 50% in the degradation rate relative to the best single gateway.
- Static gateway selection can avoid at most 25% of degradations.
- Again, ExpDecay 0.95 only slightly underperforms the best predictor (VW-Cover).
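
Active gateway selection can be sketched as "route through the gateway with the lowest predicted failure probability". The slides do not spell out the selection rule, so the function below, including its name and the choice to stay on a default gateway when no other gateway is strictly better, is an assumption.

```python
def select_gateway(predictions, default):
    """Pick a gateway for one destination.

    `predictions` maps gateway name -> predicted failure probability
    for this destination; `default` is the currently preferred gateway.
    """
    best = min(predictions, key=predictions.get)
    # Switch only when another gateway is strictly better than the default.
    if predictions[best] < predictions.get(default, float("inf")):
        return best
    return default
```

For example, `select_gateway({"CA-1": 0.4, "NJ-1": 0.1}, default="CA-1")` returns `"NJ-1"`, while equal predictions keep traffic on the default gateway.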

Performance of gateway selection as a function of recency

Correlation between coasts
- Gateway selection on a same-coast pair resulted in only about a 10% reduction.
- Choose independent gateways.
  Level   CA-2 + NJ-2                     NJ-2 + NJ-1
          Best predictor   Best gateway   Best predictor   Best gateway
  6       0.54%            1.15%          1.05%            1.15%
  3       1.78%            3.45%          3.05%            3.45%

Controlling prediction overhead
- Types of measurements:
  - Active measurements: initiate probes (SYN, ping, HTTP request). Scalability problem.
  - Passive measurements: collected from regular traffic
- Controlling the prediction overhead:
  - Use less-recent measurements
  - Send active measurements only to a small set of destinations that covers the majority of traffic
  - Cluster destinations: the measurements of one destination can be used to predict another
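
Restricting active probes to a small set of destinations that covers most traffic can be sketched as a greedy pick by weight. The function name, the weight dictionary, and the 90% default target are illustrative assumptions; the slides do not give a coverage figure.

```python
def destinations_to_probe(weights, coverage=0.9):
    """Return the heaviest destinations whose weights sum to the coverage target.

    `weights` maps destination -> traffic weight (for example, request
    counts from a proxy log); `coverage` is the target fraction of traffic.
    """
    total = sum(weights.values())
    chosen, covered = [], 0.0
    for dest, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break                     # target reached, stop adding destinations
        chosen.append(dest)
        covered += w
    return chosen
```

With weights `{"a": 50, "b": 30, "c": 15, "d": 5}` and a 90% target, the three heaviest destinations are probed and the lightest one is left to passive measurements.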

Questions?
natali@cs.tau.ac.il
edith@research.att.com
haimk@cs.tau.ac.il
mansour@cs.tau.ac.il
