
1 Enabling Flow-level Latency Measurements across Routers in Data Centers Parmjeet Singh, Myungjin Lee, Sagar Kumar, Ramana Rao Kompella

2 Latency-critical applications in data centers
 Guaranteeing low end-to-end latency is important
 Web search (e.g., Google's instant search service)
 Retail advertising
 Recommendation systems
 High-frequency trading in financial data centers
 Operators want to troubleshoot latency anomalies
 End-host latencies can be monitored locally
 Detection, diagnosis, and localization across the network are hard: routers/switches have no native support for latency measurements

3 Prior solutions
 Lossy Difference Aggregator (LDA)
 Kompella et al. [SIGCOMM '09]
 Aggregate latency statistics
 Reference Latency Interpolation (RLI)
 Lee et al. [SIGCOMM '10]
 Per-flow latency measurements
 RLI is more suitable because its measurements are finer-grained

4 Deployment scenario of RLI
 Upgrading all switches/routers in a data center network
 Pros
 Provides the finest granularity of latency anomaly localization
 Cons
 Significant deployment cost
 Possible downtime of entire production data centers
 In this work, we consider partial deployment of RLI
 Our approach: RLI across Routers (RLIR)

5 Overview of RLI architecture
 Goal
 Latency statistics on a per-flow basis between interfaces
 Problem setting
 Timestamps cannot be stored for every packet at ingress and egress due to high storage and communication cost
 Regular packets do not carry timestamps
[Figure: a router with ingress interface I and egress interface E]

6 Overview of RLI architecture
 Premise of RLI: delay locality (packets that are close in time experience similar delays)
 Approach
1) The injector sends reference packets at regular intervals
2) Each reference packet carries its ingress timestamp
3) Linear interpolation: compute per-packet latency estimates at the latency estimator
4) Per-flow estimates are obtained by aggregating per-packet estimates
[Figure: reference packet injector at ingress I, latency estimator at egress E; the estimator draws a linear interpolation line between the delays of consecutive reference packets and reads off interpolated delays for regular packets]
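Steps 3 and 4 above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names and the assumption that reference packets bracket every regular packet are mine.

```python
import bisect

def interpolate_delay(t, ref_before, ref_after):
    # Each reference packet is (arrival_time, measured_delay); the delay of a
    # regular packet arriving at time t is read off the straight line between
    # the two surrounding reference packets (linear interpolation).
    (t1, d1), (t2, d2) = ref_before, ref_after
    if t2 == t1:
        return d1
    alpha = (t - t1) / (t2 - t1)
    return d1 + alpha * (d2 - d1)

def per_flow_estimate(packet_times, refs):
    # Aggregate per-packet estimates into a per-flow mean delay,
    # assuming refs is sorted by arrival time and brackets every packet.
    times = [r[0] for r in refs]
    total = 0.0
    for t in packet_times:
        i = bisect.bisect_right(times, t)
        i = min(max(i, 1), len(refs) - 1)
        total += interpolate_delay(t, refs[i - 1], refs[i])
    return total / len(packet_times)
```

For example, a packet arriving halfway between a reference packet that saw 10 us of delay and one that saw 20 us is estimated at 15 us.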

7 Full vs. partial deployment
 Full deployment: 16 RLI sender-receiver pairs
 Partial deployment: 4 RLI senders + 2 RLI receivers
 81.25% deployment cost reduction (6 deployed modules instead of 32)
[Figure: six-switch topology with RLI senders (reference packet injectors) and RLI receivers (latency estimators) marked on Switches 1-6]
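One way to arrive at the 81.25% figure, assuming deployment cost is proportional to the number of deployed sender/receiver modules:

```python
# 16 sender-receiver pairs -> 32 modules in full deployment;
# the partial deployment uses only 4 senders + 2 receivers.
full = 16 * 2
partial = 4 + 2
reduction = 1 - partial / full
print(f"{reduction:.2%}")  # 81.25%
```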

8 Case 1: Presence of cross traffic
 Issue: inaccurate link utilization estimation at the sender leads to a high reference packet injection rate
 Approach
 We do not actively address this issue
 Evaluation shows little impact on the packet loss rate
 Details in the paper
[Figure: cross traffic sharing the bottleneck link in the six-switch topology; link utilization is estimated on Switch 1]

9 Case 2: RLI sender side
 Issue: traffic may take different routes at an intermediate switch
 Approach: the sender sends reference packets to all receivers
[Figure: six-switch topology; a sender's reference packets fan out to both RLI receivers]

10 Case 3: RLI receiver side
 Issue: it is hard to associate reference packets with the regular packets that traversed the same path
 Approaches
 Packet marking: requires native support from routers
 Reverse ECMP computation: 'reverse'-engineer intermediate routes using the ECMP hash function
 IP prefix matching: applicable only in limited situations
[Figure: six-switch topology; the receiver must match regular packets to reference packets from the correct sender-side path]
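The reverse ECMP idea can be sketched as follows. The hash function here is a hypothetical stand-in: real switches use vendor-specific ECMP hashes, which the operator would need to know; the function names are mine.

```python
import zlib

def ecmp_choice(five_tuple, num_links):
    # Hypothetical ECMP hash: pick an outgoing link by hashing the flow
    # 5-tuple. CRC32 stands in for the switch's vendor-specific function.
    key = "|".join(map(str, five_tuple)).encode()
    return zlib.crc32(key) % num_links

def infer_sender_path(five_tuple, num_uplinks):
    # 'Reverse' ECMP computation: the RLI receiver replays the hash decision
    # that the intermediate switch applied to this flow, learning which
    # sender-side link (and hence which stream of reference packets) the
    # flow's regular packets shared.
    return ecmp_choice(five_tuple, num_uplinks)
```

Because ECMP hashing is deterministic per flow, replaying the same hash on the same 5-tuple always yields the same link choice, which is what makes the association possible without marking packets.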

11 Deployment example in fat-tree topology
[Figure: fat-tree topology with RLI senders (reference packet injectors) and RLI receivers (latency estimators); path-association labels read "IP prefix matching" and "Reverse ECMP computation / IP prefix matching"]

12 Evaluation
 Simulation setup
 Trace: regular traffic (22.4M packets) + cross traffic (70M packets)
 Simulator
 Results
 Accuracy of per-flow latency estimates
[Figure: testbed with a packet trace fed through a traffic divider into Switch 1 and Switch 2, a cross traffic injector, an RLI sender, and an RLI receiver; reference packets injected at a 10% or 1% rate]

13 Accuracy of per-flow latency estimates
[Figure: CDF of the relative error of per-flow latency estimates at 10% and 1% reference packet injection rates, with bottleneck link utilization at 93%; annotated values in the plot include 67%, 1.2%, 4.5%, 18%, and 31%]

14 Summary
 Low-latency applications in data centers
 Localization of latency anomalies is important
 RLI provides flow-level latency statistics, but full deployment (i.e., on all routers/switches) is expensive
 Proposed a solution enabling partial deployment of RLI
 Little loss in localization granularity (i.e., localization to every other router)

15 Thank you! Questions?

