1
Fine-Grained Latency and Loss Measurements in the Presence of Reordering
Myungjin Lee, Sharon Goldberg, Ramana Rao Kompella, George Varghese
2
Trend toward low-latency networks
- Low latency is one of the most important metrics in network design
- Switch vendors are introducing switches that provide very low latency
- Financial data centers are beginning to demand more stringent latency guarantees
3
Low latency
Benefits of low-latency networks:
- An automated trading program can buy shares more cheaply
- A cluster application can run 1000s more instructions
[Figure: a financial service provider network connects a content provider and a brokerage, advertising an end-to-end latency SLA of a few microseconds]
4
But…
Guaranteeing low latency in data centers is hard: congestion must stay below a certain level.
- Reason 1: There are no traffic models for different applications, which keeps managers from predicting which applications will cause trouble
- Reason 2: A new application's behavior is often unforeseen until it is actually deployed (e.g., the TCP incast problem [SIGCOMM '09])
5
Latency & loss measurements are crucial
- Latency and loss must be measured on a continuous basis to detect problems
- Fixes include re-routing the offending application, upgrading links, etc.
- Goal: provide fine-grained end-to-end aggregate latency and loss measurements in data center environments
[Figure: routers A and B measure end-to-end latency and loss between a content provider and a brokerage]
6
Measurement model
- Out-of-order packet delivery occurs due to multiple paths
- Packet filtering associates the packet stream between A and B
- Time synchronization: IEEE 1588, GPS clocks, etc.
- No header changes: regular packets carry no timestamp
[Figure: multiple paths through the financial service provider network cause out-of-order delivery between routers A and B; a filter selects the measured stream]
7
Measurement model
- Interval message: a special 'sync' control packet that marks off a measurement interval, injected by measurement modules at an edge (e.g., Router A)
- Measurement interval: the set of packets 'bookended' by a pair of interval messages
[Figure: interval messages sent from Router A to Router B delimit a measurement interval]
8
Existing solutions
- Active probes. Problem: not effective due to the huge probe rate required
- Storing timestamps and packet digests locally. Problem: significant communication overhead
- Packet sampling: trades off accuracy against overhead
- Lossy Difference Aggregator (LDA) [Kompella, SIGCOMM '09]: the state-of-the-art solution, which assumes FIFO packet delivery. Problem: not suitable when packets can be reordered
9
LDA in the packet-loss case
- Key point: only useful buckets may be used for estimation
- A useful bucket is one updated by the same set of packets at A and B
- Bad packets: lost packets that corrupt buckets
[Figure: Routers A and B hash packets into buckets, each holding a packet count and a timestamp sum; a lost packet corrupts one bucket, which is excluded because its counts differ. True delay = ((3 - 1) + (11 - 7) + (9 - 5)) / 3 = 3.3; estimated delay = (12 - 6) / 2 = 3; estimation error = 9%]
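To make the bucket mechanics concrete, here is a minimal Python sketch of an LDA-style estimator. The names and the trivial hash are illustrative assumptions, not the paper's implementation: each router keeps per-bucket packet counts and timestamp sums, and only buckets whose counts match at both ends contribute to the delay estimate.

```python
BUCKETS = 4

def update(state, pkt_id, ts):
    # A real LDA uses a pseudorandom hash agreed on by both routers;
    # pkt_id % BUCKETS keeps this toy example deterministic.
    b = pkt_id % BUCKETS
    cnt, tsum = state[b]
    state[b] = (cnt + 1, tsum + ts)

def estimate_delay(sender, receiver):
    """Average delay over useful buckets: packet counts match at A and B."""
    dsum = n = 0
    for (cs, ts), (cr, tr) in zip(sender, receiver):
        if cs == cr and cs > 0:          # bucket untouched by loss
            dsum += tr - ts
            n += cs
    return dsum / n

sender = [(0, 0)] * BUCKETS
receiver = [(0, 0)] * BUCKETS
for pkt in range(4):
    update(sender, pkt, pkt)             # packet pkt leaves A at time pkt
    if pkt != 3:                         # packet 3 is lost in transit
        update(receiver, pkt, pkt + 3)   # survivors arrive 3 time units later

avg = estimate_delay(sender, receiver)   # 3.0: the corrupted bucket is excluded
```

The lost packet leaves a count mismatch in its bucket, so the corrupted timestamp sum never enters the average.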
10
LDA in the packet-loss + reordering case
- Problem: LDA confounds loss and reordering
- Matching packet counts between A and B is insufficient: reordered packets are also bad packets
- Result: significant error in both loss and aggregate latency estimation
[Figure: buckets are frozen after the interval message, so a reordered packet corrupts buckets whose counts still match. True delay = 3.3; estimated delay = (12 + 24 - 6 - 9) / 4 = 5.25; estimation error = 59%]
11
Quick fix for LDA: per-path LDA
- Let LDA operate on a per-path basis, exploiting the fact that packets within a flow are not reordered by ECMP
- Issue 1: Associating a flow with a path is difficult
- Issue 2: Not scalable, since potentially millions of separate TCP flows must be handled
12
Packet reordering in IP networks
- Today's trend: no reordering among packets within a flow, and no reordering across flows between two interfaces
- New trend: data centers exploit path diversity; ECMP splits flows across multiple equal-cost paths, so reordering can occur across flows
- Future direction: switches may allow reordering within switches for improved load balancing and utilization, paired with reordering-tolerant TCP for use in data centers
13
Proposed approach: FineComb
Objective: detect and correct unusable buckets, and control the number of unusable buckets
Key ideas:
1) Incremental stream digests: detect unusable buckets
2) Stash recovery: correct corrupted buckets to make them useful again
3) Packet sampling: control the number of bad packets included
14
Incremental stream digests (ISDs)
- An ISD = H(pkt_1) ⊕ H(pkt_2) ⊕ … ⊕ H(pkt_k), where ⊕ is an invertible commutative operator (e.g., XOR)
- Property 1: low collision probability. Two different packet streams hash to different values with high probability, which allows corrupted buckets to be detected
- Property 2: invertibility. A packet digest can easily be added to or subtracted from an ISD, which is the basis of stash recovery
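Both properties can be illustrated with a small sketch, assuming XOR as the ⊕ operator and a truncated SHA-256 as the per-packet hash (the particular hash is an assumption for illustration):

```python
import hashlib

def digest(pkt: bytes) -> int:
    # Per-packet digest; truncated SHA-256 stands in for whatever
    # strong hash an implementation would actually use.
    return int.from_bytes(hashlib.sha256(pkt).digest()[:8], "big")

def isd_add(isd: int, pkt: bytes) -> int:
    return isd ^ digest(pkt)    # XOR is commutative and its own inverse

# Commutativity: the same set of packets yields the same ISD in any order,
# so reordering confined to one bucket is harmless.
a = isd_add(isd_add(0, b"p1"), b"p2")
b = isd_add(isd_add(0, b"p2"), b"p1")
assert a == b

# Invertibility: XOR-ing a digest out removes that packet from the ISD,
# which is the operation stash recovery relies on.
assert isd_add(a, b"p2") == digest(b"p1")
```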
15
ISDs handle loss and reordering
- ISDs detect buckets corrupted by loss or reordering
- A bucket is usable only if both its packet count and its ISD match between A and B
[Figure: each bucket now holds a packet count, a timestamp sum, and an ISD; under reordering the counts still match but the ISDs differ, exposing the corrupted bucket. True delay = 3.3]
16
Latency and loss estimation
- Average latency: sum the differences between receiver and sender timestamp sums over usable buckets, then divide by their packet count; in the example, delay sum = (12 - 6) + (0 - 0) = 6 over 2 packets, so average latency = 3.0
- Loss: sum the packet-count differences over all buckets and divide by the total number of packets sent; in the example, 3 of 7 packets are counted as lost, so loss rate = 0.43
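A sketch of the two estimators together, assuming per-bucket (packet count, timestamp sum, ISD) triples; the bucket values below are an invented toy interval, not the slide's example:

```python
def estimate(sender, receiver):
    """sender/receiver: per-bucket (packet count, timestamp sum, ISD) triples.

    Latency is averaged only over usable buckets (count AND ISD both match);
    loss sums count differences over all buckets, over packets sent."""
    dsum = usable = lost = total = 0
    for (cs, ts, hs), (cr, tr, hr) in zip(sender, receiver):
        total += cs
        if cs == cr and hs == hr:
            dsum += tr - ts          # both ends saw exactly the same packets
            usable += cs
        else:
            lost += cs - cr          # count difference approximates loss
    return dsum / usable, lost / total

# Toy interval: bucket 0 is usable, bucket 1 has an ISD mismatch (reordering),
# bucket 2 lost two packets.
sender   = [(2,  6, 0x09), (2, 9, 0x2E), (3, 5, 0x13)]
receiver = [(2, 12, 0x09), (2, 9, 0x3A), (1, 4, 0x07)]
latency, loss_rate = estimate(sender, receiver)   # 3.0 and 2/7
```

Note that the bucket with the ISD mismatch contributes to neither estimate; stash recovery exists precisely to reclaim such buckets.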
17
Stash recovery
- Stash: a set of (timestamp, bucket index, hash value) tuples for packets that are potentially reordered
- (-) stash: packets potentially added at the receiver (Router B); during recovery, their digests are subtracted from bad buckets at the receiver
- (+) stash: packets potentially missing at the receiver (Router B); during recovery, their digests are added to bad buckets at the receiver
18
Stash recovery
- A bad bucket can be recovered iff reordered packets (rather than losses) corrupted it
- Recovered reordered packets are not counted as lost packets, which increases loss-estimation accuracy
[Figure: all subsets of the (-) stash at B are tried against a bad bucket; subtracting the right subset makes its ISD match the corresponding bucket at A]
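The subset search over the (-) stash can be sketched as follows. This is a brute-force illustration under assumed data layouts, not the paper's optimized procedure: subtract each candidate subset of stashed digests from the bad bucket at B and accept the first adjustment that makes both the count and the ISD match A's bucket.

```python
from itertools import combinations

def recover(sender_bucket, receiver_bucket, minus_stash):
    """Buckets are (count, timestamp sum, ISD); the (-) stash holds
    (timestamp, digest) pairs of packets potentially added at B.

    Returns the corrected receiver bucket, or None if no subset of
    stashed digests explains the mismatch (i.e., loss corrupted it)."""
    cs, ts, hs = sender_bucket
    cr, tr, hr = receiver_bucket
    for k in range(len(minus_stash) + 1):
        for subset in combinations(minus_stash, k):
            cnt, tsum, isd = cr - len(subset), tr, hr
            for stamp, dig in subset:
                isd ^= dig               # XOR the extra packet back out
                tsum -= stamp            # and undo its timestamp contribution
            if cnt == cs and isd == hs:
                return (cnt, tsum, isd)  # bucket recovered: reordering only
    return None

# One reordered packet (digest 0x14, timestamp 15) corrupted B's bucket:
fixed = recover((2, 9, 0x2E), (3, 24, 0x3A), [(15, 0x14), (10, 0x21)])
```

Trying subsets in increasing size keeps the common case (one stray packet) cheap; the stash is sized so this search stays small.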
19
Sizing buckets and stashes
- Known loss and reordering rates: given a fixed storage size, we obtain the optimal packet sampling rate (p*) and provision the stash and buckets based on p*
- Unknown loss and reordering rates: use multiple banks, each optimized for a different pair of loss and reordering rates
- Details can be found in our paper
20
Accuracy of latency estimation
FineComb: ISD + stash; FineComb-: ISD only
[Figure: average relative error vs. reordering rate, showing up to a 1000x difference. Packet loss rate = 0.01%, #packets = 5M, true mean delay = 10 μs]
21
Accuracy of loss estimation
The stash helps obtain accurate loss estimates.
[Figure: average relative error vs. reordering rate. Packet loss rate = 0.01%, #packets = 5M]
22
Summary
- Data centers require fine-grained end-to-end latency and loss measurements
- We proposed a data structure called FineComb that is resilient to packet loss and reordering
- An incremental stream digest detects corrupted buckets; the stash recovers buckets corrupted only by reordered packets
- Evaluation shows FineComb achieves higher accuracy in latency and loss estimation than LDA
23
Thank you! Questions?
24
Backup
25
Microscopic loss estimation
[Figure: average relative error vs. reordering rate]
26
Handling unknown loss & reordering rates
[Figure: average relative error vs. reordering rate. LDA uses 2 banks; FineComb uses 4 banks with the same memory size]