AMP: A Better Multipath TCP for Data Center Networks Morteza Kheirkhah University of Edinburgh UK January 2017
Two Key Characteristics of Data Center Networks (DCNs). Diverse applications with diverse communication patterns and requirements: some apps are bandwidth-hungry (online file storage); other apps are latency-sensitive (online search). Short-flow dominance: the majority of flows are short-lived and have deadlines on flow completion time (FCT), while the majority of the data volume comes from a few long flows. Data centers exhibit a highly dynamic network.
Two Types of Network Congestion. Persistent congestion: two or more long flows collide on a link (due to poor load-splitting by ECMP routing), which results in low overall network throughput. Transient congestion: many short flows collide on a link, which causes high queuing delays and a high packet drop probability, so latency-sensitive short flows miss their deadlines.
Existing Approaches. Transient congestion (goal: low latency for short flows): DCTCP (SIGCOMM ’10), D2TCP (SIGCOMM ’12). Persistent congestion (goal: high throughput for long flows): MPTCP (SIGCOMM ’11), Hedera (NSDI ’10), XMP (CoNEXT ’13). XMP attempts to strike a good balance in the latency-throughput trade-off.
Problems with XMP. TCP Incast: XMP is not robust against the TCP incast problem. More subflows -> more packets -> buffers overflow easily -> higher chance of experiencing a retransmission timeout. Last Hop Unfairness (LHU): XMP does not preserve network fairness when all its subflows compete with single-path flows at a shared link. More subflows -> more throughput. These problems are not specific to XMP; they exist in any ECN-based variant of MPTCP.
Incast (showcase). Setup: multipath protocols use 4 subflows; flow size is 128 KB; link rate is 10 Gbps; link delay is 5 us (RTT 20 us); switch buffer size is 100 packets. The y-axis is log-scaled. Data Center MultiPath (DCM) is another ECN-capable MPTCP variant that we also propose in this paper; the idea is to combine MPTCP with DCTCP. Multipath protocols take 1-2 orders of magnitude longer than DCTCP to complete their flows.
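To put the setup numbers in perspective, the following is a rough back-of-the-envelope sketch, not the simulation behind the slide. The packet size, MSS, reading of KB as 1024 bytes, number of senders, initial window, and the overflow rule (a first-RTT burst larger than buffer plus bandwidth-delay product) are all assumptions made for illustration.

```python
import math

# Values taken from the slide
LINK_RATE_BPS = 10e9      # 10 Gbps
RTT_S = 20e-6             # 20 us
BUFFER_PKTS = 100         # switch buffer, in packets
FLOW_BYTES = 128 * 1024   # 128 KB (assumption: KB = 1024 bytes)

# Assumptions, not stated on the slide
PKT_BYTES = 1500          # packet size on the wire
MSS_BYTES = 1460          # payload per packet

# Bandwidth-delay product: how many packets the path itself can hold
bdp_pkts = LINK_RATE_BPS * RTT_S / 8 / PKT_BYTES

# Number of packets in one 128 KB flow
flow_pkts = math.ceil(FLOW_BYTES / MSS_BYTES)


def first_rtt_burst(n_senders, subflows, init_cwnd):
    """Packets injected in the first RTT when all senders start together."""
    return n_senders * subflows * init_cwnd


def overflows(burst_pkts):
    """Crude rule: a burst beyond buffer + BDP must drop packets."""
    return burst_pkts > BUFFER_PKTS + bdp_pkts


# Hypothetical scenario: 10 synchronized senders, initial window of 10 packets
single_path = first_rtt_burst(n_senders=10, subflows=1, init_cwnd=10)
multipath = first_rtt_burst(n_senders=10, subflows=4, init_cwnd=10)
```

Under these assumed numbers the single-path burst fits within buffer plus BDP while the 4-subflow burst does not, which matches the "more subflows -> more packets -> buffer overflow" argument in spirit only.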
Last Hop Unfairness (example). Let’s assume: the propagation delay is zero; the marking threshold (K) at switches is set to 4 packets (K=4); the minimum congestion window size is set to one packet (cwnd_min=1). Normal situation: two single-path flows share the link fairly, each generating two packets per RTT on average. Persistent buffer inflation: a newly arriving packet always finds the queue size equal to K, so each flow is forced to reduce its cwnd to one packet. Last Hop Unfairness: the multipath flow (S5) with 4 subflows sends four times as many packets as each single-path flow. LHU leads to severe unfairness and significantly increases the likelihood of persistent buffer inflation.
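The arithmetic of this example can be sketched in a few lines. This is a simplified model under the slide's assumptions (zero propagation delay, so every in-flight packet sits in the switch queue); the function names and the choice of four single-path competitors are illustrative and not taken from the paper.

```python
def queue_occupancy(windows):
    """With zero propagation delay, all in-flight packets are queued."""
    return sum(windows)


def always_marked(windows, k):
    """True when the standing queue is at or above the marking threshold K."""
    return queue_occupancy(windows) >= k


def lhu_shares(n_single_path, n_subflows, cwnd_min=1):
    """Per-RTT link shares when every (sub)flow is pinned at cwnd_min.

    Returns (share of one single-path flow, share of the multipath flow).
    """
    single_pkts = cwnd_min
    multi_pkts = n_subflows * cwnd_min
    total = n_single_path * single_pkts + multi_pkts
    return single_pkts / total, multi_pkts / total


# Normal situation: two flows with cwnd = 2 fill the queue exactly up to K = 4
normal = queue_occupancy([2, 2])

# Inflation: four flows at cwnd_min = 1 keep the queue at K and cannot back off
inflated = always_marked([1, 1, 1, 1], k=4)

# LHU: four single-path flows plus one multipath flow with 4 subflows
single_share, multi_share = lhu_shares(n_single_path=4, n_subflows=4)
```

In this assumed scenario the multipath flow takes half of the link while each single-path flow gets one eighth, a 4:1 ratio that equals the subflow count.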
Last Hop Unfairness (showcase). 8 DCTCP flows and one XMP flow. As the number of XMP subflows increases, the impact of the LHU problem grows.
AMP: Adaptive Multipath TCP. Key observation: when all subflows of a multipath flow have the smallest cwnd value, it is a good indicator that the subflows share the same bottleneck link. AMP’s subflow suppression/release algorithms. Suppression: when all subflows remain in the minimum-window state for a short period (e.g., 2 RTTs), AMP deactivates all subflows but one. Release: when AMP receives no ECN-marked packets for some period (e.g., 8 RTTs), it reactivates all suspended subflows. AMP therefore behaves like a single-path flow once it detects an incast-like condition.
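The suppression/release rules can be sketched as a small state machine. This is a minimal illustration of the behaviour described on the slide, not the NS-3 implementation: the class name, the per-RTT callback, and the counters are hypothetical, and only the 2-RTT and 8-RTT example thresholds come from the slide.

```python
class AmpSubflowController:
    """Illustrative sketch of AMP-style subflow suppression and release."""

    def __init__(self, n_subflows, cwnd_min=1,
                 suppress_after_rtts=2, release_after_rtts=8):
        self.n_subflows = n_subflows
        self.cwnd_min = cwnd_min
        self.suppress_after_rtts = suppress_after_rtts  # slide example: 2 RTTs
        self.release_after_rtts = release_after_rtts    # slide example: 8 RTTs
        self.suppressed = False
        self.min_state_rtts = 0   # consecutive RTTs with all subflows at cwnd_min
        self.clean_rtts = 0       # consecutive RTTs without ECN marks

    def active_subflows(self):
        return 1 if self.suppressed else self.n_subflows

    def on_rtt(self, cwnds, ecn_marked):
        """Call once per RTT (an assumed granularity).

        cwnds: windows of the currently active subflows, in packets.
        ecn_marked: True if any ECN-marked packet was seen in this RTT.
        Returns the number of subflows that should be active.
        """
        if not self.suppressed:
            # Suppression: all subflows stuck at the minimum window
            if cwnds and all(c <= self.cwnd_min for c in cwnds):
                self.min_state_rtts += 1
            else:
                self.min_state_rtts = 0
            if self.min_state_rtts >= self.suppress_after_rtts:
                self.suppressed = True
                self.clean_rtts = 0
        else:
            # Release: a long enough run of RTTs without ECN marks
            self.clean_rtts = 0 if ecn_marked else self.clean_rtts + 1
            if self.clean_rtts >= self.release_after_rtts:
                self.suppressed = False
                self.min_state_rtts = 0
        return self.active_subflows()
```

For example, a controller with 4 subflows that sees all windows at one packet for two consecutive RTTs drops to a single subflow, and returns to 4 subflows after eight consecutive RTTs without ECN marks. Which subflow stays active during suppression is left unspecified here.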
AMP Simplifies Congestion Control Operation. We make two key observations: (1) RTT measurements of subflows are unnecessary for updating their cwnd (when ECN is used in a DCN); (2) DCTCP-like window reduction slows down traffic shifting.
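To illustrate the second observation, the sketch below contrasts the standard DCTCP reduction, cwnd * (1 - alpha/2) with alpha an exponentially weighted average of the marked fraction, against a plain halving. The halving is shown only as a comparison point; the slide does not state which reduction rule AMP adopts, and the function names and the default g = 1/16 are choices made for this example.

```python
def dctcp_alpha(alpha, marked_fraction, g=1 / 16):
    """DCTCP's running estimate of the fraction of ECN-marked packets."""
    return (1 - g) * alpha + g * marked_fraction


def dctcp_reduce(cwnd, alpha, cwnd_min=1.0):
    """DCTCP-style reduction: proportional to the congestion estimate."""
    return max(cwnd * (1 - alpha / 2), cwnd_min)


def halve(cwnd, cwnd_min=1.0):
    """Aggressive reduction, for comparison only."""
    return max(cwnd / 2, cwnd_min)


# A subflow on a mildly congested path (alpha = 0.1) with a 40-packet window
gentle = dctcp_reduce(40.0, alpha=0.1)   # gives up about 2 packets
sharp = halve(40.0)                      # gives up 20 packets
```

With a small alpha, the DCTCP-style rule releases only a small part of the window per reduction, so a multipath sender moves traffic away from a congested subflow more slowly than it would with a sharper cut.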
AMP under Incast. Flow size of 128 KB. AMP is robust against the TCP incast problem.
AMP under LHU. Number of multipath flows = 1; number of subflows = 4. Figure annotations: No LHU; Severe LHU.
Summary. Existing multipath congestion control schemes fail to handle: (1) the TCP incast problem, which causes temporary switch buffer overflow due to synchronized traffic arrival; (2) last hop unfairness, which causes persistent buffer inflation and serious unfairness. We designed AMP to overcome these problems effectively: AMP adaptively switches its operation between a multiple-subflow mode and a single-subflow mode.
AMP Paper and Source Code NS-3 implementation: https://github.com/mkheirkhah
Thank You!