1
Providing Bandwidth Guarantees, Work Conservation and Low Latency Simultaneously in the Cloud
Shuihai Hu, Wei Bai, Kai Chen, Chen Tian (NJU), Ying Zhang (HP Labs), Haitao Wu (Microsoft)
SING Group @ Hong Kong University of Science and Technology
IEEE INFOCOM 2016, San Francisco, USA
2
Multi-tenant Public Cloud
Today's public cloud is shared by multiple tenants running various applications.
3
Three Goals
Bandwidth guarantees for throughput-intensive applications
– In the public cloud, the network is shared by all tenants.
– Bandwidth guarantees offer predictable performance for tenants' applications.
4
Three Goals
Bandwidth guarantees for throughput-intensive applications
Work conservation to fully utilize network bandwidth
– Tenants can make use of spare bandwidth left by unallocated or underutilized guarantees.
– This significantly improves performance.
5
Three Goals
Bandwidth guarantees for throughput-intensive applications
Work conservation to fully utilize network bandwidth
Low latency for latency-sensitive short messages
– Improves the performance of user-facing applications.
8
Related Work
Oktopus [SIGCOMM'11]
– Provides bandwidth guarantees but is not work-conserving.
EyeQ [NSDI'13]
– Requires the network core to be congestion-free.
ElasticSwitch [SIGCOMM'13]
– Suffers a fundamental tradeoff between bandwidth guarantees and work conservation.
Silo [SIGCOMM'15]
– Cannot achieve work conservation due to the tradeoff between work conservation and low latency.
9
Question
How to provide bandwidth guarantees, work conservation and low latency simultaneously in commodity data centers?
10
Design Goal 1
Eliminate the tradeoff between bandwidth guarantees and work conservation.
11
Design Goal 2
Eliminate the tradeoff between work conservation and low latency.
12
Design Goal 3
Readily deployable: work with existing commodity switches and remain compatible with legacy network stacks.
13
Question
How to provide bandwidth guarantees, work conservation and low latency simultaneously, without any tradeoff, in commodity data centers?
Our answer: Trinity.
14
Trinity's Design
15
Design Overview
At its core, Trinity is an in-network isolation solution.
[Figure: Trinity architecture. Sender modules at the source end hosts (VM1, VM3), a high priority and a low priority queue in the network, and a receiver module at the destination end hosts (VM2, VM4).]
16
Design Overview
At the sender: differentiate bandwidth-guarantee traffic from work-conserving traffic by marking packets with one of two colors.
17
Design Overview
In the network: enforce strict priority queueing with two queues to decouple bandwidth-guarantee traffic from work-conserving traffic.
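The decoupling can be pictured with a small software model of a switch port: the low priority queue is served only when the high priority queue is empty. This is only an illustrative sketch; Trinity relies on the existing priority queues of commodity switches, and the class and packet names below are made up.

from collections import deque

class StrictPriorityPort:
    """Toy model of a two-queue output port under strict priority."""
    def __init__(self):
        self.high = deque()  # green packets: bandwidth-guarantee traffic
        self.low = deque()   # red packets: work-conserving traffic

    def enqueue(self, pkt, color):
        (self.high if color == "green" else self.low).append(pkt)

    def dequeue(self):
        if self.high:        # guaranteed traffic is always served first
            return self.high.popleft()
        if self.low:         # spare capacity serves work-conserving traffic
            return self.low.popleft()
        return None

port = StrictPriorityPort()
port.enqueue("A1 (green)", "green")
port.enqueue("B1 (red)", "red")
port.enqueue("A2 (green)", "green")
print(port.dequeue(), port.dequeue(), port.dequeue())  # A1 (green) A2 (green) B1 (red)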
18
Design Overview
At the receiver: send congestion feedback back to the senders and handle the possible packet re-ordering problem.
19
Design Overview
With in-network isolation, low latency for short flows is easy to achieve:
– Set a threshold to identify short flows at the sender module.
– The sender module colors the first few packets of every new flow green, so packets of short flows get high priority.
– The threshold can be set statically to a few or tens of KB, or adjusted dynamically with a more advanced thresholding scheme (e.g., PIAS).
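A minimal sketch of this sender-side coloring, assuming a per-flow byte counter and a static threshold of tens of KB; the threshold value and function names are illustrative, not the exact interface of Trinity's sender module.

SHORT_FLOW_THRESHOLD = 20 * 1024  # bytes; illustrative static value (could be PIAS-adapted)

bytes_sent = {}  # flow_id -> bytes this flow has already sent

def color_short_flow_packet(flow_id, pkt_len):
    """Return "green" while a flow is still within the short-flow threshold.

    Packets beyond the threshold return None here; they are colored by the
    guarantee-based logic sketched later in the deck.
    """
    sent = bytes_sent.get(flow_id, 0)
    bytes_sent[flow_id] = sent + pkt_len
    return "green" if sent < SHORT_FLOW_THRESHOLD else None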
20
Simple Example Illustrating Trinity
The sender module colors some packets red when the application's demand exceeds its bandwidth guarantee.
21
Simple Example Illustrating Trinity
Red packets enter the low priority queue and can use spare bandwidth whenever the high priority queue is empty.
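One simple way to realize this guarantee-based coloring is a token bucket refilled at the guaranteed rate: packets covered by available tokens are colored green, and the excess is colored red. This is a sketch under that assumption (the burst size, rate parameter and class name are made up), not necessarily the exact mechanism in Trinity's sender module.

import time

class GuaranteeColorer:
    """Colors packets green up to the guaranteed rate and red beyond it."""
    def __init__(self, guarantee_bps, burst_bytes=15000):
        self.rate = guarantee_bps / 8.0   # token refill rate in bytes per second
        self.burst = burst_bytes          # maximum token backlog
        self.tokens = burst_bytes
        self.last = time.time()

    def color(self, pkt_len):
        now = time.time()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len
            return "green"   # within the bandwidth guarantee -> high priority
        return "red"         # demand above the guarantee -> low priority, uses spare bandwidth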
22
Simple Example Illustrating Trinity
The sender module tracks per-flow information and ensures that packets of short flows are colored green, using the threshold that identifies short flows.
23
Simple Example Illustrating Trinity
In the network, packets of short flows enter the high priority queue and experience little queueing delay, thus achieving low latency.
24
Handling Design Issues
Issue #1: How to do efficient rate control for good work conservation?
– Solution: adopt ECN marking in the network to assist rate control.
Issue #2: How to handle the packet trapping problem?
– Solution: the receivers detect possible packet trapping and notify the senders.
Issue #3: How to handle the packet re-ordering problem?
– Solution: introduce a color transition delay at the sender and a re-sequencing buffer at the receiver.
Refer to our paper for more details.
25
Testbed Experiments
Trinity prototype
– https://github.com/baiwei0427/Trinity
Testbed setup
– Two Gigabit Pronto-3295 switches
– 16 Dell servers
Compared schemes
– No Protection
– Static Reservation
– ElasticSwitch
26
Experiment 1: Bandwidth Guarantees and Work Conservation
[Figure: topology with a 1 Gbps bottleneck link (L) shared by tenant A (A1 → A2) and tenant B (B1 → B2).]
Both tenants are provisioned with 300 Mbps guarantees. A1 sends traffic to A2 using one TCP connection; B1 sends traffic to B2 using different numbers of TCP connections.
27
Experiment 1: Bandwidth Guarantees and Work Conservation
The No Protection scheme cannot provide any bandwidth guarantee.
[Figure: average throughput of VM A2 under the four schemes.]
28
Experiment 1: Bandwidth Guarantees and Work Conservation
Static Reservation cannot utilize any spare bandwidth.
[Figure: average throughput of VM A2 under the four schemes.]
29
Experiment 1: Bandwidth Guarantees and Work Conservation
ElasticSwitch provides bandwidth guarantees and utilizes part of the spare bandwidth.
[Figure: average throughput of VM A2 under the four schemes.]
30
Experiment 1: Bandwidth Guarantees and Work Conservation
Trinity provides bandwidth guarantees and fully utilizes the spare bandwidth.
[Figure: average throughput of VM A2 under the four schemes.]
31
Experiment 2: Low Latency for Short Flows
[Figure: topology with a 1 Gbps bottleneck link (L) shared by tenants A (A1 → A2), B (B1 → B2) and C (C1 → C2).]
Three tenants A, B and C are provisioned with 200 Mbps, 400 Mbps and 400 Mbps guarantees, respectively. A1 periodically sends 1 KB or 20 KB short flows to A2; B1 and C1 send long flows to B2 and C2, respectively.
32
Experiment 2: Low Latency for Short Flows
Compared to ElasticSwitch, Trinity reduces the FCT of 1 KB short flows by 33% on average and by 71% at the 99th percentile.
[Figure: flow completion time (FCT) of short flows.]
33
Experiment 2: Low Latency for Short Flows
Compared to ElasticSwitch, Trinity reduces the FCT of 20 KB short flows by 38% on average and by 70% at the 99th percentile.
[Figure: flow completion time (FCT) of short flows.]
34
Experiment 2: Low Latency for Short Flows
By giving packets of short flows high priority in the network, Trinity improves the FCT of short flows significantly.
[Figure: flow completion time (FCT) of short flows.]
35
Conclusion
Three goals
– Bandwidth guarantees
– Work conservation
– Low latency
Core of Trinity's design: in-network isolation
– End-host coloring
– In-network prioritization
36
Thanks! 36
37
Backup slides 37
38
Persistent Connections
Solution: periodically reset flow state based on the traffic's behavior.
– When a flow idles for some time, we reset its bytes-sent counter to 0.
– Define a flow as the packets demarcated by incoming packets with payload within a single connection.
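A minimal sketch of this idle-based reset, assuming a fixed idle timeout; the timeout value and data structure are illustrative assumptions.

import time

IDLE_RESET = 0.5  # seconds of idleness after which flow state is reset (assumed value)

flow_state = {}  # flow_id -> (bytes_sent, time_of_last_packet)

def update_flow(flow_id, pkt_len):
    """Update the per-flow byte count, restarting it after a long idle period."""
    now = time.time()
    sent, last = flow_state.get(flow_id, (0, now))
    if now - last > IDLE_RESET:
        sent = 0  # the connection was idle: treat subsequent packets as a new flow
    flow_state[flow_id] = (sent + pkt_len, now)
    return sent   # bytes sent before this packet, compared against the short-flow threshold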
39
Simple Example Illustrating Trinity
The sender module colors all packets green when the application's demand is no larger than its bandwidth guarantee.
40
Simple Example Illustrating Trinity
Green packets enter the high priority queue and enjoy guaranteed throughput in the network.
42
Rate Control Algorithm
Overly aggressive rate control causes many packet drops and hence hurts the performance of TCP flows.
43
Rate Control Algorithm
Overly conservative rate control leads to poor work conservation and wastes spare bandwidth in the network.
[Figure: sending rate over time relative to the guaranteed bandwidth, with unused spare bandwidth.]
44
Rate Control Algorithm
Ideal rate control keeps the occupancy of the low priority queue in a moderate range, fully utilizing spare bandwidth while avoiding packet drops.
45
Rate Control Algorithm
Adopt ECN marking in the network to assist rate control.
– The receiver module periodically sends ECN-based congestion feedback to the sender module.
– The sender module updates its current sending rate according to this feedback.
[Figure: sender module, low/high priority queues with an ECN marking threshold on the low priority queue, and receiver module returning congestion feedback.]
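A minimal, DCTCP-style sketch of such an ECN-assisted rate update: the receiver reports the fraction of red packets that carried ECN marks in the last period, and the sender backs off multiplicatively or probes upward. The gain, rate bounds and probing step are illustrative assumptions, not Trinity's actual parameters.

G = 1.0 / 16          # EWMA gain for the smoothed marking fraction (assumed)
MIN_RATE = 1e6        # floor for the red (work-conserving) rate, in bps (assumed)
LINK_CAPACITY = 1e9   # 1 Gbps link

class RedRateController:
    """Controls the sending rate of red (work-conserving) packets."""
    def __init__(self, initial_rate=10e6):
        self.rate = initial_rate  # current red sending rate in bps
        self.alpha = 0.0          # smoothed congestion level

    def on_feedback(self, marked_fraction, probe_step=5e6):
        """Called once per feedback period with the receiver-reported ECN fraction."""
        self.alpha = (1 - G) * self.alpha + G * marked_fraction
        if marked_fraction > 0:
            # congestion observed: back off in proportion to the smoothed level
            self.rate = max(MIN_RATE, self.rate * (1 - self.alpha / 2))
        else:
            # no marks: additively probe for more spare bandwidth
            self.rate = min(LINK_CAPACITY, self.rate + probe_step)
        return self.rate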
46
Problem #2: How to handle the packet trapping problem?
47
Packet Trapping Problem
Red packets get trapped in the network when there is no spare bandwidth (green packets already fully utilize the link capacity).
[Figure: red packets stuck in the low priority queue while green packets occupy the link.]
48
Packet Trapping Problem
– The receiver module sends a packet trapping notification to the sender module when no red packets have been received in the last time period.
– The sender module reduces the sending rate of red packets to a small value once the packet trapping problem is confirmed.
[Figure: receiver module returning a packet trapping notification to the sender module across the two priority queues.]
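A hedged sketch of this detection-and-notification loop; the fallback rate, period handling and class names are illustrative assumptions.

TRAP_RATE = 1e5  # small fallback rate (bps) for red packets once trapping is confirmed (assumed)

class TrappingDetector:
    """Receiver side: counts red packets per feedback period."""
    def __init__(self):
        self.red_pkts_this_period = 0

    def on_packet(self, color):
        if color == "red":
            self.red_pkts_this_period += 1

    def end_of_period(self):
        """Return True if a trapping notification should be sent (no red packets arrived)."""
        trapped = (self.red_pkts_this_period == 0)
        self.red_pkts_this_period = 0
        return trapped

class TrappingSender:
    """Sender side: shrinks the red rate when a trapping notification arrives."""
    def __init__(self, red_rate_controller):
        self.controller = red_rate_controller  # e.g., the RedRateController sketched above

    def on_trapping_notification(self):
        self.controller.rate = TRAP_RATE  # keep only a trickle of red traffic until spare bandwidth reappears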
49
Problem #3: How to handle the packet re-ordering problem?
50
Packet Re-ordering Problem
Long flows may suffer from packet re-ordering when the color of their packets alternates from red to green.
[Figure: a numbered packet sequence (1–6) split between the two priority queues and arriving out of order.]
51
Packet Re-ordering Problem
At the sender, we introduce a color transition delay (denoted τ) to mitigate the packet re-ordering problem.
– When the color of a flow's packets needs to change from red to green, we defer the change by τ seconds.
– This reserves additional time for the in-flight red packets to be delivered.
– It also gives other flows without a re-ordering issue the opportunity to consume the guarantee quota instead.
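A minimal sketch of deferring the red-to-green transition by τ; the value of τ and the bookkeeping are illustrative assumptions.

import time

TAU = 0.001  # color transition delay in seconds (assumed value)

last_red_time = {}  # flow_id -> time this flow last sent a red packet

def note_red_packet(flow_id):
    last_red_time[flow_id] = time.time()

def may_turn_green(flow_id):
    """Allow a red-to-green transition only after the flow has sent no red packets for TAU."""
    return time.time() - last_red_time.get(flow_id, 0.0) >= TAU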
52
Packet Re-ordering Problem
At the sender, the color transition delay τ minimizes the cases in which a flow's packets alternate from red back to green.
At the receiver, we adopt a re-sequencing buffer to absorb possible out-of-order packets.
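A minimal sketch of a receiver-side re-sequencing buffer: out-of-order packets are held, up to a bounded window, and released in sequence order. The buffer bound and the flush-on-overflow policy are assumptions rather than Trinity's exact behaviour.

import heapq

class ResequencingBuffer:
    """Holds out-of-order packets and delivers them in sequence-number order."""
    def __init__(self, max_buffered=64):
        self.expected = 0        # next in-order sequence number to deliver
        self.heap = []           # min-heap of (seq, packet)
        self.max_buffered = max_buffered

    def receive(self, seq, pkt):
        """Return the list of packets that can now be delivered in order."""
        heapq.heappush(self.heap, (seq, pkt))
        delivered = []
        while self.heap and self.heap[0][0] <= self.expected:
            s, p = heapq.heappop(self.heap)
            if s == self.expected:        # in-order packet: deliver and advance
                delivered.append(p)
                self.expected += 1
            # s < expected is a duplicate; drop it silently
        if len(self.heap) > self.max_buffered:
            # buffer overflow: give up on the gap and flush everything in order
            while self.heap:
                s, p = heapq.heappop(self.heap)
                delivered.append(p)
                self.expected = s + 1
        return delivered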