Slytherin: Dynamic, Network-assisted Prioritization of Tail Packets in Datacenter Networks. Hamed Rezaei, Mojtaba Malekpourshahraki, Balajee Vamanan. This project was carried out in the BITS systems lab; it is joint work with Hamed Rezaei and is supervised by Balajee Vamanan.

Diverse datacenter applications. Datacenters host diverse applications. Foreground applications respond to user queries (e.g., Web search) and generate short flows (e.g., < 32 KB). Background applications update data for the foreground applications (e.g., Web crawlers) and generate long flows. There is substantial diversity in datacenter applications.

Multiple network objectives. Foreground applications are sensitive to tail (e.g., 99th percentile) flow completion times, while background applications are sensitive to throughput. A datacenter network must fulfill the requirements of both classes of applications, but the challenge is that they have differing objectives: the network must provide both low tail flow completion times for short flows and high throughput for long flows.

Existing state of the art. There has been a lot of work on datacenter networks over the last several years; broadly speaking, previous work can be categorized as follows. Load balancing: Presto [SIGCOMM '15], HULA [SOSR '16], CONGA [SIGCOMM '14]; however, the problem exists even with perfect load balancing. Congestion control: DCTCP [SIGCOMM '10], D2TCP [SIGCOMM '12]; these are end-to-end, iterative mechanisms, hence slow, and they do not specifically target short flows (tail latency). Packet scheduling: information-aware schemes such as pFabric [SIGCOMM '13] and PASE [SIGCOMM '14] require a priori knowledge of flow sizes; information-agnostic schemes such as PIAS [NSDI '15] do not. To the best of our knowledge, PIAS is the most recent information-agnostic packet scheduling scheme and the most relevant to our proposal. Packet scheduling plays a direct role in the flow completion times of short flows, which are the focus of this paper.

Shortest Job First (SJF). PIAS aims to implement SJF practically: it sends shorter flows to higher-priority queues and longer flows to lower-priority queues, which improves flow completion times. However, SJF per switch is not the same as SJF for the network: SJF is optimal for a single switch (PIAS [NSDI '15]) but not optimal for the whole network (pFabric [SIGCOMM '13]). In other words, PIAS (SJF per switch) is locally but not globally optimal.

Slytherin: the key question. Is it possible to identify and prioritize tail packets, to improve tail flow completion times beyond PIAS? We answer this question in the affirmative with our design, called Slytherin.

Our contributions. Key insight: packets that are likely to fall in the tail often incur congestion at multiple points in the network. Slytherin proceeds in two steps. Step I targets tail packets, using ECN marks to identify them; Step II prioritizes tail packets, using multi-level queues. Slytherin drastically reduces network queuing (2x lower queue length) and achieves 19% lower tail flow completion times without losing throughput.

Outline. Datacenter network objectives; existing proposals; our contributions; idea and opportunity; Slytherin design; methodology; results; closing remarks. I will first give the high-level idea and our contributions, then explain the design in detail, followed by the experimental methodology and key results.

Idea and opportunity. High-level idea: packets that incur congestion at multiple points in the network are likely to fall in the tail and worsen tail FCT. [Figure: example six-switch topology (S1-S6), annotated with the number of slots each packet waits before dequeue at each hop; the maximum is 2 time slots.]

Idea and opportunity (cont.). Could packets that are marked at multiple points influence the tail? As an experimental analysis, we ran a typical datacenter workload and measured the fraction of packets that are marked at more than one switch AND fall in the tail. [Figure: fraction of such packets versus network load.] There is substantial opportunity at high loads!

Outline. Datacenter network objectives; existing proposals; our contributions; idea and opportunity; Slytherin design; methodology; results; closing remarks. Next, I will explain the design in detail, followed by the experimental methodology and key results.

High-level overview. Slytherin's policy intents and mechanisms:

    Step   Policy intent                             Mechanism
    I      Identify packets that fall in the tail    Leverage ECN marks to identify tail packets
    II     Prioritize tail packets                   Multi-level queues

I will discuss each step in detail: what we do (the policy intent) and how we do it (the mechanism).

Slytherin: design, Step I. Slytherin has two steps. Step I: identify packets that previously incurred congestion, using ECN-marked packets. [Figure: hosts A-D connected through switches S1-S6; a packet from A is ECN-marked at a congested switch, so downstream switches can tell that it previously incurred congestion.]
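To make the check concrete, here is a minimal C++ sketch of the Step I test (the function name and byte-level parsing are illustrative, not the paper's code): a packet whose IP ECN field carries the CE (Congestion Experienced) codepoint was marked by an upstream switch and is treated as a likely tail packet.

    #include <cstdint>

    // ECN lives in the low two bits of the IP TOS / traffic-class byte;
    // the CE (Congestion Experienced) codepoint is 0b11.
    constexpr uint8_t ECN_MASK = 0x03;
    constexpr uint8_t ECN_CE   = 0x03;

    bool IsLikelyTailPacket(uint8_t ipTosByte) {
        // CE set => the packet already queued behind a congested port upstream.
        return (ipTosByte & ECN_MASK) == ECN_CE;
    }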

Slytherin: design, Step II. Step II: prioritize previously congested packets, using a multi-level queue, which is supported in today's datacenter switches. Unlike a simple single-level queue, a multi-level queue provides high- and low-priority levels.

Slytherin: design, Step II (cont.). Going back to our original example, every switch now uses a multi-level queue with high- and low-priority levels; previously congested (ECN-marked) packets are placed in the high-priority queues. [Figure: the same topology of hosts A-D and switches S1-S6, with each switch's multi-level queue shown.]
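A minimal sketch of the Step II queuing discipline, assuming a two-level strict-priority queue per switch port (the class and method names are hypothetical): CE-marked packets are enqueued at high priority, and the dequeue path always drains the high-priority queue first.

    #include <cstdint>
    #include <deque>

    struct Packet { uint8_t tos; /* headers and payload elided */ };

    class TwoLevelQueue {
    public:
        void Enqueue(const Packet& p) {
            // Previously congested (CE-marked) packets go to the high queue.
            bool tail = (p.tos & 0x03) == 0x03;
            (tail ? high_ : low_).push_back(p);
        }
        bool Dequeue(Packet& out) {
            // Strict priority: serve the high queue whenever it is non-empty.
            std::deque<Packet>& q = !high_.empty() ? high_ : low_;
            if (q.empty()) return false;
            out = q.front();
            q.pop_front();
            return true;
        }
    private:
        std::deque<Packet> high_, low_;
    };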

Design considerations. ACK prioritization: ACK packets have the highest priority, so all ACK packets are promoted to the highest-priority queue. Implementation: most switches support multi-level queues, so Slytherin can be implemented with a small set of switch rules.
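Putting the two considerations together, a hedged sketch of the per-packet classification order this slide implies (the enum and priority numbering are illustrative, not a specified switch rule set): pure ACKs first, then ECN-marked tail packets, then everything else.

    #include <cstdint>

    enum class Prio { Ack = 0, Tail = 1, Default = 2 };  // lower value = served first

    Prio Classify(bool isPureTcpAck, uint8_t ipTosByte) {
        if (isPureTcpAck)                return Prio::Ack;      // ACKs: highest priority
        if ((ipTosByte & 0x03) == 0x03)  return Prio::Tail;     // ECN CE set: tail packet
        return Prio::Default;                                   // all other traffic
    }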

Outline. Datacenter network objectives; existing proposals; our contributions; idea and opportunity; Slytherin design; methodology; results; closing remarks.

Simulation methodology. Network simulator: ns-3 (v3.28). Transport protocol: DCTCP as the underlying congestion control. ECN threshold: 25% of the total queue size. Workloads: a mix of short flows (8-32 KB, Web-search workload) and long flows (1 MB, data-mining workload). Schemes compared: PIAS [NSDI '15] and DCTCP [SIGCOMM '10].
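For reference, a small sketch of the marking rule this threshold implies (the function name is illustrative): a switch port sets CE on an arriving packet when the instantaneous queue occupancy exceeds 25% of the buffer.

    #include <cstddef>

    // Mark when the queue is more than a quarter full (the 25% threshold
    // from the methodology slide).
    bool ShouldMarkCe(std::size_t queueBytes, std::size_t capacityBytes) {
        return queueBytes > capacityBytes / 4;
    }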

Topology. A two-level, leaf-spine topology, typical of today's datacenters: 400 hosts attach through 20 leaf switches, which connect to 10 spine switches in a full mesh over 10 Gbps links.
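A hedged ns-3-style sketch of how such a topology could be wired up (this is not the paper's actual simulation script; IP addressing, routing, queue discipline, and traffic generation are omitted):

    #include "ns3/core-module.h"
    #include "ns3/network-module.h"
    #include "ns3/point-to-point-module.h"

    using namespace ns3;

    int main() {
        NodeContainer spines, leaves, hosts;
        spines.Create(10);    // 10 spine switches
        leaves.Create(20);    // 20 leaf switches
        hosts.Create(400);    // 400 hosts, 20 per leaf

        PointToPointHelper link;
        link.SetDeviceAttribute("DataRate", StringValue("10Gbps"));

        // Full mesh between every leaf and every spine.
        for (uint32_t l = 0; l < leaves.GetN(); ++l)
            for (uint32_t s = 0; s < spines.GetN(); ++s)
                link.Install(leaves.Get(l), spines.Get(s));

        // Attach hosts to leaves: 400 hosts / 20 leaves = 20 hosts per leaf.
        for (uint32_t h = 0; h < hosts.GetN(); ++h)
            link.Install(hosts.Get(h), leaves.Get(h / 20));
        return 0;
    }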

Tail (99th percentile) FCT. Recall that the key metrics for a datacenter are tail FCT and throughput. There are many results in the paper; I will show the three most important. [Figure: 99th percentile FCT (ms) versus load, for DCTCP, PIAS, and Slytherin.] As expected, the FCT of all schemes increases with load. PIAS achieves lower FCT than DCTCP, matching its paper, and Slytherin achieves even lower FCT than PIAS; our improvements are larger at higher loads, where the gap widens. Overall, Slytherin achieves 18.6% lower 99th percentile flow completion times for short flows.

Average throughput (long flows). [Figure: long-flow throughput versus load; higher is better.] At high loads there is more congestion and queuing, so long flows achieve lower throughput: the higher the load, the more packets and the more queuing delay. Slytherin targets only tail packets, whereas PIAS prioritizes all packets, including some that do not need prioritization. Slytherin increases long-flow throughput by about 32% on average.

CDF of queue length in switches. [Figure: CDF of ToR-switch queue length at 40% and 60% load.] Slytherin has lower queue length at both loads: it reduces the 99th percentile queue length in switches by about a factor of 2x on average.

Conclusion. In contrast to prior approaches, we proposed Slytherin, which identifies tail packets and prioritizes them in the network switches, optimizing tail flow completion times; tail flow completion time is the most appropriate metric for many applications. Future work: leverage SJF to minimize average flow completion times, improve Slytherin's accuracy in identifying tail packets, and develop efficient switch hardware implementations. As data continues to grow at a rapid rate, schemes such as Slytherin that minimize network tail latency will become increasingly important.

Thanks for your attention. mmalek3@uic.edu

Reordering (backup). PIAS with limited buffer for each port. [Figure: packet reordering, PIAS vs. Slytherin.]

Sensitivity to threshold

Sensitivity to incast

Slytherin: design, ACK handling (backup). ACK packets are prioritized into the highest-priority queue; data packets follow Slytherin's rules. [Figure: hosts A-D and switches S1-S3, with ACKs placed in the highest-priority queues.]

Tail flow completion time (backup). Scenario I: assume we have five short flows as follows.

    Flow ID   Duration
    10        1 ms
    20        5 ms
    30        2 ms
    40        2 ms
    50        4 ms

    Average FCT = (1 + 5 + 2 + 2 + 4) / 5 = 2.8 ms
    Tail FCT    = 5 ms

Scenario II: which is better?

    Flow ID   Duration
    10        2 ms
    20        6 ms
    30        -
    40        -
    50        -
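For concreteness, a small program that computes the two metrics for Scenario I (the numbers come from the table above; the tail here is simply the worst of the five flows):

    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    int main() {
        // Per-flow completion times from Scenario I, in milliseconds.
        std::vector<double> fctMs = {1, 5, 2, 2, 4};
        double avg  = std::accumulate(fctMs.begin(), fctMs.end(), 0.0) / fctMs.size();
        double tail = *std::max_element(fctMs.begin(), fctMs.end());
        std::printf("average FCT = %.1f ms, tail FCT = %.0f ms\n", avg, tail);  // 2.8, 5
        return 0;
    }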