On the Data Path Performance of Leaf-Spine Datacenter Fabrics
Mohammad Alizadeh, joint with Tom Edsall

Presentation transcript:


Datacenter Networks
- Very large numbers of server ports
- Must complete transfers, or "flows", quickly: need very high throughput and very low latency
- Serve diverse applications: web, app, db, map-reduce, HPC, monitoring, cache

Leaf-Spine DC Fabric
Approximates an ideal output-queued (OQ) switch.
[Figure: hosts H1-H9 attached to leaf switches, with every leaf connected to every spine switch]
- How close is Leaf-Spine to an ideal OQ switch?
- What impacts its performance? Link speeds, oversubscription, buffering.

Methodology
- Widely deployed mechanisms:
  - TCP Reno with SACK
  - DropTail queues (10MB shared buffer per switch)
  - ECMP (hash-based) load balancing, sketched below
- OMNeT++ simulations:
  - 100×10Gbps servers in a 2-tier topology
  - Actual Linux TCP stack
- Realistic workloads:
  - Bursty query traffic
  - All-to-all background traffic: web search, data mining
- Metric: flow completion time (FCT)
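For readers unfamiliar with hash-based ECMP, a minimal sketch of the idea follows (an illustration, not the simulator's implementation; real switch ASICs use vendor-specific hash functions, with crc32 standing in here):

```python
import zlib

def ecmp_uplink(src_ip: str, dst_ip: str, src_port: int,
                dst_port: int, proto: int, n_uplinks: int) -> int:
    """Pick an uplink by hashing the flow's 5-tuple.

    Every packet of a flow produces the same hash, so the whole flow is
    pinned to one path; a large flow cannot be spread across uplinks.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % n_uplinks

# Example: this flow always maps to the same one of 20 uplinks.
print(ecmp_uplink("10.0.1.2", "10.0.2.3", 41500, 80, 6, 20))
```

Because path choice is per-flow rather than per-packet, ECMP efficiency depends on how well the random flow-to-uplink assignment balances load, which is exactly what the link-speed results below probe.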

Impact of Link Speed
Three non-oversubscribed topologies, each with 20×10Gbps downlinks per leaf:
- 20×10Gbps uplinks
- 5×40Gbps uplinks
- 2×100Gbps uplinks
All three provide the same aggregate uplink capacity, as the check below confirms.
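A quick sanity check (an illustrative snippet, not from the talk) that each topology has a 1:1 downlink-to-uplink capacity ratio per leaf:

```python
# Per-leaf capacity in Gbps: (downlinks, uplinks) for each topology.
topologies = {
    "20x10G uplinks": (20 * 10, 20 * 10),
    "5x40G uplinks":  (20 * 10, 5 * 40),
    "2x100G uplinks": (20 * 10, 2 * 100),
}
for name, (down, up) in topologies.items():
    print(f"{name}: {down} Gbps down / {up} Gbps up = {down / up:.0f}:1")
# Each prints 200 Gbps / 200 Gbps = 1:1 -- identical aggregate bandwidth,
# differing only in the speed and count of the individual uplinks.
```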

Impact of Link Speed: Results
[Plot: FCT vs. load; lower is better]
- 40/100Gbps fabric: approximately the same FCT as the ideal OQ switch
- 10Gbps fabric: FCT up to 40% worse than OQ

Intuition
Higher-speed links improve ECMP efficiency. Example: 11 flows of 10Gbps each (55% load) hashed onto the uplinks:
- 20×10Gbps uplinks: probability of 100% throughput = 3.27%
- 2×100Gbps uplinks: probability of 100% throughput = 99.95%
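These percentages follow from a balls-in-bins view of ECMP, where each 10Gbps flow independently hashes onto one uplink. A minimal sketch of that calculation (the slide's exact model is an assumption on my part; this simple version reproduces 3.27% exactly and gives roughly 99.9% for the 100Gbps case, close to the slide's 99.95%):

```python
from math import perm

# ECMP as balls-in-bins: each of the 11 x 10Gbps flows independently
# hashes onto one uplink. 100% throughput requires that no uplink is
# asked to carry more traffic than its capacity.

def p_full_10g(n_flows=11, n_uplinks=20):
    # A 10Gbps uplink fits exactly one 10Gbps flow, so all flows must
    # land on distinct uplinks: count permutations of uplink choices.
    return perm(n_uplinks, n_flows) / n_uplinks ** n_flows

def p_full_100g(n_flows=11):
    # A 100Gbps uplink fits ten 10Gbps flows; with 2 uplinks the only
    # losing outcome is all 11 flows hashing onto the same uplink.
    return 1 - 2 * 0.5 ** n_flows

print(f"20x10Gbps uplinks: {p_full_10g():.2%}")   # 3.27%
print(f"2x100Gbps uplinks: {p_full_100g():.2%}")  # ~99.90%
```

With 20 slow uplinks, 11 flows almost always collide on some link; with 2 fast uplinks, a single link absorbs up to ten flows, so collisions almost never hurt throughput.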

Impact of Buffering
Where do queues build up? Setup: a 2.5:1 oversubscribed fabric with large buffers, measured at 24% and 60% load.
[Plot: queue occupancy at spine ports, leaf front-panel ports, and leaf fabric ports]
- Leaf fabric port queues ≈ spine port queues: there is a 1:1 port correspondence, with the same speed and load.

Impact of Buffering: Incast
Where are large buffers more effective for incast?
[Plot: FCT under incast; lower is better]
- Larger buffers help more at the Leaf than at the Spine.

Summary
- A 40/100Gbps fabric with ECMP ≈ an ideal OQ switch; there is some performance loss with a 10Gbps fabric.
- Buffering should generally be consistent across the Leaf and Spine tiers.
- Larger buffers are more useful at the Leaf than at the Spine for incast.
