Distributed Adaptive Routing for Big-Data Applications Running on Data Center Networks
Eitan Zahavi*+, Isaac Keslassy+, Avinoam Kolodny+
* Mellanox Technologies LTD, + Technion - EE Department
ANCS 2012
Big Data: Longer, Higher-BW, and Fewer Flows
- Data-set sizes keep rising
- Web 2.0 and Cloud Big-Data applications
- Data center traffic changes to: longer, higher-BW, and fewer flows
[Figure: data-set growth trends (source: Google)]
Static Routing of Big Data = Low BW
- Static routing cannot balance a small number of flows
- Congestion: when the total BW of the flows on a link exceeds the link capacity
- When longer, higher-BW flows contend:
  - On a lossy network: packet drops, so BW drops
  - On a lossless network: congestion spreading, so BW drops
[Figure: contending data flows through statically routed (SR) switches]
Traffic-Aware Load-Balancing Systems
- Adaptive Routing adjusts routing to the network load
- Centralized: flows are routed according to "global" knowledge
- Distributed: each flow is routed by its input switch with "local" knowledge
[Figure: Central Routing Control vs. per-switch Self-Routing (SR) units]
Central vs. Distributed Adaptive Routing

Property     | Central Adaptive Routing | Distributed Adaptive Routing
Scalability  | Low                      | High
Knowledge    | Global                   | Local (to keep scalability)
Non-Blocking | Yes                      | Unknown

- An adaptive routing system is either scalable or has global knowledge, not both
- Distributed Adaptive Routing is reactive
Research Question
Can a scalable Distributed Adaptive Routing system perform like a centralized system and produce non-blocking routing assignments in reasonable time?
Trial and Error Is Fundamental to Distributed AR
- Randomize the output port (Trial 1) and send the traffic: Contention 1
- Un-route a contending flow and randomize a new output port (Trial 2): Contention 2
- Randomize a new output port (Trial 3): Convergence!
[Figure: trial-and-error routing steps through SR switches]
Routing Trials Cause BW Loss
Packet simulation:
- R1 is delivered, followed by G1
- R2 is stuck behind G1: re-route
- R3 arrives before R2: out-of-order packet delivery!
- The implication is a significant drop in flow BW: TCP* treats out-of-order delivery as packet drops and throttles the senders (see the "Incast" papers)
* Or any other reliable transport
[Figure: packets R1, R2, R3 and G1 crossing an SR switch]
Research Plan
- Analyze Distributed Adaptive Routing systems
- Find how many routing trials are required to converge
- Find the conditions that let the system reach a non-blocking assignment in reasonable time
[Figure: timeline of events: New Traffic, Trial 1, Trial 2, ..., Trial N, No Contention, along axis t]
A Simple Policy for Selecting a Flow to Re-Route
- At each time step, each output switch requests the re-route of a single worst contending flow
- At t=0: a new traffic pattern is applied; randomize the output ports and send the flows
- At t=0.5: request re-routes
- Repeat for t=t+1 until there is no contention
[Figure: Clos network with n-port input switches, m middle switches, and r output switches, each with an SR unit]
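The policy is simple enough to reproduce in a small simulation. The sketch below is mine, not the authors' code, and it fixes details the slide leaves open: a symmetric n x n x n Clos, one equal-BW flow per (input switch, output switch) pair, a per-link budget of p flows (p = 1 for now, generalized later in the talk), and re-routes applied sequentially within each sweep. Input switches keep their uplinks legal by swapping middle switches between two of their own flows, which is exactly what produces the "induced" moves discussed on the next slides.

```python
import random

def simulate(n, p=1, max_steps=100_000, rng=random):
    # n input, n middle, n output switches; one flow per (input, output)
    # pair; a link is contention-free while it carries at most p flows.
    # mid[i][j] = middle switch currently carrying the flow (i -> j).
    # Every input switch starts with a legal spread over its uplinks.
    mid = []
    for i in range(n):
        slots = [b for b in range(n) for _ in range(p)][:n]
        rng.shuffle(slots)
        mid.append(slots)

    for step in range(1, max_steps + 1):
        contention = False
        for j in range(n):                       # each output switch...
            load = [0] * n                       # flows per input link of j
            for i in range(n):
                load[mid[i][j]] += 1
            b = max(range(n), key=load.__getitem__)
            if load[b] <= p:
                continue                         # no contention at this switch
            contention = True
            # ...requests a re-route of one flow on its worst link
            i = rng.choice([i for i in range(n) if mid[i][j] == b])
            b2 = rng.randrange(n)                # input switch re-randomizes
            users = [jj for jj in range(n) if mid[i][jj] == b2]
            if len(users) >= p:                  # uplink i -> b2 is full:
                mid[i][rng.choice(users)] = b    # swap one flow back (induced)
            mid[i][j] = b2
        if not contention:
            return step                          # non-blocking assignment found
    return None                                  # gave up: no convergence
```

The returned sweep count, averaged over many random seeds, corresponds to the iteration count I measured next.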
Evaluation
- Measure the average number of iterations I to convergence
- I is exponential with the system size!
A Balls and Bins Representation
- Each output switch is a "balls and bins" system
- Bins are the switch's input links; balls are the flows on those links
- Assume 1 ball (= flow) is allowed in each bin (= link)
- A "good" bin has <= 1 ball
- Bins are either "empty", "good", or "bad"
[Figure: middle switches 1..m feeding an output switch, with empty, good, and bad bins]
System Dynamics
- Balls are numbered by their input switch number
- There are two reasons for ball moves: an Improve move or an Induced move
[Figure: example with middle switches 1..4 and input switches SW1..SW3: an Improve move of ball 3 at output switch 2 induces a move of ball 3 at output switch 1, since both flows share input switch SW3]
The "Last" Step Governs Convergence
- We estimated Markov-chain models of these dynamics
- What is the probability that the required last Improve move does not cause a bad Induced move?
- Each one of the r output switches must complete that step
- Therefore the convergence time is exponential in r
[Figure: per-output-switch Markov chains (output switch 1 .. output switch r) with states A, B, C, D, an absorbing Good state, and Bad states]
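A back-of-envelope caricature of that argument (my simplification, not the Markov chain estimated in the talk): assume each of the r output switches independently completes its last Improve move without a bad Induced move with probability q per step, and that all r must effectively succeed together, because an Induced move can knock an already-settled switch back to a Bad state. The per-step success probability is then q^r, so the expected convergence time scales like (1/q)^r:

```python
# Toy model only: q is an assumed per-switch success probability.
def expected_steps(q: float, r: int) -> float:
    return (1.0 / q) ** r   # exponential in the number of output switches

for r in (4, 8, 16, 32):
    print(r, expected_steps(0.9, r))
```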
Introducing p
- Assume a symmetric system: all flows have the same BW
- What if Flow_BW < Link_BW? The network load is then Flow_BW / Link_BW
- p = how many balls are allowed in one bin, i.e., how many flows fit on one link (p = Link_BW / Flow_BW)
[Figure: the same network with p=1 (full-link-BW flows) and p=2 (half-link-BW flows)]
p Has a Great Impact on Convergence
- Measure the average number of iterations I to convergence
- I shows a very strong dependency on p
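In the simulate sketch from the policy slide, p is just the per-link flow budget, so this comparison can be reproduced along the following lines (the sizes, seeds, and 20-run average are my arbitrary choices):

```python
import random
import statistics

for p in (1, 2):            # p = 2 means each flow uses half the link BW
    runs = [simulate(16, p=p, rng=random.Random(seed)) for seed in range(20)]
    done = [s for s in runs if s is not None]
    avg = statistics.mean(done) if done else float("nan")
    print(f"p={p}: {len(done)}/20 converged, avg {avg:.1f} sweeps")
```

The talk's finding predicts that the p=2 runs settle within a few sweeps, while the p=1 runs take exponentially longer and may exhaust the step budget.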
An Implementable Distributed System
- Replace congestion detection by flow count with QCN; congestion is detected on the middle-switch output, not on the output-switch input
- Replace the "worst flow selection" with sampling of congested flows
- Implemented as an extension of a detailed InfiniBand flit-level model
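In terms of the earlier sketch, the second change can be approximated by sampling a contending flow instead of computing the worst one. The helper below and its name are mine, and it only mimics the spirit of QCN-style probabilistic congestion detection:

```python
import random

def sample_congested_flow(mid, j, n, p, rng=random):
    # Sample, rather than exactly select, a contending flow at output
    # switch j: pick a random over-capacity link, then a random flow on it.
    load = [0] * n
    for i in range(n):
        load[mid[i][j]] += 1
    congested = [b for b in range(n) if load[b] > p]
    if not congested:
        return None                      # nothing to re-route here
    b = rng.choice(congested)
    return rng.choice([i for i in range(n) if mid[i][j] == b])
```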
52% Load on a 1152-Node Fat Tree
- No change in the number of adaptations over time!
- No convergence
48% Load on a 1152-Node Fat Tree
[Figure: switch routing adaptations per 10 usec vs. t [sec]; in contrast to the 52% case, the adaptations settle]
Conclusions
- Study: Distributed Adaptive Routing of Big-Data flows
- Focus: time to convergence to a non-blocking routing assignment
- Learning: the cause of the slow convergence
- Corollary: flows of half the link BW converge in a few iterations
- Evaluation: simulations of a 1152-node fat tree reproduce these results
- Distributed Adaptive Routing of half-Link_BW flows is both non-blocking and scalable