6.888 Lecture 5: Flow Scheduling


6.888 Lecture 5: Flow Scheduling
Mohammad Alizadeh, Spring 2016

Datacenter Transport
Goal: complete flows quickly / meet deadlines
- Short flows (e.g., query, coordination): low latency
- Large flows (e.g., data update, backup): high throughput

Low Latency Congestion Control (DCTCP, RCP, XCP, …)
- Keep network queues small (at high throughput)
- Implicitly prioritize mice
Can we do better?

The Opportunity
Many DC apps/platforms know flow sizes or deadlines in advance:
- Key/value stores
- Data processing
- Web search
[Figure: partition-aggregate pattern with a front-end server, aggregators, and workers]

What You Said
Amy: “Many papers that propose new network protocols for datacenter networks (such as PDQ and pFabric) argue that these will improve "user experience for web services". However, none seem to evaluate the impact of their proposed scheme on user experience… I remain skeptical that small protocol changes really have drastic effects on end-to-end metrics such as page load times, which are typically measured in seconds rather than in microseconds.”

[Figure: hosts H1-H9 attached to the TX and RX sides of the fabric, drawn as one giant switch]

Flow Scheduling on a Giant Switch
DC transport = flow scheduling on a giant switch, subject to ingress & egress capacity constraints.
Objective?
- Minimize average FCT
- Minimize missed deadlines
[Figure: hosts H1-H9 on the TX and RX sides of the giant switch]

Example: Minimize Avg FCT
Flows A, B, and C (sizes 1, 2, and 3) arrive at the same time and share the same bottleneck link.
Adapted from slide by Chi-Yao Hong (UIUC)

Example: Minimize Avg FCT
- Fair sharing: completion times 3, 5, 6; mean 4.67
- Shortest flow first: completion times 1, 3, 6; mean 3.33
[Figure: throughput vs. time under each schedule]
Adapted from slide by Chi-Yao Hong (UIUC)
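The sketch below (my illustration, not from the lecture) reproduces these numbers: it computes completion times on a unit-rate bottleneck under processor sharing (fair sharing) and under shortest-flow-first.

```python
def fct_sjf(sizes):
    """Shortest flow first: flows run one at a time, smallest first."""
    t, fcts = 0.0, []
    for s in sorted(sizes):
        t += s
        fcts.append(t)
    return fcts

def fct_fair(sizes):
    """Processor sharing: all active flows split the link equally."""
    remaining, t, fcts = sorted(sizes), 0.0, []
    while remaining:
        # The smallest remaining flow finishes first, at rate 1/n.
        n = len(remaining)
        t += remaining[0] * n
        remaining = [r - remaining[0] for r in remaining[1:]]
        fcts.append(t)
    return fcts

sizes = [1, 2, 3]
print(fct_fair(sizes))  # [3.0, 5.0, 6.0], mean 4.67
print(fct_sjf(sizes))   # [1.0, 3.0, 6.0], mean 3.33
```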

Optimal Flow Scheduling for Avg FCT
- NP-hard for a multi-link network [Bar-Noy et al.]
- Shortest Flow First: 2-approximation

How can we schedule flows according to some transmission order (flow criticality) in a distributed way?

PDQ
Several slides based on a presentation by Chi-Yao Hong (UIUC)

PDQ: Distributed Explicit Rate Control
Packet headers carry the flow's criticality and current rate; each switch preferentially allocates bandwidth to the most critical flows.
[Figure: sender → switch → switch → receiver; header carries "criticality, rate = 10", reduced to 5 along the path]
Contrast: traditional explicit rate control (e.g., XCP, RCP) targets fair sharing.

Contrast with Traditional Explicit Rate Control
Traditional schemes (e.g., RCP, XCP) target fair sharing:
- Each switch determines a "fair share" rate based on local congestion: R ← R - k × (congestion measure)
- Sources use the smallest rate advertised on their path
[Figure: sender → switch → switch → receiver; advertised rate = 10 in the header, reduced to 5 by a congested switch]
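To make the contrast concrete, here is a hedged sketch of criticality-based allocation at a single switch; this is my illustration of the idea, not PDQ's actual algorithm (which also involves the pauseby field, consensus among switches, and Early Start).

```python
def pdq_style_allocate(flows, capacity):
    """flows: list of (flow_id, criticality, demand); a lower criticality
    value means more critical. Returns flow_id -> allocated rate."""
    rates, free = {}, capacity
    for fid, crit, demand in sorted(flows, key=lambda f: f[1]):
        rates[fid] = min(demand, free)  # rate 0 means the flow is paused
        free -= rates[fid]
    return rates

flows = [("A", 1, 10), ("B", 2, 10), ("C", 3, 10)]
print(pdq_style_allocate(flows, 10))         # {'A': 10, 'B': 0, 'C': 0}
print({fid: 10 / 3 for fid, _, _ in flows})  # fair sharing: ~3.33 each
```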

Challenges
- PDQ switches need to agree on rate decisions
- Low utilization during flow switching
- Congestion and queue buildup
- Paused flows need to know when to start

Challenge: Switches Need to Agree on Rate Decisions
[Figure: sender → switch → switch → receiver; header carries "criticality, rate = 10, pauseby = X"]
- What can go wrong without consensus?
- How do PDQ switches reach consensus?
- Why is the "pauseby" field needed?

What You Said
Austin: “It is an interesting departure from AQM in that, with the concept of paused queues, PDQ seems to leverage senders as queue memory.”

Challenge: Low Utilization During Flow Switching
[Figure: goal vs. practice timelines for flows A, B, C; in practice the link sits idle for 1-2 RTTs between consecutive flows]
How does PDQ avoid this?

Early Start: Seamless Flow Switching
Start the next set of flows ~2 RTTs before the current ones finish.
[Figure: throughput vs. time, showing the next flow ramping up as the current one completes]

Early Start: Seamless Flow Switching
Solution: a rate controller at the switches (as in XCP/TeXCP/D3) manages the queue buildup caused by the overlap.
[Figure: throughput vs. time with Early Start; the queue grows briefly while flows overlap]

Discussion

Mean FCT
[Figure: mean flow completion time, normalized to a lower bound (an omniscient scheduler with zero control-feedback delay), for TCP, RCP, PDQ without Early Start, and PDQ]

What if the flow size is not known?
Why does flow size estimation (criticality = bytes sent) work better for Pareto-distributed flow sizes?

Other Questions
- Fairness: can long flows starve? 99% of jobs complete faster under SJF than under fair sharing [Bansal, Harchol-Balter; SIGMETRICS '01], assuming a heavy-tailed flow size distribution.
- Resilience to error: what if a packet gets lost or flow information is inaccurate?
- Multipath: does PDQ benefit from multipath?

pFabric

pFabric in 1 Slide
Main idea: decouple scheduling from rate control.
- Packets carry a single priority number, e.g., prio = remaining flow size
- pFabric switches: send the highest-priority packet, drop the lowest-priority packet; very small buffers (20-30KB for a 10Gbps fabric)
- pFabric hosts: send/retransmit aggressively; minimal rate control, just enough to prevent congestion collapse
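A toy model of this switch behavior (my sketch, assuming lower numbers mean higher priority; the real design also keeps packets of a flow in order, which this ignores):

```python
class PFabricStyleQueue:
    """Tiny buffer; drop the lowest-priority packet on overflow,
    always transmit the highest-priority packet."""

    def __init__(self, capacity_pkts):
        self.capacity = capacity_pkts
        self.buf = []  # list of (priority, packet)

    def enqueue(self, prio, pkt):
        self.buf.append((prio, pkt))
        if len(self.buf) > self.capacity:
            # Evict the lowest-priority entry (largest priority number).
            self.buf.remove(max(self.buf, key=lambda e: e[0]))

    def dequeue(self):
        if not self.buf:
            return None
        best = min(self.buf, key=lambda e: e[0])  # highest priority
        self.buf.remove(best)
        return best[1]
```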

pFabric Switch
- Boils down to a sort, with essentially unlimited priorities
- Thought to be difficult in hardware: existing switches support only 4-16 priority levels
- But pFabric queues are very small: 51.2ns to find the min/max of ~600 numbers
- Binary comparator tree: 10 clock cycles; current ASICs: clock ~ 1ns
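For intuition on the numbers (my gloss): 51.2ns is the transmission time of a minimum-size 64B packet at 10Gbps, and a binary comparator tree finds the min of n values in ceil(log2 n) stages, so ~600 values take 10 stages, about 10 clock cycles at ~1ns each. A software sketch of the same reduction:

```python
def tree_min(values):
    """Pairwise-reduce like a comparator tree: O(log n) stages."""
    level = list(values)
    while len(level) > 1:
        level = [min(level[i:i + 2]) for i in range(0, len(level), 2)]
    return level[0]

print(tree_min([7, 3, 9, 1, 4, 8]))  # 1
```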

pFabric Rate Control
A minimal version of the TCP algorithm:
- Start at line rate: initial window larger than the BDP
- No retransmission timeout estimation: fixed RTO at a small multiple of the round-trip time
- Reduce window size upon packet drops; window increase same as TCP (slow start, congestion avoidance, …)
- After multiple consecutive timeouts, enter "probe mode": send minimum-size packets until the first ACK
What about queue buildup? Why window control at all?
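A hedged sketch of this sender logic; the constants and class structure are illustrative assumptions, not the pFabric implementation.

```python
BDP_PKTS = 12            # assumed fabric bandwidth-delay product, in packets
FIXED_RTO = 3 * 15e-6    # fixed RTO: small multiple of an assumed ~15us RTT

class MinimalSender:
    def __init__(self):
        self.cwnd = 2 * BDP_PKTS   # start above the BDP: effectively line rate
        self.consec_timeouts = 0

    def on_ack(self):
        self.consec_timeouts = 0
        self.cwnd += 1             # grow like TCP (slow start / CA elided)

    def on_drop(self):
        self.cwnd = max(1, self.cwnd // 2)  # back off only on actual drops

    def on_timeout(self):
        self.consec_timeouts += 1
        if self.consec_timeouts >= 5:
            self.cwnd = 0          # "probe mode": min-size probes until an ACK
        else:
            self.on_drop()
```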

Why Does pFabric Work?
Key invariant: at any instant, the highest-priority packet (according to the ideal algorithm) is available at the switch.
- Priority scheduling: high-priority packets traverse the fabric as quickly as possible
- What about dropped packets? They have the lowest priority, so they are not needed until all other packets depart; buffer > BDP gives enough time (> RTT) to retransmit them

Discussion

Overall Mean FCT

Mice FCT (<100KB)
[Figure: average and 99th-percentile FCT for flows under 100KB]

Elephant FCT (>10MB)
[Figure: average FCT for flows over 10MB]
Why the gap? Packet drop rate at 80% load: 4.3%

Loss Rate vs. Packet Priority (at 80% load)
[Figure: loss rate by packet priority; loss rate at other hops is negligible]
Almost all packet loss is for large (latency-insensitive) flows.

Next Time: Multi-Tenant Performance Isolation