Catching the Microburst Culprits with Snappy

Presentation transcript:

Catching the Microburst Culprits with Snappy
Xiaoqi Chen, Shir Landau Feibish, Yaron Koral, Ori Rottenstreich, and Jennifer Rexford
SIGCOMM SelfDN Workshop, August 24th, 2018, Budapest, Hungary
8/24/18 SIGCOMM 2018 Afternoon Workshop on Self-Driving Networks

Microbursts: Short-Lived Traffic Bursts
Normal traffic rates are much lower than queue throughput.
Queue buildup is normally minimal.

Microbursts: Short-Lived Traffic Bursts
Occasional short-lived traffic spikes cause significant queue buildup.

Queue Buildup in Data Centers

Queue Buildup in Carrier Networks

Microbursts are expensive…
Network admins want to:
avoid packet loss
use cheap switches
achieve high link utilization
support bursty workloads

Who caused the microburst?
The general queue occupancy problem: what is the size of each flow in the queue?
Snappy solves a narrower problem: does a packet belong to a heavy flow, when the queue is long?
[Figure: per-flow table with Key and Count columns]

Queue Occupancy Problem
The problem is hard! Simultaneous add and delete.
[Figure: per-flow table with Key and Count columns]

Queue Occupancy Problem
The problem is hard! Simultaneous add and delete.
The per-flow counts must be updated both for arrivals and for departures.
[Figure: per-flow table with Key and Count columns]
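To make the difficulty concrete, here is a minimal software sketch of exact queue occupancy tracking. It is only an illustration of the bookkeeping the slide describes, not the talk's method: the class name `ExactQueueOccupancy` and its methods are hypothetical, and the code assumes a single FIFO queue with counts kept in bytes.

```python
from collections import Counter, deque


class ExactQueueOccupancy:
    """Exact per-flow byte counts for packets currently in a FIFO queue.

    Hypothetical baseline for illustration only.
    """

    def __init__(self):
        self.queue = deque()             # (flow_key, size_bytes) in arrival order
        self.bytes_in_queue = Counter()  # flow_key -> bytes currently queued

    def enqueue(self, flow_key, size_bytes):
        # Arrival: add the packet's bytes to its flow's count.
        self.queue.append((flow_key, size_bytes))
        self.bytes_in_queue[flow_key] += size_bytes

    def dequeue(self):
        # Departure: subtract the departing packet's bytes from its flow.
        flow_key, size_bytes = self.queue.popleft()
        self.bytes_in_queue[flow_key] -= size_bytes
        if self.bytes_in_queue[flow_key] == 0:
            del self.bytes_in_queue[flow_key]
        return flow_key, size_bytes


q = ExactQueueOccupancy()
q.enqueue("A", 1500)
q.enqueue("B", 100)
q.enqueue("A", 1500)
q.dequeue()  # the first packet of flow A leaves the queue
occupancy = dict(q.bytes_in_queue)
```

Every packet touches the same table twice, once on arrival and once on departure, which is the simultaneous add and delete the slide calls hard.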

Solution: snapshots
Snappy maintains snapshots for short periods of incoming traffic.
We then combine snapshots to estimate the entire queue's content.
Observation 1: when the queue is long, the relative error is low.
Observation 2: we care about heavy flows, not every flow.
[Figure: snapshots S1, S2, S3, S4, … combined into an approximate per-flow table (Key, ~Count)]

Round-Robin between Snapshots
Observation 3: only a limited number of snapshots is needed.
[Figure: snapshot roles in round-robin order: Read, Read, Read, Write, Clean, Read, Read]
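The snapshot rotation can be sketched in software as follows. This is a rough model under stated assumptions, not the authors' data-plane implementation: the names `SnapshotRing`, `window_bytes`, `on_arrival`, and `estimate` are hypothetical, each snapshot is assumed to cover a fixed number of arriving bytes, exact counters stand in for whatever compact structure a switch would use, and the estimate simply sums the most recent readable snapshots that could overlap the queue.

```python
from collections import Counter


class SnapshotRing:
    """Round-robin snapshots of arriving traffic (illustrative sketch)."""

    def __init__(self, num_snapshots=4, window_bytes=3000):
        self.n = num_snapshots
        self.window_bytes = window_bytes  # bytes of arrivals covered by one snapshot
        self.snapshots = [Counter() for _ in range(num_snapshots)]
        self.bytes_seen = 0               # total bytes that have arrived so far
        self.window = 0                   # index of the window being written

    def on_arrival(self, flow_key, size_bytes):
        target = self.bytes_seen // self.window_bytes
        while self.window < target:
            self.window += 1
            # Clean the slot that will be written next, so it is empty in time.
            self.snapshots[(self.window + 1) % self.n].clear()
        # Write only to the current snapshot; nothing is ever decremented.
        self.snapshots[self.window % self.n][flow_key] += size_bytes
        self.bytes_seen += size_bytes

    def estimate(self, flow_key, queue_depth_bytes):
        # Windows the queued bytes may span: ceil(depth / window) older ones
        # plus the one being written, capped so the slot being cleaned is skipped.
        spans = -(-queue_depth_bytes // self.window_bytes) + 1
        readable = min(spans, self.n - 1)
        return sum(
            self.snapshots[(self.window - i) % self.n][flow_key]
            for i in range(readable)
        )


ring = SnapshotRing(num_snapshots=4, window_bytes=3000)
for key in ["A", "A", "B", "A", "B"]:
    ring.on_arrival(key, 1500)

short_queue_estimate = ring.estimate("A", 3000)
long_queue_estimate = ring.estimate("A", 7500)
```

Because snapshots are only written, read, or wiped as a whole, no per-packet deletion is needed on departure. The price is that the estimate is approximate, since a summed snapshot may include bytes that have already left the queue.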

Precision vs. Snapshot Size
Catching heavy flows: using 4~8 snapshots is sufficient.

In-Queue Flow Size Estimation
Flow-size estimate: low absolute error (~50kb).

Summary & Future Work
Problem: can't add/delete simultaneously. Our solution: use snapshots to avoid deletion, then combine snapshots.
Problem: restricted computation in the data plane. Our solution: use a sketch.
Problem: a microburst is short. Our solution: immediate action in the data plane.
Future work:
Deployment on backbone
Variations on the queue model (priority, non-FIFO)
Variations on the flow statistics (heavy flow groups)
Weighted actions

Backup Slides

Evaluation – Window size

Protocol Independent Switch Architecture
Queuing metadata becomes available.
Snappy snapshots live here.
[Figure: pipeline diagram with snapshot roles R, R, W, C, R]

Queuing and processing
Queue depth info becomes available.
Snappy resides here.
[Figure labels: Parser, Traffic Manager, Ingress Pipe, Queuing, Egress Pipe, Deparser]

Implementing Snappy on PISA: Approximation Using CM Sketch
Count-Min Sketch [CM '05]
[Figure: register arrays of counters; flow f is hashed to one counter per array and each selected counter is incremented (+1). Figure labels: "B Counters", "C columns"]
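For readers unfamiliar with the Count-Min sketch, a minimal software version looks like the following. It is a generic sketch, not the register layout from the talk: the class name, the `rows` and `columns` parameters, and the use of `hashlib.blake2b` as the per-row hash are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Generic Count-Min sketch: several counter rows, one hash per row."""

    def __init__(self, rows=2, columns=64):
        self.rows = rows
        self.columns = columns
        self.table = [[0] * columns for _ in range(rows)]

    def _index(self, row, key):
        # Derive a per-row hash by mixing the row number into the input.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.columns

    def add(self, key, amount=1):
        # Increment one counter in every row.
        for row in range(self.rows):
            self.table[row][self._index(row, key)] += amount

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.rows)
        )


single = CountMinSketch()
single.add("flow-f", 3)

shared = CountMinSketch(rows=2, columns=4)
true_counts = {"flow-a": 5, "flow-b": 2, "flow-c": 7}
for flow, count in true_counts.items():
    shared.add(flow, count)
```

The estimate never undercounts a flow, and it overcounts only when other flows collide with it in every row. This one-sided error suits heavy-flow detection, where small flows rarely add enough to matter.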

Backup: Snapshot Data Structure
Residing in the data plane.
Stage 1: Snap 1, Row 1 (+1)
Stage 2: Snap 1, Row 2 (+1)
Stage 3: Snap 2, Row 1 (Read)
Stage 4: Snap 2, Row 2 (Read)
[Figure: a packet traversing the four stages]