Understanding TCP Incast Throughput Collapse in Datacenter Networks. Offense: Carly Ho Ning Xia.

Offense Outline
Challenge the contributions
Challenge the methodology
Challenge the details

Weak Contribution: Repeated Work
Depends heavily on another paper
Reproduces the results of prior work
Uses others' workload
Unpolished tools and code
Uses others' Linux kernel modification
>> What did the authors do for this paper?
  o Just changed the parameters and repeated the experiments!

Weak Contribution: Topic Not Well Addressed
A "possible" explanation is that the switch buffer is a fundamentally shared resource.
However, switches and routers with large buffers are expensive, and even large buffers "may" be filled up quickly with ever higher speed links.
Some variables are interdependent with others; some variables "may" have no impact on goodput at all.

Weak Contribution: Workloads and Environment
The understanding of incast should be evaluated under a wide variety of settings and environments.
  o "we also plan to evaluate our mechanisms for different applications, environments, network topologies."

Weak Contribution: Other Problems
Too small a minimum RTO can lead to spurious timeouts for wide-area network traffic [2]
Does not address the case where a large number of short-lived TCP bursts and non-TCP traffic might share the Ethernet fabric, causing severe unfairness to TCP traffic [1]
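
The minimum-RTO concern can be illustrated with a simplified sketch of the standard retransmission-timer estimator (RFC 6298: smoothed RTT plus four times the RTT variance, clamped to a minimum). The function name `count_spurious_timeouts`, the 1 ms granularity, and the RTT samples are assumptions made for illustration; the sketch ignores timer backoff and Karn's algorithm and is not taken from the paper under discussion.

```python
def count_spurious_timeouts(rtt_samples_ms, min_rto_ms, granularity_ms=1.0):
    """Count RTT samples that arrive after the retransmission timer would fire.

    Simplified RFC 6298 estimator; no backoff, no Karn's algorithm.
    """
    alpha, beta = 1 / 8, 1 / 4           # gains recommended by RFC 6298
    srtt = rttvar = rto = None
    spurious = 0
    for rtt in rtt_samples_ms:
        if srtt is None:
            # First measurement initialises the estimator.
            srtt, rttvar = rtt, rtt / 2
        else:
            # The ACK took longer than the current RTO: the sender would
            # already have retransmitted a segment that was not lost.
            if rtt > rto:
                spurious += 1
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - rtt)
            srtt = (1 - alpha) * srtt + alpha * rtt
        rto = max(min_rto_ms, srtt + max(granularity_ms, 4 * rttvar))
    return spurious


# Hypothetical wide-area path: a steady 100 ms RTT, then one 180 ms spike.
samples = [100.0] * 21 + [180.0]
print(count_spurious_timeouts(samples, min_rto_ms=1.0))    # 1
print(count_spurious_timeouts(samples, min_rto_ms=200.0))  # 0
```

With a steady RTT the variance term decays, so a very small minimum leaves the timer only slightly above the smoothed RTT and a single delay spike triggers a needless retransmission; the 200 ms floor absorbs the same spike.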

Methodology Problem: Model
What is the model for the variable-fragment workload?
  o It is not really explained at all
The model is incomplete and therefore limited
Are you sure your model works for other networks?
  o It was only tested on two different testbeds

Methodology Problem: Testbed
If the results were so different from previous work, the authors should have tested them on a variety of other testbeds with different settings to see whether the results held
Verification of the results has not yet been performed

Methodology Problem: Weakness of Quantitative Models
We want statistical measures of how well the measured and predicted results agree, rather than just a statement that the shapes of the curves are identical.
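
One way to put numbers on the agreement between measured and predicted curves is to report standard error statistics. The sketch below is a minimal example, assuming paired goodput samples taken at the same sender counts; the function name `fit_statistics` and the sample values are hypothetical and do not come from the paper.

```python
import math


def fit_statistics(measured, predicted):
    """Return RMSE, mean absolute percentage error, and R^2 for paired samples.

    Measured values must be non-zero for the percentage error to be defined.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mape = 100.0 * sum(abs(r) / abs(m) for r, m in zip(residuals, measured)) / n
    mean_m = sum(measured) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    # R^2 is undefined when the measured values do not vary at all.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "mape_percent": mape, "r2": r2}


# Hypothetical goodput values in Mbps at increasing sender counts.
measured = [900.0, 650.0, 300.0, 150.0]
predicted = [880.0, 700.0, 280.0, 170.0]
print(fit_statistics(measured, predicted))
```

Two curves can have the same shape while differing substantially in magnitude, so reporting statistics like these alongside the plots makes the quality of the fit checkable.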

Methodology Problem: Measurement
What is the timeline reconstruction and analysis tool you built?
How can its correctness be guaranteed when the tools are not sufficiently polished to be released?

Details: What does this figure mean?
What does this mean?
What is the difference here?
Why are they so similar?

Details: What does this figure mean?
Does this figure mean your prediction is wrong?

References
[1] V. S. Rajanna et al., XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platform
[2] T. Benson et al., The case for fine-grained traffic engineering in data centers

Thank you