1
Understanding TCP Incast Throughput Collapse in Datacenter Networks
Offense: Carly Ho, Ning Xia
2
Offense Outline
Challenge the contributions
Challenge the methodology
Challenge the details
3
Weak Contribution: Repeated Work
Depends heavily on another paper
Reproduces the results of prior work
Uses others' workload
Unpolished tools and code
Uses others' Linux kernel modification
>> What did the authors do for this paper?
o Just change the parameters and repeat!
4
Weak Contribution: Topic Not Well Addressed
A "possible" explanation is that the switch buffer is a fundamentally shared resource.
However, switches and routers with large buffers are expensive, and even large buffers "may" be filled up quickly with ever higher speed links.
Some variables are inter-dependent with others, some variables "may" have no impact on goodput at all.
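The quoted claim that large buffers "may" fill quickly can be made concrete with simple arithmetic. The sketch below is an illustration, not something taken from the paper: it assumes a hypothetical worst case in which `n_senders` servers transmit at full line rate into a single output port that drains at the same line rate, so the buffer fills at (n_senders - 1) times the link rate. The function name and all parameter values are assumptions chosen for the example.

```python
def buffer_fill_time(buffer_bytes, n_senders, link_bps):
    """Seconds until an output-port buffer overflows.

    Assumption (hypothetical worst case): every sender transmits at
    line rate and the output port drains at line rate, so the excess
    arrival rate is (n_senders - 1) * link_bps.
    """
    if n_senders < 2:
        raise ValueError("the buffer only fills when at least 2 senders converge")
    excess_bps = (n_senders - 1) * link_bps
    return buffer_bytes * 8 / excess_bps


# Illustrative values only: 1 MB of buffer, 9 senders.
for rate in (1e9, 10e9):
    t = buffer_fill_time(1_000_000, 9, rate)
    print(f"{rate / 1e9:.0f} Gbps links: buffer fills in {t * 1e6:.0f} microseconds")
```

Under these assumptions a tenfold faster link fills the same buffer ten times sooner, which is the quantitative statement the hedged "may" leaves out.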
5
Weak Contribution: Workloads and Environment
The understanding of Incast should be evaluated under a wide variety of settings and environments.
o "we also plan to evaluate our mechanisms for different applications, environments, network topologies."
6
Weak Contribution: Other problems
Too small a minimum RTO can lead to spurious timeouts for wide-area network traffic [2]
Does not address the case where a large number of short-lived TCP bursts and non-TCP traffic might share the Ethernet fabric, causing severe unfairness to TCP traffic [1]
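The minimum-RTO concern can be illustrated with a small sketch of the standard smoothed-RTT estimator (SRTT and RTTVAR with gains 1/8 and 1/4, RTO = SRTT + 4 * RTTVAR, clamped from below by the minimum RTO). This is a deliberate simplification and not the kernel's implementation: it ignores clock granularity, retransmission backoff, and Karn's algorithm, and it simply counts RTT samples that exceed the RTO in force when they arrive. The function name and the sample values are hypothetical.

```python
def count_spurious_timeouts(rtt_samples, min_rto, initial_rto=1.0):
    """Count RTT samples (seconds) that exceed the current RTO.

    Simplified estimator: no clock granularity, no backoff, and every
    sample is fed to the estimator even if it would have timed out.
    """
    alpha, beta = 1 / 8, 1 / 4
    srtt = rttvar = None
    rto = max(min_rto, initial_rto)
    spurious = 0
    for r in rtt_samples:
        if r > rto:
            # The timer would fire although the segment was only delayed.
            spurious += 1
        if srtt is None:
            srtt, rttvar = r, r / 2
        else:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        rto = max(min_rto, srtt + 4 * rttvar)
    return spurious


# Hypothetical wide-area path: steady 50 ms RTT, then one 120 ms delay spike.
samples = [0.050] * 20 + [0.120]
print(count_spurious_timeouts(samples, min_rto=0.200))  # conventional 200 ms floor
print(count_spurious_timeouts(samples, min_rto=0.001))  # aggressive 1 ms floor
```

With the 200 ms floor the spike stays below the timer, while with the 1 ms floor the RTO has converged to just above 50 ms and the spike triggers a timeout even though nothing was lost.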
7
Methodology Problem: Model
What is the model for the variable-fragment workload?
o It's not really explained at all
Model is incomplete and therefore limited
Are you sure your model works for other networks?
o Only done on two different testbeds
8
Methodology Problem: Testbed
If the results were so different from previous work, the authors should have tested on a variety of other testbeds with different settings to see whether the results held
Verification of results not yet performed
9
Methodology Problem: Weakness of Quantitative Models
We want a statistical comparison of the measured and predicted results, rather than just a statement that the shapes of the curves are identical.
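As a sketch of the kind of statistical comparison being asked for, the snippet below computes three common goodness-of-fit measures between measured and predicted goodput: root-mean-square error, mean absolute percentage error, and the coefficient of determination (R²). The choice of metrics and the sample numbers are assumptions for illustration; they are not values from the paper.

```python
import math


def fit_metrics(measured, predicted):
    """Return (rmse, mape, r2) for predicted vs. measured values.

    MAPE assumes every measured value is non-zero.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mape = sum(abs(r) / abs(m) for r, m in zip(residuals, measured)) / n
    mean_m = sum(measured) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rmse, mape, r2


# Hypothetical goodput values in Mbps, for illustration only.
measured = [900.0, 400.0, 200.0, 250.0]
predicted = [880.0, 430.0, 210.0, 240.0]
rmse, mape, r2 = fit_metrics(measured, predicted)
print(f"RMSE = {rmse:.1f} Mbps, MAPE = {mape:.1%}, R^2 = {r2:.3f}")
```

Reporting numbers like these per experiment would let a reader judge how closely the model tracks the measurements, instead of relying on visual similarity of the curves.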
10
Methodology Problem: Measurement
What is the timeline reconstruction and analysis tool you built?
How can its correctness be guaranteed when the tools are not sufficiently polished to be released?
11
Details: What does this figure mean?
What does this mean?
What is the difference here?
Why are they highly similar?
12
Details: What does this figure mean?
Does this figure mean your prediction is wrong?
13
References
[1] V. S. Rajanna et al., XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platform
[2] T. Benson et al., The case for fine-grained traffic engineering in data centers
14
Thank you