EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud
Vimal, Mohammad Alizadeh, Balaji Prabhakar, David Mazières, Changhoon Kim, Albert Greenberg
What are we depending on?
From Netflix’s blog post “5 Lessons We’ve Learned Using AWS”: “… in the Netflix data centers, we have a high capacity, super fast, highly reliable network. This has afforded us the luxury of designing around chatty APIs to remote systems. AWS networking has more variable latency.” Netflix had to overhaul its apps to deal with this variability. Many customers don’t even realise they have network issues: they just “spin up more VMs!”, which makes the app even more network dependent.
Cloud: Warehouse Scale Computer (6/11/12)
Multi-tenancy: to increase cluster utilisation.
Provisioning the warehouse: CPU, memory, disk, and the network.
Sharing the Network
Policy: the sharing model. Mechanism: computing rates and enforcing them on entities, per-VM (multi-tenant) or per-service (search, map-reduce, etc.).
Can we achieve this? A VM is provisioned as 2 GHz VCPU, 15 GB memory, 1 Gbps network. Each tenant (X, Y, …) gets a virtual switch connecting its VMs (VM1, VM2, VM3, …). Customer X specifies only the thickness of each VM’s pipe; no traffic matrix is needed (the Hose Model).
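The hose model above can be sketched as a per-VM guarantee plus a simple placement check (a minimal sketch; the struct and function names are hypothetical, not EyeQ’s actual code):

```c
#include <assert.h>
#include <stdint.h>

/* Hose model sketch: each VM has one "pipe thickness" (a minimum
 * bandwidth to the fabric); there is no pairwise traffic matrix. */
struct vm_hose {
    uint32_t vm_id;
    uint64_t min_bw_bps;   /* guaranteed bandwidth, e.g. 1 Gbps */
};

/* A set of VMs fits on a server only if the sum of their
 * guarantees does not exceed the server's NIC capacity. */
int hose_fits(const struct vm_hose *vms, int n, uint64_t nic_bps)
{
    uint64_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += vms[i].min_bw_bps;
    return sum <= nic_bps;
}
```

Because the hose model specifies only per-VM totals, admission control stays a one-dimensional sum rather than an all-pairs matrix check.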
Why is it hard? (1)
Bandwidth demands can be random and bursty, with short requests lasting a few milliseconds (10–100 KB) alongside large transfers (10–100 MB). Timescales matter: guarantees are needed on the order of a few RTTs (milliseconds). The default policy is insufficient: 1 vs. many TCP flows, UDP, etc. Traditional QoS mechanisms scale poorly.
Seconds: Eternity
(Figure: a long-lived TCP flow shares a 10G pipe through a switch with a bursty UDP session that is ON for 5 ms and OFF for 15 ms.)
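The arithmetic behind why milliseconds matter here (a worked example, assuming the UDP source bursts at the full 10 Gbps line rate):

```c
/* An ON/OFF source at 10 Gbps line rate, ON 5 ms / OFF 15 ms: its
 * average rate looks modest, but each burst is large compared with
 * typical shallow switch buffers, so the long-lived TCP flow suffers
 * even though the link is far from saturated on average. */
double onoff_avg_rate_bps(double line_bps, double on_s, double off_s)
{
    return line_bps * on_s / (on_s + off_s);   /* 2.5 Gbps here */
}

double onoff_burst_bytes(double line_bps, double on_s)
{
    return line_bps * on_s / 8.0;              /* ~6.25 MB per burst */
}
```

Second-granularity monitoring sees only the harmless 2.5 Gbps average; the damage happens inside each 5 ms burst.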
Under the hood
(Figure: inside the switch.)
Why is it hard? (2)
The switch sees contention, but lacks per-VM state. The receiver host has VM state, but does not see contention. (1) Drops happen in the network, so servers don’t see the true demand. (2) TCP’s back-off makes the true demand even harder to detect.
Key Idea: Bandwidth Headroom
Bandwidth guarantees reduce to managing congestion. Congestion means link utilisation reaching 100%, at millisecond timescales. So don’t allow 100% utilisation: keep 10% headroom for early detection at the receiver. On a single switch with an N x 10G shared pipe carrying TCP and UDP, limit utilisation to 9G. Headroom works for a single switch; what about a network?
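One way to realise the 10% headroom at a receiver is a rate update that treats 90% utilisation as the target (a minimal sketch; the alpha gain and the exact update form are illustrative assumptions, not EyeQ’s published algorithm):

```c
/* Receiver-side rate computation with 10% headroom (sketch).
 * Crossing 90% utilisation is treated as early congestion,
 * before queues build and packets drop. */
double headroom_update_rate(double rate_bps, double measured_bps, double link_bps)
{
    const double headroom = 0.10;
    const double alpha = 0.5;                      /* illustrative gain */
    double target = (1.0 - headroom) * link_bps;   /* 9G on a 10G link */

    /* Increase the advertised rate when below target, decrease above. */
    double next = rate_bps * (1.0 - alpha * (measured_bps - target) / target);
    if (next > target) next = target;   /* never advertise past headroom */
    if (next < 1e6)    next = 1e6;      /* floor at 1 Mbps */
    return next;
}
```

The headroom is what buys early detection: the receiver reacts when utilisation crosses 90%, rather than waiting for drops at 100%.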
Network design: the old
Over-subscription.
Network design: the new
(1) Uniform capacity across racks. (2) Over-subscription only at the Top-of-Rack.
Mitigating Congestion in a Network
Load balancing + admissibility = a hotspot-free network core [VL2, FatTree, Hedera, MicroTE]. If the aggregate rate into a server’s 10 Gbps pipe exceeds 10 Gbps, the fabric gets congested; if it stays below 10 Gbps, the fabric is congestion free. Load balancing: ECMP, etc. Admissibility: end-to-end congestion control (EyeQ).
EyeQ Platform
At the sender, the software VSwitch applies adaptive rate limiters to TX packets from untrusted VMs (e.g. 3 Gbps and 6 Gbps limits). At the receiver, the software VSwitch runs congestion detectors on RX packets. The RX component detects congestion and sends feedback across the datacentre fabric; the TX component reacts. This gives end-to-end flow control, VSwitch to VSwitch.
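The TX side of this loop can be sketched as a token-bucket rate limiter whose refill rate is set by the receiver’s congestion feedback (a minimal sketch under assumed names; EyeQ’s actual kernel-module code differs):

```c
/* Adaptive rate limiter in the sending VSwitch (sketch). The RX
 * congestion detector's feedback sets rate_bps; packets that exceed
 * the budget stay queued in the VSwitch instead of entering the fabric. */
struct eyeq_limiter {
    double rate_bps;        /* current rate from RX feedback */
    double tokens_bits;     /* accumulated sending budget */
    double max_burst_bits;  /* cap so idle time can't bank a huge burst */
};

void limiter_set_rate(struct eyeq_limiter *rl, double feedback_bps)
{
    rl->rate_bps = feedback_bps;
}

/* Refill for the elapsed time, then decide: transmit now or hold? */
int limiter_try_send(struct eyeq_limiter *rl, double elapsed_s, int pkt_bytes)
{
    rl->tokens_bits += rl->rate_bps * elapsed_s;
    if (rl->tokens_bits > rl->max_burst_bits)
        rl->tokens_bits = rl->max_burst_bits;

    double need = pkt_bytes * 8.0;
    if (rl->tokens_bits >= need) {
        rl->tokens_bits -= need;
        return 1;   /* transmit */
    }
    return 0;       /* hold in the VSwitch queue */
}
```

Keeping this state only in the edge VSwitches is what lets the fabric stay simple: switches need no per-VM queues.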
Does it work?
Without EyeQ vs. with EyeQ: EyeQ improves utilisation and provides protection (TCP: 6 Gbps, UDP: 3 Gbps).
State: only at edge
EyeQ makes the network behave like one big switch.
Thanks!
EyeQ: load balancing + bandwidth headroom + admissibility at millisecond timescales = the network as one big switch = bandwidth sharing at the edge. Linux and Windows implementations for 10 Gbps; ~1700 lines of C code (Linux kernel module). No documentation, yet.