Sharing is Caring in Datacenter Networks
A cloud computing discussion led by Justine
Techniques we learned today
– Randomization (VLB/ECMP) can work “pretty well” to avoid collisions in DC networks.
– For bulk computing applications like Dryad and MapReduce, optimizing transfers rather than individual flows makes jobs run much faster.
– Using a centralized controller
  – to schedule (Orchestra)
  – to route and reroute (Hedera)
  …can improve performance.
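To make the "random can work pretty well" point concrete, here is a minimal sketch of ECMP-style path selection: each flow is hashed onto one of several equal-cost paths, with no knowledge of load. The function names, the 5-tuple layout, and the use of SHA-256 are assumptions for illustration only; real switches use their own hardware hash functions.

```python
import hashlib

def ecmp_path(flow, num_paths):
    """Pick a path index by hashing the flow's 5-tuple.

    The choice is static and load-unaware: the same flow always maps to
    the same path, and two large flows may land on the same path.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()  # assumption: any uniform hash would do
    return int.from_bytes(digest[:8], "big") % num_paths

def path_loads(flows, num_paths):
    """Count how many flows the hash places on each path."""
    loads = [0] * num_paths
    for flow in flows:
        loads[ecmp_path(flow, num_paths)] += 1
    return loads

# Hypothetical flows: (src IP, dst IP, src port, dst port, protocol)
flows = [("10.0.0.1", "10.0.1.1", 40000 + i, 80, "tcp") for i in range(8)]
loads = path_loads(flows, 4)
print(loads)  # any entry above 2 means flows collided on that path
```

With many small flows the loads tend to even out. With a few large flows, a single collision can halve their throughput, which is the case a centralized rerouter such as Hedera targets.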
But let’s be careful about when they’re useful…
Bulk computing frameworks
– Flows are large
– Flows come in batches (transfers)
– Can typically modify endpoints
Front-end/Web services
– Flows are small
– Flows arise on demand
– Want flows to complete as quickly as possible
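One way to see why batches of flows (transfers) are worth scheduling as a unit: a job only proceeds once its whole transfer finishes. The sketch below compares equal sharing of one bottleneck link against serving transfers one at a time. It is a toy model under stated assumptions (a single link, all transfers start at time 0, hypothetical function names) and is not a description of Orchestra's actual policies.

```python
def fair_share_finish_times(sizes, capacity):
    """Finish times when all active transfers share the link equally.

    Results are returned in order of increasing transfer size.
    """
    now, served, finish = 0.0, 0.0, []
    active = len(sizes)
    for size in sorted(sizes):
        # Each active transfer gets capacity/active, so the smallest
        # remaining transfer needs (size - served) * active / capacity more time.
        now += (size - served) * active / capacity
        served = size
        finish.append(now)
        active -= 1
    return finish

def fifo_finish_times(sizes, capacity):
    """Finish times when transfers use the full link one after another."""
    now, finish = 0.0, []
    for size in sizes:
        now += size / capacity
        finish.append(now)
    return finish

def mean(values):
    return sum(values) / len(values)

sizes = [10.0, 10.0]  # two equal transfers, arbitrary units
shared = fair_share_finish_times(sizes, capacity=1.0)
serial = fifo_finish_times(sizes, capacity=1.0)
print(mean(shared), mean(serial))
```

In this toy case both transfers finish at time 20 under equal sharing, while serving them in turn finishes one at 10 and the other at 20. The last transfer is no later, and the average completion time drops.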
But let’s be careful about when they’re useful…
And what about multi-tenant environments, e.g. EC2?
– Traffic patterns are varied
– Fairness is very important
– Limited or no ability to change the end-host stack
What’s missing from the centralized schedulers in Hedera and Orchestra?
None of these papers focused on network topology – is topology even relevant?
Obligatory WIN/LOSE Table

                                  ECMP/VLB    Hedera    Orchestra
Bulk Transfer Completion Time
Long Flow Completion Time
Short Flow Completion Time
Fairness
Complexity in Network
Modification to End-Hosts