Networking the Cloud. Presenter: b, Electrical Engineering (third year), 姜慧如
Data centers hold tens to hundreds of thousands of servers and concurrently support a large number of distinct services. Economies of scale. Dynamically reallocate servers among services as workload patterns change. High utilization is needed. Agility!
Agility: any machine should be able to play any role. “Any service, any server.”
Ugly Secret: 30% utilization is considered good. Uneven application fit: -- Each server has CPU, memory, disk: most applications exhaust one resource, stranding the others. Long provisioning timescales: -- New servers purchased quarterly at best. Uncertainty in demand: -- Demand for a new service can spike quickly. Risk management: -- Not having spare servers to meet demand brings failure just when success is at hand. Session state and storage constraints: -- If the world were stateless servers….
Workload management -- Means for rapidly installing a service’s code on a server. Virtual machines, disk images. Storage management -- Means for a server to access persistent data. Distributed filesystems. Network -- Means for communicating with other servers, regardless of where they are in the data center. But: today’s data center networks prevent agility in several ways.
Static network assignment. Fragmentation of resources. Poor server-to-server connectivity. Traffic of different services affects each other. Poor reliability and utilization. [Figure: conventional hierarchical network with core routers (CR), access routers (AR), switches (S), and servers (A). Oversubscription ratios: 1:5, 1:80, 1:240. One group of servers: "I want more." Another: "I have spare ones, but…"]
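To make the oversubscription ratios on this slide concrete, here is a minimal arithmetic sketch. It assumes 1 Gbps server NICs (the slide does not state a NIC speed) and the worst case in which every server below an oversubscribed layer sends across that layer at once; the function name is hypothetical.

```python
def worst_case_share_mbps(nic_mbps: float, oversubscription: int) -> float:
    """Bandwidth one server can count on across a layer that is
    oversubscribed 1:N, when all servers below it send at full rate."""
    if oversubscription <= 0:
        raise ValueError("oversubscription must be a positive integer")
    return nic_mbps / oversubscription


NIC_MBPS = 1000  # assumption: 1 Gbps NICs, not stated on the slide

for ratio in (5, 80, 240):
    share = worst_case_share_mbps(NIC_MBPS, ratio)
    print(f"1:{ratio} -> {share:.1f} Mbps per server")
```

Under these assumptions, a server whose traffic must cross the 1:240 layer can rely on only a few Mbps, which is why spare servers in another part of the tree are hard to use.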
Achieve scale by assigning servers topologically related IP addresses and dividing servers among VLANs. Limited utility of VMs: a VM cannot migrate out of its original VLAN while keeping the same IP address. Fragmentation of address space. Reconfiguration is needed when servers are reassigned to different services.
1. L2 semantics. 2. Uniform high capacity. 3. Performance isolation. [Figure: network of core routers (CR), access routers (AR), switches (S), and servers (A).]
Developers want a mental model where all their servers, and only their servers, are plugged into an Ethernet switch. 1. Layer-2 semantics -- Flat addressing, so any server can have any IP Address. -- Server configuration is the same as in a LAN. -- VM keeps the same IP address even after migration 2. Uniform high capacity -- Capacity between servers limited only by their NICs. -- No need to consider topology when adding servers. 3. Performance isolation -- Traffic of one service should be unaffected by others.
Objective / Approach / Solution:
1. Layer-2 semantics / Employ flat addressing / Name-location separation & resolution service
2. Uniform high capacity between servers / Guarantee bandwidth for hose-model traffic / Flow-based random traffic indirection (Valiant LB)
3. Performance isolation / Enforce hose model using existing mechanisms only / TCP
“Hose”: each node has ingress/egress bandwidth constraints.
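The hose-model constraint can be illustrated with a small check. This is only a sketch: it assumes the traffic matrix is a square list of lists in Gbps and that every server has the same NIC rate; `fits_hose_model` is a hypothetical helper, not part of VL2.

```python
from typing import Sequence


def fits_hose_model(tm: Sequence[Sequence[float]], nic_gbps: float) -> bool:
    """Return True if no server sends or receives more than its NIC rate.

    tm[i][j] is the demand from server i to server j (assumed units: Gbps).
    The hose model constrains only per-node totals, not individual pairs.
    """
    n = len(tm)
    for i in range(n):
        egress = sum(tm[i][j] for j in range(n) if j != i)
        ingress = sum(tm[j][i] for j in range(n) if j != i)
        if egress > nic_gbps or ingress > nic_gbps:
            return False
    return True


balanced = [[0, 0.5, 0.5], [0.3, 0, 0.3], [0.2, 0.2, 0]]
hotspot = [[0, 0.8, 0], [0, 0, 0], [0, 0.8, 0]]  # server 1 receives 1.6 Gbps
print(fits_hose_model(balanced, nic_gbps=1.0))
print(fits_hose_model(hotspot, nic_gbps=1.0))
```

Any matrix that passes this check is the kind of traffic the network is asked to carry at full rate; the hotspot matrix fails because the receiver's NIC, not the network, is the limit.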
Ethernet switching (layer 2): cheaper switch equipment; fixed addresses and auto-configuration; seamless mobility, migration, and failover. IP routing (layer 3): scalability through hierarchical addressing; efficiency through shortest-path routing; multipath routing through equal-cost multipath. So, like in enterprises, data centers often connect layer-2 islands by IP routers.
Data-center traffic analysis: DC traffic != Internet traffic. The ratio of traffic between servers to traffic entering/leaving the data center is 4:1. Demand for bandwidth between servers is growing faster. The network is the bottleneck of computation. Flow distribution analysis: the majority of flows are small; the biggest flow size is 100MB. The distribution of internal flows is simpler and more uniform. A machine has about 10 concurrent flows 50% of the time, and more than 80 concurrent flows 5% of the time.
Traffic matrix analysis: poor summarizing of traffic patterns; instability of traffic patterns. Failure characteristics: pattern of networking equipment failures: 95% 10 days. No obvious way to eliminate all failures from the top of the hierarchy.
Flat addressing: allow service instances (e.g., virtual machines) to be placed anywhere in the network. Valiant Load Balancing: (randomly) spread network traffic uniformly across network paths. End-system based address resolution: scale to large server pools without introducing complexity to the network control plane.
Design principles: Randomizing to cope with volatility: ▪ Using Valiant Load Balancing (VLB) to do destination-independent traffic spreading across multiple intermediate nodes. Building on proven networking technology: ▪ Using IP routing and forwarding technologies available in commodity switches. Separating names from locators: ▪ Using a directory system to maintain the mapping between names and locations. Embracing end systems: ▪ A VL2 agent at each server.
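[Figure: Clos topology with intermediate (Int) switches, K aggregation (Aggr) switches with D ports each, and ToR switches with 20 servers each; 20*(DK/4) servers in total.] Offers huge aggregate capacity and multiple paths at modest cost.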
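The 20*(DK/4) figure can be reproduced with a short calculation. This sketch assumes each aggregation switch uses half of its D ports toward ToRs and that each ToR has two uplinks, which is one reading consistent with the DK/4 term; the function and parameter names are hypothetical.

```python
def clos_capacity(d_ports: int, k_aggr: int,
                  servers_per_tor: int = 20, tor_uplinks: int = 2):
    """Return (number of ToR switches, number of servers).

    Assumptions (not spelled out on the slide): half of each aggregation
    switch's ports face down toward ToRs, and each ToR connects to
    `tor_uplinks` aggregation switches.
    """
    down_ports = k_aggr * d_ports // 2   # aggregation ports facing ToRs
    tors = down_ports // tor_uplinks     # equals D*K/4 when tor_uplinks == 2
    return tors, tors * servers_per_tor


tors, servers = clos_capacity(d_ports=144, k_aggr=144)
print(tors, servers)
```

With D = K = 144 this gives 5,184 ToRs and 103,680 servers, matching 20*(144*144/4).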
Cope with arbitrary TMs with very little overhead. 1. Must spread traffic. 2. Must ensure dst independence. [ECMP + IP Anycast]: Harness huge bisection bandwidth. Obviate esoteric traffic engineering or optimization. Ensure robustness to failures. Work with switch mechanisms available today. Equal Cost Multi Path Forwarding. [Figure: ToR switches T1 to T6; packets carrying payloads for y and z with headers T3, T5, and I ANY; links used for up paths and links used for down paths are marked.]
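The idea of spreading flows across intermediate switches can be sketched as a hash over the flow's 5-tuple. This is an illustration only: real switches use their own ECMP hash in hardware, and the SHA-256 hash, the tuple layout, and the switch names `I1` to `I4` are assumptions made for the example.

```python
import hashlib
from typing import Sequence, Tuple

Flow = Tuple[str, str, str, int, int]  # (src_ip, dst_ip, proto, src_port, dst_port)


def pick_intermediate(flow: Flow, intermediates: Sequence[str]) -> str:
    """Pick one intermediate switch per flow.

    All packets of a flow hash to the same switch (no reordering), while
    different flows land on effectively random switches.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(intermediates)
    return intermediates[index]


switches = ["I1", "I2", "I3", "I4"]  # hypothetical intermediate switches
flows = [("10.0.0.1", "10.0.1.9", "tcp", 40000 + i, 80) for i in range(1000)]
counts = {name: 0 for name in switches}
for f in flows:
    counts[pick_intermediate(f, switches)] += 1
print(counts)
```

Because the choice depends only on the flow identity and not on the current traffic matrix, no traffic engineering is needed when demand shifts.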
How smart servers use dumb switches: encapsulation. Commodity switches have simple forwarding primitives. Complexity is moved to the servers, which compute the headers.
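A minimal sketch of the encapsulation idea follows: the sending server wraps the payload in nested headers, and each switch only forwards on the outermost destination and strips it. The `Packet` class, the helper names, and the address labels (`I_ANY`, `T5`, `z`) are assumptions for illustration, not the actual VL2 packet format.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Packet:
    dst: str                          # destination in the outermost header
    payload: Union[bytes, "Packet"]   # raw data or an encapsulated packet


def encapsulate(data: bytes, dst_server: str, dst_tor: str,
                intermediate: str) -> Packet:
    """Server-side work: build all headers before the packet leaves the host."""
    inner = Packet(dst=dst_server, payload=data)   # flat server name
    to_tor = Packet(dst=dst_tor, payload=inner)    # destination ToR
    return Packet(dst=intermediate, payload=to_tor)  # bounce via intermediate


def decapsulate(packet: Packet) -> Packet:
    """Switch-side work: strip the outer header and forward what is inside."""
    if not isinstance(packet.payload, Packet):
        raise ValueError("packet has no outer header to strip")
    return packet.payload


pkt = encapsulate(b"hello", dst_server="z", dst_tor="T5", intermediate="I_ANY")
at_intermediate = decapsulate(pkt)       # intermediate switch strips I_ANY
at_tor = decapsulate(at_intermediate)    # ToR strips T5 and delivers to z
print(pkt.dst, at_intermediate.dst, at_tor.dst, at_tor.payload)
```

The switches in this sketch never look up the server name; they only need routes to other switches.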
[Figure: directory system with agents, Directory Servers (DS), and RSM servers. “Lookup”: 1. Lookup, 2. Reply. “Update”: 1. Update, 2. Set, 3. Replicate, 4. Ack, 5. Ack, (6. Disseminate).]
Servers use flat names. Switches run link-state routing and maintain only switch-level topology. [Figure: x sends payloads to y and z; a lookup and response with the directory service returns the mapping from AAs to LAs, and packets are labeled with the destination ToR. One table: x → ToR 2, y → ToR 3, z → ToR 4. Another table: x → ToR 2, y → ToR 3, z → ToR 3.]
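The name-to-location mapping on this slide can be sketched as a small in-memory directory. This is a toy model under stated assumptions: the `Directory` class and its methods are hypothetical, there is no replication or caching, and reading the slide's second table as "z has moved to ToR 3" is an interpretation.

```python
class Directory:
    """Toy directory mapping flat server names (AAs) to ToR locators (LAs)."""

    def __init__(self) -> None:
        self._aa_to_la = {}

    def update(self, aa: str, tor_la: str) -> None:
        """Register a server, or record its new ToR after it moves."""
        self._aa_to_la[aa] = tor_la

    def lookup(self, aa: str) -> str:
        """Return the ToR currently hosting the server; KeyError if unknown."""
        return self._aa_to_la[aa]


directory = Directory()
for name, tor in (("x", "ToR 2"), ("y", "ToR 3"), ("z", "ToR 4")):
    directory.update(name, tor)

before = directory.lookup("z")
directory.update("z", "ToR 3")   # assumed migration; z keeps its name
after = directory.lookup("z")
print(before, "->", after)
```

The server's name never changes; only the directory entry does, which is what lets a VM keep its address while moving between racks.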
Data center OSes are already heavily modified for VMs, storage clouds, etc. No change to apps or to clients outside the DC.
Uniform high capacity: All-to-all data shuffle stress test: ▪ 75 servers, deliver 500MB ▪ Maximal achievable goodput is 62.3 ▪ VL2 network efficiency is 58.8/62.3 = 94%
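The efficiency figure is simply measured goodput divided by the maximal achievable goodput. A minimal sketch of that arithmetic, using the two numbers from the slide (the function name is hypothetical, and both values are taken to be in the same unit):

```python
def network_efficiency(measured_goodput: float, max_goodput: float) -> float:
    """Fraction of the maximal achievable goodput that was actually delivered."""
    if max_goodput <= 0:
        raise ValueError("max_goodput must be positive")
    return measured_goodput / max_goodput


efficiency = network_efficiency(58.8, 62.3)
print(f"{efficiency:.1%}")  # prints 94.4%
```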
Performance isolation: Two types of services: ▪ Service one: 18 servers each do a single TCP transfer all the time ▪ Service two: 19 servers start an 8GB transfer over TCP every 2 seconds ▪ Service two: 19 servers burst short TCP connections
Convergence after link failures: 75 servers, all-to-all data shuffle. Disconnect links between intermediate and aggregation switches.