
1 Data Center Networks
CS 401/601 Computer Network Systems, Mehmet Gunes
Slides modified from: Mohammad Alizadeh, Albert Greenberg, Changhoon Kim, Srinivasan Seshan

2 What are Data Centers?
Large facilities with 10s of thousands of networked servers
Compute, storage, and networking working in concert: "Warehouse-Scale Computers"
Huge investment: ~$0.5 billion for a large datacenter

3 Data Center Costs
Amortized cost*  Component             Sub-components
~45%             Servers               CPU, memory, disk
~25%             Power infrastructure  UPS, cooling, power distribution
~15%             Power draw            Electrical utility costs
~15%             Network               Switches, links, transit
The Cost of a Cloud: Research Problems in Data Center Networks. Greenberg, Hamilton, Maltz, Patel. SIGCOMM CCR 2009.
*3-yr amortization for servers, 15-yr for infrastructure; 5% cost of money

4 Server Costs
30% utilization is considered "good" in most data centers! Why?
Uneven application fit: each server has CPU, memory, and disk, but most applications exhaust one resource and strand the others
Uncertainty in demand: demand for a new service can spike quickly
Risk management: not having spare servers to meet demand brings failure just when success is at hand

5 Goal: Agility – Any service, Any Server
Turn the servers into a single, large, fungible pool
Dynamically expand and contract the service footprint as needed
Benefits:
lower cost (higher utilization)
increased developer productivity (services can be deployed much faster)
high performance and reliability

6 Datacenter Networks
10,000s of ports connecting compute and storage (disk, flash, …)
Provide the illusion of "One Big Switch"

7 Datacenter Traffic Growth
Today: Petabits/s of traffic inside one DC, more than the core of the Internet!
Source: "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network", SIGCOMM 2015.

8 Latency is King
Traditional application: app logic and data structures live on a single machine, << 1 µs latency
Large-scale web application (e.g., rendering Alice's page: who does she know? what has she done? pics, apps, videos): app tier and data tier communicate over the data center fabric, 10 µs-1 ms latency
1 user request → 1000s of messages over the DC network
Microseconds of latency matter, even at the tail (e.g., the 99.9th percentile)
Based on a slide by John Ousterhout (Stanford)

9 Datacenter Arms Race
Amazon, Google, Microsoft, Yahoo!, … race to build next-gen mega-datacenters
Industrial-scale Information Technology: 100,000+ servers
Located where land, water, fiber-optic connectivity, and cheap power are available

10 Computers + Net + Storage + Power + Cooling

11 DC Networks
Conventional hierarchy: Internet → Core Routers (CR, L3) → Access Routers (AR, L3) → Ethernet Switches (S, L2) → racks of application servers (A)
~1,000 servers per pod == one IP subnet
L2 pros and cons? L3 pros and cons?
Reference: "Data Center: Load Balancing Data Center Services", Cisco 2004

12 Reminder: Layer 2 vs. Layer 3
Ethernet switching (layer 2):
fixed IP addresses and auto-configuration (plug & play)
seamless mobility, migration, and failover
but broadcast limits scale (ARP)
and no multipath (Spanning Tree Protocol)
IP routing (layer 3):
scalability through hierarchical addressing
multipath routing through equal-cost multipath (ECMP)
but can't migrate without changing the IP address
and complex configuration

13 Layer 2 vs. Layer 3 for Data Centers

14 Data center networks
Load balancer: application-layer routing
receives external client requests
directs workload within the data center
returns results to the external client (hiding data center internals from the client)
(Figure: Internet, border router, access router, load balancers, tier-1 and tier-2 switches, TOR switches, server racks 1-8)
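To make that division of labor concrete, here is a minimal Python sketch, purely illustrative and not any production balancer, of round-robin application-layer dispatch: the external client talks only to the balancer, which forwards each request to an internal backend of its choosing. The backend addresses and the `_forward` stub are invented for this example.

```python
# Minimal sketch (not any vendor's implementation): an application-layer
# load balancer that hides internal server addresses from external clients.
import itertools

class LoadBalancer:
    def __init__(self, backends):
        # backends: internal addresses of servers behind the balancer (made up here)
        self._backends = itertools.cycle(backends)

    def handle(self, request):
        backend = next(self._backends)        # pick an internal server (round robin)
        return self._forward(backend, request)  # client never sees 'backend'

    def _forward(self, backend, request):
        # Placeholder for the real proxying step (e.g., an internal HTTP call).
        return f"response to {request!r} computed by {backend}"

lb = LoadBalancer(["10.0.1.11:8080", "10.0.1.12:8080", "10.0.1.13:8080"])
print(lb.handle("GET /profile"))
print(lb.handle("GET /feed"))
```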

15 Scaling a LAN
Self-learning Ethernet switches work great at small scales, but buckle at larger scales:
broadcast overhead of self-learning is linear in the total number of interfaces
broadcast storms are possible in non-tree topologies
Goals:
scalability to a very large number of machines
isolation of unwanted traffic from unrelated subnets
ability to accommodate general types of workloads (Web, database, MapReduce, scientific computing, etc.)

16 Data center networks
Rich interconnection among switches and racks:
increased throughput between racks (multiple routing paths possible)
increased reliability via redundancy
(Figure: server racks 1-8, TOR switches, tier-2 switches, tier-1 switches with redundant links)

17 Broad questions
How are massive numbers of commodity machines networked inside a data center?
Virtualization: how to effectively manage physical machine resources across client virtual machines?
Operational costs: server equipment, power and cooling

18 Data Center Network

19-22 Hierarchical Addresses (figure-only slides)

23 PortLand: Location Discovery Protocol
Location Discovery Messages (LDMs) are exchanged between neighboring switches
Switches self-discover their location on boot-up
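As a rough illustration of the self-discovery idea, and deliberately a simplification rather than the actual PortLand protocol (real LDMs also carry fields such as switch identifier, pod number, position, and up/down direction), a switch could infer its tree level from what it hears on its ports. The observation labels below are invented for this sketch.

```python
# Simplified illustration only, not the actual PortLand LDM exchange:
# a switch guesses its tree level from what it has heard on each port.
def infer_level(port_observations):
    """port_observations: list with 'host', 'edge', 'aggregation', or None
    per port (None = nothing heard on that port yet)."""
    if "host" in port_observations:
        return "edge"            # directly attached end hosts => edge switch
    if "edge" in port_observations:
        return "aggregation"     # hears edge switches below => aggregation
    if "aggregation" in port_observations:
        return "core"            # hears only aggregation switches => core
    return "unknown"             # keep exchanging LDMs until neighbors respond

print(infer_level(["host", "host", None, "aggregation"]))  # -> edge
print(infer_level(["edge", "edge", None, None]))           # -> aggregation
```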

24 Data Center Packet Transport
Large purpose-built DCs: huge investment in R&D and business
Transport inside the DC: TCP rules (99.9% of traffic)

25 TCP in the Data Center
TCP does not meet the demands of data center applications:
suffers from bursty packet drops, incast, …
builds up large queues, adding significant latency and wasting precious buffers (especially bad with shallow-buffered switches)
Operators work around TCP's problems with ad-hoc, inefficient, often expensive solutions, with no solid understanding of the consequences and tradeoffs

26 Partition/Aggregate Application Structure
A request fans out from a top-level aggregator (TLA) to mid-level aggregators (MLAs) to worker nodes, and partial answers are aggregated back up
Tight per-tier deadlines, e.g. 250 ms at the TLA, 50 ms at the MLAs, 10 ms at the workers
Time is money: strict deadlines (SLAs); a missed deadline means a lower-quality result
(Figure example: assembling Picasso quotes for a search query, e.g. "Art is a lie that makes us realize the truth", "Bad artists copy. Good artists steal.")

27 Generality of Partition/Aggregate
The foundation of many large-scale web applications: web search, social-network composition, ad selection, etc.
Example: Facebook. Partition/Aggregate ~ multiget; the aggregators are web servers and the workers are memcached servers, speaking the memcached protocol over the internal network.
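As a hedged sketch of this pattern (not Facebook's actual code; the worker names and the fetch stub are invented), the partition step hashes keys across workers and the aggregate step merges the partial results fetched in parallel:

```python
# Sketch of partition/aggregate fan-out/fan-in, in the spirit of a web
# server issuing a multiget to many cache servers.
from concurrent.futures import ThreadPoolExecutor

WORKERS = ["cache-1", "cache-2", "cache-3", "cache-4"]  # hypothetical workers

def fetch_partition(worker, keys):
    # Stand-in for a real multiget RPC to one memcached-style server.
    return {k: f"value-of-{k}-from-{worker}" for k in keys}

def multiget(keys):
    # Partition: assign each key to a worker (simple hashing for illustration).
    partitions = {w: [] for w in WORKERS}
    for k in keys:
        partitions[WORKERS[hash(k) % len(WORKERS)]].append(k)
    # Aggregate: query the workers in parallel and merge the partial results.
    results = {}
    with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
        futures = [pool.submit(fetch_partition, w, ks)
                   for w, ks in partitions.items() if ks]
        for f in futures:
            results.update(f.result())
    return results

print(multiget(["user:42", "user:7", "post:99"]))
```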

28 Workloads
Partition/Aggregate (query): delay-sensitive
Short messages, 50 KB-1 MB (coordination, control state): delay-sensitive
Large flows, 1 MB-50 MB (data update): throughput-sensitive

29 Tension Between Requirements
We want high burst tolerance, low latency, and high throughput at the same time
Deep buffers: bad for latency (queuing delays increase latency)
Shallow buffers: bad for bursts & throughput
Reduced RTO_min: doesn't help latency
AQM (e.g., RED on the average queue): difficult to tune, not fast enough for incast-style micro-bursts, loses throughput at low statistical multiplexing
Objective: low queue occupancy & high throughput

30 Review: The TCP/ECN Control Loop
Senders → switch → receiver: a congested switch sets a 1-bit ECN mark (ECN = Explicit Congestion Notification) on packets, and the receiver echoes the marks back to the senders
DCTCP is based on this existing Explicit Congestion Notification framework in TCP

31 Two Key Ideas
1. React in proportion to the extent of congestion, not its presence: reduces variance in sending rates, lowering queuing requirements
2. Mark based on instantaneous queue length: fast feedback to better deal with bursts
ECN marks: standard TCP cuts its window by 50% on any mark; DCTCP cuts by ~40% when most of a window is marked but by only ~5% when few packets are marked
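The sender-side rule behind these numbers fits in a few lines. The sketch below follows the update from the DCTCP paper, α ← (1 − g)·α + g·F with the window cut by a factor of α/2, where F is the fraction of ECN-marked packets in the last window; the gain g = 1/16 and the example traffic are chosen only for illustration.

```python
# Sketch of the DCTCP sender reaction: estimate the fraction of marked
# packets and cut the window in proportion, instead of TCP's fixed 50% cut.
class DctcpSender:
    def __init__(self, cwnd=100.0, g=1.0 / 16):
        self.cwnd = cwnd      # congestion window (packets)
        self.alpha = 0.0      # running estimate of the fraction of marked packets
        self.g = g            # EWMA gain (1/16 is a commonly used value)

    def on_window_acked(self, acked, marked):
        f = marked / acked if acked else 0.0             # fraction marked this window
        self.alpha = (1 - self.g) * self.alpha + self.g * f
        if marked:                                       # react in proportion to alpha
            self.cwnd *= (1 - self.alpha / 2)

s = DctcpSender()
for _ in range(20):                      # heavy marking: alpha rises toward 1,
    s.on_window_acked(acked=100, marked=100)  # so the cut approaches TCP's 50%
print(round(s.cwnd, 1), round(s.alpha, 2))

s2 = DctcpSender()
s2.on_window_acked(acked=100, marked=5)  # light marking: only a tiny window cut
print(round(s2.cwnd, 1), round(s2.alpha, 3))
```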

32 DCTCP in Action
Setup: Windows 7 hosts, Broadcom 1 Gbps switch (real hardware, not ns-2)
Scenario: 2 long-lived flows, marking threshold K = 30 KB
(Figure: switch queue length in KBytes over time)

33 Why it Works
1. High burst tolerance: large buffer headroom → bursts fit; aggressive marking → sources react before packets are dropped
2. Low latency: small buffer occupancies → low queuing delay
3. High throughput: ECN averaging → smooth rate adjustments, low variance

34 Current solutions for increasing data center network bandwidth
FatTree, BCube
Data-intensive applications operating on large volumes of data have motivated a lot of research on data center networking. The basic problem is that traditional tree-structured Ethernet networks are heavily over-subscribed when large amounts of data are shuffled across server racks. Recent proposals construct full-bisection-bandwidth networks out of commodity packet switches; full bisection bandwidth means every server can talk to any other server at its full NIC rate. This is a nice property, but such networks are quite complicated: they require many wires, must follow restrictive construction rules, and once built are hard to expand because expansion requires major rewiring.
1. Hard to construct
2. Hard to expand

35 Fat-Tree Inter-connect racks (of servers) using a fat-tree topology
Fat-Tree: a special type of Clos network (after C. Clos)
k-ary fat tree: three-layer topology (edge, aggregation, and core)
each pod consists of (k/2)² servers & 2 layers of k/2 k-port switches
each edge switch connects to k/2 servers & k/2 aggregation switches
each aggregation switch connects to k/2 edge & k/2 core switches
(k/2)² core switches: each connects to k pods
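As a quick sanity check on these counts, the illustrative helper below (not from the fat-tree paper) computes the component totals for a k-ary fat tree; k = 4 matches the figure on the next slide, and k = 48 shows the scale reachable with commodity 48-port switches.

```python
# Quick check of the k-ary fat-tree arithmetic above (assumes even k).
def fat_tree_sizes(k):
    servers_per_pod = (k // 2) ** 2
    return {
        "pods": k,
        "servers_per_pod": servers_per_pod,
        "edge_switches": k * (k // 2),
        "aggregation_switches": k * (k // 2),
        "core_switches": (k // 2) ** 2,
        "servers": k * servers_per_pod,   # = k^3 / 4
    }

print(fat_tree_sizes(4))    # the k = 4 example on the next slide: 16 servers
print(fat_tree_sizes(48))   # 48-port commodity switches: 27,648 servers
```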

36 Fat-Tree Fat-tree with K=4

37 Why Fat-Tree?
A fat tree has identical bandwidth at any bisection
Each layer has the same aggregate bandwidth
Can be built using cheap devices with uniform capacity: each port supports the same speed as an end host
All devices can transmit at line speed if packets are distributed uniformly along the available paths
Great scalability: a fat tree of k-port switches supports k³/4 servers
(Figure: a fat-tree network supporting 54 hosts, i.e. k = 6 by the formula above)

