1
CIT 668: System Architecture
Data Centers II
2
Topics
Containers
Data Center Network
Reliability
Economics
3
Containers
4
Containers
Data center in a shipping container.
  4-10X normal data center density.
  1000s of servers.
  100s of kW of power.
Advantages
  Efficient cooling
  High server density
  Rapid deployment
  Scalability
Vendors: HP, IBM, Rackable, Sun, Verari
[Images: vendor offerings]
5
Microsoft Chicago Data Center
6
Google Container Patents
[Figure: Containers docked at central power spine]
[Figure: Vertical stack of containers]
[Figure: Container air flow diagram, with a center cold aisle and hot air return behind servers]
7
Data Center Networks
8
Performance Impact of App Living Across
[Figure: hierarchy of Data Center, Racks, Servers, Processors, Cores]
Bandwidth increases moving toward the cores.
Latency, storage, and processing power increase moving toward the data center.
9
Conventional DC Network Design
Source: Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta, VL2: a scalable and flexible data center network, in Proceedings of the ACM SIGCOMM 2009 conference on Data communication (SIGCOMM '09), ACM, New York, NY, USA.
10
Limitations of Conventional DC Network
Layer 2 is flexible, allowing movement of a device transparently to the IP layer.
  But it cannot scale due to STP, which reduces the number of paths and enforces inefficient paths.
  Must buy extra switches for redundancy, yet STP prevents using their capacity (low utilization).
Layer 3 divides the network into a subnet hierarchy, which requires address reassignment to move a device.
  Layer 3 segmentation is static and difficult to change quickly.
  Spreading a service outside a single layer 2 domain requires reconfiguring IP addresses and VLAN trunks.
11
Over-Subscription
Rack uplink oversubscription
  ToR switches 1:5 to 1:20 oversubscribed
  1-4 Gbps of uplink for 20 servers at 1 Gbps each
Global oversubscription
  Paths through top of network 1:80 to 1:240 oversubscribed
Oversubscription exacerbated by STP
  Single path between any 2 bridges in network
  Redundant links blocked, reducing bandwidth
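The rack-level figures follow from simple arithmetic: total server-facing bandwidth divided by uplink bandwidth. A minimal sketch, where the function name `oversubscription_ratio` is only an illustrative choice and not something from the slides:

```python
def oversubscription_ratio(servers: int, server_gbps: float, uplink_gbps: float) -> float:
    """Ratio of total server-facing bandwidth to available uplink bandwidth."""
    return (servers * server_gbps) / uplink_gbps


# 20 servers at 1 Gbps each, with the best-case and worst-case uplinks from the slide.
for uplink in (4, 1):
    ratio = oversubscription_ratio(servers=20, server_gbps=1.0, uplink_gbps=uplink)
    print(f"{uplink} Gbps uplink -> 1:{ratio:g} oversubscribed")
```

With a 4 Gbps uplink the rack is 1:5 oversubscribed, and with a 1 Gbps uplink it is 1:20, matching the range quoted for ToR switches.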
12
Impact of Over-Subscription
Mostly vertical flows
  Oversubscription impact minimal in client/server applications
Mix of vertical and horizontal flows
  Oversubscription limits capacity to communicate between servers
13
Solution: Improve Layer 2 Networking
IEEE Data Center Bridging
  Adds flow control and multipathing standards
InfiniBand
  Widely used in top supercomputer clusters
  40 Gbps is widely used; 10 Gbps is slowest available
FabricPath
  Cisco feature to bring routing features to layer 2
  Extends layer 2 header with FabricPath header
14
FabricPath Overview
15
FabricPath Benefits
Simplicity: no need to manually configure a subnet hierarchy; layer 2 is still "plug-n-play"
Performance: optimal use of bandwidth; all the shortest paths between any two devices can be used concurrently
Agility: fabric can be modified in real time without traffic interruption or server reconfiguration
  Add links between switches to increase bandwidth
  To scale up, add switches
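To illustrate why using all shortest paths matters, the sketch below counts equal-cost shortest paths between two switches with a breadth-first search. It is not FabricPath's implementation; the two-leaf, four-spine topology and all names (`count_shortest_paths`, `leaf1`, `spine1`, and so on) are hypothetical.

```python
from collections import deque


def count_shortest_paths(links, src, dst):
    """Count distinct shortest paths from src to dst in an undirected graph."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    dist = {src: 0}    # hop count from src
    paths = {src: 1}   # number of shortest paths reaching each node
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                # First time we reach nbr: inherit the path count.
                dist[nbr] = dist[node] + 1
                paths[nbr] = paths[node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                # Another equally short route into nbr.
                paths[nbr] += paths[node]
    return paths.get(dst, 0)


# Hypothetical fabric: every leaf connects to every spine.
spines = ["spine1", "spine2", "spine3", "spine4"]
links = [(leaf, spine) for leaf in ("leaf1", "leaf2") for spine in spines]
print(count_shortest_paths(links, "leaf1", "leaf2"))
```

In this example there are four equal-cost paths between the leaves, one per spine. A spanning tree would leave only one of them active, while a multipathing fabric can spread traffic across all four, so adding a spine adds usable bandwidth.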
16
Scale Bandwidth while Reducing Switches
17
FabricPath Enables VM Mobility
18
Reliability
19
Data Center Failure Events
20
Hardware Cannot Be Reliable Enough
If servers are 99% reliable, then
  a system with 10 servers is ≈ 90% reliable
  a system with 100 servers is ≈ 37% reliable
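These figures can be reproduced under two assumptions that the slide leaves implicit: server failures are independent, and the system counts as working only when every server is working. A minimal sketch (the function name `system_reliability` is illustrative):

```python
def system_reliability(server_reliability: float, servers: int) -> float:
    """Probability that all servers are up, assuming independent failures."""
    return server_reliability ** servers


for n in (10, 100):
    print(f"{n} servers: {system_reliability(0.99, n):.1%}")
```

For 10 servers the result is about 90%, and for 100 servers about 37%, which is why reliability has to come from the software architecture rather than from individual machines.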
21
Fault-Tolerant Software Architecture
Must use fault-tolerant software architecture
  Hardware must detect faults
  Hardware must notify software in timely fashion
Fault-tolerant architecture reduces costs
  Choose hardware reliability level that maximizes cost efficiency, not just reliability
Fault-tolerant architecture can improve performance
  Spreading processing and storage across many servers improves bandwidth and CPU capacity
22
Causes of Service Disruptions
23
Economics
24
Total Cost of Ownership (TCO)
TCO = Data Center Depreciation + Data Center Operating Expenses (Opex) + Server Depreciation + Server Operating Expenses (Opex)
Depreciation is the process of allocating the cost of assets across the period during which the assets are used.
Example: server cost = $10,000, $0 residual value
  annual depreciation over 4 years = $2500
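The depreciation example is straight-line depreciation: the cost minus the residual value, spread evenly over the asset's lifetime. A minimal sketch of that calculation and of the TCO sum, with function names chosen only for illustration:

```python
def annual_depreciation(cost: float, residual_value: float, years: int) -> float:
    """Straight-line depreciation: spread (cost - residual value) evenly over the lifetime."""
    return (cost - residual_value) / years


def tco(dc_depreciation: float, dc_opex: float,
        server_depreciation: float, server_opex: float) -> float:
    """Total cost of ownership as the sum of the four components on the slide."""
    return dc_depreciation + dc_opex + server_depreciation + server_opex


print(annual_depreciation(cost=10_000, residual_value=0, years=4))
```

For the $10,000 server with no residual value, this gives $2500 per year over 4 years, as in the example.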
25
Cost to Build Data Center
Primary components (power, cooling, space) scale roughly linearly with power (watts).
80% of total construction cost goes to power + cooling
Typical depreciation periods of years
26
Operational Costs
Operational costs include
  Electricity
  Salaries for personnel
  Server maintenance contracts
  Software licenses
Larger data centers are cheaper
  Smaller number of sysadmins per server
  Fixed number of security guards
  For multi-MW data center, $0.02-$0.08/W per month
27
Case Study
Tier 3 multi-MW data center
  Dell 2950 III EnergySmart servers (300W, $6000)
  Cost of electricity is 6.2¢/kWh
  Servers financed with 3-year loan at 12% interest
  Cost of DC construction is $15/W, 12-yr lifetime
  DC opex is 4¢/W per month
  PUE = 2.0
  Server lifetime is 3 years
  Server maintenance is 5% of capex
  Server avg power = 75% peak
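As a rough illustration of how these parameters combine, the sketch below estimates a monthly cost per server. It rests on several assumptions that are not in the slides: the electricity price is read as per kWh and the DC opex as per watt per month, a month is taken as 730 hours, data center capacity is provisioned at the server's peak power, and depreciation is straight-line with loan interest and server maintenance left out. It is not expected to reproduce the cost breakdown on the following slide.

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12 months


def monthly_cost_per_server(
    peak_watts: float = 300.0,          # server peak power
    server_price: float = 6000.0,       # server purchase price in dollars
    avg_power_fraction: float = 0.75,   # average power as a fraction of peak
    pue: float = 2.0,                   # power usage effectiveness
    dollars_per_kwh: float = 0.062,     # assumption: price is per kWh
    dc_capex_per_watt: float = 15.0,    # data center construction cost in $/W
    dc_lifetime_years: int = 12,
    dc_opex_per_watt_month: float = 0.04,  # assumption: opex is per W per month
    server_lifetime_years: int = 3,
) -> dict:
    """Simplified monthly cost components; ignores interest and maintenance."""
    # Facility energy = average server power scaled up by PUE.
    facility_kw = peak_watts * avg_power_fraction * pue / 1000.0
    costs = {
        "electricity": facility_kw * HOURS_PER_MONTH * dollars_per_kwh,
        "dc_depreciation": dc_capex_per_watt * peak_watts / (dc_lifetime_years * 12),
        "dc_opex": dc_opex_per_watt_month * peak_watts,
        "server_depreciation": server_price / (server_lifetime_years * 12),
    }
    costs["total"] = sum(costs.values())
    return costs


for name, dollars in monthly_cost_per_server().items():
    print(f"{name:>20}: ${dollars:8.2f}")
```

Even in this simplified form, server depreciation dominates, with data center depreciation and electricity following, which is the kind of comparison a TCO breakdown is meant to expose.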
28
Case Study Cost Breakdown
29
Key Points
DC networking limits service/VM mobility
  Hurt much more by oversubscription than office network since greater bandwidth needed
  New technology migrates layer 3 features to layer 2: routing, multipathing, TTL
Hardware cannot be reliable enough for cloud
  Must use fault-tolerant software architecture
Data Center Economics
  TCO = DC depr + DC opex + Svr depr + Svr opex
30
References
Luiz Andre Barroso and Urs Holzle, The Case for Energy-Proportional Computing, IEEE Computer, Vol 40, Issue 12, December 2007.
Luiz Andre Barroso and Urs Holzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 1st edition, Morgan and Claypool Publishers.
Xiaobo Fan, Wolf-Dietrich Weber, and Luiz Andre Barroso, Power provisioning for a warehouse-sized computer, ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture.
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta, VL2: a scalable and flexible data center network, in Proceedings of the ACM SIGCOMM 2009 conference on Data communication (SIGCOMM '09), ACM, New York, NY, USA.
Thomas A. Limoncelli, Christina J. Hogan, and Strata R. Chalup, The Practice of System and Network Administration, Second Edition, Addison-Wesley Professional, 2007.
Evi Nemeth, Garth Snyder, Trent R. Hein, and Ben Whaley, UNIX and Linux System Administration Handbook, 4th edition, Prentice Hall, 2010.