CIT 668: System Architecture

Data Centers II

Topics
- Containers
- Data Center Networks
- Reliability
- Economics

Containers

Containers
A data center in a shipping container.
- 4-10X normal data center density
- 1000s of servers
- 100s of kW of power
Advantages
- Efficient cooling
- High server density
- Rapid deployment
- Scalability
Vendors: HP, IBM, Rackable, Sun, Verari

Microsoft Chicago Data Center

Google Container Patents
- Containers docked at a central power spine
- Vertical stack of containers
- Container air flow diagram, with a center cold aisle and hot air return behind servers

Data Center Networks

Performance Impact of App Living Across
- Data Center
- Racks
- Servers
- Processors
- Cores
Moving down toward cores: bandwidth increases.
Moving up toward the data center: latency, storage, and processing power increase.

Conventional DC Network Design
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. VL2: A Scalable and Flexible Data Center Network. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication (SIGCOMM '09). ACM, New York, NY, USA.

Limitations of Conventional DC Network
- Layer 2 is flexible, allowing a device to move transparently to the IP layer
  - But it cannot scale due to STP, which reduces the number of paths and enforces inefficient paths
  - Must buy extra switches for redundancy, yet STP prevents using their capacity (low utilization)
- Layer 3 divides the network into a subnet hierarchy, which requires address reassignment to move a device
  - Layer 3 segmentation is static and difficult to change quickly
  - Spreading a service outside a single layer 2 domain requires reconfiguring IP addresses and VLAN trunks
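To make the low-utilization point concrete, here is a minimal sketch of how a spanning tree leaves redundant links idle. It is not the real STP election (which uses bridge IDs and path costs); it simply builds a breadth-first spanning tree from an assumed root. The topology and the switch names (core1, agg1, tor1, and so on) are hypothetical.

```python
from collections import deque

def spanning_tree_links(links, root):
    """Return the links kept active by a BFS spanning tree rooted at `root`.

    Simplification: real STP elects the root and picks ports by bridge ID
    and path cost; a BFS tree is enough to show that loops are broken by
    disabling links.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    visited = {root}
    active = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors[node]):
            if nxt not in visited:
                visited.add(nxt)
                active.add(frozenset((node, nxt)))
                queue.append(nxt)
    return active

# Hypothetical topology: every ToR switch is dual-homed to two aggregation
# switches, and every aggregation switch is dual-homed to two core switches.
links = [
    ("core1", "core2"),
    ("core1", "agg1"), ("core1", "agg2"),
    ("core2", "agg1"), ("core2", "agg2"),
    ("agg1", "tor1"), ("agg1", "tor2"), ("agg1", "tor3"), ("agg1", "tor4"),
    ("agg2", "tor1"), ("agg2", "tor2"), ("agg2", "tor3"), ("agg2", "tor4"),
]

active = spanning_tree_links(links, root="core1")
blocked = len(links) - len(active)
print(f"{len(active)} of {len(links)} links forward traffic; {blocked} are blocked")
# prints "7 of 13 links forward traffic; 6 are blocked"
```

In this sketch none of the ToR links on agg2 carry traffic, which is the "extra switches whose capacity cannot be used" problem in miniature.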

Over-Subscription
- Rack uplink oversubscription
  - ToR switches are 1:5 to 1:20 oversubscribed
  - 1-4 Gbps of uplink for 20 servers at 1 Gbps each
- Global oversubscription
  - Paths through the top of the network are 1:80 to 1:240 oversubscribed
- Oversubscription is exacerbated by STP
  - Single path between any 2 bridges in the network
  - Redundant links are blocked, reducing bandwidth
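The rack-level ratios follow directly from the uplink figures. A small sketch of the arithmetic, using the slide's numbers (20 servers at 1 Gbps, 1-4 Gbps of uplink); the function names are illustrative only:

```python
def oversubscription(servers, server_gbps, uplink_gbps):
    """Ratio of bandwidth the servers can offer to the uplink bandwidth."""
    return (servers * server_gbps) / uplink_gbps

def worst_case_mbps(server_gbps, ratio):
    """Per-server bandwidth if every server sends across the link at once."""
    return server_gbps * 1000 / ratio

for uplink in (4, 1):
    ratio = oversubscription(servers=20, server_gbps=1, uplink_gbps=uplink)
    print(f"{uplink} Gbps uplink -> 1:{ratio:g} oversubscribed")
# prints "4 Gbps uplink -> 1:5 oversubscribed"
# then   "1 Gbps uplink -> 1:20 oversubscribed"

# At the global ratios, a 1 Gbps server NIC is left with only a few Mbps
# when all servers communicate across the top of the network.
for ratio in (80, 240):
    print(f"1:{ratio} -> {worst_case_mbps(1, ratio):.1f} Mbps per server")
```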

Impact of Over-Subscription
- Mostly vertical flows
  - Oversubscription impact is minimal in client/server applications
- Mix of vertical and horizontal flows
  - Oversubscription limits the capacity to communicate between servers

Solution: Improve Layer 2 Networking
- IEEE Data Center Bridging
  - Adds flow control and multipathing standards
- InfiniBand
  - Widely used in top supercomputer clusters
  - 40 Gbps is widely used; 10 Gbps is the slowest available
- FabricPath
  - Cisco feature that brings routing features to layer 2
  - Extends the layer 2 header with a FabricPath header

FabricPath Overview

FabricPath Benefits
- Simplicity: no need to manually configure a subnet hierarchy; layer 2 is still "plug-n-play"
- Performance: optimal use of bandwidth; all the shortest paths between any two devices can be used concurrently
- Agility: fabric can be modified in real time without traffic interruption or server reconfiguration
  - Add links between switches to increase bandwidth
  - To scale up, add switches
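The performance claim can be illustrated by counting equal-cost shortest paths in a small fabric. This is only a sketch of the multipathing idea, not FabricPath's actual control plane. The two-tier leaf/spine topology and the helper names are assumptions made for the example.

```python
from collections import deque

def count_shortest_paths(neighbors, src, dst):
    """Count the equal-cost shortest paths from src to dst using BFS."""
    dist = {src: 0}
    paths = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                # First time we reach nxt: record distance and inherit count.
                dist[nxt] = dist[node] + 1
                paths[nxt] = paths[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another shortest route into nxt: add its path count.
                paths[nxt] += paths[node]
    return paths.get(dst, 0)

def leaf_spine(num_leaves, num_spines):
    """Hypothetical fabric: every leaf switch links to every spine switch."""
    neighbors = {}
    for l in range(num_leaves):
        for s in range(num_spines):
            leaf, spine = f"leaf{l}", f"spine{s}"
            neighbors.setdefault(leaf, []).append(spine)
            neighbors.setdefault(spine, []).append(leaf)
    return neighbors

# Adding spine switches adds usable paths between any pair of leaves,
# whereas a spanning tree would leave exactly one path active.
for spines in (2, 4, 8):
    fabric = leaf_spine(num_leaves=4, num_spines=spines)
    n = count_shortest_paths(fabric, "leaf0", "leaf1")
    print(f"{spines} spines -> {n} equal-cost paths between leaf0 and leaf1")
```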

Scale Bandwidth while Reducing Switches

FabricPath Enables VM Mobility

Reliability

Data Center Failure Events

Hardware Cannot Be Reliable Enough
If each server is 99% reliable, then
- a system with 10 servers is ≈ 90% reliable
- a system with 100 servers is ≈ 37% reliable
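These figures come from multiplying the per-server reliabilities together. The sketch below assumes that server failures are independent and that the system counts as working only when every server is up; both assumptions are implied by the numbers rather than stated on the slide.

```python
def system_reliability(server_reliability, num_servers):
    """Probability that all servers are up, assuming independent failures."""
    return server_reliability ** num_servers

for n in (1, 10, 100, 1000):
    print(f"{n:>5} servers: {system_reliability(0.99, n):.1%}")
```

For 10 and 100 servers this reproduces the roughly 90% and 37% figures above; at 1000 servers the probability that everything is up at once is close to zero.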

Fault-Tolerant Software Architecture
- Must use fault-tolerant software architecture
  - Hardware must detect faults
  - Hardware must notify software in a timely fashion
- Fault-tolerant architecture reduces costs
  - Choose the hardware reliability level that maximizes cost efficiency, not just reliability
- Fault-tolerant architecture can improve performance
  - Spreading processing and storage across many servers improves bandwidth and CPU capacity

Causes of Service Disruptions

Economics

Total Cost of Ownership (TCO)
TCO = Data Center Depreciation + Data Center Operating Expenses (Opex) + Server Depreciation + Server Operating Expenses (Opex)
Depreciation is the process of allocating the cost of assets across the period during which the assets are used.
Example: server cost = $10,000 with $0 residual value; annual depreciation over 4 years = $2,500
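The depreciation example and the TCO sum can be sketched as two small functions. Straight-line depreciation is assumed, since the slide's example divides the cost evenly across the years. The data center and opex amounts in the last call are hypothetical placeholders, not figures from the slides.

```python
def straight_line_depreciation(cost, residual_value, years):
    """Annual depreciation when cost is spread evenly over the asset's life."""
    return (cost - residual_value) / years

def tco(dc_depreciation, dc_opex, server_depreciation, server_opex):
    """Total cost of ownership as the sum of the four terms on the slide."""
    return dc_depreciation + dc_opex + server_depreciation + server_opex

server_depr = straight_line_depreciation(cost=10_000, residual_value=0, years=4)
print(server_depr)  # prints 2500.0

# Hypothetical placeholder amounts for the other three terms (annual, in $).
annual_tco = tco(dc_depreciation=1_000, dc_opex=500,
                 server_depreciation=server_depr, server_opex=800)
print(annual_tco)
```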

Cost to Build Data Center
- Primary components (power, cooling, space) scale roughly linearly with space
- 80% of total construction cost goes to power + cooling
- Typical depreciation periods of years

Operational Costs
Operational costs include
- Electricity
- Salaries for personnel
- Server maintenance contracts
- Software licenses
Larger data centers are cheaper
- Smaller number of sysadmins per server
- Fixed number of security guards
- For a multi-MW data center, $0.02-$0.08 per watt per month

Case Study
- Tier 3 multi-MW data center
- Dell 2950 III EnergySmart servers (300 W, $6000)
- Cost of electricity is 6.2¢/kWh
- Servers financed with a 3-year loan at 12% interest
- Cost of DC construction is $15/W, with a 12-year lifetime
- DC opex is 4¢ per watt per month
- PUE = 2.0
- Server lifetime is 3 years
- Server maintenance is 5% of capex
- Server average power = 75% of peak
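As a rough check on how these parameters combine, the sketch below computes a monthly cost per server. It is a simplified model, not the calculation behind the slide's breakdown, and it rests on several assumptions: the electricity rate is read as 6.2 cents per kWh, DC opex as 4 cents per watt per month, and maintenance as 5% of server capex per year; data center capacity is provisioned at the server's 300 W peak; depreciation is straight-line; and the cost of the 12% financing is left out.

```python
HOURS_PER_MONTH = 8760 / 12  # assumption: average month (730 hours)

def monthly_cost_per_server(
    server_price=6000.0,        # $ per server
    server_peak_w=300.0,        # peak power draw in watts
    server_life_years=3,
    avg_power_fraction=0.75,    # average power as a fraction of peak
    pue=2.0,
    electricity_per_kwh=0.062,  # assumes the 6.2-cent rate is per kWh
    dc_capex_per_w=15.0,        # $ of construction per watt
    dc_life_years=12,
    dc_opex_per_w_month=0.04,   # assumes 4 cents per watt per month
    maintenance_rate=0.05,      # assumes 5% of server capex per year
):
    """Return monthly cost components in dollars for one server.

    Financing (interest) costs are deliberately omitted.
    """
    # Facility power attributable to one server, including overhead (PUE).
    facility_kw = server_peak_w * avg_power_fraction * pue / 1000
    return {
        "server depreciation": server_price / (server_life_years * 12),
        "server maintenance": server_price * maintenance_rate / 12,
        "DC depreciation": dc_capex_per_w * server_peak_w / (dc_life_years * 12),
        "DC opex": dc_opex_per_w_month * server_peak_w,
        "electricity": facility_kw * HOURS_PER_MONTH * electricity_per_kwh,
    }

costs = monthly_cost_per_server()
total = sum(costs.values())
for name, dollars in costs.items():
    print(f"{name:<20} ${dollars:7.2f}  ({dollars / total:.0%})")
print(f"{'total':<20} ${total:7.2f}")
```

Changing a single argument, for example `pue=1.2` or `server_price=2000.0`, shows how the balance between server and data center costs shifts.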

Case Study Cost Breakdown

Key Points
- DC networking limits service/VM mobility
  - Hurt much more by oversubscription than an office network, since greater bandwidth is needed
  - New technology brings layer 3 features to layer 2: routing, multipathing, TTL
- Hardware cannot be reliable enough for cloud
  - Must use fault-tolerant software architecture
- Data Center Economics
  - TCO = DC depr + DC opex + Svr depr + Svr opex

References
- Luiz André Barroso and Urs Hölzle, The Case for Energy-Proportional Computing, IEEE Computer, Vol. 40, Issue 12, December 2007.
- Luiz André Barroso and Urs Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 1st edition, Morgan and Claypool Publishers.
- Xiaobo Fan, Wolf-Dietrich Weber, and Luiz André Barroso, Power Provisioning for a Warehouse-Sized Computer, ISCA '07: Proceedings of the 34th Annual International Symposium on Computer Architecture.
- Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta, VL2: A Scalable and Flexible Data Center Network, in Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication (SIGCOMM '09), ACM, New York, NY, USA.
- Thomas A. Limoncelli, Christina J. Hogan, and Strata R. Chalup, The Practice of System and Network Administration, 2nd edition, Addison-Wesley Professional, 2007.
- Evi Nemeth, Garth Snyder, Trent R. Hein, and Ben Whaley, UNIX and Linux System Administration Handbook, 4th edition, Prentice Hall, 2010.
