
CIT 668: System Architecture


1 CIT 668: System Architecture
Data Centers II

2 Topics
Containers
Data Center Networks
Reliability
Economics

3 Containers

4 Containers
Data center in a shipping container: 4-10X normal data center density, 1000s of servers, 100s of kW of power.
Advantages: efficient cooling, high server density, rapid deployment, scalability.
Vendors: HP, IBM, Rackable, Sun, Verari (images from vendor offerings).

5 Microsoft Chicago Data Center

6 Google Container Patents
Containers docked at a central power spine
Vertical stack of containers
Container airflow diagram, with a center cold aisle and hot-air return behind the servers

7 Data Center Networks

8 Performance Impact of App Living Across the Data Center
[Figure: hierarchy of Data Center, Racks, Servers, Processors, Cores. Bandwidth increases toward the cores; latency, storage, and processing power increase toward the data center.]

9 Conventional DC Network Design
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta, VL2: a scalable and flexible data center network, Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication (SIGCOMM '09), ACM, New York, NY, USA.

10 Limitations of Conventional DC Network
Layer 2 is flexible: a device can be moved transparently to the IP layer. Layer 3 divides the network into a subnet hierarchy, so moving a device requires reassigning its address.
But layer 2 cannot scale because of the Spanning Tree Protocol (STP), which reduces the number of paths and forces traffic onto inefficient ones. Extra switches must be bought for redundancy, yet STP prevents their capacity from being used (low utilization).
Layer 3 segmentation is static and difficult to change quickly. Spreading a service outside a single layer 2 domain requires reconfiguring IP addresses and VLAN trunks.
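To see how STP leaves redundant capacity idle, here is a simplified sketch, not the real protocol: the bridge with the lowest ID is treated as root, a breadth-first spanning tree is grown from it, and every link not in the tree is blocked. Root election by lowest ID loosely mirrors STP, but port path costs, priorities, and BPDU exchange are ignored. The topology and switch names (A1, A2, T1-T4) are hypothetical.

```python
from collections import deque


def spanning_tree(links):
    """Return (active, blocked) link sets for a BFS spanning tree.

    Simplified STP model: lowest bridge ID becomes root, and each other
    bridge keeps only the first link that reached it in BFS order.
    """
    # Build an undirected adjacency list.
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    root = min(adj)  # lowest bridge ID wins the root election
    seen = {root}
    active = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adj[node]):  # sorted for deterministic tie-breaking
            if nbr not in seen:
                seen.add(nbr)
                active.add(frozenset((node, nbr)))
                queue.append(nbr)

    blocked = {frozenset(link) for link in links} - active
    return active, blocked


# Hypothetical topology: two aggregation switches, each wired to four ToR
# switches, plus one link between the aggregation switches.
links = [("A1", "A2")] + [(agg, f"T{i}") for agg in ("A1", "A2") for i in range(1, 5)]
active, blocked = spanning_tree(links)
print(len(links), len(active), len(blocked))
```

Here 9 links shrink to 5 active ones, and all 4 blocked links are A2's ToR links: the second aggregation switch was bought for redundancy but carries no ToR traffic unless a failure forces the tree to be recomputed.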

11 Over-Subscription
Rack uplink oversubscription: top-of-rack (ToR) switches are 1:5 to 1:20 oversubscribed (1-4 Gbps of uplink for 20 servers at 1 Gbps each).
Global oversubscription: paths through the top of the network are 1:80 to 1:240 oversubscribed.
Oversubscription is exacerbated by STP: there is a single path between any 2 bridges in the network, and redundant links are blocked, reducing bandwidth.
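The rack-level ratios follow from simple arithmetic: 20 servers × 1 Gbps = 20 Gbps offered into 4 Gbps or 1 Gbps of uplink. The sketch below reproduces that and, under the assumption that per-tier ratios multiply along a path through the top of a tree, shows how a global figure like 1:240 can arise. The per-tier values (ToR, aggregation, core) are hypothetical.

```python
import math


def oversubscription(servers, server_gbps, uplink_gbps):
    """Return N in the slide's 1:N notation (offered bandwidth / uplink bandwidth)."""
    return servers * server_gbps / uplink_gbps


# Rack level, using the slide's numbers: 20 servers at 1 Gbps each.
for uplink_gbps in (4, 1):
    ratio = oversubscription(20, 1, uplink_gbps)
    print(f"{uplink_gbps} Gbps uplink -> 1:{ratio:g}")

# Global level: in a tree, per-tier ratios multiply along a path through the top.
# These per-tier values (ToR, aggregation, core) are hypothetical.
tiers = [20, 4, 3]
print(f"global -> 1:{math.prod(tiers)}")
```

This prints 1:5 and 1:20 for the two uplink sizes and 1:240 for the hypothetical three-tier path.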

12 Impact of Over-Subscription
Client/server applications: mostly vertical flows, so the impact of oversubscription is minimal.
Applications with a mix of vertical and horizontal (server-to-server) flows: oversubscription limits the capacity for servers to communicate with each other.

13 Solution: Improve Layer 2 Networking
IEEE Data Center Bridging: adds flow control and multipathing standards.
InfiniBand: widely used in top supercomputer clusters; 40 Gbps is widely used, and 10 Gbps is the slowest available.
FabricPath: Cisco feature that brings routing features to layer 2 by extending the layer 2 header with a FabricPath header.

14 FabricPath Overview

15 FabricPath Benefits
Simplicity: no need to manually configure a subnet hierarchy; layer 2 is still “plug-n-play”.
Performance: optimal use of bandwidth; all the shortest paths between any two devices can be used concurrently.
Agility: the fabric can be modified in real time without traffic interruption or server reconfiguration. Add links between switches to increase bandwidth; to scale up, add switches.
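Using all shortest paths concurrently is usually done with equal-cost multipath (ECMP) forwarding: each flow is hashed onto one of the equal-cost next hops, so a flow's packets stay in order on one path while different flows spread across all paths. The sketch below shows generic flow hashing with CRC32; it is an assumption for illustration, not Cisco's actual FabricPath hash, and the spine names S1-S4 and the flows are hypothetical.

```python
import zlib


def pick_path(flow, paths):
    """Map a flow (5-tuple) to one of several equal-cost paths by hashing it.

    The same flow always hashes to the same path, which avoids packet
    reordering; different flows spread across all available paths.
    """
    key = "|".join(map(str, flow)).encode()
    return paths[zlib.crc32(key) % len(paths)]


# Hypothetical: four equal-cost paths through spine switches S1..S4.
paths = ["S1", "S2", "S3", "S4"]

# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port).
flows = [("10.0.0.1", "10.0.1.1", 6, 40000 + i, 80) for i in range(1000)]

usage = {p: 0 for p in paths}
for f in flows:
    usage[pick_path(f, paths)] += 1
print(usage)
```

Per-flow (rather than per-packet) hashing is the usual design choice because reordering within a TCP flow hurts throughput more than slightly uneven path loads do.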

16 Scale Bandwidth while Reducing Switches

17 FabricPath Enables VM Mobility

18 Reliability

19 Data Center Failure Events

20 Hardware Cannot Be Reliable Enough
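If each server is 99% reliable and the system needs all of its servers working (failures independent), then a system with 10 servers is ≈ 90% reliable (0.99^10 ≈ 0.904), and a system with 100 servers is ≈ 37% reliable (0.99^100 ≈ 0.366).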
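The figures above come from multiplying per-server reliability across every server the system depends on. A minimal sketch, assuming independent failures and a system that fails if any one server fails:

```python
def system_reliability(per_server, n):
    """Probability that all n servers are up, assuming independent failures."""
    return per_server ** n


for n in (1, 10, 100):
    print(n, round(system_reliability(0.99, n), 3))
# 1 0.99
# 10 0.904
# 100 0.366
```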

21 Fault-Tolerant Software Architecture
Must use a fault-tolerant software architecture.
Hardware must detect faults and notify software in a timely fashion.
Fault-tolerant architecture reduces costs: choose the hardware reliability level that maximizes cost efficiency, not just reliability.
Fault-tolerant architecture can improve performance: spreading processing and storage across many servers improves bandwidth and CPU capacity.
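One way fault-tolerant software turns unreliable servers into a reliable system is replication. The sketch below is a simplified model, assuming independent failures, work split into shards, and a shard that stays available as long as at least one of its replicas is up; the shard and replica counts are hypothetical.

```python
def replicated_availability(p, shards, replicas):
    """P(every shard has at least one live replica), assuming independent failures."""
    shard_up = 1 - (1 - p) ** replicas  # a shard fails only if all its replicas fail
    return shard_up ** shards


# 99%-reliable servers, 100 hypothetical shards.
print(round(replicated_availability(0.99, 100, 1), 3))  # no replication: 0.366
print(round(replicated_availability(0.99, 100, 3), 4))  # three replicas per shard: 0.9999
```

Three replicas triple the server count, which is why the hardware reliability level is a cost trade-off: cheaper, less reliable servers can still win if the software tolerates their failures.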

22 Causes of Service Disruptions

23 Economics

24 Total Cost of Ownership (TCO)
TCO = Data Center Depreciation + Data Center Operating Expenses (Opex) + Server Depreciation + Server Operating Expenses (Opex)
Depreciation is the process of allocating the cost of an asset across the period during which it is used.
Example: server cost = $10,000 with $0 residual value; annual depreciation over 4 years = $2,500.
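The depreciation example is straight-line depreciation: (cost - residual value) / lifetime. A minimal sketch of that and of the TCO sum, assuming all four components are expressed over the same period (for example, per year):

```python
def straight_line_depreciation(cost, residual, years):
    """Annual depreciation: spread (cost - residual value) evenly over the asset's life."""
    return (cost - residual) / years


def tco(dc_depreciation, dc_opex, server_depreciation, server_opex):
    """Total cost of ownership as the sum of the four components, all over the same period."""
    return dc_depreciation + dc_opex + server_depreciation + server_opex


print(straight_line_depreciation(10_000, 0, 4))  # 2500.0
```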

25 Cost to Build Data Center
Primary components (power, cooling, space) scale roughly linearly with space.
80% of total construction cost goes to power + cooling.
Typical depreciation periods of years

26 Operational Costs
Operational costs include electricity, salaries for personnel, server maintenance contracts, and software licenses.
Larger data centers are cheaper to operate: a smaller number of sysadmins per server and a fixed number of security guards.
For a multi-MW data center, opex runs $0.02-$0.08/W/month.

27 Case Study: Tier 3 multi-MW data center
Dell 2950 III EnergySmart servers (300 W, $6,000)
Cost of electricity is 6.2¢/kWh
Servers financed with a 3-year loan at 12% interest
Cost of DC construction is $15/W, 12-year lifetime
DC opex is 4¢/W/month
PUE = 2.0
Server lifetime is 3 years
Server maintenance is 5% of capex
Server average power = 75% of peak
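The case-study parameters can be turned into a per-server monthly cost. This is a sketch under stated assumptions, not a reproduction of the breakdown on the next slide: electricity is read as $0.062 per kWh and DC opex as $0.04 per watt per month; server capex is an amortized 3-year loan at 12%; DC construction is provisioned at the server's 300 W peak and depreciated straight-line over 12 years with no financing; maintenance is assumed to be 5% of server capex per year; and PUE is applied to average server power.

```python
HOURS_PER_MONTH = 24 * 365 / 12  # 730 hours


def loan_payment(principal, annual_rate, years):
    """Monthly payment on a fully amortized loan."""
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)


# Case-study parameters, per server.
server_cost = 6000      # $
server_peak_w = 300     # W
avg_fraction = 0.75     # average power / peak power
pue = 2.0
electricity = 0.062     # $/kWh
dc_capex_per_w = 15     # $/W, 12-year lifetime
dc_opex_per_w = 0.04    # $/W/month
maint_rate = 0.05       # fraction of server capex, assumed per year

monthly = {
    "server capex (loan)": loan_payment(server_cost, 0.12, 3),
    "DC depreciation": dc_capex_per_w * server_peak_w / (12 * 12),
    "DC opex": dc_opex_per_w * server_peak_w,
    "server maintenance": maint_rate * server_cost / 12,
    "electricity": server_peak_w * avg_fraction * pue / 1000 * HOURS_PER_MONTH * electricity,
}
for item, dollars in monthly.items():
    print(f"{item:>20}: ${dollars:7.2f}")
print(f"{'total':>20}: ${sum(monthly.values()):7.2f}")
```

Under these assumptions the server loan (about $199/month) dominates, followed by DC depreciation ($31.25), maintenance ($25), electricity (about $20), and DC opex ($12), for roughly $288 per server per month.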

28 Case Study Cost Breakdown

29 Key Points
DC networking limits service/VM mobility.
It is hurt much more by oversubscription than an office network, since greater server-to-server bandwidth is needed.
New technology migrates layer 3 features to layer 2: routing, multipathing, TTL.
Hardware cannot be reliable enough for cloud; must use a fault-tolerant software architecture.
Data center economics: TCO = DC depr + DC opex + Svr depr + Svr opex.

30 References
Luiz André Barroso and Urs Hölzle, The Case for Energy-Proportional Computing, IEEE Computer, Vol. 40, Issue 12, December 2007.
Luiz André Barroso and Urs Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 1st edition, Morgan and Claypool Publishers.
Xiaobo Fan, Wolf-Dietrich Weber, and Luiz André Barroso, Power Provisioning for a Warehouse-Sized Computer, ISCA '07: Proceedings of the 34th Annual International Symposium on Computer Architecture.
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta, VL2: A Scalable and Flexible Data Center Network, Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication (SIGCOMM '09), ACM, New York, NY, USA.
Thomas A. Limoncelli, Christina J. Hogan, and Strata R. Chalup, The Practice of System and Network Administration, 2nd edition, Addison-Wesley Professional, 2007.
Evi Nemeth, Garth Snyder, Trent R. Hein, and Ben Whaley, UNIX and Linux System Administration Handbook, 4th edition, Prentice Hall, 2010.

