
1 Expansion Plans for the Brookhaven Computer Center HEPIX – St. Louis November 7, 2007 Tony Chan - BNL

2 Background
- Brookhaven National Lab (BNL) is a U.S. government-funded, multi-disciplinary research laboratory.
- The RACF was formed in the mid-1990s to address the computing needs of the RHIC experiments; it became the U.S. Tier 1 Center for ATLAS in the late 1990s.
- The RACF supports HENP and HEP scientific computing efforts and also provides various general services (backup, e-mail, web, off-site data transfer, Grid, etc.).

3 Background (cont.)
- Growing operational complexity: local → global resource.
- Increasing staffing levels to handle additional responsibilities → nearly 40 FTEs.
- Almost 9 million SI2K of computing capacity.
- Over 3 PB of disk storage capacity.
- Over 7 PB of tape storage capacity.

4 Staff Growth at the RACF

5 The Growth of the Linux Farm

6 Total Distributed Storage Capacity

7 Evolution of Space Usage (chart; annotations: capacity of current data center; Intel dual- and quad-core deployed)

8 Evolution of Power Usage (chart; annotation: existing UPS capacity)

9 Evolution of Power Costs (chart; annotations: unexpected decrease in cost/kWh; estimates assume 8 cents/kWh)
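To put the 8 cents/kWh planning rate in perspective, here is a back-of-the-envelope annual cost calculation. It is only a sketch: the load figures are the 300 kW renovation addition and the 1 MW UPS capacity quoted elsewhere in this talk, treated here as constant loads.

```python
# Back-of-the-envelope annual electricity cost at the talk's assumed
# planning rate of 8 cents/kWh, for a constant IT load.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def annual_cost_usd(load_kw: float, rate_per_kwh: float = 0.08) -> float:
    """Cost of running a constant load for one year at a flat rate."""
    return load_kw * HOURS_PER_YEAR * rate_per_kwh

# 300 kW is the renovated-space addition; 1,000 kW is the existing
# UPS-backed capacity (both figures quoted on other slides).
for load_kw in (300, 1000):
    print(f"{load_kw:>5} kW -> ${annual_cost_usd(load_kw):,.0f}/yr")
```

At 8 cents/kWh, a full megawatt of constant load costs roughly $700K per year, which is why the rise to 7-10 cents since 2003 matters.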

10 How did we get in trouble?
- Bought more than planned because of unexpectedly favorable prices.
- Failed to prepare adequately for the increase in power/cooling needs, particularly items that require long delivery times.
- Inefficient use of available infrastructure (4,500 square feet of total space and 1 MW of UPS-backed power).
- Increasing cost per kWh (5 → 7-10 cents) since 2003.
- Running out of space, power, and cooling in the current facility.

11 What are we doing about it?
- More efficient use of current data center resources.
- Emphasize power efficiency in new purchases.
- Renovating 2,000 sq. ft. and adding 300 kW of power for the RACF; available in October 2008.
- Building an additional 7,000 sq. ft. with 1.5 MW of power; available in summer 2009.

12 Improvements to Current Data Center
- Better layout to maximize floor space.
- Additional rack-top cooling units for "hot spots".
- Additional PDU/UPS units (up to 240 kW) to complement the existing UPS.
- Use of 3-phase power (208V/30A) to maximize PDU utilization (see the sketch after this list).
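The gain from 208V/30A three-phase circuits is easy to quantify with the standard balanced-load formula. A minimal sketch, assuming the usual 80% continuous-load derating (the derating practice is an assumption, not stated in the talk):

```python
import math

DERATE = 0.80  # assumed continuous-load derating (common NEC practice)

def three_phase_kva(volts_line_to_line: float, amps: float) -> float:
    """Apparent power of a balanced three-phase circuit: sqrt(3) * V * I."""
    return math.sqrt(3) * volts_line_to_line * amps / 1000.0

kva_3ph = three_phase_kva(208, 30)   # ~10.8 kVA per circuit
kva_1ph = 120 * 20 / 1000.0          # common 120V/20A circuit, ~2.4 kVA
print(f"208V/30A 3-phase: {kva_3ph:.1f} kVA ({kva_3ph * DERATE:.1f} kVA continuous)")
print(f"120V/20A 1-phase: {kva_1ph:.1f} kVA ({kva_1ph * DERATE:.1f} kVA continuous)")
```

One 208V/30A three-phase whip delivers roughly four times the power of a common single-phase circuit, so far fewer circuits are needed to load a PDU to capacity.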

13 Rack-Top Cooling Units

14 (image-only slide)

15 Data Center Layout in 2007

16 Power Efficiency
- Deploy multi-core processors
- Investigate blade servers
- DC-powered servers
- Virtualization
- Mobile data centers
- Other power-saving techniques

17 Multi-core processors
- First purchase of AMD dual-core (Opteron 265) in 2006.
- 20% power savings compared to the previous generation of single-core Intel Xeon processors (3.4 GHz).
- First purchase of Intel quad-core (Xeon E5335) in 2007.
- Improved SI2K/Watt should translate into further power savings (illustrated below).
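A sketch of the perf-per-watt comparison implied by the 20% figure. The SI2K ratings and node wattages below are placeholders chosen to be consistent with the slide's claim, not BNL's measured values:

```python
# Illustrative SI2K-per-watt comparison. The ratings below are
# placeholders consistent with the quoted ~20% power saving and the
# roughly doubled throughput of dual-core nodes, not measured numbers.
nodes = {
    "Intel Xeon 3.4 GHz (single-core node)": {"si2k": 3400, "watts": 350},
    "AMD Opteron 265 (dual-core node)":      {"si2k": 5400, "watts": 280},
}
for name, spec in nodes.items():
    print(f"{name}: {spec['si2k'] / spec['watts']:.1f} SI2K/W")
```

Even with conservative placeholder numbers, the dual-core node roughly doubles SI2K/W, which is what drives the purchasing policy.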

18 SI2K per Watt Improvements (chart; annotations: AMD dual-core deployed; Intel dual- and quad-core deployed)

19 Blade Servers
- Better SI2K/Watt than 1U servers (51.1 vs. 40.9, according to IBM's power calculator).
- Increased density and power requirements (up to 17.5 kW/rack) are a big problem (see the sketch after this list).
- Plan to test blades with real-life ATLAS applications.
- Hardware capacity (disk, RAM, etc.) is a drawback for blades compared to 1U servers.
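Combining the quoted SI2K/W figures with rack power limits shows both the appeal and the problem. In this sketch, the 17.5 kW blade-rack figure and both SI2K/W values come from the slide; the ~8 kW for a rack of 1U servers is an assumption:

```python
# Rack-level throughput implied by the quoted SI2K/W figures. The
# blade rack power (17.5 kW) and both SI2K/W values are from the
# slide; the ~8 kW for a full rack of 1U servers is an assumption.
def rack_si2k(kw_per_rack: float, si2k_per_watt: float) -> float:
    return kw_per_rack * 1000.0 * si2k_per_watt

print(f"Blades (17.5 kW/rack, 51.1 SI2K/W): {rack_si2k(17.5, 51.1):,.0f} SI2K/rack")
print(f"1U     ( 8.0 kW/rack, 40.9 SI2K/W): {rack_si2k(8.0, 40.9):,.0f} SI2K/rack")
# Blades deliver far more SI2K per rack, but every extra watt per rack
# is heat the room's cooling must remove -- hence the density problem.
```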

20 DC-powered servers
- DC-powered servers are made by a few suppliers.
- Steep up-front costs for a DC power distribution system make it suitable only for large installations or new buildings.
- The alternative is to place rectifiers between the AC source and the server rack.
- A DC-powered server with a rectifier from Rackable yielded only 5% savings → not very significant (a rough payback sketch follows).
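A rough payback check makes the "not very significant" verdict concrete. Everything here except the 5% saving is an assumption (rack power, electricity rate, and the rectifier cost premium):

```python
# Rough payback estimate for the ~5% saving measured on the Rackable
# DC-with-rectifier unit. Rack power, electricity rate, and the cost
# premium are illustrative assumptions, not quoted figures.
rack_kw = 8.0            # assumed average rack load
rate_per_kwh = 0.08      # the talk's planning rate
saving_fraction = 0.05   # measured saving from the slide

annual_saving = rack_kw * 8760 * rate_per_kwh * saving_fraction
premium = 1500.0         # assumed extra hardware cost per rack
print(f"~${annual_saving:,.0f}/yr saved; payback ~{premium / annual_saving:.1f} years")
```

Under these assumptions the saving is a few hundred dollars per rack per year, giving a payback period longer than the typical hardware lifetime.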

21 Virtualization
- Virtualization may help maximize cluster usage and minimize the need for more hardware.
- Collapse multiple applications onto fewer servers and cluster them for failover protection → power and space savings (a consolidation sketch follows this list).
- Extensive evaluation of Xen and VMware at BNL over the past year.
- Initial deployment beginning now.
- Not a cure-all → not recommended for certain applications.
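A simple consolidation estimate shows where the power and space savings come from. Every number here is an illustrative assumption, not a BNL measurement:

```python
# How many lightly loaded service boxes can collapse onto one
# virtualization host. All numbers are illustrative assumptions.
avg_guest_util = 0.15    # typical CPU use of a lightly loaded 1U service box
target_host_util = 0.70  # headroom kept on the consolidated host
host_capacity = 4.0      # a quad-core host ~4x the compute of a legacy box

guests = int(host_capacity * target_host_util / avg_guest_util)
power_saving_kw = (guests - 1) * 0.3   # assumed ~300 W per retired 1U server
print(f"~{guests} legacy servers per host, saving ~{power_saving_kw:.1f} kW")
```

Under these assumptions a single host absorbs well over a dozen lightly loaded service machines, though applications with high sustained utilization (like batch workers) gain nothing from this, which is one reason it is not a cure-all.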

22 Mobile Data Centers
- Used by large financial institutions for high-demand, short-duration needs (e.g., Sun's Project Blackbox; other suppliers exist).
- A shipping container with 2,000 cores; mobile and easy to deploy.
- Not seriously considered at BNL: issues with protection of sensitive data, integration with existing hardware, and incompatible computing models.
- Does not fully address our power/space problems.

23 Other Power-Saving Techniques
- New CPU chips have dynamic frequency-scaling features (AMD's PowerNow! and Intel's SpeedStep), but these are not used at BNL because most of our servers run at > 80% utilization.
- Most suppliers provide low-efficiency (65%-75%) power supplies (PS); high-efficiency PS (> 85%) are available at a higher cost.
- Metered rack PDUs have been required since 2006 → measure and (later) collect historical power information to understand the dynamic power load (a polling sketch follows this list).
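A minimal sketch of how that historical power data might be collected, assuming the metered PDUs speak SNMP. The hostnames and the OID below are placeholders; the real OID comes from the PDU vendor's MIB:

```python
# Minimal PDU power logger: read a power value over SNMP using
# net-snmp's snmpget and append it to a CSV history file.
import csv
import subprocess
import time

PDUS = ["pdu-rack01.example.gov"]        # hypothetical PDU hostnames
POWER_OID = "1.3.6.1.4.1.99999.1.1.0"    # placeholder; use the vendor MIB's OID

def read_watts(host: str) -> float:
    """Fetch one power reading (watts) from a metered PDU."""
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", "public", "-Oqv", host, POWER_OID])
    return float(out.decode())

with open("pdu_power_history.csv", "a", newline="") as f:
    log = csv.writer(f)
    for pdu in PDUS:
        log.writerow([time.strftime("%Y-%m-%dT%H:%M:%S"), pdu, read_watts(pdu)])
```

Run from cron every few minutes, a logger like this builds the historical record needed to see the dynamic load rather than a single snapshot.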

24 New Data Center
- Deeper raised floor (12 → 36 inches) for higher cooling airflow.
- Cable trays (above or below the raised floor) for improved cable management and cooling airflow.
- Building properly designed: reinforced raised floor, large power and cooling capacities, 13-ft ceilings, proper ventilation and insulation, etc.
- Dedicated to meeting RACF computing needs until 2014.

25 Why We Need Better Cable Management

26 Data Center Layout in 2009

27 Beyond 2014
- Long-range plan for a 25,000-square-foot new data center to serve all of BNL.
- Part of the BNL long-range plan, but not funded at present.
- We expect RACF computing requirements to exceed existing data center capacity by 2014, including the new space available in 2009.

28 Beyond 2014 (cont.)
- Computing requirements for the LEP and FNAL programs were underestimated by a factor of 10 early in their programs (http://www.fnal.gov/projects/monarc/task2/feb_23_99.txt).
- Similar underestimates occurred for the RHIC program.
- It would be wise for the data center infrastructure to have the potential capacity to exceed "long-term" RHIC and ATLAS requirements, even if that capacity is not available right away (see the sketch after this list).
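To see how a factor-of-10 miss happens, compound a modest yearly planning error. A sketch with illustrative growth rates, not actual RHIC or ATLAS estimates:

```python
# A modest, compounding planning error reaches a factor of ~10 in
# well under a decade. Both growth rates are illustrative assumptions.
planned_growth = 1.3   # assumed planned capacity growth per year
actual_growth = 1.8    # assumed realized demand growth per year

shortfall = 1.0
for year in range(1, 8):
    shortfall *= actual_growth / planned_growth
    print(f"year {year}: demand is {shortfall:.1f}x the plan")
# After 7 years the plan is short by roughly a factor of 10.
```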

29 Summary
- The growth of the RACF in the past few years exposed severe infrastructure problems in the data center.
- Learned some valuable lessons on maximizing use of existing infrastructure (and some lessons on what NOT to do).
- Actively evaluating new technologies and approaches to operating a sustainable data center.
- Upgrading existing infrastructure to stretch the facility until 2009.

30 Summary (cont.)
- Renovated space and additional power available in October 2008 → some breathing room.
- The new data center dedicated to the RACF will be available in summer 2009 and is expected to meet our needs until 2014.
- A new facility will be needed after 2014.

