Operating Dedicated Data Centers – Is It Cost-Effective? CHEP 2013 – Amsterdam. Tony Wong – Brookhaven National Lab

Motivation
Traditional HENP computing models mostly relied on dedicated data centers until recently
Budgetary realities (and economies of scale) compel the HENP community to evaluate alternatives
For-profit providers (Amazon, Google) offer cloud services
Virtual organizations (OSG, EGI, etc.) are harnessing the power of non-dedicated resources

RACF Context at BNL
Dedicated resources:
– Tier0 for RHIC computing
– U.S. Tier 1 for ATLAS
– Two Tier 3 sites for ATLAS (BNL, Wisconsin)
– Other (LBNE, LSST, etc.)
Over 2,200 servers, 23,000 physical cores and 16 PB of worker node-based storage
Robotic storage and other disk-based distributed storage services

Usage Profiles
Monte Carlo – minimal local dependencies; long-running, CPU-bound, low I/O requirements
Analysis – some local dependencies; variable length and I/O requirements
Interactive – significant local dependencies; short-running, high I/O requirements

Amazon EC2
Full details at aws.amazon.com/ec2/pricing
Linux virtual instance:
– 1 ECU = 1.2 GHz Xeon processor from 2007 (HS06 ~8/core)
– 2.2 GHz Xeon (Sandy Bridge) in 2013 → HS06 ~38/core
Pricing is dynamic and region-based. The prices below were current on August 23, 2013 for the Eastern US.

Type                  ECU   RAM (GB)   Storage (GB)   Network I/O   Cost/hr (US$)
Spot m1.small          –       –            –          low           0.007
Spot m1.medium         –       –            –          moderate      0.013
On-demand m1.medium    –       –            –          moderate      0.12
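A minimal sketch (not from the slides) of how the conversion quoted above (1 ECU ≈ 8 HS06) can be used to put the spot and on-demand prices on a common per-HS06-hour footing; the per-instance ECU ratings below are illustrative assumptions, not values captured in the pricing table.

    # Rough $/HS06-hour from the EC2 prices above, using 1 ECU ~ 8 HS06.
    # The ECU ratings per instance type are assumptions for illustration.
    HS06_PER_ECU = 8.0

    instances = {
        # name: (assumed ECU rating, US$/hr on Aug 23 2013, US East)
        "spot m1.small":       (1.0, 0.007),
        "spot m1.medium":      (2.0, 0.013),
        "on-demand m1.medium": (2.0, 0.12),
    }

    for name, (ecu, price) in instances.items():
        print(f"{name:22s} ~${price / (ecu * HS06_PER_ECU):.4f}/HS06-hour")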

BNL Experience with EC2
Ran ~5000 EC2 jobs for ~3 weeks (January 2013)
– Tried m1.small with spot instances
– Spent US $13k
Strategy (sketched below):
– Declare a maximum acceptable price, but pay the current, variable spot price. When the spot price exceeds the maximum acceptable price, the instance (and job) is terminated without warning
– Maximum acceptable price = 3 x baseline → $0.021/hr
Low efficiency for long jobs due to the eviction policy
EC2 jobs took ~50% longer (on average) to run when compared to the dedicated facility
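A minimal Python sketch of the bidding and eviction rule described above; the hourly price series in the usage example are made up for illustration.

    # Spot-bid rule from the strategy above: bid a fixed multiple of the
    # baseline spot price; if the market price ever exceeds the bid, the
    # instance (and the job running on it) is evicted without warning.
    BASELINE = 0.007      # US$/hr, m1.small spot baseline quoted above
    BID = 3 * BASELINE    # maximum acceptable price -> $0.021/hr

    def job_survives(hourly_spot_prices):
        """True if the job ran to completion without being evicted."""
        return all(price <= BID for price in hourly_spot_prices)

    # Illustrative (made-up) price histories for two jobs:
    print(job_survives([0.007, 0.009, 0.012, 0.010]))   # True  - job completes
    print(job_survives([0.007, 0.025, 0.008]))          # False - evicted in hour 2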

Google Compute Engine (GCE)
Standard instance type (equivalent to EC2's On-Demand):
– Linux (Ubuntu or CentOS)
– $0.132/hr (1 virtual core, 3.75 GB RAM, 420 GB storage) → similar to EC2's on-demand m1.medium
Evaluation:
– 458k jobs (mix of evgen, fast sim and full sim)
– Ran for 7 weeks (March-April 2013)
– Custom image based on Google's CentOS 6
– See Sergey Panitkin's presentation on ATLAS Cloud Computing R&D for details

Costs of Dedicated Facilities
Direct costs:
– Hardware (servers, network, etc.)
– Software (licenses, support, etc.)
Indirect costs:
– Staff
– Infrastructure: power, space (includes cooling)
– Other?

Growth of RACF Computing Cluster

Server Costs
Standard 1U or 2U servers
Includes server, rack, rack PDUs, rack switches, and all hardware installation (does not include network cost)
Hardware configuration changes (i.e., more RAM, storage, etc.) are not decoupled from server costs → partly responsible for fluctuations

Network
1 GbE connectivity (switch, line cards, cabling, etc.) for 900 hosts costs ~$450k
Cost per core is ~$45 assuming our newest configuration (dual 8-core Sandy Bridge CPUs)
Assume network equipment is used for the lifetime of the Linux Farm hosts (4 years for USATLAS, 6 years for RHIC)
Calculate the cost per core per year by amortizing the cost of the network equipment over the lifetime of the servers (sketched below)
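A minimal sketch of that amortization, using the per-core figure quoted above for the newest configuration; the historical cost table later in the talk averages over several hardware generations, so its network line will differ from this single-configuration example.

    # Amortize a one-time per-core network cost over the server lifetime.
    def network_cost_per_core_per_year(cost_per_core, lifetime_years):
        return cost_per_core / lifetime_years

    # Newest (2013) configuration quoted above: ~$45/core of 1 GbE gear.
    print(network_cost_per_core_per_year(45.0, 4))   # USATLAS hosts (4-year lifetime)
    print(network_cost_per_core_per_year(45.0, 6))   # RHIC hosts (6-year lifetime)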

Software
Computing cluster uses mostly open-source software (free)
Three exceptions:
– Ksplice (reboot-less kernel patching software)
– Synapsense (real-time 3-D heat map of the data center for environmental monitoring) – see the presentation by Alexandr Zaytsev at CHEP2013
– Sensaphone (power monitoring and alert system used to detect when on UPS power)
Negligibly small combined cost (~$3/core each for RHIC and USATLAS)

Electrical Costs
Increasingly power-efficient hardware has decreased power consumption per core at the RACF in recent years
RHIC costs are higher than USATLAS due to differences in hardware configuration and usage patterns
Average instantaneous power consumption per core was ~25 W in 2012
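A back-of-the-envelope check of what ~25 W/core implies per year; the electricity rate is an assumption for illustration, since the actual BNL rate is not given in the slides.

    # Annual electrical cost per core from the ~25 W/core figure above.
    # The utility rate is an assumed value, not taken from the slides.
    watts_per_core = 25.0
    hours_per_year = 24 * 365                        # ~8760
    rate_usd_per_kwh = 0.06                          # assumed electricity rate

    kwh_per_core_year = watts_per_core * hours_per_year / 1000.0
    print(f"{kwh_per_core_year:.0f} kWh/core/yr")                    # ~219 kWh
    print(f"~${kwh_per_core_year * rate_usd_per_kwh:.0f}/core/yr")   # ~$13 at the assumed rate

At that assumed rate the result lands in the same range as the $12-16/yr electrical line in the historical cost table below.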

Overall Data Center Space Charges
Overhead charged to program funds to pay for maintenance of data center infrastructure (cooling, UPS, building lights, cleaning, physical security, repairs, etc.) – the upward trend is a concern
Based on footprint (~13,000 ft² or ~1,200 m²) and other factors
USATLAS occupies ~60% of the total area
Rate is reset on a yearly basis – not predictable

Space Charges for Servers
Space charges incurred by the Linux Farm clusters
Estimated from the approximate footprint occupied
Charges per core are dropping, but the overall footprint is expanding

Historical Cost/Core
Includes data
BNL-imposed overhead included
Amortize server and network over 4 or 6 years (USATLAS/RHIC) and use only physical cores
RACF Compute Cluster staffed by 4 FTE ($200k/FTE)
About 25-31% contribution from other-than-server costs

                USATLAS                 RHIC
Server          $228/yr                 $277/yr
Network         $28/yr                  $26/yr
Software        $3/yr                   $3/yr
Staff           $34/yr                  $34/yr
Electrical      $12/yr                  $16/yr
Space           $27/yr                  $13/yr
Total           $332/yr ($0.038/hr)     $369/yr ($0.042/hr)
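A quick check of the hourly figures in the table, a minimal sketch assuming every core runs around the clock:

    # Convert the annual per-core totals above into hourly rates,
    # assuming round-the-clock operation (24 x 365 = 8760 hours/year).
    hours_per_year = 24 * 365

    for site, annual_cost in {"USATLAS": 332, "RHIC": 369}.items():
        print(f"{site}: ${annual_cost / hours_per_year:.3f}/hr")
    # -> USATLAS: $0.038/hr, RHIC: $0.042/hr, matching the table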

Trends
Hardware cost/core is flattening out
Space charges are trending higher due to internal BNL dynamics
Network cost trends in the medium term will be affected by technology choices:
– Remain at 1 GbE for now
– Transition to 10 GbE in 2015? $63/core (at 2013 prices) → substantially lower in 2 years?
– Is Infiniband a realistic alternative? Evaluation underway (estimate ~30-50% lower than 10 GbE)

Resiliency of Data Storage
What if duplication of derived data at the Tier 1 is a requirement?
– to avoid catastrophic loss of data (i.e., fire, hurricane, etc.)
– to overcome hardware failure
BNL is investigating:
– Robotic tapes
– NAS (BlueArc, DDN, MapR, etc.)
– Worker nodes (via dCache, Hadoop, MapR, etc.)
– Cloud (EC2, GCE, etc.)

Cost of Data Duplication
Raw costs:
– Cloud (EC2) → $0.05/GB/month
– Worker nodes → $0.08/GB
– NAS → $0.20/GB
– Tape (T10K) → $0.03/GB
Example of a 15 PB initial deployment on worker nodes:
– Current usage at RACF
– Assume 100% duplication
– Adds $39/yr to the cost per core of worker nodes
– High-Availability requirement (UPS back-up costs?)
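One nuance in the raw numbers above: the EC2 figure is a recurring monthly charge, while the other options are quoted as one-time costs. A minimal sketch of what that implies over an assumed retention period (the 4-year horizon is an assumption, not from the slide):

    # Total cost per GB of duplicated data over an assumed retention period.
    # EC2 storage is billed per month; the other options are one-time costs.
    years = 4                              # assumed retention period
    ec2_total = 0.05 * 12 * years          # $/GB/month accumulated over the period

    print(f"EC2 (recurring):  ${ec2_total:.2f}/GB over {years} years")
    for option, cost in {"worker nodes": 0.08, "NAS": 0.20, "tape (T10K)": 0.03}.items():
        print(f"{option:13s} ${cost:.2f}/GB (one-time)")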

Summary
Cost of computing per core at dedicated data centers compares favorably with cloud costs:
– $0.04/hr (RACF) vs. $0.12/hr (EC2)
– Near-term trends: hardware, infrastructure, staff, data duplication
Data duplication requirements will raise costs and complexity – not a free ride

Back-up slides

Cost per HS06

Additional RACF Details
All RACF calculations assumed physical cores:
– In 2013, the average RACF physical core is rated at ~16 HS06
– A 2 ECU rating, directly comparable to some EC2 instances
– In practice, the RACF hyperthreads all its physical cores, doubling the number of compute cores
Hyperthreaded cores:
– add ~30% to the HS06 rating (~21 for 2013)
– approximately ~2.6 ECU
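A quick cross-check of those conversions, a minimal sketch using the ~8 HS06 per ECU figure quoted on the EC2 slide:

    # Cross-check of the core ratings above, using 1 ECU ~ 8 HS06
    # (the 2007-Xeon reference quoted on the EC2 slide).
    HS06_PER_ECU = 8.0
    physical_core_hs06 = 16.0                        # 2013 RACF average
    hyperthreaded_hs06 = physical_core_hs06 * 1.3    # ~+30% from hyperthreading

    print(physical_core_hs06 / HS06_PER_ECU)         # ~2.0 ECU per physical core
    print(hyperthreaded_hs06 / HS06_PER_ECU)         # ~2.6 ECU with hyperthreading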