CERN Data Centre Evolution, Gavin McCance. SDCD12: Supporting Science with Cloud Computing, Bern, 19th November 2012.

What is CERN?
- Conseil Européen pour la Recherche Nucléaire, aka the European Laboratory for Particle Physics
- Between Geneva and the Jura mountains, straddling the Swiss-French border
- Founded in 1954 by an international treaty
- Our business is fundamental physics: what is the universe made of, and how does it work?
Gavin McCance, CERN

Answering fundamental questions…
- How do we explain why particles have mass? We have theories and accumulating experimental evidence… Getting close…
- What is 96% of the universe made of? We can only see 4% of its estimated mass!
- Why isn't there anti-matter in the universe? Nature should be symmetric…
- What was the state of matter just after the « Big Bang »? Travelling back to the earliest instants of the universe would help…

The Large Hadron Collider (LHC) tunnel


Data Centre by Numbers
Hardware installation & retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year

  High Speed Routers (640 Mbps → 2.4 Tbps)   24
  Ethernet Switches                          …
  … Gbps ports                               2,000
  Switching Capacity                         4.8 Tbps
  1 Gbps ports                               16,…
  … Gbps ports                               558
  Racks                                      828
  Servers                                    11,728
  Processors                                 15,694
  Cores                                      64,238
  HEPSpec06                                  482,507
  Disks                                      64,109
  Raw disk capacity (TiB)                    63,289
  Memory modules                             56,014
  Memory capacity (TiB)                      158
  RAID controllers                           3,749
  Tape Drives                                160
  Tape Cartridges                            45,000
  Tape slots                                 56,000
  Tape Capacity (TiB)                        73,000
  IT Power Consumption                       2,456 kW
  Total Power Consumption                    3,890 kW
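A few derived figures help put the table in perspective. The short Python sketch below computes them from the slide's numbers; none of these derived values appear on the slide itself. It uses the standard definition of Power Usage Effectiveness (PUE = total facility power / IT equipment power), and the per-server figures are plain fleet-wide averages over a mixed population of compute, disk and service nodes, so they describe no particular machine.

```python
# Figures copied from the "Data Centre by Numbers" slide.
SERVERS = 11_728
CORES = 64_238
HEPSPEC06 = 482_507
MEMORY_TIB = 158
IT_POWER_KW = 2_456
TOTAL_POWER_KW = 3_890


def pue(total_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    return total_kw / it_kw


def per_server(value: float, servers: int = SERVERS) -> float:
    """Average of a fleet-wide quantity per installed server."""
    return value / servers


if __name__ == "__main__":
    print(f"PUE                ~ {pue(TOTAL_POWER_KW, IT_POWER_KW):.2f}")     # ~ 1.58
    print(f"Cores per server   ~ {per_server(CORES):.1f}")                    # ~ 5.5
    print(f"HEPSpec06 per core ~ {HEPSPEC06 / CORES:.1f}")                    # ~ 7.5
    print(f"Memory per server  ~ {per_server(MEMORY_TIB * 1024):.1f} GiB")    # ~ 13.8 GiB
```

A PUE of about 1.58 means roughly 0.58 kW of cooling and other overhead for every kW delivered to IT equipment.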

Current infrastructure
- Around 12k servers
  - Dedicated compute, dedicated disk servers, dedicated service nodes
  - Majority Scientific Linux (RHEL5/6 clone)
  - Mostly running on real hardware
  - In the last couple of years, we've consolidated some of the service nodes onto Microsoft Hyper-V
  - Various other virtualisation projects around
- In 2002 we developed our own management toolset
  - Quattor / CDB configuration tool
  - Lemon computer monitoring
  - Open source, but a small community

Many diverse applications ("clusters"), managed by different teams (CERN IT + experiment groups)

New data centre to expand capacity
- Data centre in Geneva at the limit of its electrical capacity at 3.5 MW
- New centre chosen in Budapest, Hungary
  - Additional 2.7 MW of usable power
  - Hands-off facility
  - Deploying from 2013 with a 200 Gbit/s network to CERN

Time to change strategy
Rationale
- Need to manage twice as many servers as today
- No increase in staff numbers
- Tools becoming increasingly brittle and will not scale as-is
Approach
- CERN is no longer a special case for compute
- Adopt an open source tool chain model
- Our engineers rapidly iterate
  - Evaluate solutions in the problem domain
  - Identify functional gaps and challenge old assumptions
  - Select a first choice but be prepared to change in future
- Contribute new function back to the community

Building Blocks (tool chain diagram; components shown):
Bamboo; Koji, Mock; AIMS/PXE, Foreman; Yum repo, Pulp; Puppet-DB; mcollective, yum; JIRA; Lemon / Hadoop; git; OpenStack Nova; Hardware database; Puppet; Active Directory / LDAP

Choose Puppet for Configuration
- The tool space has exploded in the last few years
  - In configuration management and operations
- Puppet and Chef are the clear leaders for 'core tools'
- Many large enterprises now use Puppet
  - Its declarative approach fits what we're used to at CERN
  - Large installations: friendly, wide-based community
  - You can buy books on it
  - You can employ people who know it better than we do
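"Declarative" here means describing the state a machine should be in and letting the tool work out what to change. The Python sketch below is not Puppet code and does not reproduce Puppet's resource model; it is a minimal illustration, assuming a hypothetical desired state expressed as package name to status. The tool compares desired and actual state and emits only the actions needed to converge, so a second run on a converged node does nothing (idempotence).

```python
from typing import Dict, List

# Hypothetical desired state: package name -> desired status.
# In Puppet this would be resources in a manifest; here it is just a dict.
DESIRED: Dict[str, str] = {"httpd": "installed", "telnet": "absent", "ntp": "installed"}


def plan(desired: Dict[str, str], installed: set) -> List[str]:
    """Return the actions needed to move `installed` towards `desired`.

    Only differences produce actions, so a converged node yields an empty plan.
    """
    actions = []
    for pkg, state in sorted(desired.items()):
        if state == "installed" and pkg not in installed:
            actions.append(f"install {pkg}")
        elif state == "absent" and pkg in installed:
            actions.append(f"remove {pkg}")
    return actions


def apply(actions: List[str], installed: set) -> set:
    """Simulate applying a plan and return the new set of installed packages."""
    result = set(installed)
    for action in actions:
        verb, pkg = action.split(" ", 1)
        if verb == "install":
            result.add(pkg)
        else:
            result.discard(pkg)
    return result


if __name__ == "__main__":
    node = {"telnet", "ntp"}
    first = plan(DESIRED, node)
    node = apply(first, node)
    print(first)                # ['install httpd', 'remove telnet']
    print(plan(DESIRED, node))  # [] : a second run changes nothing
```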

Puppet Experience
- Excellent: basic Puppet is easy to set up and can be scaled up well
- Well documented; configuring services with it is easy
- Handles our cluster diversity and dynamic clouds well
- Lots of resources ("modules") online, though of varying quality
- Large, responsive community to help
- Lots of nice tooling for free
  - Configuration version control and branching: integrates well with git
  - Dashboard: we use the Foreman dashboard
- We're moving all our production services over in 2013


Preparing the move to cloud
- Improve operational efficiency and dynamism
  - Dynamic multiple operating system demand
  - Dynamic temporary load spikes for special activities
  - Hardware interventions with long-running programs (live migration)
- Improve resource efficiency
  - Exploit idle resources, especially those waiting for disk and tape I/O
  - Highly variable load such as interactive or build machines
- Enable cloud architectures
  - Gradual migration from traditional batch + disk to cloud interfaces and workflows
- Improve responsiveness
  - Self-service with coffee-break response time

What is OpenStack?
OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface.

Service Model
- Pets are given names like pussinboots.cern.ch
  - They are unique, lovingly hand-raised and cared for
  - When they get ill, you nurse them back to health
- Cattle are given numbers like vm0042.cern.ch
  - They are almost identical to other cattle
  - When they get ill, you get another one
- Future application architectures should use cattle, but pets with strong configuration management are viable and still needed
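The difference between the two models is essentially a remediation policy. The Python sketch below is a hypothetical illustration only: the `Host` type, the `remediate` function and the `vmNNNN.cern.ch` numbering counter are invented for this example, with the counter starting at 42 purely to match the slide's `vm0042.cern.ch`. A sick pet is repaired in place and keeps its name; a sick cow is discarded and replaced by a fresh, numbered instance.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    is_pet: bool
    healthy: bool = True


# Hypothetical counter for cattle names of the form vmNNNN.cern.ch,
# modelled on the slide's example vm0042.cern.ch.
_cattle_ids = itertools.count(42)


def new_cattle() -> Host:
    """Create an interchangeable, numbered instance."""
    return Host(name=f"vm{next(_cattle_ids):04d}.cern.ch", is_pet=False)


def remediate(host: Host) -> Host:
    """Pets are repaired in place; sick cattle are discarded and replaced."""
    if host.healthy:
        return host
    if host.is_pet:
        host.healthy = True  # stands in for nursing the same machine back to health
        return host
    return new_cattle()      # same role, fresh instance, new number


if __name__ == "__main__":
    pet = Host("pussinboots.cern.ch", is_pet=True, healthy=False)
    cow = new_cattle()
    cow.healthy = False
    print(remediate(pet).name)                  # pussinboots.cern.ch
    print(cow.name, "->", remediate(cow).name)  # vm0042.cern.ch -> vm0043.cern.ch
```

Replacing rather than repairing only works if a new instance can be fully configured automatically, which is why the slide pairs pets with strong configuration management.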

Basic OpenStack Components
- KEYSTONE (identity), HORIZON (dashboard)
- NOVA: Compute, Scheduler, Network, Volume
- GLANCE: Registry, Image
- Each component has an API and is pluggable
- Other non-core projects interact with these components

Supporting the Pets with OpenStack
- Network
  - Interfacing with legacy site DNS and IP management
  - Ensuring Kerberos identity before VM start
- Puppet
  - Ease use of configuration management tools for our users
  - Exploit mcollective for orchestration/delegation
- External block storage
  - Currently using nova-volume with a Gluster backing store
- Live migration to maximise availability
  - KVM live migration using Gluster
  - KVM and Hyper-V block migration

Current Status of OpenStack at CERN
- Working on an Essex code base from the EPEL repository
  - Excellent experience with the Fedora cloud-sig team
  - cloud-init for contextualisation, oz for images with RHEL/Fedora
- Components
  - Current focus is on Nova with KVM and Hyper-V
  - Keystone running with Active Directory, and Glance for Linux and Windows images
- Pre-production facility with around 200 hypervisors and 2,000 VMs, integrated with the CERN infrastructure
  - Used for simulation of magnet placement using … and for batch physics programs


Next Steps
- Deploy into production at the start of 2013 with Folsom, running production services and compute on top of OpenStack IaaS
- Support multi-site operations with the 2nd data centre in Hungary
- Exploit new functionality
  - Ceilometer for metering
  - Bare metal for non-virtualised use cases such as high I/O servers
  - X.509 user certificate authentication
  - Load balancing as a service
- Ramping to 15K hypervisors with 100K VMs by 2015
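Comparing the 2015 target with the pre-production facility (about 200 hypervisors and 2,000 VMs) gives a sense of the scale-up. The Python sketch below simply takes ratios of the figures stated on the slides, using exact fractions; the averages it prints are derived here and are not stated by the author.

```python
from fractions import Fraction

# Figures from the "Current Status" and "Next Steps" slides.
PREPROD = {"hypervisors": 200, "vms": 2_000}
TARGET_2015 = {"hypervisors": 15_000, "vms": 100_000}


def vms_per_hypervisor(site: dict) -> Fraction:
    """Average VM density, kept exact as a fraction."""
    return Fraction(site["vms"], site["hypervisors"])


def growth(key: str) -> Fraction:
    """How many times larger the 2015 target is than pre-production."""
    return Fraction(TARGET_2015[key], PREPROD[key])


if __name__ == "__main__":
    print(float(vms_per_hypervisor(PREPROD)))                # 10.0
    print(round(float(vms_per_hypervisor(TARGET_2015)), 2))  # 6.67
    print(growth("hypervisors"), growth("vms"))              # 75 50
```

In other words, the target is a 75-fold increase in hypervisors and a 50-fold increase in VMs, at a somewhat lower average density of about 6.7 VMs per hypervisor.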

Conclusions
- The CERN computer centre is expanding
- We're in the process of refurbishing the tools we use to manage the centre, based on OpenStack for IaaS and Puppet for configuration management
- Production at CERN in the next few months on Folsom
  - Gradual migration of all our services
- Community is key to shared success
  - CERN contributes and benefits

BACKUP SLIDES

Training and Support
- Buy the book rather than guru mentoring
- Follow the mailing lists to learn
- Newcomers are rapidly productive (and often know more than us)
- Community and enterprise support means we're not on our own

Staff Motivation
- Skills are valuable outside of CERN when an engineer's contract ends

When communities combine…
- OpenStack's many components and options make configuration complex out of the box
- A Puppet Forge module from PuppetLabs does our configuration
- The Foreman adds OpenStack provisioning, taking a user from the kiosk to a configured machine in 15 minutes

Foreman to manage Puppetized VM

Active Directory Integration
- CERN's Active Directory
  - Unified identity management across the site
  - 44,000 users
  - 29,000 groups
  - 200 arrivals/departures per month
- Full integration with Active Directory via LDAP
  - Uses the OpenLDAP backend with some particular configuration settings
  - Aim for minimal changes to Active Directory
  - 7 patches submitted around hard-coded values and additional filtering
- Now in use in our pre-production instance
  - Map project roles (admins, members) to groups
  - Documentation in the OpenStack wiki
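Mapping project roles to directory groups means a user's rights in a project are derived from their group memberships rather than granted one by one. The Python sketch below illustrates that idea only; it is not Keystone's LDAP backend, and the project name `physics-sim` and all group names are invented for the example.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping for one project: role -> directory groups whose members get it.
# The project and group names are invented for illustration.
PROJECT_ROLE_GROUPS: Dict[str, Dict[str, Set[str]]] = {
    "physics-sim": {
        "admin": {"physics-sim-admins"},
        "member": {"physics-sim-users", "physics-sim-admins"},
    },
}


def roles_for(project: str, user_groups: Iterable[str]) -> Set[str]:
    """Roles a user holds in `project`, derived only from group membership.

    Because roles follow groups, an arrival or departure recorded in the
    directory changes cloud access without any per-user change in the cloud.
    """
    groups = set(user_groups)
    mapping = PROJECT_ROLE_GROUPS.get(project, {})
    return {role for role, allowed in mapping.items() if groups & allowed}


if __name__ == "__main__":
    print(sorted(roles_for("physics-sim", ["physics-sim-admins"])))  # ['admin', 'member']
    print(sorted(roles_for("physics-sim", ["physics-sim-users"])))   # ['member']
    print(sorted(roles_for("physics-sim", ["unrelated-group"])))     # []
```

With around 200 arrivals and departures a month, keeping the group as the single source of truth avoids a separate cleanup step in the cloud.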

What are we missing (or haven't found yet)?
- Best practice for
  - Monitoring and KPIs as part of core functionality
  - Guest disaster recovery
  - Migration between versions of OpenStack
- Roles within multi-user projects
  - VM owner allowed to manage their own resources (start/stop/delete)
  - Project admins allowed to manage all resources
  - Other members should not have high rights over other members' VMs
- Global quota management for a non-elastic private cloud
  - Manage resource prioritisation and allocation centrally
  - Capacity management / utilisation for planning
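The three role rules listed for multi-user projects amount to a small authorisation check. The Python sketch below encodes them as stated on the slide; it describes the desired policy, not a feature of the OpenStack release discussed here, and `VM`, `Caller` and `may_manage` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet

MANAGE_ACTIONS = {"start", "stop", "delete"}


@dataclass(frozen=True)
class VM:
    name: str
    project: str
    owner: str


@dataclass(frozen=True)
class Caller:
    user: str
    project: str
    roles: FrozenSet[str]  # e.g. frozenset({"member"}) or frozenset({"admin", "member"})


def may_manage(caller: Caller, vm: VM, action: str) -> bool:
    """Owners manage their own VMs, project admins manage every VM in the
    project, and other members get no management rights over others' VMs."""
    if action not in MANAGE_ACTIONS or caller.project != vm.project:
        return False
    if "admin" in caller.roles:
        return True
    return caller.user == vm.owner
```

For example, with a VM owned by `alice`, `alice` may stop it, a project admin may delete it, and another plain member may do neither.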

Opportunistic Clouds in online experiment farms
- The CERN experiments have farms of 1000s of Linux servers close to the detectors to filter the 1 PByte/s down to 6 GByte/s to be recorded to tape
- When the accelerator is not running, these machines are currently idle
  - The accelerator has regular maintenance slots of several days
  - Long Shutdown due from March 2013 to November 2014
- One of the experiments is deploying OpenStack on its farm
  - Simulation (low I/O, high CPU)
  - Analysis (high I/O, high CPU, high network)
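The filtering step is a very large reduction. The Python sketch below works it out from the two rates on the slide, assuming decimal units (1 PByte = 10^15 bytes, 1 GByte = 10^9 bytes), which the slide does not specify.

```python
from fractions import Fraction

PB = 10**15  # decimal petabyte in bytes (assumed; the slide does not say)
GB = 10**9   # decimal gigabyte in bytes (assumed)

DETECTOR_RATE = 1 * PB  # bytes/s before filtering (slide figure)
TAPE_RATE = 6 * GB      # bytes/s recorded to tape (slide figure)


def reduction_factor(raw: int, kept: int) -> Fraction:
    """Ratio of input data rate to recorded data rate, kept exact."""
    return Fraction(raw, kept)


if __name__ == "__main__":
    r = reduction_factor(DETECTOR_RATE, TAPE_RATE)
    print(f"reduction ~ {float(r):,.0f}x")        # reduction ~ 166,667x
    print(f"fraction kept ~ {float(1 / r):.4%}")  # fraction kept ~ 0.0006%
```

Only about six bytes in every million leave the farm, which is why these filtering farms are so large and why they represent significant capacity when the accelerator is idle.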

New architecture data flows