Accelerating Science with OpenStack


Accelerating Science with OpenStack Jan van Eldik Jan.van.Eldik@cern.ch October 10, 2013

What is CERN? Conseil Européen pour la Recherche Nucléaire – aka the European Laboratory for Particle Physics. Founded in 1954 by an international treaty. Located between Geneva and the Jura mountains, straddling the Swiss-French border. Our business is fundamental physics: what is the universe made of and how does it work? Established by an international treaty after the Second World War as a place where scientists could work together on fundamental research. Nuclear is part of the name, but our world is particle physics. Jan van Eldik, CERN

Answering fundamental questions… How do we explain that particles have mass? We have theories, and now we have experimental evidence! What is 96% of the universe made of? We can only see 4% of its estimated mass! Why isn't there anti-matter in the universe? Nature should be symmetric… What was the state of matter just after the "Big Bang"? Travelling back to the earliest instants of the universe would help… Our current understanding of the universe is incomplete. A theory, called the Standard Model, proposes particles and forces, many of which have been experimentally observed. However, there are open questions. Why do some particles have mass and others not? The Higgs boson is a theory, but we need experimental evidence. Our theory of forces does not explain how gravity works. Cosmologists can only find 4% of the matter in the universe; we have lost the other 96%. We should have 50% matter and 50% anti-matter… why is there an asymmetry (although it is a good thing that there is, since the two annihilate each other)? When we go back through time 13 billion years towards the Big Bang, we move back through planets, stars, atoms, protons/electrons towards a soup-like quark-gluon plasma. What were the properties of this? Jan van Eldik, CERN

The Large Hadron Collider The LHC is CERN's largest accelerator: a 17-mile (27 km) ring 100 metres underground where two beams of particles are sent in opposite directions and collided at the four experiments, ATLAS, CMS, LHCb and ALICE. Lake Geneva and the airport are visible at the top to give a sense of scale. Jan van Eldik, CERN

- At four points around the ring, the beams are made to cross where detectors, the size of cathedrals and weighing up to 12,500 tonnes, surround the beam pipe. These are like digital cameras, but they take 100-megapixel photos 40 million times a second. This produces up to 1 petabyte/s. Jan van Eldik, CERN

To improve the statistics, we send round beams of multiple bunches; as they cross, there are multiple collisions, with 100 billion protons per bunch passing through each other. Software close to the detector, and later offline in the computer centre, then has to examine the tracks to understand the particles involved. Jan van Eldik, CERN

- We cannot record 1 PB/s, so there are hardware filters to remove uninteresting collisions, such as those whose physics we understand already. The data is then sent to the CERN computer centre for recording via 10 Gbit optical connections. Jan van Eldik, CERN

Tier-0 (CERN): data recording, initial data reconstruction, data distribution
Tier-1 (11 centres): permanent storage, re-processing, analysis
Tier-2 (~200 centres): simulation, end-user analysis
The Worldwide LHC Computing Grid is used to record and analyse this data. The grid currently runs over 2 million jobs/day; less than 10% of the work is done at CERN. There is an agreed set of protocols for running jobs, data distribution and accounting between all the sites, which co-operate in order to support physicists across the globe. Data is recorded at CERN and the Tier-1s and analysed in the Worldwide LHC Computing Grid. In a normal day, the grid provides 100,000 CPU-days, executing over 2 million jobs. Jan van Eldik, CERN

Jan van Eldik, CERN

The CERN Data Centre in Numbers
Hardware installation & retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year
Racks: 1,127 | Servers: 10,070 | Processors: 17,259 | Cores: 88,414 | HEPSpec06: 744,277
Disks: 79,505 | Raw disk capacity (TiB): 124,660 | Memory modules: 63,326 | Memory capacity (TiB): 298 | RAID controllers: 3,091
Tape drives: 120 | Tape cartridges: 52,000 | Tape slots: 66,000 | Data on tape (PiB): 75
High-speed routers: 29 | Ethernet switches: 874 | 10 Gbps/100 Gbps ports: 1,396/74 | Switching capacity: 6 Tbps | 1 Gbps ports: 27,984 | 10 Gbps ports: 5,664
So, to the Tier-0 computer centre at CERN… we are unusual in that we are public about our environment, as there is no competitive advantage for us. We have thousands of visitors a year coming for tours and education, and the computer centre is a popular visit. The data centre has around 2.9 MW of usable power looking after 12,000 servers; in comparison, the accelerator uses 120 MW, like a small town. With around 64,000 disks we see roughly 1,800 failing each year, much higher than the manufacturers' quoted MTBF, which is consistent with results from Google. Servers are mainly Intel processors, some AMD, with dual-core Xeon being the most common configuration.
A separate monitoring snapshot gives: memory 314.14 TiB, 2,787 10G ports, 18,662 1G ports, 91,632 cores, 78,567 disks, 64,747 memory modules, 17,737 processors, 9,630 systems, 121,476.89 TiB raw HDD capacity, IT power consumption 2,392 kW, total power consumption 3,929 kW.
Jan van Eldik, CERN
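The failure figures above are easier to appreciate as a rate. A quick, purely illustrative Python check using the numbers quoted on the slide (the speaker notes mention ~64,000 disks, which would give closer to 2.8%):

```python
# Back-of-the-envelope check of the disk failure figures quoted above.
# Numbers are taken from the slide and treated as a snapshot, not exact values.
disks = 79_505             # disks installed in the CERN data centre
failures_per_year = 1_800  # observed failures per year

print(f"Annual disk failure rate: {failures_per_year / disks:.1%}")  # ~2.3%
print(f"Average failures per day: {failures_per_year / 365:.1f}")    # ~4.9 disks/day
```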

Not too long ago… Around 10k servers: dedicated compute, dedicated disk servers, dedicated service nodes. Majority Scientific Linux (a RHEL 5/6 clone), mostly running on real hardware. Over the last couple of years we consolidated some of the service nodes onto Microsoft Hyper-V, with various other virtualisation projects around. Many diverse applications ("clusters"), managed by different teams (CERN IT + experiment groups), using our own management toolset: the Quattor / CDB configuration tool and Lemon computer monitoring. Jan van Eldik, CERN

Public Procurement Purchase Model

Step                                  Time (days)                  Elapsed (days)
User expresses requirement            -                            0
Market survey prepared                15                           15
Market survey for possible vendors    30                           45
Specifications prepared               15                           60
Vendor responses                      30                           90
Test systems evaluated                30                           120
Offers adjudicated                    10                           130
Finance committee                     30                           160
Hardware delivered                    90                           250
Burn in and acceptance                30 typical, 380 worst case   280
Total                                                              280+ days

Jan van Eldik, CERN

New data centre to expand capacity The data centre in Geneva is at the limit of its electrical capacity at 3.5 MW. We asked the member states for offers, and a new centre was chosen in Budapest, Hungary: an additional 2.7 MW of usable power, run as a hands-off facility, with 200 Gbit/s network links connecting the centres back to CERN. We expect to double computing capacity compared to today by 2015. Jan van Eldik, CERN

Time to change strategy Rationale: we need to manage twice as many servers as today, with no increase in staff numbers, and our tools were becoming increasingly brittle and would not scale as-is. Approach: CERN is no longer a special case for compute, so adopt an open source tool chain model. Our engineers rapidly iterate: evaluate solutions in the problem domain, identify functional gaps and challenge them, select a first choice but be prepared to change in future, and contribute new functionality back to the community. Double the capacity with the same manpower means we need to rethink how to solve the problem and look at how others approach it. We had our own tools in 2002, and as they became more sophisticated it was not possible to take advantage of developments elsewhere without a major break. The engineers are doing this while doing their 'day' jobs, which reinforces the approach of taking what we can from the community. Jan van Eldik, CERN

Prepare the move to the clouds Improve operational efficiency: machine ordering, reception and testing; hardware interventions with long-running programs; multiple operating system demand. Improve resource efficiency: exploit idle resources, especially machines waiting for disk and tape I/O, and highly variable loads such as interactive or build machines. Enable cloud architectures: a gradual migration to cloud interfaces and workflows. Improve responsiveness: self-service with coffee-break response time. Standardise hardware: buy in bulk, pile it up, then work out what to use it for. Interventions cover memory, motherboards, cables or disks. Users waiting for I/O means wasted cycles; build machines are busy at night but unused during the day, while interactive machines are used mainly during the day. Moving to cloud APIs means we need to support them while also maintaining our existing applications. Details later on reception and testing. Jan van Eldik, CERN

Service Model Pets are given names like pussinboots.cern.ch. They are unique, lovingly hand-raised and cared for; when they get ill, you nurse them back to health. Cattle are given numbers like vm00042.cern.ch. They are almost identical to other cattle; when they get ill, you get another one. Puppet applies well to the cattle model, but we're also using it to handle the pet cases that can't yet move over due to software limitations, so they get cloud provisioning plus flexible configuration management. Future application architectures should use cattle, but pets with strong configuration management are viable and still needed. Jan van Eldik, CERN
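To make the cattle idea concrete, here is a minimal provisioning sketch using the openstacksdk cloud layer (a present-day library rather than the Grizzly-era tooling described in the deck); the cloud name, image, flavor and network are placeholders, and quotas and error handling are ignored:

```python
# A minimal "cattle" provisioning sketch: identical, numbered, disposable VMs
# built from the same image. Names and cloud config below are illustrative only.
import openstack

conn = openstack.connect(cloud="mycloud")  # credentials read from clouds.yaml

for i in range(40, 43):
    conn.create_server(
        name=f"vm{i:05d}",   # e.g. vm00042 -- a number, not a pet name
        image="cc7-base",    # hypothetical golden image
        flavor="m1.medium",
        network="internal",
        wait=True,
    )
```

If one of these VMs misbehaves, you delete it and boot another from the same image; nothing on the instance itself is precious.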

Supporting the Pets with OpenStack
Network: interfacing with legacy site DNS and IP management; ensuring Kerberos identity before VM start.
Puppet: ease the use of configuration management tools with our users; exploit mcollective for orchestration/delegation.
External block storage: looking to use Cinder with a Ceph backing store.
Live migration to maximise availability: KVM live migration using Ceph.
Jan van Eldik, CERN
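As an illustration of the pet-with-external-block-storage pattern, a short sketch that creates a Cinder volume and attaches it to an existing VM, again using openstacksdk with placeholder names; whether Ceph backs the volume is invisible at this level:

```python
# Keep a pet's important data on a Cinder volume so the hypervisor underneath
# stays replaceable. Server and volume names here are invented for illustration.
import openstack

conn = openstack.connect(cloud="mycloud")

server = conn.get_server("pussinboots")   # an existing pet VM
volume = conn.create_volume(size=100,     # size in GB
                            name="pussinboots-data",
                            wait=True)
conn.attach_volume(server, volume)        # typically appears in the guest as /dev/vdb
```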

Current Status of OpenStack at CERN
Grizzly release, based on RDO packages; excellent experience with the Fedora Cloud SIG and RDO teams. Cloud-init for contextualisation, Oz for images (Linux and Windows).
Components: current focus is on Nova with KVM and Hyper-V; Glance (Ceph backend), Keystone (Active Directory), Horizon, Ceilometer; work in progress on Cinder, with Ceph and NetApp backends. Stackforge Puppet components are used to configure OpenStack.
Production service since summer 2013: two Nova cells with ~600 hypervisors, adding 100 servers/week; ~1,800 VMs integrated with the CERN infrastructure.
Jan van Eldik, CERN
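Scale numbers like these can be pulled straight from the APIs. A small operator-style sketch, assuming admin credentials and a placeholder cloud name, and using openstacksdk rather than the 2013-era clients:

```python
# Count hypervisors and VMs across the cloud (requires admin credentials).
import openstack

conn = openstack.connect(cloud="mycloud")

hypervisors = conn.list_hypervisors()
servers = conn.list_servers(all_projects=True)

print(f"{len(hypervisors)} hypervisors, {len(servers)} VMs")
```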

Jan van Eldik, CERN

OpenStack production service since August 2013

Cell     Nodes   Cores    VMs
Geneva     373   10,912   1,359
Wigner     291    9,312     368
Total      664   20,224   1,727

Jan van Eldik, CERN

Jan van Eldik, CERN

When communities combine… OpenStack's many components and options make configuration complex out of the box. The Puppet Forge module from Puppet Labs does our configuration, and The Foreman adds OpenStack provisioning, taking a user from kiosk request to a configured machine in 15 minutes. Communities are integrating: when a new option is put to use at CERN in OpenStack, we contribute the changes back to the Puppet Forge, such as certificate handling. We are even looking at Hyper-V/Windows OpenStack configuration. Jan van Eldik, CERN

Active Directory Integration CERN's Active Directory provides unified identity management across the site: 44,000 users, 29,000 groups, 200 arrivals/departures per month. Full integration with Active Directory via LDAP, using the OpenLDAP backend with some particular configuration settings; the aim is minimal changes to Active Directory. 7 patches were submitted to the Folsom release and are now in use in our production instance. Project roles (admins, members) are mapped to groups. Documentation is in the OpenStack wiki. Jan van Eldik, CERN
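From the user's point of view the LDAP/Active Directory backend is transparent: you authenticate to Keystone with your normal site credentials, and Keystone resolves the user and group membership against AD behind the scenes. A minimal sketch with keystoneauth1 and the v3 API (the production cloud of this talk exposed the older v2 API; the endpoint, account and project names below are placeholders):

```python
# Authenticate against Keystone; the AD/LDAP lookup happens server-side.
from keystoneauth1.identity import v3
from keystoneauth1 import session

auth = v3.Password(
    auth_url="https://keystone.example.org:5000/v3",  # placeholder endpoint
    username="jdoe",                                  # the site (AD) account name
    password="secret",
    user_domain_name="Default",
    project_name="personal-jdoe",                     # hypothetical project
    project_domain_name="Default",
)
sess = session.Session(auth=auth)
print(sess.get_token())   # a scoped token backed by the AD identity
```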

What about Hyper-V? We have used Hyper-V/System Center for our server consolidation activities: 3,400 VMs (2,000 Linux, 1,400 Windows). But we need to scale to 100x the current installation size. The choice of hypervisors should be tactical: performance, compatibility/support with integration components, and image migration from legacy environments. CERN is working closely with the Hyper-V OpenStack team, using Puppet to configure hypervisors on Windows. Most functions work well, but further work is needed on the console, Ceilometer, … Jan van Eldik, CERN

Opportunistic Clouds in online experiment farms The CERN experiments have farms of thousands of Linux servers close to the detectors to filter the 1 PByte/s down to the 6 GByte/s that is recorded to tape. When the accelerator is not running, these machines are idle: the accelerator has regular maintenance slots of several days, and the Long Shutdown runs from March 2013 to November 2014. ATLAS and CMS have deployed OpenStack on their farms, for simulation (low I/O, high CPU) and analysis (high I/O, high CPU, high network).

Cloud                              Hypervisors   Cores
ATLAS Sim@P1                       1,200         28,800 (HT)
CMS OOOO cloud                     1,300         13,000
CERN IT Ibex and Grizzly clouds      873         20,952 (HT)

(The CERN IT Grizzly cloud alone: 664 servers, 20,224 cores (HT).) See more at: http://puppetlabs.com/blog/tale-three-openstack-clouds-50000-cores-production-cern
Jan van Eldik, CERN

Ramping to 15,000 hypervisors with 100,000 VMs by end-2015. Upcoming challenges, exploiting new functionality: deploy Cinder, with Ceph and NetApp backends; replace nova-network with Neutron; Kerberos and X.509 user certificate authentication; federated clouds; Keystone domains to devolve administration; bare metal for non-virtualised use cases such as high-I/O servers; OpenStack Heat; Load Balancing as a Service; … Jan van Eldik, CERN
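One roadmap item above, Keystone domains to devolve administration, amounts to giving each experiment or department its own identity namespace with projects managed inside it. A small sketch of what that looks like through the openstacksdk identity proxy; the domain and project names are invented for illustration:

```python
# Create a domain and a project within it (needs cloud-admin credentials).
import openstack

conn = openstack.connect(cloud="mycloud")

domain = conn.identity.create_domain(
    name="atlas",
    description="Self-administered domain for the ATLAS experiment",
)
conn.identity.create_project(
    name="atlas-simulation",
    domain_id=domain.id,
)
```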

Conclusions OpenStack is in production at CERN, and we work together with others on scaling improvements. Community is key to shared success: our problems are often resolved before we raise them, and the packaging teams are producing reliable builds promptly. CERN contributes and benefits. Thanks to everyone for their efforts and enthusiasm: not just code but documentation, tests, blogs, … Jan van Eldik, CERN

Backup Slides Jan van Eldik, CERN

References
CERN: http://public.web.cern.ch/public/
Scientific Linux: http://www.scientificlinux.org/
Worldwide LHC Computing Grid: http://lcg.web.cern.ch/lcg/ and http://rtm.hep.ph.ic.ac.uk/
Jobs: http://cern.ch/jobs
Detailed report on Agile Infrastructure: http://cern.ch/go/N8wp
HELiX Nebula: http://helix-nebula.eu/
EGI Cloud Taskforce: https://wiki.egi.eu/wiki/Fedcloud-tf
Jan van Eldik, CERN

CERN is more than just the LHC: CNGS sends neutrinos to Gran Sasso; CLOUD is investigating the impact of cosmic rays on weather patterns; anti-hydrogen atoms have been contained for minutes in a magnetic vessel. However, for those of you who have read Dan Brown's Angels and Demons or seen the film, there are no maniacal monks with pounds of anti-matter running around the campus. Jan van Eldik, CERN

Community collaboration on an international scale: the biggest international scientific collaboration in the world, with over 10,000 scientists from 100 countries and an annual budget of around 1.1 billion USD. Funding for CERN, the laboratory itself, comes from the 20 member states, in ratio to gross domestic product; other countries contribute to the experiments, including a substantial US contribution towards the LHC experiments. Jan van Eldik, CERN

Training for Newcomers Buy the book rather than guru mentoring Jan van Eldik, CERN

Job Opportunities Jan van Eldik, CERN