Tim, 23/07/2014, OSCON - CERN Mass and Agility.

Presentation transcript:


About Tim: Runs the IT Infrastructure group at CERN. Member of the OpenStack management board and user committee. Previously worked at Deutsche Bank, running European Private Banking Infrastructure, and at IBM as a consultant and kernel developer.

CERN was founded in 1954 by 12 European States: “Science for Peace”. Today: 21 Member States.
Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Israel, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom.
Candidate for Accession: Romania.
Associate Members in Pre-Stage to Membership: Serbia.
Applicant States for Membership or Associate Membership: Brazil, Cyprus (awaiting ratification), Pakistan, Russia, Slovenia, Turkey, Ukraine.
Observers to Council: India, Japan, Russia, Turkey, United States of America; European Commission and UNESCO.
~2,300 staff, ~1,000 other paid personnel, >11,000 users. Budget (2013): ~1,000 MCHF.

What are the Origins of Mass?

Matter/Antimatter Symmetric?

Where is 95% of the Universe?


Collisions

A Big Data Challenge: In 2014, a ~100PB archive, growing by an additional 35PB/year; ~11,000 servers; ~75,000 disk drives; ~45,000 tapes. Data should be kept for at least 20 years. In 2015, we start the accelerator again, upgraded to double the energy of the beams, and expect a significant increase in data rate.

LHC data growth: Plan to record 400PB/year by 2023. Compute needs are expected to be around 50x current levels, if the budget is available. (Chart: PB per year.)
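The two data-rate figures can be combined into an implied annual growth rate. The sketch below assumes, as a simplification not stated on the slides, that the yearly recorded volume grows geometrically from ~35PB in 2014 to 400PB in 2023, and that every year's data is kept (in line with the 20-year retention requirement).

```python
# Implied growth of the yearly LHC data volume between two quoted points.
# Assumption: smooth geometric growth between 2014 and 2023.

START_YEAR, START_RATE_PB = 2014, 35.0     # ~35 PB/year added in 2014
TARGET_YEAR, TARGET_RATE_PB = 2023, 400.0  # planned 400 PB/year by 2023
ARCHIVE_2014_PB = 100.0                    # ~100 PB archive in 2014


def implied_annual_growth(start: float, target: float, years: int) -> float:
    """Compound annual growth rate needed to go from start to target."""
    return (target / start) ** (1.0 / years) - 1.0


def projected_rates(start: float, growth: float, years: int) -> list[float]:
    """Yearly recorded volume (PB) for each year from the start year onwards."""
    return [start * (1.0 + growth) ** k for k in range(years + 1)]


years = TARGET_YEAR - START_YEAR
growth = implied_annual_growth(START_RATE_PB, TARGET_RATE_PB, years)
rates = projected_rates(START_RATE_PB, growth, years)

# Everything recorded after 2014 is added to the 2014 archive and kept.
archive = ARCHIVE_2014_PB + sum(rates[1:])

print(f"implied growth: {growth:.1%} per year")
print(f"rate in {TARGET_YEAR}: {rates[-1]:.0f} PB/year")
print(f"archive by end of {TARGET_YEAR}: ~{archive:.0f} PB")
```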

Tier-0 (CERN): data recording, initial data reconstruction, data distribution. Tier-1 (11 centres): permanent storage, re-processing, analysis. Tier-2 (~200 centres): simulation, end-user analysis. Data is recorded at CERN and the Tier-1s and analysed in the Worldwide LHC Computing Grid. On a normal day, the grid provides 100,000 CPU days, executing over 2 million jobs.
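As a sanity check on the grid figures: CPU days delivered per calendar day equal the number of cores kept busy around the clock, and dividing by the job count gives the average job length. Because the slide says "over" 2 million jobs, the result below is an upper bound on the mean CPU time per job.

```python
CPU_DAYS_PER_DAY = 100_000  # "100,000 CPU days" delivered on a normal day
JOBS_PER_DAY = 2_000_000    # "over 2 million jobs", treated here as exactly 2M
HOURS_PER_DAY = 24
CALENDAR_DAYS = 1

# CPU days delivered in one calendar day = cores busy around the clock
busy_cores = CPU_DAYS_PER_DAY / CALENDAR_DAYS

# Average CPU time per job, in hours
avg_cpu_hours_per_job = CPU_DAYS_PER_DAY * HOURS_PER_DAY / JOBS_PER_DAY

print(f"~{busy_cores:,.0f} cores busy continuously")
print(f"~{avg_cpu_hours_per_job:.1f} CPU hours per job on average")
```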

The CERN Meyrin Data Centre

New Data Centre in Budapest

Good News, Bad News: An additional data centre in Budapest is now online, and use of the facilities is increasing as data rates increase. But… staff numbers are fixed (no more people); the materials budget is decreasing (no more money); legacy tools are high-maintenance and brittle; and users expect fast self-service.

Public Procurement Cycle

Step | Time (days) | Elapsed (days)
User expresses requirement | | 0
Market Survey prepared | 15 | 15
Market Survey for possible vendors | 30 | 45
Specifications prepared | 15 | 60
Vendor responses | 30 | 90
Test systems evaluated | 30 | 120
Offers adjudicated | 10 | 130
Finance committee | 30 | 160
Hardware delivered | 90 | 250
Burn in and acceptance | 30 typical, with 380 worst case | 280
Total | | 280+
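The elapsed column is a running total of the step durations. The check below reproduces it with Python's `itertools.accumulate`. It treats the first step as taking 0 days and uses the typical 30-day burn-in rather than the worst case.

```python
from itertools import accumulate

# (step, duration in days) from the procurement table;
# burn-in is taken at its typical value of 30 days
STEPS = [
    ("User expresses requirement", 0),
    ("Market survey prepared", 15),
    ("Market survey for possible vendors", 30),
    ("Specifications prepared", 15),
    ("Vendor responses", 30),
    ("Test systems evaluated", 30),
    ("Offers adjudicated", 10),
    ("Finance committee", 30),
    ("Hardware delivered", 90),
    ("Burn in and acceptance", 30),
]

# Running total of the durations gives the "Elapsed (days)" column
elapsed = list(accumulate(days for _, days in STEPS))

for (step, days), total in zip(STEPS, elapsed):
    print(f"{step:<36}{days:>5}{total:>7}")
```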

Approach: There is no Moore’s Law for people. Automation needs APIs, not documented procedures. Focus on high people-effort activities. Are those requirements really justified? Accumulating technical debt stifles agility. Find open source communities and contribute; understand their ethos and architecture; stay mainstream.

O’Reilly Consideration

Indeed.com Consideration

Tool chain diagram: Bamboo; Koji, Mock; AIMS/PXE, Foreman; Yum repo, Pulp; PuppetDB; mcollective, yum; JIRA; Lemon / Hadoop / LogStash / Kibana; git; OpenStack Nova; Hardware database; Puppet; Active Directory / LDAP.

Puppet Configuration: Over 10,000 hosts in Puppet, in 160 different hostgroups. The tool chain uses PuppetDB, Foreman and Git. Scaling issues were resolved with the communities.

Monitoring - Flume, Elasticsearch, Kibana. Diagram components: OpenStack infrastructure, Flume gateways, HDFS, Elasticsearch, Kibana.
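The slide lists the monitoring components but not the wiring between them. One common arrangement is assumed here rather than taken from the slide: Flume gateways receive log events from the OpenStack infrastructure and copy each one to HDFS, for archiving and batch analysis, and to Elasticsearch, for Kibana dashboards. The sketch models that fan-out with hypothetical in-memory classes. It does not use the Flume, HDFS or Elasticsearch APIs, and the host names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Sink:
    """In-memory stand-in for a storage back end (HDFS or Elasticsearch)."""
    name: str
    events: list = field(default_factory=list)

    def write(self, event: dict) -> None:
        self.events.append(event)


@dataclass
class FlumeGateway:
    """Hypothetical stand-in for a Flume agent that copies each event to every sink."""
    sinks: list

    def ingest(self, event: dict) -> None:
        for sink in self.sinks:
            sink.write(dict(event))  # each sink keeps its own copy


hdfs = Sink("hdfs")                    # long-term archive for batch analysis
elasticsearch = Sink("elasticsearch")  # indexed copy that Kibana queries
gateway = FlumeGateway([hdfs, elasticsearch])

# Hypothetical log events from the OpenStack infrastructure
gateway.ingest({"host": "compute-001", "service": "nova-compute", "level": "ERROR"})
gateway.ingest({"host": "controller-01", "service": "keystone", "level": "INFO"})

# A Kibana-style filter, run against the Elasticsearch copy only
errors = [e for e in elasticsearch.events if e["level"] == "ERROR"]
print(errors)
```

Copying every event to both stores trades disk space for independence. A slow batch job on HDFS cannot delay the interactive Kibana view, and the reverse also holds.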

Diagram: OpenStack services (Horizon, Keystone, Glance, Nova with Network, Compute and Scheduler, Cinder block storage on Ceph & NetApp, Ceilometer) integrated with CERN systems (Microsoft Active Directory, CERN DB on Demand, CERN Network Database, account management system, CERN Accounting).

Scaling Architecture Overview: A load balancer (Geneva, Switzerland) in front of the top cell controllers (Geneva, Switzerland), and two child cells, each with its own controllers and compute nodes: Geneva, Switzerland and Budapest, Hungary.
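One way to picture the two-level cells layout is a top cell that accepts every request and hands each new VM to one of the child cells. The placement rule below sends the VM to the child cell with the most free cores. That rule and the cell capacities are assumptions chosen for illustration. The actual Nova cells scheduler has its own filters and weights.

```python
from dataclasses import dataclass


@dataclass
class ChildCell:
    name: str
    total_cores: int
    used_cores: int = 0

    @property
    def free_cores(self) -> int:
        return self.total_cores - self.used_cores


class TopCell:
    """Hypothetical top cell: receives requests and forwards them to a child cell."""

    def __init__(self, children: list[ChildCell]):
        self.children = children

    def schedule(self, vcpus: int) -> ChildCell:
        # Only cells that can fit the request are candidates
        candidates = [c for c in self.children if c.free_cores >= vcpus]
        if not candidates:
            raise RuntimeError("no child cell has enough free cores")
        # Assumed policy: pick the least loaded cell (most free cores)
        chosen = max(candidates, key=lambda c: c.free_cores)
        chosen.used_cores += vcpus
        return chosen


# Hypothetical, tiny capacities for the two child cells
top = TopCell([ChildCell("geneva", 40), ChildCell("budapest", 30)])
placements = [top.schedule(8).name for _ in range(4)]
print(placements)
```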

Status: Multi-data-centre cloud in production since July 2013 (Geneva and Budapest), with nearly 1,000 users. Currently running OpenStack Havana, with KVM and Hyper-V deployed, all configured automatically with Puppet. ~70,000 cores on ~3,000 servers. A 3PB Ceph pool is available for volumes, images and other physics storage.

The Agile Experience

Cultural Barriers

Agility and Elasticity Limits: Communities help to set good behaviour. Internal demonstrations build momentum. Finding the right speed is key. Keeping up with releases takes focus. Coping with legacy requires compromise. The travel budget needs a significant increase!

Next Steps: Scale with Physics. Scaling to >100,000 cores by 2015: around 100 hypervisors per week with fixed staff; deploying and configuring the latest releases (we need to stay close … but not too close). Legacy systems retirement: server consolidation; home-grown configuration and monitoring. Analytics of processor, disk and network, with a focus on efficiency.
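This is a rough estimate of how long the ramp to 100,000 cores takes. It combines the status figures (~70,000 cores on ~3,000 servers) with the rate of ~100 hypervisors per week. It assumes new hypervisors match the current fleet's average core count. That is unlikely to hold exactly, so treat the result as an order of magnitude.

```python
import math

CURRENT_CORES = 70_000      # "~70,000 cores"
CURRENT_SERVERS = 3_000     # "on ~3,000 servers"
TARGET_CORES = 100_000      # ">100,000 cores by 2015"
HYPERVISORS_PER_WEEK = 100  # "around 100 hypervisors per week"

# Assumption: new hypervisors have the current fleet's average core count
cores_per_server = CURRENT_CORES / CURRENT_SERVERS  # ~23.3

servers_needed = math.ceil((TARGET_CORES - CURRENT_CORES) / cores_per_server)
weeks_needed = math.ceil(servers_needed / HYPERVISORS_PER_WEEK)

print(f"~{cores_per_server:.1f} cores per server")
print(f"{servers_needed} more hypervisors, ~{weeks_needed} weeks at the stated rate")
```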

Next Steps: Federated Clouds. CERN Private Cloud (70K cores), ATLAS Trigger (28K cores), CMS Trigger (12K cores), IN2P3 Lyon, Brookhaven National Labs, NeCTAR Australia, public clouds such as Rackspace, and many others on their way.

Summary: Open source tools have successfully replaced CERN’s legacy fabric management system. Scaling to 100,000s of cores with OpenStack and Puppet is in sight. The cultural change to an Agile approach has required time and patience, but is paying off. Community collaboration is needed to reach 400PB/year.

Questions? Details at production.blogspot.fr. Previous presentations at technology.web.cern.ch/book/cern-private-cloud-user-guide/openstack-information. CERN code is at


Monitoring - Kibana


Architecture Components:
Child cell controllers: rabbitmq, Keystone, Nova api, Nova conductor, Nova scheduler, Nova network, Nova cells, Glance api, Ceilometer agent-central, Ceilometer collector.
Compute nodes: Flume, Nova compute, Ceilometer agent-compute.
Top cell controllers: rabbitmq, Keystone, Nova api, Nova consoleauth, Nova novncproxy, Nova cells, Horizon, Ceilometer api, Cinder api, Cinder volume, Cinder scheduler, Glance api, Glance registry.
Supporting services: Flume, HDFS, Elastic Search, Kibana, MySQL, MongoDB, Stacktach, Ceph.

Upgrade Strategy: Surely “OpenStack can’t be upgraded”? Our Essex, Folsom and Grizzly clouds were ‘tear-down’ migrations. Puppet-managed VMs are typical cattle cases: re-create them. User VMs: snapshot, download the image and upload it to a new instance, with a one-month window to migrate. Users of production services expect more: physicists accept not creating or changing VMs for a short period, but running VMs must not be affected.

Phased Migration: Migrated by component. Choose an approach (online with a load balancer, or offline). Spin up a ‘teststack’ instance with production software. Clone production databases to the test environment. Run through the upgrade process. Validate existing functions, Puppet configuration and monitoring. Order by complexity and need: Ceilometer, Glance, Keystone; then Cinder, client CLIs, Horizon; then Nova.
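The phased approach can be expressed as a small driver loop. For each component, it rehearses the upgrade on the test stack, upgrades production, validates, and stops at the first failure. The component order comes from the slide. The three callbacks (`rehearse`, `upgrade`, `validate`) are hypothetical placeholders for a site's own procedures, not functions from any OpenStack tool.

```python
from typing import Callable

# Phases in the order given on the slide (least to most complex)
UPGRADE_ORDER = [
    ["Ceilometer", "Glance", "Keystone"],
    ["Cinder", "Client CLIs", "Horizon"],
    ["Nova"],
]


def run_phased_upgrade(
    rehearse: Callable[[str], bool],   # hypothetical: upgrade on teststack with cloned DBs
    upgrade: Callable[[str], None],    # hypothetical: upgrade the production component
    validate: Callable[[str], bool],   # hypothetical: check functions, Puppet, monitoring
) -> list[str]:
    """Upgrade components phase by phase, stopping at the first failure."""
    done = []
    for phase in UPGRADE_ORDER:
        for component in phase:
            if not rehearse(component):
                raise RuntimeError(f"teststack rehearsal failed for {component}")
            upgrade(component)
            if not validate(component):
                raise RuntimeError(f"post-upgrade validation failed for {component}")
            done.append(component)
    return done


# Dry run with stub callbacks that always succeed
upgraded = []
done = run_phased_upgrade(
    rehearse=lambda c: True,
    upgrade=upgraded.append,
    validate=lambda c: True,
)
print(done)
```

Putting Nova last means the most complex component is upgraded only after the simpler services have already been proven on the new release.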

Upgrade Experience: No significant outage of the cloud; during the upgrade window, VM creation was not possible. Small incidents (see blog for details). Puppet can be enthusiastic! (We told it to be.) The community response has been great: bugs have been fixed, and our points are in the Juno design summit. Rolling upgrades in Icehouse will make it easier.

Duplication and Divergence. Diagram, Service Silos: Network, Hardware, Facilities, Storage, Compute; Windows, Web, Database, Custom. Functional Layers: Network, Hardware, Facilities; Infrastructure as a Service; Platform as a Service; Storage, Compute, Windows.

Service Models: Pets are given names like pussinboots.cern.ch. They are unique, lovingly hand-raised and cared for. When they get ill, you nurse them back to health. Cattle are given numbers like vm0042.cern.ch. They are almost identical to other cattle. When they get ill, you get another one.
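Here is a toy Python model of the distinction. An unhealthy pet is repaired in place, while an unhealthy cattle VM is discarded and replaced by a freshly numbered one. The `Server` class, the single health flag and the numbering scheme are assumptions for illustration. In practice the replacement would be a new VM rebuilt from configuration management such as Puppet.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Server:
    name: str
    is_pet: bool
    healthy: bool = True


# Hypothetical numbering: next cattle VM continues after vm0042
_next_number = count(43)


def heal(fleet: list[Server]) -> list[str]:
    """Nurse sick pets in place; replace sick cattle with new, numbered VMs."""
    actions = []
    for i, server in enumerate(fleet):
        if server.healthy:
            continue
        if server.is_pet:
            server.healthy = True  # nurse it back to health
            actions.append(f"nursed {server.name}")
        else:
            new = Server(f"vm{next(_next_number):04d}.cern.ch", is_pet=False)
            fleet[i] = new  # discard the sick one, start another
            actions.append(f"replaced {server.name} with {new.name}")
    return actions


fleet = [
    Server("pussinboots.cern.ch", is_pet=True, healthy=False),
    Server("vm0042.cern.ch", is_pet=False, healthy=False),
]
actions = heal(fleet)
print(actions)
```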
