Clouds and Research Collide at CERN. Tim Bell, Infrastructure Manager, CERN

About CERN
CERN is the European Organization for Nuclear Research in Geneva. Particle accelerators and other infrastructure for high energy physics (HEP) research. Worldwide community:
‒ 21 member states (+ 2 incoming members)
‒ Observers: Turkey, Russia, Japan, USA, India
‒ About 2300 staff
‒ >10’000 users (about 5’000 on-site)
‒ Budget (2014) ~1000 MCHF
Birthplace of the World Wide Web

COLLISIONS

The Worldwide LHC Computing Grid
TIER-0 (CERN): data recording, reconstruction and distribution
TIER-1: permanent storage, re-processing, analysis
TIER-2: simulation, end-user analysis
> 2 million jobs/day, ~350’000 cores, 500 PB of storage, nearly 170 sites in 40 countries, Gb links

LHC Data Growth
Expecting to record 400 PB/year by 2023. Compute needs expected to be around 50x current levels if budget available. (Chart: PB per year)

THE CERN MEYRIN DATA CENTRE

Public Procurement Cycle
Step                                   Time (days)   Elapsed (days)
User expresses requirement                   0              0
Market survey prepared                      15             15
Market survey for possible vendors          30             45
Specifications prepared                     15             60
Vendor responses                            30             90
Test systems evaluated                      30            120
Offers adjudicated                          10            130
Finance committee                           30            160
Hardware delivered                          90            250
Burn in and acceptance                      30            280   (30 days typical, 380 worst case)
Total                                                     280+ days
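
The Elapsed column is simply the running sum of the per-step durations; a small illustrative Python sketch (step names and durations copied from the table above):

```python
# Small illustration only: the "Elapsed" column is the running sum of step durations.
steps = [
    ("User expresses requirement", 0),
    ("Market survey prepared", 15),
    ("Market survey for possible vendors", 30),
    ("Specifications prepared", 15),
    ("Vendor responses", 30),
    ("Test systems evaluated", 30),
    ("Offers adjudicated", 10),
    ("Finance committee", 30),
    ("Hardware delivered", 90),
    ("Burn in and acceptance", 30),  # 30 days typical, up to 380 in the worst case
]

elapsed = 0
for name, days in steps:
    elapsed += days
    print(f"{name:<36} {days:>4} {elapsed:>4}")

print(f"Total: {elapsed}+ days")
```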

CERN Tool Chain

OpenStack Private Cloud Status
4 OpenStack clouds at CERN. The largest is ~120,000 cores in ~5,000 servers; 3 other instances with 45,000 cores total. 1,100 active users. Running OpenStack Juno. 3 PB Ceph block storage for attached volumes. All non-CERN specific code is contributed back to the community.
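
As a hedged illustration of driving such a cloud programmatically, a minimal sketch with the openstacksdk Python client; the cloud name "cern" is an assumed entry in a local clouds.yaml, not CERN's actual configuration:

```python
# Minimal sketch: list compute instances and attached block-storage volumes.
# The "cern" cloud name is an assumption for illustration only.
import openstack

conn = openstack.connect(cloud="cern")

# Servers known to Nova
for server in conn.compute.servers():
    print(server.name, server.status)

# Volumes known to Cinder (Ceph-backed in the setup described above)
for volume in conn.block_storage.volumes():
    print(volume.name, volume.size, "GB")
```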

Upstream OpenStack on its own does not give you a cloud. Cloud is a service! (Source: eBay)
Around the OpenStack APIs sit packaging, integration, burn in, SLA, monitoring and alerting, metering and chargeback, autoscaling, remediation, scale out, capacity planning, upgrades, customer support, user experience, incident resolution, alerting, cloud monitoring, metrics, log processing, high availability, config management, infra onboarding, CI, builds, net/info sec and network design.

Marketplace Options
Build your own: high level of programming skills needed; complete the service circles.
Use a distribution: good Linux admin skills; license costs.
Use a public cloud or hosted private cloud: quick start; varying pricing models available, check for “OpenStack powered”.

Hooke’s Law for Cultural Change
Under load, an organization can extend in proportion to the external force. Too much stretching leads to permanent deformation.
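
Purely to spell out the metaphor (this formula is not on the original slide), Hooke's law says the extension is proportional to the applied force only within the elastic limit:

\[
F = k\,x \qquad \text{(valid only up to the elastic limit; beyond it, deformation is permanent)}
\]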

THE AGILE EXPERIENCE

CULTURAL BARRIERS

CERN openlab in a Nutshell
A science – industry partnership to drive R&D and innovation, with over a decade of success.
Evaluate state-of-the-art technologies in a challenging environment and improve them.
Test in a research environment today what will be used in many business sectors tomorrow.
Train the next generation of engineers/employees.
Disseminate results and outreach to new audiences.

Phase V Members: partners, contributors, associates, research.

Onwards the Federated Clouds
CERN Private Cloud (120K cores), ATLAS Trigger (28K cores), ALICE Trigger (12K cores), CMS Trigger (12K cores), IN2P3 Lyon, Brookhaven National Labs, NecTAR Australia, public clouds such as Rackspace, and many others on their way.

Openlab Past Activities
Developed OpenStack identity federation; demonstrated between CERN and Rackspace at the Paris summit.
Validated Rackspace private and public clouds for physics simulation workloads; performance and reliability comparable with the private cloud.

Openlab Ongoing Activities
Expand federation functionality: images, orchestration.
Expand workloads: reconstruction (high I/O and network), bare-metal testing.
Exploit federation (see the sketch below).
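
As a hedged sketch of what exploiting federation could look like from the user side (not the actual CERN/Rackspace setup), the same openstacksdk code can target several clouds defined in clouds.yaml; the cloud names below are illustrative assumptions:

```python
# Illustrative only: compare the images visible in two federated clouds.
# "cern" and "rackspace" are assumed clouds.yaml entries, not real configuration.
import openstack

for cloud_name in ("cern", "rackspace"):
    conn = openstack.connect(cloud=cloud_name)
    images = [image.name for image in conn.image.images()]
    print(f"{cloud_name}: {len(images)} images visible")
```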

Summary
OpenStack clouds are in production at scale for High Energy Physics.
Cultural change to an agile approach has required time and patience but is paying off.
CERN’s computing challenges and the collaboration with Rackspace have fostered sustainable innovation.
Many options for future computing models according to budget and needs.

For Further Information
Technical details at production.blogspot.fr. CERN code is upstream or at cernops.

New Data Centre in Budapest

Cultural Transformations
Technology change needs cultural change.
Speed: are we going too fast?
Budget: cloud quota allocation rather than CHF (see the sketch below).
Skills inversion: legacy skills value is reduced.
Hardware ownership: no longer a physical box to check.
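
As a sketch of what a quota-based budget looks like in practice, assuming openstacksdk and an illustrative project name and numbers (not a real CERN project or allocation):

```python
# Illustrative only: inspect and adjust a project's compute quota instead of a CHF budget line.
# "atlas-simulation" and all quota values are placeholders.
import openstack

conn = openstack.connect(cloud="cern")

print(conn.get_compute_quotas("atlas-simulation"))
conn.set_compute_quotas("atlas-simulation", cores=2000, instances=500, ram=4096000)
```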

Good News, Bad News
Additional data centre in Budapest now online. Increasing use of facilities as data rates increase.
But…
Staff numbers are fixed: no more people. Materials budget decreasing: no more money. Legacy tools are high maintenance and brittle. User expectations are for fast self-service.

Innovation Dilemma
How can we avoid the sustainability trap? Define requirements; find no solution available that meets those requirements; develop our own new solution; accumulate technical debt.
How can we learn from others and share? Find compatible open source communities; contribute back where there is missing functionality; stay mainstream.
Are CERN computing needs really special?

O’Reilly Consideration

Job Trends Consideration

OpenStack Cloud Platform

OpenStack Governance

The LHC timeline
L ~7x10^33, pile-up ~20-35; L = 1.6x10^34, pile-up ~30-45; L = 2-3x10^34, pile-up ~50-80; L = 5x10^34, pile-up ~ (L. Rossi)

Scaling Architecture Overview
A load balancer in Geneva, Switzerland fronts the top cell controllers (Geneva, Switzerland); child cells, each with their own controllers and compute nodes, run in Geneva, Switzerland and in Budapest, Hungary.

Monitoring - Kibana

Architecture Components
Top cell controller: Keystone, Nova api, Nova consoleauth, Nova novncproxy, Nova cells, Horizon, Ceilometer api, Cinder api, Cinder volume, Cinder scheduler, Glance api, Glance registry, rabbitmq.
Child cell controllers: Keystone, Nova api, Nova conductor, Nova scheduler, Nova network, Nova cells, Glance api, Ceilometer agent-central, Ceilometer collector, rabbitmq, Flume.
Compute node: Nova compute, Ceilometer agent-compute, Flume.
Supporting services: HDFS, Elastic Search, Kibana, MySQL, MongoDB, Stacktach, Ceph.

Horizon, Keystone, Glance, Nova (compute, scheduler, network), Cinder and Ceilometer integrate with Microsoft Active Directory, CERN Database Services, the CERN network database, the account management system, block storage on Ceph & NetApp, and CERN accounting.
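
As a hedged illustration of that Keystone integration from a client's point of view, a keystoneauth1 sketch; the endpoint, domain, project name and credentials below are placeholders, not CERN's real values:

```python
# Illustrative only: obtain a Keystone token backed by the site identity store
# (e.g. Active Directory). Endpoint, names and credentials are placeholders.
from keystoneauth1.identity import v3
from keystoneauth1 import session

auth = v3.Password(
    auth_url="https://keystone.example.cern.ch:5000/v3",
    username="jdoe",
    password="secret",
    project_name="atlas-simulation",
    user_domain_name="Default",
    project_domain_name="Default",
)
sess = session.Session(auth=auth)
print(sess.get_token())
```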

Helix Nebula: commercial suppliers (Atos, CloudSigma, T-Systems, Interoute) and the EGI Federated Cloud sit behind broker(s) and front-ends, serving publicly funded users (big science, small and medium scale science, academic) and commercial market sectors (government, manufacturing, oil & gas, etc.), connected over commercial/GEANT networks.