CERN Cloud Service Update

CERN Cloud Service Update
Luis Pigueiras, on behalf of the CERN Cloud Team
HEPIX Spring 2017

CERN Cloud Recap
- Policy: servers in CERN IT shall be virtual
- Based on OpenStack
- In production since July 2013
- Currently between Mitaka and Newton (depending on the component)

CERN Cloud Architecture
- Spread across two data centres
- 1 region (1 API) and 50+ cells
- Cells map to different use cases:
  - Shared cells: 5 availability zones (AVZs)
  - Project cells: special requirements (hardware, location)
  - Batch cells: optimized for batch workloads (90% of the hardware)
(Speaker notes: explain what a cell is; remark on the size of the batch cells)

CERN Cloud Architecture (diagram)

CERN Cloud in Numbers (1)
- 7000 hypervisors in production, plus ~2000 to be added (5.8K a year ago)
- 220K cores, plus 86K to be added (155K)
- 530 TB of RAM (350 TB)
- 3.7K volumes with 1.2 PB allocated in Cinder (2.8K)
- 3.8K images/snapshots in Glance (2.7K)
- 27 file shares with 18 TB allocated in Manila (new)
- 71 container clusters in Magnum (new)
(Figures in parentheses are from one year earlier)

CERN Cloud in Numbers (2)
- ~400 operations/min in the Nova API, mainly creation/deletion of VMs
- ~27K VMs at the moment (+5K over the last 6 months)

Operations and new features

Operations: Nova
- Upgrade from Liberty to Mitaka (done), then from Mitaka to Newton (next week)
  - Required all nodes to be on CC7
  - Done in two steps due to DB migration constraints
- Steps:
  1. Block the APIs
  2. Back up the DBs
  3. Upgrade the Nova control plane
  4. Upgrade the DB schema
  5. Validate
  6. Re-enable the API
  7. Upgrade the compute nodes (via Puppet and Yum)
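The sequence above can be sketched as a shell run-book. The concrete commands are assumptions for illustration (haproxy as the API frontend, the package set, the validation step), shown through a dry-run wrapper that only prints what would be executed:

```shell
#!/bin/sh
# Dry-run wrapper: print each step instead of executing it.
run() { echo "+ $*"; }

run systemctl stop haproxy           # 1. block the APIs
run mysqldump --all-databases        # 2. back up the DBs
run yum update -y openstack-nova     # 3. upgrade the Nova control plane
run nova-manage db sync              # 4. upgrade the DB schema
run openstack server list            # 5. validate
run systemctl start haproxy          # 6. re-enable the API
run puppet agent --test              # 7. compute nodes via Puppet and Yum
```

Drop the `run` wrapper to execute for real; keeping the API blocked until the schema migration has been validated is what makes the two-step upgrade safe.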

Operations: Nova (chart)

Operations: Nova
- Hypervisor upgrade from CC7.2 to CC7.3
  - Upgraded cell by cell, to limit the potential impact
  - First batch cells, then project cells, finally shared cells
- Upgrade went fine, with some minor issues:
  - A few VMs lost network connectivity
  - Kernel "BUG: soft lockup"; around ~200 AMD nodes affected (1 cell)
  - Various SELinux problems

Operations: Nova
- Consolidation on KVM under CentOS on the compute nodes (ongoing)
  - Ceph as the only backend for volumes (NetApp decommissioning)
  - RHEL and Hyper-V decommissioning
  - Reduces complexity and removes the dependency on Windows expertise

Operations: Neutron
- Networking service
  - Started as a pilot in Q3 2015; available in 10 cells
  - Deployed with a custom driver based on the Linux bridge driver, integrated with CERN's network infrastructure
- Migration from the deprecated nova-network to Neutron networks (during this year)
- Neutron server upgraded from Mitaka to Newton
  - The server moves, but the agents stay on Mitaka as they follow the Nova upgrade cycle
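CERN's driver is custom, but the upstream Linux bridge driver it builds on is enabled through a standard ML2 configuration. A minimal sketch based on the upstream options (the network name and interface are placeholders, not CERN's actual configuration):

```ini
# /etc/neutron/plugins/ml2/ml2_conf.ini (sketch)
[ml2]
type_drivers = flat,vlan
mechanism_drivers = linuxbridge

[ml2_type_flat]
flat_networks = provider

[linux_bridge]
# Maps a Neutron physical network name to a host interface.
physical_interface_mappings = provider:eth0
```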

What’s new? Keystone
- Fernet token deployment
  - Much better performance and less load on the server
  - No need to store tokens in a database
- Additional project attributes
  - Accounting group and project type, for accounting purposes
  - Cell mapping of the project (being prepared)
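Fernet tokens are switched on in keystone.conf, and the key repository is initialised with keystone-manage. A minimal sketch using the upstream defaults (the paths are the usual ones, not necessarily CERN's):

```ini
# /etc/keystone/keystone.conf (sketch)
[token]
provider = fernet

[fernet_tokens]
key_repository = /etc/keystone/fernet-keys/
```

The repository is created with `keystone-manage fernet_setup` and keys are rotated with `keystone-manage fernet_rotate`; since tokens are no longer persisted, there is no token table to flush.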

What’s new? Accounting
- Cloud accounting data collected via cASO
- Data stored in S3
- Data processed with Spark
- Accounting data visualized through a web page

What’s new? Rally
- Cloud benchmarking system, used to check the status of the cloud since 2015
- Cherry-picked patch to run as a non-admin user in the cloud
- Current release: Newton
- Stress-tests the orchestration service and all the other services
- Dashboards with per-cell or per-service status
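A Rally run is driven by a task file. A minimal sketch using an upstream Rally scenario (the flavor and image names are placeholders, not CERN's):

```yaml
# rally task start nova-boot.yaml (sketch)
---
NovaServers.boot_and_delete_server:
  - args:
      flavor:
        name: "m1.small"
      image:
        name: "CC7 - x86_64"
    runner:
      type: constant
      times: 10
      concurrency: 2
```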

What’s new? Rally (dashboard screenshot)

What’s new? Magnum
- OpenStack project to treat container orchestration engines as first-class resources
  - Docker Swarm, Kubernetes, Mesos and DC/OS
- Current release is Newton (with cherry-picks from master)
- Timeline: container investigations and Magnum tests (11/2015), pilot service (02/2016), production service (10/2016), with upstream development along the way (e.g. Mesos support)

What’s new? Magnum
- Interact with the Magnum client to create the cluster, then with the Docker client (or kubectl, the REST API, …) to use it:

$ magnum cluster-create --name myswarmcluster --cluster-template swarm --node-count 100
$ magnum cluster-list
+------+----------------+------------+--------------+-----------------+
| uuid | name           | node_count | master_count | status          |
+------+----------------+------------+--------------+-----------------+
| .... | myswarmcluster | 100        | 1            | CREATE_COMPLETE |
+------+----------------+------------+--------------+-----------------+
$ $(magnum cluster-config myswarmcluster --dir magnum/myswarmcluster)
$ docker info / ps / ...
$ docker run --volume-driver cvmfs -v atlas.cern.ch:/cvmfs/atlas -it centos /bin/bash
[root@32f4cf39128d /]#

- The first command should take around 10 to 15 minutes to execute

What’s new? Magnum
- Several use cases at CERN:
  - GitLab CI
  - ROOTaaS / SWAN (Jupyter notebooks)
  - FTS
  - Batch
  - ATLAS RECAST (analysis catalog), next slide
  - … and many others

What’s new? Magnum
- ATLAS RECAST use case for Magnum
  - Software distribution in HEP
  - Development environment parity with distributed/batch systems
  - Reproducible, interactive development environment for personal development and software tutorials
  - Continuous integration and release testing
  - Analysis preservation and reusability
- "In HEP we develop code locally, but compute globally. We need efficient ways to distribute our code, reproducibly and efficiently."
- https://indico.cern.ch/event/625939/contributions/2527657/attachments/1443125/2222545/CERNIT_Talk.pdf
- "The impact of RECAST depends entirely on the incorporation and integration of existing analyses into the framework. It builds on the efforts of experimental searches by extending and expanding their relevance to the community, all under the auspices of the collaborations. The design and considerations put forth here aim to initiate a community-wide effort to bring such a framework to life."

Magnum feature plans
- Rolling upgrades of clusters
  - Upgrade to new versions of Kubernetes, Swarm, …
- Heterogeneous clusters
  - Mix of VMs and bare metal, spread across AVZs
- Container monitoring (cAdvisor, Prometheus and Grafana)
  - Deployed for Kubernetes, work in progress for Swarm
- LBaaS for high availability
- Better storage support (EOS, CVMFS, …)

Upcoming services
- Bare metal service: Ironic
  - API server, conductor and first node deployed last month
- Workflow service: Mistral
  - Will simplify operations
  - Already deployed, with testing workflow prototypes
  - Will work alongside Rundeck for workflows
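Mistral workflows are defined in a YAML DSL. A hypothetical sketch of what an operations workflow could look like (the workflow name and its single task are invented for illustration; `std.echo` is a standard Mistral action):

```yaml
---
version: '2.0'

notify_hypervisor_drain:
  description: Hypothetical sketch of an operations workflow.
  input:
    - hostname
  tasks:
    announce:
      action: std.echo output="draining <% $.hostname %>"
```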

Upcoming services
- File share service: Manila
  - Pilot since Q4 2016
  - CephFS as the backend
  - Off-the-shelf integration with Kubernetes, Swarm, …
  - Need for a highly available FS (to replace the NFS filer service)
- Collaboration with the FILER service
  - Share configuration, certificates, etc.
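From the user's side, a CephFS share is requested through the standard Manila CLI. A sketch with placeholder names, again through a dry-run wrapper that only prints the commands:

```shell
#!/bin/sh
# Dry-run wrapper: print each command instead of executing it.
run() { echo "+ $*"; }

run manila create cephfs 1 --name myshare       # 1 GB CephFS share (placeholder name)
run manila share-export-location-list myshare   # where to mount the share from
```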

Summary
- The cloud service continues to grow
  - Adding 2000 compute nodes with ~86K cores
  - New components with several use cases, such as Magnum and Manila
  - Expansion planned (bare metal provisioning)
- Confronting some major operational challenges
  - Scaling all of our services with the new hardware additions
  - Replacement of the network layer
http://openstack-in-production.blogspot.com