ATLAS & CMS Online Clouds

Presentation transcript:

ATLAS & CMS Online Clouds: exploiting the Experiments’ online farms for offline activities during LS1 & beyond. Olivier Chaze (CERN-PH-CMD) & Alessandro Di Girolamo (CERN IT-SDC-OL)

The Large Hadron Collider (~100 m underground) Alessandro Di Girolamo 6 Dec 2013

The experiment data flow (…similar for each experiment…)
Collisions: 40 MHz (1000 TB/sec)
Trigger Level 1 (special hardware): 75 kHz (75 GB/sec)
Trigger Level 2 (embedded processors): 5 kHz (5 GB/sec)
Trigger Level 3 (farm of commodity CPUs): 400 Hz (400 MB/sec)
Tier0 (CERN Computing Centre): data recording & offline analysis
Alessandro Di Girolamo 6 Dec 2013
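
The implied average event size at each stage follows directly from the quoted rate/bandwidth pairs. A minimal sketch of that arithmetic, using only the order-of-magnitude figures from the slide above:

```python
# Rough cross-check of the trigger figures quoted above:
# average event size = data rate / event rate at each stage.
stages = {
    "detector readout": (40e6, 1000e12),   # 40 MHz, 1000 TB/s
    "after Level 1":    (75e3, 75e9),      # 75 kHz, 75 GB/s
    "after Level 2":    (5e3, 5e9),        # 5 kHz, 5 GB/s
    "after Level 3":    (400, 400e6),      # 400 Hz, 400 MB/s
}

for stage, (rate_hz, bytes_per_s) in stages.items():
    size_mb = bytes_per_s / rate_hz / 1e6
    print(f"{stage:18s}: ~{size_mb:.1f} MB/event")
```

Running it gives roughly 25 MB/event at readout and about 1 MB/event after the trigger chain.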

Resources overview
High Level Trigger Experiment farms:
ATLAS P1: 15k cores (28k with Hyper-Threading, 25% reserved for TDAQ)
CMS P5: 13k cores (21k with Hyper-Threading)
When available: ~50% bigger than the Tier0, doubling the capacity of the biggest Tier1 of the Experiments
Network connectivity to the IT Computing Centre (Tier0), current status:
P1 ↔ CERN IT CC (so-called Castor link): 70 Gbps (20 Gbps reserved for Sim@P1)
P5 ↔ CERN IT CC: 20 Gbps (80 Gbps foreseen in the next months)
Alessandro Di Girolamo 6 Dec 2013
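
Summing the core counts quoted above gives a feel for the opportunistic capacity at stake; a back-of-the-envelope sketch, applying the 25% TDAQ reservation for ATLAS as stated on the slide:

```python
# Opportunistic capacity implied by the slide's numbers (physical cores).
atlas_p1_cores = 15_000
cms_p5_cores = 13_000
atlas_tdaq_reserved = 0.25          # fraction of P1 kept for TDAQ

usable = atlas_p1_cores * (1 - atlas_tdaq_reserved) + cms_p5_cores
print(f"Cores usable for offline work when the farms are idle: ~{usable:,.0f}")
```

That is roughly 24k cores on top of the pledged Grid resources.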

Why? The Experiments are always resource hungry: ATLAS + CMS run more than 250k jobs in parallel … exploit all the available resources! Alessandro Di Girolamo 6 Dec 2013

Teamwork
Experts from the Trigger & Data Acquisition teams of the Experiments
Experts from other institutes: BNL RACF, Imperial College …
Experts of WLCG (Worldwide LHC Computing Grid)
Alessandro Di Girolamo 6 Dec 2013

Why Cloud?
Cloud as an overlay infrastructure:
provides the necessary management of VM resources
support & control of the physical hosts remain with TDAQ
Grid support is delegated
easy to quickly switch between HLT ↔ Grid: during LS1 this allows periodic full-scale tests of TDAQ software upgrades, and it can also be used in the future during short LHC stops
OpenStack: common solution, big community! CMS, ATLAS, BNL, CERN IT… sharing experiences …and support if needed
Alessandro Di Girolamo 6 Dec 2013

OpenStack
Glance: VM base image storage and management; central image repository (and distribution) for Nova
Nova: central operations controller for hypervisors and VMs; CLI tools, VM scheduler, compute node client; network in multi-host mode for CMS
Horizon: high-level control tools; WebUI for OpenStack infrastructure/project/VM control (limited use)
RabbitMQ: messaging between the OpenStack services
ATLAS: OpenStack version currently used: Folsom
CMS: OpenStack version currently used: Grizzly
Alessandro Di Girolamo 6 Dec 2013
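
To give a concrete feel for how such a setup is driven, here is a minimal python-novaclient sketch for booting a batch of worker VMs against a Nova controller of that era (Folsom/Grizzly, keystone v2 auth). All credentials, image and flavor names are illustrative placeholders, not the actual Sim@P1 configuration.

```python
# Minimal sketch (illustrative names only): boot a batch of worker VMs
# through the Nova API, the way an operator script might drive the
# online cloud. Uses the classic python-novaclient v1_1 interface.
from novaclient.v1_1 import client

nova = client.Client(
    "simatp1-operator",              # username (placeholder)
    "secret",                        # password (placeholder)
    "Sim@P1",                        # tenant/project name (placeholder)
    "http://controller:5000/v2.0",   # Keystone auth URL (placeholder)
)

image = nova.images.find(name="slc5-worker-qcow2")   # hypothetical image name
flavor = nova.flavors.find(name="m1.large")

# One request for many identical instances; Nova schedules them
# across the available compute nodes.
nova.servers.create(
    name="simatp1-worker",
    image=image,
    flavor=flavor,
    min_count=1,
    max_count=100,
)
```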

Network challenges
Avoid any interference with Detector Control System operations and with the internal Control network
Each compute node is connected to two networks, with one subnet per rack per network
Routers allow traffic to registered machines only
ATLAS: a new dedicated VLAN has been set up and the VMs are registered on this network
CMS: the VMs aren't registered; Source NAT (SNAT) rules are defined on the hypervisors to bypass the network limitations (see the sketch below)
Alessandro Di Girolamo 6 Dec 2013
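
For the CMS case, a source-NAT rule of the following shape, applied on each hypervisor, makes the unregistered VM subnet appear to the routers as the (registered) hypervisor itself. This is only an illustrative sketch of the approach described above; the interface name and addresses are assumptions.

```python
# Illustrative sketch of the SNAT idea: rewrite the source address of
# traffic leaving the VM subnet so it appears to come from the
# (registered) hypervisor. Interface name and IPs are placeholders.
import subprocess

VM_SUBNET = "10.29.0.0/16"        # flat Nova network (from the CMS slide)
HYPERVISOR_IP = "10.179.1.23"     # hypothetical address on the data network
DATA_IFACE = "eth0"               # hypothetical uplink interface

subprocess.check_call([
    "iptables", "-t", "nat", "-A", "POSTROUTING",
    "-s", VM_SUBNET,              # traffic originating from the VMs
    "-o", DATA_IFACE,             # leaving through the data network uplink
    "-j", "SNAT", "--to-source", HYPERVISOR_IP,
])
```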

CMS online Cloud
Cloud infrastructure for CMS:
4 controllers hosting the OpenStack services (Keystone, Nova APIs, Nova scheduler, Nova conductor, Horizon dashboard, Glance server, Nova network/metadata, RabbitMQ); each node runs Corosync (manages communication between the nodes) and Pacemaker (checks the health of the system), with virtual IPs and DNS round-robin aliases across the 4 controllers
~1.3k compute nodes running Nova compute and libvirt/KVM, with SNAT towards the data network
GRID services: CVMFS, Glideins, Condor, Castor/EOS; 20 Gb network to the IT Computing Centre
RabbitMQ using parallel queues (for ATLAS, for example, a minimum of two nodes was required in order to scale beyond 1k hypervisors per single Nova controller)
Default gateways for the VMs at 10.29.0.1 and 10.29.0.2 (Corosync/Pacemaker + virtual IPs)
The flat Nova network (10.29.0.0/16) sits on top of the Data network (10.179.0.0/22) and is isolated in a VLAN; separate Control network (10.176.0.0/25) and GPN network
MySQL Cluster as the database backend: 4 machines hosting the data in 2 groups (db1-db4, plus 2 management nodes), for performance and reliability reasons
Alessandro Di Girolamo 6 Dec 2013

VM image
SL5 x86_64 based (KVM hypervisor)
Post-boot contextualization method: script injected into the base image (Puppet in the future)
Pre-caching of images on the hypervisors: bzip2-compressed QCOW2 images of about 350 MB
Experiment-specific software distribution with CVMFS, a network file system based on HTTP and optimized to deliver experiment software
Alessandro Di Girolamo 6 Dec 2013
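
A pre-caching step of the kind mentioned above could be as simple as decompressing the bzip2'd QCOW2 onto each hypervisor's local disk and verifying a checksum before the image is used. This is a hypothetical sketch (file names, paths and checksum are made up), not the actual Sim@P1 deployment tool.

```python
# Hypothetical sketch of the image pre-caching step: decompress the
# bzip2-compressed QCOW2 base image onto the hypervisor's local disk
# and verify its checksum. Paths and file names are placeholders.
import bz2
import hashlib
import shutil

COMPRESSED = "/var/cache/images/base-sl5.qcow2.bz2"      # ~350 MB download
TARGET = "/var/lib/nova/instances/_base/base-sl5.qcow2"
EXPECTED_SHA1 = "0123456789abcdef0123456789abcdef01234567"  # published with the image

# Decompress in a streaming fashion to avoid holding the image in memory.
with bz2.BZ2File(COMPRESSED, "rb") as src, open(TARGET, "wb") as dst:
    shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)

# Verify the decompressed image before it is offered to the hypervisor.
sha1 = hashlib.sha1()
with open(TARGET, "rb") as f:
    for chunk in iter(lambda: f.read(16 * 1024 * 1024), b""):
        sha1.update(chunk)

if sha1.hexdigest() != EXPECTED_SHA1:
    raise RuntimeError("base image checksum mismatch, refusing to use it")
```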

ATLAS: Sim@P1 running jobs, 1 June – 20 Oct
Peak of ~17k running jobs
Gaps due to the Sim@P1 deployment, the ATLAS TDAQ TRs in July and August, and the LHC P1 cooling intervention; the last month was dedicated to interventions on the infrastructure
Overall: ~55% of the time available for Sim@P1
Alessandro Di Girolamo 6 Dec 2013

ATLAS: Sim@P1: Start & Stop
Restoring Sim@P1 (restoring the VM group from the dormant state, Aug 27, 2013): all 17.1k job slots up and running within 45 min (6 Hz)
MC production job flow: 0.8 Hz at the time, now improved to almost 1.5 Hz
Shutdown (ungraceful shutdown of the VM group, rack by rack, Sep 2, 2013): 10 min (29 Hz), and the infrastructure is back to TDAQ
Alessandro Di Girolamo 6 Dec 2013
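
The quoted ramp rates are just slots divided by elapsed time; a quick arithmetic check of the figures above:

```python
# Cross-check of the start/stop rates quoted on the slide.
job_slots = 17_100

startup_s = 45 * 60    # 45 minutes to restore all VMs
shutdown_s = 10 * 60   # 10 minutes for the rack-by-rack shutdown

print(f"startup rate:  ~{job_slots / startup_s:.1f} Hz")   # ~6.3 Hz, quoted as 6 Hz
print(f"shutdown rate: ~{job_slots / shutdown_s:.1f} Hz")  # ~28.5 Hz, quoted as 29 Hz
```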

ATLAS: Sim@P1 completed jobs, 1 June – 20 Oct
Overall: ~55% of the time available for Sim@P1
Total successful jobs: 1.65M, with an efficiency of 80%
Total WallClock: 63.8 Gseconds; WallClock spent on failed jobs: 10.3%, of which 78% is due to lost heartbeats (intrinsic to the opportunistic nature of the resources)
Comparison with CERN-PROD: total WallClock 83.3 Gsec, WallClock spent on failed jobs 6%
Alessandro Di Girolamo 6 Dec 2013

Conclusions
The Experiments' online Clouds are a reality
Cloud solution: no impact on data taking, easy switch of activity; quick onto them, quick out of them, e.g. from 0 to 17k Sim@P1 jobs running in 3.5 hours, and from 17k Sim@P1 jobs running back to TDAQ-ready in 10 min
Contributing to computing as one big Tier1, or as CERN-PROD!
Operations: still a lot of (small) things to do; integrate OpenStack with the online control to allow dynamic allocation of resources to the Cloud
The open questions are not unique to the experiments’ online clouds: an opportunity to unify solutions to minimize manpower!
Alessandro Di Girolamo 6 Dec 2013

BackUp Alessandro Di Girolamo 6 Dec 2013

Sim@P1: Dedicated Network Infrastructure
Sim@P1 VMs use a dedicated 1 Gbps physical network connecting the data switch of each P1 rack (racks in SDX1) to the "Castor router", with VLAN isolation from the ATCN control traffic
During the winter the sysadmins installed this new 1 Gbps link between each rack and the Castor router; a VLAN on this link carries the VM traffic, so all the VM traffic is completely separated from the rest of the P1 traffic
The P1 Castor router is connected to the IT Castor router at 20-80 Gbps, with access restricted by ACLs; the diagram also shows the ATLAS HLT SubFarm Output nodes (SFOs) with their 10 Gbps links to the data core
Through this Castor link the VMs reach the configuration services on the GPN (e.g. Puppet, repositories) and the GRID services (Condor, PanDA, EOS/Castor, CVMFS)
Alessandro Di Girolamo 6 Dec 2013

Cloud Infrastructure of Point 1 (SDX1): architecture diagram showing the Keystone and Horizon services and the RabbitMQ cluster
Alessandro Di Girolamo 6 Dec 2013

Keystone (2012.2.4) - slide from Alex Zaytsev (BNL RACF)
No issues with stability or performance observed at any scale
Initial configuration of tenants / users / services / endpoints might deserve some higher-level automation; some automatic configuration scripts were already available in 2013Q1 from third parties, but we found that using the Keystone CLI directly is more convenient & transparent
Simple replication of the Keystone MySQL DB works fine for maintaining redundant Keystone instances
Alessandro Di Girolamo 6 Dec 2013
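
For reference, the tenant/user bootstrap the slide describes doing with the Keystone CLI looks roughly like this through the era-appropriate python-keystoneclient v2.0 bindings; all names and the endpoint are illustrative placeholders, not the actual Point 1 configuration.

```python
# Illustrative sketch of the initial tenant/user setup described above,
# using the keystone v2.0 Python bindings (names and URL are placeholders).
from keystoneclient.v2_0 import client

keystone = client.Client(
    username="admin",
    password="secret",
    tenant_name="admin",
    auth_url="http://controller:35357/v2.0",   # admin endpoint (placeholder)
)

tenant = keystone.tenants.create(
    tenant_name="Sim@P1",
    description="ATLAS simulation on the Point 1 HLT farm",
    enabled=True,
)
user = keystone.users.create(
    name="simatp1-operator",
    password="another-secret",
    tenant_id=tenant.id,
)
```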

Nova (2012.2.4) - slide from Alex Zaytsev (BNL RACF)
One bug fix needed to be applied, back-ported from the Grizzly release: "Handle compute node records with no timestamp" (https://github.com/openstack/nova/commit/fad69df25ffcea2a44cbf3ef636a68863a2d64d9)
The prefix of the VMs' MAC addresses had to be changed to match the range pre-allocated for the Sim@P1 project; there is no configuration option for this, so a direct patch to the Python code was needed
Configuring the server environment for a Nova controller supporting more than 1k hypervisors / 1k VMs requires raising the default limit on the maximum number of open files for several system users; this is not documented / handled automatically by the OpenStack recommended configuration procedures, but is pretty straightforward to figure out
A RabbitMQ cluster of at least two nodes was required in order to scale beyond 1k hypervisors per single Nova controller; the RabbitMQ configuration procedure and stability are version sensitive, and we had to try several versions (currently v3.1.3-1) before achieving a stable cluster configuration
Overall: stable long-term operations with only one Cloud controller (plus one hot-spare backup instance) for the entire Point 1
Alessandro Di Girolamo 6 Dec 2013
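
The open-file-descriptor limit mentioned above is easy to inspect programmatically; a small sketch (the 4096 threshold is only an illustrative value, not the limit actually used at Point 1):

```python
# Check the per-process open-file limit that a busy Nova controller
# (many AMQP and DB connections) can easily exhaust. The threshold
# below is only an illustrative value.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

if soft < 4096:
    print("soft limit looks low for a controller serving ~1k hypervisors; "
          "consider raising it (e.g. via /etc/security/limits.conf)")
```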

Glance (2012.2.4) - slide from Alex Zaytsev (BNL RACF)
A single Glance instance (with a single 1 Gbps uplink) works nicely as a central distribution point up to a scale of about 100 hypervisors / 100 VM instances
Scaling beyond that (1.3k hypervisors, 2.1k VM instances) requires either a dedicated group of cache servers between Glance and the hypervisors, or a custom mechanism for pre-deployment of the base images on all compute nodes (multi-level replication)
Since we operate with only one base image at a time, which changes rarely (approximately once a month), we built a custom image deployment mechanism, leaving the central Glance instances with the functionality of image repositories but not of central image distribution points; no additional cache servers are needed
We distribute bzip2-compressed QCOW2 images that are only about 350 MB in size; pre-placement of a new image on all the hypervisors takes in total only about 15 minutes, despite the 1 Gbps network limitation on both the Glance instances and at the level of every rack of compute nodes
The snapshot functionality of Glance is used only for making persistent changes to the base image; no changes are saved for VM instances during production operations
Alessandro Di Girolamo 6 Dec 2013
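
The numbers above make the case for multi-level replication. A rough estimate, assuming the ~350 MB image, ~1.3k hypervisors and 1 Gbps uplinks quoted on the slide and ignoring protocol overhead:

```python
# Rough estimate: serving the ~350 MB image to ~1.3k hypervisors from a
# single 1 Gbps Glance uplink vs. fanning it out rack by rack.
image_mb = 350
hypervisors = 1300
uplink_gbps = 1.0

total_gb = image_mb * hypervisors / 1000           # ~455 GB to ship in total
single_source_s = total_gb * 8 / uplink_gbps       # everything through one 1 Gbps link
print(f"single distribution point: ~{single_source_s / 60:.0f} minutes")   # ~61 min

# With multi-level replication each rack re-serves the image locally, so the
# central link only has to feed one copy per rack; this is why ~15 minutes
# for all hypervisors is achievable despite the 1 Gbps limitations.
```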

Horizon (2012.2.4) - slide from Alex Zaytsev (BNL RACF)
A very early version of the web interface; not currently used for production at Sim@P1
Many security features are missing, such as native HTTPS support
Several configuration / stability issues encountered, e.g. debug mode must be enabled for Horizon to function properly
Limited feature set of the web interface: no way to perform non-trivial network configuration purely via the web interface; no convenient way to handle large groups of VMs (1-2k+), such as displaying VM instances in a tree structured according to the configuration of the availability zones / instance names; no convenient way to perform bulk operations on large subgroups of VMs (hundreds) within a production group of 1-2k VMs
All of these problems are presumably already addressed in recent OpenStack releases
Alessandro Di Girolamo 6 Dec 2013
