DAQ LHC Workshop Configuration management systems

Slides:



Advertisements
Similar presentations
About Me CTO, Individual Digital, Inc. (Startup) Author of ext/tidy, PHP 5 Unleashed, Zend Ent. PHP Patterns
Advertisements

Update on Version Control Systems: GitLab, SVN, Git, Trac, CERNforge
CERN IT Department CH-1211 Genève 23 Switzerland t Integrating Lemon Monitoring and Alarming System with the new CERN Agile Infrastructure.
Tools and software process for the FLP prototype B. von Haller 9. June 2015 CERN.
KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association STEINBUCH CENTRE FOR COMPUTING - SCC
October, Scientific Linux INFN/Trieste B.Gobbo – Compass R.Gomezel - T.Macorini - L.Strizzolo INFN - Trieste.
Summary of Functions Coordinator of the physics lists working group in Geant4 and contributor to physics lists. (20%) Developer of Geant4 hadronic physics,
Configuration Management with Cobbler and Puppet Kashif Mohammad University of Oxford.
Michael Still Google Inc. October, Managing Unix servers the slack way Tools and techniques for managing large numbers of Unix machines Michael.
Continuous Integration and Code Review: how IT can help Alex Lossent – IT/PES – Version Control Systems 29-Sep st Forum1.
An Agile Service Deployment Framework and its Application Quattor System Management Tool and HyperV Virtualisation applied to CASTOR Hierarchical Storage.
Lemon Monitoring Miroslav Siket, German Cancio, David Front, Maciej Stepniewski CERN-IT/FIO-FS LCG Operations Workshop Bologna, May 2005.
Firmware - 1 CMS Upgrade Workshop October SLHC CMS Firmware SLHC CMS Firmware Organization, Validation, and Commissioning M. Schulte, University.
The ATLAS DAQ System Online Configurations Database Service Challenge J. Almeida, M. Dobson, A. Kazarov, G. Lehmann-Miotto, J.E. Sloper, I. Soloviev and.
CERN IT Department CH-1211 Genève 23 Switzerland t Migration from ELFMs to Agile Infrastructure CERN, IT Department.
1 Update at RAL and in the Quattor community Ian Collier - RAL Tier1 HEPiX FAll 2010, Cornell.
CERN AI Config Management 16/07/15 AI for INFN visit2 Overview for INFN visit.
Ansible and Ansible Tower 1 A simple IT automation platform November 2015 Leandro Fernandez and Blaž Zupanc.
Cloud Installation & Configuration Management. Outline  Definitions  Tools, “Comparison”  References.
Overview of cluster management tools Marco Mambelli – August OSG Summer Workshop TTU - Lubbock, TX THE UNIVERSITY OF CHICAGO.
Platform & Engineering Services CERN IT Department CH-1211 Geneva 23 Switzerland t PES Agile Infrastructure Project Overview : Status and.
1 Diana Scannicchio on behalf of ALICE, ATLAS, CMS, LHCb System Administration Diana Scannicchio on behalf of ALICE, ATLAS, CMS, LHCb System Administration.
SCDB Update Michel Jouvin LAL, Orsay March 17, 2010 Quattor Workshop, Thessaloniki.
DECTRIS Ltd Baden-Daettwil Switzerland Continuous Integration and Automatic Testing for the FLUKA release using Jenkins (and Docker)
OSCAR Symposium – Quebec City, Canada – June 2008 Proposal for Modifications to the OSCAR Architecture to Address Challenges in Distributed System Management.
SPI Report for the LHCC Comprehensive Review Stefan Roiser for the SPI project.
Accounting Review Summary and action list from the (pre)GDB Julia Andreeva CERN-IT WLCG MB 19th April
Platform & Engineering Services CERN IT Department CH-1211 Geneva 23 Switzerland t PES GIT Service in the Agile Infrastructure Project Vítor.
Distributed Monitoring with Nagios: Past, Present, Future Mike Guthrie
Nov 05, 2008, PragueSA3 Workshop1 A short presentation from Owen Synge SA3 and dCache.
Canadian Bioinformatics Workshops
CERN Openlab Openlab II virtualization developments Havard Bjerke.
Accessing the VI-SEEM infrastructure
Review of the WLCG experiments compute plans
ONAP on Vagrant for ONAPers
Agenda:- DevOps Tools Chef Jenkins Puppet Apache Ant Apache Maven Logstash Docker New Relic Gradle Git.
DPM at ATLAS sites and testbeds in Italy
C Loomis (CNRS/LAL) and V. Floros (GRNET)
Use of HLT farm and Clouds in ALICE
AI How to: System Update and Additional Software
IT Services Katarzyna Dziedziniewicz-Wojcik IT-DB.
Virtualization Review and Discussion
Cloud Challenges C. Loomis (CNRS/LAL) EGI-TF (Amsterdam)
StoRM: a SRM solution for disk based storage systems
Dag Toppe Larsen UiB/CERN CERN,
U.S. ATLAS Grid Production Experience
Marc Dobson On behalf of CMS DAQ Team
Dag Toppe Larsen UiB/CERN CERN,
CMS High Level Trigger Configuration Management
The “Understanding Performance!” team in CERN IT
Quattor Usage at Nikhef
Quality Control in the dCache team.
THE STEPS TO MANAGE THE GRID
Spacewalk and Koji at Fermilab
ETICS Services Management
CernVM Status Report Predrag Buncic (CERN/PH-SFT).
Drupal VM and Docker4Drupal For Drupal Development Platform
Deploy OpenStack with Ubuntu Autopilot
Drupal VM and Docker4Drupal as Consistent Drupal Development Platform
Frederic Schaer, Sophie Ferry
Ansible and Zabbix Rushikesh Prabhune (Software Technical Consultant)
Database Services for CERN Deployment and Monitoring
Alan Chalker and Eric Franz Ohio Supercomputer Center
Leigh Grundhoefer Indiana University
Module 01 ETICS Overview ETICS Online Tutorials
Configuration management suite
OpenStack Summit Berlin – November 14, 2018
DBOS DecisionBrain Optimization Server
Client/Server Computing and Web Technologies
Deploying machine learning models at scale
Presentation transcript:

DAQ LHC Workshop Configuration management systems Aymeric Dupont On behalf of Alice, Atlas, CMS and LHCb

Summary Generalities Dropping Quattor ! Why ? Expectations Which candidate to take over ? Why puppet won the vote ? General status Experiments state of the art Experiments specific software deployment Evolution for LS1 Conclusion

Generalities Situation Current tool Why configuration management ? Deploy and maintain Ensure consistency of configuration Guarantee service performance and persistency of a computing infrastructure Which challenges are we facing ? Growing complexity and heterogeneity Increasing mission importance Maintaining this with the same manpower Current tool IT & experiments relying on Quattor (till past 2-3 years at least)

Why dropping Quattor ? IT Atlas Cms LHCb « home-made » solution (no longer supported) Open-source solutions becoming more and more serious alternatives Atlas Various configuration aspects not handled Modules lacking of consistencies (no support) Complex packages management (dependencies, workload) Lack of flexibility in configuration instructions (execution order) Cms To follow industry standards To support much varied configurations aspects. LHCb No dropping scheduled yet Pour LHCd, mentionner le fait que quattor fait le job même si certains composants sont buggés !

Expectations Which expectations for a new tool ? Wider features : Installing/Configuring software Configuring hardware (bonding, routes) Handling every aspects of the administration of a heterogeneous infrastructure (virtual hosts declaration...) Software support : Continuous development Bug tracking/fixes Pour LHCd, mentionner le fait que quattor fait le job même si certains composants sont buggés !

Potential candidate evaluation Cms CFEngine (2 months) Grand-daddy solution for configuration management (Used by major computing company (Facebook, Cisco, AMD...)) Alice & Atlas CFEngine Chef Actively contributing community (recipes repositories) LHCb First Puppet tests last year New evaluation campaign to be launched Migration to another CMT still being discussed (Experiment no longer mentionned in the next slides)

Why Puppet won the vote ? IT Modular configuration Migration to Puppet as core of Agile Infrastructure (triggering factor) Modular configuration Modules much easier to create Finer grain management Additional facilities offered (files tidy-up, ssh keys...) Large community support Lots of users feedbacks Plenty of contributors (puppet forge) Syntax Clear and easy to understand Declarative approach similar to quattor Behaviour Quattor-like (unlike CFEngine requiring successive runs) First experience Small cluster administration from 2008 @ Univ. Johannesburg

Experiments State of the art

General status Production experience Cms Atlas Alice ~3 years TDAQ testbed (~300 systems) All Point 1 machines administered. Alice ~1 year (production farm) Cms Spare SLC6 templates (1 machine only) Storage manager (1 machine only)

State of the art Aspects Alice Atlas CMS Supported systems SLC5+6 SLC6 Changes versioning SVN Local git Changes propagation On master then pushed to SVN SVN itself Modified Puppet-sync Repositories management and setup Flat repository Home-built RPMs versioned with SVN Snapshotted by home-made scripts Experiment specific packages deployment See next slide

Specific packages deployment Alice Explicit declaration for each subsystems Atlas Explicit segregation based on environment vars for DAQ and Offline Installation handled by Trigger and DAQ librarians Yum repos pointing to specific snapshots for OS packages Cms Dedicated repositories snapshotted by versions for DAQ Timestamped snapshots for OS repositories (like IT)

State of the art (2) Aspects Alice Atlas CMS Certificates for new machines Removed during install Autosigned at first puppet run Created on the fly during install Monitoring Logs Puppet dashboard None Scalability Single machine Serverless setup Manifests browsing GUI Auto-generated inclusion tree Modules layout One module/function (auth...) One module/service (sssd/krb5/autofs/...)

LS1 Evolution Alice Atlas Cms LHCb Full scale migration Migration to puppet 3 + puppetDB Dashboard deprecated => Foreman Pulp (Katello UI) for repo management ? Cms Full scale migration (SLC6 only) Improve scalability (still running on Webrick !) Monitoring => Foreman LHCb SLC6 migration (Quattor server)

Conclusion Expectations fulfilled Puppet does the job (valid production experience) Having IT migrating is a great asset Collaborating and sharing informations Good contact with IT Nice collaboration between experiments (Atlas scripts&docs to mirror&snapshot repositories) Xxperiment meetings (IT attends them) New challenges Setting up pre-production systems Writing migration policies and validation procedures

Questions ? Thanks to... Alice Atlas Cms LHCb Ulrich Fuchs Adriana Telesca Atlas Sergio Ballestrero Cms Marc Dobson LHCb Niko Neufeld Loïc Brarda Questions ?

Extra slides

Atlas inclusion tree graph

Atlas inclusion tree graph (detail)

Puppet code example Class declaration Node declaration