Quattor, an overview
Germán Cancio, CERN IT/FIO
LCG workshop, 24/3/04
http://quattor.org

Outline
  Concepts
  Architecture and Functionality
  Deployment status

quattor in a nutshell: a fabric management system
  Configuration, installation and management of fabric nodes
  Used to manage most of the Linux nodes in the CERN CC (>2000 nodes out of ~2200)
  Multiple functionalities (batch nodes, disk servers, tape servers, DB, web, …)
  Heterogeneous hardware (memory, HD size, …)
  Started in the scope of EDG WP4
  Part of ELFms, together with the LEMON monitoring system and the LEAF Hardware and State Management system

quattor architecture - overview
  Configuration Management:
    Configuration Database
    Configuration access and caching
    Graphical and Command Line Interfaces
  Node and Cluster Management:
    Automated node installation
    Node Configuration Management
    Software distribution and management

Configuration Management

Configuration Information
  Configuration is expressed using a language called Pan
  Information is arranged into templates
  Common properties are set only once
  Using templates it is possible to create hierarchies that match service structures
  Example hierarchy (from the slide diagram):
    CERN CC (name_srv1: 137.138.16.5, time_srv1: ip-time-1)
      lxbatch (cluster_name: lxbatch, master: lxmaster01, pkg_add(lsf5.1))
      lxplus (cluster_name: lxplus)
        lxplus001 (eth0/ip: 137.138.4.246, pkg_add(lsf6_beta))
        lxplus020 (eth0/ip: 137.138.4.225)
        lxplus029
      disk_srv
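A minimal Pan-style sketch of such a hierarchy follows; the syntax is simplified and the template and path names (pro_cern_cc, pro_cluster_lxplus, profile_lxplus001) are illustrative, not the actual CERN templates:

  # illustrative sketch only - simplified Pan, hypothetical template/path names
  template pro_cern_cc;                      # site-wide defaults, set once
  "/system/dns/name_srv1" = "137.138.16.5";
  "/system/ntp/time_srv1" = "ip-time-1";

  template pro_cluster_lxplus;               # cluster level, pulls in the site defaults
  include pro_cern_cc;
  "/system/cluster/name" = "lxplus";

  template profile_lxplus001;                # node level, pulls in the cluster template
  include pro_cluster_lxplus;
  "/system/network/interfaces/eth0/ip" = "137.138.4.246";

Because each level only adds or overrides what differs, a common property (e.g. the name server) is written once and inherited by every node profile.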

Configuration Management Architecture (diagram): GUIs, scripts and a CLI access CDB via SOAP and an SQL query interface (RDBMS backend); CDB compiles the Pan templates into XML profiles, which are delivered over HTTP to a local cache managed by the CCM on each node.

Configuration Database (CDB)
  Keeps the complete configuration information; the configuration describes the desired state of the managed machines
  Data consistency is enforced by a transaction mechanism: all changes are done in transactions
  Configuration is validated and kept under version control (CVS)
    Built-in validation (e.g. types) as well as user-defined validation
    Detects concurrent modification conflicts
  SQL query interface for properties spanning across nodes
    e.g. get all machines on LXBATCH with more than 512 MB of memory
  Node-based Configuration Cache Manager (CCM)
    Fast, network-independent, local access to the configuration
    Avoids peaks on the CDB servers
    More complete validation of the configuration is possible before applying it
    Concurrent access and conflict handling

Examples of information in CDB
  Hardware: CPU, hard disk, network card, memory size, physical node location in the CC
  Software: installed software packages (RPMs, PKGs)
  System: Grid services configuration (currently WNs), system services configuration (NFS mounts, SSH config, …), partition table, load balancing information
  Cluster information: cluster name and type, batch master
  Audit information: contract type and number, purchase date
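As a rough illustration of how such information could appear in a node profile (the paths and values below are hypothetical and do not reproduce the actual CERN CDB schema):

  # hypothetical profile paths, for illustration only
  "/hardware/memory/size" = 1024;                    # MB
  "/hardware/location" = "CC machine room, rack XX"; # physical location
  "/software/packages" = pkg_add("openssh");         # installed packages
  "/system/cluster/name" = "lxbatch";
  "/system/cluster/master" = "lxmaster01";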

Graphical User Interface - PanGUIn

Node (Cluster) Management

Managing (cluster) nodes (diagram): software servers (SWRep) hold packages (RPM, PKG) and serve them over nfs, http or ftp; on standard and managed nodes the Software Package Manager Agent (SPMA) uses a local cache to keep the installed software (kernel, system, applications, …) up to date, while the Node Configuration Manager (NCM) configures system services (AFS, LSF, SSH, accounting, …) using configuration obtained through the CCM; an install server running the vendor system installer (RH73, RHES, Fedora, …) together with the Install Manager handles base-OS node (re)installation via nfs/http, dhcp and pxe; everything is driven by CDB.

Install Manager
  Sits on top of the standard vendor installer and configures it:
    Which OS version to install
    Network and partition information
    Which core packages to install
    Custom post-installation instructions
  Automated generation of the control file (KickStart)
  Also takes care of managing DHCP (and TFTP/PXE) entries
  Can get its configuration information from CDB or via the command line
  Available for RedHat Linux (Anaconda installer); allows for plugins for other distributions (SuSE, Debian) or Solaris

Node Configuration
  NCM (Node Configuration Manager) is responsible for ensuring that the reality on a node reflects the desired state in CDB
  Framework system, where service-specific plug-ins called Components make the necessary system changes:
    Regenerate local config files (e.g. /etc/ssh/sshd_config)
    Restart/reload services (SysV scripts)
    Handle configuration dependencies (e.g. configure the network before sendmail)
  Components are invoked on boot, via cron, or on CDB configuration changes
  Component support libraries ease component development
  A subset of NCM components is already available; the full set will be available for the next certified CERN Linux (CEL3) at the end of April

Software Management (I - Server)
  SWRep = Software Repository
  Universal repository for storing software:
    Extendable to multiple platforms and packagers (RH Linux RPM, Solaris PKG, others like Debian pkg)
    Multiple package versions/releases
  Management ("product maintainers") interface:
    ACL-based mechanism to grant/deny modification rights (packages associated to "areas")
  Client access: via standard protocols (HTTP, AFS/NFS, FTP)
  Replication for load balancing/redundancy: using standard tools (Apache mod_proxy, rsync)

Software Management (II - Clients)
  SPMA = Software Package Management Agent
  Manages all or a subset of the packages on the nodes
    On production nodes: full control - wipes out unknown packages, (re)installs missing ones
    On development nodes (or desktops): non-intrusive, configurable management of system and security updates
  Package manager, not only an upgrader
    Can roll back package versions
    Transactional verification of operations
  Portability: generic plug-in framework
    Plug-ins available for Linux RPM and Solaris PKG (can be extended)
  Scalability:
    Supports HTTP (also FTP, AFS/NFS)
    Time smearing
    Package pre-caching
    Possible to access multiple repositories (division/experiment specific)
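As an illustration of how the desired package list could be expressed in the node profile (pkg_add mirrors the calls in the hierarchy example above; the arguments and version strings are simplified placeholders, not the real template syntax):

  # illustration only: desired packages in the node profile
  "/software/packages" = pkg_add("lsf", "5.1", "i386");
  # two kernel versions can be requested side by side, so a new kernel is
  # installed now and only activated at the next reboot
  "/software/packages" = pkg_add("kernel", "2.4.20-1", "i686");
  "/software/packages" = pkg_add("kernel", "2.4.21-1", "i686");

SPMA then brings the installed package set in line with this list (and, on production nodes, removes anything not listed).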

Improvements wrt EDG-LCFG
  New and powerful configuration language
    True hierarchical structures
    Extendable data manipulation language
    (User-defined) typing and validation
    SQL query backend
  Portability
    Plug-in architecture -> Linux and Solaris
  Enhanced components
    Sharing of configuration data between components now possible
    New component support libraries
    Native configuration access API (NVA-API)
  Stick to the standards where possible
    Installation subsystem uses the system installer
    Components don't replace the SysV init.d subsystem
  Modularity
    Clearly defined interfaces and protocols
    Mostly independent modules, with "light" functionality built in (e.g. package management)
  Improved scalability
    Enabled for proxy technology
    NFS mounts no longer necessary
  Enhanced management of software packages
    ACLs for SWRep
    Multiple versions installable
    No need for RPM 'header' files
  Last but not least: support!
    EDG-LCFG is frozen and obsolete (no ports to newer Linux versions)
    LCFG -> EDG-LCFGng -> quattor

Deployment

Quattor deployment @ CERN (I)
  Quattor is used by FIO to manage most CC Linux nodes:
    >1700 nodes, 15 clusters - to be scaled up to >5000 in 2006-8 (LHC)
    LXPLUS, LXBATCH, LXBUILD, disk and tape servers, Oracle DB servers
    RedHat 7.3 and RHES 2.1; CEL3 / RHES 3.0 (also on IA64) to come soon (porting now)
  Server cluster (LXSERV) hosting replicated CDB and SWRep
  Started now: head nodes using Apache proxy technology for software and configuration distribution (see next slide)
  Quattor will be available on Linux desktops for CEL3
  Solaris clusters, server nodes and desktops to come for Solaris 9

Proxy architecture (diagram): the LXSERV backend (M, M') feeds DNS-load-balanced HTTP frontend servers, which distribute installation images, RPMs and configuration profiles to Level-1 and Level-2 proxies (head nodes, H). Level-1 proxies are still needed for serving non-head-node clients, and as fail-over for head nodes which are down.

Quattor deployment @ CERN (II)
  LCG-2 WN configuration:
    >400 nodes configured as LCG-2 Worker Nodes (250 for CMS)
    Configuration components for RM, EDG/LCG setup, Globus
  Usage examples:
    Upgrade from LSF 4.2 to LSF 5.1 on >1000 nodes within 15 minutes, without service interruption
    All software (functional and security) upgrades are done by SPMA:
      openssl/ssh security updates
      KDE upgrades (~400 MB per node) on >700 nodes
      etc. (~once a week!)
    Kernel upgrades: SPMA can handle multiple versions of the same package -> allows separating in time the installation and the activation (after reboot) of a new kernel

Deployment outside CERN-CC
  EDG: no time for wide deployment
    Estimated effort for moving from LCFG to quattor exceeded the remaining EDG lifetime
    EDG focus on stability rather than middleware functionality
  Tutorials held at HEPiX and EDG conferences have generated positive feedback and interest:
    Experiments: CMS, LHCb, Atlas
    HEP institutes: UAM Madrid, LAL/IN2P3, NIKHEF, Liverpool University
    Projects: Grille 5K (CNRS France)
  Community-driven effort to use quattor for general LCG-2 configuration
    Workshop this Friday to define initial steps
    Based on already existing WN config components
    CERN will help with deployment at other sites
    Collaboration for providing missing pieces, e.g. configuration components, GUIs, beginner's user guides?

http://quattor.org

Differences with ROCKS
  Rocks: better documentation, nice GUI, easy to set up
  Design principle: reinstall nodes in case of configuration changes
    No configuration or software updates on running systems
    Suited for production? Efficiency on batch nodes; upgrades/reconfigurations on 24/7 servers (e.g. gzip security fix, reconfiguration of the CE address on WNs)
  Assumptions on network structure (private/public parts) and node naming
  No indication on how to extend the predefined node types or the configured services
  Limited configuration capabilities (key/value)
  No multiple package versions (neither in the repository, nor simultaneously on a single node)
    e.g. different kernel versions on specific node types
  Works only for RH Linux (Anaconda installer extensions)

Differences with ASIS/SUE
  ASIS (see post-C5, 14/3/2003):
    Scalability: HTTP vs. a shared file system
    Supports the native packaging system (RPM, PKG)
    Manages all software on the node
    'Real' central configuration database
    (But: no end-user GUI, no package generation tool)
  SUE:
    Focus on configuration, not installation
    Powerful configuration language
      True hierarchical structures
      Extendable data manipulation language
      (User-defined) typing and validation
      Sharing of configuration data between components now possible
    Central Configuration Database
    Supports unconfiguring services
    Improved dependency model (pre/post dependencies)
    Revamped component support libraries

NCM Component example

sub Configure {
    my ($self, $config) = @_;

    # access configuration information (CDB API)
    my $arch = $config->getValue('/system/architecture');
    $self->Fail("not supported") unless ($arch eq 'i386');

    # (re)generate and/or update local config file(s)
    open(myconfig, '/etc/myconfig');
    ...

    # notify affected (SysV) services if required
    if ($changed) {
        system('/sbin/service myservice reload');
        ...
    }
}

sub Unconfigure {
    ...
}
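For illustration: such a component is normally dispatched through the NCM front-end rather than run by hand; assuming the standard ncm-ncd dispatcher, an invocation along the lines of 'ncm-ncd --configure myservice' would run the Configure method above (the exact command and options depend on the NCM version and are given here only as an assumption).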

Key concepts behind quattor
  Autonomous nodes:
    Local configuration files
    No remote management scripts
    No reliance on global file systems (AFS/NFS)
  Central control:
    Primary configuration is kept centrally (and replicated on the nodes)
    A single source for all configuration information
  Reproducibility:
    Idempotent operations
    Atomicity of operations
  Scalability:
    Load-balanced servers, scalable protocols
  Use of standards: HTTP, XML, RPM/PKG, SysV init scripts, …
  Portability: Linux, Solaris