
Quattor and ELFms An introduction for the new Sysadmins (and others) Sophie Lemaitre Véronique Lefébure April 2011 CF-ASI.


1 Quattor and ELFms An introduction for the new Sysadmins (and others) Sophie Lemaitre Véronique Lefébure April 2011 CF-ASI

2 ELFms
– Quattor: CDB, SWREP, NCM
– PrepareInstall
– SINDES
– SMS
– LEMON/LAS
– Notd, wassh
– LEAF, HMS
– SLS, SDB
– CLUMAN (in production since September 2010)

3 See http://quattor.org
– Developed 2002-3: Grid project
– Now maintained by the “Quattor Community”
– 2 workshops per year
– quattor-discuss@lists.sourceforge.net
– Code in SourceForge

4 Quattor
– Set of tools to manage large clusters
– Configuration in templates
– Using Pan language
From Wikipedia:
“Quattor is a large scale fabric management system for managing medium to very large clusters. It is largely similar to LCFG and is particularly well suited to Grid computing sites. Quattor was developed at CERN for use in the LCG project. Quattor is used by many Grid sites around the world today. A [...] site will use profiles to configure machines (virtual and physical) using a language called PAN, this is then compiled using panc [...], and creates XML profiles, which the NCM (node configuration manager) on each machine uses to configure the machine. Typical configuration includes software package installation and machine settings (grub for instance).”

5
1. CDB: Configuration Database
   – Based on CVS
   – SCDB exists, based on SVN; it does not scale for the number of users we have at CERN, but is used by nearly all other Quattor sites
2. SWREP: Software Repository
   – SPMA is the link between CDB and Swrep
3. NCM: Node Configuration Modules (or “components”)
   – Modules (scripts) that apply the configuration onto the client nodes

6 CDB: Configuration Database
Host per host configuration, including:
– List of RPM packages
– Service configurations: e.g. grub, chkconfig, network... ~everything
– Monitoring configuration (LEMON)
    Hardware monitoring (e.g. disk failures)
    Software monitoring (e.g. daemon restart)
– Access control configuration
    Interactive, sudo, root access
→ Every host has its own .xml profile
Size: lxplus: 400 kB, lxbatch: 620 kB

7 CDB: Configuration Database
Common configuration bits usable by many hosts: inheritance and hierarchy of “templates”
Templates are edited by
– Service managers, via “cdbop” (from lxadm or lxvoadm)
– Automated scripts, e.g. HMS → LEAF scripts
Declarative language: PAN
– Includes field validations
From PAN template to .xml: PANC compiler
Stages possible: “prod”, “preprod”, “test”, “usertest”

8 CERN CC Typical Host Profile Template
As created by HMS LEAFAddHost:
– See for ex. http://tpl-viewer.cern.ch/cdb-tpl-view/tpl_view.php?profile=profiles/profile_lxplus310

    object template profile_lxplus310;
    include { 'stages/preprod' };                → stage (prod/preprod/test/usertest)
    variable ELFMS_OS = "slc5";
    variable ELFMS_ARCH = "x86_64";
    variable ELFMS_SVCCLASS = "lxplus";
    variable ELFMS_RESOURCE = "c3";
    variable ELFMS_CUSTOMIZATION = "lxplusSLC5";
    include { 'quattor/profile_declarations' };
    include { 'netinfo/lxplus310' };             → generated from LANDB data
    include { 'hardware/machines/e4_08_60' };    → everything about the hw, incl. monitoring
    include { 'vpd/lxplus310' };                 → serial numbers, mac addresses, ...
    include { 'serial_map_lxc1rk17' };           → console configuration
    include { 'cluster/'+ELFMS_SVCCLASS+'/config' };    → CLUSTER configuration
    include { if (exists('customization/'+ELFMS_RESOURCE+'/'+ELFMS_CUSTOMIZATION+'/config'))
                  'customization/'+ELFMS_RESOURCE+'/'+ELFMS_CUSTOMIZATION+'/config' };    → SUBCLUSTER config if any
    # machine moved on 30.04.09 from = "sp15";
    "/system/oldnames/1/" = nlist("name", "lxbsp1535", "date", "30.04.09");
    "/hardware/rack/name" = "rk17";              → rack location
    "/hardware/u_position" = 10;
    # MUST BE LAST LINE:
    include { if (exists("/software/repositories")) 'quattor/repository_cleanup' };
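The cluster and customization includes in the profile above are built from the ELFMS_* variables, and the customization template is only included if it exists. The following Python sketch illustrates that name resolution only; it is not how panc works internally. The function `resolve_includes` and the `available` set of template names are hypothetical, introduced for illustration.

```python
def resolve_includes(variables, available):
    """Return the cluster/customization templates a profile would include.

    `variables` holds the ELFMS_* values; `available` is a hypothetical set
    of template names that exist in the configuration database.
    """
    includes = ["cluster/" + variables["ELFMS_SVCCLASS"] + "/config"]
    custom = ("customization/" + variables["ELFMS_RESOURCE"] + "/"
              + variables["ELFMS_CUSTOMIZATION"] + "/config")
    # Mirrors `include { if (exists(...)) ... }`: skipped when the template is absent.
    if custom in available:
        includes.append(custom)
    return includes


lxplus310 = {
    "ELFMS_SVCCLASS": "lxplus",
    "ELFMS_RESOURCE": "c3",
    "ELFMS_CUSTOMIZATION": "lxplusSLC5",
}
with_custom = resolve_includes(lxplus310, {"customization/c3/lxplusSLC5/config"})
without_custom = resolve_includes(lxplus310, set())
```

With the customization template present, both names are returned; without it, only the cluster template is.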

9 Netinfo CDB Template

    template netinfo/cdbserv01;
    "/system/network/hostname" = "cdbserv01";
    "/system/network/domainname" = "cern.ch";
    "/system/responsible/forename" = "E-GROUP";
    "/system/responsible/familyname" = "IT-DEP-FIO-SMOD";
    "/hardware/location" = "0513 S-0034";
    "/system/network/interfaces/eth0/ip" = "137.138.4.169";
    "/system/network/interfaces/eth0/gateway" = "137.138.1.1";
    "/system/network/interfaces/eth0/netmask" = "255.255.0.0";
    "/system/network/interfaces/eth0/switchmedium" = 1024000;

→ Data taken from LANDB with “LEAFGenerateNetinfo”

10 Hardware CDB Template

    template hardware/machines/e4_08_60;
    include { 'hardware/vendors/e4' };
    "/hardware/model" = "e4_08_60";
    "/hardware/contract/warrantyid" = "cd1001333";    → Link to CDBW
    "/hardware/mperf" = 11318.0;
    "/hardware/hepspec06" = 70.17;
    # BIOS
    "/hardware/bios/vendor" = "Phoenix Technologies LTD";
    "/hardware/bios/version" = "1.1a (04/09/2009)";
    # BMC
    "/hardware/cards/bmc/0/manufacturer" = "Super Micro";
    "/hardware/cards/bmc/0/name" = "AOC-IPMI20-E";
    "/hardware/cards/bmc/0/type" = "2.0";
    "/hardware/cards/bmc/0/version" = "1.60.0 Feb-13-2009-16-00-NonKVM";
    # CPUs
    "/hardware/cpu/0/vendor" = "GenuineIntel";
    "/hardware/cpu/0/model" = "Intel(R) Xeon(R) CPU L5420 @ 2.50GHz";
    "/hardware/cpu/0/speed" = 2500;
    "/hardware/cpu/0/cores" = 4;
    ...
    # RAM
    "/hardware/ram/0/vendor" = "Samsung";
    "/hardware/ram/0/model" = "M3 93T5160QZA-CE6";
    "/hardware/ram/0/size" = 4096;
    "/hardware/ram/0/type" = "DDR2";
    "/hardware/ram/0/data_rate" = "667";
    "/hardware/ram/0/location" = "CH0_DIMM0";
    ...
    # ATA
    "/hardware/cards/sata/_0/manufacturer" = "Intel Corporation";
    "/hardware/cards/sata/_0/model" = "82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller";
    "/hardware/cards/sata/_0/location" = "/sys/devices/pci0000:00/0000:00:1f.2";
    "/hardware/cards/sata/_0/numberports" = 2;
    "/hardware/cards/sata/_0/ports/_0/manufacturer" = "Hitachi";
    "/hardware/cards/sata/_0/ports/_0/model" = "HDP725025GLA380";
    "/hardware/cards/sata/_0/ports/_0/capacity" = 238475;
    "/hardware/cards/sata/_0/ports/_0/version" = "GM2OA5TA";
    "/system/blockdevices/physical_devs/sda/device_path" = "sata/_0/ports/_0";
    "/system/blockdevices/physical_devs/sda/label" = "none";
    "/hardware/cards/sata/_0/ports/_1/manufacturer" = "Hitachi";
    "/hardware/cards/sata/_0/ports/_1/model" = "HDP725025GLA380";
    "/hardware/cards/sata/_0/ports/_1/capacity" = 238475;
    "/hardware/cards/sata/_0/ports/_1/version" = "GM2OA5TA";
    "/system/blockdevices/physical_devs/sdb/device_path" = "sata/_0/ports/_1";
    "/system/blockdevices/physical_devs/sdb/label" = "none";
    # NICs
    "/hardware/cards/nic/0/manufacturer" = "Intel Corporation";
    "/hardware/cards/nic/0/model" = "82573E Gigabit Ethernet Controller (Copper)";
    "/hardware/cards/nic/0/bus" = "pci";
    "/hardware/cards/nic/0/location" = "/sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0";
    "/hardware/cards/nic/0/maxspeed" = 1024000;
    "/hardware/cards/nic/0/media" = "ethernet";
    "/hardware/cards/nic/0/version" = "0.15-5";
    ...
    # FCAs
    include { if (is_quattor_managed()) 'hardware/monitoring/machines/' + value("/hardware/model") else null };
    include { if (exists('hardware/blockdevices/' + value("/hardware/model"))) 'hardware/blockdevices/' + value("/hardware/model") };

→ Managed by the Procurement team
→ Maintained by sysadmins

11 Vpd CDB Template

    template vpd/lxdev38;
    # Machine serial number
    "/hardware/serialnumber" = "CR006025";
    # MAC addresses
    "/hardware/cards/nic/0/hwaddr" = "00:30:48:xx:xx:xx";
    "/hardware/cards/nic/1/hwaddr" = "00:30:48:xx:xx:xx";
    "/hardware/cards/bmc/0/hwaddr" = "00:30:48:xx:xx:xx";
    # RAM info
    "/hardware/ram/0/serialnumber" = "28126D5A";
    "/hardware/ram/1/serialnumber" = "28126C39";
    "/hardware/ram/2/serialnumber" = "28126D13";
    "/hardware/ram/3/serialnumber" = "28126D15";
    "/hardware/ram/4/serialnumber" = "28126C19";
    "/hardware/ram/5/serialnumber" = "28126BD7";
    "/hardware/ram/6/serialnumber" = "28126C32";
    "/hardware/ram/7/serialnumber" = "28126BA5";
    # Disk info
    "/hardware/cards/sata/_0/ports/_0/serialnumber" = "PVG904Z9T1DGVV";
    "/hardware/cards/sata/_0/ports/_1/serialnumber" = "PVG904Z9SZSAXV";

→ Managed by the Sysadmins
→ Corrected when there is a “hwscan_wrong” alarm

12 Cluster CDB Template (an example)

    template cluster/lxplus/config;
    # Minimal configuration
    include { 'site/cern_cc/configuration/'+ELFMS_OS+'/config' };
    "/system/cluster/name" = ELFMS_SVCCLASS;
    "/system/contract" = D;
    "/system/importance" = 30;                           → if > 50, piquet call
    "/system/service/infrastructure" = "plus";           → link to SDB
    "/system/rootmail" = "sophie.lemaitre@cern.ch";
    include { 'cluster/lxplus/filesystem' };
    "/system/landbset/it_cc_lxplus/active" = true;       → link to LANDB
    include { 'services/afs_client/config' };
    ...

And much more
Maintained by the Service Manager or VOC

13 Core CDB Templates
Example: 'site/cern_cc/configuration/'+ELFMS_OS+'/config'
Maintained by CF-ASI and a few IT Service Managers (mainly in PES-PS)

14 CDB template hierarchy (simplified)
[Diagram: tree of templates; arrows labelled “includes” and “creates”]
Legend: Functions; Types; object templates (profile_xxx); “External” templates (normally autogenerated by SWRep); ‘restricted w access’ templates; editable templates
Variables set in the object template:
– ELFMS_OS=“slc5|rhes5”
– ELFMS_ARCH=“x86_64|i386”
– ELFMS_SVCCLASS=
– ELFMS_RESOURCE=
– ELFMS_CUSTOMIZATION=
Template paths shown:
– /repository/cern_x86_64_slc5.tpl
– /prod/site/cern_cc/configuration / /config.tpl
– /prod/cluster/ /config.tpl
– /prod/customization/resource>/ /config.tpl
– /prod/os/ x86_64_slc5/rpms/defaults.tpl

15 Node Configuration Modules (NCM)
– List: ncm-ncd --list (node configuration deployer)
– Configure: ncm-ncd --conf grub [--noaction]
– Details: ncm-query --comp grub
– More: ncm-query --dump /system/monitoring
– Dependencies
– Logs: /var/log/ncm/ncd.log, /var/log/ncm/component-.log
– Code: /usr/lib/perl/NCM/Components/*.pm

16 Quattor Configuration Modules
Over 100 NCM configuration components are available:
Configure basic Quattor and core system services
– Quattor services: ccm, spma, cdp
– System services: useraccess, accounts, cron, filecopy, grub, iptables, logrotate, mailaliases, netdriver, nfs, ntpd, portmap, profile, serialclient, smartd, ssh, sysctl
Configure advanced system services
– Including castor, chkconfig, fiberchannel, fmonagent, gdmconf, ipmi, lsfclient, named, quota, screensaver, sysacct

17 CDB “commits”
[Diagram: commit flow from the user interfaces, through the CDB server, to the Quattor-managed client]
Diagram labels: GUI, Scripts, CLI; SOAP; CDB server; CDB HLD; Pan compilation; XML; CDB LLD; CVS commit; dependency computation; CDB CVS; UDP notification; Quattor-managed client; fetch .xml profile
– Atomic transaction
– Protection against concurrent modifications (CVS)
– Queuing (but no order)

18 CDB Activity
[Plot of CDB activity; annotations: Christmas holidays; cron jobs (sw upgrades)]
Note: user jobs queue one after the other → apparent “commit” time may be long; it also includes resources used for unsuccessful compilations
Time for a full recompilation: ~10 minutes if no load on the server

19 Numbers
>11000 “objects” are today in CDB (object templates => .xml profiles)
– 75% are quattor-managed
– 25% are not, but were created for inventory purposes (see more later)
Users: ~230 registered CDB users
SLS: http://sls.cern.ch/sls/service.php?id=CDB

20 SWrep
[Diagram: flow of packages into Swrep, with three numbered steps]
Diagram labels:
– Swrep (lxservb01)
– lxadm: swrep-soap-client put, swrep-soap-client list
– LinuxSoft AFS repository and EPEL repo (PES/PS); Linux and EPEL YUM repo
– CDB server: weekly cron job: sw upgrades (OSDATE)
– Lxservb05: cron job: repository CDB tpl update; cron job: LS repository mirroring onto SWrep
See also new system: https://twiki.cern.ch/twiki/bin/view/ELFms/OsUpdates

21 SPMA (Software Package Management Agent)
Runs on the quattor-managed clients
– Service manager notifies or runs “spma_wrapper.sh”
Gets, from the .xml profile, the list of packages to be managed (/var/lib/spma-target.cf)
    "/software/packages" = pkg_add("myrpm", "version", "arch");
    "/software/packages" = pkg_repl("myrpm", "version", "arch");
    "/software/packages" = pkg_del("myrpm", "version", "arch");
Adds, removes, upgrades packages accordingly
Can be configured to not touch packages installed “by-hand” (ncm-spma → /etc/spma.conf)
    "/software/components/spma/userpkgs" = "yes";
    "/software/components/spma/userprio" = "yes";
Gets packages from Swrep via http
– Access restricted for RHES packages
Monitored (“spma_error” exceptions)
– Log: /var/log/spma.log
Not recommended!
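The idea of "adds, removes, upgrades packages accordingly" can be sketched as a comparison between the target package list and what is installed. This is a simplified Python illustration, not SPMA's actual algorithm: it ignores architectures, multiple installed versions and priorities, and the function `plan_operations` and its `userpkgs` argument are hypothetical names chosen to echo the setting shown above.

```python
def plan_operations(target, installed, userpkgs=False):
    """Compare target and installed {name: version} maps and list the changes.

    With userpkgs=True, packages installed "by hand" (present on the node
    but absent from the target list) are left alone.
    """
    ops = []
    for name, version in sorted(target.items()):
        if name not in installed:
            ops.append(("install", name, version))
        elif installed[name] != version:
            ops.append(("replace", name, version))
    if not userpkgs:
        for name in sorted(installed):
            if name not in target:
                ops.append(("remove", name, installed[name]))
    return ops


target = {"grub": "0.97", "openssh": "4.3"}
installed = {"grub": "0.95", "mytool": "1.0"}
strict_plan = plan_operations(target, installed)
lenient_plan = plan_operations(target, installed, userpkgs=True)
```

In the strict plan the hand-installed `mytool` is removed; in the lenient plan it is kept and only the target packages are installed or replaced.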

22 Quattor: Basics
[Diagram: CDB and SWREP feeding a Quattor-managed node, e.g. lxplus310]
1. Get latest .xml profile from CDB (“ccm-fetch”)
2. Read .xml profile and get software packages from Swrep (“SPMA”)
3. Read .xml profile and configure services (“NCM”)
On the node:
    $ ccm-fetch
    $ spma_wrapper.sh
    $ ncm_wrapper.sh

23 Notifications: 2 (3) ways
[Diagram: CDB server, notification server and Quattor-managed client]
1. CDB server (“cdbserv”) triggers “ccm-fetch” on the Quattor-managed client (we also have a cron job)
2. lxadm: nc-client (reduced access) → notification server (“notdserver”) triggers what is configured in /etc/not.d
3. Run ncm component by hand on the client
Other diagram label: cdispd

24 “cdb2sql”
[Diagram: CDB server (“cdbserv”) → lxservb06 → cdb@itcore (CDBSQL)]
– After each commit: udp packet from CDB to lxservb06 triggers “cdb2sql”
– .xml profile timestamp comparison, data comparison, DB update
– Software packages and monitoring configuration skipped
– Data update, and views update via triggers
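The "timestamp comparison, data comparison, DB update" sequence can be illustrated with a small Python sketch. It is only a model of the idea, not the cdb2sql code: the in-memory `db` dictionary stands in for the database, and `sync_profile` and the exact `SKIPPED_PREFIXES` paths are assumptions made for the example.

```python
# Hypothetical path prefixes standing in for "software packages and
# monitoring configuration", which the slide says are skipped.
SKIPPED_PREFIXES = ("/software/packages", "/system/monitoring")


def sync_profile(db, host, timestamp, data):
    """Update db[host] from a profile; return the sorted list of changed paths."""
    stored = db.get(host)
    # Timestamp comparison: a profile that is not newer needs no further work.
    if stored is not None and stored["timestamp"] >= timestamp:
        return []
    kept = {p: v for p, v in data.items() if not p.startswith(SKIPPED_PREFIXES)}
    old = stored["data"] if stored else {}
    # Data comparison: only paths whose value differs count as changes.
    changed = sorted(p for p in set(kept) | set(old) if kept.get(p) != old.get(p))
    db[host] = {"timestamp": timestamp, "data": kept}
    return changed


db = {}
first = sync_profile(db, "lxplus310", 100,
                     {"/hardware/rack/name": "rk17", "/software/packages/foo": "1.0"})
same = sync_profile(db, "lxplus310", 100, {"/hardware/rack/name": "rk17"})
moved = sync_profile(db, "lxplus310", 200, {"/hardware/rack/name": "rk18"})
```

The first call stores the rack name and drops the package entry, the second is a no-op because the timestamp is unchanged, and the third reports the rack name as the only changed path.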

25 CDBSQL use
[Diagram: CDB → XML → cdb@itcore (CDBSQL) → consumers]
Diagram labels: SINDES; SMS; ITCM (Remedy); LEMON; LAS; console config; other service config (e.g. LSF); CLUMAN; PrepareInstall; LEAF; Wassh; NOTD; SDB; HMS (Remedy); AIMS; CDB HLD; CDBDump, CDBHosts

26 CDBSQL
An SQL interface provides simple query access
We can ask about properties spanning across machines
We can run SQL queries (SELECT) based on predefined views:
– “give me all machines with more than 2GB of memory”
– “give me all machines that belong to lxplus”
– “give me all machines that have this head node”
Examples:
    CDBHosts -cl lxplus
    CDBHosts -cl all -q "hostname='lxb0501'"
    CDBHosts -cl all -d "clustername" -view vwcluster
This document explains all the existing views
A web-based frontend exists as well for the most common queries: https://cdbweb.cern.ch/
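To show what a SELECT on a predefined view looks like, here is a self-contained Python sketch using an in-memory SQLite database. The real CDBSQL schema is not described in these slides, so the `hosts` table, its columns and the definition of `vwcluster` are invented for illustration; only the view name and the `hostname` and `clustername` fields come from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table of hosts and a view exposing cluster membership.
conn.executescript("""
CREATE TABLE hosts (hostname TEXT, clustername TEXT, memory_mb INTEGER);
CREATE VIEW vwcluster AS SELECT hostname, clustername FROM hosts;
""")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [("lxplus310", "lxplus", 16384),
     ("lxplus311", "lxplus", 2048),
     ("lxb0501", "lxbatch", 8192)],
)

# "give me all machines that belong to lxplus"
lxplus_hosts = [row[0] for row in conn.execute(
    "SELECT hostname FROM vwcluster WHERE clustername = ? ORDER BY hostname",
    ("lxplus",))]

# "give me all machines with more than 2GB of memory"
big_hosts = [row[0] for row in conn.execute(
    "SELECT hostname FROM hosts WHERE memory_mb > ? ORDER BY hostname",
    (2048,))]
```

Because the queries span all hosts at once, a question like "which machines belong to lxplus" needs no per-profile lookup.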

27 CDB as Inventory DB
CDB also contains profiles for inventory purposes only:
– Non-quattor-managed objects, but info needed for:
    Power consumption
    Location
    Vendor calls
    Ping monitoring (for Windows machines)
– Examples: air cooled rack, water cooled rack, switch, router, rps, power distribution, kvm, disk array, enclosure, server, rms, pdu, ups, temperature sensor, ventilation unit, brush, filler panel, kvm switch, robot, fibre channel switch
Work in progress

28 HMS → LEAF → CDB
– Install
– Move
– Repair
– Retire
Used by Sysadmins only

29 LEAF
– LEAFAddHost, LEAFGenerateNetinfo
– LEAFMoveHost
– LEAFRenameHost
– LEAFRetireHost
Create/update CDB templates via the CDB SOAP interface
Used by HMS (Remedy)
– Plus: LANDBAddHost, ...
Available on lxadm
Use “--help” for details about usage

30 SMS (“State Management System”)

STATE         EFFECT
Maintenance   Not alarmed on LAS; no login (according to configuration); not selected for DNS load-balanced alias
Standby       Alarmed; no login (according to configuration); not selected for DNS load-balanced alias
Production    Alarmed; login; DNS load-balanced alias candidate
Unmanaged     Not managed with SMS

Stack of states on top of a “default state” defined in CDB:
→ sms set
→ sms clear
sms showstack
sms history
sms get
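The "stack of states on top of a default state" can be modelled in a few lines of Python. This is a sketch under assumptions, not the SMS implementation: the class `StateStack` is hypothetical, and in particular the choice that clearing removes the most recent entry for the named state is a guess about the semantics of `sms clear`.

```python
class StateStack:
    """Toy model: explicit states stacked on top of a CDB-defined default."""

    def __init__(self, default_state):
        self.default_state = default_state
        self.stack = []  # (state, reason) pairs, most recent last

    def set(self, state, reason):
        """Push a state with the reason it was set."""
        self.stack.append((state, reason))

    def clear(self, state):
        """Remove the most recent entry for `state`; return True if one was found."""
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == state:
                del self.stack[i]
                return True
        return False

    def get(self):
        """Current state: top of the stack, or the default when the stack is empty."""
        return self.stack[-1][0] if self.stack else self.default_state


node = StateStack("production")
node.set("maintenance", "kernel upgrade")
during = node.get()
node.clear("maintenance")
after = node.get()
```

Once the maintenance entry is cleared, the node falls back to its default state without anyone having to remember what that state was.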

31 SMS (“State Management System”)
[Diagram: lxadm, CDB server (“cdbserv”), cdb@itcore (CDBSQL), notification server (“notdserver”), Quattor-managed client]
1. sms set maintenance “kernel upgrade” (from lxadm)
2. SMS state check & update
3. Client notification
4. sms-set-state → SetToDesiredState

32 Useful tools
Available on
– lxadm
– lxvoadm (VOCs), lxtnadm (technical network)

Command                                   Purpose
wassh                                     Run command on nodes or cluster
cdbop                                     CDB command line
CDBDump, CDBHosts                         Get information about node and/or cluster
PrepareInstall                            Reinstall node(s)
LEAFRenameHost, LEAFUpdateNetInfo, etc.   Create or modify info in CDB
sms                                       Set state (production, maintenance, standby)
nc-client                                 Notify nodes for SPMA or NCM
hwvpd2cdb                                 Update VPD info in CDB
swrep-soap-client                         Upload RPMs into SWREP

33 Useful tools
On a given node

Command     Purpose
ccm-fetch   Fetch node configuration from CDB
spma        Install RPMs as defined in CDB
ncm-ncd     Configure components
ncm-query   Check NCM configuration

34 Need help?
What you can do
– Search on Twiki
– Ask your colleagues
– Subscribe to
    Forum: project-elfms@cern.ch
    Announcements for Service Managers: cern-quattor-announce@cern.ch
Ask ELFms support
– Open a ticket using the Service Portal
– For access requests, etc.
– For general questions

