Published by Ursula Wheeler. Modified over 8 years ago.
1
Quattor and ELFms
An introduction for the new Sysadmins (and others)
Sophie Lemaitre, Véronique Lefébure
April 2011, CF-ASI
2
ELFms
– Quattor: CDB, SWREP, NCM
– PrepareInstall
– SINDES
– SMS
– LEMON/LAS
– Notd, wassh
– LEAF, HMS
– SLS, SDB
– CLUMAN (in production since September 2010)
3
See http://quattor.org
– Developed 2002-3: Grid project
– Now maintained by the “Quattor Community”
– 2 workshops per year
– quattor-discuss@lists.sourceforge.net
– Code in SourceForge
4
Quattor
– Set of tools to manage large clusters
– Configuration in templates
– Using Pan language
From Wikipedia:
“Quattor is a large scale fabric management system for managing medium to very large clusters. It is largely similar to LCFG and is particularly well suited to Grid computing sites. Quattor was developed at CERN for use in the LCG project. Quattor is used by many Grid sites around the world today. A [...]site will use profiles to configure machines (virtual and physical) using a language called PAN, this is then compiled using panc [...], and creates XML profiles, which the NCM (node configuration manager) on each machine uses to configure the machine. Typical configuration includes software package installation and machine settings (grub for instance).”
5
1. CDB: Configuration Database
– Based on CVS
– SCDB exists, based on SVN; it does not scale for the number of users we have at CERN, but is used by nearly all other Quattor sites
2. SWREP: Software Repository
– SPMA is the link between CDB and Swrep
3. NCM: Node Configuration Modules (or “components”)
– Modules (scripts) that apply the configuration onto the client nodes
6
CDB: Configuration Database
Host per host configuration, including:
– List of RPM packages
– Service configurations, e.g. grub, chkconfig, network... (~everything)
– Monitoring configuration (LEMON): hardware monitoring (ex: disk failures), software monitoring (ex: daemon restart)
– Access control configuration: interactive, sudo, root access
Every host has its own .xml profile
Size: lxplus: 400kB, lxbatch: 620kB
7
CDB: Configuration Database
Common configuration bits usable by many hosts: inheritance and hierarchy of “templates”
Templates are edited by
– Service managers, via “cdbop” (from lxadm or lxvoadm)
– Automated scripts, e.g. HMS LEAF scripts
Declarative language: PAN
– Includes field validations
From PAN template to .xml: PANC compiler
Stages possible: “prod”, “preprod”, “test”, “usertest”
8
CERN CC Typical Host Profile Template
As created by HMS LEAFAddHost:
– See for ex. http://tpl-viewer.cern.ch/cdb-tpl-view/tpl_view.php?profile=profiles/profile_lxplus310

object template profile_lxplus310;
include { 'stages/preprod' };                  # stage (prod/preprod/test/usertest)
variable ELFMS_OS = "slc5";
variable ELFMS_ARCH = "x86_64";
variable ELFMS_SVCCLASS = "lxplus";
variable ELFMS_RESOURCE = "c3";
variable ELFMS_CUSTOMIZATION = "lxplusSLC5";
include { 'quattor/profile_declarations' };
include { 'netinfo/lxplus310' };               # generated from LANDB data
include { 'hardware/machines/e4_08_60' };      # everything about the hw, incl. monitoring
include { 'vpd/lxplus310' };                   # serial numbers, mac addresses,...
include { 'serial_map_lxc1rk17' };             # console configuration
include { 'cluster/'+ELFMS_SVCCLASS+'/config' };   # CLUSTER configuration
include { if (exists('customization/'+ELFMS_RESOURCE+'/'+ELFMS_CUSTOMIZATION+'/config')) 'customization/'+ELFMS_RESOURCE+'/'+ELFMS_CUSTOMIZATION+'/config' };   # SUBCLUSTER config if any
# machine moved on 30.04.09 from = "sp15";
"/system/oldnames/1/" = nlist("name", "lxbsp1535", "date", "30.04.09" );
"/hardware/rack/name" = "rk17";                # rack location
"/hardware/u_position" = 10;
# MUST BE LAST LINE:
include { if(exists("/software/repositories")) 'quattor/repository_cleanup' };
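The include lines of the profile compose template names from the ELFMS_* variables, and the subcluster include is guarded by exists(). The following Python sketch mimics that selection logic only; it is not how panc works internally, and the function name and the set of available templates are assumptions made for illustration.

```python
def resolve_includes(variables, available_templates):
    """Return the cluster/subcluster templates a host profile would pull in."""
    cluster = "cluster/" + variables["ELFMS_SVCCLASS"] + "/config"
    custom = ("customization/" + variables["ELFMS_RESOURCE"] + "/"
              + variables["ELFMS_CUSTOMIZATION"] + "/config")

    includes = [cluster]  # the cluster include is unconditional
    if custom in available_templates:  # mirrors the exists() guard in the profile
        includes.append(custom)
    return includes


# Values taken from the lxplus310 profile shown on the slide.
variables = {
    "ELFMS_SVCCLASS": "lxplus",
    "ELFMS_RESOURCE": "c3",
    "ELFMS_CUSTOMIZATION": "lxplusSLC5",
}

# Hypothetical contents of the template repository.
with_subcluster = {"cluster/lxplus/config", "customization/c3/lxplusSLC5/config"}
without_subcluster = {"cluster/lxplus/config"}

print(resolve_includes(variables, with_subcluster))
print(resolve_includes(variables, without_subcluster))
```

When the customization template does not exist, the host simply gets the cluster configuration alone, which is why the profile can carry the subcluster line unconditionally.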
9
Netinfo CDB Template

template netinfo/cdbserv01;
"/system/network/hostname" = "cdbserv01";
"/system/network/domainname" = "cern.ch";
"/system/responsible/forename" = "E-GROUP";
"/system/responsible/familyname" = "IT-DEP-FIO-SMOD";
"/hardware/location" = "0513 S-0034";
"/system/network/interfaces/eth0/ip" = "137.138.4.169";
"/system/network/interfaces/eth0/gateway" = "137.138.1.1";
"/system/network/interfaces/eth0/netmask" = "255.255.0.0";
"/system/network/interfaces/eth0/switchmedium" = 1024000;

Data taken from LANDB with “LEAFGenerateNetinfo”
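Each assignment in a template sets one path in the host's configuration tree. As a rough illustration of how such path assignments build up a nested profile, here is a Python sketch; it treats every level as a dictionary, whereas real Pan distinguishes lists from nlists and validates types, and the helper name set_path is made up for this example.

```python
def set_path(tree, path, value):
    """Store value under a slash-separated path, creating intermediate levels."""
    parts = [part for part in path.split("/") if part]
    node = tree
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


# A subset of the assignments from the netinfo/cdbserv01 template.
assignments = [
    ("/system/network/hostname", "cdbserv01"),
    ("/system/network/domainname", "cern.ch"),
    ("/system/network/interfaces/eth0/ip", "137.138.4.169"),
    ("/system/network/interfaces/eth0/netmask", "255.255.0.0"),
]

profile = {}
for path, value in assignments:
    set_path(profile, path, value)

network = profile["system"]["network"]
fqdn = network["hostname"] + "." + network["domainname"]
print(fqdn)
print(network["interfaces"]["eth0"])
```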
10
Hardware CDB Template

template hardware/machines/e4_08_60;
include { 'hardware/vendors/e4' };
"/hardware/model" = "e4_08_60";
"/hardware/contract/warrantyid" = "cd1001333";   # Link to CDBW
"/hardware/mperf" = 11318.0;
"/hardware/hepspec06" = 70.17;
# BIOS
"/hardware/bios/vendor" = "Phoenix Technologies LTD";
"/hardware/bios/version" = "1.1a (04/09/2009)";
# BMC
"/hardware/cards/bmc/0/manufacturer" = "Super Micro";
"/hardware/cards/bmc/0/name" = "AOC-IPMI20-E";
"/hardware/cards/bmc/0/type" = "2.0";
"/hardware/cards/bmc/0/version" = "1.60.0 Feb-13-2009-16-00-NonKVM";
# CPUs
"/hardware/cpu/0/vendor" = "GenuineIntel";
"/hardware/cpu/0/model" = "Intel(R) Xeon(R) CPU L5420 @ 2.50GHz";
"/hardware/cpu/0/speed" = 2500;
"/hardware/cpu/0/cores" = 4;
...
# RAM
"/hardware/ram/0/vendor" = "Samsung";
"/hardware/ram/0/model" = "M3 93T5160QZA-CE6";
"/hardware/ram/0/size" = 4096;
"/hardware/ram/0/type" = "DDR2";
"/hardware/ram/0/data_rate" = "667";
"/hardware/ram/0/location" = "CH0_DIMM0";
...
# ATA
"/hardware/cards/sata/_0/manufacturer" = "Intel Corporation";
"/hardware/cards/sata/_0/model" = "82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller";
"/hardware/cards/sata/_0/location" = "/sys/devices/pci0000:00/0000:00:1f.2";
"/hardware/cards/sata/_0/numberports" = 2;
"/hardware/cards/sata/_0/ports/_0/manufacturer" = "Hitachi";
"/hardware/cards/sata/_0/ports/_0/model" = "HDP725025GLA380";
"/hardware/cards/sata/_0/ports/_0/capacity" = 238475;
"/hardware/cards/sata/_0/ports/_0/version" = "GM2OA5TA";
"/system/blockdevices/physical_devs/sda/device_path" = "sata/_0/ports/_0";
"/system/blockdevices/physical_devs/sda/label" = "none";
"/hardware/cards/sata/_0/ports/_1/manufacturer" = "Hitachi";
"/hardware/cards/sata/_0/ports/_1/model" = "HDP725025GLA380";
"/hardware/cards/sata/_0/ports/_1/capacity" = 238475;
"/hardware/cards/sata/_0/ports/_1/version" = "GM2OA5TA";
"/system/blockdevices/physical_devs/sdb/device_path" = "sata/_0/ports/_1";
"/system/blockdevices/physical_devs/sdb/label" = "none";
# NICs
"/hardware/cards/nic/0/manufacturer" = "Intel Corporation";
"/hardware/cards/nic/0/model" = "82573E Gigabit Ethernet Controller (Copper)";
"/hardware/cards/nic/0/bus" = "pci";
"/hardware/cards/nic/0/location" = "/sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0";
"/hardware/cards/nic/0/maxspeed" = 1024000;
"/hardware/cards/nic/0/media" = "ethernet";
"/hardware/cards/nic/0/version" = "0.15-5";
...
# FCAs
include { if (is_quattor_managed() ) 'hardware/monitoring/machines/' +value("/hardware/model") else null};
include { if (exists('hardware/blockdevices/'+value("/hardware/model"))) 'hardware/blockdevices/'+value("/hardware/model") };

Managed by the Procurement team, maintained by sysadmins
11
Vpd CDB Template

template vpd/lxdev38;
# Machine serial number
"/hardware/serialnumber" = "CR006025";
# MAC addresses
"/hardware/cards/nic/0/hwaddr" = "00:30:48:xx:xx:xx";
"/hardware/cards/nic/1/hwaddr" = "00:30:48:xx:xx:xx";
"/hardware/cards/bmc/0/hwaddr" = "00:30:48:xx:xx:xx";
# RAM info
"/hardware/ram/0/serialnumber" = "28126D5A";
"/hardware/ram/1/serialnumber" = "28126C39";
"/hardware/ram/2/serialnumber" = "28126D13";
"/hardware/ram/3/serialnumber" = "28126D15";
"/hardware/ram/4/serialnumber" = "28126C19";
"/hardware/ram/5/serialnumber" = "28126BD7";
"/hardware/ram/6/serialnumber" = "28126C32";
"/hardware/ram/7/serialnumber" = "28126BA5";
# Disk info
"/hardware/cards/sata/_0/ports/_0/serialnumber" = "PVG904Z9T1DGVV";
"/hardware/cards/sata/_0/ports/_1/serialnumber" = "PVG904Z9SZSAXV";

Managed by the Sysadmins
Corrected when there is a “hwscan_wrong” alarm
12
Cluster CDB Template (an example)

template cluster/lxplus/config;
# Minimal configuration
include {'site/cern_cc/configuration/'+ELFMS_OS+'/config'};
"/system/cluster/name" = ELFMS_SVCCLASS;
"/system/contract" = D;
"/system/importance" = 30;                       # if > 50, piquet call
"/system/service/infrastructure" = "plus";       # link to SDB
"/system/rootmail" = "sophie.lemaitre@cern.ch";
include { 'cluster/lxplus/filesystem' };
"/system/landbset/it_cc_lxplus/active" = true;   # link to LANDB
include { 'services/afs_client/config'};
...

And much more
Maintained by the Service Manager or VOC
13
Core CDB Templates
Example: 'site/cern_cc/configuration/'+ELFMS_OS+'/config'
Maintained by CF-ASI and a few IT Service Managers (mainly in PES-PS)
14
CDB template hierarchy (simplified)
[Diagram legend: functions, types, tree; “includes” and “creates” arrows; “External” templates (normally autogenerated by SWRep), ‘restricted w access’ templates, editable templates, object templates]
Templates shown in the diagram:
– profile_xxx, with ELFMS_OS=“slc5|rhes5”, ELFMS_ARCH=“x86_64|i386”, ELFMS_SVCCLASS=, ELFMS_RESOURCE=, ELFMS_CUSTOMIZATION=
– /prod/site/cern_cc/configuration / /config.tpl
– /prod/cluster/ /config.tpl
– /prod/customization/resource>/ /config.tpl
– /prod/os/ x86_64_slc5/rpms/defaults.tpl
– /repository/cern_x86_64_slc5.tpl
15
Node Configuration Modules (NCM)
– List: ncm-ncd --list (node configuration deployer)
– Configure: ncm-ncd --conf grub [--noaction]
– Details: ncm-query --comp grub
– More: ncm-query --dump /system/monitoring
– Dependencies
– Logs: /var/log/ncm/ncd.log, /var/log/ncm/component-.log
– Code: /usr/lib/perl/NCM/Components/*.pm
16
Quattor Configuration Modules
Over 100 NCM configuration components are available:
Configure basic Quattor and core system services
– Quattor services: ccm, spma, cdp
– System services: useraccess, accounts, cron, filecopy, grub, iptables, logrotate, mailaliases, netdriver, nfs, ntpd, portmap, profile, serialclient, smartd, ssh, sysctl
Configure advanced system services
– Including castor, chkconfig, fiberchannel, fmonagent, gdmconf, ipmi, lsfclient, named, quota, screensaver, sysacct
17
CDB “commits”
[Diagram: GUI, scripts, CLI → SOAP → CDB server (CDB HLD, CVS commit to CDB CVS, dependency computation, Pan compilation, CDB LLD as XML) → UDP notification → Quattor-managed client, which fetches its .xml profile]
– Atomic transaction
– Protection against concurrent modifications (CVS)
– Queuing (but no order)
18
CDB Activity
[Chart annotations: Christmas holidays; cron jobs (sw upgrades)]
Note: user jobs queue one after the other, so the apparent “commit” time may be long; it also includes resources used for unsuccessful compilations
Time for a full recompilation: ~10 minutes if no load on the server
19
Numbers
>11000 “objects” are today in CDB (object templates => .xml profiles)
– 75% are quattor-managed
– 25% are not, but created for inventory purpose (see more later)
Users: ~230 registered CDB users
SLS: http://sls.cern.ch/sls/service.php?id=CDB
20
SWrep
[Diagram labels, with steps numbered 1 to 3]
– Swrep (lxservb01)
– lxadm: swrep-soap-client put, swrep-soap-client list
– LinuxSoft AFS repository and EPEL repo (PES/PS); Linux and EPEL YUM repo
– CDB server: weekly cron job: sw upgrades (OSDATE)
– Lxservb05: cron job: repository CDB tpl update; cron job: LS repository mirroring onto SWrep
See also new system: https://twiki.cern.ch/twiki/bin/view/ELFms/OsUpdates
21
SPMA (Software Package Management Agent)
Runs on the quattor-managed clients
– Service manager notifies or runs “spma_wrapper.sh”
Gets, from the .xml profile, the list of packages to be managed (/var/lib/spma-target.cf)
– "/software/packages" = pkg_add("myrpm","version","arch");
– "/software/packages" = pkg_repl("myrpm","version","arch");
– "/software/packages" = pkg_del("myrpm","version","arch");
Adds, removes, upgrades packages accordingly
Can be configured to not touch packages installed “by-hand” (ncm-spma, /etc/spma.conf)
– "/software/components/spma/userpkgs" = "yes";
– "/software/components/spma/userprio" = "yes";
Gets packages from Swrep via http
– Access restricted for RHES packages
Monitored (“spma_error” exceptions)
– Log: /var/log/spma.log
Not recommended !
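To make the declarative idea concrete, here is a minimal Python sketch of a target package list being built with add/replace/delete operations and then compared against what is installed. The Python functions only borrow the names pkg_add, pkg_repl and pkg_del from the slide; their exact semantics here are an assumption, the package names and versions are invented, and the real SPMA additionally handles dependencies, priorities and user-installed packages.

```python
def pkg_add(target, name, version, arch):
    """Assumed semantics: add one more wanted version of a package."""
    target.setdefault(name, set()).add((version, arch))
    return target


def pkg_repl(target, name, version, arch):
    """Assumed semantics: replace all wanted versions of a package by this one."""
    target[name] = {(version, arch)}
    return target


def pkg_del(target, name, version, arch):
    """Assumed semantics: drop one wanted version; forget the package if none remain."""
    versions = target.get(name, set())
    versions.discard((version, arch))
    if not versions:
        target.pop(name, None)
    return target


def plan(installed, target):
    """Compare installed packages with the target list and return (install, remove)."""
    to_install = sorted((name, v, a) for name, versions in target.items()
                        for (v, a) in versions
                        if (v, a) not in installed.get(name, set()))
    to_remove = sorted((name, v, a) for name, versions in installed.items()
                       for (v, a) in versions
                       if (v, a) not in target.get(name, set()))
    return to_install, to_remove


# Hypothetical data: "myrpm" is upgraded, "oldrpm" is no longer wanted.
target = {}
pkg_add(target, "myrpm", "1.0-1", "x86_64")
pkg_repl(target, "myrpm", "1.1-1", "x86_64")
installed = {"myrpm": {("1.0-1", "x86_64")}, "oldrpm": {("2.0-1", "noarch")}}

to_install, to_remove = plan(installed, target)
print("install:", to_install)
print("remove:", to_remove)
```

In this sketch anything installed but absent from the target list is scheduled for removal, which is the behaviour the userpkgs setting on the slide is meant to relax.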
22
Quattor: Basics
On a Quattor-managed node (ex: lxplus310):
1. Get latest .xml profile from CDB (“ccm-fetch”)
2. Read .xml profile and get software packages from Swrep (“SPMA”)
3. Read .xml profile and configure services (“NCM”)

$ ccm-fetch
$ spma_wrapper.sh
$ ncm_wrapper.sh
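The three steps are order-dependent: the profile must be fetched before packages are installed, and packages before services are configured. A small Python sketch of a wrapper that enforces this order is shown below; the function name run_update and the dry-run default are assumptions made for illustration, and only the three command names come from the slide.

```python
import subprocess

# Commands from the slide, in the order they have to run.
STEPS = [
    ["ccm-fetch"],        # 1. get the latest .xml profile from CDB
    ["spma_wrapper.sh"],  # 2. install/remove packages from Swrep
    ["ncm_wrapper.sh"],   # 3. configure services
]


def run_update(dry_run=True, runner=subprocess.run):
    """Run the three steps in order; with dry_run=True only report them."""
    executed = []
    for command in STEPS:
        if not dry_run:
            # check=True raises on a non-zero exit code, so later steps
            # never run against a profile that failed to download.
            runner(command, check=True)
        executed.append(" ".join(command))
    return executed


planned = run_update(dry_run=True)
print(planned)
```

Running with dry_run=False only makes sense on a Quattor-managed node where the three commands are installed.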
23
Notifications: 2 (3) ways
1. CDB server (“cdbserv”) notifies the Quattor-managed client: triggers “ccm-fetch” (we also have a cron job)
2. nc-client on lxadm (reduced access), via the notification server (“notdserver”): triggers what is configured in /etc/not.d
3. Run ncm component by hand on the client
[Also shown on the diagram: cdispd]
24
“cdb2sql”
[Diagram: CDB server (“cdbserv”) → lxservb06 → CDBSQL (cdb@itcore)]
– After each commit: UDP packet from CDB to lxservb06 triggers “cdb2sql”
– .xml profile timestamp comparison, data comparison, DB update
– Software packages and monitoring configuration skipped
– Data update, and views update via triggers
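The comparison logic on this slide can be sketched in a few lines of Python: check the timestamp first, then diff the flattened profile while ignoring the skipped subtrees. This is an illustration only, not the cdb2sql implementation; the skipped path prefixes, the function names and the sample values are assumptions.

```python
# Assumed locations of the subtrees that are not copied to the database.
SKIPPED_PREFIXES = ("/software/packages", "/system/monitoring")


def needs_update(db_timestamp, profile_timestamp):
    """Only profiles newer than the copy in the database are examined."""
    return profile_timestamp > db_timestamp


def relevant(flat_profile):
    """Drop the paths that are deliberately skipped."""
    return {path: value for path, value in flat_profile.items()
            if not path.startswith(SKIPPED_PREFIXES)}


def changes(old_profile, new_profile):
    """Return {path: (old, new)} for every relevant path whose value differs."""
    old, new = relevant(old_profile), relevant(new_profile)
    return {path: (old.get(path), new.get(path))
            for path in sorted(set(old) | set(new))
            if old.get(path) != new.get(path)}


# Hypothetical flattened profiles before and after a commit.
old_profile = {
    "/hardware/rack/name": "rk15",
    "/system/cluster/name": "lxplus",
    "/software/packages/myrpm": "1.0-1",
}
new_profile = {
    "/hardware/rack/name": "rk17",
    "/system/cluster/name": "lxplus",
    "/software/packages/myrpm": "1.1-1",
}

if needs_update(db_timestamp=100, profile_timestamp=200):
    diff = changes(old_profile, new_profile)
    print(diff)
```

The package change is ignored and only the rack move would be written to the database.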
25
CDBSQL use
[Diagram labels around CDB (CDB HLD, XML) and CDBSQL (cdb@itcore): SINDES, SMS, ITCM (Remedy), LEMON, LAS, console config, other service config (ex. LSF), CLUMAN, PrepareInstall, LEAF, Wassh, NOTD, SDB, HMS (Remedy), AIMS, CDBDump, CDBHosts]
26
CDBSQL
An SQL interface provides simple query access
We can ask about properties spanning across machines
We can run SQL queries (SELECT) based on predefined views:
– “give me all machines with more than 2GB of memory”
– “give me all machines that belong to lxplus”
– “give me all machines that have this head node”
Examples:
– CDBHosts -cl lxplus
– CDBHosts -cl all -q "hostname='lxb0501'"
– CDBHosts -cl all -d "clustername" -view vwcluster
This document explains all the existing views
A web-based frontend exists as well for the most common queries: https://cdbweb.cern.ch/
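To show what "queries based on predefined views" looks like in practice, the Python sketch below builds a tiny in-memory SQLite database and runs two of the questions from the slide. The real CDBSQL schema is not shown in this deck: the hosts table, its columns, the column layout of vwcluster and the sample rows are all assumptions, and only the view name vwcluster is taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base table; the real schema is not described here.
    CREATE TABLE hosts (
        hostname    TEXT PRIMARY KEY,
        clustername TEXT NOT NULL,
        ram_mb      INTEGER NOT NULL
    );
    INSERT INTO hosts VALUES ('lxplus310', 'lxplus', 16384);
    INSERT INTO hosts VALUES ('lxplus311', 'lxplus', 8192);
    INSERT INTO hosts VALUES ('lxb0501', 'lxbatch', 2048);

    -- A predefined view hides the base tables from the user.
    CREATE VIEW vwcluster AS SELECT hostname, clustername FROM hosts;
""")

# "give me all machines that belong to lxplus"
lxplus_hosts = [row[0] for row in conn.execute(
    "SELECT hostname FROM vwcluster WHERE clustername = ? ORDER BY hostname",
    ("lxplus",))]

# "give me all machines with more than 2GB of memory"
big_memory = [row[0] for row in conn.execute(
    "SELECT hostname FROM hosts WHERE ram_mb > 2048 ORDER BY hostname")]

print(lxplus_hosts)
print(big_memory)
```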
27
CDB as Inventory DB
CDB also contains profiles for inventory purpose only:
– Non-quattor-managed objects, but info needed for: power consumption, location, vendor calls, ping monitoring (for windows machines)
– Examples: air cooled rack, water cooled rack, switch, router, rps, power distribution, kvm, disk array, enclosure, server, rms, pdu, ups, temperature sensor, ventilation unit, brush, filler panel, kvm switch, robot, fibre channel switch
Work in progress
28
HMS
[Diagram: HMS, LEAF, CDB]
– Install
– Move
– Repair
– Retire
Used by Sysadmins only
29
LEAF
– LEAFAddHost, LEAFGenerateNetinfo
– LEAFMoveHost
– LEAFRenameHost
– LEAFRetireHost
Create/update CDB templates via the CDB SOAP interface
Used by HMS (Remedy)
– Plus: LANDBAddHost,...
Available on lxadm
Use “--help” for details about usage
30
SMS (“State Management System”)

STATE         EFFECT
Maintenance   Not alarmed on LAS; no login (according to configuration); not selected for DNS load-balanced alias
Standby       Alarmed; no login (according to configuration); not selected for DNS load-balanced alias
Production    Alarmed; login; DNS load-balanced alias candidate
Unmanaged     Not managed with SMS

Stack of states on top of a “default state” defined in CDB:
– sms set
– sms clear
– sms showstack
– sms history
– sms get
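The "stack of states on top of a default state" can be pictured with a short Python sketch. It is a toy model, not the sms tool: the class name, the method signatures and in particular the rule that clear removes the most recent matching entry are assumptions, and the reason "draining" is invented.

```python
class StateStack:
    """Operator-set states stacked on top of a default state (toy model)."""

    STATES = ("production", "standby", "maintenance")

    def __init__(self, default_state):
        self.default_state = default_state  # in reality defined in CDB
        self._stack = []                    # (state, reason), oldest first

    def set(self, state, reason):
        if state not in self.STATES:
            raise ValueError("unknown state: " + state)
        self._stack.append((state, reason))

    def clear(self, state, reason):
        # Assumed semantics: remove the most recent entry matching state and reason.
        for index in range(len(self._stack) - 1, -1, -1):
            if self._stack[index] == (state, reason):
                del self._stack[index]
                return True
        return False

    def get(self):
        # The effective state is the top of the stack, else the default.
        return self._stack[-1][0] if self._stack else self.default_state

    def showstack(self):
        return list(self._stack)


node = StateStack(default_state="production")
node.set("maintenance", "kernel upgrade")
node.set("standby", "draining")
state_with_two_entries = node.get()

node.clear("standby", "draining")
state_after_first_clear = node.get()

node.clear("maintenance", "kernel upgrade")
state_after_second_clear = node.get()

print(state_with_two_entries, state_after_first_clear, state_after_second_clear)
```

The point of the stack is that clearing one intervention reveals the previous one instead of jumping straight back to production.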
31
SMS (“State Management System”)
[Diagram components: lxadm, CDBSQL (cdb@itcore), CDB server (“cdbserv”), notification server (“notdserver”), Quattor-managed client]
1. sms set maintenance “kernel upgrade”
2. SMS state check & update
3. Client notification
4. Sms-set-state SetToDesiredState
32
Useful tools
Available on
– lxadm
– lxvoadm (VOCs), lxtnadm (technical network)

Command                                    Purpose
wassh                                      Run command on nodes or cluster
cdbop                                      CDB command line
CDBDump, CDBHosts                          Get information about node and/or cluster
PrepareInstall                             Reinstall node(s)
LEAFRenameHost, LEAFUpdateNetInfo, etc.    Create or modify info in CDB
sms                                        Set state (production, maintenance, standby)
nc-client                                  Notify nodes for SPMA or NCM
hwvpd2cdb                                  Update VPD info in CDB
swrep-soap-client                          Upload RPMs into SWREP
33
Useful tools
On a given node

Command      Purpose
ccm-fetch    Fetch node configuration from CDB
spma         Install RPMs as defined in CDB
ncm-ncd      Configure components
ncm-query    Check NCM configuration
34
Need help?
What you can do
– Search on Twiki
– Ask your colleagues
– Subscribe to the Forum: project-elfms@cern.ch
– Announcements for Service Managers: cern-quattor-announce@cern.ch
Ask ELFms support
– Open a ticket using the Service Portal
– For access requests, etc.
– For general questions