Grid Management Challenge - M. Jouvin
Grid Site Setup
Michel Jouvin
LAL, Orsay
jouvin@lal.in2p3.fr
http://grif.fr
Grid Administration Training
LAL, Orsay, September 15-19, 2008
Grid Management Challenge - M. Jouvin 06/04/2019
Agenda
- General site parameters
- Site BDII
- CE
- WNs
- SE
General Site Parameters
- Main documentation for configuring site parameters: https://trac.lal.in2p3.fr/LCGQWG/wiki/Doc/gLite/TemplateCustomization
- Edit the template defining site network parameters:
  cfg/sites/tutorial/site/pro_site_global_variables.tpl
- Edit the template defining machine IP addresses and hardware:
  cfg/sites/tutorial/site/pro_site_databases.tpl
- Edit the template defining the OS version of each machine:
  cfg/sites/tutorial/site/pro_os_version.tpl
- Edit the gLite site parameters (review the defaults):
  cfg/sites/tutorial/site/glite/config.tpl
- Create cluster glite-3.1 from the examples and review the cluster-specific templates:
  cfg/clusters/glite-3.1/site/pro_site_cluster_info.tpl: cluster defaults
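A small helper can confirm that every template listed above is present before you start editing. This is a minimal sketch, not part of the QWG tools: the function names are hypothetical, and it assumes the root of your SCDB working copy is passed as the first argument (default: the current directory).

```shell
#!/bin/sh
# Hypothetical helper: check that the site templates named on the slide
# exist in an SCDB working copy (root given as $1, default ".").

# List the templates to review, one path per line (paths from the slide).
site_templates() {
    cat <<'EOF'
cfg/sites/tutorial/site/pro_site_global_variables.tpl
cfg/sites/tutorial/site/pro_site_databases.tpl
cfg/sites/tutorial/site/pro_os_version.tpl
cfg/sites/tutorial/site/glite/config.tpl
cfg/clusters/glite-3.1/site/pro_site_cluster_info.tpl
EOF
}

# Print every missing template; return 1 if at least one is missing.
check_site_templates() {
    root=${1:-.}
    rc=0
    for tpl in $(site_templates); do
        if [ ! -f "$root/$tpl" ]; then
            echo "missing: $root/$tpl"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_site_templates .` from the top of the working copy prints nothing when all five templates are there.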
Site BDII Configuration
- Create a hardware template for the BDII box: cfg/sites/tutorial/hardware/machine/…
- Create a profile in cluster glite-3.1: cfg/clusters/glite-3.1/profiles/grid281.lal.in2p3.fr.tpl
  - Start from a similar profile in the example cluster
- Compile and deploy
  - Don't forget svn commit
- Configure the initial installation of the machine:
  aii-shellfe --configure grid281.lal.in2p3.fr
  aii-shellfe --install grid281.lal.in2p3.fr
- Start grid281…
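The two aii-shellfe steps can be wrapped so that several nodes are prepared the same way. A minimal sketch: the function names and the DRY_RUN switch are hypothetical, and it assumes the profile has already been compiled, deployed and committed with svn, and that it runs on the installation server where aii-shellfe is available.

```shell
#!/bin/sh
# Hypothetical wrapper around the two aii-shellfe steps on the slide.
# DRY_RUN=1 is an illustrative convention: print commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: aii_prepare_install host [host...]
aii_prepare_install() {
    if [ "$#" -eq 0 ]; then
        echo "usage: aii_prepare_install host [host...]" >&2
        return 2
    fi
    for host in "$@"; do
        # Generate the node's installation configuration from its profile.
        run aii-shellfe --configure "$host" || return 1
        # Select installation as the node's next network (PXE) boot.
        run aii-shellfe --install "$host" || return 1
    done
}
```

With `DRY_RUN=1`, `aii_prepare_install grid281.lal.in2p3.fr` only prints the two aii-shellfe commands, which is a cheap way to check the host names before touching the installation server.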
BDII Management
- Logs:
  /opt/bdii/var/bdii.log
  /opt/bdii/var/tmp/*
- Restart: service bdii restart
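The BDII checks can be grouped in two small functions. A sketch under assumptions: the function names and the BDII_VAR override are hypothetical; the paths and the restart command are the ones given on the slide.

```shell
#!/bin/sh
# Hypothetical BDII helpers. BDII_VAR is an assumed override of the BDII
# state directory (default /opt/bdii/var, as on the slide).

# Show the end of bdii.log, then list the files under tmp/.
bdii_show_logs() {
    var=${BDII_VAR:-/opt/bdii/var}
    lines=${1:-20}
    echo "== $var/bdii.log (last $lines lines)"
    tail -n "$lines" "$var/bdii.log"
    echo "== $var/tmp"
    ls -l "$var/tmp"
}

# Restart the BDII service (as root).
bdii_restart() {
    service bdii restart
}
```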
CE Configuration
- Create a hardware template for the CE box: cfg/sites/tutorial/hardware/machine/…
- Create a profile in cluster glite-3.1: cfg/clusters/glite-3.1/profiles/grid282.lal.in2p3.fr.tpl
  - Start from a similar profile in the example cluster
- Define the MAUI configuration (if using MAUI) and check the gLite site parameters: cfg/sites/tutorial/site/glite/maui.tpl
- Choose between shared and non-shared home directories
  - Default is shared home directories
- No user accounts other than grid-related ones
  - By default, accounts are locked (no interactive access)
- Compile and deploy
- Configure the initial installation of the machine and start grid282…
CE Management
- Main Torque/MAUI commands:
  - List of jobs: showq (MAUI)
  - List of WNs (-n), queues (-c)…: diagnose (MAUI)
  - Detailed information about a job (MAUI): checkjob jobid
  - Detailed information about a job (Torque): qstat -f jobid
- Main log files:
  /var/log/globus-gatekeeper.log
  /var/log/globus*marshall.log
- Torque/MAUI configuration and logs:
  - Torque: /var/spool/pbs
  - MAUI: /var/spool/maui + /var/log/maui.log
- MAUI default configuration:
  - Fairshare: takes into account VO usage over the previous days
  - 2 job slots per CPU: 1 dedicated to dteam/ops (tests) and short-deadline jobs
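The Torque/MAUI commands above are mostly used in two situations: a general look at the batch system, and investigating one job. A minimal sketch with hypothetical function names and a hypothetical DRY_RUN switch; it assumes it runs on the CE, where the slide locates the Torque and MAUI configuration.

```shell
#!/bin/sh
# Hypothetical wrappers around the Torque/MAUI commands on the slide.
# DRY_RUN=1 is an illustrative convention: print commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Batch system overview: jobs, then WN and queue diagnostics (MAUI).
ce_overview() {
    run showq
    run diagnose -n
    run diagnose -c
}

# Both views of a single job: MAUI (scheduler) then Torque (resource manager).
ce_job_report() {
    if [ "$#" -ne 1 ]; then
        echo "usage: ce_job_report jobid" >&2
        return 2
    fi
    run checkjob "$1"
    run qstat -f "$1"
}
```

For example, `ce_job_report 1234` (a made-up job id) prints the MAUI view of the job followed by its full Torque record.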
WN Configuration
- Create a hardware template for the WN box: cfg/sites/tutorial/hardware/machine/…
- Create a profile in cluster glite-3.1: cfg/clusters/glite-3.1/profiles/grid283.lal.in2p3.fr.tpl
  - Start from a similar profile in the example cluster
- Update the WN list in the gLite site parameters: WORKER_NODES and WN_CPUS in cfg/sites/tutorial/site/glite/config.tpl
- Compile and deploy
  - Don't forget svn commit
- Configure the initial installation of the machine:
  aii-shellfe --configure grid283.lal.in2p3.fr
  aii-shellfe --install grid283.lal.in2p3.fr
- Start grid283…
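Forgetting to add a new WN to WORKER_NODES or WN_CPUS is an easy mistake. A rough pre-compile check, as a sketch: the function name is hypothetical, it must be run from the top of the SCDB working copy, and it only searches for the host name as text rather than parsing the Pan templates.

```shell
#!/bin/sh
# Hypothetical pre-compile check for a new WN: its profile must exist in
# cluster glite-3.1 and its name must appear in config.tpl, where
# WORKER_NODES and WN_CPUS are defined. Text search only, no Pan validation.

check_new_wn() {
    if [ "$#" -ne 1 ]; then
        echo "usage: check_new_wn host" >&2
        return 2
    fi
    profile="cfg/clusters/glite-3.1/profiles/$1.tpl"
    config="cfg/sites/tutorial/site/glite/config.tpl"
    rc=0
    if [ ! -f "$profile" ]; then
        echo "no profile: $profile"
        rc=1
    fi
    # -F: match the host name literally (dots are not regex wildcards).
    if ! grep -qF "$1" "$config" 2>/dev/null; then
        echo "$1 not found in $config (WORKER_NODES / WN_CPUS)"
        rc=1
    fi
    return "$rc"
}
```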
WN Management
- No daemons, no logs
  - Except the Torque client (pbs_mom), but in practice it is never a problem…
- The source of information is the job stdout/stderr
User Interface (UI)
- Same configuration procedure…
- Use one account per user
  - .globus cannot be shared
  - Never share a certificate
- Unlock the user accounts created by Quattor if you want to be able to log in
- No daemons, no logs
- With a 64-bit OS, edit 4 scripts, replacing 'python2' with 'python': /opt/glite/bin/glite-wms-jobs-xxx
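On a 64-bit UI, the 'python2' to 'python' replacement can be scripted while keeping the original scripts. A minimal sketch with a hypothetical function name: since the exact script names are elided on the slide, pass the 4 scripts explicitly. It replaces every occurrence of 'python2', as the slide says, so check the result: a string such as 'python2.4' would also be changed.

```shell
#!/bin/sh
# Hypothetical helper: replace 'python2' by 'python' in the given scripts,
# keeping a copy of each original as <script>.orig.

patch_python2() {
    if [ "$#" -eq 0 ]; then
        echo "usage: patch_python2 script..." >&2
        return 2
    fi
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "no such file: $f" >&2
            return 1
        fi
        cp -p "$f" "$f.orig" || return 1
        # Writing into the existing file keeps its owner and permissions.
        sed 's/python2/python/g' "$f.orig" > "$f" || return 1
    done
}
```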
SE Configuration
- Create a hardware template for the SE (DPM) box: cfg/sites/tutorial/hardware/machine/…
- Create a profile in cluster glite-3.1: cfg/clusters/glite-3.1/profiles/grid284.lal.in2p3.fr.tpl
  - Start from a similar profile in the example cluster
- Configuration: cfg/sites/tutorial/site/dpm/config.tpl
  - You may define a specific VO list (different from the CE's) in a template referenced by the NODE_VO_CONFIG variable
- Compile and deploy
  - Don't forget svn commit
- Configure the initial installation of the machine:
  aii-shellfe --configure grid284.lal.in2p3.fr
  aii-shellfe --install grid284.lal.in2p3.fr
- Start grid284…
SE Management
- Commands to display the configuration:
  - dpm-qryconf: legacy command, doesn't display everything
  - dpm-listspaces: newer, more user-friendly command
- Configuration commands:
  - dpm-xxx: modify pool and file system configuration
    - Requires the environment variable DPM_HOST=dpm_host_name
  - dpns-xxx: management of the DPM "namespace" (DPM ls, rm…)
    - Very few reasons to use these commands
    - Requires the environment variable DPNS_HOST=dpm_host_name
- Logs: 1 file per daemon (6)
  - /var/log/dpm/log: physical operations (main log file)
  - /var/log/srmv1|v2.2/log: SE access (through SRM)
  - /var/log/dpns/log: namespace operations
  - /var/log/rfio/log and /var/log/messages: RFIO + gridftp (transfers)
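The environment variables and log files above can be wrapped in a few helpers. A sketch under assumptions: the function names are hypothetical, the DPM host name is passed as an argument (grid284.lal.in2p3.fr in this tutorial), and /var/log/srmv1|v2.2/log is read as the two files /var/log/srmv1/log and /var/log/srmv2.2/log.

```shell
#!/bin/sh
# Hypothetical DPM helpers; source this file so dpm_env affects your shell.

# dpm-xxx commands need DPM_HOST, dpns-xxx commands need DPNS_HOST.
dpm_env() {
    if [ "$#" -ne 1 ]; then
        echo "usage: dpm_env dpm_host" >&2
        return 2
    fi
    DPM_HOST=$1
    DPNS_HOST=$1
    export DPM_HOST DPNS_HOST
}

# Display the DPM configuration with the newer command from the slide.
dpm_show_config() {
    dpm_env "$@" || return
    dpm-listspaces
}

# The six per-daemon log files listed on the slide.
dpm_log_files() {
    printf '%s\n' /var/log/dpm/log /var/log/srmv1/log /var/log/srmv2.2/log \
        /var/log/dpns/log /var/log/rfio/log /var/log/messages
}

# Show the end of each readable log file (default: last 10 lines).
dpm_tail_logs() {
    for log in $(dpm_log_files); do
        [ -r "$log" ] || continue
        echo "== $log"
        tail -n "${1:-10}" "$log"
    done
}
```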
Checking Quattor Changes
- Before deployment:
  - cp -r build build.saved in the SCDB working copy
  - Compile the changes
  - src/utils/profiles/compare_xml [-v]
- After deployment: Quattor client logs
  - /var/log/ncm-cdispd.log
  - In case of SPMA errors: /var/log/spma.log
  - If nothing happened, troubleshoot the SCDB hook script: https://trac.lal.in2p3.fr/LCGQWG/wiki/Doc/SCDB/Server
- Running a component manually: ncm-ncd --configure [component…|--all]
- Checking the client configuration: ncm-query --component component
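The before/after checks can be collected into helper functions. A minimal sketch: the function names are hypothetical, the server-side ones must be run from the top of the SCDB working copy, and the compile step is left out since its command is not given on the slide.

```shell
#!/bin/sh
# Hypothetical helpers for checking Quattor changes.

# Server, before compiling: keep a copy of the current build output.
# Removing the old copy first avoids creating build.saved/build.
save_build() {
    [ -e build ] || { echo "no build output here" >&2; return 1; }
    rm -rf build.saved
    cp -r build build.saved
}

# Server, after compiling: compare new profiles with the saved ones.
compare_build() {
    src/utils/profiles/compare_xml "$@"
}

# Client, after deployment: recent ncm-cdispd activity.
client_recent_log() {
    tail -n "${1:-50}" /var/log/ncm-cdispd.log
}

# Client: rerun one component by hand, then show its part of the profile.
client_rerun_component() {
    ncm-ncd --configure "$1" && ncm-query --component "$1"
}
```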