Grid Management Challenge - M. Jouvin

Grid Site Setup
Michel Jouvin
LAL, Orsay
jouvin@lal.in2p3.fr
http://grif.fr
Grid Administration Training
LAL, Orsay, September 15-19, 2008
Grid Management Challenge - M. Jouvin 06/04/2019

Agenda
  General site parameters
  Site BDII
  CE
  WNs
  SE

General Site Parameters
  Main documentation for configuring site parameters is available at
    https://trac.lal.in2p3.fr/LCGQWG/wiki/Doc/gLite/TemplateCustomization
  Edit template defining site network parameters
    cfg/sites/tutorial/site/pro_site_global_variables.tpl
  Edit template defining machine IP addresses and HW
    cfg/sites/tutorial/site/pro_site_databases.tpl
  Edit template defining per-machine OS version
    cfg/sites/tutorial/site/pro_os_version.tpl
  Edit gLite site parameters (review defaults)
    cfg/sites/tutorial/site/glite/config.tpl
  Create cluster glite-3.1 from examples, review cluster-specific templates
    cfg/clusters/glite-3.1/site/pro_site_cluster_info.tpl : cluster defaults

Site BDII Configuration
  Create a HW template for BDII box
    cfg/sites/tutorial/hardware/machine/…
  Create a profile in cluster glite-3.1
    cfg/clusters/glite-3.1/profiles/grid281.lal.in2p3.fr.tpl
    Start with a similar profile from example cluster
  Compile and deploy
    Don’t forget svn commit
  Configure initial installation of the machine
    aii-shellfe --configure grid281.lal.in2p3.fr
    aii-shellfe --install grid281.lal.in2p3.fr
  Start grid281…
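The two AII steps recur for every machine type in this training (BDII, WN, SE), so they can be wrapped in a small helper. The following is only a sketch, not part of the original slides: the function name aii_prepare and the RUN variable are hypothetical, and by default the helper only prints the commands so that they can be reviewed before being run on the Quattor server.

```shell
#!/bin/sh
# Sketch: print (or run) the two AII steps for one host, in slide order.
# aii_prepare and RUN are illustrative names, not part of Quattor.
# Default is a dry run; set RUN=1 to actually call aii-shellfe.
aii_prepare() {
    host="$1"
    for action in --configure --install; do
        if [ "${RUN:-0}" = "1" ]; then
            # Stop at the first failing step
            aii-shellfe "$action" "$host" || return 1
        else
            echo "aii-shellfe $action $host"
        fi
    done
}

# Dry run for the site BDII used in this training
aii_prepare grid281.lal.in2p3.fr
```

The dry run prints the same two commands as listed on the slide; the machine still has to be started afterwards for the installation to begin.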

BDII Management
  Logs
    /opt/bdii/var/bdii.log
    /opt/bdii/var/tmp/*
  Restart : service bdii restart

CE Configuration
  Create a HW template for CE box
    cfg/sites/tutorial/hardware/machine/…
  Create a profile in cluster glite-3.1
    cfg/clusters/glite-3.1/profiles/grid282.lal.in2p3.fr.tpl
    Start with a similar profile from example cluster
  Define MAUI configuration (if using it) and check gLite site parameters
    cfg/sites/tutorial/site/glite/maui.tpl
  Choose between shared and non-shared home directories
    Default is shared home directories
  No user accounts other than those related to grid
    By default accounts are locked (no interactive access)
  Compile and deploy
  Configure initial installation of the machine and start grid282…

CE Management
  Main Torque/MAUI commands:
    List of jobs : showq (MAUI)
    List of WNs (-n) and queues (-c)… : diagnose (MAUI)
    Detailed information about a job (MAUI) : checkjob jobid
    Detailed information about a job (Torque) : qstat -f jobid
  Main log files
    /var/log/globus-gatekeeper.log
    /var/log/globus*marshall.log
  Torque/MAUI configuration and logs
    Torque : /var/spool/pbs
    MAUI : /var/spool/maui + /var/log/maui.log
  MAUI default configuration :
    Fairshare : takes into account VO usage in the previous days
    2 job slots per CPU : 1 dedicated to dteam/ops (tests) and short deadline jobs
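When investigating a job on the CE it is often convenient to see the MAUI and Torque views side by side. The helper below is a minimal sketch under that assumption: job_info is a hypothetical name, and it only calls checkjob and qstat -f, the two commands named on the slide, after checking that each one is available on the host.

```shell
#!/bin/sh
# Sketch: show the scheduler (MAUI) and batch system (Torque) views of one job.
# job_info is an illustrative helper, not a Torque or MAUI command.
job_info() {
    jobid="$1"
    if command -v checkjob >/dev/null 2>&1; then
        echo "== MAUI view of job $jobid =="
        checkjob "$jobid"
    else
        echo "checkjob not found (is MAUI installed on this host?)" >&2
    fi
    if command -v qstat >/dev/null 2>&1; then
        echo "== Torque view of job $jobid =="
        qstat -f "$jobid"
    else
        echo "qstat not found (is Torque installed on this host?)" >&2
    fi
}

# Usage, on the CE, with a job id taken from the showq output:
#   job_info jobid
```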

WN Configuration
  Create a HW template for WN box
    cfg/sites/tutorial/hardware/machine/…
  Create a profile in cluster glite-3.1
    cfg/clusters/glite-3.1/profiles/grid283.lal.in2p3.fr.tpl
    Start with a similar profile from example cluster
  Update WN list in gLite site parameters
    WORKER_NODES and WN_CPUS in cfg/sites/tutorial/site/glite/config.tpl
  Compile and deploy
    Don’t forget svn commit
  Configure initial installation of the machine
    aii-shellfe --configure grid283.lal.in2p3.fr
    aii-shellfe --install grid283.lal.in2p3.fr
  Start grid283…

WN Management
  No daemon, no logs
    Except Torque client (pbs_mom) but never a problem…
  Source of information is job stdout/stderr

User Interface (UI)
  Same configuration procedure…
  Use 1 account per user
    It is not possible to share .globus
    Never share a certificate
  Unlock user accounts created by Quattor if you want to be able to log in
  No daemon, no logs
  With 64-bit OS, edit 4 scripts replacing ‘python2’ by ‘python’
    /opt/glite/bin/glite-wms-jobs-xxx

SE Configuration
  Create a HW template for SE DPM box
    cfg/sites/tutorial/hardware/machine/…
  Create a profile in cluster glite-3.1
    cfg/clusters/glite-3.1/profiles/grid284.lal.in2p3.fr.tpl
    Start with a similar profile from example cluster
  Configuration : cfg/sites/tutorial/site/dpm/config.tpl
    May define a specific VO list (different from CE) in a template referred to by the NODE_VO_CONFIG variable
  Compile and deploy
    Don’t forget svn commit
  Configure initial installation of the machine
    aii-shellfe --configure grid284.lal.in2p3.fr
    aii-shellfe --install grid284.lal.in2p3.fr
  Start grid284…

SE Management
  Commands to display configuration
    dpm-qryconf : legacy command, doesn’t display everything
    dpm-listspaces : new, friendlier command
  Configuration commands :
    dpm-xxx : modify pools and file systems configuration
      Requires environment variable DPM_HOST=dpm_host_name
    dpns-xxx : management of DPM "namespace"
      DPM ls, rm…
      Very few reasons to use these commands
      Requires environment variable DPNS_HOST=dpm_host_name
  Logs : 1 file per daemon (6)
    /var/log/dpm/log : physical operations (main log file)
    /var/log/srmv1|v2.2/log : SE access (through SRM)
    /var/log/dpns/log : namespace operations
    /var/log/rfio/log and /var/log/messages : RFIO + gridftp (transfers)
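Both families of commands fail in confusing ways when their environment variable is missing, so it can help to set the two together before an administration session. This is a sketch only: dpm_env is a hypothetical helper, and it assumes, as the slide suggests by using dpm_host_name for both, that the DPM and DPNS daemons run on the same host.

```shell
#!/bin/sh
# Sketch: export the variables required by the dpm-xxx and dpns-xxx commands.
# dpm_env is an illustrative name; it assumes DPM and DPNS share one host.
dpm_env() {
    if [ -z "$1" ]; then
        echo "usage: dpm_env dpm_host_name" >&2
        return 1
    fi
    DPM_HOST="$1"
    DPNS_HOST="$1"
    export DPM_HOST DPNS_HOST
}

# Example with the SE installed in this training
dpm_env grid284.lal.in2p3.fr
echo "DPM_HOST=$DPM_HOST DPNS_HOST=$DPNS_HOST"
# dpm-qryconf and dpm-listspaces can then be run from the same shell.
```

Because the variables are exported from a function, the script has to be sourced (or the function defined in the interactive shell) for the settings to persist.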

Checking Quattor Changes
  Before deployment
    cp -r build build.saved in SCDB working copy
    Compile changes
    src/utils/profiles/compare_xml [-v]
  After deployment: Quattor client logs
    /var/log/ncm-cdispd.log
    In case of SPMA errors : /var/log/spma.log
    If nothing happened, troubleshoot SCDB hook script :
      https://trac.lal.in2p3.fr/LCGQWG/wiki/Doc/SCDB/Server
  Running a component manually:
    ncm-ncd --configure [component…|--all]
  Checking client configuration
    ncm-query --component component
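After a deployment, the first check is usually a look at the end of the client logs listed above. The following sketch shows one way to do that; show_logs is a hypothetical helper that takes a line count and a list of log files, and it reports files that are missing instead of failing, since /var/log/spma.log may not exist on every node.

```shell
#!/bin/sh
# Sketch: print the last lines of each log file, flagging missing ones.
# show_logs is an illustrative helper, not a Quattor tool.
show_logs() {
    lines="$1"
    shift
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log (last $lines lines) =="
            tail -n "$lines" "$log"
        else
            echo "== $log: missing or not readable =="
        fi
    done
}

# Usage on a Quattor client, with the log files named on the slide:
#   show_logs 20 /var/log/ncm-cdispd.log /var/log/spma.log
```

The same helper applies to the BDII, CE and DPM log files listed on the earlier slides.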