South African Grid Training WORKER NODE Albert van Eck UFS - ICTS 17 November 2009 Slides by GIUSEPPE PLATANIA
18 Nov – Cape Town South African Grid Training
OUTLINE
– OVERVIEW
– INSTALLATION & CONFIGURATION
– TESTING
– FIREWALL SETUP
– TROUBLESHOOTING
OVERVIEW
The Worker Node (WN) is the service where jobs run. Its main functions are to:
– execute the jobs
– report the status of the jobs to the Computing Element (CE)
It can run as a client of several batch systems:
– Torque
– LSF
– SGE
– Condor
TORQUE client
The Torque client consists of:
– pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.
Worker Node installation & configuration using YAIM
WHAT KIND OF WN?
There are several metapackages to choose from:
– ig_WN: "generic" Worker Node.
– ig_WN_noafs: like ig_WN but without AFS.
– ig_WN_LSF: LSF Worker Node. IMPORTANT: provided for consistency; it does not install the LSF software, but it applies some fixes via ig_configure_node.
– ig_WN_LSF_noafs: like ig_WN_LSF but without AFS.
– ig_WN_torque: Torque Worker Node.
– ig_WN_torque_noafs: like ig_WN_torque but without AFS.
Repository settings
    REPOS="ca dag glite-wn ig jpackage glite-wn_torque gilda"
Download and save the repo files:
    for name in $REPOS; do wget $name.repo -O /etc/yum.repos.d/$name.repo; done
A Worker Node doesn't require a host certificate.
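The download loop can be previewed before anything is written to /etc/yum.repos.d. The sketch below only prints the commands it would run. The slide does not show where the repo files are hosted, so REPO_BASE_URL and its example value are hypothetical placeholders to be replaced with your own repository location.

```shell
# Hypothetical base URL (not given on the slide): set it to your repository host.
REPO_BASE_URL="${REPO_BASE_URL:-http://repo.example.org/repos}"
REPOS="ca dag glite-wn ig jpackage glite-wn_torque gilda"

# Print one wget command per repository instead of executing it (dry run).
print_repo_downloads() {
    for name in $REPOS; do
        echo "wget $REPO_BASE_URL/$name.repo -O /etc/yum.repos.d/$name.repo"
    done
}

print_repo_downloads
```

Once the printed commands look right, they can be run as root, for example by piping the output to sh.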
INSTALLATION
    yum remove jdk
    yum install xml-commons-resolver12
    yum install jdk java sun-compat
    yum install lcg-CA
    yum install torque-mom cri.slc4
    yum install ig_WN_torque_noafs
Gilda rpms:
    yum install gilda_utils gilda_applications
In case you want to have AFS installed:
– yum install openafs openafs-client kernel-module-openafs-`uname -r`
– yum install ig_WN_torque
Customize ig-site-info.def
Copy the users and groups example files to /opt/glite/yaim/etc/gilda/:
    cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
    cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/
Append the gilda users and groups definitions to /opt/glite/yaim/etc/gilda/ig-users.conf and ig-groups.conf:
    cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf
    cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf
Copy the ig-site-info.def template file provided by ig_yaim into the gilda directory and customize it:
    cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/
Open the copied file in /opt/glite/yaim/etc/gilda/ with a text editor and set the following values according to your grid environment:
    CE_HOST=
    TORQUE_SERVER=$CE_HOST
    GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
    USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
    JAVA_LOCATION="/usr/java/latest"
    JOB_MANAGER=lcgpbs
    BATCH_BIN_DIR=/usr/bin
    BATCH_VERSION=torque
    VOS="gilda"
    ALL_VOMS="gilda"
    QUEUES="short long infinite gilda"
    SHORT_GROUP_ENABLE=$VOS
    LONG_GROUP_ENABLE=$VOS
    INFINITE_GROUP_ENABLE=$VOS
To configure a queue for a single VO:
    QUEUES="short long infinite gilda"
    SHORT_GROUP_ENABLE=$VOS
    LONG_GROUP_ENABLE=$VOS
    INFINITE_GROUP_ENABLE=$VOS
    GILDA_GROUP_ENABLE="gilda"
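The queue settings follow a naming pattern: each queue name in QUEUES, written in upper case, has a matching <QUEUE>_GROUP_ENABLE variable. A minimal sketch of a consistency check is shown below. The function name check_queue_groups is made up for this example, and the check assumes that queue names are valid shell identifiers (letters, digits, underscores).

```shell
# Values copied from the single-VO example on the slide.
VOS="gilda"
QUEUES="short long infinite gilda"
SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS
GILDA_GROUP_ENABLE="gilda"

# Hypothetical helper: report, for every queue, whether <QUEUE>_GROUP_ENABLE is set.
check_queue_groups() {
    status=0
    for queue in $QUEUES; do
        # Build the variable name, e.g. "short" -> SHORT_GROUP_ENABLE.
        var="$(echo "$queue" | tr '[:lower:]' '[:upper:]')_GROUP_ENABLE"
        # Indirect lookup of the variable named in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "queue '$queue': $var is not set"
            status=1
        else
            echo "queue '$queue': $var=$value"
        fi
    done
    return "$status"
}

check_queue_groups
```

With the first variant on the slide, where GILDA_GROUP_ENABLE is not defined, the same check would flag the gilda queue.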
    WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf
The file specified in WN_LIST must list the full hostnames of all your WNs.
WARNING: It's important to fill in the WN list file before you run the yaim configure command.
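Since the WN list must contain full hostnames, a quick check of the file before configuring can catch short names. The sketch below is one possible way to do it: it treats any non-empty line without a dot as a short hostname, which is a simplifying assumption. The function name check_wn_list is hypothetical.

```shell
# Hypothetical helper: flag entries in the WN list that are not full hostnames.
# $1: path to the file referenced by WN_LIST (one hostname per line).
check_wn_list() {
    status=0
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip empty lines.
        [ -z "$host" ] && continue
        case "$host" in
            *.*) ;;  # contains a dot: treated as a full hostname
            *)   echo "not a full hostname: $host"; status=1 ;;
        esac
    done < "$1"
    return "$status"
}

# Example: check_wn_list /opt/glite/yaim/etc/gilda/wn-list.conf
```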
WN Torque CONFIGURATION
Now we can configure the node:
    /opt/glite/yaim/bin/ig_yaim -c \
        -s /opt/glite/yaim/etc/gilda/ \
        -n ig_WN_torque_noafs
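Before running the configure command, it may help to confirm that the values set in the site-info file are not empty. The sketch below sources the file in a subshell, which works because the file uses shell variable syntax, and reports any listed variable that is unset or empty. The function name check_site_info is hypothetical, and the list of variables to check is passed by the caller.

```shell
# Hypothetical helper: verify that the given variables are non-empty in a site-info file.
# $1: path to the site-info file (use a path containing a slash); remaining args: variable names.
check_site_info() {
    site_info=$1
    shift
    (
        # Source the file in a subshell so the current shell is not modified.
        . "$site_info" || exit 2
        missing=0
        for var in "$@"; do
            # Indirect lookup of the variable named in $var.
            eval "value=\${$var:-}"
            if [ -z "$value" ]; then
                echo "unset or empty: $var"
                missing=1
            fi
        done
        exit "$missing"
    )
}

# Example, with variable names taken from the slides:
# check_site_info /path/to/ig-site-info.def CE_HOST TORQUE_SERVER WN_LIST VOS QUEUES
```

The function returns a non-zero status when something is missing, so it can gate the ig_yaim call in a script.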
Worker Node testing
Verify that pbs_mom is running and that the node state is free:
    root]# /etc/init.d/pbs_mom status
    pbs_mom (pid 3692) is running...
    root]# pbsnodes -a
    wn.localdomain
         state = free
         np = 2
         properties = lcgpro
         ntype = cluster
         status = arch=linux,uname=Linux wn.localdomain EL.cern 1 Tue Oct 4 16:45:05 CEST 2005 i686,sessions= ,3584,nsessions=6,nusers=1,idletime=1569,totmem=254024kb,availmem=69852kb,physmem=254024kb,ncpus=1,loadave=0.30,rectime=
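With many WNs, reading the full pbsnodes output by eye becomes tedious. The sketch below condenses it to one "node state" pair per line. It assumes the layout shown above, where node names start at the beginning of a line and attributes are indented. The function name node_states is hypothetical.

```shell
# Hypothetical helper: read "pbsnodes -a" output on stdin, print "<node> <state>" per node.
node_states() {
    awk '
        /^[^ \t]/ { node = $1 }                              # unindented line: node name
        $1 == "state" && $2 == "=" { print node, $3 }        # indented "state = ..." line
    '
}

# Example: pbsnodes -a | node_states
```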
First of all, check that a generic user on the WN can ssh to the CE without typing a password:
    root] su - gilda001
    gilda001] ssh ce
    gilda001]
The same test has to be run between the WNs in order to run MPI jobs:
    gilda001] ssh wn1
    gilda001]
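The same check can be scripted for several hosts at once. The sketch below, meant to be run as a pool user such as gilda001, relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password. The function name check_passwordless and the 5-second timeout are choices made for this example.

```shell
# Hypothetical helper: report whether each host accepts a login without a password prompt.
check_passwordless() {
    status=0
    for host in "$@"; do
        # BatchMode=yes: fail rather than ask for a password.
        # ConnectTimeout=5: give up after 5 seconds if the host is unreachable.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            status=1
        fi
    done
    return "$status"
}

# Example, with the hostnames used above: check_passwordless ce wn1
```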
FIREWALL Setup
/etc/sysconfig/iptables
    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    :RH-Firewall-1-INPUT - [0:0]
    -A INPUT -j RH-Firewall-1-INPUT
    -A FORWARD -j RH-Firewall-1-INPUT
    -A RH-Firewall-1-INPUT -i lo -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -s --dport 22 -j ACCEPT
    -A RH-Firewall-1-INPUT -p all -s -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
    -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
    COMMIT
IPTABLES STARTUP
    /sbin/chkconfig iptables on
    /etc/init.d/iptables start
Troubleshooting
    root]# su - gilda001
    gilda001] ssh ce
    password:
If a password prompt appears, this WN's hostname is probably not in /etc/ssh/shosts.equiv, or its ssh keys were not created and stored in /etc/ssh/ssh_known_hosts on the CE.
Solution (run on the CE): ensure that the WN is in the pbs node list using:
    root]# pbsnodes -a
And then:
    root]# /opt/edg/sbin/edg-pbs-shostsequiv
    root]# /opt/edg/sbin/edg-pbs-known-hosts
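To confirm which of the two files is missing the WN, a check along the following lines can be run on the CE. It is a sketch under two assumptions: the hostname is the first field of a line in shosts.equiv, and ssh_known_hosts stores host names in plain (not hashed) form as a comma-separated first field. The function name check_wn_registered is hypothetical.

```shell
# Hypothetical helper: check that a WN appears in shosts.equiv and ssh_known_hosts.
# $1: WN hostname; $2, $3: optional file paths (defaults are the paths named above).
check_wn_registered() {
    wn=$1
    shosts=${2:-/etc/ssh/shosts.equiv}
    known=${3:-/etc/ssh/ssh_known_hosts}
    status=0
    # shosts.equiv: the hostname is expected as the first field of a line.
    if ! awk -v h="$wn" '$1 == h { found = 1 } END { exit !found }' "$shosts" 2>/dev/null; then
        echo "$wn missing from $shosts"
        status=1
    fi
    # ssh_known_hosts: the first field is a comma-separated list of names for the key.
    if ! awk -v h="$wn" '
            { n = split($1, names, ","); for (i = 1; i <= n; i++) if (names[i] == h) found = 1 }
            END { exit !found }' "$known" 2>/dev/null; then
        echo "$wn has no key in $known"
        status=1
    fi
    return "$status"
}

# Example: check_wn_registered wn.localdomain
```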
If pbsnodes reports the WN as down:
    root]# pbsnodes -a
    wn.localdomain
         state = down
         np = 2
         properties = lcgpro
         ntype = cluster
Solution:
    root]# /etc/init.d/pbs_mom restart