Published by Dortha Nichols. Modified over 8 years ago.
1
First South Africa Grid Training
WORKER NODE
Albert van Eck, University of the Free State
25 July, 2008
2
25 July, Capetown South African Grid Training in Capetown
OUTLINE
– OVERVIEW
– INSTALLATION & CONFIGURATION
– TESTING
– FIREWALL SETUP
– TROUBLESHOOTING
3
OVERVIEW
The Worker Node is the service where the jobs run. Its main functions are:
– execute the jobs
– report the status of the jobs to the CE
It can run on several kinds of client batch systems:
– Torque
– LSF
– SGE
– Condor
4
TORQUE client
The Torque client consists of a single daemon:
– pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.
5
Worker Node installation & configuration using YAIM
6
WHAT KIND OF WN?
There are several kinds of metapackages to install:
– ig_WN: "Generic" WorkerNode.
– ig_WN_noafs: like ig_WN but without AFS.
– ig_WN_LSF: LSF WorkerNode. IMPORTANT: provided for consistency; it does not install the LSF software, but it applies some fixes via ig_configure_node.
– ig_WN_LSF_noafs: like ig_WN_LSF but without AFS.
– ig_WN_torque: Torque WorkerNode.
– ig_WN_torque_noafs: like ig_WN_torque but without AFS.
7
Repository settings
  REPOS="ca dag glite-wn ig glite-torque_client jpackage gilda"
Download and store repo files:
  for name in $REPOS; do
    wget http://grid018.ct.infn.it/mrepo/repos/$name.repo -O /etc/yum.repos.d/$name.repo
  done
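The download loop can be wrapped so that it stops on the first failed download and can be previewed before anything is written. This is a minimal sketch, assuming the same mirror URL and repository names as the slide; `repo_url`, `fetch_repos` and the `DRY_RUN` variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: repository names and mirror URL are taken from the slide;
# repo_url, fetch_repos and DRY_RUN are illustrative names, not part of YAIM.
REPOS="ca dag glite-wn ig glite-torque_client jpackage gilda"
BASE_URL="http://grid018.ct.infn.it/mrepo/repos"
DEST_DIR="/etc/yum.repos.d"

# Print the download URL for one repository name.
repo_url() {
    printf '%s/%s.repo\n' "$BASE_URL" "$1"
}

# Download every repo file; with DRY_RUN=1 (the default) only print the commands.
fetch_repos() {
    for name in $REPOS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "wget $(repo_url "$name") -O $DEST_DIR/$name.repo"
        else
            wget "$(repo_url "$name")" -O "$DEST_DIR/$name.repo" || {
                echo "failed to download $name.repo" >&2
                return 1
            }
        fi
    done
}

fetch_repos
```

Run it once as shown to review the commands, then set `DRY_RUN=0` (as root) to perform the downloads.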
8
INSTALLATION
  yum -y install jdk java-1.5.0-sun-compat
  yum -y install lcg-CA
  yum -y install ig_WN_torque_noafs
In case you want to install AFS on the WN (not for this tutorial):
  yum -y install openafs openafs-client kernel-module-openafs-`uname -r`
  yum -y install ig_WN_torque
Gilda rpms:
  yum -y install gilda_utils gilda_applications
9
Customise ig-site-info.def (already performed in previous tutorial)
Copy the users and groups example files to /opt/glite/yaim/etc/gilda/:
  cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
  cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/
Append the gilda users and groups definitions to /opt/glite/yaim/etc/gilda/ig-users.conf and ig-groups.conf:
  cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf
  cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf
10
Customise ig-site-info.def (already performed in previous tutorial)
Copy the ig-site-info.def template file provided by ig_yaim into the gilda directory:
  cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/
Open the copied file in /opt/glite/yaim/etc/gilda/ using a text editor and set the following values according to your grid environment:
  CE_HOST=
  TORQUE_SERVER=$CE_HOST
11
Customise ig-site-info.def (already performed in previous tutorial)
  WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf
Save and close your ig-site-info.def file.
The file specified as WN_LIST has to contain a list of all your WN hostnames. On a clean configuration this file does not exist and needs to be created. List the full hostnames of the nodes you want to use, one host entry per line:
  node01.trigrid.it
  ..
  node20.trigrid.it
WARNING: It is important to create this file before you run the YAIM configure command later in this document.
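For a block of consecutively numbered nodes, the list can be generated instead of typed. This is a minimal sketch, assuming the node01 to node20 naming and the trigrid.it example domain from the slide; it writes to a scratch file by default so that an existing configuration is not overwritten.

```shell
#!/bin/sh
# Sketch only: writes node01..node20 under the trigrid.it example domain.
# WN_LIST defaults to a scratch file; set it to
# /opt/glite/yaim/etc/gilda/wn-list.conf once the output looks right.
WN_LIST="${WN_LIST:-${TMPDIR:-/tmp}/wn-list.conf}"
DOMAIN="trigrid.it"   # example domain from the slide
FIRST=1
LAST=20

: > "$WN_LIST"        # create or truncate the file
i=$FIRST
while [ "$i" -le "$LAST" ]; do
    # %02d zero-pads the index: node01, node02, ...
    printf 'node%02d.%s\n' "$i" "$DOMAIN" >> "$WN_LIST"
    i=$((i + 1))
done

echo "wrote $(wc -l < "$WN_LIST" | tr -d ' ') hosts to $WN_LIST"
```

Replace `DOMAIN`, `FIRST` and `LAST` with the values for your own cluster; sites whose hostnames do not follow a numeric pattern still need to list them by hand.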
12
Customise ig-site-info.def (already performed in previous tutorial)
  GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
  USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
  JAVA_LOCATION="/usr/java/jdk1.5.0_14"
  JOB_MANAGER=lcgpbs
  BATCH_BIN_DIR=/usr/bin
  BATCH_VERSION=torque-2.1.9-4
  VOS="gilda"
  ALL_VOMS="gilda"
13
Customise ig-site-info.def (already performed in previous tutorial)
  QUEUES="short long infinite"
  SHORT_GROUP_ENABLE=$VOS
  LONG_GROUP_ENABLE=$VOS
  INFINITE_GROUP_ENABLE=$VOS
To configure a queue for a single VO (as in this tutorial):
  QUEUES="short long infinite gilda"
  SHORT_GROUP_ENABLE=$VOS
  LONG_GROUP_ENABLE=$VOS
  INFINITE_GROUP_ENABLE=$VOS
  GILDA_GROUP_ENABLE="gilda"
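Each queue named in QUEUES is paired with a variable built from the upper-cased queue name plus `_GROUP_ENABLE`, so a queue added without its variable is easy to miss. The following is a minimal sketch of a consistency check, assuming only the naming pattern visible on the slide; `check_queues` is a hypothetical helper and not a YAIM command.

```shell
#!/bin/sh
# Sketch only: checks that every queue named in QUEUES has a matching
# <QUEUE>_GROUP_ENABLE variable, following the naming pattern on the slide.
# check_queues is an illustrative helper, not a YAIM command.
VOS="gilda"
QUEUES="short long infinite gilda"
SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS
GILDA_GROUP_ENABLE="gilda"

check_queues() {
    rc=0
    for queue in $QUEUES; do
        # short -> SHORT_GROUP_ENABLE
        var="$(printf '%s' "$queue" | tr '[:lower:]' '[:upper:]')_GROUP_ENABLE"
        # Indirect lookup of the variable whose name is stored in $var.
        eval "value=\${$var-}"
        if [ -z "$value" ]; then
            echo "missing: $var" >&2
            rc=1
        else
            echo "$var=$value"
        fi
    done
    return $rc
}

check_queues
```

In practice the assignments at the top would be replaced by sourcing your own site-info file before calling `check_queues`.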
14
WN Torque CONFIGURATION
Now we can configure the node:
  /opt/glite/yaim/bin/ig_yaim -c -s /opt/glite/yaim/etc/gilda/ -n ig_WN_torque_noafs
– The -c parameter tells YAIM to configure the node
– The -s /opt/... parameter specifies where to find the ig-site-info.def file
– The -n ig_WN_torque_noafs parameter selects the node type: a WN with Torque and without AFS
– More than one -n node type can be specified
– Optionally, run the command with -v instead of -c first to verify that the configuration file is valid, then rerun it with -c
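The verify-then-configure sequence can be chained so that configuration only runs when the check passes. This is a minimal sketch, assuming that ig_yaim returns a non-zero exit status when -v finds a problem (not confirmed by the slides); `verify_then_configure` is a hypothetical wrapper, and the -s path is copied exactly as it appears on the slide.

```shell
#!/bin/sh
# Sketch only: run the -v check first and configure only if it passes.
# YAIM, SITE_INFO and NODE_TYPE default to the values on the slide;
# verify_then_configure is an illustrative wrapper, not part of ig_yaim.
# Assumption: ig_yaim exits non-zero when verification fails.
YAIM="${YAIM:-/opt/glite/yaim/bin/ig_yaim}"
SITE_INFO="${SITE_INFO:-/opt/glite/yaim/etc/gilda/}"
NODE_TYPE="${NODE_TYPE:-ig_WN_torque_noafs}"

verify_then_configure() {
    if ! "$YAIM" -v -s "$SITE_INFO" -n "$NODE_TYPE"; then
        echo "verification failed; not configuring" >&2
        return 1
    fi
    "$YAIM" -c -s "$SITE_INFO" -n "$NODE_TYPE"
}

# Run on the worker node as root:
# verify_then_configure
```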
15
Worker Node testing
16
Testing
Verify that pbs_mom is running and that the node is free:
  [root@wn root]# /etc/init.d/pbs_mom status
  pbs_mom (pid 3692) is running...
  [root@wn root]# pbsnodes -a
  wn.localdomain
       state = free
       np = 2
       properties = lcgpro
       ntype = cluster
       status = arch=linux,uname=Linux wn.localdomain 2.4.21-37.EL.cern 1 Tue Oct 4 16:45:05 CEST 2005 i686,sessions=5892 5910 563 1703 2649,3584,nsessions=6,nusers=1,idletime=1569,totmem=254024kb,availmem=69852kb,physmem=254024kb,ncpus=1,loadave=0.30,rectime=1159016111
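With more than a handful of nodes, the state lines are easier to scan once they are reduced to one line per node. This is a minimal sketch, assuming the layout shown on the slide (node name on an unindented line, attributes indented beneath it); `node_states` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print "<node> <state>" for each node in `pbsnodes -a` output.
# node_states is an illustrative helper; it assumes the node name is on an
# unindented line and the attributes are indented below it.
node_states() {
    awk '
        /^[^ \t]/ { node = $1 }                        # unindented line: node name
        $1 == "state" && $2 == "=" { print node, $3 }  # indented "state = ..." line
    '
}

# Sample input modelled on the slide; on a real node use: pbsnodes -a | node_states
node_states <<'EOF'
wn.localdomain
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
EOF
```

For the sample input this prints `wn.localdomain free`; piping the result through `grep -v ' free$'` would leave only the nodes that need attention.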
17
Testing
First of all, check that a generic user on the WN can ssh to the CE without typing a password:
  [root@wn root] su - gilda001
  [gilda001@wn gilda001] ssh ce
  [gilda001@ce gilda001]
The same test has to be executed between the WNs in order to run MPI jobs:
  [gilda001@wn gilda001] ssh wn1
  [gilda001@wn1 gilda001]
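The same check can be made non-interactive, which helps when testing many hosts. This is a minimal sketch using the standard OpenSSH options `BatchMode` and `ConnectTimeout`; `check_ssh` and the `SSH_CMD` variable are hypothetical names, and the host names `ce` and `wn1` are those used on the slide.

```shell
#!/bin/sh
# Sketch only: BatchMode=yes makes ssh fail instead of prompting for a password,
# so the exit status shows whether passwordless login works.
# check_ssh and SSH_CMD are illustrative names; host names come from the slide.
SSH_CMD="${SSH_CMD:-ssh}"

check_ssh() {
    # Run the no-op command "true" on the remote host.
    if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$1" true; then
        echo "$1: passwordless ssh OK"
    else
        echo "$1: passwordless ssh FAILED"
        return 1
    fi
}

# Run as a pool user (for example gilda001) on the WN:
# check_ssh ce
# check_ssh wn1
```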
18
Firewall Setup
19
/etc/sysconfig/iptables
  *filter
  :INPUT ACCEPT [0:0]
  :FORWARD ACCEPT [0:0]
  :OUTPUT ACCEPT [0:0]
  :RH-Firewall-1-INPUT - [0:0]
  -A INPUT -j RH-Firewall-1-INPUT
  -A FORWARD -j RH-Firewall-1-INPUT
  -A RH-Firewall-1-INPUT -i lo -j ACCEPT
  -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
  -A RH-Firewall-1-INPUT -p tcp -s --dport 22 -j ACCEPT
  -A RH-Firewall-1-INPUT -p all -s -j ACCEPT
  -A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
  -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
  COMMIT
20
IPTABLES STARTUP
  /sbin/chkconfig iptables on
  /etc/init.d/iptables start
21
Troubleshooting
22
Troubleshooting
  [root@wn root]# su - gilda001
  [gilda001@wn gilda001] ssh ce
  gilda001@ce's password:
Probably this WN's hostname is not in /etc/ssh/shosts.equiv, or its ssh keys were not created and stored in /etc/ssh/ssh_known_hosts on the CE.
Solution (to run on the CE):
Ensure that the WN is in the pbs node list using:
  [root@ce root]# pbsnodes -a
And then:
  [root@ce root]# /opt/edg/sbin/edg-pbs-shostsequiv
  [root@ce root]# /opt/edg/sbin/edg-pbs-known-hosts
23
Troubleshooting
  [root@wn root]# pbsnodes -a
  wn.localdomain
       state = down
       np = 2
       properties = lcgpro
       ntype = cluster
Solution (on the WN):
  [root@wn root]# /etc/init.d/pbs_mom restart