
1 First South Africa Grid Training
WORKER NODE
Albert van Eck
University of the Free State
25 July 2008

2 OUTLINE
– OVERVIEW
– INSTALLATION & CONFIGURATION
– TESTING
– FIREWALL SETUP
– TROUBLESHOOTING

3 OVERVIEW
The Worker Node is the service where jobs actually run. Its main functions are to:
– execute the jobs
– update the status of the jobs to the CE
It can run on top of several kinds of client batch systems:
– Torque
– LSF
– SGE
– Condor

4 TORQUE client
The Torque client consists of:
– pbs_mom, the daemon which places the job into execution. It is also responsible for returning the job's output to the user.
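For reference, pbs_mom reads its settings from the mom_priv/config file in the Torque spool directory. A minimal sketch, assuming the spool directory is /var/spool/pbs and a hypothetical CE hostname ce.yourdomain (YAIM normally generates this file for you):
# /var/spool/pbs/mom_priv/config
$pbsserver ce.yourdomain   # pbs_server (running on the CE) allowed to control this mom
$logevent 255              # log all event classes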

5 Worker Node installation & configuration using YAIM

6 WHAT KIND OF WN?
There are several kinds of metapackages to install:
– ig_WN – "generic" Worker Node.
– ig_WN_noafs – like ig_WN but without AFS.
– ig_WN_LSF – LSF Worker Node. IMPORTANT: provided for consistency only; it does not install the LSF software but applies some fixes via ig_configure_node.
– ig_WN_LSF_noafs – like ig_WN_LSF but without AFS.
– ig_WN_torque – Torque Worker Node.
– ig_WN_torque_noafs – like ig_WN_torque but without AFS.
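To check which (if any) of these metapackages is already present on the node, a quick rpm query can be used (a generic check, not part of the original slides):
rpm -qa | grep ig_WN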

7 Repository settings
REPOS="ca dag glite-wn ig glite-torque_client jpackage gilda"
Download and store the repo files:
for name in $REPOS; do
  wget http://grid018.ct.infn.it/mrepo/repos/$name.repo -O /etc/yum.repos.d/$name.repo
done
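A simple sanity check afterwards (not from the slides) is to list the downloaded repo files:
ls /etc/yum.repos.d/*.repo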

8 INSTALLATION
yum -y install jdk java-1.5.0-sun-compat
yum -y install lcg-CA
yum -y install ig_WN_torque_noafs
In case you want to install AFS on the WN (not for this tutorial):
yum -y install openafs openafs-client kernel-module-openafs-`uname -r`
yum -y install ig_WN_torque
Gilda RPMs:
yum -y install gilda_utils gilda_applications
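To verify that the key packages were actually installed, rpm can query them by name (a hedged example using the package names above):
rpm -q lcg-CA ig_WN_torque_noafs gilda_utils gilda_applications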

9 Customise ig-site-info.def
(already performed in previous tutorial)
Copy the users and groups example files to /opt/glite/yaim/etc/gilda/:
cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/
Append the gilda users and groups definitions to the copied files:
cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf
cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf

10 Customise ig-site-info.def
(already performed in previous tutorial)
Copy the ig-site-info.def template file provided by ig_yaim into the gilda directory:
cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/
Open /opt/glite/yaim/etc/gilda/ig-site-info.def in a text editor and set the following values according to your grid environment:
CE_HOST=
TORQUE_SERVER=$CE_HOST

11 Customise ig-site-info.def
(already performed in previous tutorial)
WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf
Save and close your ig-site-info.def file.
The file specified as WN_LIST has to contain a list of all your WN hostnames. On a clean installation this file won't exist and needs to be created. List the full hostnames of the nodes you want to use, one host entry per line:
node01.trigrid.it
...
node20.trigrid.it
WARNING: it is important to configure this file before you run the YAIM configure command later in this document. If the node names follow a pattern, the file can also be generated, as shown below.
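A sketch of generating the WN list rather than typing it by hand, assuming twenty nodes named as in the example above:
for i in $(seq -w 1 20); do echo "node$i.trigrid.it"; done > /opt/glite/yaim/etc/gilda/wn-list.conf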

12 Customise ig-site-info.def
(already performed in previous tutorial)
GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
JAVA_LOCATION="/usr/java/jdk1.5.0_14"
JOB_MANAGER=lcgpbs
BATCH_BIN_DIR=/usr/bin
BATCH_VERSION=torque-2.1.9-4
VOS="gilda"
ALL_VOMS="gilda"

13 Customise ig-site-info.def
(already performed in previous tutorial)
QUEUES="short long infinite"
SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS
To configure a queue for a single VO (as in this tutorial):
QUEUES="short long infinite gilda"
SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS
GILDA_GROUP_ENABLE="gilda"
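Before running YAIM it is worth double-checking the values just set (a simple grep, not part of the original slides):
grep -E "^(CE_HOST|TORQUE_SERVER|WN_LIST|QUEUES)" /opt/glite/yaim/etc/gilda/ig-site-info.def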

14 WN Torque CONFIGURATION
Now we can configure the node:
/opt/glite/yaim/bin/ig_yaim -c -s /opt/glite/yaim/etc/gilda/ig-site-info.def -n ig_WN_torque_noafs
– The -c parameter tells YAIM to configure the node.
– The -s parameter specifies where to find the ig-site-info.def file.
– The -n ig_WN_torque_noafs parameter selects configuration as a WN node with Torque and without AFS.
– More than one -n node type can be specified.
– Optionally, run the script with -v in place of -c to verify that the configuration file is valid, before rerunning it with -c (see the example below).
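For example, the verification run suggested in the last point would look like this:
/opt/glite/yaim/bin/ig_yaim -v -s /opt/glite/yaim/etc/gilda/ig-site-info.def -n ig_WN_torque_noafs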

15 Worker Node testing

16 Testing
Verify that pbs_mom is active and that the node is free:
[root@wn root]# /etc/init.d/pbs_mom status
pbs_mom (pid 3692) is running...
[root@wn root]# pbsnodes -a
wn.localdomain
  state = free
  np = 2
  properties = lcgpro
  ntype = cluster
  status = arch=linux,uname=Linux wn.localdomain 2.4.21-37.EL.cern #1 Tue Oct 4 16:45:05 CEST 2005 i686,sessions=5892 5910 563 1703 2649 3584,nsessions=6,nusers=1,idletime=1569,totmem=254024kb,availmem=69852kb,physmem=254024kb,ncpus=1,loadave=0.30,rectime=1159016111

17 Testing
First of all, check that a generic user on the WN can ssh to the CE without typing a password:
[root@wn root]# su - gilda001
[gilda001@wn gilda001]$ ssh ce
[gilda001@ce gilda001]$
The same test has to pass between the WNs in order to run MPI jobs (this can also be looped over all WNs, as shown below):
[gilda001@wn gilda001]$ ssh wn1
[gilda001@wn1 gilda001]$
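A sketch of looping the test over the WN list instead of typing it per node, run as the generic user and assuming the wn-list.conf created earlier; BatchMode makes ssh fail instead of prompting if the keys are wrong:
for wn in $(cat /opt/glite/yaim/etc/gilda/wn-list.conf); do ssh -o BatchMode=yes $wn hostname; done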

18 Firewall Setup

19 /etc/sysconfig/iptables
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -s --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p all -s -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
(The source addresses after the two -s options are omitted in the slide; set them to the networks you trust for ssh and for internal cluster traffic.)

20 IPTABLES STARTUP
/sbin/chkconfig iptables on
/etc/init.d/iptables start
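To confirm the rules are actually loaded (a generic iptables check, not from the slides):
/sbin/iptables -L RH-Firewall-1-INPUT -n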

21 Troubleshooting

22 Troubleshooting
[root@wn root]# su - gilda001
[gilda001@wn gilda001]$ ssh ce
gilda001@ce's password:
If you are prompted for a password, probably this WN's hostname is not in /etc/ssh/shosts.equiv, or its ssh keys were not created and stored in /etc/ssh/ssh_known_hosts on the CE.
Solution (to run on the CE): first ensure that the WN is in the pbs node list:
[root@ce root]# pbsnodes -a
And then regenerate the files:
[root@ce root]# /opt/edg/sbin/edg-pbs-shostsequiv
[root@ce root]# /opt/edg/sbin/edg-pbs-known-hosts
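To confirm the fix took effect, check that the WN's hostname now appears in the regenerated files (a hedged example; substitute your WN's hostname):
[root@ce root]# grep wn.localdomain /etc/ssh/shosts.equiv /etc/ssh/ssh_known_hosts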

23 Troubleshooting
[root@wn root]# pbsnodes -a
wn.localdomain
  state = down
  np = 2
  properties = lcgpro
  ntype = cluster
Solution (on the WN):
[root@wn root]# /etc/init.d/pbs_mom restart
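If pbs_mom keeps going down, its log file is the first place to look. The path below assumes the default Torque spool directory; mom logs are named after the current date:
[root@wn root]# tail /var/spool/pbs/mom_logs/$(date +%Y%m%d)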
