Presentation is loading. Please wait.

Presentation is loading. Please wait.

WORKER NODE Alfonso Pardo EPIKH School, System Admin Tutorial Beijing, 2010 August 30th – 2010 September 3th.

Similar presentations


Presentation on theme: "WORKER NODE Alfonso Pardo EPIKH School, System Admin Tutorial Beijing, 2010 August 30th – 2010 September 3th."— Presentation transcript:

1 WORKER NODE Alfonso Pardo EPIKH School, System Admin Tutorial Beijing, 2010 August 30th – 2010 September 3th

2 OUTLINE OVERVIEW INSTALLATION & CONFIGURATION TESTING FIREWALL SETUP TROUBLESHOOTING 2

3 OVERVIEW The Worker Node is a service where the jobs run. Its main functionally are: – execute the jobs – update to Computing Element the status of the jobs It can run several kinds of client batch system: – Torque – LSF – SGE – Condor 3

4 TORQUE client The Torque client is composed by a: – pbs_mom – pbs_mom which places the job into execution. It is also responsible for returning the job’s output to the user 4

5 Worker Node installation & configuration using YAIM

6 There are several kinds of metapackages to install: glite_WN – “Generic” WorkerNode. glite_WN_noafs – Like ig_WN but without AFS. glite_WN_LSF – LSF WorkerNode. IMPORTANT: provided for consistency, it does not install LSF software but it apply some fixes via ig_configure_node. glite_WN_LSF_noafs – Like ig_WN_LSF but without AFS. glite_WN_torque – Torque WorkerNode. glite_WN_torque_noafs – Like ig_WN_torque but without AFS. WHAT KIND OF WN?

7 Repository settings REPOS=Ӗcg-ca dag ig gilda glite-wn_torque" Download and store repo files: for name in $REPOS; do wget http://grid- it.cnaf.infn.it/mrepo/repos/sl5/x86_64/$name.repo -O /etc/yum.repos.d/$name.repo; done wget http://grid018.ct.infn.it/mrepo/repos/gilda.repo -O /etc/yum.repos.d/gilda.repohttp://grid018.ct.infn.it/mrepo/repos/gilda.repo -O /etc/yum.repos.d/ wget http://grid-it.cnaf.infn.it/mrepo/repos/jpackage.repo -O /etc/yum.repos.d/jpackage.repohttp://grid-it.cnaf.infn.it/mrepo/repos/jpackage.repo 7

8 INSTALLATION yum install jdk java-1.6.0-sun-compat yum install lcg-CA yum install lcg-WN yum install glite-TORQUE_utils yum install glite-TORQUE_client Gilda rpms: yum install gilda_utils gilda_applications 8

9 Copy users and groups example files to /opt/glite/yaim/etc/gilda/ cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/ cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/ Append gilda users and groups definitions to /opt/glite/yaim/etc/gilda/ig- users.conf cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf Customize ig-site-info.def

10 Copy ig-site-info.def template file provided by ig_yaim in to gilda dir and customize it cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/ Open /opt/glite/yaim/etc/gilda/ file using a text editor and set the following values according to your grid environment: CE_HOST= TORQUE_SERVER=$CE_HOST 10 Customize ig-site-info.def

11 WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf The file specified in WN_LIST has to be set with the list of all your WNs hostname. WARNING: It’s important to setup it before to run the configure command Customize ig-site-info.def

12 GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf JAVA_LOCATION="/usr/bin/java/jdk“ JOB_MANAGER=lcgpbs BATCH_BIN_DIR=/usr/bin BATCH_VERSION=torque-2.1.9-4 VOS=“gilda” ALL_VOMS=“gilda” Customize ig-site-info.def

13 QUEUES="short long infinite“ SHORT_GROUP_ENABLE=$VOS LONG_GROUP_ENABLE=$VOS INFINITE_GROUP_ENABLE=$VOS In case of to configure a queue fo a single VO: QUEUES="short long infinite gilda“ SHORT_GROUP_ENABLE=$VOS LONG_GROUP_ENABLE=$VOS INFINITE_GROUP_ENABLE=$VOS GILDA_GROUP_ENABLE=“gilda” Customize ig-site-info.def

14 WN Torque CONFIGURATION Now we can configure the node: /opt/glite/yaim/bin/ig_yaim -n glite-WN -n glite-TORUQE_client -n glite- TORQUE_utils

15 Worker Node testing

16 Verify if the pbs_mom is active and if its status is free: [root@wn root]# /etc/init.d/pbs_mom status pbs_mom (pid 3692) is running... [root@wn root]# pbsnodes -a wn.localdomain state = free np = 2 properties = lcgpro ntype = cluster status = arch=linux,uname=Linux wn.localdomain 2.4.21-37.EL.cern 1 Tue Oct 4 16:45:05 CEST 2005 i686,sessions=5892 5910 563 1703 2649,3584,nsessions=6,nusers=1,idletime=1569,totmem=254024kb,availme m=69852kb,physmem=254024kb,ncpus=1,loadave=0.30,rectime=115901611 1 Testing

17 First of all, check if a generic user on WN can do ssh to the CE without type the password: [root@wn root] su – gilda001 [gilda001@wn gilda001] ssh ce [gilda001@ce gilda001] The same test has to be executed between the WNs in order to run MPI jobs: [gilda001@wn gilda001] ssh wn1 [gilda001@wn1 gilda001] Testing

18 FIREWALL setup

19 *filter :INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [0:0] :RH-Firewall-1-INPUT - [0:0] -A INPUT -j RH-Firewall-1-INPUT -A FORWARD -j RH-Firewall-1-INPUT -A RH-Firewall-1-INPUT -i lo -j ACCEPT -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT -A RH-Firewall-1-INPUT -p tcp -s --dport 22 -j ACCEPT -A RH-Firewall-1-INPUT -p all -s -j ACCEPT -A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited COMMIT /etc/sysconfig/iptables

20 IPTABLES STARTUP /sbin/chkconfig iptables on /etc/init.d/iptables start

21 Troubleshooting

22 [root@wn root]# su – gilda001 [gilda001@wn gilda001] ssh ce gilda001@ce’s password: probably this wn hostname is not in /etc/ssh/shosts.equiv or its ssh keys were not created and stored in /etc/ssh/ssh_known_hosts on CE Solution (to run on CE): Ensure that the wn is in pbs list using: [root@ce root]# pbsnodes –a And then: [root@ce root]# /opt/edg/sbin/edg-pbs-shostsequiv [root@ce root]# /opt/edg/sbin/edg-pbs-known-hosts Troubleshooting

23 [root@wn root]# pbsnodes -a wn.localdomain state = down np = 2 properties = lcgpro ntype = cluster Solution: [root@wn root]# /etc/init.d/pbs_mom restart Troubleshooting

24 24


Download ppt "WORKER NODE Alfonso Pardo EPIKH School, System Admin Tutorial Beijing, 2010 August 30th – 2010 September 3th."

Similar presentations


Ads by Google