E-science grid facility for Europe and Latin America (www.eu-eela.org)
COMPUTING ELEMENT
Giuseppe Platania, INFN Catania
30 June - 4 July, 2008



Catania (Italy), Joint EELA/EGEEIII Tutorial for Trainers

OUTLINE
– Overview
– Installation & configuration
– Testing
– Firewall setup
– Troubleshooting

OVERVIEW
The Computing Element (CE) is the central service of a site. Its main functions are:
– managing jobs (job submission, job control)
– reporting the status of the jobs to the WMS
– publishing all site information (site location, queues, CPU status, and so on) via LDAP (site BDII service)
It can run on top of several kinds of batch system:
– Torque + MAUI
– LSF
– SGE
– Condor

TORQUE + MAUI
The Torque server consists of:
– pbs_server, which provides the basic batch services, such as receiving and creating batch jobs.
The Torque client consists of:
– pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.
The MAUI system consists of:
– the job scheduler, which holds the site's policies and uses them to choose which job is executed next.

SITE BDII
– By default it is installed on the CE
– It collects the information from all the site GRISes (for example SE, RB, LFC, etc.)
– The name of the service is bdii
– The list of GRISes you want to publish is kept in /opt/glite/etc/gip/site-urls.conf
– Log file: /opt/bdii/var/bdii.log
GRIS = Grid Resource Information Service
BDII = Berkeley Database Information Index

COMPUTING ELEMENT INSTALLATION & CONFIGURATION USING YAIM

WHAT KIND OF CE?
There are several metapackages to choose from:
– ig_CE: LCG Computing Element without batch system packages.
– ig_CE_LSF: LCG Computing Element with LSF. IMPORTANT: it is provided for consistency; it does not install LSF, but it applies some fixes via ig_configure_node.
– ig_CE_torque: LCG Computing Element with Torque + MAUI.

HOW TO GET A HOST CERTIFICATE
The CE needs a host certificate.
– Please request it from your RA (Registration Authority).
Install the host certificate (hostcert.pem and hostkey.pem) in /etc/grid-security:
    mkdir /etc/grid-security
    chmod 644 hostcert.pem
    chmod 400 hostkey.pem
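The certificate steps can be sketched as a small shell function. This is only an illustration, not part of the original slides: the function name install_hostcert, its two directory arguments, and the /root/certs path in the usage comment are hypothetical. On a real CE the destination would be /etc/grid-security.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): copy hostcert.pem and hostkey.pem
# from a source directory into a destination directory, then apply the
# permissions listed on the slide.
install_hostcert() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    cp "$src_dir/hostcert.pem" "$src_dir/hostkey.pem" "$dest_dir/" || return 1
    chmod 644 "$dest_dir/hostcert.pem" || return 1   # certificate: readable by everyone
    chmod 400 "$dest_dir/hostkey.pem"                # private key: readable by the owner only
}

# Usage on a CE, as root (source path is an assumption):
#   install_hostcert /root/certs /etc/grid-security
```

Passing the destination as an argument makes it possible to try the function in a scratch directory before touching /etc/grid-security.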

REPOSITORY SETTINGS
    REPOS="ca dag ig jpackage gilda glite-lcg_ce_torque glite-bdii"
Download and store the repo files:
    for name in $REPOS; do wget <URL of the $name.repo file> -O /etc/yum.repos.d/$name.repo; done
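The download loop can also be written as a dry run that prints the commands before anything is fetched. A minimal sketch under stated assumptions: the function name print_repo_downloads is hypothetical, its argument is a placeholder base URL (the repository location is not given in this transcript), and the "base URL/name.repo" layout is assumed for illustration.

```shell
#!/bin/sh
# Repository names as listed on the slide.
REPOS="ca dag ig jpackage gilda glite-lcg_ce_torque glite-bdii"

# Hypothetical dry run: print one wget command per repository file.
# $1 is a placeholder base URL; the real location is not given in the slides.
print_repo_downloads() {
    base_url=$1
    for name in $REPOS; do
        echo "wget -O /etc/yum.repos.d/$name.repo $base_url/$name.repo"
    done
}

# Usage (example.org is a placeholder):
#   print_repo_downloads http://repo.example.org
```

Once the printed commands look right, they can be piped to sh as root to perform the downloads.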

INSTALLATION
    yum install jdk java sun-compat
    yum install lcg-CA
    yum install ig_CE_torque
If the node is also the site BDII collector:
    yum install ig_BDII
GILDA rpms:
    yum install gilda_utils

CUSTOMIZE ig-site-info.def
Copy the ig-site-info.def template file provided by ig_yaim into the gilda directory and customize it:
    cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/
Open the copied file in /opt/glite/yaim/etc/gilda/ with a text editor and set the following values according to your grid environment:
    CE_HOST=<hostname of the CE>
    BATCH_SERVER=$CE_HOST

CUSTOMIZE ig-site-info.def (continued)
    WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf
The file specified in WN_LIST must contain the hostnames of all your WNs.
WARNING: it is important to set it up before running the configure command.
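Filling the WN list can be sketched as follows. This is an illustration only: the function name write_wn_list and the example.org hostnames are hypothetical, and the one-hostname-per-line layout is an assumption about the file format.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): write the given WN hostnames to
# a list file, one per line (assumed format), and reject duplicate entries.
write_wn_list() {
    list_file=$1
    shift
    [ $# -ge 1 ] || { echo "no WN hostnames given" >&2; return 1; }
    printf '%s\n' "$@" > "$list_file"
    dups=$(sort "$list_file" | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate WN hostnames: $dups" >&2
        return 1
    fi
}

# Usage (hostnames are placeholders):
#   write_wn_list /opt/glite/yaim/etc/gilda/wn-list.conf wn01.example.org wn02.example.org
```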

CUSTOMIZE ig-site-info.def (continued)
Copy the example users and groups files to /opt/glite/yaim/etc/gilda/:
    cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
    cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/
Append the gilda users and groups definitions to the copied files:
    cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf
    cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf

CUSTOMIZE ig-site-info.def (continued)
    GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
    USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
    JAVA_LOCATION="/usr/java/j2sdk1.4.2_12"
    SITE_NAME=GILDA
    SITE_LOC="Catania, ITALY"
    SITE_LAT=37.5
    SITE_LONG=<site longitude>
    SITE_WEB="<site web URL>"
    SITE_TIER="GILDA Testbed"

CUSTOMIZE ig-site-info.def (continued)
    JOB_MANAGER=lcgpbs
    CE_BATCH_SYS=pbs
    BATCH_BIN_DIR=/usr/bin
    BATCH_VERSION=torque
    CE_CPU_MODEL=Opteron
    CE_CPU_VENDOR=AMD
    CE_CPU_SPEED=3000
    CE_OS="Scientific Linux"
    CE_OS_RELEASE=4.5
    CE_OS_VERSION="SL"
    CE_MINPHYSMEM=2048
    CE_MINVIRTMEM=4096
    CE_SMPSIZE=2
    CE_SI00=1000
    CE_SF00=1200
    CE_OUTBOUNDIP=TRUE
    CE_INBOUNDIP=TRUE

CUSTOMIZE ig-site-info.def (continued)
    DPM_HOST="dpm_hostname"
    SE_LIST="$DPM_HOST"
    SITE_BDII_HOST=$CE_HOST
    BDII_REGIONS="CE SE"
    BDII_CE_URL="ldap://$CE_HOST:2170/mds-vo-name=resource,o=grid"
    BDII_SE_URL="ldap://$DPM_HOST:2170/mds-vo-name=resource,o=grid"
    VOS="gilda"
    ALL_VOMS="gilda"

CUSTOMIZE ig-site-info.def (continued)
    QUEUES="short long infinite"
    SHORT_GROUP_ENABLE=$VOS
    LONG_GROUP_ENABLE=$VOS
    INFINITE_GROUP_ENABLE=$VOS
To configure a queue for a single VO:
    QUEUES="short long infinite gilda"
    SHORT_GROUP_ENABLE=$VOS
    LONG_GROUP_ENABLE=$VOS
    INFINITE_GROUP_ENABLE=$VOS
    GILDA_GROUP_ENABLE="gilda"
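In the settings above, each queue in QUEUES has a matching variable named after the queue in upper case plus _GROUP_ENABLE. A minimal sketch of a consistency check built on that naming pattern follows; the function name check_queue_groups is hypothetical and the check is not part of YAIM.

```shell
#!/bin/sh
# Hypothetical check (not part of YAIM): for each queue in QUEUES, verify that
# the matching <QUEUE>_GROUP_ENABLE variable (queue name in upper case) is set
# and non-empty. Returns 1 if any variable is missing.
check_queue_groups() {
    status=0
    for queue in $QUEUES; do
        var="$(echo "$queue" | tr '[:lower:]' '[:upper:]')_GROUP_ENABLE"
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing $var for queue $queue" >&2
            status=1
        fi
    done
    return $status
}

# Values taken from the slide (single-VO queue variant).
VOS="gilda"
QUEUES="short long infinite gilda"
SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS
GILDA_GROUP_ENABLE="gilda"

check_queue_groups && echo "every queue has a group list"
```

Running such a check before the configure step catches a queue that was added to QUEUES without its group variable.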

CE TORQUE CONFIGURATION
Now we can configure the node:
    /opt/glite/yaim/bin/ig_yaim -c -s /opt/glite/yaim/etc/gilda/<your site-info file> -n ig_CE_torque -n BDII_site

COMPUTING ELEMENT TESTING

TESTING
Check that the local GRIS and the site BDII are running on the CE and are publishing the right information (CPU, site name and so on):
    ldapsearch -x -h <CE hostname> -p <port> -b mds-vo-name=resource,o=grid
    ldapsearch -x -h <CE hostname> -p <port> -b mds-vo-name=<site name>,o=grid

TESTING (continued)
Become a gilda user:
    # su - gilda001
Create a file named test.sh and write:
    #!/bin/sh
    sleep 20    # useful to see the job status
    hostname
Save it and make it executable:
    chmod 700 test.sh
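The same test job can be created without an interactive editor. A minimal sketch, assuming a hypothetical helper named make_test_job that takes the target directory as its argument:

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): write the test job shown above
# into the given directory and make it executable for the owner only.
make_test_job() {
    job_dir=$1
    cat > "$job_dir/test.sh" <<'EOF'
#!/bin/sh
sleep 20   # keeps the job visible in qstat for a while
hostname
EOF
    chmod 700 "$job_dir/test.sh"
}

# Usage, as the gilda001 user:
#   make_test_job "$HOME"
```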

TESTING (continued)
    gilda001]$ qsub -q short test.sh
    gilda001]$ qstat -a
    ce.localdomain:
                                                          Req'd  Req'd   Elap
    Job ID  Username  Queue  Jobname  SessID  NDS  TSK  Memory  Time  S  Time
    wn.localdo gilda001 short test.sh :15 R --

TESTING (continued)
    gilda001]$ qstat -a
    gilda001]$
The job has finished, so we can list the output files:
    gilda001]$ ls
    test.sh.e3 test.sh.o3
And show them:
    gilda001]$ cat test.sh.e3    (error file)
    gilda001]$
    gilda001]$ cat test.sh.o3    (output file)
    wn.localdomain

TESTING (continued)
Log on to the UI:
– hostname: glite-tutor.ct.infn.it
– username: catania
– password: GridCAT
– grid passphrase: CATANIA

TESTING (continued)
    plt]$ voms-proxy-init --voms gilda
    plt]$ globus-job-run grid006.ct.infn.it:2119/jobmanager-lcgpbs -q short /bin/hostname
    wn.localdomain
    plt]$ edg-job-submit -r grid006.ct.infn.it:2119/jobmanager-lcgpbs-short hostname.jdl
    Selected Virtual Organisation name (from proxy certificate extension): gilda
    Connecting to host glite-rb.ct.infn.it, port 7772
    Logging to host glite-rb.ct.infn.it, port 9002
    ********************************************************************************
    JOB SUBMIT OUTCOME
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status.
    Your job identifier (edg_jobId) is:
    -
    ********************************************************************************

FIREWALL SETUP

/etc/sysconfig/iptables (1/2)
    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    :RH-Firewall-1-INPUT - [0:0]
    -A INPUT -j RH-Firewall-1-INPUT
    -A FORWARD -j RH-Firewall-1-INPUT
    -A RH-Firewall-1-INPUT -i lo -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport <port> -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport <port> -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport <port> -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport <port> -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport maui -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs_mom -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs_resmom -j ACCEPT

/etc/sysconfig/iptables (2/2)
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3878:3879 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport <port> -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport <port> -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 1020:1023 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 20000:<port> -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 32768:<port> -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 32768:<port> -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
    -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
    COMMIT
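Most lines of the rules file follow a single pattern that differs only in protocol and destination port. A minimal sketch of a generator for such lines, assuming a hypothetical helper named accept_rule; the ports in the demo calls are the ones that appear in the file above.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): print one ACCEPT rule in the
# format used by the rules file.
# $1 = protocol (tcp or udp), $2 = port, port range, or service name.
accept_rule() {
    proto=$1
    dport=$2
    echo "-A RH-Firewall-1-INPUT -m state --state NEW -m $proto -p $proto --dport $dport -j ACCEPT"
}

accept_rule tcp 22
accept_rule tcp 3878:3879
accept_rule udp 1020:1023
```

The output is plain text for /etc/sysconfig/iptables; the function does not touch the running firewall.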

IPTABLES STARTUP
    /sbin/chkconfig iptables on
    /etc/init.d/iptables start

TROUBLESHOOTING

TROUBLESHOOTING
    plt]$ globus-job-run <CE hostname>:2119/jobmanager-lcgpbs -q short /bin/hostname
    GRAM Job submission failed because the connection to the server failed (check host and port) (error code 12)
Solution: check that the globus-gatekeeper daemon is up and running on the CE.
    plt]$ globus-job-run <CE hostname>:2119/jobmanager-lcgpbs -q short /bin/hostname
    GRAM Job submission failed because authentication failed: GSS Major Status: Authentication Failed
    GSS Minor Status Error Chain:
    init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization
    init_sec_context.c:171: gss_init_sec_context: SSLv3 handshake problems
    globus_i_gsi_gss_utils.c:888: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
    globus_i_gsi_gss_utils.c:847: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials: Couldn't verify the remote certificate
    OpenSSL Error: s3_pkt.c:1046: in library: SSL routines, function SSL3_READ_BYTES: sslv3 alert bad certificate (error code 7)
Solution: the GILDA CA rpm is probably not installed on the CE.
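The two GRAM failures above can be told apart by their error code. A minimal sketch that maps the code to the hint given on the slide; the function name gram_hint is hypothetical, and only the two codes covered here are handled.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): read globus-job-run output on
# stdin and print the hint that matches the GRAM error code.
gram_hint() {
    output=$(cat)
    case $output in
        *"(error code 12)"*) echo "check that the globus-gatekeeper daemon is running on the CE" ;;
        *"(error code 7)"*)  echo "check that the GILDA CA rpm is installed on the CE" ;;
        *)                   echo "no hint for this output" ;;
    esac
}

# Usage (the CE hostname is a placeholder):
#   globus-job-run <CE hostname>:2119/jobmanager-lcgpbs -q short /bin/hostname 2>&1 | gram_hint
```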

TROUBLESHOOTING (continued)
    plt]$ edg-gridftp-ls gsiftp://<CE hostname>/<path>
    error the server sent an error response: LCMAPS credential mapping NOT successful
Solution: check the VO mapping on the CE in:
    /opt/edg/etc/lcmaps/gridmapfile
    /opt/edg/etc/lcmaps/groupmapfile

TROUBLESHOOTING (continued)
The CE is publishing wrong information, such as:
    GlueCEStateFreeCPUs: 0
    GlueCEStateRunningJobs: 0
    GlueCEStateStatus: Production
    GlueCEStateTotalJobs: 0
    GlueCEStateWaitingJobs: 4444
Run the script
    /opt/glite/etc/gip/plugin/glite-info-dynamic-scheduler-wrapper
and check whether it reports any errors. It often fails because the batch system is down or in a locked state. In that case, restart the Torque service:
    /etc/init.d/pbs_server restart
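The symptom above can be detected automatically in the output of an LDAP query. A minimal sketch, assuming a hypothetical function named check_waiting_jobs that reads LDIF text on stdin and looks for the waiting-jobs value shown on the slide:

```shell
#!/bin/sh
# Hypothetical check (not from the slides): read ldapsearch output (LDIF) on
# stdin and return 1 if any CE reports the 4444 waiting-jobs value shown above.
check_waiting_jobs() {
    if grep -q '^GlueCEStateWaitingJobs: 4444$'; then
        echo "suspicious GlueCEStateWaitingJobs value: run the dynamic scheduler wrapper by hand" >&2
        return 1
    fi
    return 0
}

# Usage (host, port and base are placeholders):
#   ldapsearch -x -h <CE hostname> -p <port> -b mds-vo-name=resource,o=grid | check_waiting_jobs
```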

TROUBLESHOOTING (continued)
If a query to the site BDII does not show the information about a site, look at the BDII log file /opt/bdii/var/bdii.log. For example:
    GILDA: ldap_bind: Can't contact LDAP server
Check that:
– bdii is up and running (ps aux | grep bdii)
– the resource URL is in the list file /opt/glite/etc/gip/site-urls.conf
– the firewall setup allows the connection
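The second check in the list can be scripted. A minimal sketch under stated assumptions: the function name resource_is_listed is hypothetical, the example.org hostname is a placeholder, and the file is assumed to contain one "NAME ldap://host:port/base" entry per line.

```shell
#!/bin/sh
# Hypothetical check (not from the slides): succeed if the given host appears
# in an ldap:// URL in the site-urls file.
# Assumed line format: NAME ldap://host:port/base
resource_is_listed() {
    urls_file=$1
    host=$2
    # -F: match the host literally, so dots are not treated as wildcards
    grep -qF "ldap://$host:" "$urls_file"
}

# Usage (hostname is a placeholder):
#   resource_is_listed /opt/glite/etc/gip/site-urls.conf ce.example.org || echo "resource not published"
```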
