FP6−2004−Infrastructures−6-SSA-026409 www.eu-eela.org E-infrastructure shared between Europe and Latin America Giuseppe Platania INFN Catania First EELA.

Presentation transcript:

Troubleshooting of common problems
Giuseppe Platania, INFN Catania
First EELA ROC-on-Duty Tutorial
Itacuruçá Island, State of Rio de Janeiro, Brazil, 29 November December 2006
FP6−2004−Infrastructures−6-SSA, E-infrastructure shared between Europe and Latin America

Outline
– SECURITY
– JOB SUBMISSION
– SITE BDII

SECURITY

Security (1/5)
GRAM Authentication test failure:
– Test: globusrun -a -r
    GRAM Authentication test failure: authentication failed: GSS Major Status: Authentication Failed
    GSS Minor Status Error Chain:
    init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization
    init_sec_context.c:171: gss_init_sec_context: SSLv3 handshake problems
    globus_i_gsi_gss_utils.c:888: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
    globus_i_gsi_gss_utils.c:847: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials: Couldn't verify the remote certificate
    OpenSSL Error: s3_pkt.c:1046: in library: SSL routines, function SSL3_READ_BYTES: sslv3 alert bad certificate
– Solutions: on the CE, check whether the CA RPM is installed or whether port 2119 is closed by a firewall.
– More details: page 2 of the troubleshooting guide.
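
The firewall half of this check can be scripted from the UI. The following is a minimal Python sketch using only the standard library; the function name `port_is_open` and the host name in the usage note are hypothetical, and only port 2119 comes from the slide.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("ce.example.org", 2119)` could be run from the UI. A `False` result does not distinguish a firewall from a stopped gatekeeper, so the service status on the CE still has to be checked separately.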

Security (2/5)
Invalid CRL: The available CRL has expired:
– Test: globusrun -a -r
    GSS authentication failure
    GSS Major Status: Authentication Failed
    GSS Minor Status Error Chain:
    accept_sec_context.c:170: gss_accept_sec_context: SSLv3 handshake problems
    globus_i_gsi_gss_utils.c:881: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
    globus_i_gsi_gss_utils.c:854: globus_i_gsi_gss_handshake: SSLv3 handshake problems: Couldn't do ssl handshake
    OpenSSL Error: s3_srvr.c:1816: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
    globus_gsi_callback.c:351: globus_i_gsi_callback_handshake_callback: Could not verify credential
    globus_gsi_callback.c:477: globus_i_gsi_callback_cred_verify: Could not verify credential
    globus_gsi_callback.c:769: globus_i_gsi_callback_check_revoked: Invalid CRL: The available CRL has expired
    Failure: GSS failed Major:000a0000 Minor: Token:
– Solutions: on the CE, check whether the CRL has expired (see /var/log/globus-gatekeeper.log). If so, run: /opt/glite/libexec/fetch-crl.sh
– More details: pages 3-4 of the troubleshooting guide.
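
CRL expiry can also be checked directly rather than inferred from the gatekeeper log. The sketch below assumes the expiry date is obtained with `openssl crl -in <file> -noout -nextupdate`, which prints a line of the form `nextUpdate=Dec  1 12:00:00 2006 GMT`; the function names are hypothetical, and where the CRL files live on a given CE is left to the reader.

```python
from datetime import datetime, timezone


def crl_next_update(openssl_line):
    """Parse the 'nextUpdate=...' line printed by 'openssl crl -noout -nextupdate'."""
    label, _, value = openssl_line.strip().partition("=")
    if label != "nextUpdate":
        raise ValueError("not a nextUpdate line: %r" % openssl_line)
    # %b relies on English month names, i.e. the C/English locale.
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def crl_is_expired(openssl_line, now=None):
    """Return True if the CRL's nextUpdate time is already in the past."""
    if now is None:
        now = datetime.now(timezone.utc)
    return crl_next_update(openssl_line) < now
```

A CRL reported as expired by this check is a candidate for the fetch-crl run named in the slide.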

Security (3/5)
FTPD GSSAPI error: GSS Major Status: General failure:
– Test: edg-gridftp-ls gsiftp:// /
    error the server sent an error response: FTPD GSSAPI error: GSS Major Status: Authentication Failed
    535-FTPD GSSAPI error: GSS Minor Status Error Chain:
    535-FTPD GSSAPI error: accept_sec_context.c:170: gss_accept_sec_context: SSLv3 handshake problems
    535-FTPD GSSAPI error: globus_i_gsi_gss_utils.c:881: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
    535-FTPD GSSAPI error: globus_i_gsi_gss_utils.c:854: globus_i_gsi_gss_handshake: SSLv3 handshake problems: Couldn't do ssl handshake
    535-FTPD GSSAPI error: OpenSSL Error: s3_srvr.c:1816: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
    535-FTPD GSSAPI error: globus_gsi_callback.c:351: globus_i_gsi_callback_handshake_callback: Could not verify credential
    535-FTPD GSSAPI error: globus_gsi_callback.c:436: globus_i_gsi_callback_cred_verify: The certificate has expired: Credential with subject: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN Catania/CN=Giuseppe has expired.
    535 FTPD GSSAPI error: accepting context
– Solutions: synchronize the system clocks of the nodes.
– More details: page 5 of the troubleshooting guide.

Security (4/5)
No local mapping for Globus ID
– Test: edg-gridftp-ls gsiftp:// /
    error the server sent an error response: LCMAPS credential mapping NOT successful (see /var/log/gridftp-lcas_lcmaps.log)
    LCMAPS 0: :57: : lcmaps_plugin_voms_poolaccount-plugin_run(): no match (or no poolaccount available) for group (/VO=gilda/GROUP=/gilda) in /opt/edg/etc/lcmaps/gridmapfile
– Solutions: ensure that /etc/grid-security/gridmapdir contains the pool account files (such as gildaxxx).
– More details: pages 6-8 of the troubleshooting guide.

Security (5/5)
LCMAPS credential mapping NOT successful:
– Test: edg-gridftp-ls gsiftp:// /
    error the server sent an error response: LCMAPS credential mapping NOT successful (see /var/log/gridftp-lcas_lcmaps.log)
    LCMAPS 0: :57: : lcmaps_plugin_voms_poolaccount-plugin_run(): no match (or no poolaccount available) for group (/VO=gilda/GROUP=/gilda) in /opt/edg/etc/lcmaps/gridmapfile
– Solutions: check whether:
  o the VO is enabled
  o /opt/edg/etc/lcmaps/gridmapfile contains the VOMS entries
  o all pool accounts are already in use
– More details: page 9 of the troubleshooting guide.
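
Both pool account checks (files present, accounts exhausted) can be done by inspecting the gridmapdir. The sketch below relies on an assumption that is not stated in the slides: that a pool account is leased by hard-linking its file to a file named after the user's encoded DN, so a pool account file with a link count of 1 is still free. The function name and the `gilda` prefix in the usage note are illustrative.

```python
import os


def pool_account_usage(gridmapdir, prefix):
    """Return (total, free) for pool account files whose names start with prefix.

    Assumption: a leased account file is hard-linked to a DN-named file,
    so a link count of 1 means the account is not in use.
    """
    total = 0
    free = 0
    for name in os.listdir(gridmapdir):
        path = os.path.join(gridmapdir, name)
        if not name.startswith(prefix) or not os.path.isfile(path):
            continue
        total += 1
        if os.lstat(path).st_nlink == 1:
            free += 1
    return total, free
```

Calling `pool_account_usage("/etc/grid-security/gridmapdir", "gilda")` on the CE would return `(0, 0)` if the pool account files are missing, and a free count of 0 with a non-zero total if every account is leased.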

JOB SUBMISSION

Job Submission (1/8)
7 authentication failed:
– Reasons from edg-job-get-logging-info:
    7 authentication failed: GSS Major Status: Authentication Failed
    GSS Minor Status Error Chain:
    init.c:497: globus_gss_assist_init_sec_context_async: Error during context initialization
    init_sec_context
– Solutions: check for a reverse lookup problem in /etc/hosts on the client side or in the DNS configuration.
– More details: page 10 of the troubleshooting guide.
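
A reverse lookup problem means that the address a host name resolves to does not map back to the same name. A minimal sketch of that round trip follows; the function name is hypothetical, and the resolver functions are passed as parameters only so the logic can be exercised without a network.

```python
import socket


def reverse_lookup_matches(hostname,
                           resolve=socket.gethostbyname,
                           reverse=socket.gethostbyaddr):
    """Return True if hostname -> address -> name leads back to hostname."""
    try:
        address = resolve(hostname)
        primary, aliases, _addresses = reverse(address)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    names = {primary.lower()} | {alias.lower() for alias in aliases}
    return hostname.lower() in names
```

Run on the client with the fully qualified name of the remote service host; a `False` result points at /etc/hosts or the DNS zone, as the slide suggests.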

Job Submission (2/8)
Cannot read JobWrapper output.... :
– Reasons from edg-job-get-logging-info:
    Cannot read JobWrapper output, both from Condor and from Maradona
– Solutions:
  • Fix the WN, CE, DNS or batch system configuration.
  • Check the PBS status by running pbsnodes and qstat.
  • Try restarting the PBS daemons on the CE (and on the WN).
  • The gatekeeper and the gridftpd on the CE might not map the DN to the same local user.
  • This can happen if one service is configured to use VOMS (via LCMAPS) while the other relies on the standard grid-mapfile. Test this as follows:
    $ globus-job-run my-CE /usr/bin/id
    $ globus-url-copy file:/etc/group gsiftp://my-CE/tmp/test
    $ globus-job-run my-CE /bin/ls -l /tmp/test
– More details: see the troubleshooting guide.
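
The three test commands show a mapping mismatch when the user printed by `id` (gatekeeper mapping) differs from the owner of /tmp/test in the `ls -l` listing (gridftpd mapping). The comparison can be sketched as follows, assuming the usual `uid=N(name) ...` format of `id` and the owner in the third column of `ls -l`; the function names are hypothetical.

```python
import re


def user_from_id_output(id_output):
    """Extract the user name from output such as 'uid=18118(gilda001) gid=...'."""
    match = re.search(r"uid=\d+\(([^)]+)\)", id_output)
    return match.group(1) if match else None


def owner_from_ls_output(ls_output):
    """Extract the owner (third column) from a single 'ls -l' line."""
    fields = ls_output.split()
    return fields[2] if len(fields) >= 3 else None


def same_local_user(id_output, ls_output):
    """Return True if both services mapped the DN to the same local user."""
    user = user_from_id_output(id_output)
    return user is not None and user == owner_from_ls_output(ls_output)
```

If `same_local_user` returns `False` for the captured outputs, the gatekeeper and gridftpd mapping configurations on the CE should be compared.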

Job Submission (3/8)
Brokerhelper: Cannot plan. No compatible resources:
– Reasons from edg-job-get-logging-info:
    Cannot plan (a helper failed)
– Solutions: the status "Cannot plan (a helper failed)" means that a helper of the Workload Manager failed. Matchmaking may fail because of a failing middleware component, unavailable application software, or a wrong request in the job JDL:
  • middleware failure: caused by Information Service problems (the service is down)
  • application software unavailable: the JDL requires a software version that is wrong or not supported by the site
  • wrong user request: the user asks for an unsupported CPU type, operating system, memory size or VO
– More details: see the troubleshooting guide.

Job Submission (4/8)
ssh problem from WN to CE:
– Test: globus-job-run :2119/jobmanager-lcgpbs -q short /bin/id
    The command returns no output.
– Solutions:
  • Remove the shosts.equiv and ssh_known_hosts files from the /etc/ssh directory on the CE.
  • Re-run the following scripts on the CE (they usually also run as cron jobs):
    /opt/edg/sbin/edg-pbs-knownhosts
    /opt/edg/sbin/edg-pbs-shostsequiv
  • Re-run the following script on the WN (it usually also runs as a cron job):
    /opt/edg/sbin/edg-pbs-knownhosts
– More details: page 15 of the troubleshooting guide.

Job Submission (5/8)
submit-helper script... gave error: cache export dir...:
– Test: globus-job-run :2119/jobmanager-lcgpbs /bin/id
    submit-helper script running on host lxb1761 gave error: cache_export_dir (/home/dteam002/.lcgjm/globus-cache-export.Of5sOd) on gatekeeper did not contain a cache_export_dir.tar archive
– Solutions:
  • The CE is not running a gridftp daemon. Check on the CE:
    o /etc/init.d/globus-gridftp status
    o restart it as needed
  • The gridftp port could be closed.
– More details: see the troubleshooting guide.

Job Submission (6/8)
Globus error 3:
– Reason from edg-job-get-logging-info:
    Got a job held event, reason: Globus error 3: an I/O operation failed
– Solutions: the cause is very low memory. queue_submit() in GRAM's Helper.pm checks the memory and returns a NORESOURCES error if the free memory is less than 2% of the total. NORESOURCES is GRAM error 3, so the error is not necessarily I/O-related. Identify the WN that has this problem and reboot it.
– More details: page 18 of the troubleshooting guide.
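
The 2% rule can be approximated on a suspect node by reading /proc/meminfo. This is only a sketch: it assumes that "free memory" corresponds to the MemFree field and "total" to MemTotal, which may not be exactly how Helper.pm measures it, and the function names are hypothetical.

```python
def free_memory_fraction(meminfo_text):
    """Return MemFree / MemTotal from the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # The first field is the numeric value; a 'kB' unit may follow.
            values[key.strip()] = int(fields[0])
    return values["MemFree"] / values["MemTotal"]


def below_threshold(meminfo_text, threshold=0.02):
    """Return True if free memory is below the given fraction of the total."""
    return free_memory_fraction(meminfo_text) < threshold
```

On a node, the input would come from `open("/proc/meminfo").read()`; a `True` result is consistent with the NORESOURCES condition described in the slide.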

Job Submission (7/8)
Unspecified gridmanager error:
– Reason from edg-job-get-logging-info:
    Got a job held event, reason: Unspecified gridmanager error
– Solutions: either the user does not have permission to submit to the given queue, or the batch system is in a bad state. Check both in the configuration of the CE.
– More details: page 19 of the troubleshooting guide.

Job Submission (8/8)
GRAM Job submission failed:
– Test: globus-job-run :2119/jobmanager-lcgpbs /bin/id
    GRAM Job submission failed because the job manager failed to open stderr (error code 74)
– Solutions: the UI does not have inbound connectivity for the GLOBUS_TCP_PORT_RANGE ( ). Fix the UI's firewall.
– More details: page 20 of the troubleshooting guide.
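
When comparing the UI firewall rules with GLOBUS_TCP_PORT_RANGE, it helps to parse the variable instead of reading it by eye. The sketch below assumes the value holds two port numbers separated by a comma or by whitespace; the function names and the range used in the usage note are hypothetical and are not the values from the original slide.

```python
import re


def parse_port_range(value):
    """Parse a 'min,max' or 'min max' port range into a (low, high) tuple."""
    parts = [part for part in re.split(r"[,\s]+", value.strip()) if part]
    if len(parts) != 2:
        raise ValueError("expected two port numbers, got %r" % value)
    low, high = int(parts[0]), int(parts[1])
    if not 0 < low <= high <= 65535:
        raise ValueError("invalid port range: %r" % value)
    return low, high


def port_in_range(port, value):
    """Return True if port lies inside the parsed range (inclusive)."""
    low, high = parse_port_range(value)
    return low <= port <= high
```

With a hypothetical setting of `"40000,40100"`, `port_in_range(40050, os.environ["GLOBUS_TCP_PORT_RANGE"])` would tell whether a port opened in the firewall is one the Globus client may actually use (this usage line additionally needs `import os`).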

SITE BDII

Site BDII (1/3)
Check if the GIIS is running on the CE:
    ldapsearch -x -h -p b mds-vo-name=,o=grid
    ldap_bind: Can't contact LDAP server
Solution: check if the site BDII is running on the CE:
  o /etc/init.d/bdii status
  o restart it as needed
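
The same query can be wrapped in a small script for repeated checks. This sketch builds the command with the `-H ldap://host:port` URI form accepted by OpenLDAP's ldapsearch instead of separate host and port options; the function names are hypothetical, and the host, port and site name in the usage note are placeholders to be replaced with the site's own values.

```python
import subprocess


def ldapsearch_command(host, port, base):
    """Build an anonymous (simple bind) ldapsearch command line."""
    return ["ldapsearch", "-x", "-H", "ldap://%s:%d" % (host, port), "-b", base]


def query_succeeds(command, timeout=30):
    """Run the command and return True if it exits with status 0."""
    try:
        result = subprocess.run(command, capture_output=True, text=True,
                                timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ldapsearch is not installed, or the server did not answer in time.
        return False
    return result.returncode == 0
```

For example, `query_succeeds(ldapsearch_command("ce.example.org", 2170, "mds-vo-name=MY-SITE,o=grid"))`, where all three arguments are illustrative. A `False` result corresponds to the "Can't contact LDAP server" case and leads to the status check in the slide.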

Site BDII (2/3)
The Site BDII doesn't publish CE information:
Ensure that /opt/bdii/etc/bdii-update.conf contains the CE LDAP URL, such as:
    CE ldap://ce.localdomain:2135/mds-vo-name=local,o=grid
Solution: put the CE LDAP URL in /opt/bdii/etc/bdii-update.conf and restart the BDII service (see /opt/bdii/var/bdii.log to check for errors).
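
Whether the CE entry is present can be verified by parsing the file. The sketch assumes each entry is a name followed by an LDAP URL on one line, as in the example from the slide, and that lines starting with `#` are comments (the latter is an assumption); the function name is hypothetical.

```python
def parse_bdii_update_conf(text):
    """Return a dict mapping source names to LDAP URLs from bdii-update.conf text."""
    sources = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        # Keep only well-formed 'NAME ldap://...' entries.
        if len(parts) == 2 and parts[1].startswith("ldap://"):
            sources[parts[0]] = parts[1].strip()
    return sources
```

If `"CE"` is missing from the result for the contents of /opt/bdii/etc/bdii-update.conf, the site BDII has no source to pull the CE information from.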

Site BDII (3/3)
Site BDII error:
    tail -f /opt/bdii/var/bdii.log
    CE: ldap_bind: Can't contact LDAP server
    Time for searches: 0 s
    Time to update DB: 1 s
    Grabbing port 2170 for 2172
    Tue Sep 19 11:47:45 CEST 2006
    Sleeping for 30
Solution: ensure that the GRIS is running:
  o /etc/init.d/globus-mds restart
  o restart it as needed
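
On a site with several information sources, the log can be scanned for the sources the BDII could not reach. The sketch assumes that each log entry sits on its own line and that a failure is reported as `<source name>: ldap_bind: Can't contact LDAP server`, as in the excerpt from the slide; the function name is hypothetical.

```python
MARKER = "ldap_bind: Can't contact LDAP server"


def unreachable_sources(log_text):
    """Return the source names reported as unreachable, in order of appearance."""
    names = []
    for line in log_text.splitlines():
        name, sep, message = line.strip().partition(": ")
        if sep and message.strip() == MARKER and name not in names:
            names.append(name)
    return names
```

Each name returned corresponds to an entry in bdii-update.conf whose GRIS should be checked on the host it points to.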