Latest WMS news and more


Latest WMS news and more ALICE TF Meeting 06/11/08

WMS: Current situation (I)
The random distribution of WMS usage has been in place at CERN and Torino for more than a week, with good results.
Case: a WMS is temporarily overloaded.
Problem: jobs are held and then submitted in one bunch.
Solution: a «drain flag» definition is foreseen for the WMS. If one WMS is overloaded, the submission passes automatically to the 2nd WMS defined (a UI feature). This holds only if the list of WMS contains multiple nodes.
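As a rough illustration of the behaviour described on this slide, the sketch below tries the listed WMS in random order and moves on when one refuses the job. It is not the AliEn or gLite UI code: `submit_with_failover` and the `try_submit` callback are hypothetical names, and "refuses" simply stands in for a drained or overloaded WMS.

```python
import random


def submit_with_failover(endpoints, try_submit, rng=None):
    """Try each WMS once, in random order, and return the one that accepted.

    endpoints  -- list of WMS names (hypothetical placeholders)
    try_submit -- hypothetical callable: returns True if the WMS accepts the job,
                  False if it is drained or overloaded
    rng        -- optional random.Random instance, useful for reproducible tests
    """
    rng = rng or random.Random()
    order = list(endpoints)
    rng.shuffle(order)  # random distribution of the load across the list
    for endpoint in order:
        if try_submit(endpoint):
            return endpoint
    return None  # every WMS in the list refused the job
```

With a single entry in the list there is nothing to fall back to, which matches the remark that the feature needs multiple nodes.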

WMS: Current situation (II)
To exploit the full potential of the drain flag feature, the submission logic should be: use RB1 OR RB2 OR RB3. If all these WMS fail, use RB4 OR RB5 OR RB6.
This logic is now implemented at CERN and in Torino.
LDAP configuration: wms1;wms2,wms3;wms4 (1st group: wms1;wms2, 2nd group: wms3;wms4)
On the VOBOX, this corresponds to the following file:
    $HOME/alien-logs/wms103.cern.ch;wms109.cern.ch.vo.conf
The file looks like this:
    [
    VirtualOrganisation = "alice";
    WMProxyEndpoints = {"https://wms103.cern.ch:7443/glite_wms_wmproxy_server","https://wms109.cern.ch:7443/glite_wms_wmproxy_server"};
    MyProxyServer = "myproxy.cern.ch";
    ]
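The mapping from the LDAP string to the per-group configuration file can be sketched as follows. This is a minimal illustration, not the AliEn implementation: the function names are hypothetical, and it assumes that commas separate groups and semicolons separate the WMS inside a group, which is how the file name on the slide reads. The endpoint URL pattern and the default values are copied from the example file.

```python
def parse_wms_groups(ldap_value):
    """Split 'wms1;wms2,wms3;wms4' into [['wms1', 'wms2'], ['wms3', 'wms4']].

    Assumption: ',' separates groups and ';' separates hosts within a group.
    """
    groups = []
    for group in ldap_value.split(","):
        hosts = [h.strip() for h in group.split(";") if h.strip()]
        if hosts:
            groups.append(hosts)
    return groups


def conf_filename(hosts):
    """File name for one group, following the pattern shown on the slide."""
    return ";".join(hosts) + ".vo.conf"


def render_conf(hosts, vo="alice", myproxy="myproxy.cern.ch"):
    """Render the configuration text for one group of WMS hosts."""
    endpoints = ",".join(
        '"https://{}:7443/glite_wms_wmproxy_server"'.format(h) for h in hosts
    )
    return (
        "[\n"
        '  VirtualOrganisation = "{}";\n'
        "  WMProxyEndpoints = {{{}}};\n"
        '  MyProxyServer = "{}";\n'
        "]\n"
    ).format(vo, endpoints, myproxy)
```

Calling `render_conf(["wms103.cern.ch", "wms109.cern.ch"])` reproduces the three attributes of the example file, with both endpoints in a single `WMProxyEndpoints` list.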

WMS news
GRIF (France) will provide ALICE with a WMS (latest version) in a few days. The configuration was already discussed with the site this morning.
NIKHEF (NL) already has one WMS in testing, which will also be provided to ALICE soon.
The GD team is encouraging us to approach the sites more directly regarding the CREAM setup. We must negotiate this site by site.

Bugs affecting ALICE (I)
Problem: if none of the listed WMS can satisfy the job requirements, a random WMS in the Grid is chosen, and it will typically not be registered with the MyProxy server.
Solution: EnableServiceDiscovery = false;
In addition, remember the field suggested last week:
Problem: if not specified, a job request that fails before reaching the WN is resent up to 10 times.
Solution: ShallowRetryCount = 0; (shallow resubmission)
This is what we already have: RetryCount = 0; (deep resubmission)
Difference: the resubmission is deep when the job fails after it has started running on the WN, and shallow otherwise.
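The distinction between the two counters can be expressed as a small decision function. This is a simplified model of the rule stated on the slide, not WMS code: `may_resubmit` and its parameters are hypothetical names, and the defaults of 0 mirror the settings recommended above.

```python
def may_resubmit(started_on_wn, deep_done, shallow_done,
                 retry_count=0, shallow_retry_count=0):
    """Decide whether a failed job would be resubmitted under this model.

    started_on_wn -- True if the job failed after it started running on the WN
    deep_done     -- deep resubmissions already performed
    shallow_done  -- shallow resubmissions already performed
    """
    if started_on_wn:
        # Deep resubmission: limited by RetryCount.
        return deep_done < retry_count
    # Shallow resubmission: limited by ShallowRetryCount.
    return shallow_done < shallow_retry_count
```

With both counters at 0 the function always returns False, so a failed job is never resent automatically. With the limit of 10 mentioned on the slide, a job failing before it reaches the WN would be resent until `shallow_done` reaches 10.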

Bugs affecting ALICE (II)
Problem: when the target WMS node is in drain mode, job submission may hang as follows:
    glite-wms-job-submit -a -c ui/glite_wms_wms116.conf --noint simple.jdl
    Connecting to the service https://wms116.cern.ch:7443/glite_w...
    Warning - Unable to register the job to the service: https://wms116.cern.ch:7443/glite_w...
    Unavailable service (the server is temporarily drained)
    Method: jobRegister
    <?xml version="1.0" encoding="UTF-8"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/env..." xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/enc..." xmlns:xsi="http://www.w3.org/2001/XMLSchema-in..." xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:delegationns="http://www.gridsite.org/namespaces/..." xmlns:ns1="http://glite.org/wms/wmproxy">
    <SOAP-ENV:Body>
    <delegationns:getProxyReq>
    <delegationID>Jze-MmXbCVwyttoaLL9lbQ</delegationID>
    </delegationns:getProxyReq>
    </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
Solution: the workaround is to submit jobs with stdin redirected from /dev/null:
    glite-wms-job-submit ..... < /dev/null
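When the submission is driven from a script rather than a shell, the same workaround can be applied by connecting the child's stdin to the null device. The sketch below assumes Python on the VOBOX; the function names are hypothetical, the command-line flags are exactly those shown on the slide, and the timeout is an extra safeguard that is not part of the original workaround.

```python
import subprocess
import sys


def build_submit_command(conf_path, jdl_path):
    """Return the argument list for the submission command shown on the slide."""
    return ["glite-wms-job-submit", "-a", "-c", conf_path, "--noint", jdl_path]


def run_without_stdin(argv, timeout=300):
    """Run argv with stdin attached to the null device (like '< /dev/null').

    The timeout (an assumption, in seconds) raises subprocess.TimeoutExpired
    if the command still hangs.
    """
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # Stand-in for the real CLI: a child process that reads all of stdin.
    # It returns immediately because stdin is empty.
    demo = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
    print(run_without_stdin(demo).stdout.strip())
```

On a machine with the gLite UI installed, `run_without_stdin(build_submit_command("ui/glite_wms_wms116.conf", "simple.jdl"))` would correspond to the shell form given above.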

In addition: SLC5 tests in place
We have been asked to give the deployment and FIO teams feedback on the experiment's experience of running on SLC5.
The whole setup was done this week on voalice03.
After several configuration issues were solved directly with FIO, the system is running perfectly for ALICE.
More than 14 jobs were running concurrently this morning.
We are working in compatibility mode (all s/w compiled on SLC4).
The current configuration is 32-bit; the final one must be 64-bit. The upgrade is foreseen at the beginning of next week.