
1 Latest WMS news and more
ALICE TF Meeting 06/11/08

2 WMS: Current situation (I)
The (WMS usage) random-distribution implementation has been in place at CERN and Torino for more than a week, with good results.
Case: a WMS is temporarily overloaded.
Problem: jobs will be held back and then submitted in one bunch.
Solution: a «drain flag» definition is foreseen for the WMS.
In this case, if one WMS is overloaded, submission passes automatically to the 2nd WMS defined (UI feature).
This holds only if the list of WMS contains multiple nodes, as sketched below.
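As an illustration, a UI configuration listing more than one WMProxy endpoint could look like the following minimal sketch; the hostnames are placeholders (not the actual CERN/Torino nodes) and the endpoint paths follow the usual gLite WMProxy convention:

    [
      VirtualOrganisation = "alice";
      // Two endpoints: if the first node is drained or overloaded,
      // the UI can fall through to the second one.
      WMProxyEndpoints = {
        "https://wms-a.example.org:7443/glite_wms_wmproxy_server",
        "https://wms-b.example.org:7443/glite_wms_wmproxy_server"
      };
      MyProxyServer = "myproxy.cern.ch";
    ]

The same structure appears in the VOBOX .vo.conf file shown on the next slide.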

3 WMS: Current situation (II)
In order to exploit the full potential of the drain-flag feature, we should:
Use RB1 OR RB2 OR RB3; if all of these WMS fail, use RB4 OR RB5 OR RB6.
The corresponding code is now implemented at CERN and in Torino.
LDAP configuration: wms1;wms2,wms3;wms4
1st group: wms1;wms2 / 2nd group: wms3;wms4
Inside the VOBOX, this translates into one configuration file per group, e.g.:
$HOME/alien-logs/wms103.cern.ch;wms109.cern.ch.vo.conf
where this file looks like:
[
VirtualOrganisation = "alice";
WMProxyEndpoints = {"…"};
MyProxyServer = "myproxy.cern.ch";
]
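A minimal sketch of how such a grouped list could be walked, assuming (as read above) that groups are separated by ',' and members within a group by ';'. This is not the actual VOBOX code, and submit_jobs is a hypothetical helper:

    #!/bin/bash
    # Hypothetical sketch (not the real VOBOX code): try the WMS groups
    # in order and stop at the first group whose submission succeeds.
    WMS_LIST="wms1;wms2,wms3;wms4"

    IFS=',' read -ra WMS_GROUPS <<< "$WMS_LIST"
    for group in "${WMS_GROUPS[@]}"; do
        IFS=';' read -ra NODES <<< "$group"
        echo "Trying WMS group: ${NODES[*]}"
        # submit_jobs "${NODES[@]}" && exit 0   # hypothetical submission helper
    done
    echo "All WMS groups failed" >&2
    exit 1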

4 WMS news
GRIF (France) will provide ALICE with a WMS (latest version) in a few days.
The configuration has already been discussed this morning with the site.
NIKHEF (NL) already has one WMS in testing, which will also be provided to ALICE soon.
The GD team is encouraging us to take a more direct approach with the sites regarding the CREAM setup.
We must pursue this negotiation site by site.

5 Bugs affecting ALICE (I)
Problem: if none of the listed WMS can accept the job requirements, a random WMS in the Grid will be chosen, and it will typically not be registered in the MyProxy server.
Solution: EnableServiceDiscovery = false;
In addition, remember the field suggested last week:
Problem: if not specified, a job request that fails before arriving at the WN will be resent up to 10 times.
Solution: ShallowRetryCount = 0; (shallow resubmission)
This is what we already have: RetryCount = 0; (deep resubmission)
Difference: the resubmission is deep when the job fails after it has started running on the WN, and shallow otherwise.
These attributes are collected in the sketch below.
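Put together, the three attributes would read as follows in JDL/ClassAd syntax; the values are the ones given above, while the placement note is our understanding (EnableServiceDiscovery is typically set in the UI configuration, the retry counts either there or in the job JDL):

    [
      EnableServiceDiscovery = false;  // never fall back to a randomly chosen WMS
      ShallowRetryCount = 0;           // no resubmission if the job fails before starting on the WN
      RetryCount = 0;                  // no resubmission if the job fails after starting on the WN
    ]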

6 Bugs affecting ALICE (II)
Problem: when the target WMS node is in drain mode, job submission may hang as follows:

glite-wms-job-submit -a -c ui/glite_wms_wms116.conf --noint simple.jdl
Connecting to the service …
Warning - Unable to register the job to the service: Unavailable service (the server is temporarily drained)
Method: jobRegister

<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="…" xmlns:SOAP-ENC="…" xmlns:xsi="…" xmlns:xsd="…" xmlns:delegationns="…" xmlns:ns1="…">
  <SOAP-ENV:Body>
    <delegationns:getProxyReq>
      <delegationID>Jze-MmXbCVwyttoaLL9lbQ</delegationID>
    </delegationns:getProxyReq>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Solution: the workaround is to submit jobs with stdin redirected from /dev/null:
glite-wms-job-submit < /dev/null
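Applied to the failing invocation above, the full workaround command combines the original arguments with the suggested redirection:

    glite-wms-job-submit -a -c ui/glite_wms_wms116.conf --noint simple.jdl < /dev/null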

7 In addition: SLC5 tests in place
We have been asked to give the deployment and FIO teams feedback on the experiments' experience running on SLC5.
The whole setup was done this week on voalice03.
After several configuration issues, solved directly with FIO, the system is running perfectly for ALICE.
More than 14 jobs were running concurrently this morning.
Working in compatibility mode (all software compiled for SLC4).
The current configuration is 32-bit; the final one must be 64-bit mode:
Upgrade foreseen for the beginning of next week.

