EGEE is a project funded by the European Union under contract IST-2003-508833 LCG open issues Massimo Sgaravatto INFN Padova JRA1 IT-CZ cluster meeting,

1 LCG open issues
Massimo Sgaravatto, INFN Padova
JRA1 IT-CZ cluster meeting, November 4-5, 2004
EGEE is a project funded by the European Union under contract IST-2003-508833

2 Problems hopefully already addressed
The bugs below are still open in the LCG Savannah, but they have already been addressed:
  - Patches provided (by us, or by LCG)
  - Still open because the patches are under test or still to be tested
#3252, #3546, #3807, #3848, #3883, #3884, #3895, #3896, #3900, #3916, #4009, #4047, #4070, #4098, #4109, #4127, #4144, #4378, #4836, #4891, #4909, #5237, #5238, #5244, #5261, #5269, #5427

3 Issues not addressed yet
#3302: On a RB+SE node there is a GridFTP problem
  - Asked LCG for clarification: no answer
  - Not considered a high-priority problem
#3671: To drain an RB
  - They would like to make it possible to disallow new submissions while still allowing the other commands
  - Not addressed yet: only suggested, as a trick, to set MaxInputSandboxSize=0
  - Doesn’t work for jobs without ISB
#3724: LogMonitor should be resilient to full file system
  - Still to be understood why irepository.dat could not be recovered
#3808: NetworkServer must log from which UI the job was submitted
  - A patch was provided, but it logs the UI address and the user DN in *separate* messages (and it is not possible to unambiguously connect them)
  - Asked whether they could use the LB info instead: no answer

4 Issues not addressed yet
#3871: edg-wl-bkserverd: Terminating after 500 connections
  - 'event_store_recover': likely an inter-thread locking bug, which must be investigated
  - MarcoP agreed with D. Smith to provide a patch for all these bugs
#4319: Suggestion for change of policy for resubmitted jobs
  - Basically they (D. Smith) think that if the job doesn’t even start its execution on a WN, this should not be counted as a (re)submission
  - They'd want to be confident that the user payload of the previous attempts really has never started. However, they don't require the same level of certainty in the opposite case
  - The “shallow resubmissions” should be limited by a configurable maximum number of attempts in the broker configuration OR by virtue of the fact that the shallow resubmission would need to target a previously tried CEid
  - They would like a fix in the near future (~1 month)
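The policy requested in #4319 can be illustrated with a small counting sketch. This is only an illustration of the idea, not the WMS implementation: the class `RetryBudget`, its fields, and the two separate limits are hypothetical names chosen here. Following the request that one must be confident the payload never started, an unknown state (`None`) is counted as an ordinary resubmission.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryBudget:
    """Hypothetical per-job counters for ordinary and shallow resubmissions."""
    max_deep: int        # assumed limit on ordinary resubmissions
    max_shallow: int     # assumed configurable limit in the broker configuration
    deep: int = 0
    shallow: int = 0

    def record_failure(self, payload_started: Optional[bool]) -> bool:
        """Record a failed attempt and return True if the job may be retried.

        Only an attempt known not to have started (payload_started is False)
        counts as shallow; True or None (unknown) counts as ordinary.
        """
        if payload_started is False:
            self.shallow += 1
            return self.shallow <= self.max_shallow
        self.deep += 1
        return self.deep <= self.max_deep


budget = RetryBudget(max_deep=1, max_shallow=2)
print(budget.record_failure(False))  # job never reached the WN: shallow retry
print(budget.record_failure(None))   # unknown state: treated as ordinary retry
```

Keeping the two counters separate means failures before the job reaches a WN do not consume the user's ordinary retry count, while the shallow limit still bounds the total number of attempts.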

5 Issues not addressed yet
#2716, #4126, #4894
  - Problems with NS affecting the same portion of code
#4570: Multiple cancel requests can crash WM (and possibly PR)
  - Discussed at last meeting
#4665: GlueCEPolicyMaxTotalJobs isn’t considered during matchmaking
  - Jobs shouldn’t be sent to CEs publishing jobs >= GlueCEPolicyMaxTotalJobs
  - Add this default requirement at WMS level (not UI)?
  - Same for the other default requirements & rank
#5347: FD limit for LM
  - Being discussed between Alessio and David Smith
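The default requirement proposed for #4665 amounts to a simple filter over the published CE attributes. The sketch below is an assumption-laden illustration in Python rather than the actual matchmaking code: CEs are modelled as plain dictionaries, the published job count is assumed to be the `GlueCEStateTotalJobs` attribute, and CEs that publish no limit are assumed to be unrestricted.

```python
from typing import Dict, List


def accepts_more_jobs(ce: Dict[str, object]) -> bool:
    """Return True if the CE publishes fewer jobs than its GlueCEPolicyMaxTotalJobs."""
    max_total = ce.get("GlueCEPolicyMaxTotalJobs")
    total = ce.get("GlueCEStateTotalJobs")  # assumed source of the job count
    if max_total is None or total is None:
        return True  # assumption: nothing published means no restriction
    return int(total) < int(max_total)


def eligible_ces(ces: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only the CEs that pass the default requirement."""
    return [ce for ce in ces if accepts_more_jobs(ce)]


# Hypothetical CE records, for illustration only.
ces = [
    {"id": "ce-a", "GlueCEPolicyMaxTotalJobs": 100, "GlueCEStateTotalJobs": 40},
    {"id": "ce-b", "GlueCEPolicyMaxTotalJobs": 50, "GlueCEStateTotalJobs": 50},
    {"id": "ce-c"},
]
print([ce["id"] for ce in eligible_ces(ces)])
```

Applying the filter at WMS level, as the slide suggests, would make it hold for every submission regardless of which UI the job comes from.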

6 Issues addressed by LCG that we didn’t integrate yet
#3931: Suggest a local proxy expiration check for WMS jobs
  - Proxy expiry check in the jobwrapper
#4318: Matchmaking policy for resubmitted jobs
  - Remove previously matched sites in resubmission
  - Now we remove only previously matched CEs
#4365: WL libraries/daemons must retry BDII queries
  - When the first query fails, it sleeps 5 seconds and retries; when the second attempt fails, it sleeps another 5 seconds and tries a third, final time
#4388: WP1 on IA64: correct pointer casts in sources
  - Changes in interactive and LB to support IA64
  - Changes integrated for interactive but not for LB (as far as I know)
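The retry behaviour described for #4365 (three attempts in total, with a 5-second pause after each of the first two failures) can be sketched as follows. This is a generic illustration, not the LCG patch: `query_with_retry` and its parameters are hypothetical names, and the actual BDII query is represented by an arbitrary callable supplied by the caller.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def query_with_retry(
    query: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call query() up to `attempts` times, sleeping `delay_s` between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return query()
        except Exception as exc:  # a real client would catch its own error type
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # no sleep after the final attempt
    assert last_error is not None
    raise last_error
```

Passing the sleep function as a parameter is a design choice made here so the retry logic can be exercised without actually waiting.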

7 Issues addressed by LCG that we didn’t integrate yet
#4892: NS can (partially) crash with ‘unable to receive’
  - Uncaught exception
#5109: WMS daemon memory leaks
  - Memory leaks in JC, ldif2classad, LM, LB, NS
  - Fixes integrated only for JC and LM (as far as I know)
#5274: Interface Resource Broker to Dataset catalogue (use the DataLocationInterface)
  - Heinz’s stuff
