EGEE is a project funded by the European Union under contract IST-2003-508833 LCG open issues Massimo Sgaravatto INFN Padova JRA1 IT-CZ cluster meeting,

Presentation transcript:

EGEE is a project funded by the European Union under contract IST LCG open issues Massimo Sgaravatto INFN Padova JRA1 IT-CZ cluster meeting, November 4-5,

Slide 2: Problems hopefully already addressed
- The bugs below are still open in the LCG Savannah, but they have already been addressed
  - Patches provided (by us, or by LCG)
  - Still open because patches under test/still to be tested
- #3252, #3546, #3807, #3848, #3883, #3884, #3895, #3896, #3900, #3916, #4009, #4047, #4070, #4098, #4109, #4127, #4144, #4378, #4836, #4891, #4909, #5237, #5238, #5244, #5261, #5269, #5427

Slide 3: Issues not addressed yet
- #3302: On a RB+SE node there is a GridFTP problem
  - Asked LCG for clarifications: no answer
  - Not considered a high priority problem
- #3671: To drain an RB
  - They would like to make it possible to disallow new submissions, while allowing the other commands
  - Not addressed yet: only suggested, as a trick, to set MaxInputSandboxSize=0
    - Doesn't work for jobs without ISB
- #3724: LogMonitor should be resilient to full file system
  - Still to be understood why irepository.dat could not be recovered
- #3808: NetworkServer must log from which UI the job was submitted
  - A patch was provided, but it logs the UI address and the user DN in *separate* messages (and it is not possible to unambiguously connect them)
  - Asked if they could use the LB info instead: no answer
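The concern behind #3808 can be illustrated with a minimal sketch: if the UI address and the user DN are written in one record, together with the job identifier, they can always be matched afterwards. This is not the NetworkServer code; the function name, field names, and record layout are hypothetical.

```python
def format_submission_record(job_id: str, ui_address: str, user_dn: str) -> str:
    """Build ONE log line carrying job id, UI address, and user DN.

    Hypothetical layout: the real NetworkServer log format is not shown
    in the slides, so every field name here is an assumption.
    """
    # Quote the DN because distinguished names usually contain spaces.
    return f'job={job_id} ui={ui_address} dn="{user_dn}"'


if __name__ == "__main__":
    print(format_submission_record(
        "example-job-id", "ui.example.org", "/C=IT/O=Example/CN=Some User"))
```

Keeping both values in a single message removes the need to correlate separate lines by timestamp or ordering.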

Slide 4: Issues not addressed yet
- #3871: edg-wl-bkserverd: Terminating after 500 connections
  - 'event_store_recover': likely an inter-thread locking bug, which must be investigated
  - MarcoP agreed with D. Smith to provide a patch for all these bugs
- #4319: Suggestion for change of policy for resubmitted jobs
  - Basically they (D. Smith) think that if the job doesn't even start its execution on a WN, this should not be counted as a (re)submission
  - They want to be confident that the user payload of the previous attempts has really never started. However, they don't require the same level of certainty in the opposite case
  - The "shallow resubmissions" should be limited by a configurable maximum number of attempts in the broker configuration OR by virtue of the fact that the shallow resubmission would need to target a previously tried CEid
  - They would like a fix in the near future (~ 1 month)
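The policy requested in #4319 can be sketched as two separate counters: one for attempts where the user payload started, and one for "shallow" attempts where it never did. This is only an illustration of the proposal, not the WMS implementation; `RetryState`, `may_resubmit`, and the two limit parameters are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RetryState:
    deep_count: int = 0     # resubmissions after the payload started
    shallow_count: int = 0  # resubmissions where the payload never started


def may_resubmit(state: RetryState, payload_started: bool,
                 max_deep: int, max_shallow: int) -> bool:
    """Decide whether one more resubmission is allowed and record it.

    payload_started should be True whenever there is any doubt: the slide
    asks for confidence that the payload really never ran before an
    attempt is treated as shallow.
    """
    if payload_started:
        if state.deep_count >= max_deep:
            return False
        state.deep_count += 1
        return True
    # Shallow attempts do not consume the ordinary retry budget, but they
    # are still bounded by their own configurable limit.
    if state.shallow_count >= max_shallow:
        return False
    state.shallow_count += 1
    return True
```

The alternative bound mentioned on the slide (stopping once a shallow resubmission would have to target a previously tried CEid) is not modelled here.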

Slide 5: Issues not addressed yet
- #2716, #4126, #4894
  - Problems with NS affecting the same portion of code
- #4570: Multiple cancel requests can crash WM (and possibly PR)
  - Discussed at last meeting
- #4665: GlueCEPolicyMaxTotalJobs isn't considered during matchmaking
  - Jobs shouldn't be sent to CEs publishing jobs >= GlueCEPolicyMaxTotalJobs
  - Add this default requirement at WMS level (not UI)?
  - Same for the other default requirements & rank
- #5347: FD limit for LM
  - Being discussed between Alessio and David Smith
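The default requirement proposed in #4665 amounts to a filter applied before ranking. The sketch below assumes that the published job count is the Glue attribute `GlueCEStateTotalJobs` (the slide only says "jobs") and represents each CE as a plain dictionary. Both are assumptions for illustration, and any special meaning a site may attach to particular `GlueCEPolicyMaxTotalJobs` values is not handled.

```python
def accepts_more_jobs(ce: dict) -> bool:
    """True if the CE publishes fewer jobs than its MaxTotalJobs limit."""
    return ce["GlueCEStateTotalJobs"] < ce["GlueCEPolicyMaxTotalJobs"]


def filter_candidates(ces: list) -> list:
    """Drop CEs that are already at or above their published job limit."""
    return [ce for ce in ces if accepts_more_jobs(ce)]


if __name__ == "__main__":
    candidates = [
        {"id": "ce-a", "GlueCEStateTotalJobs": 10, "GlueCEPolicyMaxTotalJobs": 100},
        {"id": "ce-b", "GlueCEStateTotalJobs": 50, "GlueCEPolicyMaxTotalJobs": 50},
    ]
    print([ce["id"] for ce in filter_candidates(candidates)])
```

Applying the filter at WMS level rather than in the UI, as the slide suggests, would mean every submission path gets the same behaviour regardless of the client configuration.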

Slide 6: Issues addressed by LCG that we didn't integrate yet
- #3931: Suggest a local proxy expiration check for WMS jobs
  - Proxy expiry check in the jobwrapper
- #4318: Matchmaking policy for resubmitted jobs
  - Remove previously matched sites in resubmission
  - Now we remove only previously matched CEs
- #4365: WL libraries/daemons must retry BDII queries
  - When the first query fails, it sleeps 5 seconds and retries; when the second attempt fails, it sleeps another 5 seconds and tries a third, final time
- #4388: WP1 on IA64: correct pointer casts in sources
  - Changes in interactive and LB to support IA64
  - Changes integrated for interactive but not for LB (as far as I know)
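The retry behaviour described for #4365 (three attempts in total, with a 5 second pause after each of the first two failures) can be sketched as follows. This is not the LCG patch; `query_with_retry` and its parameters are hypothetical, and the `query` callable stands in for whatever performs the actual BDII lookup.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def query_with_retry(query: Callable[[], T],
                     attempts: int = 3,
                     delay_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call query(); on failure sleep and retry, up to `attempts` calls."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return query()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            # No sleep after the final attempt: there is nothing left to wait for.
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error
```

The `sleep` parameter is injectable only so the function can be exercised without real delays; in normal use the default `time.sleep` applies.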

Slide 7: Issues addressed by LCG that we didn't integrate yet
- #4892: NS can (partially) crash with 'unable to receive'
  - Uncaught exception
- #5109: WMS daemon memory leaks
  - Memory leaks in JC, ldif2classad, LM, LB, NS
  - Fixes integrated only for JC and LM (as far as I know)
- #5274: Interface Resource Broker to Dataset catalogue (use the DataLocationInterface)
  - Heinz's stuff