WP1 WMS release 2: status and open issues

Massimo Sgaravatto, INFN Padova

Status
- Latest released WP1 RPMs: 2.1.5
  - Deployed in the EDG dev testbed
  - Under test by LCG
- New procedure for releasing RPMs (agreed months ago but never applied), to avoid releasing broken RPMs:
  - A "Test" branch is forked from "head"
  - Tests are done on this branch
- Open question: should fixes be applied to the "Test" branch and then merged into "head", or applied to both the "Test" and "head" branches?

What was fixed since Heidelberg
- Problem with resubmission: resubmission was attempted even when a job aborted because its proxy had expired (bug #1643)
- /tmp is no longer used, since otherwise "old" WMS files get deleted by tmpwatch (bug #1918)
  - Note that tmpwatch also affects /var/tmp
- Restart of daemons (bug #1105)
- edg-wl-ns restart didn't work (bug #1798)
- Documentation: API documentation provided (on the WP1 web site)
- Deadlock problem with filelist (bug #2054)
- grid-proxy-init/edg-voms-proxy-init removed from the UI commands (bug #2195)
- Warning message if a "non-supported" JDL attribute is specified (bug #446)
- Performance problems when edg-job-status/get-logging-info is called for multiple jobs (bug #2196)
- …
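
Bug #446 adds a warning when a JDL attribute that the WMS does not support is specified. A minimal sketch of such a check follows. It assumes a simple "Name = value;" line layout and uses a hypothetical SUPPORTED_ATTRIBUTES set; the authoritative attribute list comes from the WP1 JDL specification, and the real UI may well perform the check on the parsed ClassAd rather than on raw text.

```python
import re
import sys

# Hypothetical subset of recognised attributes (stored lower-case);
# the real list is defined by the WP1 JDL specification, not by this sketch.
SUPPORTED_ATTRIBUTES = {
    "executable", "arguments", "stdinput", "stdoutput", "stderror",
    "inputsandbox", "outputsandbox", "environment", "requirements",
    "rank", "inputdata", "dataaccessprotocol", "retrycount",
    "virtualorganisation", "jobtype",
}

# Attribute name at the start of a line, followed by "=" but not "==".
_ATTR_RE = re.compile(r"^[ \t]*([A-Za-z_]\w*)[ \t]*=(?!=)", re.MULTILINE)


def unsupported_attributes(jdl_text: str) -> list[str]:
    """Return attribute names not in SUPPORTED_ATTRIBUTES, in order of first appearance."""
    found: list[str] = []
    for name in _ATTR_RE.findall(jdl_text):
        # ClassAd attribute names are case-insensitive, so compare lower-cased.
        if name.lower() not in SUPPORTED_ATTRIBUTES and name not in found:
            found.append(name)
    return found


def warn_unsupported(jdl_text: str) -> list[str]:
    """Print one warning per unsupported attribute on stderr and return their names."""
    names = unsupported_attributes(jdl_text)
    for name in names:
        print(f"Warning: JDL attribute '{name}' is not supported and will be ignored",
              file=sys.stderr)
    return names
```

The check only warns and does not reject the job, which matches the wording of the fix (a warning message, not an error).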

Open issues and missing functionalities
- In Heidelberg we decided to address various issues by the end of the project
- There are also some new ones
- We should decide which ones can really be addressed by the end of the project (and where: head/DAGMan branch), taking the other priorities into account

Open issues/missing functionalities
- Jobs stay in the "Done" status after output sandbox (OSB) retrieval (bug #2229)
  - Failures logging the "Cleared" event
- Many job submissions fail because logging the "Register" event fails
  - Or at least it is reported that the logging failed
- Filelist problem (bug #2220)
  - It looks like the problem was not really fixed
- Segfaults (?) in NS, WM, JC, LM
- Memory leaks in NS (bug #2104)
  - Memory leaks in the underlying code?
- Logging by WM, JC and LM fails when there are SSL problems using the user proxy (not only when it has expired)
  - Shall we use the host proxy when this happens? (bug #2016)
- Problem with resubmission: CEs already "used" are not considered anymore (bug #1103)
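
The two resubmission issues (bug #1643, fixed, and bug #1103, still open) can be illustrated with the following sketch. All names (JobState, should_resubmit, candidate_ces, the "proxy_expired" reason code) are hypothetical, and the fallback to already used CEs is only one possible answer to bug #1103, not a decision taken in this presentation.

```python
from dataclasses import dataclass, field

# Hypothetical reason code for an abort caused by an expired user proxy.
PROXY_EXPIRED = "proxy_expired"


@dataclass
class JobState:
    retry_count: int                 # remaining resubmissions allowed
    abort_reason: str                # hypothetical reason code for the last failure
    used_ces: set[str] = field(default_factory=set)  # CEs already tried


def should_resubmit(job: JobState) -> bool:
    # An expired proxy makes every new attempt fail for the same reason,
    # so resubmission is skipped (the behaviour fixed with bug #1643).
    if job.abort_reason == PROXY_EXPIRED:
        return False
    return job.retry_count > 0


def candidate_ces(matching_ces: list[str], job: JobState) -> list[str]:
    # Excluding every CE already tried (the behaviour reported in bug #1103)
    # can leave no candidates when only a few CEs match the requirements.
    fresh = [ce for ce in matching_ces if ce not in job.used_ces]
    # Hypothetical fallback: reconsider already used CEs instead of giving up.
    return fresh if fresh else list(matching_ces)
```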

Open issues/missing functionalities
- Registration of WMS services in R-GMA and status scripts (bug #1324)
- BrokerInfo: replacement for the old getSelectedFile needed (bug #1848)
- Dynamic quota management in NS
- edg-job-list-match and edg-job-submit can hang (bug #1362)
  - Approach: allow at least CTRL-C
- Do not abort a job immediately in case of problems (RLS or II down), but retry for a while (bug #1812)
  - Matchmaking should be retried until TimeLimit = Min(TimeLimitJDL, TimeLimitConf)
- Clearer error messages when no resources are found with edg-job-list-match (bug #1997)
  - As already done with edg-job-submit
- Documentation: Gangmatching note missing
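
Bug #1812 proposes retrying matchmaking instead of aborting at once when the RLS or the information index is down, bounded by TimeLimit = Min(TimeLimitJDL, TimeLimitConf). A minimal sketch of that loop follows. It assumes a hypothetical match() callable that returns the matching CEs or raises OSError on a transient service failure; the retry interval and the injectable clock/sleep functions are assumptions added for illustration and testability.

```python
import time
from typing import Callable, Optional, Sequence


def match_with_retry(
    match: Callable[[], Sequence[str]],
    time_limit_jdl: float,
    time_limit_conf: float,
    retry_interval: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Sequence[str]]:
    """Retry matchmaking until it finds CEs or TimeLimit = min(JDL, conf) expires.

    Returns the matching CEs, or None if nothing matched before the deadline.
    """
    time_limit = min(time_limit_jdl, time_limit_conf)
    deadline = clock() + time_limit
    while True:
        try:
            ces = match()
            if ces:
                return ces
        except OSError:
            pass  # transient failure (e.g. RLS or II down): retry later
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never sleep past the deadline.
        sleep(min(retry_interval, remaining))
```

Taking the minimum means that neither the user (through the JDL) nor the administrator (through the configuration) can make a job wait longer than the other allows.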

Open issues/missing functionalities
- Exploit LB ACL setting/querying via command-line tools
- Call a WP3 monitor script from the Job Wrapper
  - Has this been discussed with the technical coordinator?
- Exploit LB extended querying capabilities
  - UI commands for these queries
  - Possibility to define user tags in the JDL to exploit the extended querying capabilities
- Use of FTSH in the JobWrapper
  - In the DAGMan branch?
- GRIS queries
  - Issue raised at Heidelberg by the applications and also on the Iteam mailing list
- Matchmaking with InputData when "file" is used as the protocol