“Grey areas” of the new architecture
Massimo Sgaravatto, INFN Padova



Issues
- Many topics reported in D1.4 were not deeply discussed
  - Some were NEVER discussed
  - Not sure if there is a general consensus on what has been written (hope so)
- In any case D1.4 is too vague
  - OK for a “high level architecture” document such as D1.4
  - Not enough, in my opinion, to describe in detail how the whole system will work and how everything must be reorganized/implemented
- Not all components are in the picture (e.g. the Grid Accounting components)

Examples of areas that must be clarified
- Reservation and co-allocation
  - How a reservation/co-allocation is used by a job
  - Where and how the status of a reservation/co-allocation is kept (LB?)
  - Interfaces with GARA
- Interfaces with LB
  - Which components push events to LB?
  - Which events are pushed to LB?
  - “Collection” jobs (e.g. jobs belonging to the same DAG)
  - LB API needed for job checkpointing
- Which events can the Log Monitor notify to the Workload Manager, and what are the expected actions?
- Is a job submitted to CondorG only when a suitable resource has been found, or is it immediately inserted into the CondorG queue on hold and then released when a suitable resource is found?
- …
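The last question above contrasts two ways the Workload Manager could hand a job over to CondorG. A toy sketch of the difference follows. The `ToyQueue` class, its methods, the state names and the `find_resource` callback are all hypothetical illustrations, not the CondorG or WP1 interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical stand-in for a CondorG-like queue (NOT the real API)."""
    jobs: dict = field(default_factory=dict)  # job id -> (state, resource)

    def submit(self, job_id, resource=None, hold=False):
        self.jobs[job_id] = ("HELD" if hold else "IDLE", resource)

    def release(self, job_id, resource):
        state, _ = self.jobs[job_id]
        if state != "HELD":
            raise ValueError(f"{job_id} is not on hold")
        self.jobs[job_id] = ("IDLE", resource)


def submit_after_match(queue, job_id, find_resource):
    """Option A: the job enters the queue only once a resource is found."""
    resource = find_resource(job_id)
    if resource is None:
        return False  # the job is not yet visible in the queue
    queue.submit(job_id, resource=resource)
    return True


def submit_on_hold(queue, job_id):
    """Option B, step 1: the job enters the queue at once, but on hold."""
    queue.submit(job_id, hold=True)


def release_when_matched(queue, job_id, find_resource):
    """Option B, step 2: release the held job when a resource is found."""
    resource = find_resource(job_id)
    if resource is None:
        return False  # the job stays in the queue in the HELD state
    queue.release(job_id, resource)
    return True
```

In this sketch, option B lets the queue itself record that a job exists while no resource is available, whereas option A requires that state to be kept somewhere else.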

What is needed (in my opinion)
- The whole architecture must be defined much more clearly and in much more detail
- Considering the various use cases (the various commands and the various events that could occur), we need to define the exact functionality provided by each component and the interfaces between the components
- Clear responsibilities must be defined for the various components
- This must be done NOW if we want to rely on the new architecture by release 2.0

Responsibilities
- User Interface: Datamat
- Network Server: Catania (recycle some existing RB code?)
- Protocol: Catania (recycle some existing RB code?)
- Workload Manager: CNAF (recycle some existing RB code?)
- Reservation Agent: CNAF
- Co-Allocation Agent: CNAF
- Resource Broker (MatchMaker): Catania
- Partitioner: Padova
- Helper: Francesco G.
- Job Adapter: CNAF (recycle some existing jobwrapper code)
- JSS object: Padova
- Log Monitor: Padova (evolution of JSSparser)
- Logging & Bookkeeping: CESNET
- Integration with DAGMan: CNAF
- Grid Accounting components: Torino
- Interactive jobs support integration

Proposed schedule
- Today: define responsibilities for the various modules
- Today: define which functionalities can realistically be in place (and tested) for release 2.0 (~8 working weeks till the end of September)
  - Planned new functionalities (release 1.4 and 2.0):
    - Support for interactive jobs
    - Support for job dependencies
    - Integration with WP2 query optimization service
    - Java API (if needed by applications)
    - GUI
    - Advance reservation API
    - Deployment of Accounting infrastructure over Testbed (HLRs with command line interface)
    - Support for logical trivial job checkpointing
    - Support for job partitioning
    - Full integration of cost estimation/accounting into scheduling policies
    - Integration of advance reservation/co-allocation into Resource Broker
    - RB relying on the new IS Glue Schema
- Today and next days: identify which other components are missing in the picture and plug them in (only Grid Accounting stuff?)

Proposed schedule
- (Chat) meetings to discuss in more detail the functionalities of the various components and the interfaces between them
  - Start by considering the existing functionalities, then consider, one by one, the new functionalities that will be in place for release 2.0
  - Starting this Wednesday (“real” meeting between a few partners)
- Date ??: New CVS in place
- Date ??: Start implementation relying on the new CVS
- September 2-5: EDG Workshop in Budapest
- September 9: start hands-on meeting
- September 30: release 2.0

Mail from Bob Jones
… …
Reflecting on what we discussed and taking into account the opinions of several of you, I think we should be more realistic and assume there will only be at most one more EDG release after 1.2 that is deployed on the production testbed in [...].
The SC2002 et al. demos for November should be prepared based on release 1.2. Obviously the development and certification testbeds will be more advanced.
For the EU review at the start of 2003, I think we could imagine providing demos of what is currently possible on the production testbed (i.e. reuse the SC2002 et al. demos) and also show them the latest features of the development or certification testbeds.

Mail from Bob Jones
Mware sw scheduling info:
Please look at the software release plan (http://edms.cern.ch/document/333297) and, for each item for your WP listed in release 1.2, 1.3, 1.4 & 2.0, tell me:
- Delivery date: when you expect it to be delivered
  - Note 1: if it is already included in release 1.2 then just say "1.2"
  - Note 2: "delivered" means documented and tested (REALLY!)
- Effort required: state how much effort is required to make the delivery (remember: documented & tested). Please specify in (wo)man weeks. Identify who will perform the work (i.e. specify the names and how many weeks of work they do each)
  - Note 1: please check with the people concerned that your information is correct and that they can schedule the estimated time (i.e. they are not over committed with other tasks, on holiday for that period etc.)
- Dependencies: list other sw not already included in release 1.2 that it depends on (both in your WP and any other)
- GLUE schema: please be sure to include details of the work on the information providers/consumers (including their current status)
In general I prefer you to be pessimistic rather than optimistic about your dates.

Software release plan

| Item | Expected release date | Involved people | Estimated effort required | Dependencies |
|---|---|---|---|---|

WP1 Software release plan

| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| C++ API | 1.3 | Datamat | | |
| Support for MPICH jobs | 1.3 | Padova | | |
| Improving error reporting | 1.3 | Datamat, Catania | | |
| Support for interactive jobs | 1.4 | Milano | | |
| Job dependencies | 1.4 | CNAF | | Condor team? |
| Integration with WP2 Query Optim. Service | 1.4 | Catania | | WP2 Query Opt. Service |

WP1 Software release plan

| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| Java API (if needed) | 1.4 | Datamat | | |
| GUI | 1.4 | Datamat | | |
| Deployment of Accounting infrast. over Testbed (HLRs with command line interface) | 1.4 | Torino | | WP4? |
| Advance reservation API | 1.4 | CNAF | | |

WP1 Software release plan

| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| RB relying on the Glue schema | 1.4 | Catania | | Schema and DIT defined; WP4 (inf. pr.) |
| Job checkpointing | 2.0 | Pd, Ces. | | LB |
| Job partitioning | 2.0 | Padova | | Job checkp., job depend. |
| Full integration of cost estimation/accounting into scheduling policies | 2.0 | Catania, Torino | | |
| Integration of advance res./co-all. into RB | 2.0 | Catania, CNAF | | |
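The Dependencies column above implies an order of work: job partitioning needs both job checkpointing and job dependencies, and job checkpointing needs LB. A minimal sketch of deriving one valid order from such a table, using Python's standard `graphlib`. The dictionary is an illustrative transcription of the WP1-internal entries only; dependencies on other work packages are left out, and treating "LB" as a node of the graph is an assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Item -> set of items it depends on, transcribed from the Dependencies column.
# Assumption: only WP1-internal dependencies are modelled here.
depends_on = {
    "Job partitioning": {"Job checkpointing", "Job dependencies"},
    "Job checkpointing": {"LB"},
    "Job dependencies": set(),
    "LB": set(),
}

# static_order() yields every item after all the items it depends on.
order = list(TopologicalSorter(depends_on).static_order())

for position, item in enumerate(order, start=1):
    print(position, item)
```

Any order produced this way places job partitioning last among these four items, which is consistent with it being the most exposed to delays in the items it relies on.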

My personal ideas
- Deliver new 1.2 RPMs as requested
  - JSS problems + fixes for outstanding issues with autotools (if any)
- No new 1.3 RPMs
  - To avoid being asked to support 1.3 (as happened with 1.2) and therefore not being able to implement the new stuff
- Deliver 2.0 RPMs (but with less functionality than originally planned)

WP1 Sw rel. plan (my prop.)

| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| C++ API | 1.3 → 2.0 | SM, MP (CT), Datamat (FP, AM), CESNet (AK), Pd (RP) | 3 person weeks | |
| Support for MPICH jobs | 1.3 → 2.0 | Padova (AG) | ½ person week | |
| Improving error reporting and communication from UI | 1.3 → 2.0 | Datamat (FP, AM), Catania (SM, MP) | 2 person weeks | |
| Support for interactive jobs | 1.4 → 2.0 | Mi (MM), CNAF (ER), Datamat (FP, AM) | 3 person weeks | |
| Job dependencies | 1.4 → 2.0 | CNAF (FG, ER), Cesnet (all), Datamat (FP, AM) | 16 person weeks | |
| Integration with WP2 Query Optim. Service | 1.4 → 2.0 | Catania (SM, MP) | 1 person week | WP2 Query Opt. Service |

WP1 Sw rel. plan (my prop.)

| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| Java API + GUI | 1.4 → 2.0 | Datamat (GA) | 6 person weeks | |
| Deployment of Accounting infrast. over Testbed (HLRs with command line interface) | 1.4 → 2.0 | Torino (AG, SB) | 8 person weeks | WP4 |
| Advance reservation API | 1.4 → 2.0 | CNAF (FG, ER, SF) | 2 person weeks | |

WP1 Sw rel. plan (my prop.)

| Item | Expected release date | Involved teams | Estimated effort required | Dependencies |
|---|---|---|---|---|
| RB relying on the Glue schema | 1.4 → 2.0 | Catania (SM, MP) | 2 person weeks | Schema and DIT defined; WP4 (inf. pr.) |
| Job checkpointing | 2.0 | Pd (AG, RP), Ces. (MM) | 6 person weeks | LB |
| Job partitioning | 2.0 → after 2.0 | Padova (AG, RP) | 4 person weeks | Job checkp., job depend. |
| Full integration of price estimation/accounting into scheduling policies | 2.0 → after 2.0 | Catania (SM, MP), Torino (SB, AG) | 8 person weeks | |
| Integration of advance res./co-all. into RB | 2.0 → after 2.0 | Catania (SM, MP), CNAF (ER, SF, FG) | 12 person weeks | WP4, WP5, WP7 |
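The effort estimates in the three proposal tables can be added up per proposed target. A small sketch of that arithmetic, assuming that the second value in each "Expected Release date" cell is the proposed new target and that the figures are person-weeks as listed; the item labels are shortened here for readability.

```python
from collections import defaultdict

# (item, proposed target, effort in person-weeks), read from the proposal tables.
items = [
    ("C++ API", "2.0", 3),
    ("Support for MPICH jobs", "2.0", 0.5),
    ("Improving error reporting and communication from UI", "2.0", 2),
    ("Support for interactive jobs", "2.0", 3),
    ("Job dependencies", "2.0", 16),
    ("Integration with WP2 Query Optim. Service", "2.0", 1),
    ("Java API + GUI", "2.0", 6),
    ("Deployment of Accounting infrastructure", "2.0", 8),
    ("Advance reservation API", "2.0", 2),
    ("RB relying on the Glue schema", "2.0", 2),
    ("Job checkpointing", "2.0", 6),
    ("Job partitioning", "after 2.0", 4),
    ("Full integration of price estimation/accounting", "after 2.0", 8),
    ("Integration of advance res./co-all. into RB", "after 2.0", 12),
]

# Sum the effort for each proposed target.
totals = defaultdict(float)
for _item, target, weeks in items:
    totals[target] += weeks

for target, weeks in totals.items():
    print(f"{target}: {weeks} person-weeks")
# prints:
# 2.0: 49.5 person-weeks
# after 2.0: 24.0 person-weeks
```

Under these assumptions the proposal moves 24 person-weeks of work past release 2.0 and leaves 49.5 person-weeks, spread over several teams, for the roughly 8 working weeks mentioned in the schedule.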