JRA1 IT-CZ cluster meeting Milano, May 3-4, 2004


Padova site report

People
- Paolo Andreetto
- Stefano Borgia (part-time)
- Alvise Dorigo (since mid-May 2004)
- Alessio Gianelle
- Matteo Mordacchini (PhD student)
- Massimo Sgaravatto
- Luigi Zangrando
- Mister (hopefully Miss) X (on-going recruitment)

On-going activities
- Getting to grips with the EDG WMS code (in particular JC & LM): Paolo
- Rel. 3 (DAGMan) testing activities: Alessio
- Support of existing LCG-2 code, if/when needed: Alessio (shortly also Paolo)
- Working on the resource access problem (CE): Luigi, Massimo, Matteo, Stefano

Other activities for the near future
- “Porting” the existing Job Submission components to the EGEE CVS according to the new SCM
- JS reengineering
  - Improvements on submission to the LCG-2 CE
    - Some of the improvements were already identified at the last meeting
    - Improvements on MPI support
    - See if the outbound connectivity requirement with the LCG-2 CE can be removed
    - …
  - Move to submission to the EGEE CE
- Guardian Angels of NS, WM, I-S reengineering

Activities concerning the CE problem
- Analysis of existing technologies and tools
  - Web service & WSRF specifications
  - DRMAA
    - GGF proposal for a common API to different LRMSs
  - Globus GRAM
    - Globus 3.2 GRAM documentation and code analyzed
    - Globus team already contacted about plans and timelines for the new GRAM, which will be in Globus v. 4, so that we can plan its evaluation
    - Already received a document explaining the architecture, the changes with respect to the previous GRAM, etc.
    - Phone conference planned to discuss these items
  - AliEn
    - Available (very limited) documentation studied
    - Some study of the code
    - Plan to install and evaluate it

Activities concerning the CE
- Specifying requirements and expected functionality for the planned CE
- Specifying APIs
- Designing the architecture for the planned CE
- All these ideas are collected in a work-in-progress document
  - /workload/papers/CE @ infnforge CVS
  - http://www.pd.infn.it/grid/jra1/CE/ce.ps
  - This could be our contribution to the EGEE mw architecture document (CE section)
  - Preliminary ideas, but we are ready to get feedback

Req.s & expected functionality
- Environment
  - In EDG a CE was an LRMS queue, which had to encompass only homogeneous WNs
    - Sysadmins in particular don’t like it
  - For EGEE we propose that a CE be a site cluster, managed by an LRMS, encompassing heterogeneous resources, where multiple LRMS queues (which usually define policies on resource usage) can exist

Req.s & expected functionality
- Interface with the LRMS
  - Very well specified interface with the underlying LRMS
  - Interface with a specific LRMS implemented as a pluggable module, easily replaced with another one supporting another LRMS
  - We plan to implement the interface with PBS, LSF, Condor (?)
  - Make it possible and easy to implement the interface with other LRMSs
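
The pluggable-module idea can be sketched as follows. This is only an illustration under stated assumptions, not the planned design: the names `LRMSPlugin`, `submit`, `cancel`, `status` and `get_plugin` are hypothetical, and the plugins are in-memory stand-ins that do not talk to a real PBS or LSF.

```python
from abc import ABC, abstractmethod
from typing import Dict


class LRMSPlugin(ABC):
    """Hypothetical interface the CE would use to talk to any LRMS."""

    @abstractmethod
    def submit(self, script: str) -> str:
        """Hand a job to the LRMS and return its local job ID."""

    @abstractmethod
    def cancel(self, local_id: str) -> None:
        """Remove a job from the LRMS."""

    @abstractmethod
    def status(self, local_id: str) -> str:
        """Return the LRMS view of the job state."""


class FakeBatchPlugin(LRMSPlugin):
    """In-memory stand-in; a real plugin would drive the LRMS's own tools."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.jobs: Dict[str, str] = {}

    def submit(self, script: str) -> str:
        local_id = f"{self.prefix}-{len(self.jobs) + 1}"
        self.jobs[local_id] = "QUEUED"
        return local_id

    def cancel(self, local_id: str) -> None:
        self.jobs[local_id] = "CANCELLED"

    def status(self, local_id: str) -> str:
        return self.jobs[local_id]


# Registry: supporting another LRMS means registering another plugin.
PLUGINS: Dict[str, LRMSPlugin] = {
    "pbs": FakeBatchPlugin("pbs"),
    "lsf": FakeBatchPlugin("lsf"),
}


def get_plugin(name: str) -> LRMSPlugin:
    """Look up the plugin for the LRMS configured at this site."""
    return PLUGINS[name]
```

The rest of the CE would depend only on `LRMSPlugin`, so replacing the LRMS would not touch the job management code.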

Req.s & expected functionality
- Network connectivity
  - SA1 wants neither inbound nor outbound connectivity on WNs
  - In the ARDA middleware document, a Site Proxy service is planned to route messages from/to WNs
  - Who is responsible for designing and implementing such a service?

Req.s & expected functionality
- Main functionality
  - Job management
    - Available to end-users and other Grid services (e.g. the “RB”)
    - As a Web Service
    - Push and pull model
      - Architecture for the pull model must be discussed and agreed in a wider context (not a problem restricted to the CE)
      - Some of the issues to be clarified
        - When should a CE notify that it is willing to receive jobs?
        - It could be “available” only for some kinds of jobs, with some specific requirements, belonging to some specific users
        - Who should be notified?
        - …
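
The difference between the two models can be illustrated with a toy sketch. Everything in it is an assumption made for illustration (the `Job` and `ComputingElement` classes, the `accepts` policy, the in-memory queue); the slides leave the pull architecture open.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Set


@dataclass
class Job:
    job_id: str
    owner: str
    cpus: int


@dataclass
class ComputingElement:
    name: str
    free_cpus: int
    accepted_owners: Set[str]  # hypothetical policy: users the CE serves
    running: List[str] = field(default_factory=list)

    def accepts(self, job: Job) -> bool:
        """The CE may be 'available' only for some users and requirements."""
        return job.owner in self.accepted_owners and job.cpus <= self.free_cpus

    def run(self, job: Job) -> None:
        self.free_cpus -= job.cpus
        self.running.append(job.job_id)


def push(job: Job, ce: ComputingElement) -> bool:
    """Push model: the submitter picks the CE and sends the job to it."""
    if ce.accepts(job):
        ce.run(job)
        return True
    return False


def pull(queue: Deque[Job], ce: ComputingElement) -> Optional[Job]:
    """Pull model: the CE asks for the first queued job it is willing to run."""
    for job in list(queue):
        if ce.accepts(job):
            queue.remove(job)
            ce.run(job)
            return job
    return None
```

In the pull case the decision moves to the CE side, which is why the questions of when the CE announces itself, and to whom, have to be settled outside the CE.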

Req.s & expected functionality
- Job management operations
  - Submit jobs
  - Evaluate job execution
    - Are there matching resources for this JDL?
    - If so, what is the expected quality of service (e.g. the Estimated Traversal Time)?
  - Remove jobs
  - Suspend/resume jobs
  - Get job status
  - Get job outputs
  - Get notifications
    - E.g. when a job changes status, when a job reaches a certain status, etc.

Req.s & expected functionality
- Job types
  - Sequential, batch jobs (as in EDG)
  - Parallel (MPI) jobs (as in EDG)
  - Checkpointable jobs (as in EDG)
  - Interactive jobs (as in EDG)
  - DAG jobs (as in EDG)
    - DAG whose nodes have to be planned and executed within the CE
  - Partitionable jobs (as in EDG) ?
    - Jobs to be partitioned within the CE
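
For DAG jobs planned within the CE, the minimum the CE needs is an execution order that respects the dependencies. A minimal sketch, assuming the DAG is given as a mapping from each node to the nodes it depends on (the function name `plan_dag` and this input format are hypothetical, and a real planner would also assign nodes to resources):

```python
from collections import deque
from typing import Dict, List


def plan_dag(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return an order in which every node comes after all of its parents.

    `dependencies` maps each node to the nodes it depends on; every parent
    must itself appear as a key. Uses Kahn's topological sort.
    """
    pending = {node: len(parents) for node, parents in dependencies.items()}
    children: Dict[str, List[str]] = {node: [] for node in dependencies}
    for node, parents in dependencies.items():
        for parent in parents:
            children[parent].append(node)

    # Nodes with no unmet dependency can start; sorting keeps the plan stable.
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in sorted(children[node]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(dependencies):
        raise ValueError("dependency cycle: not a DAG")
    return order
```

For example, `plan_dag({"merge": ["sim1", "sim2"], "sim1": ["prepare"], "sim2": ["prepare"], "prepare": []})` schedules `prepare` first and `merge` last.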

Req.s & expected functionality
- Other functionality
  - Provision of CE characteristics and status
    - E.g. how many and which resources are there in the CE? How many active jobs are there? …
    - Still to be decided: which information to provide and which interface to use
      - APIs and/or information published to an Information Service
  - Grid accounting sensors
    - To report on job resource usage
    - To be integrated with the EGEE (DGAS ?) accounting system
  - …

Req.s & expected functionality
- Security (Authentication & Authorization)
  - Not too clear what JRA3 is going to provide
    - Recommendations?
    - Software?
    - …

Req.s & expected functionality
- Need to talk with other “Grid systems”
  - Which ones?
    - GRAM, Condor-G, …
  - At which level should these interfaces be implemented?
    - Should these Grid systems be considered as LRMSs?
    - It has been suggested to us to consider the interface at a higher level
      - EGEE CE able to “understand” GRAM SOAP messages, Condor-G SOAP messages, etc. and able to speak these protocols
      - Need to understand if this is feasible (not only from a technical point of view)

CE Architecture
[Diagram. Labels: Client, JDL, jobAssess, jobSubmit, QoS, WEB, CE, JC, JM, UC, WN, getWN, insertWN, deleteWN, updateWN, getUC, createUC, deleteUC, updateUC, DRMAA??, LSF, PBS, ?, WNs]
- A client could:
  1) ask the CE whether a job could be executed and what the expected QoS is (e.g. ETT)
  2) submit a job
  3) query the CE to get its characteristics and status (and/or should this info be published to an IS?)
- jobAssess: the CE matches the job requirements against the available resources and computes the expected QoS
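
The jobAssess step (match, then estimate the QoS) can be sketched as below. The requirement attributes (`os`, `min_memory_mb`), the per-node `queued_seconds` backlog and the naive estimate are all assumptions for illustration; the slides do not say how the ETT would be computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkerNode:
    name: str
    os: str
    memory_mb: int
    queued_seconds: int  # hypothetical: work already waiting for this node


@dataclass
class JobRequirements:
    os: str
    min_memory_mb: int


@dataclass
class Assessment:
    runnable: bool
    estimated_traversal_time: Optional[int]  # seconds; None if not runnable


def job_assess(req: JobRequirements, nodes: List[WorkerNode]) -> Assessment:
    """Match the requirements against the WNs and estimate the QoS."""
    matching = [
        n for n in nodes
        if n.os == req.os and n.memory_mb >= req.min_memory_mb
    ]
    if not matching:
        return Assessment(runnable=False, estimated_traversal_time=None)
    # Naive estimate: the shortest backlog among the matching nodes.
    ett = min(n.queued_seconds for n in matching)
    return Assessment(runnable=True, estimated_traversal_time=ett)
```

A client or the “RB” could call such an operation on several CEs and compare the returned estimates before submitting.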

CE Architecture
[Diagram. Labels: Client, JDL, jobKill, jobSuspend, jobResume, jobGetStatus, jobGetOutput, jobSignal, jobMonitorSub, jobAssess, jobSubmit, notify, JC URL, Job status, WEB, CE, submit, JC, JM, UC, WN, job, getWN, insertWN, deleteWN, updateWN, getUC, createUC, deleteUC, updateUC, DRMAA??, LSF, PBS, ?, WNs]
- jobSubmit: the CE checks whether the client already has a UserContext (UC), then creates or updates the UC
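
The create-or-update step on submission can be sketched as follows. The slides name the operations getUC, createUC, updateUC and deleteUC but not what a UserContext contains, so this sketch assumes it is simply the list of the user's job IDs; `UserContextStore` and `ToyCE` are hypothetical names.

```python
import itertools
from typing import Dict, List, Optional


class UserContextStore:
    """In-memory stand-in mirroring getUC/createUC/updateUC/deleteUC."""

    def __init__(self) -> None:
        self._contexts: Dict[str, List[str]] = {}

    def get_uc(self, user: str) -> Optional[List[str]]:
        return self._contexts.get(user)

    def create_uc(self, user: str) -> None:
        self._contexts[user] = []

    def update_uc(self, user: str, job_id: str) -> None:
        self._contexts[user].append(job_id)

    def delete_uc(self, user: str) -> None:
        self._contexts.pop(user, None)


class ToyCE:
    def __init__(self) -> None:
        self.contexts = UserContextStore()
        self._ids = itertools.count(1)

    def job_submit(self, user: str, jdl: str) -> str:
        # Create the UserContext on first contact, then update it.
        if self.contexts.get_uc(user) is None:
            self.contexts.create_uc(user)
        job_id = f"job-{next(self._ids)}"
        self.contexts.update_uc(user, job_id)
        return job_id  # the JDL would be handed on for execution here

    def job_list(self, user: str) -> List[str]:
        return list(self.contexts.get_uc(user) or [])
```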

API specification
- jobAssess
- jobSubmit
- jobSuspend / jobResume
- jobList
- jobKill
- jobGetStatus / jobGetAllStatus
- jobGetOutput
- jobMonitorSub
- jobSignal

API specification: jobAssess, jobSubmit
- jobAssess
  Description: Checks whether the job specified in the JDL could be run in the CE. It matches the job requirements against the available resources. If the job can actually run on the worker nodes of the CE, it provides an estimate of the expected QoS (e.g. waiting time in the local queue before the job can run).
- jobSubmit
  Description: Submits the job specified in the JDL to the CE.

API specification: jobSuspend, jobResume, jobKill, jobList
- jobSuspend
  Description: Suspends the execution of the specified job(s) or holds the job(s) in the local queue.
- jobResume
  Description: Resumes the execution of the specified job(s) or releases the job(s) in the local queue.
- jobKill
  Description: Kills one or more jobs.
- jobList
  Description: Retrieves the list of the jobIDs submitted by the user.
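
The suspend/hold and resume/release pairs amount to a small state machine. A minimal sketch, assuming hypothetical state names (`RUNNING`, `QUEUED`, `SUSPENDED`, `HELD`, `KILLED`) and an in-memory job table; a real CE would act on the LRMS instead.

```python
from typing import Dict, Iterable, List

# Hypothetical state names; the slides do not define a job state model.
SUSPEND = {"RUNNING": "SUSPENDED", "QUEUED": "HELD"}
RESUME = {"SUSPENDED": "RUNNING", "HELD": "QUEUED"}


class JobTable:
    """Toy in-memory job table keyed by jobID."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.owner: Dict[str, str] = {}

    def add(self, job_id: str, owner: str, state: str = "QUEUED") -> None:
        self.state[job_id] = state
        self.owner[job_id] = owner

    def _apply(self, job_ids: Iterable[str], transitions: Dict[str, str]) -> None:
        job_ids = list(job_ids)
        # Validate every job first so that a bad request changes nothing.
        for job_id in job_ids:
            if self.state[job_id] not in transitions:
                raise ValueError(f"{job_id} is {self.state[job_id]}")
        for job_id in job_ids:
            self.state[job_id] = transitions[self.state[job_id]]

    def job_suspend(self, job_ids: Iterable[str]) -> None:
        """Suspend running jobs; hold queued ones in the local queue."""
        self._apply(job_ids, SUSPEND)

    def job_resume(self, job_ids: Iterable[str]) -> None:
        """Resume suspended jobs; release held ones."""
        self._apply(job_ids, RESUME)

    def job_kill(self, job_ids: Iterable[str]) -> None:
        for job_id in job_ids:
            self.state[job_id] = "KILLED"

    def job_list(self, owner: str) -> List[str]:
        """Return the jobIDs submitted by the given user."""
        return [j for j, o in self.owner.items() if o == owner]
```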

API specification: jobGetOutput, jobGetStatus, jobSignal, jobMonitorSub
- jobGetOutput
  Description: Allows the user to retrieve the final results of the execution of the specified job(s).
- jobGetStatus
  Description: Retrieves the status of the specified job(s).
- jobSignal
  Description: Sends a signal to the specified job(s).
- jobMonitorSub
  Description: Allows the user to subscribe to the asynchronous notification system (JM) of the CE (e.g. to be notified about job status changes).
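
The relation between jobGetStatus (polling) and jobMonitorSub (notification) can be sketched with a simple subscription table. The class `JobMonitor`, the `only_status` filter and the use of local callbacks are assumptions for illustration; in the CE the notifications would be delivered asynchronously over the network rather than by a direct function call.

```python
from typing import Callable, Dict, List, Optional, Tuple

Callback = Callable[[str, str], None]  # called with (job_id, new_status)


class JobMonitor:
    """Toy stand-in for the notification side of the CE."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Tuple[Optional[str], Callback]]] = {}
        self._status: Dict[str, str] = {}

    def job_monitor_sub(
        self, job_id: str, callback: Callback, only_status: Optional[str] = None
    ) -> None:
        """Subscribe to every status change, or only to one given status."""
        self._subs.setdefault(job_id, []).append((only_status, callback))

    def job_get_status(self, job_id: str) -> str:
        """Polling alternative to subscribing."""
        return self._status[job_id]

    def set_status(self, job_id: str, new_status: str) -> None:
        """Record a status change and notify the matching subscribers."""
        if self._status.get(job_id) == new_status:
            return  # not a change, nothing to notify
        self._status[job_id] = new_status
        for wanted, callback in self._subs.get(job_id, []):
            if wanted is None or wanted == new_status:
                callback(job_id, new_status)
```

The two subscription modes correspond to the two cases named in the requirements: being notified when a job changes status, and when it reaches a certain status.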