Published by Baldwin Gardner. Modified over 6 years ago.
1
JRA1 IT-CZ cluster meeting Milano, May 3-4, 2004
Padova site report
2
People
  Paolo Andreetto
  Stefano Borgia (part-time)
  Alvise Dorigo (since mid-May 2004)
  Alessio Gianelle
  Matteo Mordacchini (PhD student)
  Massimo Sgaravatto
  Luigi Zangrando
  Mister (hopefully Miss) X (ongoing recruitment)
3
Ongoing activities
  Getting to grips with the EDG WMS code (in particular JC & LM): Paolo
  Rel. 3 (DAGMan) testing activities: Alessio
  Support of existing LCG-2 code, if/when needed: Alessio (shortly also Paolo)
  Working on the resource access problem (CE): Luigi, Massimo, Matteo, Stefano
4
Other activities for the near future
  “Porting” the existing Job Submission components to the EGEE CVS according to the new SCM
  JS reengineering
    Improvements on submission to the LCG-2 CE
      Some of the improvements already identified at the last meeting
      Improvements on MPI support
      See if the outbound connectivity requirement with the LCG-2 CE can be removed
      …
    Move to submission to the EGEE CE
  Guardian Angels of NS, WM, I-S reengineering
5
Activities concerning the CE problem
Analysis of existing technologies and tools
  Web service & WSRF specifications
  DRMAA
    GGF proposal as a common API to different LRMSs
  Globus GRAM
    Globus 3.2 GRAM doc and code analyzed
    Globus team already contacted to learn the plans and timelines of the new GRAM, which will be in Globus v. 4 (in order to plan the evaluation)
    Already got a document explaining architecture, changes wrt the previous GRAM, etc.
    Planned phone conference to discuss these items
  Alien
    Available (very limited) documentation studied
    Some studying of the code
    Plan to install and evaluate it
6
Activities concerning the CE
Specifying requirements and expected functionality for the planned CE
Specifying APIs
Designing the architecture for the planned CE
All these ideas collected in a work-in-progress document
  infnforge CVS
This could be our contribution to the EGEE mw architecture document (CE section)
Preliminary ideas, but we are ready to get feedback
7
Req.s & expected functionality
Environment
  In EDG a CE was an LRMS queue, which had to encompass only homogeneous WNs
    In particular, sysadmins don’t like it
  For EGEE we propose that a CE be a site cluster, managed by an LRMS, encompassing heterogeneous resources, where multiple LRMS queues (which usually define policies on resource usage) can exist
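The proposed CE model can be pictured as a small data structure. This is only a sketch: the class names, field names and sample values (ComputingElement, Queue, WorkerNode, the hostnames and limits) are assumptions made for illustration and do not come from the design document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkerNode:
    hostname: str
    arch: str        # heterogeneous WNs may differ in architecture
    memory_mb: int


@dataclass(frozen=True)
class Queue:
    name: str
    max_wallclock_min: int  # one example of a usage policy tied to a queue


@dataclass
class ComputingElement:
    """A site cluster managed by one LRMS, with several queues and mixed WNs."""
    site: str
    lrms: str
    worker_nodes: list = field(default_factory=list)
    queues: list = field(default_factory=list)

    def architectures(self):
        """Return the distinct WN architectures present in this CE, sorted."""
        return sorted({wn.arch for wn in self.worker_nodes})


# Hypothetical site: one CE, two queues, two different kinds of WN.
ce = ComputingElement(site="example-site", lrms="PBS")
ce.worker_nodes += [WorkerNode("wn01", "i686", 1024),
                    WorkerNode("wn02", "ia64", 4096)]
ce.queues += [Queue("short", 60), Queue("long", 2880)]
```

In the EDG model the same site would have needed one CE per queue, each restricted to identical WNs.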
8
Req.s & expected functionality
Interface with the LRMS
  Very well specified interface with the underlying LRMS
  Interface with a specific LRMS implemented as a pluggable module, easily replaced with another one supporting another LRMS
  We plan to implement the interface with PBS, LSF, Condor (?)
  Make it possible and easy to implement the interface with other LRMSs
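One possible shape for such a pluggable LRMS interface is sketched below. Everything here is hypothetical: the LRMSAdapter class, its two methods, the registry and the in-memory FakeAdapter are illustrative names, not the planned API, and no real PBS, LSF or Condor call is made.

```python
from abc import ABC, abstractmethod

_REGISTRY = {}  # maps an LRMS name to its adapter class


class LRMSAdapter(ABC):
    """Hypothetical interface that every LRMS plug-in would implement."""

    @abstractmethod
    def submit(self, script: str) -> str:
        """Submit a job script and return the local job identifier."""

    @abstractmethod
    def cancel(self, local_id: str) -> None:
        """Remove a job from the LRMS."""


def register(name):
    """Class decorator that makes an adapter selectable by LRMS name."""
    def wrap(cls):
        _REGISTRY[name] = cls
        return cls
    return wrap


def load_adapter(name) -> LRMSAdapter:
    """Instantiate the adapter registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"no adapter registered for LRMS {name!r}") from None


@register("fake")
class FakeAdapter(LRMSAdapter):
    """In-memory stand-in; a real plug-in would talk to PBS, LSF or Condor."""

    def __init__(self):
        self.jobs = {}
        self._next = 1

    def submit(self, script):
        local_id = f"fake.{self._next}"
        self._next += 1
        self.jobs[local_id] = script
        return local_id

    def cancel(self, local_id):
        del self.jobs[local_id]
```

With this layout, supporting another LRMS means adding one registered class; the rest of the CE only sees `LRMSAdapter`.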
9
Req.s & expected functionality
Network connectivity
  SA1 wants neither inbound nor outbound connectivity on WNs
  In the ARDA middleware document, a Site Proxy service is planned to route messages from/to WNs
    Who is responsible for designing and implementing such a service?
10
Req.s & expected functionality
Main functionality
  Job management
    Available to end-users and other Grid services (e.g. the “RB”)
    As a Web Service
    Push and pull model
      Architecture for the pull model must be discussed and agreed in a wider context (not a problem restricted to the CE)
      Some of the issues to be clarified
        When should a CE notify that it is willing to receive jobs?
          It could be “available” only for some kinds of jobs, with some specific requirements, belonging to some specific users
        Who should be notified?
        …
11
Req.s & expected functionality
Job management operations
  Submit jobs
  Evaluate job execution
    Are there matching resources for this JDL?
    If so, what is the expected quality of service (e.g. the Estimated Traversal Time)?
  Remove jobs
  Suspend/resume jobs
  Get job status
  Get job outputs
  Get notifications
    E.g. when a job changes status, when a job reaches a certain status, etc.
12
Req.s & expected functionality
Job types
  Sequential, batch jobs (as in EDG)
  Parallel (MPI) jobs (as in EDG)
  Checkpointable jobs (as in EDG)
  Interactive jobs (as in EDG)
  DAG jobs (as in EDG)
    DAG whose nodes have to be planned and executed within the CE
  Partitionable jobs (as in EDG) ?
    Jobs to be partitioned within the CE
13
Req.s & expected functionality
Other functionality
  Provision of CE characteristics and status
    E.g. how many and which resources are there in the CE? How many active jobs are there? …
    To be decided which information and which interface to be used
      APIs and/or information published to an Information Service
  Grid accounting sensors
    To report on job resource usage
    To be integrated with the EGEE (DGAS ?) accounting system
  …
14
Req.s & expected functionality
Security (Authentication & Authorization)
  Not too clear what JRA3 is going to provide
    Recommendations? Software? …
15
Req.s & expected functionality
Need to talk with other “Grid systems”
  Which ones? GRAM, Condor-G, …
  At which level should these interfaces be implemented?
    Should these Grid systems be considered as LRMSs?
    It has been suggested that we consider the interface at a higher level
      EGEE CE able to “understand” GRAM SOAP messages, Condor-G SOAP messages, etc. and able to speak these protocols
      Need to understand if this is feasible (not only from a technical point of view)
16
CE Architecture
[Figure: CE architecture diagram. Labels: Client, JDL, jobAssess, jobSubmit, QoS, WEB, CE, JC, UC, JM, WN, DRMAA??, LSF, PBS, ?, WNs. Operations shown: getWN, insertWN, deleteWN, updateWN, getUC, createUC, deleteUC, updateUC.]
A client could:
  1) ask the CE whether a job could be executed and what the expected QoS is (e.g. ETT)
  2) submit a job
  3) query the CE to get its characteristics and status (and/or should this info be published to an IS?)
The CE matches the job requirements against the resources available and computes the expected QoS.
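The jobAssess step on this slide (match the job requirements against the available resources, then compute the expected QoS) might look roughly like the following. This is a sketch under stated assumptions: the parsed JDL is reduced to a dict with two made-up attributes ("arch" and "min_memory_mb"), each WN is assumed to know its own backlog, and the ETT formula (shortest backlog among matching WNs) is a placeholder rather than the estimator the CE would really use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerNode:
    name: str
    arch: str
    memory_mb: int
    queued_minutes: int  # work already waiting for this node (assumed known)


def job_assess(requirements: dict, nodes: list) -> Optional[int]:
    """Return a naive ETT in minutes, or None if no WN matches.

    `requirements` stands in for the parsed JDL; only the hypothetical
    attributes "arch" and "min_memory_mb" are understood here.
    """
    matching = [
        wn for wn in nodes
        if wn.arch == requirements.get("arch", wn.arch)
        and wn.memory_mb >= requirements.get("min_memory_mb", 0)
    ]
    if not matching:
        return None
    # Placeholder estimate: the shortest backlog among the matching nodes.
    return min(wn.queued_minutes for wn in matching)


nodes = [
    WorkerNode("wn01", "i686", 1024, queued_minutes=30),
    WorkerNode("wn02", "i686", 4096, queued_minutes=5),
    WorkerNode("wn03", "ia64", 4096, queued_minutes=0),
]
```

A client would call `job_assess` first and go on to submit only if the result is not `None` and the estimate is acceptable.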
17
CE Architecture
[Figure: CE architecture diagram. Labels: Client, JDL, submit, job, notify, JC URL, Job status, WEB, CE, JC, UC, JM, WN, DRMAA??, LSF, PBS, ?, WNs. Client operations shown: jobKill, jobSuspend, jobResume, jobGetStatus, jobGetOutput, jobSignal, jobMonitorSub, jobAssess, jobSubmit. Internal operations shown: getWN, insertWN, deleteWN, updateWN, getUC, createUC, deleteUC, updateUC.]
The CE checks whether the client already has a UserContext (UC), then creates or updates the UC.
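The UserContext handling on submission (check whether the client already has a UC, then create or update it) can be sketched as below. The assumptions are loud ones: the UC is keyed by a client identity string, it stores nothing but job identifiers, and the job ID format is invented for the example; the real UC contents are not described on the slide.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class UserContext:
    subject: str                                  # client identity (assumed key)
    job_ids: list = field(default_factory=list)


class ComputingElement:
    def __init__(self):
        self._contexts = {}
        self._counter = itertools.count(1)

    def job_submit(self, subject: str, jdl: str) -> str:
        """Create or update the caller's UserContext, then record the job."""
        uc = self._contexts.get(subject)
        if uc is None:                            # corresponds to createUC
            uc = UserContext(subject)
            self._contexts[subject] = uc
        job_id = f"job-{next(self._counter)}"     # made-up identifier format
        uc.job_ids.append(job_id)                 # corresponds to updateUC
        return job_id

    def job_list(self, subject: str) -> list:
        """Return the job IDs submitted by this user (empty if no UC exists)."""
        uc = self._contexts.get(subject)
        return list(uc.job_ids) if uc else []
```

The JDL is accepted but not interpreted here; handing the job to the LRMS is outside the scope of this sketch.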
18
API specification
  jobAssess
  jobSubmit
  jobSuspend / jobResume
  jobList
  jobKill
  jobGetStatus / jobGetAllStatus
  jobGetOutput
  jobMonitorSub
  jobSignal
19
API specification
jobAssess
  Description: Checks whether the job specified in the JDL could be run in the CE. It matches the job requirements against the available resources. If the job is actually runnable on the worker nodes of the CE, it provides an estimate of the expected QoS (e.g. waiting time in the local queue before the job can be run).
jobSubmit
  Description: Submits the job specified in the JDL to the CE.
20
API specification
jobSuspend
  Description: Suspends the execution of the specified job(s), or holds the job(s) in the local queue.
jobResume
  Description: Resumes the execution of the specified job(s), or releases the job(s) in the local queue.
jobKill
  Description: Kills one or more jobs.
jobList
  Description: Retrieves the list of the jobIDs submitted by the user.
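The suspend/resume/kill descriptions imply a small set of state transitions, which can be written out as follows. The state names are assumptions (the slides do not define a job state model); the only behavior taken from the text is that jobSuspend either suspends a running job or holds a queued one, and that jobResume reverses each case.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    HELD = "held"
    RUNNING = "running"
    SUSPENDED = "suspended"
    KILLED = "killed"


# jobSuspend: a running job is suspended, a queued job is held in the queue.
_SUSPEND = {State.RUNNING: State.SUSPENDED, State.QUEUED: State.HELD}
# jobResume is the exact inverse of jobSuspend.
_RESUME = {after: before for before, after in _SUSPEND.items()}


def job_suspend(state: State) -> State:
    if state not in _SUSPEND:
        raise ValueError(f"cannot suspend a job in state {state.value}")
    return _SUSPEND[state]


def job_resume(state: State) -> State:
    if state not in _RESUME:
        raise ValueError(f"cannot resume a job in state {state.value}")
    return _RESUME[state]


def job_kill(state: State) -> State:
    if state is State.KILLED:
        raise ValueError("job has already been killed")
    return State.KILLED
```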
21
API specification
jobGetOutput
  Description: Allows the user to retrieve the final results of the execution of the specified job(s).
jobGetStatus
  Description: Retrieves the status of the specified job(s).
jobSignal
  Description: Sends a signal to the specified job(s).
jobMonitorSub
  Description: Allows the user to subscribe to the asynchronous notification system (JM) of the CE (e.g. to be notified about job status changes).
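The subscription idea behind jobMonitorSub can be illustrated with a plain observer pattern. This is a simplified stand-in, not the planned mechanism: the JobMonitor class and its method names are hypothetical, and callbacks run synchronously in the same process, whereas the CE would deliver notifications asynchronously over its Web service interface.

```python
from collections import defaultdict


class JobMonitor:
    """Toy notification hub: subscribers are told when a job changes status."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # job_id -> list of callbacks
        self._status = {}                      # job_id -> last known status

    def job_monitor_sub(self, job_id, callback):
        """Register `callback(job_id, status)` for status changes of one job."""
        self._subscribers[job_id].append(callback)

    def set_status(self, job_id, status):
        """Record a new status and notify subscribers only if it changed."""
        if self._status.get(job_id) == status:
            return
        self._status[job_id] = status
        for callback in self._subscribers[job_id]:
            callback(job_id, status)


events = []
monitor = JobMonitor()
monitor.job_monitor_sub("job-1", lambda job_id, status: events.append((job_id, status)))
monitor.set_status("job-1", "running")
monitor.set_status("job-1", "running")   # unchanged, so no notification
monitor.set_status("job-2", "running")   # nobody subscribed to job-2
monitor.set_status("job-1", "done")
```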