Download presentation
Presentation is loading. Please wait.
Published byCecilia Peters Modified over 9 years ago
1
M. Sgaravatto – n° 1 Overview of release 2 of the EDG WP1 Workload Management System deployed in the INFN production Grid Massimo Sgaravatto INFN Padova - DataGrid WP1 massimo.sgaravatto@pd.infn.it http://presentation.address
2
M. Sgaravatto – n° 2 WP1 Workload Management System u Working Workload Management System prototype implemented by WP1 in the first phase of the EDG project n Software released in September 2001 n One of the very few components in the first release of EDG software u Application users have experienced for a while with this first release of the workload management system n Stress tests and semi-production activities (e.g. CMS stress tests, Atlas efforts) n Significant achievements exploited by the experiments but also various problems were spotted s Impacting in particular the reliability and scalability of the system
3
M. Sgaravatto – n° 3 Review of WP1 WMS architecture u WP1 WMS architecture reviewed n To apply the “lessons” learned and addressing the shortcomings emerged with the first release of the software, in particular s To increase the reliability problems s To address the scalability problems n To support new functionalities n To favor interoperability with other Grid frameworks, by allowing exploiting WP1 modules (e.g. RB) also “outside” the EDG WMS
4
M. Sgaravatto – n° 4 WMS Revised Architecture UI RLS Inform. Service Network Server Job Contr. - CondorG Workload Manager RB node CE characts & status SE characts & status RB storage Match- Maker/ Broker Job Adapter Log Monitor Logging & Bookkeeping WP 1
5
M. Sgaravatto – n° 5 Job submission example UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status
6
Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Catalog Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status edg-job-submit myjob.jdl Myjob.jdl JobType = “Normal”; Executable = "$(CMS)/exe/sum.exe"; InputSandbox = {"/home/user/WP1testC","/home/file*”, "/home/user/DATA/*"}; OutputSandbox = {“sim.err”, “test.out”, “sim.log"}; Requirements = other. GlueHostOperatingSystemName == “linux" && other. GlueHostOperatingSystemRelease == "Red Hat 6.2“ && other.GlueCEPolicyMaxWallClockTime > 10000; Rank = other.GlueCEStateFreeCPUs; submitted Job Status UI: allows users to access the functionalities of the WMS Job Description Language (JDL) to specify job characteristics and requirements
7
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage Input Sandbox files Job waiting submitted Job Status NS: network daemon responsible for accepting incoming requests
8
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status WM: responsible to take the appropriate actions to satisfy the request Job
9
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Match- Maker/ Broker Where must this job be executed ?
10
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Match- Maker/ Broker Matchmaker: responsible to find the “best” CE where to submit a job
11
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Match- Maker/ Broker Where are (which SEs) the needed data ? What is the status of the Grid ?
12
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Match- Maker/ Broker CE choice
13
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Job Adapter JA: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, etc.)
14
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage Job Status JC: responsible for the actual job management operations (done via CondorG) Job submitted waiting ready
15
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage Job Status Job Input Sandbox files submitted waiting ready scheduled
16
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node RB storage Job Status Input Sandbox submitted waiting ready scheduled running “Grid enabled” data transfers/ accesses Job
17
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node RB storage Job Status Output Sandbox files submitted waiting ready scheduled running done
18
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node RB storage Job Status Output Sandbox submitted waiting ready scheduled running done edg-job-get-output
19
Job submission UI Network Server Job Contr. - CondorG Workload Manager RLS Inform. Service Computing Element Storage Element RB node RB storage Job Status Output Sandbox files submitted waiting ready scheduled running done cleared
20
M. Sgaravatto – n° 20 Job monitoring UI Log Monitor Logging & Bookkeeping Network Server Job Contr. - CondorG Workload Manager Computing Element RB node LM: parses CondorG log file (where CondorG logs info about jobs) and notifies LB LB: receives and stores job events; processes corresponding job status Log of job events edg-job-status edg-job-get-logging-info Job status
21
M. Sgaravatto – n° 21 Interoperability with other services u Information services n Queried by the RB to see the status of the Grid (characteristics and status of CEs and SEs) n WMS able to work with: s Globus MDS based information services (as in LCG and INFN-GRID testbeds) s RGMA-GOUT (as in EDG testbed) n Evaluating direct (i.e. no via GOUT) interoperability to RGMA u Replica Location Service n Queried by the RB to see where the specified data are physically available (in which SEs) n WMS able to work with EDG-WP2 RLS n On-going plans to work also with US RLS u VOMS n Used for VO based security
22
M. Sgaravatto – n° 22 Deployment of WMS services RLS One or more for each RB Usually deployed in the “RB node” (but not required) One for each RB “Community” (VO) RB or “Personal” RB Queue of a LRMS (LSF, PBS) Submitting machine (UI) ”RB node” LB server II/ GOUT CE SE Usually one per VO NS, WM, JC, LM, PR MyProxy server Possibility to submit to more than one RBs from a single UI Used for proxy renewal VOMS Used for VO based security
23
M. Sgaravatto – n° 23 WMS release 2: new functionalities u User APIs u GUI u Support for interactive jobs u Job checkpointing u Support for parallel jobs u Gangmatching u Support for automatic output data upload and registration u VOMS support u …
24
M. Sgaravatto – n° 24 GUI & APIs
25
M. Sgaravatto – n° 25 Interactive jobs u Specified setting JobType = “Interactive” in JDL u When an interactive job is executed, a window for the stdin, stdout, stderr streams is opened n Possibility to send the stdin to the job n Possibility the have the stderr and stdout of the job when it is running u Possibility to start a window for the standard streams for a previously submitted interactive job with command edg-job-attach
26
M. Sgaravatto – n° 26 Job checkpointing u Checkpointing: saving from time to time job state n Useful to prevent data loss, due to unexpected failures n Approach: provide users with a “trivial” logical job checkpointing service n User can save from time to time the state of the job (defined by the application) n A job can be restarted from an intermediate (i.e. “previously” saved) job state u Different than “classical checkpointing (i.e. saving all the information related to a process: process’s data and stack segments, open files, etc.) n Very difficult to apply (e.g. problems to save the state of open network connections) n Not necessary for many applications u To submit a checkpointable job n Code must be instrumented (see next slides) n JobType=Checkpointable to be specified in JDL
27
M. Sgaravatto – n° 27 Job checkpointing example int main () { … for (int i=event; i < EVMAX; i++) { ;}... exit(0); } Example of Application (e.g. HEP MonteCarlo simulation)
28
M. Sgaravatto – n° 28 Job checkpointing example #include "checkpointing.h" int main () { JobState state(JobState::job); event = state.getIntValue("first_event"); PFN_of_file_on_SE = state.getStringValue("filename"); …. var_n = state.getBoolValue("var_n"); ; … for (int i=event; i < EVMAX; i++) { ;... state.saveValue("first_event", i+1); ; state.saveValue("filename", PFN of file_on_SE);... state.saveValue("var_n", value_n); state.saveState(); } … exit(0); } User code must be easily instrumented in order to exploit the checkpointing framework …
29
M. Sgaravatto – n° 29 Job checkpointing example #include "checkpointing.h" int main () { JobState state(JobState::job); event = state.getIntValue("first_event"); PFN_of_file_on_SE = state.getStringValue("filename"); …. var_n = state.getBoolValue("var_n"); ; … for (int i=event; i < EVMAX; i++) { ;... state.saveValue("first_event", i+1); ; state.saveValue("filename", PFN of file_on_SE);... state.saveValue("var_n", value_n); state.saveState(); } … exit(0); } User defines what is a state Defined as pairs Must be “enough” to restart a computation from a previously saved state
30
M. Sgaravatto – n° 30 Job checkpointing example #include "checkpointing.h" int main () { JobState state(JobState::job); event = state.getIntValue("first_event"); PFN_of_file_on_SE = state.getStringValue("filename"); …. var_n = state.getBoolValue("var_n"); ; … for (int i=event; i < EVMAX; i++) { ;... state.saveValue("first_event", i+1); ; state.saveValue("filename", PFN of file_on_SE);... state.saveValue("var_n", value_n); state.saveState(); } … exit(0); } User can save from time to time the state of the job
31
M. Sgaravatto – n° 31 Job checkpointing example #include "checkpointing.h" int main () { JobState state(JobState::job); event = state.getIntValue("first_event"); PFN_of_file_on_SE = state.getStringValue("filename"); …. var_n = state.getBoolValue("var_n"); ; … for (int i=event; i < EVMAX; i++) { ;... state.saveValue("first_event", i+1); ; state.saveValue("filename", PFN of file_on_SE);... state.saveValue("var_n", value_n); state.saveState(); } … exit(0); } Retrieval of the last saved state The job can restart from that point
32
M. Sgaravatto – n° 32 Job checkpointing scenarios u Scenario 1 n Job submitted to a CE n When job runs it saves from time to time its state n Job failure, due to a Grid problem (e.g. CE problem) n Job resubmitted by the WMS possibly to a different CE n Job restarts its computation from the last saved state s No need to restart from the beginning s The computation done till that moment is not lost u Scenario 2 n Job failure, but not detected by the Grid middleware n User can retrieve a saved state for the job (typically the last one) s edg-job-get-chkpt –o n User resubmits the job, specifying that the job must start from a specific (the retrieved one) initial state s edg-job-submit –chkpt
33
M. Sgaravatto – n° 33 Submission of parallel jobs u Possibility to submit MPI jobs u MPICH implementation supported u Only parallel jobs inside a single CE can be submitted u Submission of parallel jobs very similar to normal jobs n Just needed to specify in the JDL: s JobType= “MPICH” s NodeNumber = n; n The number (n) of requested CPUs u Matchmaking n CE chosen by RB has to have MPICH sw installed, and at least n total CPUs n If there are two or more CEs satisfying all the requirements, the one with the highest number of free CPUs is chosen
34
M. Sgaravatto – n° 34 Gangmatching u With “standard” matchmaking only 2 “involved entities” the job and the CE u Gangmatching allows to take into account, besides CE information, also SE information in the matchmaking u Typical use case for gangmatching: n My job has to run on a CE close to a SE with at least 200 MB of available space: Requirements = anyMatch(other.storage.CloseSEs, target.GlueSAStateAvailableSpace > 200);
35
M. Sgaravatto – n° 35 Other new functionalities u VOMS support n VO taken from VOMS user proxy n Matchmaking performed wrt VO s Not necessary to publish anymore in the information service the list of authorized users (only list of authorized VOs needed) n In any case WMS works also with non-VOMS proxies u Compliance with Glue Schema n Common Information Service schema between US and EU HENP Grid Projects u LB ACLs n Allow setting who can query the status of a given job u Output data upload and registration n Possibility to trigger (via JDL) output data upload into a SE and registration in Replica Location Service (RLS)
36
M. Sgaravatto – n° 36 Output data registration OutputData = { [ OutputFile = "filename1"; LogicalFileName = "lfn:mylfn1"; StorageElement = "testbed007.cnaf.infn.it" ], [ OutputFile = "filename2" ], [ OutputFile = "filename3"; LogicalFileName = "lfn:mylfn2" ], [ OutputFile = "filename4"; StorageElement = "testbed007.cnaf.infn.it" ] } Both LFN and target SE specified Nor LFN nor target SE specified Only LFN specified Only target SE specified
37
M. Sgaravatto – n° 37 Status u WMS release 2 being used and evaluated by applications n Deployed in LCG testbed n Deployed in EDG testbeds n Being deployed-customized-exploited in CrossGrid testbed n Deployed in INFN-GRID testbeds s INFN-GRID development testbed used to test new stuff u So far very good feedbacks n Users are reporting great improvement wrt release 1, in particular for what concerns reliability n Also quite happy about the level of support (e.g. bug fixes) that we are able to provide
38
M. Sgaravatto – n° 38 LCG 1.0 Test (19./20. Sept. 2003): 5 streams 5000 jobs in total Input and OutputSandbox Brokerinfo query 30 sec sleep Slide presented at the latest EDG workshop by Markus Schulz (LCG)
39
M. Sgaravatto – n° 39 Future activities u Support and bug fixes u Working on new functionalities n Dependencies of jobs s Integration of Condor DAGMan s “Lazy” scheduling: job (node) bound to a resource (by RB) just before that job can be submitted (i.e. when it is free of dependencies) n Support for job partitioning s Use of job checkpointing and DAGMan mechanisms n Original job partitioned in sub-jobs which can be executed in parallel n At the end each sub-job must save a final state, then retrieved by a job aggregator, responsible to collect the results of the sub-jobs and produce the overall output n Grid Accounting s Based upon a computational economy model n Users pay in order to execute their jobs on the resources and the owner of the resources earn credits by executing the user jobs n To take account of resource usage n And to make possible a nearly stable equilibrium able to satisfy the needs of both resource `producers' and `consumers‘ u Getting ready for EGEE …
40
M. Sgaravatto – n° 40 Conclusions u Revised WMS architecture n To address emerged shortcomings n To support new functionalities u Deployed in various testbeds (in particular by LCG, our main customer) n Very good feedbacks u Working on some new functionalities to be shown at the last EDG review and in case going to be exploited by LCG (e.g. DAGMan) u More info n http://www.infn.it/workload-grid
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.