Published by Abel May. Modified over 8 years ago.
1
Updates on “job checkpointing and partitioning”
Massimo Sgaravatto, INFN Padova
2
Changes in the doc. (wrt. prev. release)
- Removed files from job state
  - Job state is now defined just by (var, value) pairs
- Not possible to move files from sub-jobs to the job aggregator with job partitioning
  - They must be saved to a SE, and their identifiers specified as (var, value) pairs in the sub-jobs' final job states
- LB server used to persistently save the job states
  - Removed chkpt-server
- Possibility to specify a pre-job (besides the job aggregator) in job partitioning
3
Changes in the doc. (wrt. prev. release)
- Two new functions added to the API
  - set_final_state: specifies that the state is the last one
  - is_final_state: is this state the last one (i.e. was it “marked” using the set_final_state method)?
- Checking whether all the sub-jobs have saved their final states is done by the job aggregator
  - The job aggregator is responsible for deciding the policy (e.g. all sub-jobs must have saved their final states, at least one sub-job must have saved its final state, at least x% of sub-jobs must have saved their final states, ...)
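The aggregator-side policies listed above can be sketched as follows. This is only an illustration in Python, not the WP1 implementation: the function name `policy_satisfied`, the policy labels `"all"`, `"any"` and `"fraction"`, and the representation of sub-job status as a list of booleans are all assumptions introduced here.

```python
def policy_satisfied(final_flags, policy="all", min_fraction=1.0):
    """Decide whether enough sub-jobs have saved their final states.

    final_flags  -- one boolean per sub-job, True if its final state was saved
    policy       -- "all", "any" or "fraction" (hypothetical labels)
    min_fraction -- used only by "fraction": required share, between 0 and 1
    """
    if not final_flags:
        # No sub-jobs at all: nothing to aggregate, so report "not satisfied".
        return False
    if policy == "all":
        return all(final_flags)
    if policy == "any":
        return any(final_flags)
    if policy == "fraction":
        # True counts as 1, so the sum is the number of finished sub-jobs.
        return sum(final_flags) / len(final_flags) >= min_fraction
    raise ValueError(f"unknown policy: {policy!r}")
```

In this sketch the job aggregator would gather one flag per sub-job (for example by asking is_final_state for each sub-job's state id) and then apply whichever policy it has chosen.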
4
APIs
Object State: {
    // Data Members
    Label_t state_id = "label";
    VarValueSet var_value_pairs[] = {"var1"="value1", "var2"="value2", ... };
    StepsSet main_stepper = {"element1", "element2", "element3", ... };
    Label_t current_step;

    // Methods
    int save_value(Pair);
    int save_state();
    string get_string_value(string);
    int get_int_value(string);
    double get_double_value(string);
    State load_state(Label_t);
    Label_t get_next_step();
    int set_final_state();
    bool is_final_state(Label_t);
}
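To make the intended use of this interface more concrete, here is a minimal Python sketch of a State object. It is an illustration under stated assumptions, not the actual API: a class-level dictionary stands in for the LB server as persistent store, `save_value` takes a name and a value rather than a `Pair`, integer return codes are always 0, and `get_next_step` returns `None` when the steps are exhausted (the slides do not say what happens then).

```python
class State:
    """Illustrative stand-in for the State object described on the slide."""

    # Hypothetical in-memory store; the slides assign this role to the LB server.
    _store = {}

    def __init__(self, state_id, steps=()):
        self.state_id = state_id
        self.var_value_pairs = {}
        self.main_stepper = list(steps)
        self.current_step = None
        self._next_index = 0   # position of the next step to hand out
        self._final = False

    def save_value(self, name, value):
        self.var_value_pairs[name] = value
        return 0

    def set_final_state(self):
        self._final = True
        return 0

    def save_state(self):
        # Persist a copy so later changes do not alter the saved checkpoint.
        State._store[self.state_id] = {
            "pairs": dict(self.var_value_pairs),
            "steps": list(self.main_stepper),
            "current_step": self.current_step,
            "next_index": self._next_index,
            "final": self._final,
        }
        return 0

    def get_string_value(self, name):
        return str(self.var_value_pairs[name])

    def get_int_value(self, name):
        return int(self.var_value_pairs[name])

    def get_double_value(self, name):
        return float(self.var_value_pairs[name])

    def get_next_step(self):
        if self._next_index >= len(self.main_stepper):
            return None
        self.current_step = self.main_stepper[self._next_index]
        self._next_index += 1
        return self.current_step

    @classmethod
    def load_state(cls, state_id):
        saved = cls._store[state_id]
        state = cls(state_id, saved["steps"])
        state.var_value_pairs = dict(saved["pairs"])
        state.current_step = saved["current_step"]
        state._next_index = saved["next_index"]
        state._final = saved["final"]
        return state

    @classmethod
    def is_final_state(cls, state_id):
        saved = cls._store.get(state_id)
        return saved is not None and saved["final"]
```

A job would typically call `get_next_step`, do the work for that step, record progress with `save_value`, and then call `save_state`. A restarted job would call `load_state` with the same state id and continue from the next step.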
5
Issues
- Specification of JobSteps for the job aggregator
  - Should be the identifiers of the final states of the sub-jobs
  - Possible approach: sub-jobs' state ids represented by the sub-jobs' dg-job-ids
    - Necessary to know the dg-job-ids of the sub-jobs given the dg-job-id of the original “partitionable” job (the dg-job-id associated with the DAG)
    - Needed also to allow dg-get-job-chkpt for a partitionable job (dg-job-id of the partitionable job given as argument)
      - Should return the states of its various sub-jobs
- Avoid having all sub-jobs submitted to the same CE
  - The same problem also arises when a bunch of jobs with the same Requirements and Ranks are submitted together (EstimatedTraversalTime not promptly updated)
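One way to picture the first issue: if the mapping from the partitionable job's dg-job-id to its sub-jobs' dg-job-ids were available, and each sub-job's state id were its own dg-job-id, then collecting the checkpoint states for the whole job would reduce to a lookup. The Python sketch below assumes exactly that. The function name `get_job_chkpt` and both dictionaries are hypothetical, and how the mapping would actually be obtained is the open question raised above.

```python
def get_job_chkpt(dag_job_id, sub_jobs_of, saved_states):
    """Return {sub_job_id: state or None} for a partitionable job.

    dag_job_id   -- dg-job-id of the partitionable job (the one tied to the DAG)
    sub_jobs_of  -- hypothetical mapping: DAG dg-job-id -> list of sub-job dg-job-ids
    saved_states -- hypothetical mapping: sub-job dg-job-id -> saved state
    """
    # A sub-job with no saved state yet is reported as None.
    return {sub_id: saved_states.get(sub_id) for sub_id in sub_jobs_of[dag_job_id]}
```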
6
Next steps
- Some time (10 days?) for other WP1 internal comments, and then submit to WP8 TWG?
- Definition of the architecture in much more detail
- Coordination with other teams, in particular CESNET (LB) and CNAF (DAGMAN)