Updates on “job checkpointing and partitioning” Massimo Sgaravatto INFN Padova.


Changes in the doc. (wrt. prev. release)
- Removed files from job state
  - A job state is now defined just by <var, value> pairs
- With job partitioning it is not possible to move files from sub-jobs to the job aggregator
  - They must be saved to a SE, and their identifiers specified as <var, value> pairs in the sub-jobs' final job states
- LB server used to persistently save the job states
  - Removed chkpt-server
- Possibility to specify a pre-job (besides the job aggregator) in job partitioning

Changes in the doc. (wrt. prev. release)
- Two new functions added to the API
  - set_final_state: to specify that the state is the last one
  - is_final_state: is this state the last one (i.e. was it “marked” using the set_final_state method)?
- Checking whether all the sub-jobs have saved their final states is done by the job aggregator
  - The job aggregator is responsible for deciding the policy (e.g. all sub-jobs had to save their final states, at least one sub-job had to save its final state, at least x% of sub-jobs had to save their final states, ...)
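The aggregator policies listed above can be sketched as a small check over the answers that is_final_state gave for each sub-job. This is only an illustration: the AggregationPolicy type, the helper functions and ready_to_aggregate are hypothetical names, not part of the API in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy type; the slides only list example policies.
struct AggregationPolicy {
    std::size_t min_count;  // at least this many sub-jobs must have a final state
    double min_fraction;    // at least this fraction (0.0 to 1.0) must have one
};

// The three example policies from the slide, expressed with the type above.
inline AggregationPolicy all_sub_jobs() { return {0, 1.0}; }
inline AggregationPolicy at_least_one() { return {1, 0.0}; }
inline AggregationPolicy at_least_fraction(double f) { return {0, f}; }

// is_final[i] holds the result of is_final_state for sub-job i.
bool ready_to_aggregate(const std::vector<bool>& is_final,
                        const AggregationPolicy& policy) {
    assert(policy.min_fraction >= 0.0 && policy.min_fraction <= 1.0);
    if (is_final.empty()) return false;  // no sub-jobs, nothing to aggregate

    std::size_t done = 0;
    for (bool f : is_final) {
        if (f) ++done;
    }
    const double required =
        policy.min_fraction * static_cast<double>(is_final.size());
    return done >= policy.min_count && static_cast<double>(done) >= required;
}
```

With four sub-jobs of which three have saved a final state, at_least_one() and at_least_fraction(0.75) let the aggregator proceed, while all_sub_jobs() does not.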

APIs

Object State:
{
    // Data Members
    Label_t state_id = "label";
    VarValueSet var_value_pairs[] = {"var1"="value1", "var2"="value2", ... };
    StepsSet main_stepper = {"element1", "element2", "element3", ... };
    Label_t current_step;

    // Methods
    int save_value(Pair);
    int save_state();
    string get_string_value(string);
    int get_int_value(string);
    double get_double_value(string);
    State load_state(Label_t);
    Label_t get_next_step();
    int set_final_state();
    bool is_final_state(Label_t);
}
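To make the State interface more concrete, here is a minimal C++ sketch of it. It rests on several assumptions not stated in the slides: states are kept in a process-local map standing in for the LB server, load_state and is_final_state are static lookups by label, values are stored as strings, and get_next_step returns an empty label once the steps are exhausted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Label_t = std::string;
using Pair = std::pair<std::string, std::string>;  // one <var, value> pair

class State {
public:
    State() = default;
    State(Label_t id, std::vector<Label_t> steps)
        : state_id_(std::move(id)), steps_(std::move(steps)) {
        assert(!state_id_.empty());  // a state needs a label to be saved under
    }

    int save_value(const Pair& p) { values_[p.first] = p.second; return 0; }

    // Assumption: "persisting" means copying into an in-memory map.
    int save_state() { store()[state_id_] = *this; return 0; }

    // The getters throw std::out_of_range if the variable was never saved.
    std::string get_string_value(const std::string& var) const { return values_.at(var); }
    int get_int_value(const std::string& var) const { return std::stoi(values_.at(var)); }
    double get_double_value(const std::string& var) const { return std::stod(values_.at(var)); }

    // Throws std::out_of_range if no state was saved under this label.
    static State load_state(const Label_t& id) { return store().at(id); }

    // Returns the next step and advances; empty label when none are left.
    Label_t get_next_step() {
        if (next_ >= steps_.size()) return Label_t();
        current_step_ = steps_[next_++];
        return current_step_;
    }

    int set_final_state() { final_ = true; return 0; }

    // True only if a state with this label was saved after being marked final.
    static bool is_final_state(const Label_t& id) {
        auto it = store().find(id);
        return it != store().end() && it->second.final_;
    }

private:
    // Hypothetical stand-in for the LB server that holds the saved states.
    static std::map<Label_t, State>& store() {
        static std::map<Label_t, State> saved;
        return saved;
    }

    Label_t state_id_;
    std::map<std::string, std::string> values_;
    std::vector<Label_t> steps_;
    Label_t current_step_;
    std::size_t next_ = 0;
    bool final_ = false;
};
```

A job would typically call save_value and save_state after each step, and a restarted job would call load_state and continue from get_next_step. Because the position in the step list is saved with the state, the restarted job resumes at the step after the last one taken before the save.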

Issues
- Specifications of JobSteps for the job aggregator
  - Should be the identifiers of the final states of the sub-jobs
  - Possible approach: sub-job's state ids represented by the sub-job's dg-job-id
    => Necessary to know the dg-job-ids of the sub-jobs given the dg-job-id of the original "partitionable" job (the dg-job-id associated to the DAG)
    - Needed also to allow dg-get-job-chkpt for a partitionable job (dg-job-id of the partitionable job given as argument)
    - Should return the states for its various sub-jobs
- Avoid that all sub-jobs are submitted to the same CE
  - Same problem also when a bunch of jobs with the same Requirements and Ranks are submitted together (EstimatedTraversalTime not promptly updated)
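The proposed approach (state ids equal to the sub-jobs' dg-job-ids, plus a lookup from the partitionable job to its sub-jobs) could look roughly like the sketch below. JobRegistry, its fields and the get_job_chkpt function are hypothetical names chosen for illustration; the real dg-get-job-chkpt command and the LB queries behind it are not described in the slides.

```cpp
#include <map>
#include <string>
#include <vector>

using JobId = std::string;  // stand-in for a dg-job-id

// Hypothetical bookkeeping; in the proposal this information would come from
// the services that know the DAG and hold the saved states.
struct JobRegistry {
    // dg-job-id of the partitionable job (the DAG) -> dg-job-ids of its sub-jobs
    std::map<JobId, std::vector<JobId>> sub_jobs;
    // saved states, keyed by the dg-job-id of the job that saved them
    std::map<JobId, std::string> states;
};

// For a partitionable job, return the saved states of its sub-jobs;
// for an ordinary job, return its own state (if any).
std::map<JobId, std::string> get_job_chkpt(const JobRegistry& reg,
                                           const JobId& id) {
    std::map<JobId, std::string> result;

    auto dag = reg.sub_jobs.find(id);
    if (dag != reg.sub_jobs.end()) {
        for (const JobId& sub : dag->second) {
            auto st = reg.states.find(sub);
            if (st != reg.states.end()) result[sub] = st->second;
        }
        return result;
    }

    auto st = reg.states.find(id);
    if (st != reg.states.end()) result[id] = st->second;
    return result;
}
```

Sub-jobs that have not saved a state yet are simply absent from the result, which leaves the decision on whether enough states are available to the job aggregator's policy.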

Next steps
- Some time (10 days?) for other WP1 internal comments, and then submit to WP8 TWG?
- Definition of architecture with much more details
- Coordination with other teams, in particular CESNET (LB) and CNAF (DAGMAN)