Pseudo dynamic DAG control Version 1
Outline
Goal
Solution
Restrictions
Example
Case Study
Goal
The user should be able to redirect the control flow of a workflow depending on the outcome of any job.
The exit value set by the executable of the user's job determines whether the job fails or succeeds.
A failed job should therefore not stop the overall workflow, and no real computation may be started in the branch that proved to be false.
The solution should be a clean “user level” one that requires no change in the P-GRADE Portal middleware.
The Solution
The solution introduces a suggested job template in which the frame of the job is standardized.
Solution details
The executable of the original job is enveloped in a standard wrapper program which always terminates as TRUE, regardless of the outcome of the enveloped executable.
The wrapper program is written in C and is downloadable as lyes/PseudoDynamicDAGControl/wrapper.exe
Standard logical input / output channels are added to the wrapped job to control the flow.
Restriction of the solution
The solution handles only internal (programmed) job failures.
Failures due to the environment (resource, authentication and communication problems) are recognized by the DAGMAN and can be handled by the Rescue feature of the P-GRADE Portal.
I/O convention for Job Wrapper
Original job: original input files, original output files.
    Executable();
Extension of an original job: the EXECUTABLE_INPUT port, LOG_INPUT port, TRUE_OUTPUT port and FALSE_OUTPUT port are added to the original input and output files.
Modified job (a zero exit value counts as TRUE):
    if (LOG_INPUT.value) LOG_INPUT.value = (Executable().exit == 0);
    TRUE_OUTPUT.value = LOG_INPUT.value;
    FALSE_OUTPUT.value = !LOG_INPUT.value;
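The port convention above can be sketched in C as a small pure function. This is only an illustration under stated assumptions, not the code of the actual wrapper.exe: the names `wrapper_decision` and `decide` are hypothetical, and the caller is assumed to run the user executable only when LOG_INPUT is TRUE.

```c
#include <stdbool.h>

/* Values the wrapper would write to its two logical output ports.
 * Hypothetical type, introduced only for this sketch. */
struct wrapper_decision {
    bool true_output;   /* content of TRUE_OUTPUT  */
    bool false_output;  /* content of FALSE_OUTPUT */
};

/* log_input:  value read from the LOG_INPUT port.
 * exit_value: exit value of the user executable; it is only consulted
 *             when log_input is true, because the executable is not
 *             run otherwise. */
struct wrapper_decision decide(bool log_input, int exit_value)
{
    struct wrapper_decision d;

    if (log_input) {
        /* A zero exit value means success (TRUE). */
        log_input = (exit_value == 0);
    }
    d.true_output = log_input;
    d.false_output = !log_input;
    return d;
}
```

Exactly one of the two outputs is TRUE in every case, so exactly one branch of the workflow does real work.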
Animation of wrapper job operation
[Animation: a wrapper job with the ports InputData, LOG_INPUT, T_OUTPUT, F_OUTPUT and OutputData, containing the steps “execute” and “Fake Output gen.”, shown in the possible states I, II and III]
A token with value TRUE or FALSE arrives on LOG_INPUT.
TRUE on the logical input triggers the execution of the user's program (“execute”), which may return a false or a true exit value.
A zero (true) exit value on “execute” activates the subsequent jobs connected to the T(RUE)_OUTPUT.
A non-zero (false) exit value on “execute” activates the subsequent jobs connected to the F(ALSE)_OUTPUT.
A FALSE value on LOG_INPUT activates the subsequent jobs connected to the F(ALSE)_OUTPUT.
Real output data will be forwarded only if the user job “execute” succeeds.
In the other cases pro forma (fake) output will be generated to “cheat” the DAGMAN.
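The fake output generation step could look roughly like the following C sketch. The source does not say what the pro forma files contain or whether partially written output is overwritten. The function name `create_fake_outputs`, the choice of empty placeholder files, and the use of append mode (which keeps an already existing file as it is) are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Makes sure every listed output file exists so that the file copy to
 * the subsequent jobs cannot fail on a missing file.
 * Assumption: append mode "ab" is used, which creates a missing file as
 * an empty one and leaves an existing file untouched.
 * Returns the number of files that could not be created. */
int create_fake_outputs(const char *const names[], size_t count)
{
    int failures = 0;

    for (size_t i = 0; i < count; i++) {
        FILE *f = fopen(names[i], "ab");
        if (f == NULL) {
            failures++;
            continue;
        }
        if (fclose(f) != 0)
            failures++;
    }
    return failures;
}
```

If partially written output of a failed user job must never reach the subsequent jobs, opening with "wb" instead would truncate such files to empty placeholders.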
RULES FOR EXTENDED JOBS
The Job Executable is a special wrapper program (wrapper.exe).
The genuine (user) executable returns the exit value.
Two additional input ports and two additional output ports are introduced, each with a standard Internal File Name: the genuine executable is associated as “EXECUTABLE_INPUT”; the file delivering the permission to execute is “LOG_INPUT”; the files delivering the propagated permissions to the subsequent jobs in the proper direction are “TRUE_OUTPUT” and “FALSE_OUTPUT”.
The logical input and output ports accept special files with content { TRUE | FALSE }.
The Internal File Names of the output files that may be produced by the user executable must be listed after the genuine arguments, separated by the keyword -outputs. This list is needed because, if LOG_INPUT delivers the value FALSE or the user job fails, the wrapper must create pro forma (fake) output data files in place of those that the user executable, not run or not run properly, did not produce. Without these files the DAGMAN would abort the job while attempting to copy the non-existent files to the subsequent jobs.
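Two of these rules lend themselves to a short C sketch: splitting the argument list at the -outputs keyword, and interpreting the content of a logical port file. The helper names `find_outputs_separator` and `parse_logical_token` are hypothetical, and tolerating trailing whitespace after TRUE or FALSE is an assumption. The real wrapper.exe may behave differently.

```c
#include <stdbool.h>
#include <string.h>

/* Returns the index of the "-outputs" keyword in argv, or argc if it is
 * absent. argv[1] .. argv[index-1] are the genuine arguments of the user
 * executable; argv[index+1] .. argv[argc-1] are the Internal File Names
 * of its possible output files. */
int find_outputs_separator(int argc, char *const argv[])
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-outputs") == 0)
            return i;
    }
    return argc;
}

/* Interprets the content of a logical port file.
 * Returns 0 and stores the value in *value for "TRUE" or "FALSE"
 * (optionally followed by whitespace), and -1 for anything else. */
int parse_logical_token(const char *text, bool *value)
{
    size_t len = strcspn(text, " \t\r\n");              /* token length */
    size_t rest = len + strspn(text + len, " \t\r\n");  /* skip blanks  */

    if (text[rest] != '\0')
        return -1;  /* unexpected characters after the token */
    if (len == 4 && strncmp(text, "TRUE", 4) == 0) {
        *value = true;
        return 0;
    }
    if (len == 5 && strncmp(text, "FALSE", 5) == 0) {
        *value = false;
        return 0;
    }
    return -1;
}
```

Rejecting unknown content rather than silently treating it as FALSE makes a wrongly connected port visible early.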
EXAMPLE: IF(C1) E1 ELSE IF(C2) E2 ELSE E3 Overview
EXAMPLE: IF(C1) E1 ELSE IF(C2) E2 ELSE E3 Details
The job executable is the standard “wrapper.exe”.
Input ports: the original input data port; the new EXECUTABLE_INPUT port, used to upload the genuine executable; the new LOG_INPUT port (value: TRUE, FALSE).
Output ports: the original output data port; the new TRUE_OUTPUT port (value: TRUE, FALSE); the new FALSE_OUTPUT port (value: TRUE, FALSE).
Each Internal File Name of a file that can be produced by the genuine user executable must be listed after the separator attribute -outputs.
EXAMPLE: IF(C1) E1 ELSE IF(C2) E2 ELSE E3 Environment
[Figure: a wrapper job with its EXECUTABLE_INPUT, LOG_INPUT, TRUE_OUTPUT and FALSE_OUTPUT ports; a file containing TRUE is attached to LOG_INPUT]
A LOG_INPUT port not connected to any (logical) output port must be associated with a file containing the ASCII string “TRUE”.
Part II (A case study)
The case study is a simple IF THEN ELSE type workflow containing three jobs.
The tested application can be downloaded as: DynamicDAGControl/TestProgram/SZTAKI_hermann_IF _THEN_ELSE_fork_seegrid.tar.gz
Part II (Case study)
The test job IFargEq0 is the wrapper of the executable “exitWithArg.exe”, which exits with the same value it was given in the job's Attributes; in this experiment we expect the workflow to execute the job FALSEBR (connected to the FALSE_OUTPUT port).
IFargEq0 has an input port definition to upload the executable “exitWithArg.exe”.
The first job of wrapper type must run unconditionally and therefore gets a file containing “TRUE” as LOG_INPUT.
The job “TRUEBR”, connected to the TRUE_OUTPUT port of the job “IFargEq0”, will not execute its user program “multiply.exe”, which is defined at port 1 (the port that defines the user executable).
The job “FALSEBR”, connected to the FALSE_OUTPUT port of IFargEq0, will run in our experiment, executing the user program “CopyAndTime” defined at port 1.
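For readers without access to the test archive, a program in the spirit of “exitWithArg.exe” might be as small as the C sketch below. It is not the program shipped with the case study: the function name `exit_value_from_args` is hypothetical, and returning a failure value when the argument is missing is an assumption.

```c
#include <stdlib.h>

/* Computes the exit value from the first command-line argument.
 * Assumption: a missing argument is treated as a failure. */
int exit_value_from_args(int argc, char *const argv[])
{
    if (argc < 2)
        return EXIT_FAILURE;
    /* strtol yields 0 for non-numeric text, which would count as success. */
    return (int)strtol(argv[1], NULL, 10);
}

/* A complete program would simply forward this value:
 *
 *     int main(int argc, char *argv[])
 *     {
 *         return exit_value_from_args(argc, argv);
 *     }
 */
```

Run with the argument 0, such a program would steer the workflow to the job on the TRUE_OUTPUT port. Any non-zero argument would select the job on the FALSE_OUTPUT port.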
Result of the case study
Job IFargEq0 output listing
The listing shows the message of the embedded user program “exitWithArg”.
As this program has no “real” data output, the warning can be ignored.
The wrapper reports its decision, which determines the activation of the subsequent jobs.
Job TRUEBR output listing
As the preceding wrapper job produced the value “FALSE” on the TRUE_OUTPUT port, the user executable of this job will not be executed.
Job FALSEBR output listing
The listing shows the message of the embedded user program “CopyAndTime”.
The wrapper reports its decision, which determines the activation of the subsequent jobs.