EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Provenance Challenge gLite Job Provenance.


1 Provenance Challenge: gLite Job Provenance
Ludek Matyska, on behalf of the CESNET team (Czech Republic)
GridWorld 2006, 13th September 2006

2 The Team
Ales Krenek – chair (Brno)
Jiri Sitera (Pilsen)
Frantisek Dvorak (Pilsen)
Milos Mulac (Pilsen)
Miroslav Ruda (Brno)
Zdenek Salvet (Brno)
Daniel Kouril (Brno)

3 gLite and Jobs
gLite is a middleware developed within the EU EGEE project
The EGEE Grid is strictly job oriented
  – Submitting a job is the only way users interact with the resources
Each job is described in the Job Description Language (JDL), which is based on the ClassAd syntax
  – Very complex descriptions are possible, including proximity to the storage holding input/output files, environment settings, etc.
Job collections are also possible, forming simple workflows in the form of Directed Acyclic Graphs (DAGs)
  – Each DAG is completely described in nested JDL as the set of its nodes (jobs) and the execution dependencies among them
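
To make the DAG idea concrete, here is a minimal Python sketch (not gLite code and not JDL) that assumes a DAG is given as a list of node names plus (parent, child) dependency pairs. It computes one execution order that respects every dependency, using Kahn's topological sort; the node names are hypothetical.

```python
from collections import deque


def execution_order(nodes, dependencies):
    """Return one valid run order for a DAG (Kahn's algorithm).

    nodes: list of node (job) names
    dependencies: (parent, child) pairs meaning the child may start
                  only after the parent has finished
    Raises ValueError if the dependencies contain a cycle.
    """
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for parent, child in dependencies:
        children[parent].append(child)
        indegree[child] += 1

    # Nodes with no unfinished parents may start immediately.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(indegree):
        raise ValueError("dependencies contain a cycle; not a DAG")
    return order


# Hypothetical three-node DAG: 'prepare' must finish before 'left' and 'right'.
print(execution_order(["prepare", "left", "right"],
                      [("prepare", "left"), ("prepare", "right")]))
# ['prepare', 'left', 'right']
```

Any order returned this way is acceptable to a scheduler; independent nodes such as `left` and `right` could equally run in parallel.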

4 Job Processing in gLite
A job is submitted through a User Interface
The Workload Manager queues the job and starts looking for an appropriate Computing Element
The job is passed to the selected Computing Element (into its queue)
The job runs
After the run, the user can retrieve the job output (collected in the output sandbox)
All actions on a job are tracked by the Logging and Bookkeeping (LB) service, which provides the job state and related information
After the output sandbox has been retrieved, all middleware data about the job (including the complete LB data) are transferred to Job Provenance (JP)
Users can add annotations as tags (name/value pairs) to a job, either via LB (while the job is on the Grid) or via JP (at any time afterwards)
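
The processing chain can be sketched as follows. This is a hypothetical, heavily simplified Python model, not the LB or JP API: the step labels, the `JobRecord` and `ProvenanceStore` classes, and the tag names are illustrative assumptions. It shows three points from the slide: every action is logged, the record moves to JP only after the output sandbox is retrieved, and tags can be added both before and after that hand-over.

```python
from dataclasses import dataclass, field

# Simplified processing steps from the slide, in order (labels are illustrative).
STEPS = (
    "submitted through the User Interface",
    "queued by the Workload Manager",
    "passed to the selected Computing Element",
    "running",
    "done",
    "output sandbox retrieved",
)


@dataclass
class JobRecord:
    """Hypothetical stand-in for everything LB records about one job."""
    job_id: str
    events: list = field(default_factory=list)  # actions, in order
    tags: list = field(default_factory=list)    # (name, value); names may repeat


class ProvenanceStore:
    """Hypothetical stand-in for JP: accepts a record only after output retrieval."""

    def __init__(self):
        self.records = {}

    def receive(self, record):
        if not record.events or record.events[-1] != STEPS[-1]:
            raise ValueError("output sandbox not retrieved yet")
        self.records[record.job_id] = record


def run_job(job_id, jp):
    record = JobRecord(job_id)
    for step in STEPS:
        record.events.append(step)  # LB tracks every action
        if step == "running":
            record.tags.append(("phase", "tagged via LB"))  # job is on the Grid
    jp.receive(record)  # hand-over to JP after the output is retrieved
    return record


jp = ProvenanceStore()
run_job("job-1", jp)
jp.records["job-1"].tags.append(("phase", "tagged via JP"))  # any time afterwards
print(jp.records["job-1"].tags)
# [('phase', 'tagged via LB'), ('phase', 'tagged via JP')]
```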

5 Challenge Workflow
Implemented as a gLite DAG
  – Procedures become nodes of the DAG (gLite jobs)
  – Dependencies among procedures become dependencies among the DAG jobs
Data are implicit: each job is responsible for downloading its inputs from, and uploading its outputs to, an appropriate storage element
  – We set up a GridFTP server, and all data were uploaded and downloaded using the gsiftp:// protocol
  – This means all data are identified by their full URL
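
Since data are implicit and identified only by URL, the dependencies among jobs can be read off by matching one job's output URLs against another job's input URLs. The Python sketch below assumes the five-stage challenge workflow (four align_warp, four reslice, one softmean, three slicer and three convert jobs); the GridFTP host name and file names are hypothetical, header files are left out, and this is not the team's actual DAG.

```python
BASE = "gsiftp://gridftp.example.org/ipaw"  # hypothetical GridFTP server


def url(name):
    return f"{BASE}/{name}"


def challenge_jobs():
    """Return {job_name: {'stage', 'program', 'inputs', 'outputs'}}."""
    jobs = {}
    for i in range(1, 5):
        jobs[f"align_warp{i}"] = {
            "stage": "1", "program": "align_warp",
            "inputs": [url(f"anatomy{i}.img"), url("reference.img")],
            "outputs": [url(f"warp{i}.warp")]}
        jobs[f"reslice{i}"] = {
            "stage": "2", "program": "reslice",
            "inputs": [url(f"warp{i}.warp")],
            "outputs": [url(f"resliced{i}.img")]}
    jobs["softmean"] = {
        "stage": "3", "program": "softmean",
        "inputs": [url(f"resliced{i}.img") for i in range(1, 5)],
        "outputs": [url("atlas.img")]}
    for axis in "xyz":
        jobs[f"slicer_{axis}"] = {
            "stage": "4", "program": "slicer",
            "inputs": [url("atlas.img")],
            "outputs": [url(f"atlas-{axis}.pgm")]}
        jobs[f"convert_{axis}"] = {
            "stage": "5", "program": "convert",
            "inputs": [url(f"atlas-{axis}.pgm")],
            "outputs": [url(f"atlas-{axis}.gif")]}
    return jobs


def dependencies(jobs):
    """Derive (parent, child) edges: the parent produces a URL the child reads."""
    producer = {out: name for name, job in jobs.items() for out in job["outputs"]}
    return sorted({(producer[u], name)
                   for name, job in jobs.items()
                   for u in job["inputs"] if u in producer})


edges = dependencies(challenge_jobs())
print([parent for parent, child in edges if child == "softmean"])
# ['reslice1', 'reslice2', 'reslice3', 'reslice4']
```

Input URLs that no job produces (the anatomy and reference images) simply yield no edge, which matches the idea that raw inputs sit on the storage element before the DAG starts.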

6 Provenance Trace
gLite Job Provenance is primarily a storage and retrieval service for provenance data
  – Currently no GUI, only a command-line interface
  – Optimized to store large amounts of provenance data
    • Mostly events recorded during the job lifetime
    • WORM (write once, read many) semantics for the primary data
  – User annotations
    • New annotations can be added at any time
    • Annotations are also “distilled” from the primary data
  – An extensible framework, where specific metadata processing is provided by plug-ins that can be added at any time
Participation in the Provenance Challenge put the metadata interpretation to the test
  – More work in this area has been done, and more is still needed
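
The storage semantics can be illustrated with a small in-memory Python model. It is a hypothetical sketch, not the JP interface: the class, method, exception and plug-in names are made up. Primary data can be stored once per job, annotations can be appended at any time, and registered plug-ins distill extra attributes from the raw events.

```python
class WormViolation(Exception):
    """Raised on an attempt to overwrite already stored primary data."""


class JobProvenanceSketch:
    """Hypothetical model of JP storage semantics (not the real JP API)."""

    def __init__(self):
        self._primary = {}      # job_id -> tuple of raw events, stored once (WORM)
        self._annotations = {}  # job_id -> list of (name, value), append-only
        self._plugins = []      # callables: events -> list of (name, value)

    def register_plugin(self, plugin):
        self._plugins.append(plugin)

    def store_primary(self, job_id, events):
        if job_id in self._primary:
            raise WormViolation(f"primary data for {job_id} already stored")
        self._primary[job_id] = tuple(events)

    def annotate(self, job_id, name, value):
        self._annotations.setdefault(job_id, []).append((name, value))

    def attributes(self, job_id):
        """User annotations followed by annotations distilled by each plug-in."""
        events = self._primary.get(job_id, ())
        distilled = [pair for plugin in self._plugins for pair in plugin(events)]
        return self._annotations.get(job_id, []) + distilled


def run_time_plugin(events):
    # Hypothetical plug-in: report when the job entered the 'Running' state.
    return [("run_time", t) for state, t in events if state == "Running"]


jp = JobProvenanceSketch()
jp.register_plugin(run_time_plugin)
jp.store_primary("job-1", [("Submitted", "10:00"), ("Running", "10:05")])
jp.annotate("job-1", "IPAW_STAGE", "1")
print(jp.attributes("job-1"))
# [('IPAW_STAGE', '1'), ('run_time', '10:05')]
```

Here distilled attributes are computed when they are read, so a plug-in registered later also covers data stored earlier; the slide does not say how JP schedules plug-in processing.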

7 Attribute Classes
Most of the work was done on the annotations (processed raw events)
Four annotation classes were used:
  – JP system attributes
    • E.g. JobID or registration time
  – Digested from the LB trace
    • E.g. the time when the job ran
  – Digested from the JDL
    • E.g. Ancestor and Successor from the DAG description
  – Unqualified user tags
All attributes can occur multiple times
  – E.g. “softmean” has 4 ancestor annotations (each with a “reslice” value)
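
Multi-valued attributes can be modelled by mapping each (class, name) pair to a list of values. In this Python sketch the class labels, the JobID and the run time are hypothetical; the four ancestor values stand for the four reslice jobs mentioned on the slide.

```python
from collections import defaultdict

# The four attribute classes from the slide; these short labels are hypothetical.
SYSTEM, LB_TRACE, JDL, USER = "jp-system", "lb-trace", "jdl", "user"


class AttributeSet:
    """Attributes of one job; every (class, name) key may hold several values."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, attr_class, name, value):
        self._values[(attr_class, name)].append(value)

    def get(self, attr_class, name):
        # dict.get avoids creating empty entries for keys that were never added
        return list(self._values.get((attr_class, name), []))


softmean = AttributeSet()
softmean.add(SYSTEM, "jobid", "job-softmean")           # hypothetical JobID
softmean.add(LB_TRACE, "run_time", "2006-09-11 10:05")  # hypothetical value
for i in range(1, 5):
    # four ancestor annotations, one per reslice job (job names hypothetical)
    softmean.add(JDL, "ancestor", f"reslice{i}")
softmean.add(USER, "IPAW_PROGRAM", "softmean")

print(len(softmean.get(JDL, "ancestor")))  # 4
```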

8 Specific Tags
We used 6 specific user tags for the Provenance Challenge
  – IPAW_OUTPUT
  – IPAW_INPUT
  – IPAW_STAGE
  – IPAW_PROGRAM
  – IPAW_PARAM
  – IPAW_HEADER
They held the corresponding values as specified in the Provenance Challenge description
  – They were recorded via the LB interface
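
The tags for one job could be assembled as below. This is a Python sketch under stated assumptions: `log_user_tag` is a hypothetical placeholder that only prints (the actual LB client call is not shown on the slide), and the URLs, JobID and header value are made up. Repeated tag names carry multiple input or output URLs.

```python
# The six tags in the order listed on the slide.
IPAW_TAGS = ("IPAW_OUTPUT", "IPAW_INPUT", "IPAW_STAGE",
             "IPAW_PROGRAM", "IPAW_PARAM", "IPAW_HEADER")


def ipaw_tags(stage, program, param, inputs, outputs, header):
    """Build (name, value) user-tag pairs for one job.

    Inputs and outputs may be several URLs, so those tag names repeat.
    The pairs are sorted (stably) into the order of IPAW_TAGS.
    """
    pairs = [("IPAW_STAGE", stage), ("IPAW_PROGRAM", program),
             ("IPAW_PARAM", param), ("IPAW_HEADER", header)]
    pairs += [("IPAW_INPUT", u) for u in inputs]
    pairs += [("IPAW_OUTPUT", u) for u in outputs]
    pairs.sort(key=lambda pair: IPAW_TAGS.index(pair[0]))
    return pairs


def log_user_tag(job_id, name, value):
    """Hypothetical placeholder for recording one tag through the LB interface."""
    print(f"{job_id}: {name}={value}")


base = "gsiftp://gridftp.example.org/ipaw/"  # hypothetical server
tags = ipaw_tags("1", "align_warp", "-m 12",
                 [base + "anatomy1.img", base + "reference.img"],
                 [base + "warp1.warp"], "header-placeholder")
for name, value in tags:
    log_user_tag("job-align_warp1", name, value)
```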

9 JP Queries
JP Primary Server (JP PS)
  – Keeps the primary data
  – Data retrieval only; the JobID must be known
JP Index Server (JP IS)
  – Configurable cache of a subset of jobs and their attributes
  – Can search for jobs matching specific query criteria
    • Comparison of an attribute with a constant value
  – Multiple JP IS instances can serve one JP PS
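
The division of labour between the two servers can be sketched as follows. The Python classes are hypothetical in-memory stand-ins, not the JP client API, and only equality comparison with a constant is shown. Two index servers built over the same primary server illustrate that several JP IS instances can serve one JP PS.

```python
class PrimaryServer:
    """Hypothetical JP PS stand-in: keeps all data, retrieval by JobID only."""

    def __init__(self):
        self._jobs = {}  # job_id -> list of (name, value) attributes

    def put(self, job_id, attributes):
        self._jobs[job_id] = list(attributes)

    def get(self, job_id):
        return list(self._jobs[job_id])  # raises KeyError for an unknown JobID

    def job_ids(self):
        return list(self._jobs)


class IndexServer:
    """Hypothetical JP IS stand-in: caches chosen attributes, answers queries."""

    def __init__(self, ps, indexed_names):
        # Filled once: only configured attributes, only jobs that have any of them.
        self._cache = {}
        for job_id in ps.job_ids():
            kept = [(n, v) for n, v in ps.get(job_id) if n in indexed_names]
            if kept:
                self._cache[job_id] = kept

    def query(self, **conditions):
        """Return JobIDs whose cached attributes satisfy every name == value."""
        return sorted(job_id for job_id, attrs in self._cache.items()
                      if all((n, v) in attrs for n, v in conditions.items()))


ps = PrimaryServer()
ps.put("job-1", [("IPAW_PROGRAM", "align_warp"), ("IPAW_PARAM", "-m 12")])
ps.put("job-2", [("IPAW_PROGRAM", "reslice")])

by_program = IndexServer(ps, {"IPAW_PROGRAM"})  # two IS instances over one PS
by_param = IndexServer(ps, {"IPAW_PARAM"})
print(by_program.query(IPAW_PROGRAM="align_warp"))  # ['job-1']
print(by_param.query(IPAW_PROGRAM="reslice"))       # [] (not indexed here)
```

A client would typically ask an IS for matching JobIDs and then fetch full records from the PS by JobID.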

10 Query #1
Find the process that led to Atlas X Graphic
Input
  – URL of the queried Atlas X Graphic file
Outputs
  – List of nodes (DAG jobs) that contributed to the queried file
    • Input and output files (their URLs)
    • Stage of the workflow, program name and parameter values
Implementation
  – Recursive graph search
Results
  – The list of nodes and their attributes described above
Low readability, no GUI manipulation
  – However, all the relevant information is available
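
The recursive search can be sketched as: find the job that produced the queried URL, then repeat for each of that job's input URLs until only raw input files remain. The Python below is a sketch under assumptions: the job table is a reduced, hypothetical version of the workflow (one anatomy image, plus a `slicer_y` branch that should not appear in the result), and the URLs and parameter values are made up apart from `-m 12`.

```python
U = "gsiftp://gridftp.example.org/ipaw/"  # hypothetical GridFTP location


def job(stage, program, param, inputs, outputs):
    return {"stage": stage, "program": program, "param": param,
            "inputs": [U + f for f in inputs], "outputs": [U + f for f in outputs]}


JOBS = {  # reduced, hypothetical workflow
    "align_warp1": job("1", "align_warp", "-m 12",
                       ["anatomy1.img", "reference.img"], ["warp1.warp"]),
    "reslice1": job("2", "reslice", "", ["warp1.warp"], ["resliced1.img"]),
    "softmean": job("3", "softmean", "", ["resliced1.img"], ["atlas.img"]),
    "slicer_x": job("4", "slicer", "", ["atlas.img"], ["atlas-x.pgm"]),
    "slicer_y": job("4", "slicer", "", ["atlas.img"], ["atlas-y.pgm"]),
    "convert_x": job("5", "convert", "", ["atlas-x.pgm"], ["atlas-x.gif"]),
}


def find_producer(jobs, url):
    """Name of the job that lists url among its outputs, or None for raw inputs."""
    for name, j in jobs.items():
        if url in j["outputs"]:
            return name
    return None


def provenance_of(jobs, url, found=None):
    """Recursively collect every job that contributed to the file at url."""
    if found is None:
        found = {}
    producer = find_producer(jobs, url)
    if producer is None or producer in found:
        return found  # raw input file, or a job already visited
    found[producer] = jobs[producer]
    for input_url in jobs[producer]["inputs"]:
        provenance_of(jobs, input_url, found)
    return found


result = provenance_of(JOBS, U + "atlas-x.gif")
print(sorted(result))
# ['align_warp1', 'convert_x', 'reslice1', 'slicer_x', 'softmean']
```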

11 Query #3
Find the Stage 3, 4, and 5 details of the process that led to Atlas X Graphic
Same as Query #1, with the output restricted to the stages listed above
Comment
  – More efficient processing would be possible if the relationship between stages were known (i.e. that Stage 3 precedes Stage 4)
  – Generic enough to handle a STAGE given as an unstructured name, not only as a numeric value
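
Restricting the Query #1 result to particular stages then amounts to a filter on the stage attribute. In this Python sketch the input is a hypothetical, hand-written Query #1 result, and stage labels are compared as plain strings, so unstructured names work as well as numbers.

```python
def restrict_to_stages(provenance, wanted_stages):
    """Keep only jobs whose stage label is in wanted_stages.

    Labels are compared as plain strings, so a name such as
    "averaging" works just as well as "3".
    """
    wanted = set(wanted_stages)
    return {name: job for name, job in provenance.items()
            if job["stage"] in wanted}


# Hypothetical result of the Query #1 search (only the stage field is shown).
query1_result = {
    "align_warp1": {"stage": "1"}, "reslice1": {"stage": "2"},
    "softmean": {"stage": "3"}, "slicer_x": {"stage": "4"},
    "convert_x": {"stage": "5"},
}
print(sorted(restrict_to_stages(query1_result, ["3", "4", "5"])))
# ['convert_x', 'slicer_x', 'softmean']
```

If the stage order were known, the search itself could stop on reaching a stage earlier than 3 instead of filtering afterwards; this sketch makes no such assumption.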

12 Query #4
Find all invocations of procedure align_warp using a 12th order nonlinear 1365 parameter model that ran on a Monday
Outputs
  – Time, stage, program name, inputs, outputs
Implementation
  – JP IS is queried for jobs matching IPAW_PROGRAM=“align_warp” and IPAW_PARAM=“-m 12”
  – The output is then filtered for Monday
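
The two-step evaluation of Query #4 (index query first, weekday filter on the client) might look like this. The Python sketch uses a hypothetical stand-in for the JP IS query and made-up jobs and run dates; it also simplifies each attribute to a single value.

```python
from datetime import date

MONDAY = 0  # date.weekday() returns 0 for Monday through 6 for Sunday


def index_server_query(jobs, **conditions):
    """Hypothetical stand-in for the JP IS step: attribute == constant matches."""
    return [job for job in jobs
            if all(job["attrs"].get(name) == value
                   for name, value in conditions.items())]


def query4(jobs):
    matches = index_server_query(jobs, IPAW_PROGRAM="align_warp",
                                 IPAW_PARAM="-m 12")
    # The day-of-week condition is applied afterwards, on the client side.
    return [job["id"] for job in matches if job["run_date"].weekday() == MONDAY]


jobs = [  # hypothetical jobs and run dates
    {"id": "job-a", "run_date": date(2006, 9, 11),  # a Monday
     "attrs": {"IPAW_PROGRAM": "align_warp", "IPAW_PARAM": "-m 12"}},
    {"id": "job-b", "run_date": date(2006, 9, 12),  # a Tuesday
     "attrs": {"IPAW_PROGRAM": "align_warp", "IPAW_PARAM": "-m 12"}},
    {"id": "job-c", "run_date": date(2006, 9, 11),
     "attrs": {"IPAW_PROGRAM": "reslice", "IPAW_PARAM": ""}},
]
print(query4(jobs))  # ['job-a']
```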

13 Query #8
Annotated anatomy images
Not directly possible
  – JP does not deal with data directly, only with jobs
  – No annotations on data are available
Possible solution (not implemented, but similar to the one used to answer Query #9):
  – Introduce “dummy” jobs, each having a particular data file assigned as its input
  – Associate the annotations with these jobs
  – Process job annotations instead of data annotations
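
The proposed (unimplemented) workaround can be sketched in Python as follows. The dummy-job naming, the URLs and the `center` annotation are hypothetical; the point is only that a query about data annotations becomes a query about the annotations of the dummy jobs that take those files as input.

```python
def make_dummy_jobs(annotated_files):
    """Create one hypothetical 'dummy' job per annotated data file.

    annotated_files: {file_url: [(name, value), ...]}
    Each dummy job gets the file as its IPAW_INPUT plus the file's annotations.
    """
    jobs = {}
    for i, url in enumerate(sorted(annotated_files), start=1):
        jobs[f"dummy-{i}"] = [("IPAW_INPUT", url)] + list(annotated_files[url])
    return jobs


def files_annotated_with(dummy_jobs, name, value):
    """Answer a data-annotation query through job annotations."""
    found = []
    for attrs in dummy_jobs.values():
        if (name, value) in attrs:
            found += [v for n, v in attrs if n == "IPAW_INPUT"]
    return sorted(found)


base = "gsiftp://gridftp.example.org/ipaw/"  # hypothetical server
dummies = make_dummy_jobs({
    base + "anatomy1.img": [("center", "UChicago")],  # hypothetical annotation
    base + "anatomy2.img": [("center", "other")],
})
print(files_annotated_with(dummies, "center", "UChicago"))
# ['gsiftp://gridftp.example.org/ipaw/anatomy1.img']
```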

14 Summary
gLite Job Provenance is usable to answer all queries but one
gLite JP focuses on efficient metadata storage and retrieval
  – In semi-production operation on the EGEE preview testbed
gLite JP is usable as the lowest layer of more complex provenance systems
  – Some processing is currently done on the client side
Support for more complex workflows is tied to their introduction and support in the EGEE environment
New challenge: precise re-run of a past job (requires reproducing a complex environment setup)

15 Questions?

