EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Provenance Challenge gLite Job Provenance.

Presentation transcript:

Provenance Challenge: gLite Job Provenance
Ludek Matyska, on behalf of the CESNET team (Czech Republic)
GridWorld th September 2006

The Team
Ales Krenek – chair (Brno)
Jiri Sitera (Pilsen)
Frantisek Dvorak (Pilsen)
Milos Mulac (Pilsen)
Miroslav Ruda (Brno)
Zdenek Salvet (Brno)
Daniel Kouril (Brno)

gLite and Jobs
gLite is middleware developed within the EU EGEE project
The EGEE Grid is strictly job oriented
  – Submitting a job is the only way users interact with the resources
Each job is described using the Job Description Language (JDL), based on the ClassAd syntax
  – Very complex descriptions are possible, including proximity to the storage of input/output files, environment settings, etc.
Job collections are also possible, forming simple workflows in the form of Directed Acyclic Graphs (DAGs)
  – Each DAG is completely described using nested JDL as a set of its nodes (jobs) and the execution dependencies among them

Job Processing in gLite
A job is submitted through a User Interface
The Workload Manager queues the job and starts looking for an appropriate Computing Element
The job is passed to the selected Computing Element (to its queue)
The job runs
After the run, the user can retrieve the job output (collected in the output sandbox)
All actions on a job are tracked by the Logging and Bookkeeping (LB) service, which provides the job state and related information
After retrieval of the output sandbox, all the middleware data (including the whole LB data) are transferred to the Job Provenance (JP)
Users can add annotations as tags (name/value pairs) to a job either via LB (while the job is on the Grid) or via JP (any time afterwards)

Challenge Workflow
Implemented as a gLite DAG
  – Procedures become nodes of the DAG (gLite jobs)
  – Dependencies among procedures become dependencies among DAG jobs
Data are implicit: each job is responsible for downloading its inputs from, and uploading its outputs to, an appropriate storage element
  – We set up a GridFTP server, and all data were uploaded or downloaded using the gsiftp:// protocol
  – This means all data are identified by their full URL
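
The mapping of procedures to DAG nodes can be sketched roughly in Python. This is only an illustration, not gLite JDL: the node names (align_warp_1, slicer_x, ...) and the dictionary layout are assumptions, and the node counts follow the published Provenance Challenge workflow (four align_warp and reslice runs, one softmean, three slicer and convert runs).

```python
# Hypothetical model of the challenge workflow as a DAG (not gLite JDL).
# Keys are node (job) names; values list the nodes each one depends on.
def build_challenge_dag():
    deps = {}
    for i in range(1, 5):
        deps[f"align_warp_{i}"] = []
        deps[f"reslice_{i}"] = [f"align_warp_{i}"]
    deps["softmean"] = [f"reslice_{i}" for i in range(1, 5)]
    for axis in ("x", "y", "z"):
        deps[f"slicer_{axis}"] = ["softmean"]
        deps[f"convert_{axis}"] = [f"slicer_{axis}"]
    return deps


def execution_order(deps):
    """Return one valid order in which the nodes could run (depth-first)."""
    order, done = [], set()

    def visit(node):
        if node in done:
            return
        for parent in deps[node]:
            visit(parent)  # every dependency is placed before the node itself
        done.add(node)
        order.append(node)

    for node in deps:
        visit(node)
    return order


dag = build_challenge_dag()
order = execution_order(dag)
```

Here `order` lists every node after all of its dependencies, which is the constraint a DAG scheduler has to respect.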

Provenance Trace
gLite Job Provenance is primarily a storage and retrieval service for provenance data
  – Currently no GUI, only a command line interface
  – Optimized to store large amounts of provenance data
      • Mostly events recorded during the job lifetime
      • WORM (write once, read many) semantics for the primary data
  – User annotations
      • New annotations can be added at any time
      • Annotations are “distilled” from the primary data, too
  – An extensible framework, where specific metadata processing is available through plug-ins that can be added at any time
Participation in the Provenance Challenge challenged the metadata interpretation
  – More work in this area has been and is still needed

Attribute classes
Most work was done on the annotations (processed raw events)
Four classes of annotations are used:
  – JP system ones
      • E.g. JobID or registration time
  – Digested from the LB trace
      • E.g. time when the job ran
  – Digested from the JDL
      • E.g. Ancestor and Successor from the DAG description
  – Unqualified user tags
All attributes can occur multiple times
  – E.g. “softmean” has 4 ancestor annotations (with “reslice” value)
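
The multi-valued nature of attributes can be illustrated with a small Python sketch. The class name `JobAttributes`, its methods and the lowercase attribute name "ancestor" are assumptions made for this example; they are not the JP data model or API.

```python
from collections import defaultdict


class JobAttributes:
    """Hypothetical multi-valued attribute store for a single job."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, name, value):
        # Adding never overwrites: the same attribute may occur many times.
        self._values[name].append(value)

    def get(self, name):
        # Return a copy of all recorded values (empty list if none).
        return list(self._values.get(name, []))


# The softmean example from the slide: four ancestor annotations,
# each carrying the value "reslice".
softmean = JobAttributes()
for _ in range(4):
    softmean.add("ancestor", "reslice")
```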

Specific Tags
We used 6 specific user tags for the Provenance Challenge
  – IPAW_OUTPUT
  – IPAW_INPUT
  – IPAW_STAGE
  – IPAW_PROGRAM
  – IPAW_PARAM
  – IPAW_HEADER
They hold the appropriate values as specified by the Provenance Challenge description
  – They were fed in via the LB interface

JP Queries
JP Primary Server (JP PS)
  – Keeps primary data
  – Only data retrieval; the JobID must be known
JP Index Server (JP IS)
  – Configurable cache of a subset of jobs and their attributes
  – It can search for jobs matching specific query criteria
      • Comparison of an attribute with a constant value
  – Multiple JP IS can serve one JP PS
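
The kind of query an index server answers, comparing an attribute with a constant, might look like the following Python sketch. The function `query_jobs`, the in-memory `index` dictionary and the job IDs are hypothetical; the real JP Index Server interface is not shown in the slides.

```python
import operator

# Supported comparisons between an attribute value and a constant.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


def query_jobs(index, conditions):
    """Return IDs of jobs for which every (name, op, constant) condition
    is satisfied by at least one value of the named attribute."""
    matches = []
    for job_id, attrs in index.items():
        if all(
            any(OPS[op](value, const) for value in attrs.get(name, []))
            for name, op, const in conditions
        ):
            matches.append(job_id)
    return matches


# Toy index: job ID -> attribute name -> list of values (hypothetical data).
index = {
    "job-1": {"IPAW_PROGRAM": ["align_warp"], "IPAW_STAGE": ["1"]},
    "job-2": {"IPAW_PROGRAM": ["reslice"], "IPAW_STAGE": ["2"]},
}
result = query_jobs(index, [("IPAW_PROGRAM", "=", "align_warp")])
```

Because attributes are multi-valued, a condition is treated here as satisfied when any one value matches; that choice is an assumption of the sketch.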

Query #1
Find the process that led to Atlas X Graphic
Input
  – URL of the queried Atlas X Graphic file
Outputs
  – List of nodes (DAG jobs) that contributed to the queried file
      • Input and output files (their URLs)
      • Stage of the workflow, program name and parameter values
Implementation
  – Recursive graph search
Results:
  – The above-mentioned list of nodes and their attributes
Low readability, no GUI manipulation
  – However, all the relevant information is available
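
A recursive search of this kind could be sketched in Python as below. It assumes the search follows the IPAW_INPUT and IPAW_OUTPUT tags from a file back to its producers; the actual implementation may instead walk the ancestor annotations. The function name, job names and example.org URLs are made up for illustration.

```python
def trace_process(jobs, file_url, seen=None):
    """Collect IDs of all jobs that contributed, directly or indirectly,
    to the file identified by file_url."""
    if seen is None:
        seen = set()
    for job_id, tags in jobs.items():
        if file_url in tags["IPAW_OUTPUT"] and job_id not in seen:
            seen.add(job_id)
            # Recurse into every input of the job that produced the file.
            for input_url in tags["IPAW_INPUT"]:
                trace_process(jobs, input_url, seen)
    return seen


# Toy fragment of the workflow (hypothetical URLs).
jobs = {
    "softmean": {"IPAW_INPUT": ["gsiftp://example.org/reslice1.img"],
                 "IPAW_OUTPUT": ["gsiftp://example.org/atlas.img"]},
    "slicer_x": {"IPAW_INPUT": ["gsiftp://example.org/atlas.img"],
                 "IPAW_OUTPUT": ["gsiftp://example.org/atlas-x.pgm"]},
    "convert_x": {"IPAW_INPUT": ["gsiftp://example.org/atlas-x.pgm"],
                  "IPAW_OUTPUT": ["gsiftp://example.org/atlas-x.gif"]},
    "convert_y": {"IPAW_INPUT": ["gsiftp://example.org/atlas-y.pgm"],
                  "IPAW_OUTPUT": ["gsiftp://example.org/atlas-y.gif"]},
}
contributors = trace_process(jobs, "gsiftp://example.org/atlas-x.gif")
```

In this toy data `convert_y` is not reached, since it does not lie on any path leading to the queried file.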

Query #3
Find the Stage 3, 4, and 5 details of the process that led to the Atlas X Graphic
Same as Query #1, with the output restricted to the stages specified above
Comment
  – More efficient processing is possible if we know the relationship between stages (i.e. we know that Stage 3 precedes Stage 4)
  – Generic enough to process a STAGE specified via an unstructured name, not only via a numeric value
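
Restricting the Query #1 result to given stages can be sketched as a plain filter. In this hypothetical Python example the stage values are compared as opaque strings, which is one way to stay generic with respect to non-numeric stage names; the function and the sample records are assumptions.

```python
def restrict_to_stages(nodes, wanted_stages):
    """Keep only nodes whose IPAW_STAGE value is among wanted_stages.
    Stages are treated as opaque strings, so names like "final" work too."""
    wanted = set(wanted_stages)
    return [node for node in nodes if node["IPAW_STAGE"] in wanted]


# Hypothetical result list of Query #1, one record per contributing node.
nodes = [
    {"job": "reslice_1", "IPAW_STAGE": "2"},
    {"job": "softmean", "IPAW_STAGE": "3"},
    {"job": "slicer_x", "IPAW_STAGE": "4"},
    {"job": "convert_x", "IPAW_STAGE": "5"},
]
selected = restrict_to_stages(nodes, ["3", "4", "5"])
```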

Query #4
Find all invocations of procedure align_warp using a 12th order nonlinear 1365 parameter model that ran on a Monday
Outputs
  – Time, stage, program name, inputs, outputs
Implementation
  – JP IS is queried for jobs matching IPAW_PROGRAM=“align_warp” and IPAW_PARAM=“-m 12”
  – Output is filtered for Monday
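
The two steps, attribute matching followed by client-side filtering on the weekday, can be sketched in Python. The record layout, the `run_time` field and the sample dates are assumptions for illustration only.

```python
from datetime import datetime


def query_4(jobs):
    """Select align_warp jobs run with '-m 12', then keep Monday runs."""
    # Step 1: the part an index server could answer (attribute = constant).
    matched = [
        job for job in jobs
        if job["IPAW_PROGRAM"] == "align_warp" and job["IPAW_PARAM"] == "-m 12"
    ]
    # Step 2: filter the result by weekday; Monday is 0 in datetime.weekday().
    return [job for job in matched if job["run_time"].weekday() == 0]


# Hypothetical job records; 11 September 2006 was a Monday.
jobs = [
    {"job": "align_warp_1", "IPAW_PROGRAM": "align_warp",
     "IPAW_PARAM": "-m 12", "run_time": datetime(2006, 9, 11, 10, 30)},
    {"job": "align_warp_2", "IPAW_PROGRAM": "align_warp",
     "IPAW_PARAM": "-m 12", "run_time": datetime(2006, 9, 12, 9, 0)},
    {"job": "reslice_1", "IPAW_PROGRAM": "reslice",
     "IPAW_PARAM": "", "run_time": datetime(2006, 9, 11, 11, 0)},
]
monday_runs = query_4(jobs)
```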

Query #8
Annotated anatomy images
Not directly possible
  – JP does not deal with data directly, only with jobs
  – No annotations on data are available
Possible solution (not implemented, but similar to the one used to answer Query #9):
  – Introduce “dummy” jobs that have the particular data file assigned as their input
  – Associate annotations with these jobs
  – Process job annotations instead of data annotations
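
Since this solution was not implemented, the following Python sketch is purely speculative. It shows how a “dummy” job could carry annotations on behalf of a data file; every name in it (the functions, the "dummy" program value, the `center` annotation and the URLs) is hypothetical.

```python
def make_dummy_job(file_url, annotations):
    """Build a placeholder job record whose only input is the data file,
    so that annotations about the file can be attached to a job."""
    return {
        "IPAW_INPUT": [file_url],
        "IPAW_PROGRAM": "dummy",           # hypothetical marker value
        "annotations": dict(annotations),  # annotations meant for the file
    }


def files_with_annotation(dummy_jobs, key, value):
    """Return URLs of files whose dummy job carries key=value."""
    return [
        url
        for job in dummy_jobs
        if job["annotations"].get(key) == value
        for url in job["IPAW_INPUT"]
    ]


dummy_jobs = [
    make_dummy_job("gsiftp://example.org/anatomy1.img", {"center": "UChicago"}),
    make_dummy_job("gsiftp://example.org/anatomy2.img", {"center": "elsewhere"}),
]
annotated = files_with_annotation(dummy_jobs, "center", "UChicago")
```

The files found this way could then be matched against the IPAW_INPUT tags of real jobs, turning a question about data annotations into one about job annotations.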

Summary
gLite Job Provenance is usable to answer all queries but one
gLite JP is focused on efficient metadata storage and retrieval
  – In semi-production operation on the EGEE preview testbed
gLite JP is usable as the lowest layer for more complex provenance systems
  – Some processing is currently done on the client side
Support for more complex workflows is tied to the introduction and support of complex workflows in the EGEE environment
New challenge: precise re-run of a job from the past (complex environment setup)

Questions?