1 Pegasus and wings WINGS/Pegasus Provenance Challenge Ewa Deelman Yolanda Gil Jihie Kim Gaurang Mehta Varun Ratnakar USC Information Sciences Institute.

Presentation transcript:

1 WINGS/Pegasus Provenance Challenge
Ewa Deelman, Yolanda Gil, Jihie Kim, Gaurang Mehta, Varun Ratnakar
USC Information Sciences Institute

2 WINGS/Pegasus: Workflow Instance Generation and Selection
[Diagram. Labels recoverable from the slide:]
- Steps: Workflow Creation, Workflow Selection, Data Selection, Component Specification
- Artifacts: Workflow Template, Workflow Instance, Executable Workflow
- Resources: Workflow Libraries, Data Repositories, Application Components
- Ontologies: domain terms, component types, workflow products
- Systems: WINGS, Pegasus, DAGMan/Globus; the label (OWL) also appears
- Roles: Scientist, Expert Scientist, Scientist Researching New Models
- Example requests:
  - “Show me workflows that generate hazard maps”
  - “Run that with the USGS data set”
  - “Validate this workflow based on the component specs”
  - “Here is a new wave propagation model, takes in a series of fault ruptures, is compiled for MPI”
- Annotations:
  - Preexisting data collections
  - Workflow execution results
  - Workflow templates specify complex analysis sequences
  - Workflow instances specify data
  - Specifies data requirements
  - Specifies execution requirements

3 Workflow Template
[Figure. Labels: Collections, Computational nodes]

4 Workflow Instance

5 Executable Workflow

6 Metadata Constraints (in OWL ontology)
- Constraints on file metadata attributes: data types and default values
- Constraints on collections and collections of collections
  - Type of each element
  - Relations between the metadata of a collection and the metadata of its individual items
- Component-level constraints on metadata attributes of input/output files or collections
  - Deriving metadata of output files from metadata of input files
- Template-level constraints on metadata attributes of files or collections
  - Input/output files of different components can have the same metadata
  - Checking the number of items in collections
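
As a rough illustration of the component-level and template-level constraints on this slide, the Python sketch below copies an agreed metadata attribute from input files to an output file and checks the number of items in a collection. The slides express these constraints in an OWL ontology; the dictionaries, the function names `derive_output_metadata` and `check_item_count`, and the sample values are hypothetical stand-ins, not the WINGS representation.

```python
from typing import Any

# Hypothetical in-memory stand-in for file/collection metadata.
# The slides encode this information in an OWL ontology instead.
Metadata = dict[str, Any]


def derive_output_metadata(inputs: list[Metadata], shared_keys: list[str]) -> Metadata:
    """Component-level rule: copy attributes that all inputs agree on to the output."""
    output: Metadata = {}
    for key in shared_keys:
        values = {m[key] for m in inputs if key in m}
        if len(values) != 1:
            raise ValueError(f"inputs do not agree on {key!r}: {sorted(map(str, values))}")
        output[key] = values.pop()
    return output


def check_item_count(collection: list[Metadata], expected: int) -> bool:
    """Template-level rule: a collection must hold the expected number of items."""
    return len(collection) == expected


# Two anatomy images of the same patient (sample values are made up).
anatomy_images = [
    {"hasPatientID": 112, "hasIndexID": 1},
    {"hasPatientID": 112, "hasIndexID": 2},
]
warp_params = derive_output_metadata(anatomy_images, ["hasPatientID"])
count_ok = check_item_count(anatomy_images, expected=2)
```

Raising an error when inputs disagree is one possible design; a validator could equally report the conflict and continue.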

7 Provenance records
[Diagram. Labels recoverable from the slide:]
- Steps: Workflow Creation, Workflow Selection, Data Selection, Component Specification
- Artifacts: Workflow Template, Workflow Instance, Executable Workflow
- Resources: Workflow Libraries, Data Repositories, Application Components
- Ontologies: domain terms, component types, workflow products
- Systems: WINGS, Pegasus, DAGMan/Globus, VDS PTC; the label (OWL) also appears
- Roles: Scientist, Expert Scientist
- Example requests:
  - “Show me workflows that generate hazard maps”
  - “Run that with the USGS data set”
- Annotations:
  - Preexisting data collections
  - Workflow execution results
  - Workflow templates specify complex analysis sequences
  - Workflow instances specify data
  - Specifies data requirements
  - Specifies execution requirements

8 Queries answered
- Keys to provenance
  - Capturing the correct metadata and propagating it through the template and instance
  - Capturing runtime information
- Used SPARQL (with scripting) and SQL to pose queries
  - Queries 1, 2, 5, 6, 8: queries against the File and Workflow Instance ontologies
  - Query 4: query against the VDS PTC
  - Queries 3, 7, 9: not answered for lack of time
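
To give a feel for the kind of SQL lineage query mentioned on this slide, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `used` and `generated`, the job names, and the file names are all hypothetical; they are not the VDS PTC schema, and the slides do not show the actual queries.

```python
import sqlite3

# Hypothetical runtime records: which job read and wrote which file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE used (job TEXT, file TEXT)")
conn.execute("CREATE TABLE generated (job TEXT, file TEXT)")
conn.executemany("INSERT INTO used VALUES (?, ?)", [
    ("align_warp_1", "anatomy1.img"),
    ("reslice_1", "warp1.warp"),
    ("softmean", "resliced1.img"),
])
conn.executemany("INSERT INTO generated VALUES (?, ?)", [
    ("align_warp_1", "warp1.warp"),
    ("reslice_1", "resliced1.img"),
    ("softmean", "atlas.img"),
])

# Walk backwards from a file to every job that contributed to it.
LINEAGE_SQL = """
WITH RECURSIVE lineage(job) AS (
    SELECT job FROM generated WHERE file = ?
    UNION
    SELECT g.job
    FROM lineage AS l
    JOIN used AS u ON u.job = l.job
    JOIN generated AS g ON g.file = u.file
)
SELECT job FROM lineage ORDER BY job
"""
jobs = [row[0] for row in conn.execute(LINEAGE_SQL, ("atlas.img",))]
```

The recursive query starts at the job that wrote the target file and follows each job's inputs back to the jobs that produced them.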

9 Constraints on Nested Collections
[Ontology graph. Labels recoverable from the slide:]
- Layers: domain independent definitions, domain dependent definitions, Skolem instance definitions, example files and collections
- Domain independent classes: Collection, FileCollection, File, CollectionList, FileList
- Domain dependent classes: AnatomyImagesOfPatient, AnatomyImagesOfPatientInPeriod, AnatomyImageFile
- Properties: hasType, CollOf, hasItems, hasPatientID, hasPeriodID, hasTimePeriodID, hasIndexID
- Metadata values: Metadata:String, Metadata:Int, PatientID1, PeriodID1, IndexID1
- Skolem instances: CC-AnatomyImages-Skolem, C-AnatomyImages-Skolem, AnatomyImage-Skolem
- Example collections: CC-AnaImages-for-Patient112, C-AnaImages_P112_p1, C-AnaImages_P112_p2, …, C-AnaImages_P112_p12
- Example file name fragments as extracted: part3, 127_6.part2, img112_1.part1, 112_2.par, part2, img112-2.part1, 112_12.part5, 112_12.part2, img112_12.part1, …
- Annotations:
  - Constraints on collection element types
  - Metadata constraints on collections and their elements

10 Component level constraints on metadata attributes of input/output files or collections
[Ontology graph. Labels recoverable from the slide:]
- Component: Align_Warp (Component Type), with Skolem instance Align_Warp_Skolem
- Inputs and outputs: hasInputs, hasOutputs, FileOrCollection List, Align_Warp_Inputs, Align_Warp_Outputs
- Files: AnatomyImage1, AnatomyHeader1, WarpParamFile1
- Metadata properties and values: hasIndexID, Anatomy_IndexID1, hasPatiendID, PatientID1
- Annotations:
  - Metadata constraints on input and output files
  - Constraints on the types of input and output files and collections

11 Template level (global) constraints on metadata attributes of files or collections
[Ontology graph. Labels recoverable from the slide:]
- Template: fMRI Template1
- Links (hasLink): InputLink_XYZInputFile_to_Convert, InputLink_ReslicedImage_to_Softmean, InputLink_AnatomyImages_to_Align_Warp
- Files and collections (hasFile): XYZInputFile1, Collection_AnatomyImage1, Collection_ReslicedImages1
- Metadata properties and values: hasPatientID, PatientID1, hasN_Items, N_Images
- Annotations:
  - Constraints on number of elements in different collections
  - Metadata constraints on files/collections of different components

12 Refinement provenance (in design)
- We consider not only the provenance of the executing application but also that of the refinement process, which maps an abstract workflow (workflow instance) onto a set of resources
- The refinement process can be multi-staged
- Stages of the refinement can execute on a variety of resources
- We capture the provenance of the entire workflow as well as of its constituents
- The representations of the refinement provenance and of the workflow provenance are uniform

13 Original Workflow
[Figure. Label: Workflow 1]

14 1st executable partition mapped onto resources

15 Chain of Refinement and Execution Steps

16 Definition of refinement and execution provenance
[[I/O] data input/output [function performed] [performance info] [optional annotations]]
- The optional annotations could include a justification of the reasons for the tasks performed
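
The four-part record above can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the class name `ProvenanceRecord`, the helper `trace`, and the sample step and item names are hypothetical, and the slides do not prescribe a concrete encoding. Refinement steps and execution steps use the same record type, in line with the uniform representation the deck describes.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """One refinement or execution step: I/O, function, performance, annotations."""
    inputs: list[str]
    outputs: list[str]
    function: str
    performance: dict[str, float] = field(default_factory=dict)
    annotations: list[str] = field(default_factory=list)


def trace(records: list[ProvenanceRecord], target: str) -> list[str]:
    """Return the functions that led to `target`, earliest first."""
    producers = {out: rec for rec in records for out in rec.outputs}
    ordered: list[str] = []
    seen: set[int] = set()

    def visit(item: str) -> None:
        rec = producers.get(item)
        if rec is None or id(rec) in seen:
            return
        seen.add(id(rec))
        for parent in rec.inputs:  # follow inputs back before recording this step
            visit(parent)
        ordered.append(rec.function)

    visit(target)
    return ordered


# Hypothetical chain: two refinement steps followed by one execution step.
chain = [
    ProvenanceRecord(["workflow_instance"], ["partition_1"], "partition"),
    ProvenanceRecord(["partition_1"], ["executable_partition_1"], "map_to_resources",
                     annotations=["site chosen because input data is local"]),
    ProvenanceRecord(["executable_partition_1"], ["results_1"], "execute",
                     performance={"runtime_s": 12.5}),
]
steps = trace(chain, "results_1")
```

Because each record names its inputs and outputs, the chain can be followed from a final result back through execution and refinement without a separate linking structure.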

17 Provenance records relating to the refinement process
[Record listing. The field contents did not survive extraction; only the bracket skeleton remains:]
[I:[ O: ; ] [ ][ ] [ ]
[I:[ O: ] [ ][ ][ ]
[I: O: ] [ ][ ][ ]
[I:[ O: ] [ ][][]
[[I: ] [O: ] [ (could be in the form of a DAX, the XML DAG used by Pegasus), ] [ …..][]]
[I:, O: ] [ …][]]
Thanks to Luc Moreau for his input!