UCSD SAN DIEGO SUPERCOMPUTER CENTER Ilkay Altintas Scientific Workflow Automation Technologies
Provenance Collection Support in the Kepler Scientific Workflow System


1 Provenance Collection Support in the Kepler Scientific Workflow System
Ilkay Altintas, Assistant Director, National Laboratory for Advanced Data Research; Manager, Scientific Workflow Automation Technologies Laboratory; San Diego Supercomputer Center, University of California, San Diego
Oscar Barney, Scientific Computing and Imaging Institute, The University of Utah
Efrat Jaeger-Frank, San Diego Supercomputer Center, University of California, San Diego

2 What is a scientific workflow?

3 What does the user want?
“To get work done” and “Make hard things easy”
How to do this?
1. Combine tools with disparate strengths
2. Make them work efficiently
3. Focus on interfaces
4. Enable consistent user interfaces

4 Real-Time Weather Sensor Data Display Workflow
Basic steps:
1. Get real-time weather data from the ORB.
2. Convert the data into a graphical plot using image manipulation tools such as JAI, Java2D/3D, Gnuplot or Matlab.
3. Display the resulting weather plot images in Kepler.
4. Refresh the images so that they reflect the most recent data.
A very basic pipeline!
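The four steps above form a fetch, render, display and refresh loop. The Python sketch below shows only that control flow and is not Kepler code. The ORB feed and the plotting tools are replaced by hypothetical stand-ins: `fetch_readings` returns random numbers and `render_plot` draws a text bar chart. The names `run_pipeline`, `refreshes` and `interval_s` are also illustrative assumptions.

```python
import random
import time
from typing import Callable, List


def fetch_readings(n: int = 5) -> List[float]:
    """Step 1 stand-in: the real workflow reads from the ORB; here we fake samples."""
    return [round(random.uniform(10.0, 30.0), 1) for _ in range(n)]


def render_plot(readings: List[float]) -> str:
    """Step 2 stand-in: a text bar chart instead of JAI/Java2D/Gnuplot/Matlab output."""
    return "\n".join(f"{v:5.1f} " + "#" * int(v) for v in readings)


def display(plot: str) -> None:
    """Step 3: show the plot (here, just print it)."""
    print(plot)
    print("-" * 20)


def run_pipeline(refreshes: int, interval_s: float,
                 fetch: Callable[[], List[float]] = fetch_readings,
                 show: Callable[[str], None] = display) -> None:
    """Step 4: repeat steps 1-3 so the display reflects the most recent data."""
    for i in range(refreshes):
        show(render_plot(fetch()))
        if i < refreshes - 1:
            time.sleep(interval_s)
```

In Kepler each step would be a separate actor, and the director would schedule the firings. The Python loop only mirrors the order of the steps.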

5 Promoter Identification Workflow

6 A Scientific Workflow Is a Set of Steps To…
Combine different cyberinfrastructure (CI) technologies
– To promote “scientific discovery” by providing tools and methods to generate scientific workflows
Often through an extensible and customizable graphical user interface
– For scientists from different scientific domains
– To support the creation, execution, sharing, reuse and provenance of computational experiments
– To connect to existing data and to integrate heterogeneous data from multiple resources in the efficient ways a scientific workflow system provides
– To bring CI to the user’s monitor!

7 Why do we need to track provenance in a scientific workflow system?

8 Because science is an evolving process…
“A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it.” (First Law of Mentat), Frank Herbert, Dune
Recreate results and rebuild workflows using the evolution information
Associate the workflow with the results it produced
Create links between data generated in different runs, and compare different runs
Recover from a system failure
– Checkpoint a workflow
– Debug and explain results (via lineage tracing, …)
Smart re-runs
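Lineage tracing walks provenance records backwards, starting from a result and ending at the actors and original inputs it depends on. The Python sketch below assumes a hypothetical provenance log that maps each derived data item to two things: the actor that produced it and the items that actor consumed. The names `Provenance` and `trace` and the file names in the example are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical provenance log: data item -> (producing actor, items it consumed).
Provenance = Dict[str, Tuple[str, List[str]]]


def trace(item: str, prov: Provenance) -> Tuple[Set[str], Set[str]]:
    """Return (actors involved, original inputs) behind `item`."""
    actors: Set[str] = set()
    raw_inputs: Set[str] = set()
    seen: Set[str] = set()
    stack = [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current not in prov:
            # No recorded producer: treat it as an original input.
            raw_inputs.add(current)
            continue
        actor, inputs = prov[current]
        actors.add(actor)
        stack.extend(inputs)  # keep walking back toward the sources
    return actors, raw_inputs
```

Suppose `plot.png` was made by `Plotter` from `cleaned.csv`, and `cleaned.csv` was made by `Cleaner` from `raw.csv` and `thresholds.txt`. Tracing `plot.png` then returns both actors and both original files.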

9 Ptolemy II: A laboratory for investigating design
KEPLER: A problem-solving environment for Scientific Workflows
KEPLER = “Ptolemy II + X” for Scientific Workflows
Kepler is a scientific workflow system … and a cross-site collaboration
– 1st beta release (out next week…)
– www.kepler-project.org
– Builds upon the open-source Ptolemy II framework

10 Kepler is a Team Effort
Ptolemy II, Resurgence, Griddles, SRB, LOOKING, BIRN, Cipres, NLADR
Contributor names and funding information are on the Kepler website!
Other contributors:
– Cheshire (UK Text Mining Center)
– DART (Great Barrier Reef, Australia)
– National Digital Archives + UCSD-TV (US)

11 Vergil is the GUI for Kepler
Actor ontology and semantic search for actors
Search -> drag and drop -> link via ports
Metadata-based search for datasets
(Screenshot labels: Actor Search, Data Search, Director, Actor)

12 Actor Search
Kepler Actor Ontology
– Used in searching actors and creating conceptual views (= folders)
Currently, 160 Kepler actors have been added!
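One way to picture ontology-based actor search follows. Each actor is annotated with ontology concepts. A search for a concept also returns actors annotated with any of its subconcepts, and that is what lets a concept act as a folder. The concept hierarchy, the actor names and the functions `search` and `folder_view` are all hypothetical. This sketch does not reproduce the real Kepler actor ontology or its search service.

```python
from typing import Dict, List, Optional, Set

# Hypothetical fragment of an actor ontology: concept -> parent concept.
PARENT: Dict[str, Optional[str]] = {
    "Actor": None,
    "DataAccess": "Actor",
    "DatabaseQuery": "DataAccess",
    "Visualization": "Actor",
    "Plotting": "Visualization",
}

# Hypothetical semantic annotations: actor -> concepts it is tagged with.
ANNOTATIONS: Dict[str, Set[str]] = {
    "DbQueryActor": {"DatabaseQuery"},
    "FileFetcher": {"DataAccess"},
    "LinePlotter": {"Plotting"},
}


def is_subconcept(concept: Optional[str], ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def search(concept: str) -> List[str]:
    """Actors tagged with `concept` or any of its subconcepts."""
    return sorted(actor for actor, tags in ANNOTATIONS.items()
                  if any(is_subconcept(tag, concept) for tag in tags))


def folder_view(concepts: List[str]) -> Dict[str, List[str]]:
    """A conceptual view: one 'folder' per concept."""
    return {c: search(c) for c in concepts}
```

With this toy ontology, `search("DataAccess")` finds both the database actor and the file fetcher, because `DatabaseQuery` is a subconcept of `DataAccess`.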

13 Data Search and Usage of Results
Kepler DataGrid
– Discovery of data resources through local and remote services: SRB, Grid and Web services, database connections
– Registration of datasets on the fly using workflows

14 Kepler System Architecture
(Architecture diagram components: Authentication; GUI (Vergil); SMS; Kepler Core Extensions; Ptolemy; …Kepler GUI Extensions…; Actor & Data SEARCH; Type System Ext; Provenance Framework; Kepler Object Manager; Documentation; Smart Re-run / Failure Recovery)

15 Initial Work on the Provenance Framework
The framework is OPTIONAL!
– Modeled as a separate concern in the system
– Listens to the execution and saves information customized by a set of parameters:
  Context: the who, what, where, when, and why associated with the run
  Input data and its associated metadata
  Workflow outputs and intermediate data products
  Workflow definition (entities, parameters, connections): a specification of what exists in the workflow, which can have a context of its own
  Information about the workflow evolution (the workflow trail)
Types of provenance information:
– Data provenance: intermediate and end results, including files and database references
– Process provenance: the workflow definition, kept with the data and parameters used in the run
– Error and execution logs
– Workflow design provenance
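The slide describes provenance collection as an optional, separate concern that listens to execution. The observer pattern is a common way to get that separation. The Python sketch below is written under stated assumptions:
- `FiringEvent`, `ProvenanceRecorder` and `run_chain` are hypothetical names.
- The workflow is a simple linear chain of actors.
- The record is a plain dictionary rather than Kepler's actual XML format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass
class FiringEvent:
    """One actor firing, as seen by an observer (hypothetical event type)."""
    actor: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]


Listener = Callable[[FiringEvent], None]


class ProvenanceRecorder:
    """Hypothetical provenance listener: it observes execution but never drives it."""

    def __init__(self, user: str, workflow_definition: Dict[str, Any]) -> None:
        self.record: Dict[str, Any] = {
            # Context: who ran the workflow and when.
            "context": {"user": user,
                        "started": datetime.now(timezone.utc).isoformat()},
            # Workflow definition kept alongside the run (process provenance).
            "workflow": workflow_definition,
            # Inputs, intermediate products and outputs (data provenance).
            "firings": [],
        }

    def on_firing(self, event: FiringEvent) -> None:
        self.record["firings"].append(
            {"actor": event.actor,
             "inputs": dict(event.inputs),
             "outputs": dict(event.outputs)})


def run_chain(actors: List[Tuple[str, Callable[[Any], Any]]], value: Any,
              listeners: Sequence[Listener] = ()) -> Any:
    """Fire a linear chain of actors; listeners are notified but optional."""
    for name, fn in actors:
        result = fn(value)
        event = FiringEvent(name, {"in": value}, {"out": result})
        for listener in listeners:
            listener(event)
        value = result
    return value
```

The executor only calls the listeners it is given. Calling `run_chain(chain, 5)` without a recorder therefore runs the same workflow with provenance collection switched off.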

16 Kepler Provenance Recording Utility
Parametric and customizable
– Different report formats
– Variable levels of detail: verbose-all, verbose-some, medium, on error
– Multiple cache destinations
Saves information on
– User name, date, run, etc.
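The following sketch shows one way the detail levels and multiple cache destinations might be parameterized. The meaning given to each level is an assumption made for illustration:
- verbose-all keeps every firing with its data.
- verbose-some keeps data only for a chosen set of actors.
- medium keeps only user, run, date and actor name.
- on error records only failed firings.

`Detail`, `ProvenanceFilter` and `watched` are hypothetical names.

```python
from datetime import date
from enum import Enum
from typing import Any, Callable, Dict, List, Optional, Set


class Detail(Enum):
    VERBOSE_ALL = "verbose-all"
    VERBOSE_SOME = "verbose-some"
    MEDIUM = "medium"
    ON_ERROR = "on error"


# A cache destination is anything that accepts a record (list, file writer, DB insert...).
Sink = Callable[[Dict[str, Any]], None]


class ProvenanceFilter:
    def __init__(self, level: Detail, sinks: List[Sink], user: str, run_id: str,
                 watched: Optional[Set[str]] = None) -> None:
        self.level = level
        self.sinks = sinks
        self.user = user
        self.run_id = run_id
        self.watched = watched or set()  # actors whose data VERBOSE_SOME keeps

    def on_firing(self, actor: str, inputs: Dict[str, Any], outputs: Dict[str, Any],
                  error: Optional[str] = None) -> None:
        if self.level is Detail.ON_ERROR and error is None:
            return  # ON_ERROR: record nothing unless the firing failed
        entry: Dict[str, Any] = {"user": self.user, "run": self.run_id,
                                 "date": date.today().isoformat(), "actor": actor}
        keep_data = (self.level in (Detail.VERBOSE_ALL, Detail.ON_ERROR)
                     or (self.level is Detail.VERBOSE_SOME and actor in self.watched))
        if keep_data:
            entry["inputs"] = inputs
            entry["outputs"] = outputs
        if error is not None:
            entry["error"] = error
        for sink in self.sinks:
            sink(entry)  # the same record goes to every cache destination
```

The destinations are plain callables, so the same filter can feed an in-memory list, a file and a database at once without knowing which is which.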

17 What other system functions does provenance relate to in Kepler?
Failure recovery; smart re-runs; semantic extensions; Kepler Data Grid; reporting and documentation; authentication; data registration
Re-run only the updated/failed parts
Guided documentation generation and updates

18 “Smart” Re-runs
Instead of running a workflow from scratch, re-run only the parts of the workflow that have not been run before
– Example: change a parameter downstream, and don’t re-run the actors that lead up to the actor whose parameter changed
Especially useful:
– In visualization pipelines
– In long-running workflows
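Deciding what to re-run after a parameter change can be treated as a graph reachability question. The changed actor and everything downstream of it must fire again. Upstream actors can be skipped as long as their earlier outputs are still available. The Python sketch below assumes the workflow is given as a hypothetical adjacency map from each actor to its downstream actors. `actors_to_rerun` is an illustrative name.

```python
from collections import deque
from typing import Dict, List, Set


def actors_to_rerun(downstream: Dict[str, List[str]], changed: Set[str]) -> Set[str]:
    """Changed actors plus every actor reachable downstream of them (BFS)."""
    rerun: Set[str] = set()
    queue = deque(changed)
    while queue:
        actor = queue.popleft()
        if actor in rerun:
            continue
        rerun.add(actor)
        queue.extend(downstream.get(actor, []))
    return rerun
```

Take the chain ReadData -> Regrid -> Colormap -> Render. Changing a Colormap parameter yields {Colormap, Render}, so ReadData and Regrid are not re-run.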

19 “Smart” Re-runs
Uses VisTrails’ cache manager algorithm*
Idea:
– Re-run as little of the network as possible by combining intermediate results from different workflow runs
– Past results are stored in a provenance store (currently a cache)
– Query and recreate the inputs to actors that need to be re-fired
* L. Bavoil, et al. VisTrails: Enabling Interactive Multiple-View Visualizations. IEEE Visualization, 2005.

20 What is needed for “Smart” Re-runs?
Need to keep track of
– what has been done before
– which actors have been given which inputs, and which outputs they produced
Uses the stored provenance data
– From the cache
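Slides 19 and 20 describe reusing past results by remembering which actor received which inputs and produced which outputs. The sketch below is loosely modeled on that idea and is not the VisTrails or Kepler implementation. Each computation is identified by a hash of three things: the actor name, its parameters, and the signatures of its upstream actors. A dictionary stands in for the provenance cache. `Node`, `signature` and `smart_run` are hypothetical names, and parameters are assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

# (actor name, function, parameters, names of upstream actors), in topological order.
Node = Tuple[str, Callable[..., Any], Dict[str, Any], List[str]]


def signature(actor: str, params: Dict[str, Any], upstream_sigs: List[str]) -> str:
    """Identify a computation by the actor, its parameters and everything upstream."""
    payload = json.dumps({"actor": actor, "params": params, "upstream": upstream_sigs},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def smart_run(nodes: List[Node], cache: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Run the workflow, firing only actors whose signature is not cached yet."""
    sigs: Dict[str, str] = {}
    values: Dict[str, Any] = {}
    fired: List[str] = []
    for name, fn, params, inputs in nodes:
        sig = signature(name, params, [sigs[i] for i in inputs])
        sigs[name] = sig
        if sig not in cache:
            # New combination of actor, parameters and upstream data: fire it.
            cache[sig] = fn(params, *[values[i] for i in inputs])
            fired.append(name)
        values[name] = cache[sig]
    return values, fired
```

Consider a Load -> Scale -> Sum chain:
- The first run fires all three actors.
- Changing only Scale's parameter and running again fires Scale and Sum. Load's output comes from the cache.
- A third, identical run fires nothing.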

21

22 Next Steps
Deciding on terms and definitions for all of Kepler
A relational schema for the provenance information, in addition to the existing XML
Collecting data/metadata in different formats
.kar file generation, registration and search for provenance information
Adding provenance repositories
Automatic report generation from accumulated data
A GUI to keep track of the changes
Continued work on the “Smart” Re-runs system
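To illustrate the planned relational schema, here is a minimal SQLite sketch using Python's standard `sqlite3` module. The table and column names are assumptions, not the schema the Kepler team adopted. It stores:
- runs, with the user name and start time;
- actor firings;
- data items, as references (file paths, database URIs or values);
- a link table recording which items each firing read or wrote.

The function `producer_of` (also hypothetical) answers a typical lineage question: which actor produced a given data item in a given run.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE run (
    run_id    INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL,
    started   TEXT NOT NULL,
    workflow  TEXT NOT NULL
);
CREATE TABLE firing (
    firing_id INTEGER PRIMARY KEY,
    run_id    INTEGER NOT NULL REFERENCES run(run_id),
    actor     TEXT NOT NULL,
    seq       INTEGER NOT NULL  -- order of the firing within the run
);
CREATE TABLE data_item (
    data_id INTEGER PRIMARY KEY,
    ref     TEXT NOT NULL       -- file path, database URI or literal value
);
CREATE TABLE firing_io (
    firing_id INTEGER NOT NULL REFERENCES firing(firing_id),
    data_id   INTEGER NOT NULL REFERENCES data_item(data_id),
    role      TEXT NOT NULL CHECK (role IN ('in', 'out'))
);
"""


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a provenance store with the hypothetical schema above."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def producer_of(conn: sqlite3.Connection, ref: str, run_id: int) -> Optional[str]:
    """Which actor wrote data item `ref` during run `run_id`?"""
    row = conn.execute(
        """SELECT f.actor
             FROM firing f
             JOIN firing_io io ON io.firing_id = f.firing_id
             JOIN data_item d ON d.data_id = io.data_id
            WHERE d.ref = ? AND io.role = 'out' AND f.run_id = ?""",
        (ref, run_id)).fetchone()
    return row[0] if row else None
```

Keeping data items in their own table lets one item appear as the output of one firing and the input of others. That is what lineage queries and smart re-runs both rely on.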

23 To Sum Up…
Kepler is an open-source system and collaboration
The Kepler provenance framework and smart re-run manager are in their initial stages
– Aim to support different scientific domains
– Saving data in different repositories and metadata formats
– Successful results in the initial runs
Short demonstration…

24 Questions… Thanks!
Ilkay Altintas
altintas@sdsc.edu
+1 (858) 822-5453
http://www.sdsc.edu

