Ewa Deelman, Integrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example Nandita Mangal,

Presentation transcript:

Integrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example. Nandita Mangal, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, Karan Vahi. USC Information Sciences Institute, Marina del Rey, CA. pegasus.isi.edu

Motivation: Many workflow systems exist today. The choice of a particular system is often dictated by who you know. Different workflow systems have different capabilities: application components versus services, visual versus scripting workflow descriptions, performance optimization, etc. Can you combine two separate systems? What are the issues?

Kepler (UCSD and UC Davis): A scientific workflow management system based on Ptolemy II. It allows scientists to visually design and execute scientific workflows. It uses an actor-oriented model in which directors act as the main workflow engine, which enables different models of computation.
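The actor/director split described above can be illustrated with a minimal Python sketch. This is not Kepler or Ptolemy II code: the `Actor` and `SequentialDirector` classes and the three example actors are hypothetical names chosen for illustration, and the director shown implements only one simple execution policy (fire each actor once, in dependency order).

```python
from graphlib import TopologicalSorter


class Actor:
    """A single unit of computation (hypothetical class, not the Kepler API)."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = tuple(inputs)  # names of the upstream actors feeding this one


class SequentialDirector:
    """Fires each actor once, after all of its upstream actors have fired."""

    def __init__(self, actors):
        self.actors = {a.name: a for a in actors}

    def run(self):
        # Map each actor to the set of actors it depends on.
        graph = {a.name: set(a.inputs) for a in self.actors.values()}
        results = {}
        for name in TopologicalSorter(graph).static_order():
            actor = self.actors[name]
            args = [results[upstream] for upstream in actor.inputs]
            results[name] = actor.func(*args)
        return results


workflow = SequentialDirector([
    Actor("source", lambda: [3, 1, 2]),
    Actor("sort", sorted, inputs=["source"]),
    Actor("largest", lambda xs: xs[-1], inputs=["sort"]),
])
results = workflow.run()
```

The actors know nothing about scheduling; replacing `SequentialDirector` with a class that applies a different firing policy would change the model of computation without touching the actors.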

Pegasus (USC/ISI): Based on programming language principles. It leverages abstraction in the workflow description to obtain ease of use, scalability, and portability. It provides a compiler that maps high-level descriptions to executable workflows, aiming for a correct mapping and a performance-enhanced mapping. It relies on a runtime engine to carry out the instructions in a scalable and reliable manner.
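As a rough illustration of the "compiler" idea, the sketch below maps abstract jobs (logical transformation names only) to concrete jobs (a site plus an executable path). It is a toy under stated assumptions, not the Pegasus planner: the `AbstractJob`, `ConcreteJob`, and `plan` names, the dictionary-based catalog, and the site names and paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractJob:
    job_id: str
    transformation: str  # logical name only, no resource information
    inputs: tuple = ()
    outputs: tuple = ()


@dataclass(frozen=True)
class ConcreteJob:
    job_id: str
    site: str
    executable: str
    inputs: tuple
    outputs: tuple


def plan(jobs, transformation_catalog, preferred_sites):
    """Map each abstract job to the first preferred site that provides its executable."""
    concrete = []
    for job in jobs:
        for site in preferred_sites:
            path = transformation_catalog.get((site, job.transformation))
            if path is not None:
                concrete.append(
                    ConcreteJob(job.job_id, site, path, job.inputs, job.outputs)
                )
                break
        else:
            # No site in the preference list can run this transformation.
            raise LookupError(f"no site provides {job.transformation!r}")
    return concrete


# Hypothetical catalog: (site, logical transformation) -> executable path.
catalog = {("site_b", "preprocess"): "/opt/tools/preprocess"}
planned = plan(
    [AbstractJob("ID1", "preprocess", ("raw.dat",), ("clean.dat",))],
    catalog,
    ["site_a", "site_b"],
)
```

The abstract description stays portable because the site and path are looked up at planning time rather than written into the workflow.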

Combining Kepler & Pegasus: Integration of the Kepler visual programming environment with the grid mapping abilities of Pegasus. This gives Kepler users the ability to map their large workflows onto the grid, and gives Pegasus users a visual workflow composition tool. The two systems differ in the level of abstraction of the workflow description.

Kepler Provenance Challenge Workflow

Concrete Workflow Generation and Mapping

Implementation Strategy: Develop Pegasus-specific entities: abstract jobs, directors, and actors. The “Pegasus Director” and “Pegasus Jobs” (actor entities) act as the main grid components that execute a given grid computation. Focus mainly on abstract jobs in the Kepler environment, which keep workflow descriptions portable and independent of resource knowledge.

Integration

Pegasus Actor & Director Entities

Visual Abstract Workflow Creation: Users can create visual models of abstract workflows and specify logical transformations without specifying grid resources.

Abstract Job Configuration: Integration with the Transformation Catalog

Resultant Abstract Job on the Kepler Canvas: A Pegasus abstract job can take multiple input files and produce multiple output files. Grid resource information is not expected in such an actor.

Support for Concrete Jobs (useful for monitoring and debugging): A concrete job requires specific grid resource information from the scientist. It allows the scientist to execute jobs directly on the grid.

Pegasus Director / DAX Generator: Controls the execution of all the job (actor) entities and creates the resulting directed acyclic graph in XML format (a DAX). It generates the DAX and hands it over for execution by DAGMan.
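The DAX generation step can be sketched in Python with the standard library. This is a simplified illustration, not the Pegasus Director's implementation: it emits only a small subset of a DAX document (`adag`, `job`, `uses`, and `child`/`parent` elements) with no XML namespace or schema attributes, and the `build_dax` function, job IDs, transformation names, and file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dax(name, jobs, edges):
    """Build a simplified DAX-like tree.

    jobs:  {job_id: (transformation, input_files, output_files)}
    edges: [(parent_id, child_id), ...]
    """
    adag = ET.Element("adag", name=name)
    for job_id, (transformation, inputs, outputs) in jobs.items():
        job = ET.SubElement(adag, "job", id=job_id, name=transformation)
        for filename in inputs:
            ET.SubElement(job, "uses", file=filename, link="input")
        for filename in outputs:
            ET.SubElement(job, "uses", file=filename, link="output")

    # Dependencies are grouped per child, each listing its parents.
    parents_of = {}
    for parent_id, child_id in edges:
        parents_of.setdefault(child_id, []).append(parent_id)
    for child_id, parent_ids in parents_of.items():
        child = ET.SubElement(adag, "child", ref=child_id)
        for parent_id in parent_ids:
            ET.SubElement(child, "parent", ref=parent_id)
    return adag


dax = build_dax(
    "demo",
    {
        "ID1": ("preprocess", ["raw.dat"], ["clean.dat"]),
        "ID2": ("analyze", ["clean.dat"], ["result.dat"]),
    },
    [("ID1", "ID2")],
)
dax_text = ET.tostring(dax, encoding="unicode")
```

Only logical file and transformation names appear in the output; no site or executable path is recorded, which is what keeps the description abstract.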

Sample DAX Generated:

Provenance Challenge Workflow in Kepler/Pegasus: In Kepler each node needs a unique name, so the Transformation Catalog (TC) needs many duplicate entries.
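A small Python sketch shows why unique node names lead to duplicate catalog entries. It assumes, for illustration only, that each catalog entry is a `(site, name, path)` tuple keyed by the actor's name; the function names, the numbering scheme, and the example transformations and paths are hypothetical and do not reflect the actual TC file format.

```python
def unique_actor_names(transformations):
    """Give every workflow node its own name by numbering repeated transformations."""
    counts = {}
    names = []
    for transformation in transformations:
        counts[transformation] = counts.get(transformation, 0) + 1
        names.append(f"{transformation}_{counts[transformation]}")
    return names


def catalog_entries(transformations, site, paths):
    """One entry per actor name; repeated transformations share one executable path."""
    names = unique_actor_names(transformations)
    return [
        (site, name, paths[transformation])
        for name, transformation in zip(names, transformations)
    ]


# Hypothetical workflow: the same transformation is used by two nodes.
nodes = ["align", "align", "mean"]
entries = catalog_entries(
    nodes, "site_a", {"align": "/opt/bin/align", "mean": "/opt/bin/mean"}
)
```

Here `align_1` and `align_2` point to the same executable, so the catalog carries two entries that differ only in name.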

Integration Benefit for Pegasus Users: Visualizing and debugging existing models. Supports a scientist who wants to redo, visualize, or easily reconfigure an existing DAX. Provides an option to upload existing DAX files into the workspace. Converts the specified DAX file into MoML (Kepler’s format) by passing it through an XSLT processor and generating the required directors and actors on the canvas. Scalability is an issue (only small workflows can be visualized), and scoping may need to be applied.
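The DAX-to-MoML conversion can be sketched as follows. The slides describe an XSLT processor; this sketch swaps in a direct tree transformation with Python's standard library instead, since the stylesheet itself is not shown. The sample DAX is simplified and has no XML namespace, the actor class name `example.PegasusJob` and the `input`/`output` port names are hypothetical, and the output is only a MoML-like skeleton (entities, relations, links), not a model Kepler is guaranteed to load.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free DAX used only for this illustration.
SAMPLE_DAX = """<adag name="demo">
  <job id="ID1" name="preprocess"/>
  <job id="ID2" name="analyze"/>
  <child ref="ID2"><parent ref="ID1"/></child>
</adag>"""


def dax_to_moml(dax_text, actor_class="example.PegasusJob"):
    """Turn each DAX job into an entity and each dependency into a relation with two links."""
    adag = ET.fromstring(dax_text)
    model = ET.Element(
        "entity",
        {
            "name": adag.get("name", "workflow"),
            "class": "ptolemy.actor.TypedCompositeActor",
        },
    )
    for job in adag.findall("job"):
        # actor_class is a placeholder for whatever class implements a Pegasus job actor.
        ET.SubElement(model, "entity", {"name": job.get("id"), "class": actor_class})

    count = 0
    for child in adag.findall("child"):
        for parent in child.findall("parent"):
            count += 1
            relation = f"relation{count}"
            ET.SubElement(
                model,
                "relation",
                {"name": relation, "class": "ptolemy.actor.TypedIORelation"},
            )
            ET.SubElement(
                model, "link", {"port": f"{parent.get('ref')}.output", "relation": relation}
            )
            ET.SubElement(
                model, "link", {"port": f"{child.get('ref')}.input", "relation": relation}
            )
    return model


model = dax_to_moml(SAMPLE_DAX)
moml_text = ET.tostring(model, encoding="unicode")
```

Every job becomes a canvas element, which is consistent with the scalability concern noted above: a DAX with thousands of jobs would yield thousands of entities to draw.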

Integration Issues: Kepler acts as a visual programming environment. Actors represent single units of computation with data flowing among them. Some configuration is not intuitive (TC entries). Kepler has no concept of representing files separately, so each job has multiport I/O ports, and the user can connect as many files going into and coming out of a port as needed. The integrated environment could potentially be used for debugging. Not done: integration with the Pegasus data registry; monitoring of execution in Kepler; use of Kepler’s workflow execution engine; support for Kepler actors in Pegasus.

Relevant Links: Kepler; Pegasus; DAGMan; Provenance Challenge (workshop on Tuesday); NSF Workshop on Challenges of Scientific Workflows.