
Building Scientific Workflows with Taverna and BPEL: a Comparative Study in caGrid
Wei Tan (1), Paolo Missier (2), Ravi Madduri (1), Ian Foster (1)
(1) University of Chicago and Argonne National Laboratory, USA
(2) School of Computer Science, University of Manchester, Manchester, U.K.

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Agenda
- Introduction to caGrid
- Why scientific workflows in caGrid?
- BPEL and Taverna comparison
  - Service discovery
  - Service composition & workflow execution
    - Data-driven vs. control-driven modeling
    - Implicit vs. explicit definition of data
    - Implicit vs. explicit iteration on data
  - Workflow result analysis
- Conclusion

Introduction: caBIG and caGrid
[Figure label: Globus]

As of Oct 19, 2008:
- 122 participants
- 105 services
  - 70 data
  - 35 analytical

Introduction: caGrid and workflow
[Figure: caGrid (data, instruments, computation resource; virtualization, security, connectivity). Scientific workflow lifecycle: Discovery, Composition, Execution, Analysis, Community (reuse, generate).]

Challenges faced by caGrid users
[Figure: workflow lifecycle on caGrid: Discovery, Composition, Execution, Analysis, Community (reuse, generate)]
- Locating needed services
- Determining function
- Accessing services from a workflow
- GUI for building workflows easily
- Executing workflow efficiently
- Persisting and visualizing results
- Sharing and reusing workflows

Our goals in this paper
- Communicate practical experiences based on our work in the caGrid project
- Cover the entire scientific workflow lifecycle, from service discovery to service composition, workflow execution, and workflow result analysis
  - Based on caGrid requirements for workflow language and tooling
  - Also applicable to other areas in data-intensive and exploratory science?

BPEL and Taverna
Not the only two, but they are representative choices
BPEL
- XML-based specification for web-service-based process behavior
- Industry standard adopted by IBM, SAP, Oracle, etc.
- Has also attracted attention from the scientific community because of its support for the SOA paradigm
Taverna
- Open-source, from the myGrid consortium in the UK
- Design and execution of scientific workflows
- Plug-in architecture for extension (access more applications, visualize more data types, etc.)

Querying semantic data in cancer research
Identify description logic concepts relating to a particular context, e.g., “caCore”
1) Query all projects related to context “caCore”
2) Find UML classes in each project
3) Use project and UML class information to query the semantic metadata
4) Retrieve the concept code
We adopt this query as a use case to guide our comparison
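
The four steps of the use case can be sketched as a plain Python pipeline. This is only an illustration under stated assumptions: the three lookup tables are made-up toy data, find_semantic_metadata is a hypothetical helper, and find_projects and find_classes_in_project are local stand-ins named after the service operations mentioned in the slides, not calls to any real caGrid service.

```python
# Toy in-memory stand-ins for the services; every value below is invented.
PROJECTS = {"caCore": ["ProjectA", "ProjectB"]}
CLASSES = {"ProjectA": ["Gene", "Protein"], "ProjectB": ["Sample"]}
CONCEPTS = {
    ("ProjectA", "Gene"): "CODE-1",
    ("ProjectA", "Protein"): "CODE-2",
    ("ProjectB", "Sample"): "CODE-3",
}


def find_projects(context):
    """Step 1: return all projects related to a context."""
    return PROJECTS.get(context, [])


def find_classes_in_project(project):
    """Step 2: return the UML classes of a single project."""
    return CLASSES.get(project, [])


def find_semantic_metadata(project, uml_class):
    """Step 3 (hypothetical helper): look up metadata for a project/class pair."""
    return {
        "project": project,
        "class": uml_class,
        "concept_code": CONCEPTS[(project, uml_class)],
    }


def concept_codes(context):
    """Run steps 1 to 4 and collect the concept codes."""
    codes = []
    for project in find_projects(context):
        for uml_class in find_classes_in_project(project):
            metadata = find_semantic_metadata(project, uml_class)
            codes.append(metadata["concept_code"])  # Step 4
    return codes


print(concept_codes("caCore"))
```

The two nested loops are the part that the comparison focuses on: a workflow language has to express them either implicitly or explicitly.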

Support for service discovery
Before building a workflow
- Need to find appropriate services to be composed
- Service endpoints are not naturally known to users
- Exact semantics of those services are not known
Taverna offers
- An extensible scavenger interface for arbitrary service discovery according to users' needs (see next slide)
- A native semantic discovery facility called Feta: myGrid-ontology-based service annotation and search
BPEL offers
- UDDI, which is not widely adopted
- Research efforts such as WSMO and OWL-S, which are more on the specification level
- No open-source tool is available that works with a service query component in an integrated way

Solution for caGrid: Metadata-based service query
1. Semantic/metadata-based service discovery.
2. Build a workflow using the services obtained by discovery.
3. Execute the workflow and view the results.
[Figure labels: caGrid service metadata; caGrid scavenger: query the CaDSR Service in the use case]
Types of query
- String based
- Property based
- Semantic based
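
The three query types can be illustrated against a small in-memory registry. This is a sketch, not the caGrid scavenger or any caGrid API: the registry layout, the field names (description, properties, concepts), and the entries themselves are assumptions made for the example.

```python
# Hypothetical service metadata; the entries and field names are invented.
SERVICES = [
    {
        "name": "CaDSRService",
        "description": "metadata repository service (toy entry)",
        "properties": {"center": "CenterA"},
        "concepts": {"Project", "UMLClass"},
    },
    {
        "name": "ClusteringService",
        "description": "analytical service (toy entry)",
        "properties": {"center": "CenterB"},
        "concepts": {"GeneExpression"},
    },
]


def string_query(services, text):
    """String based: substring match on name or description."""
    text = text.lower()
    return [
        s["name"]
        for s in services
        if text in s["name"].lower() or text in s["description"].lower()
    ]


def property_query(services, key, value):
    """Property based: exact match on one metadata property."""
    return [s["name"] for s in services if s["properties"].get(key) == value]


def semantic_query(services, concept):
    """Semantic based: match on an annotated concept."""
    return [s["name"] for s in services if concept in s["concepts"]]


print(string_query(SERVICES, "cadsr"))
print(property_query(SERVICES, "center", "CenterB"))
print(semantic_query(SERVICES, "UMLClass"))
```

A real semantic query would reason over an ontology rather than test set membership; the membership test is the simplest stand-in.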


Service composition & workflow execution
- Data-driven vs. control-driven modeling
- Implicit vs. explicit definition of data
- Implicit vs. explicit iteration on data

Data-driven vs. control-driven modeling

| | BPEL | Taverna (Scufl) |
|---|---|---|
| Activities in model | Basic and structured activities | Processors as data processing units with in/output ports |
| Semantics of links | Transfer of control | Transfer of data |
| Data definition | Explicitly defined (global variables) | Implicitly defined (processor's input/output) |
| Data initialization | Complex data type must be explicitly initialized | Automatically |
| Control logic | Full-fledged: sequence, conditional, parallel, event-triggered, etc. | Limited: sequential, parallel and conditional |
| Parallel execution | Defined in [...] or [...] | By default |

Comparison of BPEL and Taverna (Scufl) w.r.t. control/data-flow
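
The data-driven column can be made concrete with a tiny dataflow runner in which links carry data and a processor fires as soon as all of its input ports hold a value. This is a minimal sketch and not the Scufl execution model: run_dataflow, the tuple layout of processors and links, and the example processors are all assumptions introduced here.

```python
def run_dataflow(processors, links, workflow_inputs):
    """Run processors in an order derived only from data availability.

    processors: name -> (function, list of input port names)
    links: list of (source processor, target processor, target port)
    workflow_inputs: (processor, port) -> value supplied from outside
    """
    port_values = dict(workflow_inputs)
    outputs = {}
    order = []
    pending = set(processors)
    while pending:
        # A processor is ready when every one of its input ports has data.
        ready = [
            name
            for name in sorted(pending)
            if all((name, port) in port_values for port in processors[name][1])
        ]
        if not ready:
            raise ValueError("unsatisfied inputs: " + ", ".join(sorted(pending)))
        for name in ready:
            func, ports = processors[name]
            outputs[name] = func(*(port_values[(name, port)] for port in ports))
            order.append(name)
            pending.remove(name)
            # Links transfer data: push the output to downstream input ports.
            for source, target, port in links:
                if source == name:
                    port_values[(target, port)] = outputs[name]
    return outputs, order


# "join" is declared first but can only fire after both upstream processors.
processors = {
    "join": (lambda left, right: left + " " + right, ["left", "right"]),
    "upper": (str.upper, ["text"]),
    "exclaim": (lambda text: text + "!", ["text"]),
}
links = [("upper", "join", "left"), ("exclaim", "join", "right")]
inputs = {("upper", "text"): "hi", ("exclaim", "text"): "there"}

outputs, order = run_dataflow(processors, links, inputs)
print(outputs["join"], order)
```

No ordering is written down anywhere: "upper" and "exclaim" are independent and could run in parallel, which corresponds to the "By default" entry in the table.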

Implicit vs. explicit definition of data
Taverna
- Processors have input/output ports with an associated data type
- Data travels from the output port of a processor to the input of one or more downstream processors
- Interaction among processors is defined entirely by the arcs in the dataflow graph
BPEL
- Requires the explicit definition of variables, and explicit initialization for complex types
- Data are shared among activities (i.e., are global)
- More complexity, but more power and flexibility in data handling
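
The BPEL side of this contrast, explicit and globally shared variables, can be mimicked in Python. The sketch below is an analogy only and is not BPEL: run_sequence, the activity tuples, and the variable names are hypothetical, and a Python dict stands in for the process-level variable declarations.

```python
def run_sequence(activities, variables):
    """Run activities in the given order against one shared variable table.

    activities: list of (variables read, variable written, function)
    variables: name -> value; every variable must be declared up front
    """
    for reads, writes, func in activities:
        if writes not in variables:
            raise KeyError("undeclared variable: " + writes)
        uninitialized = [name for name in reads if variables[name] is None]
        if uninitialized:
            raise ValueError("uninitialized: " + ", ".join(uninitialized))
        variables[writes] = func(*(variables[name] for name in reads))
    return variables


ACTIVITIES = [
    (
        ["request"],
        "projects",
        lambda request: [request["context"] + "-1", request["context"] + "-2"],
    ),
    (["projects"], "count", len),
    # Any activity may read any variable, not only its predecessor's output.
    (
        ["request", "count"],
        "summary",
        lambda request, count: f"{request['context']}: {count} projects",
    ),
]

# Explicit declaration of all variables, then explicit initialization.
variables = {"request": None, "projects": None, "count": None, "summary": None}
variables["request"] = {"context": "caCore"}

result = run_sequence(ACTIVITIES, variables)
print(result["summary"])
```

The third activity reads "request" long after it was set, which is the flexibility that global variables buy, at the price of declaring and initializing everything by hand.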

Implicit vs. explicit iteration on data
Implicit iteration in Taverna
- Occurs when an input port receives a list where a single element is expected
  - E.g., a processor that outputs a “list of strings” can legally be connected to a processor with an input port of type “string”
- Taverna interprets this type mismatch as an indication that the destination processor must be invoked repeatedly, once for each element of the input list
- This behavior is defined by Taverna's functional programming model
Explicit iteration in BPEL
- BPEL does not allow type mismatch, and iteration needs to be defined explicitly
- Again, BPEL offers more flexibility to define more advanced iteration patterns (with more complexity in the model, though)
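
The rule "a list arriving at a single-item port triggers one invocation per element" can be sketched with a depth comparison. This is an illustration of the idea, not Taverna's implementation: list_depth, invoke, and the port_depth parameter are names assumed for this example.

```python
def list_depth(value):
    """Nesting depth: 0 for a single item, 1 for a list, 2 for a list of lists."""
    if not isinstance(value, list):
        return 0
    return 1 + max((list_depth(item) for item in value), default=0)


def invoke(func, value, port_depth=0):
    """Call func on value, iterating implicitly when value is nested too deeply.

    port_depth is the depth the input port declares (0 means a single item).
    """
    if list_depth(value) > port_depth:
        # Depth mismatch: invoke once per element and collect the results.
        return [invoke(func, item, port_depth) for item in value]
    return func(value)


print(invoke(str.upper, "a"))                   # single item, one call
print(invoke(str.upper, ["a", "b"]))            # list of strings, two calls
print(invoke(str.upper, [["a"], ["b", "c"]]))   # nesting is preserved
print(invoke(len, ["ab", "c"], port_depth=1))   # port expects a list, one call
```

The explicit counterpart is simply the loop that invoke hides, written out by the workflow author for each connection where it is needed.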

Implicit vs. explicit iteration in CaDSR
- findProjects returns an array Project[]
- findClassesInProject receives type Project and finds all UML classes in this (single) project
- In Taverna, an xmlsplitter extracts the project array and feeds it directly into findClassesInProject
- In BPEL, a ForEach construct is needed to iterate over the array Project[]

Workflow result analysis
Workflow provides a natural framework for data tracking and analysis
- In both Taverna and BPEL
Taverna offers native provenance support
- More precise linkage annotation between services' input and output
- Semantic support
- Not the focus of our project; see refs. [16], [17] for more details
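
The idea of linking each output back to the inputs that produced it can be sketched with a small recording wrapper. This is not Taverna's provenance model: ProvenanceLog, its methods, and the matching of values by equality are simplifying assumptions made for the example.

```python
class ProvenanceLog:
    """Record each invocation so that a result can be traced to its sources."""

    def __init__(self):
        self.records = []

    def call(self, processor, func, *inputs):
        """Invoke func and remember which inputs produced which output."""
        output = func(*inputs)
        self.records.append(
            {"processor": processor, "inputs": list(inputs), "output": output}
        )
        return output

    def lineage(self, value, before=None):
        """Return the processors that led to value, nearest first."""
        end = len(self.records) if before is None else before
        # Search only earlier records so the trace always terminates.
        for index in range(end - 1, -1, -1):
            record = self.records[index]
            if record["output"] == value:
                chain = [record["processor"]]
                for item in record["inputs"]:
                    chain.extend(self.lineage(item, before=index))
                return chain
        return []


log = ProvenanceLog()
projects = log.call("findProjects", lambda context: ["P1", "P2"], "caCore")
count = log.call("countProjects", len, projects)

print(log.lineage(count))
```

Matching by equality is crude (two unrelated steps may produce equal values); a real system would tag each data item with an identifier.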

Conclusion: Taverna offers lifecycle support
[Figure: "+ caGrid = ?"; workflow lifecycle on caGrid: Discovery, Composition, Execution, Analysis, Community (reuse, generate)]
- Scavenger: for customized service discovery
- Feta: service annotation and discovery
- Scufl: compact modeling of data flow
- Built-in processors: Soaplab, BioMart, etc.
- Customized processors as plug-ins
- Implicit iteration: handle parallel execution
- Result persistence and visualization
- A community for sharing workflows
Provides a compact set of primitives that eases the modeling of data flows
Allows users to specify “what to do” instead of “how to do it”

Conclusion: BPEL offers unique features
Build-time
- A comprehensive set of primitives to model processes of all flavors
  - control-flow oriented
  - data-flow oriented (although a little verbose)
  - event driven, etc.
- Full featured
  - process logic, data manipulation, event and message processing, fault handling, etc.
Run-time
- BPEL engines typically run inside application servers with
  - persistent state storage
  - reliability and scalability guarantees
- Important for long-running and computation-intensive workflows
- For now, the Taverna engine does not provide these capabilities

Conclusion
Factors in deciding which language/tool to choose
- User IT expertise
  - some prefer a scripting language, others a friendly GUI
- Problem size
  - Taverna often runs on the desktop and handles problems of moderate size (currently common in bioinformatics)
  - Grid/server-based systems like Swift can deal with huge volumes of data and intensive computation (for example, applications in medical informatics, neuroscience, physics)
- Applications involved
  - Web services, batch jobs, shell scripts, etc.
Future work
- Enrich the caGrid workflow tool set based on Taverna
- Build more real workflows to help scientific investigation
- Address issues of scale as they arise

Thank you for your attention

Introduction: caGrid and workflow
[Figure: caGrid (data, instruments, computation resource; virtualization, security, connectivity)]