Scientific workflow management in the VL-e framework
Sub-program 2.5, Department of Computer Science, Universiteit van Amsterdam



Outline
Background
– Scientific experiments, workflow, and the e-Science framework
Workflow management in the VL-e framework
– The approach followed and a review of the related work
– Application use cases and workflow support
Future work

Scientific experiments & e-Science
Step 1: designing an experiment
Step 2: performing the experiment
Step 3: analyzing the experiment results
Complex experiments:
– have complex processes
– require interdisciplinary expertise
– require large-scale resources
[Diagram labels: success; Grid & high-level support; Scientific workflows]

Scientific Workflow Management Systems in an e-Science environment
Functionalities:
– Automating experiment routines;
– Rapid prototyping of experimental computing systems;
– Hiding integration details between resources;
– Managing the experiment lifecycle.
Crossing different layers of middleware to manage:
– Data;
– Computing;
– Information;
– Knowledge.
[Diagram: e-Science framework layers. Labels: Domain-specific applications; User support; Engine; High-level workflow services; SWMS (workflow management system); Data management; Computing tasks; Information; Knowledge; Generic Grid middleware; Grid infrastructure]
In the VL-e project the targeted e-science framework is …

VL-e workflow wish list
VL-e SIG Workflow meeting: Jan 11th, 2005, 10:00–11:30, H220 (NIKHEF building)
Present: Belleman, Belloum, Bouwhuis, Breanndán, Kaletas, Konijnenburg, Marshall, Rauwerda, Sterk, Sluiter, Terpstra, Vasunin, Wibisono, Yakali.
A list of 36 points was established to characterise the ideal workflow system for VL-e, classified in 4 categories:
– Functionality and capability
– User interface characteristics
– Run-time capabilities
– Software engineering aspects

Prioritize the workflow requirements based on the VL-e applications
VL-e sub-program 2.5 in collaboration with SP1.X developers
SP1.X contributors: Belleman, Klous, Konijnenburg, Marshall, Rauwerda, Sluiter, Terpstra,
A list of 12 points was established to characterise the practical workflow system for VL-e, classified in 4 categories:
– Application domain model;
– Engineering;
– Underlying middleware;
– Workflow management system: composition / engine (runtime issues) / user support

Application use cases and workflow requirements
Application use cases
– Different rounds: a series of meetings
– Distinguish workflow requirements
Summary
– From the resource perspective:
   – To support legacy tools and standard middleware, e.g., web/grid services;
   – To be able to invoke resources from different systems;
   – To provide a rich library of workflow components.
– From the application process perspective:
   – To efficiently manage parallel processes/tasks in an experiment (job farming);
   – To efficiently explore a large parameter space (parameter sweep);
   – To support knowledge-based information processing (semantic-level data integration).
– From the perspective of using a SWMS:
   – To provide a friendly user interface (preferably a GUI);
   – To support the development of new workflow components (Java, scripts, C++, documentation and support);
   – To be able to execute tasks on distributed resources (clusters or Grid);
   – To be stable at runtime;
   – To be able to interoperate with different workflow management systems.
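The two process-level requirements above, parameter sweep and job farming, can be illustrated with a minimal Python sketch. It is not taken from VLAM-G or any VL-e tool: the `simulate` task, the parameter names, and the use of a local thread pool in place of cluster or Grid resources are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(alpha, beta):
    """Hypothetical experiment step; stands in for a real workflow task."""
    return alpha * beta


def parameter_sweep(task, grid, max_workers=4):
    """Run `task` once for every point in the Cartesian product of `grid`.

    `grid` maps a parameter name to the list of values to explore.
    Returns a list of (parameters, result) pairs in sweep order.
    """
    names = sorted(grid)
    points = [dict(zip(names, values))
              for values in product(*(grid[name] for name in names))]
    # Job farming: a fixed pool of workers processes the independent jobs.
    # A local thread pool is used here; a real system would dispatch the
    # jobs to cluster or Grid nodes instead.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda point: task(**point), points))
    return list(zip(points, results))


if __name__ == "__main__":
    sweep = parameter_sweep(simulate, {"alpha": [1, 2], "beta": [10, 20, 30]})
    for point, result in sweep:
        print(point, result)
```

Because every point in the sweep is independent, the same list of jobs could be handed to any scheduler; only the executor would change.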

Workflow management in VL-e
First prototype
– VLAM-G
– Shortcomings (GUI, control flow, monitoring, etc., plus software engineering)
Approach
– Collect and analyze application use cases
– Review the state of the art of workflow systems
– Propose workflow systems for the PoC environment
– Be active in use case projects
– Learn lessons from use cases
– Propose a new design
Measured against the list of 36 items established to characterize the ideal workflow system for VL-e, VLAM-G scored: 13 Yes; 5 supported but in need of reimplementation; 9 No; 2 Partially supported; 6 In progress or planned.

Survey of existing workflow systems
Participants: Belloum, De Boer, Guevara-Masis, Korkhov, Mirzadeh, Terpstra, van Hooft, Vasunin, Wibisono, Yakali, Zhao.

Survey results
From the survey and the practical tests on the nine workflow systems, we learned:
– All of the systems are still beta versions (some even alpha) and tend to crash in relatively complex tests.
– None of the systems supports collaboration, data sharing, or information management.
– None of the systems enforces best practice or supports knowledge capture.
– Most of the systems are not geared to Grid-based systems: they were built to work on a single system, with some features for submitting jobs to a remote host (the user is still exposed to Grid-related issues such as writing RSLs).
– We had problems when testing some features described in the documentation.
Participants: Belloum, De Boer, Korkhov, Terpstra, van Hooft, Vasunin, Wibisono, Zhao.

Recommendation for PoC R1 (part of the short-term solution)
Participants: Belloum, De Boer, Korkhov, Terpstra, van Hooft, Vasunin, Wibisono, Zhao.

Use cases and small project teams
Use case project teams
– Participants from SPs from P1, P2, P3 and P4.
– Contributions from the workflow team: distinguish reusable components and provide an integration solution.
– We are also active in project management, such as decomposing the implementation into concrete tasks and tracking progress.
Inside SP2.5, the group members are divided as follows:
– SP1.2: Belloum & Korkhov
– SP1.3: Belloum & De Boer
– SP1.4: Zhao & Vasunin
– SP1.5: Zhao & Wibisono
– SP1.6: Belloum & Paul & De Boer

Collaboration with VL-e Applications
SP1.2 – AID-Food informatics-IvI
– WCFS case: searching in “Research Management System” (selected by the VLeIT) (ongoing …)
SP1.3 – AMC-IvI
– High-volume data management in the PoC SRB (selected by the VLeIT) (ongoing …)
SP1.4 – IBED-IvI
– Run KansK toolbox in workflow environment (Master thesis project) (ongoing …)

Collaboration with VL-e Applications
SP1.5 – IBU-IvI
– Histone code - semantic data integration (selected by VLeIT) (ongoing …)
– Running R scripts on multiple nodes using web service (finished)
– Running R scripts in workflows (ongoing …)
– Ridge-O-grammer (ongoing …)
SP1.6 – AMOLF-IvI
– SRB metadata update from file header (selected by VLeIT) (ongoing …)

SP1.2: WCFS case: searching in “Research Management System”
[Diagram labels: AID tools; Lab. Exp (In: Sample, Out: Data); Analysis (In: Data, Out: Data); Situation; Problem; Research question; Answer / conclusion; Literature; Lit; Report]
Much data in scientific research
But:
– No reuse: data not available across projects
– No context: meaning of data not known
– Experiments not reproducible
– Only successful experiments traceable
Wish:
– Research Management System: manage experimental data for WCFS researchers

SP1.3: High-volume data management in the PoC SRB
The goal of the use case is to:
– Facilitate the data management and analysis for the functional MRI studies by using PoC resources for computation and resources
[Diagram labels: Matrix cluster; SRB]
An fMRI pilot is going to be developed as a first step.

SP1.4: Run KansK toolbox in Workflow environment
To be integrated in workflow
– VLAM
The toolbox's main processes deal with data preparation, evaluation, prediction, and display
The workflow is about predicting the location of the birds

SP1.5: Histone code - semantic data integration
[Pipeline diagram labels]
– Model alignment / model extension
– Data acquisition, e.g. DB connection, API, screen scraper
– Map, e.g. table -> RDF + model
   – Flat map to RDF
   – RDF to structured RDF
   – Assign LSIDs
– Scaling problems
   – Sesame
   – Jena
– Data import: UCSC tables → RDF repository
– Data exploration: extract overlapping genome locations
– Knowledge & data discovery
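The "table -> RDF" mapping and the "extract overlapping genome locations" step can be sketched in plain Python. This is an illustrative assumption, not the SP1.5 implementation: triples are modelled as tuples rather than loaded into Sesame or Jena, the `example.org` authority and the column names are hypothetical, and subjects follow the general LSID pattern `urn:lsid:<authority>:<namespace>:<object>`.

```python
def row_to_triples(table, key_column, row, authority="example.org"):
    """Flat-map one table row to (subject, predicate, object) triples.

    The subject is an LSID-style identifier built from the key column;
    every other column becomes one predicate. `authority` is a
    hypothetical naming authority used only for this sketch.
    """
    subject = f"urn:lsid:{authority}:{table}:{row[key_column]}"
    triples = []
    for column, value in row.items():
        if column == key_column:
            continue
        predicate = f"http://{authority}/{table}#{column}"
        triples.append((subject, predicate, value))
    return triples


def overlaps(a, b):
    """True if two genome locations on the same chromosome overlap.

    Locations are dicts with 'chrom', 'start', 'end' (half-open intervals).
    """
    return (a["chrom"] == b["chrom"]
            and a["start"] < b["end"]
            and b["start"] < a["end"])


if __name__ == "__main__":
    # Illustrative row; the column names are assumptions for this sketch.
    row = {"name": "geneA", "chrom": "chr1", "txStart": 100, "txEnd": 200}
    for triple in row_to_triples("genes", "name", row):
        print(triple)
    print(overlaps({"chrom": "chr1", "start": 100, "end": 200},
                   {"chrom": "chr1", "start": 150, "end": 300}))
```

A flat map like this produces one triple per cell; the "RDF to structured RDF" step on the slide would then reshape those triples against the domain model.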

SP1.5: Running R scripts in workflows
[Workflow diagram labels: Read data; Normalization; F test; Gene data generator; R web services; Model; Raw data; Normalized data; FILE; V plot; Matrix; FDR; Gene data; Local; Grid; Activity; Data]
Collaboration between the SP1.5 side (Frans and Han) and the SP2.5 side (Wibi, Zhiming):
– Define a concrete description
– Provide UML-based analysis diagrams
– Have a meeting: decompose the task
– Implement the functionality in the modules (Kepler actor or VLAM module)
– Work together and give the necessary support
– Integrate the modules into a workflow (an integration meeting)
– Refine the modules
– Refine the workflow
– Final demonstration

SP1.5: Ridge-O-grammer
Goal: identify ridges (regions of increased gene expression)
Input: transcriptome map
– Slide Window Median (SWM)
– Slide Window Median Probability (SWMP)
– Histogram of frequencies (HF)
– Histogram of probabilities (HP)
– False Discovery Rate (FDR)
Output: list of ridges
The outcome of this work is going to be presented at the “Netherlands Bioinformatics Conference”, 24 April 2006.
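The first step of this pipeline, the sliding window median, can be sketched as follows. The slide gives no window size or ridge criterion, so both are assumptions here: the window is a parameter, and candidate ridges are found with a simple fixed threshold on the median, which stands in for the probability and FDR steps listed on the slide rather than reproducing them.

```python
from statistics import median


def sliding_window_median(values, window):
    """Median of every run of `window` consecutive expression values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [median(values[i:i + window])
            for i in range(len(values) - window + 1)]


def runs_above(values, threshold):
    """Half-open (start, end) index ranges where values exceed threshold.

    A fixed threshold is a simplification for this sketch; it is not the
    FDR-based selection named in the pipeline.
    """
    runs, start = [], None
    for i, value in enumerate(values):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(values)))
    return runs


if __name__ == "__main__":
    # Hypothetical expression values ordered by genome position.
    expression = [1, 2, 9, 8, 9, 7, 1, 2, 1]
    swm = sliding_window_median(expression, window=3)
    print(swm)
    print(runs_above(swm, threshold=5))
```

The median is used instead of the mean so that a single highly expressed gene does not by itself raise the value of a whole window.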

Ongoing development
Activities on the rapid prototyping environment
– Simple file management tools for SRB and GridFTP
– R scripts in the workflow system
– Parameter sharing among workflow components
– Service discovery using a P2P approach
– Parameter sweep and job farming

Future work
By far the most active and rapidly progressing WMS is Kepler
– Beta version: March
Kepler/Ptolemy has two ways of extending the system:
– Actors
– Directors
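The actor/director split can be illustrated with a small conceptual sketch: actors hold the computation, while a director decides when each actor fires. The Python below is an analogy only and does not use or mirror Kepler's Java API; the class names `Actor` and `SequentialDirector` and the example workflow are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Actor:
    """A workflow component: a named function plus the actors it reads from."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = tuple(inputs)

    def fire(self, *args):
        return self.func(*args)


class SequentialDirector:
    """Fires each actor once, in dependency order.

    A different director could run the same actors with another
    execution model (for example in parallel) without changing them.
    """

    def run(self, actors):
        by_name = {actor.name: actor for actor in actors}
        graph = {actor.name: set(actor.inputs) for actor in actors}
        outputs = {}
        for name in TopologicalSorter(graph).static_order():
            actor = by_name[name]
            outputs[name] = actor.fire(*(outputs[i] for i in actor.inputs))
        return outputs


if __name__ == "__main__":
    workflow = [
        Actor("total", sum, inputs=["sort"]),
        Actor("sort", sorted, inputs=["source"]),
        Actor("source", lambda: [3, 1, 2]),
    ]
    print(SequentialDirector().run(workflow))
```

Extending such a system with a new actor adds a capability, whereas extending it with a new director changes how an existing workflow is executed.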

Summary
Survey results showed that the e-science WMS targeted in VL-e
– Does not exist yet
– Collaboration with other workflow projects will likely speed up the development process
Project teams working on application use cases are the only way to progress
VLAM is still quite useful for rapid prototyping

References
People: Adam Belloum (SP2.5 leader), Zhiming Zhao, Paul van Hooft (post doc), Adianto Wibisono, Dmitry Vasyunin, Vladimir Korkhov, Frank Terpstra (Ph.D. students), Piter de Boer (programmer)
VL-e Reports:
1. PoC recommendation report
Publications:
1. Z. Zhao; A. Belloum; H. Yakali; P.M.A. Sloot and L.O. Hertzberger: Dynamic Workflow in a Grid Enabled Problem Solving Environment, in Proceedings of the 5th International Conference on Computer and Information Technology, pp IEEE Computer Society Press, Shanghai, China, September
2. Z. Zhao; A. Belloum; A. Wibisono; F. Terpstra; P.T. de Boer; P.M.A. Sloot and L.O. Hertzberger: Scientific workflow management: between generality and applicability, in Proceedings of the International Workshop on Grid and Peer-to-Peer based Workflows, pp IEEE Computer Society Press, Melbourne, Australia, September 19th-21st
3. Z. Zhao; A. Belloum; P.M.A. Sloot and L.O. Hertzberger: Agent technology and scientific workflow management in an e-Science environment, in Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence, pp IEEE Computer Society Press, Hong Kong, China, November 14th-16th
Activity:
1. Int’l workshop on Workflow systems in e-Science, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS06, Reading University, May 28,
2. Workshop on Workflow systems in e-Science, to be held during the next e-Science conference in Amsterdam, December 2006.