Scientific Workflow reusing and long term big data preservation Salima Benbernou Université Paris Descartes Project.

Slides:



Advertisements
Similar presentations
Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY
Advertisements

© Chinese University, CSE Dept. Software Engineering / Software Engineering Topic 1: Software Engineering: A Preview Your Name: ____________________.
WebRatio BPM: a Tool for Design and Deployment of Business Processes on the Web Stefano Butti, Marco Brambilla, Piero Fraternali Web Models Srl, Italy.
Funded by: © AHDS Sherpa DP – a Technical Architecture for a Disaggregated Preservation Service Mark Hedges Arts and Humanities Data Service King’s College.
Towards a Generic Platform for Developing CSCL Applications Using Grid Infrastructure by Santi Caballé Open University of Catalonia Barcelona, Spain with.
1 Richard White Design decisions: architecture 1 July 2005 BiodiversityWorld Grid Workshop NeSC, Edinburgh, 30 June - 1 July 2005 Design decisions: architecture.
Nadia Ranaldo - Eugenio Zimeo Department of Engineering University of Sannio – Benevento – Italy 2008 ProActive and GCM User Group Orchestrating.
© 2006 IBM Corporation IBM Software Group Relevance of Service Orientated Architecture to an Academic Infrastructure Gareth Greenwood, e-learning Evangelist,
Chapter 1: Overview of Workflow Management Dr. Shiyong Lu Department of Computer Science Wayne State University.
January 1999 CHAIMS1. January 1999 CHAIMS2 CHAIMS: Compiling High-level Access Interfaces for Multi-site Software CHAIMS Stanford University Objective:
PRIVACY, TRUST, and SECURITY Bharat Bhargava (moderator)
David Harrison Senior Consultant, Popkin Software 22 April 2004
Community Manager A Dynamic Collaboration Solution on Heterogeneous Environment Hyeonsook Kim  2006 CUS. All rights reserved.
Computational Thinking Related Efforts. CS Principles – Big Ideas  Computing is a creative human activity that engenders innovation and promotes exploration.
Mobile Agents for Integrating Cloud-Based Business Processes with On-Premises Systems and Devices Janis Grundspenkis Antons Mislēvičs Department of Systems.
A Semantic Workflow Mechanism to Realise Experimental Goals and Constraints Edoardo Pignotti, Peter Edwards, Alun Preece, Nick Gotts and Gary Polhill School.
EUROPEAN UNION Polish Infrastructure for Supporting Computational Science in the European Research Space Cracow Grid Workshop’10 Kraków, October 11-13,
Biology.sdsc.edu CIPRes in Kepler: An integrative workflow package for streamlining phylogenetic data analyses Zhijie Guan 1, Alex Borchers 1, Timothy.
June Amsterdam A Workflow Bus for e-Science Applications Dr Zhiming Zhao Faculty of Science, University of Amsterdam VL-e SP 2.5.
A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support Manuel Caeiro Zsolt Nemeth Thierry Priol CoreGRID Post Doc IRISA, Rennes, France.
TC Methodology Massimo Cossentino (Italian National Research Council) Radovan Cervenka (Whitestein Technologies)
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE II - Network Service Level Agreement (SLA) Establishment EGEE’07 Mary Grammatikou.
Understanding the utility and fitness of Workflow Provenance for Experiment Reporting Pınar Alper, Supervisor: Carole A. Goble 1.
Software Software is omnipresent in the lives of billions of human beings. Software is an important component of the emerging knowledge based service.
SCI-BUS is supported by the FP7 Capacities Programme under contract nr RI CloudBroker Platform integration into WS-PGRADE/gUSE Zoltán Farkas MTA.
CONTENTS Arrival Characters Definition Merits Chararterstics Workflows Wfms Workflow engine Workflows levels & categories.
Agent Model for Interaction with Semantic Web Services Ivo Mihailovic.
ASG - Towards the Adaptive Semantic Services Enterprise Harald Meyer WWW Service Composition with Semantic Web Services
SOFTWARE DESIGN AND ARCHITECTURE LECTURE 09. Review Introduction to architectural styles Distributed architectures – Client Server Architecture – Multi-tier.
Miguel Branco CERN/University of Southampton Enabling provenance on large-scale e-Science applications.
Chapter 1: Overview of Workflow Management Dr. Shiyong Lu Department of Computer Science Wayne State University.
Distributed Aircraft Maintenance Environment - DAME DAME Workflow Advisor Max Ong University of Sheffield.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
Workflow Project Status Update Luciano Piccoli - Fermilab, IIT Nov
Middleware for FIs Apeego House 4B, Tardeo Rd. Mumbai Tel: Fax:
Managing and communicating uncertainty in geospatial web service workflows Richard Jones, Dan Cornford, Lucy Bastin, Matthew Williams Computer Science,
SCAP E SCAPE Project EU project aimed at building a scalable platform for planning and execution of computation intensive processes for ingestion or migration.
Introduction Infrastructure for pervasive computing has many challenges: 1)pervasive computing is a large aspect which includes hardware side (mobile phones,portable.
A Context Model based on Ontological Languages: a Proposal for Information Visualization School of Informatics Castilla-La Mancha University Ramón Hervás.
Yuhui Chen; Romanovsky, A.; IT Professional Volume 10, Issue 3, May-June 2008 Page(s): Digital Object Identifier /MITP Improving.
1 Taverna CISTIB Ernesto Coto Taverna Open Workshop, October 2014.
Workflow in Grid Systems Workshop Dave Berry, Research Manager UK National e-Science Centre GGF10, Mar 2004.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Distribution and components. 2 What is the problem? Enterprise computing is Large scale & complex: It supports large scale and complex organisations Spanning.
ICCS WSES BOF Discussion. Possible Topics Scientific workflows and Grid infrastructure Utilization of computing resources in scientific workflows; Virtual.
Enabling the Future Service-Oriented Internet (EFSOI 2008) Supporting end-to-end resource virtualization for Web 2.0 applications using Service Oriented.
Applications and Requirements for Scientific Workflow Introduction May NSF Geoffrey Fox Indiana University.
SCAPE Rainer Schmidt SCAPE Training Event September 16 th – 17 th, 2013 The British Library Building Scalable Environments Technologies and SCAPE Platform.
BPEL in Grids Aleksander Slomiski Department of Computer Science Indiana University
 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. Enabling Components Management and Dynamic Execution Semantic.
Experiment Management from a Pegasus Perspective Jens-S. Vöckler Ewa Deelman
MDD approach for the Design of Context-Aware Applications.
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 4 Slide 1 Software Processes.
Scientific Workflows for the Sensor Web ICT for Earth Observation Anwar Vahed.
Project number: ENVRI and the Grid Wouter Los 20/02/20161.
EUROPEAN UNION Polish Infrastructure for Supporting Computational Science in the European Research Space The Capabilities of the GridSpace2 Experiment.
Ocean Observatories Initiative OOI Cyberinfrastructure Life Cycle Objectives Review January 8-9, 2013 Scientific Workflows for OOI Ilkay Altintas Charles.
The Global Scene Wouter Los University of Amsterdam The Netherlands.
Collaborative Tools for the Grid V.N Alexandrov S. Mehmood Hasan.
Towards ‘Ubiquitous’ Ubiquitous Computing: an alliance with ‘the Grid’ Oliver Storz, Adrian Friday, and Nigel Davies Computing Department, Lancaster University,
SCI-BUS is supported by the FP7 Capacities Programme under contract nr RI Accessing cloud resources through the WS-PGRADE/gUSE and CloudBroker integrated.
OMII-BPEL Grid Services Orchestration using the Business Process Execution Language (BPEL) Liang Chen Bruno Wassermann Project Inspector: Wolfgang Emmerich.
Distribution and components
Chapter 18 MobileApp Design
Service-centric Software Engineering 1
Service Oriented Architecture (SOA)
Managing Process Integrity (Chapter 8)
Introduction to SOA Part II: SOA in the enterprise
Scientific Workflows Lecture 15
Presentation transcript:

Scientific Workflow reusing and long term big data preservation Salima Benbernou Université Paris Descartes Project Mastodons CNRS Predon

Outline  Scientific workflows  Examples  Existing systems and limitations  Using conventionnel workflow technologies in simulation/experiments  Introduction  Modeling using BPEL  Reusing in scientific workflow  Fragment reusing  Privacy aware provenance

What are scientific workflows?  Scientific experiments/computations/simulation modeled and executed as workflows called scientific workflow (SWf).  Deal with intensive data, are long running, data driven, can integrate multiple data sources (i.e. sensors)

SWF Examples

Source: Matt Coleman (LLNL)

Source: NIH BIRN (Jeffrey Grethe, UCSD)

SWF Examples (cont)

A snapshot of the Taverna Workbench.

Scientific workflow systems Workflow are already used in e-science This is not always the conventional workflow technology Some workflow systems in e-science: Kepler, Taverna, Pegasus, Trident, Simulink, Karajan … To be improved Robustness, fault handling Flexibility and adaptability Reusability Scalability Interaction with users, user-friendliness of tools science skills required from scientist No generic approach Domain specific solutions (in term of modeling and execution)

Scientific workflow systems  Data-driven applications are more and more developed in science to exploit the large amount of digital data today available  Adequate workflow composition mechanisms are needed to support the complex workflow management process including workflow creation, workflow reuse, and modifications made to the workflow over time.  Use conventional technologies (Business processes)

Business workflows (i.e, BPEL)  independent of the application domain, can be used for every type of scenario  The concept of workflow models and instances is inherently capable of enabling parameter sweeps.  Asynchronous messaging features are predestinated for non- blocking invocation of long running scientific computations  Business workflows are usually based on agreed-upon standards for workflow modeling and execution as well as for integration technologies.  facilitate collaboration between scientists (e.g., with the help of Web services).  Services computing technology enables scientists to expose data and computational resources wrapped as publicly accessible Web services

Scientific workflows vs. Business workflows  Scientific “Workflows” – Dataflow and data transformations – Data problems: volume, complexity, heterogeneity – Grid-aspects Distributed computation Distributed data – User-interactions/WF steering – Data, tool, and analysis integration  Dataflow and control-flow are married!  Business Workflows – Process composition – Tasks, documents, etc. undergo modifications (e.g., flight reservation from reserved to ticketed), but modified WF objects still identifiable throughout – Complex control flow, task-oriented: travel reservations; credit approval  Dataflow and control-flow are divorced!

Business and scientific Lifecycles

Scientific workflow limitations  Scientific workflow life cycle: scientits’perspective  Reflects how scientits actually work-trial and error fashion  Hidden technical details  « no » deployment phase  Operations to control workflow execution  Monitoring is the visualisation of the results only

Scientific workflow and the scalability  Service-Oriented Workflows on Cloud Infrastructures for reusing.  The service-oriented paradigm will allow large-scale distributed workflows to be run on heterogeneous platforms and the integration of workflow elements developed by using different programming languages. Web and Cloud services are a paradigm that can help to handle workflow interoperability,

BPaas vs SPaaS  Business process vs swf outsourcing to take advantage of the Cloud computing model. Reusing process fragments to develop process-based service compositions and adapt the new swf according to the scientists (reusing a partial differential equation program)  privacy risks aware.  Sharing scientific process fragments and hide provenance.  Many works studied data provenance but not hide provenance.  Formal model of BPaaS vs Scientific Process as a Service

Decomposition of business process vs scientific process

Identification of fragments June 24-28, 2012IEEE ICWS Hawaii, USA18

Development of processes June 24-28, 2012IEEE ICWS Hawaii, USA19

Provenance and privacy in SPaaS An adversary (a curious) can discover the provenance of the reused process fragments. Can infer connections between end-users and scientists that outsource fragments to the Cloud. No related work!

Formal model Business Process: business graph. [Beeri,VLDB’06] Process Fragment: business subgraph. August 31, 2012Cloud-I / VLDB 2012 workshop - Istanbul BPaaS: a finite set of business processes. Reusing Function.

Anonymous Views June 24-28, 2012IEEE ICWS Hawaii, USA22 View on SPaaS : A set of process fragments having the same objective ( c alled clones). Anonymous View on SPaaS: View on SPaaS having at most K clones. Objective : Make it hard for an adversary to know the provenance of a reused process fragment. (Anonyfrag)

Let P a business process to be developed in the SPaaS For each Process Fragment F in P Verify if F exists in the SPaaS P’ a process-based service compositions developed in the SPaaS Reuse F to develop P exist s not exists

Workshop organisation 1st Workshop on LOng term Preservation for big Scientific data (LOPS) to be held in conjunction with ICDE 2014, Mach 31-April 4, Chicago, IL, USA Lipade.math-info.univ-paris5.fr/lops/