Collection and storage of provenance data Jakub Wach Master of Science Thesis Faculty of Electrical Engineering, Automatics, Computer Science and Electronics.


Collection and storage of provenance data Jakub Wach Master of Science Thesis Faculty of Electrical Engineering, Automatics, Computer Science and Electronics Institute of Computer Science Kraków,

Outline: thesis goals; motivation; introduction to provenance; requirements (how to, brief analysis); system architecture (assumptions, environment, details); reference implementation; feasibility study; work status.

Thesis goals: requirements analysis for provenance tracking in modern e-science virtual laboratories; design of a provenance data model for the ViroLab virtual laboratory; design of a provenance tracking system adapted to ViroLab's requirements; reference implementation of the system; integration with the ViroLab environment and real-world usage; study of the usefulness of the presented solution.

Motivation: rapid development of e-science infrastructure; the Semantic Grid as a new direction with new challenges; limitations of current, narrowly focused provenance solutions; lack of user-oriented tools and models; the full potential of e-science systems is yet to be discovered.

Introduction: virtual laboratories as new tools for e-science; limitations of current solutions (fixed models, "too user-friendly"); overview of the ViroLab EU project; ViroLab's virtual laboratory and its approach; virology applications in ViroLab.

Requirements – how to...? The challenging task: requirements gathering and analysis, given the lack of example systems, users, and real-world usage models. Sources: applications and users; complex, artificial scenarios; weak spots in the state of the art; research (the Provenance Challenge).

Requirements – brief list. Functional (most important): actor provenance and annotations; immutable data (infinite storage); query capabilities. Non-functional (not to be underestimated): scalable data storage; distributed processing (performance!); ease of management and configuration.
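The functional requirements above (actor provenance, annotations, immutable data, query capabilities) can be illustrated with a minimal in-memory sketch. It is not the PROToS design: the class names, the record fields, and the `by_actor` query are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """One immutable provenance fact (hypothetical fields)."""
    record_id: str
    actor: str      # who or what produced the data
    activity: str   # what was done
    output: str     # identifier of the produced data item


class AppendOnlyStore:
    """Records can be added and annotated, never changed or deleted."""

    def __init__(self) -> None:
        self._records: Dict[str, ProvenanceRecord] = {}
        self._annotations: Dict[str, List[str]] = {}

    def add(self, record: ProvenanceRecord) -> None:
        # Refuse to overwrite: stored provenance is immutable.
        if record.record_id in self._records:
            raise ValueError(f"record {record.record_id!r} already stored")
        self._records[record.record_id] = record
        self._annotations[record.record_id] = []

    def annotate(self, record_id: str, note: str) -> None:
        # Annotations accumulate next to the record; the record is untouched.
        self._annotations[record_id].append(note)

    def by_actor(self, actor: str) -> List[ProvenanceRecord]:
        # A very small stand-in for "query capabilities".
        return [r for r in self._records.values() if r.actor == actor]

    def annotations(self, record_id: str) -> Tuple[str, ...]:
        return tuple(self._annotations[record_id])
```

Because nothing is ever overwritten or deleted, such a store only grows, which is why immutable data goes hand in hand with the "infinite storage" and scalable storage requirements.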

Architecture – assumptions: employ semantics in both data and processing; query capabilities driven by query languages (XQuery); data in XML form, hence native XML storage; communication that offers interoperability and performance; data store architecture shaped by the characteristics of the data model.
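To make the "XML data, queried through a query language" assumption concrete, here is a small sketch. The element and attribute names are invented for illustration and are not the PROToS schema, and the query uses the limited XPath subset of Python's standard `xml.etree.ElementTree` as a stand-in for XQuery, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical provenance events; the real data model may look quite different.
PROVENANCE_XML = """
<provenance>
  <event id="e1" experiment="exp-1">
    <actor>alice</actor>
    <operation>align-sequences</operation>
  </event>
  <event id="e2" experiment="exp-1">
    <actor>bob</actor>
    <operation>rank-drugs</operation>
  </event>
  <event id="e3" experiment="exp-2">
    <actor>alice</actor>
    <operation>rank-drugs</operation>
  </event>
</provenance>
"""


def operations_by_actor(xml_text, actor):
    """Return (event id, operation) pairs for events performed by `actor`."""
    root = ET.fromstring(xml_text)
    # The predicate [actor='...'] matches events with an <actor> child whose text equals `actor`.
    events = root.findall(f"./event[actor='{actor}']")
    return [(e.get("id"), e.findtext("operation")) for e in events]


print(operations_by_actor(PROVENANCE_XML, "alice"))
# prints [('e1', 'align-sequences'), ('e3', 'rank-drugs')]
```

In a native XML store the same question would typically be phrased as an XQuery expression evaluated by the store itself rather than by client-side code.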

Architecture – environment. All components required for fully functional provenance tracking: monitoring; middleware; event generator; querying.

Architecture – details. PROToS data model concepts. PROToS core components: retrieval; gathering; supervising; distributed storage.

Reference implementation. Maven2 and components. Component groups: management and configuration; run-time; compile-time. Core technologies: Dependency Injection container; communication; XML and semantic processing.

Feasibility study. Provenance usage: application optimization; result management; experiment replay. Querying capabilities: QUaTRO. Sample scenarios: Drug Resistance Workflow.

Feasibility study – cont. Ontologies for the VLvl: experiment, data and domain models.

Work status. Goals achieved: successful VLvl integration; full architectural groundwork; real-world usage and feedback. To be done: distributed query support; data and node migration; reliability (testing, testing, testing!).

Collection and storage of provenance data. Please visit the following websites: ViroLab : VLvl : PROToS : QUaTRO :