Provenance in Open Distributed Information Systems Syed Imran Jami PhD Candidate FAST-NU.

Introduction: Provenance Systems
– Provenance is metadata that records the origin and history of a target object.
– The metadata logs each step in sourcing, moving, and processing the object.
– It keeps a record of the transformation steps applied to the target object.
– It provides the information needed to recreate the object.
– It helps maintain the quality and reliability of the object.
– It provides a trust mechanism for using the object in simulations and experiments.
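
The record described above can be pictured as an origin plus an ordered log of steps that can be replayed to recreate the object. The following is a minimal, illustrative Python sketch only: the class names, the step labels, and storing each transformation as a replayable function are assumptions, not part of the original system (a real store would keep named, re-executable operations rather than in-memory functions).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class ProvenanceStep:
    """One recorded step: who did what to the object, and when."""
    actor: str
    action: str                      # e.g. "source", "move", "process"
    transform: Callable[[str], str]  # hypothetical: function that reproduces the step
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ProvenanceLog:
    """Origin of a target object plus the ordered history of steps applied to it."""
    origin: str
    steps: List[ProvenanceStep] = field(default_factory=list)

    def record(self, actor: str, action: str, transform: Callable[[str], str]) -> None:
        self.steps.append(ProvenanceStep(actor, action, transform))

    def recreate(self) -> str:
        """Replay every recorded transformation, starting from the origin."""
        value = self.origin
        for step in self.steps:
            value = step.transform(value)
        return value

    def history(self) -> List[str]:
        """Who did what, in order."""
        return [f"{s.actor}:{s.action}" for s in self.steps]


log = ProvenanceLog(origin="raw reading 42")
log.record("sensor-node", "source", lambda v: v)
log.record("cleaner-agent", "process", str.upper)
log.record("archive-node", "move", lambda v: v + " [archived]")
print(log.recreate())
print(log.history())
```

Because the log keeps both who acted and what was done, the same structure serves tracing (history) and recreation (replay).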

Introduction (2): Open Distributed Information Systems
– Information, and the sequence of steps performed on it, are distributed among independent information systems that may be under different administrative controls.
– Nodes can be heterogeneous.
– Such systems are now widely used for collaboration and information sharing.
– They require open (read/write) access to digital artifacts, for example:
  – Web 2.0 (blogs, Wikipedia, etc.)
  – Grids and cloud computing

Our Problem
Main problem
– To propose and develop a provenance system for an open distributed environment
Research question
– How can we develop a provenance model for an information system in an open distributed environment?
Hypotheses
1. A provenance model for an information system in an open distributed environment can be developed by incorporating agents that autonomously track the interactions.
2. Providing a provenance ontology enables provenance represented as RDF graphs to work in a heterogeneous environment.
3. The use of an ontology and RDF graphs will also make the system domain independent.

Motivation & Justification
– Most existing provenance systems track data only.
  – The definition of data is changing.
  – Information portals in open environments can contain data, documents, and information.
  – Tagged representation in XML reduces the gap between data and documents.
– Most existing provenance systems are specialized (domain dependent).
  – Open distributed systems should be able to accommodate any kind of information, i.e. they must be generic.
– Existing systems are not autonomous.
  – They require changes to operating systems or workflows in order to track provenance.
– Most existing provenance systems do not give importance to heterogeneity.
  – Heterogeneity is an important factor to consider in open distributed systems.

Research Issues
Tracking, representing, and storing provenance in open distributed systems leads to the following research challenges:
– Autonomy
– Domain independence
– Heterogeneity
– Scalability and efficiency
– Genericity
– Mobility
– Privacy & security

Proposed Solution
As a testbed, we developed an XML-based information system:
– An XML page contains information contributed by different sources and used by different users.
– Each interaction is merged into the main XML page using agents.
– The provenance of each interaction is tracked using a multi-agent system.
– Provenance logs are represented as RDF graphs of triples.
– The logs are stored in distributed locations.
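
The merge-and-log step can be sketched as follows. This is a minimal illustration, not the testbed's code: it assumes a simple page layout (`page` / `section` / `entry` elements) and two hypothetical predicates, `contributedBy` and `partOf`; the actual XML schema and agent implementation are not described on this slide.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical main XML page shared by all contributors.
MAIN_PAGE = "<page id='urn:page:1'><section id='intro'/></page>"


def merge_interaction(page: ET.Element, section_id: str, contributor: str,
                      text: str, log: List[Triple]) -> str:
    """Append one contribution to a section of the main page and
    record its provenance as (subject, predicate, object) triples."""
    section = page.find(f".//section[@id='{section_id}']")
    if section is None:
        raise KeyError(f"no section {section_id!r}")
    # Number entries in arrival order so each contribution gets a stable ID.
    entry_id = f"{section_id}-e{len(section) + 1}"
    entry = ET.SubElement(section, "entry", id=entry_id)
    entry.text = text
    subject = f"urn:entry:{entry_id}"
    log.append((subject, "contributedBy", contributor))           # hypothetical predicate
    log.append((subject, "partOf", f"urn:section:{section_id}"))  # hypothetical predicate
    return entry_id


page = ET.fromstring(MAIN_PAGE)
log: List[Triple] = []
merge_interaction(page, "intro", "alice", "First paragraph", log)
merge_interaction(page, "intro", "bob", "Second paragraph", log)
print(ET.tostring(page, encoding="unicode"))
for triple in log:
    print(triple)
```

The content change and its provenance are produced in the same step, so no contribution reaches the page without a matching log entry.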

Proposed Solution: Generic
– Research Question (1): Can we develop a provenance system that tracks not only data but also other digital objects?
– Most existing systems work for data only.
  – For example, they use an RDBMS as the underlying storage mechanism.
  – The provenance model should be generic enough to accommodate data, documents, and other digital artifacts.
– Semantic Grid based techniques can play a role here.
  – XML reduces the gap between data and documents because of its tagged representation.
  – All data formats are translated to XML in the information system.
  – Our provenance tracking system tracks the interactions performed on the XML tree.
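
To illustrate why translating everything to XML makes tracking generic, the sketch below converts tabular data (CSV) and a plain-text document into XML trees, then lists an indexed path for every element. A tracker could log interactions against such paths regardless of the original format. The converters, tag names, and path syntax are assumptions for illustration; the sketch also assumes CSV header names are valid XML tag names.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List


def csv_to_xml(csv_text: str, root_tag: str = "table") -> ET.Element:
    """Turn tabular data into XML: one <row> per record, one child per column.
    Assumes the CSV header names are valid XML tag names."""
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, "row")
        for column, value in record.items():
            ET.SubElement(row, column).text = value
    return root


def document_to_xml(text: str, root_tag: str = "document") -> ET.Element:
    """Turn a plain-text document into XML: one <para> per blank-line-separated block."""
    root = ET.Element(root_tag)
    for para in (p.strip() for p in text.split("\n\n")):
        if para:
            ET.SubElement(root, "para").text = para
    return root


def element_paths(node: ET.Element, path: str = "") -> List[str]:
    """List an indexed path for every element, e.g. 'table/row[2]/qty[1]'."""
    path = path or node.tag
    paths = [path]
    seen: Dict[str, int] = {}
    for child in node:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        paths += element_paths(child, f"{path}/{child.tag}[{seen[child.tag]}]")
    return paths


table = csv_to_xml("name,qty\nbolt,10\nnut,25\n")
doc = document_to_xml("Intro text.\n\nSecond part.")
print(element_paths(table))
print(element_paths(doc))
```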

Proposed Solution: Autonomy
– Research Question (Sub-problem 1): Can we develop a model that tracks provenance without requiring changes to, or adaptation of, the OS, language platform, or workflow application?
– Goal: automated and autonomous tracking.
– Almost all existing systems depend on APIs, OS routines, workflows, etc. to track provenance. This is unsuitable for open systems such as grids, where one cannot change the OS or workflows just to use a provenance-aware information service.
– Multi-agent systems (MAS) can provide the required autonomy.
  – Only one existing work uses MAS to track data provenance, for a health care system (a specialized domain).
– Among the options considered, an MAS-based system is expected to provide the greatest autonomy.
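
The autonomy argument can be sketched with one tracking agent per node that only consumes messages from its own inbox, so neither the host OS nor any workflow engine has to be modified. This is a simplified stand-in using plain Python threads and queues; it does not model a real agent platform, and the `TrackingAgent` class and message format are hypothetical.

```python
import queue
import threading
from typing import List, Tuple


class TrackingAgent(threading.Thread):
    """Hypothetical per-node tracking agent. It only reads messages from its
    own inbox, so the host OS and any workflow engine stay unmodified."""

    def __init__(self, node: str) -> None:
        super().__init__(daemon=True)
        self.node = node
        self.inbox: queue.Queue = queue.Queue()  # holds (user, action) or None
        self.records: List[Tuple[str, str, str]] = []

    def run(self) -> None:
        while True:
            message = self.inbox.get()
            if message is None:  # shutdown signal
                break
            user, action = message
            # Record provenance locally: which node saw which user do what.
            self.records.append((self.node, user, action))


agents = {name: TrackingAgent(name) for name in ("node-a", "node-b")}
for agent in agents.values():
    agent.start()

agents["node-a"].inbox.put(("alice", "edit intro"))
agents["node-b"].inbox.put(("bob", "add table"))
agents["node-a"].inbox.put(("carol", "fix typo"))

for agent in agents.values():
    agent.inbox.put(None)  # ask each agent to stop after draining its inbox
    agent.join()

for agent in agents.values():
    print(agent.node, agent.records)
```

Each node records independently and in parallel, which is also the basis of the scalability claim that each node autonomously tracks its own interactions.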

Proposed Solution: Heterogeneity
– Research Question (2): Can we develop a provenance system that tracks the transformation steps across heterogeneous nodes of an open distributed system?
– The system should record and track provenance even on heterogeneous nodes:
  – Device heterogeneity
  – Platform heterogeneity
  – Semantic (schema) heterogeneity
– A JVM-based implementation will handle device and platform heterogeneity.
– Semantic heterogeneity will be addressed by representing provenance metadata as RDF triples forming graphs.
  – XML and RDF are W3C standards usable across systems and devices.
  – This requires developing an RDF vocabulary for provenance (an ontology).
– A JVM-, XML-, and RDF-based provenance model will make our system domain independent.

Proposed Solution: Scalability
– Research Question (3): Can we make provenance storage and tracking scalable?
– The tracking system should scale as the number of users in the open distributed system grows.
  – Simultaneous recording by agents makes tracking scalable: each node autonomously tracks its own interactions.
– Scalable storage depends on where the provenance store holding the log is located:
  – With the target object, or separately?
  – Centralized or decentralized?
  – A decentralized system will be scalable.
    » RDF graphs will reside on other nodes, so no single node is over-utilized.
– Problem: this solution costs efficiency.
  – An alternative is to store sub-graphs at the local host instead of combining and merging them into one graph.
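
The local sub-graph alternative can be sketched as follows: each host keeps only the triples it recorded, and a provenance query fans out to every host and unions the partial answers. The `LocalStore` class, predicate names, and host names are hypothetical. Visiting every store on each query is exactly the efficiency cost this slide mentions.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class LocalStore:
    """Provenance sub-graph kept on the host where the interactions occurred."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self.triples.add(triple)

    def about(self, subject: str) -> Set[Triple]:
        """Return only the local triples describing one artifact."""
        return {t for t in self.triples if t[0] == subject}


def query_provenance(subject: str, stores: Iterable[LocalStore]) -> Set[Triple]:
    """Ask every host for its part of the graph and union the answers.
    Nothing is merged ahead of time, so no single node holds every log."""
    result: Set[Triple] = set()
    for store in stores:
        result |= store.about(subject)
    return result


host_a, host_b = LocalStore("host-a"), LocalStore("host-b")
host_a.add(("urn:doc:1", "editedBy", "alice"))
host_b.add(("urn:doc:1", "editedBy", "bob"))
host_b.add(("urn:doc:2", "editedBy", "bob"))
print(sorted(query_provenance("urn:doc:1", [host_a, host_b])))
```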

Proposed Solution: Efficiency
– Research Question (4): Given the proposed scalability solution, can we achieve efficient retrieval of provenance metadata scattered around the system?
– The scalability solution incurs an efficiency overhead:
  – Extra time is required to search for RDF graphs.
  – Lookup tables will be required.
– Solution:
  – Each digital artifact must be given a unique ID, such as a URI.
  – Unique IDs should be composed of binary strings.
  – The lookup table will use these binary strings for fast retrieval.
    » We can use our own ID system.
  – A trie-based indexing scheme can be used:
    » It requires a very small number of entries to store large strings.
    » Lookup cost depends on the width of the strings, not on the number of stored values: O(w), where w is independent of n (the number of IDs).
  – A single RDF graph should be maintained for multiple copies of an artifact.
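
A plain binary trie illustrates the O(w) lookup: each bit of an ID moves one level down, so lookup time depends on the ID width w, not on how many IDs are stored. This is a minimal sketch under stated assumptions: the slide does not specify the ID layout or trie variant, so the IDs, host names, and uncompressed trie here are hypothetical (a compressed variant such as a Patricia trie would use fewer nodes).

```python
from typing import Dict, Optional


class _Node:
    __slots__ = ("children", "location")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.location: Optional[str] = None  # where this ID's RDF graph lives


class BinaryTrie:
    """Index from binary-string artifact IDs to the host storing their RDF graph."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, bits: str, location: str) -> None:
        if not bits or set(bits) - {"0", "1"}:
            raise ValueError(f"not a binary ID: {bits!r}")
        node = self.root
        for bit in bits:  # one step per bit, so O(w)
            node = node.children.setdefault(bit, _Node())
        # Every copy of an artifact carries the same ID, so all copies
        # resolve to this single graph location.
        node.location = location

    def lookup(self, bits: str) -> Optional[str]:
        node = self.root
        for bit in bits:  # O(w), regardless of how many IDs are stored
            node = node.children.get(bit)
            if node is None:
                return None
        return node.location

    def node_count(self) -> int:
        """Count trie nodes; shared prefixes are stored only once."""
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count


index = BinaryTrie()
index.insert("1011", "host-a")
index.insert("1010", "host-b")
index.insert("0001", "host-a")
print(index.lookup("1010"), index.lookup("1111"), index.node_count())
```

IDs "1011" and "1010" share the prefix "101", so those three nodes are stored once; this prefix sharing is what keeps the index small as IDs grow long.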

Current Progress
– A prototype application has been developed that serves as a testbed for an information system in an open distributed environment.
– The system tracks provenance logs in RDF files that are merged into a single main RDF graph that keeps track of the information.
– Dublin Core is used as the ontology for provenance.
– Both contributions to the information and provenance metadata are transmitted through Aglets.
– An ID system has been developed to label digital artifacts.
– A scalability analysis was performed on a distributed, tightly coupled provenance store.

Results
Early results show that:
– The provenance log is independent of the file size.
– The logs depend on the interactions.
– Our storage algorithm has a limitation: logs converge at one place.

Contribution towards Provenance
– A knowledge provenance architecture for open distributed systems
– Autonomous provenance recording in heterogeneous nodes
– A scalable provenance storage system
– Semantic heterogeneity of the provenance system using a provenance ontology
– A domain-independent provenance system

Publications
– Syed Imran Jami and Zubair A. Shaikh, "A workflow based academic management system using multi agent approach", Proceedings of the 11th WSEAS International Conference on Computers, Agios Nikolaos, Crete Island, Greece, pp. 202-207, 2007, ISSN: 1790-5117.
– Imran Jami and Zubair A. Shaikh, "A Multi Agent based Architecture for Data Provenance in Semantic Grid", Proceedings of the International Multi-Conference of Engineers and Computer Scientists, Hong Kong, pp. 360-364, 2008, ISBN: 978-988-98671-8-8.
– Syed Imran Jami, Jemal Abawajy, Zubair A. Shaikh, "A Taxonomy of Provenance Models for Open Distributed Systems", submitted to the Journal of Information Sciences, Elsevier (impact factor 2.147).
– Syed Imran Jami, Jemal Abawajy, Zubair A. Shaikh, "Information Provenance for Open Distributed Collaborative System", to be submitted to a high-impact ACS conference.
