Provenance in Open Distributed Information Systems Syed Imran Jami PhD Candidate FAST-NU.

Presentation transcript:

Provenance in Open Distributed Information Systems Syed Imran Jami PhD Candidate FAST-NU

Introduction
Provenance Systems
– Provenance is metadata that records the origin and history of a target object.
– The metadata logs each step in sourcing, moving, and processing the object.
– It records the transformation steps applied to the target object.
– It provides the information needed to recreate the object.
– It helps maintain the quality and reliability of the object.
– It provides a trust mechanism for using the object in simulations and experiments.

Introduction (2)
Open Distributed Information Systems
– Information and the sequence of steps performed on it are distributed among information systems that are independent and may be under different administrative control.
– Nodes can be heterogeneous.
– Now widely used for collaboration and information sharing.
– Require open (read/write) access to digital artifacts.
  – Web 2.0 (blogs, Wikipedia, etc.)
  – Grids and Cloud Computing

Our Problem
Main Problem
– To propose and develop a provenance system for an open distributed environment.
Research Question
– How can we develop a provenance model for an information system in an open distributed environment?
Hypotheses
1. A provenance model for an information system in an open distributed environment can be developed by incorporating agents that autonomously track the interactions.
2. Providing a provenance ontology enables the provenance representation in RDF graphs to work in a heterogeneous environment.
3. The use of an ontology and RDF graphs will also make the system domain independent.

Motivation & Justification
Most existing provenance systems track data only
– The definition of data is changing.
– Information portals in an open environment can contain data, documents and information.
– Tagged representation in XML reduces the gap between data and documents.
Most existing provenance systems are specialized (domain dependent)
– Open distributed systems should be able to accommodate any kind of information, i.e. be generic.
The existing systems are not autonomous
– They require changes to operating systems or workflows in order to track provenance.
Most existing provenance systems give little importance to heterogeneity
– It is one of the important factors to consider in open distributed systems.

Research Issues
Provenance tracking, representation and storage in open distributed systems lead to the following research challenges:
– Autonomy
– Domain Independence
– Heterogeneity
– Scalability and Efficiency
– Genericity
– Mobility
– Privacy & Security

Proposed Solution
As a testbed we developed an XML-based Information System
– The XML page contains information contributed by different sources and used by different users.
– Each interaction is merged with the main XML page using Agents.
– Provenance of each interaction is tracked using Multi Agent Systems.
– Provenance logs are represented in RDF Graphs as Triples.
– The logs are stored in distributed locations.
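The merge-and-track step on this slide can be sketched as follows. This is only an illustration, not the authors' agent implementation: the function name `merge_contribution`, the predicate names and the `interaction:N` identifier scheme are all hypothetical, and a plain function stands in for the agent.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def merge_contribution(page, fragment_xml, contributor, node, when=None):
    """Append a contributed XML fragment to the main page.

    Returns the provenance of this interaction as (subject, predicate, object)
    triples. All predicate names here are illustrative assumptions.
    """
    fragment = ET.fromstring(fragment_xml)
    page.append(fragment)  # merge the interaction into the main XML page
    when = when or datetime.now(timezone.utc).isoformat()
    subject = f"interaction:{len(page)}"  # hypothetical interaction ID
    return [
        (subject, "contributor", contributor),
        (subject, "sourceNode", node),
        (subject, "addedElement", fragment.tag),
        (subject, "timestamp", when),
    ]


# Usage: one contribution from a remote node is merged and logged.
page = ET.fromstring("<page><entry>first note</entry></page>")
log = merge_contribution(
    page,
    "<entry>second note</entry>",
    contributor="alice",
    node="node-A",
    when="2009-01-01T00:00:00+00:00",
)
```

The provenance log grows by a fixed number of triples per interaction, regardless of how large the contributed fragment is.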

Proposed Solution
Generic
– Research Question (1): Can we develop a provenance system that tracks not only data but also other digital objects?
– Most existing systems work for data only.
  – For example, they use an RDBMS as the underlying storage mechanism.
  – The provenance model should be generic so that it can accommodate data, documents and other digital artifacts.
– Semantic Grid based techniques can play a role.
  – XML reduces the gap between data and documents through its tagged representation.
  – All data formats are translated to XML in the information system.
  – Our provenance tracking system will track the interactions performed on the XML tree.

Proposed Solution
Autonomy
– Research Question (Sub problem 1): Can we develop a model that does not require changing or adapting the OS, language platform or workflow application to track provenance?
– The goal is automated and autonomous tracking.
– Almost all existing systems depend on APIs, OS routines, workflows etc. to track provenance. This is not recommended for open systems such as grids, since one cannot change the OS or the workflows in order to use a provenance-aware information service.
– Multi Agent Systems (MAS) can be used to provide the autonomous behaviour.
– Only one work uses MAS to track data provenance, for a health care system (specialized domain).
– A MAS-based system is expected to provide the most autonomous tracking among the available options.

Proposed Solution
Heterogeneity
– Research Question (2): Can we develop a provenance system that tracks the transformation steps across the heterogeneous nodes of an open distributed system?
– The system should record and track provenance even for heterogeneous nodes.
  – Device Heterogeneity
  – Platform Heterogeneity
  – Semantic (Schema) Heterogeneity
– A JVM-based implementation will handle heterogeneity at the device and platform level.
– Semantic heterogeneity will be handled by representing provenance metadata as graphs of RDF triples.
  – XML and RDF are W3C standards for all systems and devices.
  – Requires developing an RDF vocabulary for provenance: an ontology.
  – A JVM, XML and RDF based provenance model will make our system domain independent.
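As a rough illustration of why RDF triples help with semantic heterogeneity, the sketch below serializes a few provenance statements as N-Triples, a line-based RDF syntax that any node can parse regardless of platform. The Dublin Core element URIs are real; the `http://example.org/artifact/` namespace, the helper names and the sample values are assumptions made for the example.

```python
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
EX = "http://example.org/artifact/"       # hypothetical artifact namespace


def uri(value):
    """Format an absolute IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal, escaping backslash, quote and newline."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize already-formatted (subject, predicate, object) terms."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (uri(EX + "page1"), uri(DC + "creator"), literal("alice")),
    (uri(EX + "page1"), uri(DC + "date"), literal("2009-01-01")),
    (uri(EX + "page1"), uri(DC + "source"), uri(EX + "page0")),
]
doc = to_ntriples(triples)
print(doc)
```

Because every statement is a self-contained line of standard terms, nodes with different local schemas can exchange provenance without agreeing on a storage format first.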

Proposed Solution
Scalability
– Research Question (3): Can we make provenance storage and tracking scalable?
– The tracking system should scale with an increasing number of users in the open distributed system.
  – Simultaneous recording through agents will make the tracking scalable.
  – Each node is responsible for autonomously tracking its own interactions.
– The scalability of storage depends on the location of the provenance store containing the log.
  – With the target object, or separate?
  – Centralized or decentralized?
  – A decentralized system will be scalable.
    » RDF graphs will reside on some other node.
    » No single node will be over-utilized.
  – Problem: this solution costs efficiency.
  – Another solution is to store sub-graphs at the local host instead of combining and merging them into one graph.
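The alternative of keeping sub-graphs at the local host can be sketched as below. This is a minimal in-memory model under stated assumptions: `NodeStore` and `provenance_of` are hypothetical names, and a real deployment would query remote nodes over the network rather than iterate over local objects.

```python
from collections import defaultdict


class NodeStore:
    """Provenance store local to one node (illustrative, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.subgraphs = defaultdict(set)  # artifact ID -> set of triples

    def record(self, artifact_id, triple):
        """Record a triple locally; no merge into a central graph."""
        self.subgraphs[artifact_id].add(triple)

    def query(self, artifact_id):
        """Return this node's sub-graph for the artifact (possibly empty)."""
        return self.subgraphs.get(artifact_id, set())


def provenance_of(artifact_id, nodes):
    """Assemble the full provenance graph at query time as a union of sub-graphs."""
    result = set()
    for node in nodes:
        result |= node.query(artifact_id)
    return result


# Usage: two nodes each record their own interaction with the same artifact.
a, b = NodeStore("node-A"), NodeStore("node-B")
a.record("page1", ("page1", "contributor", "alice"))
b.record("page1", ("page1", "contributor", "bob"))
full = provenance_of("page1", [a, b])
```

Writes stay cheap and local, but a query has to visit every node. That is the efficiency cost the slide points out, and it is what a lookup table is meant to reduce.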

Proposed Solution
Efficiency
– Research Question (4): Given the proposed solution for scalability, can we make the system efficient enough for fast retrieval of provenance metadata scattered around the system?
– The scalability solution carries an efficiency overhead.
  – Extra time is required to search for RDF graphs.
  – Some lookup tables will be required.
– Solution
  – Each digital artifact must be given a unique ID, like a URI.
  – Unique IDs should be composed of binary strings.
  – The lookup table will use these binary strings for fast retrieval.
– We can use our own ID system.
– A trie-based indexing scheme can be used.
  – Requires a very small number of entries to store large strings.
  – Lookup cost depends on the string width w, not on the number of stored IDs n: O(w).
  – A single RDF graph should be maintained for multiple copies.
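A minimal sketch of the trie-based lookup table follows. It assumes IDs are strings of the characters "0" and "1" and that each ID maps to the node holding the artifact's RDF graph. The class name and the `"loc"` marker key are illustrative and are not taken from the authors' ID system.

```python
class BinaryTrie:
    """Lookup table from binary-string IDs to the location of an RDF graph."""

    def __init__(self):
        self.root = {}

    def insert(self, bits, location):
        """Store a location under the given binary ID, one level per bit."""
        node = self.root
        for bit in bits:
            if bit not in "01":
                raise ValueError(f"not a binary ID: {bits!r}")
            node = node.setdefault(bit, {})
        node["loc"] = location  # marks the end of a complete ID

    def lookup(self, bits):
        """Follow one edge per bit: O(w) for an ID of width w."""
        node = self.root
        for bit in bits:
            node = node.get(bit)
            if node is None:
                return None
        return node.get("loc")


# Usage: IDs sharing a prefix share trie nodes.
index = BinaryTrie()
index.insert("0101", "node-A")
index.insert("0110", "node-B")
```

Each lookup walks at most w edges, so its cost does not grow with the number of IDs stored, and IDs with common prefixes reuse the same entries.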

Current Progress
– A prototype application has been developed that serves as a testbed for an information system in an open distributed environment.
– The system records the provenance log in an RDF file that is merged into a single main RDF graph, which keeps track of the information.
– Dublin Core is used as the ontology for provenance.
– Both the contributions to information and the provenance metadata are transmitted through Aglets.
– An ID system has been developed to label digital artifacts.
– Scalability analysis has been performed on a distributed, tightly coupled provenance store.

Results
Early results show that:
– The size of the provenance log is independent of file size.
– The logs depend on the number of interactions.
– Our storage algorithm has some limitations: logs converge at one place.

Contribution towards Provenance
– A Knowledge Provenance Architecture for Open Distributed Systems
– Autonomous Provenance Recording in Heterogeneous Nodes
– A Scalable Provenance Storage System
– Semantic Heterogeneity of the Provenance System using a Provenance Ontology
– A Domain Independent Provenance System

Publications
– Syed Imran Jami and Zubair A. Shaikh, "A workflow based academic management system using multi agent approach", Proceedings of the 11th WSEAS International Conference on Computers, Agios Nikolaos, Crete Island, Greece, Pg , Year of Publication: 2007, ISSN:
– Imran Jami and Zubair A. Shaikh, "A Multi Agent based Architecture for Data Provenance in Semantic Grid", Proceedings of International Multi-Conference of Engineers and Computer Scientists, Hong Kong, Pg , Year of Publication: 2008, ISBN:
– Syed Imran Jami, Jemal Abawajy, Zubair A. Shaikh, "A Taxonomy of Provenance Models for Open Distributed Systems", submitted to Journal of Information Sciences, Elsevier Publisher, Impact Factor
– Syed Imran Jami, Jemal Abawajy, Zubair A. Shaikh, "Information Provenance for Open Distributed Collaborative System", about to be submitted to an ACS high impact conference.