myGrid/Taverna Provenance
Daniele Turi, University of Manchester
OMII f2f Meeting, London, 19-20/4/06


Components
Identifiers
  – LSIDs
Data
  – JDBC data store
Metadata
  – RDF Provenance Plugin
Browsing
  – Provenance Browser Plugin
Security
  – Under development

LSID

LSID: Life Science Identifier
URN specification in progress
5 part identifier (with optional version id)
  – urn:lsid:
  – urn:lsid:ncbi.nlm.nlh.gov.lsid.biopathways.org:genbank_gi :
protocol for retrieving data and metadata about an object
commitment by the provider to always return the same data for an ID
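A minimal sketch of how an LSID might be split into its parts, assuming the layout urn:lsid:authority:namespace:object with an optional trailing revision. The Lsid class, the parse_lsid function and the example identifier are illustrative names chosen here, not part of Taverna or of any LSID library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID after the fixed 'urn:lsid' prefix."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split an LSID into its parts (assumed layout, see the text above)."""
    parts = text.split(":")
    # Five parts, or six when the optional revision is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty component in: {text!r}")
    return Lsid(*parts[2:])


# Hypothetical identifier, used only for illustration.
example = parse_lsid("urn:lsid:example.org:wfInstance:8:1")
```

Keeping the revision as a separate optional field makes it easy to compare two identifiers for the same object while ignoring the version.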

LSID (ctd)
Issue
  – LSID Authorities
Resolution
  – LSID Resolvers
Examples
  – myGrid
  – Long Term Ecological Research Network
  – BioPathways Consortium

LSID (ctd 2)
abstraction
lightweight
independent from actual storage implementation
  – database
  – file system
  – application
both for private and public data sources

Data

Data Storage (current)
Taverna can persist inputs, outputs and intermediate results in an SQL database via JDBC
Persistence is optional and is enabled by configuring a Baclava Data Store
Allows the LSIDs of data items to be resolved against the actual data

Data Storage (future)
Domain-specific databases
  – use outside myGrid
Develop:
  – Taverna processor for JDBC/OGSA-DAI
  – associated interface (cf. BioMart)
Users will be able to study the contents of an existing database and:
  – write queries that extract data from the database, where the query may be parameterised with values passed in from the workflow;
  – write requests that insert data from the workflow into a named table in the database.
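The two planned operations (a query parameterised with workflow values, and an insert of workflow data into a table) can be sketched as follows. This is only an illustration: an in-memory SQLite database stands in for a JDBC-accessible one, and the sequences table, its columns and both function names are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for the JDBC-accessible database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT)")


def insert_from_workflow(conn, rows):
    """Insert rows produced by a workflow step into the sequences table."""
    conn.executemany(
        "INSERT INTO sequences (accession, organism) VALUES (?, ?)", rows
    )
    conn.commit()


def query_with_workflow_value(conn, organism):
    """Run a fixed query whose parameter is supplied by the workflow."""
    cur = conn.execute(
        "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return [row[0] for row in cur.fetchall()]


insert_from_workflow(conn, [("A1", "human"), ("B2", "mouse"), ("C3", "human")])
human_hits = query_with_workflow_value(conn, "human")
```

Binding the workflow value through a placeholder, instead of pasting it into the SQL string, keeps the query text fixed and avoids injection problems.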

Metadata

Metadata Generation
Taverna Provenance Plugin
Listen to Taverna Events
  – WorkflowEventListener
Faithfully record them as ontological instance data
  – RDF graphs (one for each Taverna run)
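The listen-and-record idea can be sketched in a few lines. This is not the Taverna WorkflowEventListener interface: the class, its method names and the identifiers are hypothetical, and the direction of each property is a guess based on the edge labels used later in this deck.

```python
from collections import defaultdict


class ProvenanceRecorder:
    """Hypothetical listener: turns workflow events into triples, one graph per run."""

    def __init__(self):
        # Maps a run identifier to its list of (subject, predicate, object) triples.
        self.graphs = defaultdict(list)

    def workflow_launched(self, run_id, workflow_id, person_id):
        """Record which workflow the run executes and who launched it."""
        self.graphs[run_id].append((run_id, "runs", workflow_id))
        self.graphs[run_id].append((run_id, "launchedBy", person_id))

    def process_completed(self, run_id, process_run_id):
        """Record that a process run was executed as part of the run."""
        self.graphs[run_id].append((run_id, "executed", process_run_id))


# Hypothetical identifiers, used only for illustration.
run = "urn:lsid:example.org:wfInstance:8"
recorder = ProvenanceRecorder()
recorder.workflow_launched(
    run, "urn:lsid:example.org:workflow:6", "urn:lsid:example.org:person:4"
)
recorder.process_completed(run, "urn:lsid:example.org:processRun:84")
```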

Metadata
Representation
Ontology (Schema)
Storage
Query
Browsing

Representation
RDF
  – triples: subject, predicate, object
  – URIs (hence easy data integration)
  – semantic web language
  – XML serialization
  – flexible, powerful
  – sets of triples give rise to graphs
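A small sketch of why shared URIs make integration easy: two sets of triples that mention the same identifier join into one graph by plain set union. The tuples, the match helper and the example.org identifiers are assumptions for illustration; a real deployment would use an RDF library.

```python
def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Hypothetical identifiers, used only for illustration.
run = "urn:lsid:example.org:wfInstance:8"
person = "urn:lsid:example.org:person:4"
org = "urn:lsid:example.org:org:HY7"

graph_a = {(run, "launchedBy", person)}   # produced by one source
graph_b = {(person, "belongsTo", org)}    # produced by another source

# Because both graphs use the same URI for the person, union links them.
merged = graph_a | graph_b
about_person = match(merged, s=person)
```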

Workflow Run
[Figure: RDF graph of a workflow run. Nodes: urn:lsid:..:wfInstance:8, urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51. Edge labels: runs, launchedBy, belongsTo, executed.]

Schema
Ontology
  – RDF Schema: taxonomic inferences
  – also available as OWL: opens it up to complex reasoning
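A taxonomic inference of the kind RDF Schema supports can be sketched as a subclass closure: if an instance has a type, it also has every superclass of that type. The class hierarchy below (FailedProcessRun under ProcessRun under Run) is invented for the example and is not taken from the myGrid provenance ontology.

```python
def infer_types(type_statements, subclass_of):
    """Add the type statements implied by a (single-parent) subclass hierarchy."""
    inferred = set(type_statements)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow during the pass.
        for instance, cls in list(inferred):
            parent = subclass_of.get(cls)
            if parent is not None and (instance, parent) not in inferred:
                inferred.add((instance, parent))
                changed = True
    return inferred


# Hypothetical hierarchy and instance, used only for illustration.
hierarchy = {"FailedProcessRun": "ProcessRun", "ProcessRun": "Run"}
stated = {("processRun:84", "FailedProcessRun")}
all_types = infer_types(stated, hierarchy)
```

A query for every ProcessRun would then also find instances that were only stated to be of the more specific class.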

Typed Workflow Run
[Figure: the workflow run graph typed against the Provenance Ontology. Classes: WorkflowRun, Workflow, ProcessRun, Experimenter, Organization. Properties: runs, launchedBy, belongsTo, executed. Instances: urn:lsid:..:wfInstance:8, urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51.]

Storage
Named RDF graphs
  – retrieve whole graphs (e.g. workflows)
  – implementation in NG4J (Jena + MySQL)
  – scalability issues
Sesame2 native store
  – scalable
  – Java 5

Query
RDF query languages
  – TriQL, SeRQL, SPARQL query languages for named RDF graphs
Ontology inspection/reasoning
Canned Queries
  – workflows with failed processes
  – input/output of past process runs
  – workflows with data changed by user
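As an illustration of a canned query over named graphs, the sketch below finds the runs that contain a failed process. It uses plain Python in place of TriQL, SeRQL or SPARQL, and the status predicate, the "failed" value and the graph names are assumptions made for the example.

```python
def runs_with_failed_processes(named_graphs):
    """Return the names of graphs holding at least one failed process run."""
    failed = []
    for name, triples in named_graphs.items():
        if any(p == "status" and o == "failed" for _, p, o in triples):
            failed.append(name)
    return sorted(failed)


# Hypothetical named graphs, one per workflow run.
named_graphs = {
    "run:8": {
        ("processRun:84", "status", "completed"),
        ("processRun:51", "status", "failed"),
    },
    "run:9": {
        ("processRun:90", "status", "completed"),
    },
}
failed_runs = runs_with_failed_processes(named_graphs)
```

Because each run lives in its own named graph, the answer is simply a list of graph names, and the whole graph for a failed run can then be retrieved in one step.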

Browsing

Provenance Browsing
Provenance Browser Plugin
  – reusing Taverna GUI components
Matthew Gamble

Analysis

Provenance Analysis
Comparison
Aggregation
etc.
  – see work by Jun Zhao

Security

User sends LSID ref and credentials to the Access Point
Access Point returns data and metadata or denies access as follows:
  – credentials are passed to a User Directory
  – User Directory passes the corresponding user to the Authorization Authority
  – Authorization Authority returns the user attributes in the form of a (possibly signed) SAML assertion
  – this assertion, together with the LSID and its corresponding metadata, is passed to the Policy Enforcement Point (PEP)
  – PEP uses these three inputs to form an XACML request that is passed to a Policy Decision Point (PDP) that is preloaded with an XACML Policy Set
  – PDP evaluates the request against its policy set and returns an XACML response to PEP
  – PEP decodes the response and either allows data/metadata to be returned to the user or denies access.
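The steps above can be sketched as a single function. This is a simplified illustration, not the myGrid implementation: plain dictionaries stand in for the User Directory, the Authorization Authority, the SAML assertion and the XACML request, and every name in the code (handle_request, toy_pdp, the token and the identifier) is hypothetical.

```python
def handle_request(lsid, credentials, directory, authority, metadata, data, pdp):
    """Walk one request through the Access Point steps; return data or None."""
    user = directory.get(credentials)        # User Directory lookup
    if user is None:
        return None                          # unknown credentials: deny
    assertion = authority.get(user, {})      # user attributes (stands in for SAML)
    request = {                              # PEP combines the three inputs
        "subject": assertion,
        "resource": lsid,
        "resource_metadata": metadata.get(lsid, {}),
    }
    decision = pdp(request)                  # PDP evaluates its policy set
    if decision == "Permit":                 # PEP enforces the decision
        return data.get(lsid)
    return None


def toy_pdp(request):
    """Toy decision point: only supervisors are permitted."""
    return "Permit" if request["subject"].get("role") == "supervisor" else "Deny"


# Hypothetical stores, used only for illustration.
LSID = "urn:lsid:example.org:workflow:6"
directory = {"secret-token": "alice", "other-token": "bob"}
authority = {"alice": {"role": "supervisor"}, "bob": {"role": "student"}}
metadata = {LSID: {"type": "workflow"}}
data = {LSID: "workflow definition"}

granted = handle_request(LSID, "secret-token", directory, authority, metadata, data, toy_pdp)
refused = handle_request(LSID, "other-token", directory, authority, metadata, data, toy_pdp)
```

Passing the decision point in as a callable keeps enforcement (PEP) separate from the policy itself (PDP), which mirrors the split described in the steps.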

myGrid XACML Policy
Scenario
  – supervisors can access all workflows in the organization
  – students can access only their own workflows
  – blacklisted users cannot access anything
See policySet.xml on myGrid wiki
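The three rules of the scenario can be written down as a small decision function. This is a plain Python sketch, not the contents of policySet.xml: the attribute names (role, organization, id, owner, blacklisted) are assumptions, the blacklist rule is checked first so that it overrides the others, and "in the organization" is read as "in the supervisor's own organization".

```python
def decide(subject, resource):
    """Return 'Permit' or 'Deny' for a subject requesting a workflow."""
    # Blacklisted users cannot access anything, whatever their role.
    if subject.get("blacklisted", False):
        return "Deny"
    role = subject.get("role")
    org = subject.get("organization")
    # Supervisors can access all workflows in their organization.
    if role == "supervisor" and org is not None and org == resource.get("organization"):
        return "Permit"
    # Students can access only their own workflows.
    user_id = subject.get("id")
    if role == "student" and user_id is not None and user_id == resource.get("owner"):
        return "Permit"
    # Anything not explicitly permitted is denied.
    return "Deny"


# Hypothetical subjects and resource, used only for illustration.
workflow = {"organization": "org:HY7", "owner": "person:4"}
supervisor = {"id": "person:1", "role": "supervisor", "organization": "org:HY7"}
owner = {"id": "person:4", "role": "student", "organization": "org:HY7"}
other_student = {"id": "person:5", "role": "student", "organization": "org:HY7"}
banned = {"id": "person:1", "role": "supervisor", "organization": "org:HY7", "blacklisted": True}

decisions = [decide(s, workflow) for s in (supervisor, owner, other_student, banned)]
```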