Interoperability With BioMoby 1.0: It’s Better Than Sharing Your Toothbrush!

A brief history of BioMoby
Sept 2001 – Model Organism Bring Your own Database Interface Conference (MOBY-DIC)
May 21, 2002 – Genome Canada Platform Award
May 25, 2002 – API Version 0.1 deployed, including object ontology serialization into XML
July 18, 2002 – First Moby Client (Gbrowse Moby)
June 9, 2003 – API Version 0.5 deployed
2006 – Genome Canada Platform Award
Version 1.0 API submitted for publication

MOBY-DIC Chapter VII: 7th Model Organism Bring Your-own Database Interface Conference, Vancouver, BC, June 2007.

The Core Ahabs

Wendy Richard Mylah Martin Eddie

Andreas Paul Ivan Mark’s Screen…

The BioMoby Plan
Create an ontology of bioinformatics data-types
Define a serialization of this ontology (data syntax)
Create an open API over this ontology
Define Web Service inputs and outputs with respect to the ontology
Register Services in an ontology-aware Registry
Machines can find an appropriate service
Machines can execute that service unattended
Ontology is community-extensible

Overview of BioMoby Transactions
[Diagram labels: Gene names; Sequence; Alignment; Sequence; Express.; Protein; Alleles; … | MOBY Central | MOBY hosts & services: Align, Phylogeny, Primers]

Overview of BioMoby Transactions
Object ontology: What is a sequence? A sequence is a ___ that has these features ___
Discovery of services that consume things LIKE sequences!
[Diagram labels: MOBY Central; Sequence; Align, Phylogeny, Primers]

This is SCUFL – Simple Conceptual Unified Flow Language. It is a complete record of everything you just did, and it can be saved for use in the Taverna workflow application that we will look at later…

Pipeline discovery “on the fly”
No explicit coordination between providers
Dynamic discovery of ~appropriate Services
Automated execution of services

Some BioMoby statistics

Moby: Breadth
Namespaces (data types): 418
Objects (data syntaxes): >561
Service Types (analytical categories): 112
Providers: ~50 active
Service Instances: ~1200 currently “alive”
–In main Moby Central server in Canada
–Others in “boutique” Moby registries serving specialized communities worldwide

Moby: Clients
Gbrowse_moby (M Wilkinson)
PlaNet Locus_View (H Schoof, R Ernst)
Blue-Jay (P Gordon)
Taverna (T Oinn, M Senger, E Kawas)
MOWserv (INB, Spain)
Remora (S Carrere, J Gouzy, INRA)
MOBYLE (B Néron, P Tufféry, C Letondal, Pasteur Inst.)
SeaHawk (P Gordon)

BioMoby in detail
MOBY Data typing system: Semantic Type
MOBY Data typing system: Syntactic Type
Moby Registry Queries

Moby Namespaces
A “Namespace” is a category of identifiers
–NCBI has gi numbers (gi Namespace)
–GO Terms have accession numbers (GO Namespace)
Namespaces indicate data’s semantic type.
–GO: → a Gene Ontology Term
–gi| → a GenBank record
Though we are using the word “Namespace” correctly, it causes confusion!
–“Namespace” in XML is tightly associated with an XML document and/or its syntax
–In Moby, we are ONLY talking about data entities, NOT THEIR SYNTAX

BioMoby in detail
MOBY Data typing system: Semantic Type
MOBY Data typing system: Syntactic Type
Moby Registry Queries

The MOBY Object Ontology
Syntactic types are defined by a GO-like ontology
–Class name at each node
–Edges define the relationships between Classes
–GO used as a model because of its familiarity in the community
Edges define one of three relationships
–ISA: Inheritance relationship. All properties of the parent are present in the child
–HASA: Container relationship of ‘exactly 1’
–HAS: Container relationship with ‘1 or more’

The Simplest Moby Data-Type: Object
The combination of a namespace and an identifier within that namespace uniquely identifies a data entity, not its location(s), nor its representation
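The identity rule on this slide can be sketched in Python. This is only an illustration, not the BioMoby API: the class name `MobyObject` and its field names are assumptions made for the example, and the identifiers are sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MobyObject:
    """Base data-type sketch: identity only, no payload (hypothetical class)."""
    namespace: str  # category of identifier, e.g. "GO" or "gi"
    id: str         # identifier within that namespace


# Two copies of the same record are the same entity,
# wherever they were fetched from and however they were formatted.
a = MobyObject("GO", "0008150")
b = MobyObject("GO", "0008150")
c = MobyObject("gi", "0008150")  # same id text, different namespace

print(a == b)          # True: namespace and id both match
print(a == c)          # False: the namespace differs
print(len({a, b, c}))  # 2 distinct entities
```

Equality and hashing depend only on the (namespace, id) pair, which is the point of the slide: location and representation play no part in identity.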

Moby Primitives
Integer, String, Float and DateTime each ISA Object
[Diagram example value: 38]

A Derived Data-Type
Virtual Sequence ISA Object
Virtual Sequence HASA Integer (example value: 38)
[Annotation on the HASA edge] Describes the semantic relationship between the Integer and the Virtual Sequence

A Derived Data-Type
Generic Sequence ISA Virtual Sequence
Generic Sequence HASA String (example value: ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC)

A Derived Data-Type
DNA Sequence ISA Generic Sequence
(a DNA Sequence therefore carries both the Integer, 38, and the String, ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC)
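The derivation chain on these slides can be modelled as a small in-memory ontology. The sketch below is a toy, not the BioMoby registry or its API: `OntologyClass` and its methods are hypothetical, and the member labels `Length` and `SequenceString` are assumed names for the two HASA edges.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OntologyClass:
    """One node of a toy object ontology (hypothetical, for illustration)."""
    name: str
    isa: Optional["OntologyClass"] = None  # ISA edge to the parent class
    # HASA edges as (label, class); each member occurs exactly once
    hasa: List[Tuple[str, "OntologyClass"]] = field(default_factory=list)

    def lineage(self) -> List[str]:
        """Class names from this node up to the root, following ISA edges."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.isa
        return names

    def members(self) -> List[Tuple[str, str]]:
        """All HASA members, inherited ones first."""
        inherited = self.isa.members() if self.isa else []
        return inherited + [(label, cls.name) for label, cls in self.hasa]

    def is_a(self, other: "OntologyClass") -> bool:
        """True if `other` is this class or one of its ancestors."""
        return other.name in self.lineage()


obj = OntologyClass("Object")
integer = OntologyClass("Integer", isa=obj)
string = OntologyClass("String", isa=obj)
virtual_seq = OntologyClass("VirtualSequence", isa=obj,
                            hasa=[("Length", integer)])
generic_seq = OntologyClass("GenericSequence", isa=virtual_seq,
                            hasa=[("SequenceString", string)])
dna_seq = OntologyClass("DNASequence", isa=generic_seq)

print(dna_seq.lineage())
# ['DNASequence', 'GenericSequence', 'VirtualSequence', 'Object']
print(dna_seq.members())
# [('Length', 'Integer'), ('SequenceString', 'String')]
```

Because ISA passes every parent property to the child, `DNASequence` declares no members of its own yet still exposes both the Integer and the String.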

Legacy file formats
TBLASTN [Feb ]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:
Query= gi| (504 letters)
Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
336,723 sequences; 677,679,054 total letters
Searching...done
Score E
Sequences producing significant alignments: (bits) Value
gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA
emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t e-07
emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein 53 1e-05
Containing “String” allows ontological classes to represent legacy data types

Binaries – pictures, movies
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
BAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUx
HTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVl
bWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEf
MB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
text-base64 is a Class that contains String
Binaries are base64 encoded and passed in classes that inherit from text-base64
base64_encoded_jpeg ISA text/base64 ISA text/plain HASA String
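As a minimal sketch of the encoding step, Python's standard `base64` module turns arbitrary bytes into ASCII text that can travel inside a String member. The payload below is a made-up stand-in for image data, not a real JPEG.

```python
import base64

# Stand-in for a binary file: JPEG-style leading bytes plus dummy content
raw = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"JFIF-demo-payload"

# Encode to ASCII text so the binary can be carried as a String
encoded = base64.b64encode(raw).decode("ascii")

# The receiving side reverses the step to recover the original bytes
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == raw)  # True: the round trip is lossless
```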

Extending legacy datatypes
With legacy data-types defined, we can extend them as we see fit
annotated_jpeg ISA base64_encoded_jpeg
annotated_jpeg HASA 2D_Coordinate_set
annotated_jpeg HASA Description
Description: This is the phenotype of a ufo-1 mutant under long daylength, 16°C
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV

The same object…
annotated_jpeg ISA base64_encoded_jpeg HASA 2D_Coordinate_set HASA Description
Description: This is the phenotype of a ufo-1 mutant under long daylength, 16°C
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV

Cross reference types
Simple
–A MOBY Object
Rich
–Takes the form: Textual Description Textual Description...
–…Incidentally, this avoids the problem of reification that is experienced in RDF...

XML Schema?
The Object Ontology allows new data-types WITHOUT new flatfile formats, and without having to understand e.g. XML Schema
Minimize future heterogeneity
Improve interoperability without requiring schema-to-schema mapping

XML Schema?
Object Ontology terms have semantically rich names, but this is primarily for human intuition
–DNA Sequence
–Annotated_GIF
Object Ontology does not define the meaning of an object to the machine
–No machine-readable semantics
It does define the representation
–SYNTAX

A portion of the MOBY-S Object Ontology …community-built!

BioMoby in detail
MOBY Data typing system: Semantic Type
MOBY Data typing system: Syntactic Type
Moby Registry Queries

A Moby Central Query
Give me:
–Services that consume THIS data-type in THIS syntax…
–…do SOMETHING LIKE THIS to it…
–…and provide me THAT data-type in response

Example
Find me services that
–consume FASTA sequence data,
–do a BLAST with it,
–and provide me lists of GenBank GI numbers in return.
Query can use any or all of the above criteria
–Can also be limited by service provider and service description keyword
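The three-part query can be sketched as a filter over a small in-memory registry. Everything here is an assumption made for illustration: the ISA tables, the service records, the `find_services` function and the matching rule are not Moby Central's implementation. The rule used is that a service matches when it accepts the query's input type or one of its ancestors, and when its service type and output type are the requested ones or descend from them.

```python
# Toy ISA tables (child -> parent). All names are illustrative.
OBJECT_ISA = {
    "FASTA": "GenericSequence",
    "GenericSequence": "VirtualSequence",
    "VirtualSequence": "Object",
    "GI_List": "Object",
}
SERVICE_ISA = {
    "NCBI_Blast": "Blast",
    "WU_Blast": "Blast",
    "Blast": "Analysis",
}

# Hypothetical registered services
SERVICES = [
    {"name": "runNcbiBlast", "type": "NCBI_Blast",
     "consumes": "GenericSequence", "produces": "GI_List"},
    {"name": "runWuBlast", "type": "WU_Blast",
     "consumes": "FASTA", "produces": "Blast_Report"},
    {"name": "drawTree", "type": "Phylogeny",
     "consumes": "GenericSequence", "produces": "Tree"},
]


def lineage(term, isa):
    """Return the term followed by all of its ancestors."""
    out = [term]
    while term in isa:
        term = isa[term]
        out.append(term)
    return out


def find_services(services, input_type=None, service_type=None,
                  output_type=None):
    """Filter by any combination of the three criteria (None = ignore)."""
    hits = []
    for s in services:
        # Input: the service must accept our type or one of its ancestors
        if input_type and s["consumes"] not in lineage(input_type, OBJECT_ISA):
            continue
        # Service type: the requested type or any of its descendants
        if service_type and service_type not in lineage(s["type"], SERVICE_ISA):
            continue
        # Output: the requested type or any of its descendants
        if output_type and output_type not in lineage(s["produces"], OBJECT_ISA):
            continue
        hits.append(s["name"])
    return hits


print(find_services(SERVICES, "FASTA", "Blast", "GI_List"))
# ['runNcbiBlast']
print(find_services(SERVICES, service_type="Blast"))
# ['runNcbiBlast', 'runWuBlast']
```

Leaving a criterion as `None` drops it from the filter, which mirrors the "any or all" wording on the slide.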

Remember!! Moby Registry Query INPUT TYPE | TRANSFORMATION TYPE | OUTPUT TYPE

A weakness of MOBY
Service discovery is horribly flawed due to insufficiently rich semantics…

Chickens go in; Pies come out! The problem with Moby

What sort o’ pies?

Apple! The problem with Moby

The MOBY-S Service Ontology
A simple ISA hierarchy…
–too simple!
Primitive types include:
–Analysis
–Parsing
–Registration
–Retrieval
–Resolution
–Conversion
–Rendering

A slice of the Service Ontology
[Diagram nodes: Service; Analysis; Alignment; Blast; NCBI_Blast; WU_Blast; Parsing; Parse_NCBI_Blast; Parse_WU_Blast]
“The Exploding Bicycle” - A. Rector, U Manchester

Summary so far
BioMoby uses ontologies to describe both data types and data syntaxes
–This is where the interoperability comes from
–These are used to match consumers with providers during service discovery
BioMoby uses a simple ontology to describe bioinformatics operations
–This ontology is only marginally useful

Seahawk
Highlight data in your browser and drag/drop it into Moby
What could be easier than that?!
Paul MK Gordon and Christoph W Sensen, BMC Bioinformatics 2007, 8:208

Seahawk: A New Moby Client for Biologists
Drag ‘n’ drop, highlight existing data for use with MOBY Services
Paul Gordon & Christoph Sensen
BMC Bioinformatics, in press

Seahawk looks like a browser

How do I load data?

Use the “open” button:
–Text file (e.g. FASTA sequences)
–HTML page (e.g. NCBI Entrez Web page)
–RTF document (e.g. conference abstract)
–MOBY XML document
Drag ‘n’ Drop
–Web links and desktop files
–Highlighted text from open documents or Web pages

Under the Hood (Beneath the Bonnet?)
Data has to be converted into Moby XML format to be used by Moby
Moby data has to be converted back to human-readable text for presentation to the biologist

MOB Rules

DEM Rules...

Again: How do I load data?

How do I Find Services?
Right-click → MOB rules are invoked
Resulting Moby XML is used for service search

How do I run a service?
Click it!
If necessary, a service’s extra parameters can be set
Control+click submits using default params

How do I run a service?
If required inputs are missing, the missing ones must be dragged into place.
Unrecognized data will be rejected

How do I collate data?
Seahawk clipboard lets you build collections of objects
Seahawk “knows” the type of collection and will suggest appropriate Moby services

Seahawk Summary
Seahawk integrates Moby Web Service discovery and execution into the biologist’s day-to-day “Web Surfing” activity
It uses Regular Expressions and XSLT to move normal web or hard-drive-file data into and out of BioMoby
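The regular-expression half of that conversion can be sketched as follows. This is not Seahawk's rule format or code: the `MOB_RULES` table, the two patterns and the function name are assumptions chosen to show how highlighted text might be turned into (namespace, id) pairs. The XSLT step back to human-readable text is not shown.

```python
import re

# Hypothetical rules: a pattern with one capture group, and the
# namespace that a match should be assigned to
MOB_RULES = [
    (re.compile(r"\bGO:(\d{7})\b"), "GO"),
    (re.compile(r"\bgi\|(\d+)"), "gi"),
]


def apply_mob_rules(text):
    """Scan raw text and return the (namespace, id) pairs it mentions."""
    found = []
    for pattern, namespace in MOB_RULES:
        for match in pattern.finditer(text):
            found.append((namespace, match.group(1)))
    return found


highlighted = "Hit gi|12345 is annotated with GO:0008150 and GO:0003674."
print(apply_mob_rules(highlighted))
# [('GO', '0008150'), ('GO', '0003674'), ('gi', '12345')]
```

Each recognised pair could then be wrapped as a Moby object and used to search for services, which is the step the "How do I Find Services?" slide describes.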

…hey… wait a minute…!!
If SeaHawk can automatically convert “raw” data into Moby data and back again…
…and if the majority of tools in the world use “raw” data…
Q: Why can’t we automatically Mobyfy the existing tools that we all know and love?
A: Because there is no way for Seahawk to know the INTENT of each field in a Web FORM! That is only knowable to a human...
…but what if it could figure it out…!!

Watch and learn!
Spying on biologists’ behaviour to automate Mobyfication of existing tools (coming soon to SeaHawk!!...)

Watch and Learn
The biologist opens up their favorite tool inside of SeaHawk… and SeaHawk spies on them as they use it
[Diagram labels: Seahawk; Proxied Web page; Drag ‘n’ drop; Seahawk AJAX prompting]

The Process
Spy on the biologist
Interpret their use of the Web tool
Auto-generate a series of MOB/DEM rules
Auto-generate a Moby Service that utilizes these MOB/DEM rules to interact with that Web tool
Auto-register that “proxy” Moby Service in Moby Central
That legacy Web tool now becomes available to everyone through BioMoby

Why doesn’t Moby Use RDF/OWL?

Timeline of Moby/W3C Activities
RDF Candidate Spec
RDF Schema Candidate Spec
W3C Launches Semantic Web (SW) Activity Group
BioMoby Project Established
BioMoby XML Finalized
BioMoby Stable 0.85 API Published (>400 services)
RDF/OWL Formal W3C Recommendations
BioMoby Stable 1.0 API Published
>>>>>> Extensive SW toolbuilding…

Moby 2.0 Getting it right, the second time!

What BioMoby Already Does
Sequence Data → BLAST SERVER → Blast Hit

What BioMoby Already Does
Sequence Data → givesBlastResult → Blast Hit
Not “Biologically” Meaningful

What BioMoby Already Does
Sequence Data → hasHomologyTo → Blast Hit
…looks a lot like…
URI → hasHomologyTo → URI
Which is effectively just an RDF triple.

Now think in reverse…

(in case you forgot…) Moby Registry Query INPUT TYPE | TRANSFORMATION TYPE | OUTPUT TYPE

Moby 2.0
Sequence Data: what does it have homology to?
hasHomologyTo maps to BLAST SERVICE
Send data → Blast Hit

Query
FIND SERVICES THAT
Consume Sequence Data | Provide hasHomologyTo Property | Attached to other Sequence Data

SPARQL
A Semantic Web query language
Queries “look like” graphs
Find “X” with predicate “Y” attached to “Z”

Moby 2.0 extends the SPARQL query language
SPARQL queries contain concepts and the relationships between them (subject, predicate, object)
We simply map RDF predicates onto Moby services capable of generating that relationship
Registry query: “What Moby service consumes [subject] and generates the [predicate] relationship type?”
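The predicate-to-service mapping can be sketched without any RDF tooling. The registry layout, the `resolve` function, the `seq:` identifiers and the canned BLAST stand-in below are all hypothetical, not Moby 2.0 code. The sketch only shows how a triple pattern with an unknown object could be answered by looking up a service keyed on the predicate.

```python
def fake_blast(sequence_id):
    """Stand-in for a remote BLAST service; returns canned hits."""
    canned = {"seq:A": ["seq:B", "seq:C"]}
    return canned.get(sequence_id, [])


# Hypothetical registry: predicate -> (input type consumed, service)
PREDICATE_REGISTRY = {
    "hasHomologyTo": ("SequenceData", fake_blast),
}


def resolve(subject, subject_type, predicate):
    """Answer the pattern (subject, predicate, ?x) by running a service."""
    entry = PREDICATE_REGISTRY.get(predicate)
    if entry is None:
        return []  # no service generates this relationship
    input_type, service = entry
    if input_type != subject_type:
        return []  # the service cannot consume this kind of subject
    # Each service result becomes the object of a new triple
    return [(subject, predicate, obj) for obj in service(subject)]


triples = resolve("seq:A", "SequenceData", "hasHomologyTo")
print(triples)
# [('seq:A', 'hasHomologyTo', 'seq:B'), ('seq:A', 'hasHomologyTo', 'seq:C')]
```

In this sketch the triples do not need to exist before the query is posed; they are generated on demand by whichever service is registered for the predicate.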

But wait, there’s more!

Exploit knowledge in OWL ontologies to enhance query
Evaluate Query Expression
Subject, Predicate → Look up and execute Moby service: consumes proteins and generates functional annotation info
Subject, Predicate → Look up and execute Moby service: consumes STK or proteins and looks up inhibitor molecules

Exploit knowledge in OWL ontologies to enhance query
This SPARQL query could be posed on a database of RAW, UNANNOTATED Protein sequences, and be answered by Moby 2.0 (a.k.a. CardioSHARE)

Credits
Genome Canada/Genome Alberta
myGrid – Carole Goble in particular
Spanish National Institute for Bioinformatics (INB) through Fundación Genoma España
Generation Challenge Programme (GCP) of the Consultative Group for International Agricultural Research (CGIAR)
Heart and Stroke Foundation of BC and Yukon (CardioSHARE)
Microsoft Research (CardioSHARE)