BioHaystack: Gateway to the Biological Semantic Web
Dennis Quan, IBM Watson Research, © 2004 IBM Corporation

1 BioHaystack: Gateway to the Biological Semantic Web
Dennis Quan (dennisq@us.ibm.com), IBM Watson Research

2 Problems in bioinformatics
- A myriad of public databases each hold specific facets of information about biological objects of interest (e.g., proteins, genes)
- Each database has its own access protocols, data formats, naming conventions, and means of describing relationships between objects in different databases
- Different software is required to view information from different databases:
  - Users must know exactly which tool or site to use
  - Relevant information comes in fragments
  - The exploration process is discontinuous

3 A common naming convention: LSID URNs
- Life Sciences Identifiers (LSIDs) are URNs for biological objects that are backed by RDF metadata, e.g., urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240
- The LSID and LSID protocol (SOAP-based) specification is sponsored by I3C and undergoing standardization by the OMG
- Most publicly available bioinformatics databases are accessible via LSID today:
  - A PDB LSID authority is online; “proxy” LSID authorities for databases such as the NIH databases and SwissProt are hosted by I3C
- LSID clients and servers are easy to set up:
  - The IBM Internet Technology group provides Open Source LSID client and server software for a variety of languages and platforms
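
Because an LSID is an ordinary URN of the form urn:lsid:<authority>:<namespace>:<object>, optionally followed by a revision, a client can split it into parts before deciding which authority to contact. The sketch below assumes exactly that colon-separated layout; the `LSID` type and `parse_lsid` function are hypothetical names, and actually resolving the identifier to its RDF metadata over the SOAP protocol is out of scope here.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The components of a Life Sciences Identifier (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> LSID:
    """Split 'urn:lsid:<authority>:<namespace>:<object>[:<revision>]' into its parts."""
    parts = urn.strip().split(":")
    # Per RFC 2141, the 'urn' prefix and the namespace ID ('lsid') are case-insensitive.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {urn!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


genbank_record = parse_lsid("urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240")
print(genbank_record.authority, genbank_record.namespace, genbank_record.object_id)
```

The authority part is what tells the client which LSID server (or I3C proxy authority) to ask for the object's data and metadata.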

4 RDF/XML: on-demand data integration
[Diagram: RDF fragments about the same human hemoglobin LSID from three databases are combined (+ +) into a unified view. GenBank: the LSID "derives from" the sequence atagccgta cctgcgagt ctagaagct. Gene Ontology: the LSID "is a" oxygen transport protein. PDB: the LSID "has 3D structure".]
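
The integration in the diagram works because every source names the protein by the same LSID, so combining their RDF reduces to a union of statements. A minimal sketch, assuming statements are plain (subject, predicate, object) string tuples rather than parsed RDF/XML; the LSIDs and the `merge_sources` name are hypothetical placeholders, not real records or a real API.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, predicate, object); requires Python 3.9+


def merge_sources(sources: dict[str, set[Triple]]) -> dict[str, dict[str, set[str]]]:
    """Union the statements from every source, indexed by subject, then predicate.

    Since all sources use the same LSID for the same object, their statements
    line up on one subject without any per-database mapping code.
    """
    view: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for triples in sources.values():
        for subject, predicate, obj in triples:
            view[subject][predicate].add(obj)
    return view


HB = "urn:lsid:example.org:protein:hemoglobin"  # hypothetical LSID, not a real record
sources = {
    "GenBank": {(HB, "derives from", "atagccgta cctgcgagt ctagaagct")},
    "Gene Ontology": {(HB, "is a", "oxygen transport protein")},
    "PDB": {(HB, "has 3D structure", "urn:lsid:example.org:structure:hemoglobin-3d")},
}
unified = merge_sources(sources)
for predicate, objects in sorted(unified[HB].items()):
    print(f"{predicate}: {sorted(objects)}")
```

A real client would fetch each fragment from the relevant LSID authority and parse it with an RDF library; the merge step itself stays this simple.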

5 Haystack: letting users interact with their data
- Haystack is a tool for creating, exploring, and organizing information:
  - Personal information: e-mails, contacts, documents, etc.
  - Bioinformatics: proteins, publications, genes, etc.
- Research project originating from MIT CSAIL
- Uses RDF as an underlying data model
- Built on Java and Eclipse, IBM’s Open Source rich client platform
http://haystack.lcs.mit.edu/

6 Browsing highly interconnected information
- A single screen presents multiple facets of one object, originating from separate databases
- Users navigate the information space as in a Web browser: hyperlinking, drag and drop, etc.
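
One way to picture this browsing model: keep track of which database contributed each statement, show one panel per source for the object on screen, and treat any object that is itself an LSID as a link to follow. The sketch below assumes statements are stored as (source, subject, predicate, object) tuples; it is an illustration of the idea, not Haystack's actual view code, and `facets`, `links`, and the LSIDs are hypothetical.

```python
Quad = tuple[str, str, str, str]  # (source database, subject, predicate, object)


def facets(quads: list[Quad], subject: str) -> dict[str, list[tuple[str, str]]]:
    """Group everything said about `subject` by the database that said it,
    so each source can be shown as its own panel on a single screen."""
    panels: dict[str, list[tuple[str, str]]] = {}
    for source, s, p, o in quads:
        if s == subject:
            panels.setdefault(source, []).append((p, o))
    return panels


def links(quads: list[Quad], subject: str) -> list[str]:
    """Objects that are themselves LSIDs can be followed like hyperlinks."""
    return sorted({o for _, s, _, o in quads if s == subject and o.startswith("urn:lsid:")})


HB = "urn:lsid:example.org:protein:hemoglobin"           # hypothetical LSID
STRUCT = "urn:lsid:example.org:structure:hemoglobin-3d"  # hypothetical LSID
quads = [
    ("GenBank", HB, "derives from", "atagccgta cctgcgagt ctagaagct"),
    ("Gene Ontology", HB, "is a", "oxygen transport protein"),
    ("PDB", HB, "has 3D structure", STRUCT),
]
panels = facets(quads, HB)
next_hops = links(quads, HB)
```

Following a link simply calls `facets` again with the new subject, which is what makes navigation feel continuous across databases.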

7 Personalization
- People keep track of their information by personalizing their workspaces:
  - Grouping paperwork into folders
  - Highlighting important text in documents
  - Attaching sticky notes as reminders
  - Jotting down lists of related items
- Haystack has pervasive support for annotation and allows users to group related objects together arbitrarily for their own purposes
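
Since Haystack's data model is RDF, a user's notes and groupings can themselves be stored as statements, kept apart from the source databases. A minimal sketch under that assumption; the `Workspace` class, the "annotation" and "member" predicate names, the collection name, and the LSIDs are all hypothetical, not Haystack's actual schema.

```python
Triple = tuple[str, str, str]  # requires Python 3.9+


class Workspace:
    """A user's personal statements, stored apart from the source databases.

    Annotations and collections are ordinary triples, so any object from any
    database can be annotated or grouped without changing the source data.
    """

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def annotate(self, subject: str, note: str) -> None:
        self.triples.add((subject, "annotation", note))  # hypothetical predicate

    def add_to_collection(self, collection: str, member: str) -> None:
        self.triples.add((collection, "member", member))  # hypothetical predicate

    def objects(self, subject: str, predicate: str) -> list[str]:
        """All objects of `predicate` on `subject`, sorted for stable display."""
        return sorted(o for s, p, o in self.triples if s == subject and p == predicate)


HB = "urn:lsid:example.org:protein:hemoglobin"  # hypothetical LSIDs
MB = "urn:lsid:example.org:protein:myoglobin"
ws = Workspace()
ws.add_to_collection("my oxygen carriers", HB)
ws.add_to_collection("my oxygen carriers", MB)
ws.annotate(HB, "compare with myoglobin structure")
```

Because a collection is just another subject, the same object can belong to any number of collections, which matches the arbitrary grouping described on the slide.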

8 BioHaystack
- BioHaystack applies Haystack technologies to bioinformatics problems:
  - An integrated environment for working with biological data
  - Intended for end users, i.e., non-programmers
  - Builds on LSID, RDF, and Haystack
- Integration promises to lower the barriers to accessing different back-end systems (e.g., LSID servers, Grids, Web Services, relational databases, annotation servers)
- Just as a Web browser acts as a client for Web content, BioHaystack can act as a client for biological Semantic Web content and services

9 Real-world collaboration: myGrid
- UK-funded joint project with the University of Manchester and other UK research institutions
- RDF-based platform for supporting e-Science experiments
- Real use cases, developed in collaboration with bioinformaticians
- myGrid creates LSIDs and RDF metadata in the process of enacting experiments for scientists
- BioHaystack is used as a browser for this metadata
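
To make "creating LSIDs and RDF metadata while enacting experiments" concrete: each value a workflow step produces can receive a fresh LSID, along with statements recording which service produced it and which inputs it came from. The sketch below is an illustration under those assumptions only; it is not myGrid's code, and the authority "example.org", the "producedBy"/"derivedFrom" predicates, and the `ProvenanceRecorder` class are hypothetical.

```python
import itertools
from typing import Callable


class ProvenanceRecorder:
    """Mints an LSID for every workflow value and records, as triples,
    which service produced it and which inputs it was derived from."""

    def __init__(self, authority: str = "example.org", namespace: str = "data") -> None:
        self._prefix = f"urn:lsid:{authority}:{namespace}:"
        self._counter = itertools.count(1)
        self.values: dict[str, object] = {}
        self.triples: set[tuple[str, str, str]] = set()

    def _mint(self) -> str:
        return f"{self._prefix}{next(self._counter)}"

    def store(self, value: object) -> str:
        """Register a value under a freshly minted LSID."""
        lsid = self._mint()
        self.values[lsid] = value
        return lsid

    def run_step(self, service: str, func: Callable[..., object], *input_ids: str) -> str:
        """Enact one workflow step on stored inputs and record its provenance."""
        output_id = self.store(func(*(self.values[i] for i in input_ids)))
        self.triples.add((output_id, "producedBy", service))  # hypothetical predicate
        for input_id in input_ids:
            self.triples.add((output_id, "derivedFrom", input_id))  # hypothetical predicate
        return output_id


def reverse_complement(seq: str) -> str:
    """Complement each base (a<->t, c<->g), then reverse the strand."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]


rec = ProvenanceRecorder()
seq_id = rec.store("atagccgta")
rc_id = rec.run_step("reverse_complement", reverse_complement, seq_id)
```

A browser such as BioHaystack can then walk the "derivedFrom" links backwards from any result to see how it was obtained.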

10 myGrid Architecture
[Diagram. Components: Registry, mIR, Discovery View, Haystack Provenance Browser, FreeFluo Enactor, Taverna WF Builder, Pedro Annotation tool, Ontology Store, WSDL and Soaplab service interfaces, others. Actors: scientists, bioinformaticians, service providers, annotation providers. Labeled interactions: query & retrieve, query & register, invoking, workflow execution, store data/knowledge, interface description, annotation/description, data descriptions, vocabulary.]
Courtesy of Professor Carole Goble, University of Manchester

11 BioHaystack + myGrid
Courtesy of Professor Carole Goble, University of Manchester

12 Thank you for your attention
- Dennis Quan, dennisq@us.ibm.com (IBM Watson Research)
- Haystack project home page (download available May 24):
  - http://haystack.lcs.mit.edu/
- IBM LSID home page:
  - http://www.ibm.com/developerworks/oss/lsid/
- myGrid home page:
  - http://www.mygrid.org.uk/
- See also our session on constructing Haystack applications:
  - Developer’s Day, Saturday, 4:30pm

