Transparent access to multiple bioinformatics information sources (TAMBIS) Goble, C.A. et al. (2001) IBM Systems Journal 40(2), 532-551 Genome Analysis.


1 Transparent access to multiple bioinformatics information sources (TAMBIS) Goble, C.A. et al. (2001) IBM Systems Journal 40(2), 532-551 Genome Analysis Paper Presentation March 24, 2005

2 Presentation Overview  Why the need to integrate  Definitions (“MW”s)  Biologists’ burden  What is TAMBIS  The TaO  Brains of TAMBIS  What makes TAMBIS “service-oriented”?  GRAIL  TAMBIS Architecture  What can you do at TAMBIS?  Related Work  More current Work  Ongoing challenges for integration

3 Why the need to Integrate?  The Molecular Biology Database Collection has 500+ resources (719 in the 2005 NAR database issue, with ~150 added in the past two years)  Independent development and differing scopes  Heterogeneous formats, interfaces, inputs, outputs  Most popular resources: DNA and protein sequences (GenBank, Swiss-Prot); genome data (ACeDB); protein structure and motifs (PDB, PROSITE); similarity searching (BLAST)

4 Definitions (MW*)  Extensional coverage : number of entries / instances covered by the source  Intensional coverage : number of information fields / meta-data in each source  Description Logic : A family of knowledge representation languages which can be used to represent the terminological knowledge of an application domain in a structured and formally well-understood way.  CPL (Collection Programming Language) : A functional multidatabase language; models complex data types such as lists, sets, and variants, with drivers (wrappers) that execute requests over data sources * MW = “misunderstood word” (from a Montessori class)
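A small worked example may help separate the two kinds of coverage defined above. It is only a sketch: the toy source, its entries, and its field names are invented for illustration and do not come from any real database.

```python
# Toy "source": a list of entries, each a dict of information fields.
# All identifiers and values here are invented for illustration.
toy_source = [
    {"id": "P1", "name": "toy receptor", "organism": "human"},
    {"id": "P2", "name": "toy kinase", "organism": "yeast", "function": "kinase"},
    {"id": "P3", "name": "toy channel"},
]


def extensional_coverage(entries):
    """Number of entries (instances) the source holds."""
    return len(entries)


def intensional_coverage(entries):
    """Number of distinct information fields used across the entries."""
    fields = set()
    for entry in entries:
        fields.update(entry)  # adds the dict's keys
    return len(fields)


print(extensional_coverage(toy_source))  # 3 entries
print(intensional_coverage(toy_source))  # 4 fields: id, name, organism, function
```

A source can score high on one measure and low on the other: many entries with few fields each, or few entries described in great detail.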

5 Definitions (MW*)  Terminology server : Encapsulates the reasoning services associated with the Description Logic, supporting concept reasoning, role sanctioning, thesaurus, extrinsics services  Sanctioning : Capability of inferring more (biological) concepts by way of compositional constraints encompassed in the ontology  Ontology : An explicit formal specification of how to represent the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them. * MW = “misunderstood word” (from a Montessori class)
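Sanctioning can be pictured as a table of permitted compositions that is inherited down the concept hierarchy. The sketch below is an assumption-laden toy, not GRAIL's actual mechanism: the hierarchy, the sanction table, and the function names are all hypothetical.

```python
# Hypothetical toy hierarchy: child concept -> parent concept.
PARENTS = {
    "Enzyme": "Protein",
    "Protein": "Biomolecule",
    "Receptor": "Function",
}

# Hypothetical sanctions: (subject concept, role, filler concept) triples
# that the ontology allows to be composed.
SANCTIONS = {
    ("Protein", "hasFunction", "Function"),
    ("Motif", "isComponentOf", "Protein"),
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors."""
    chain = [concept]
    while concept in PARENTS:
        concept = PARENTS[concept]
        chain.append(concept)
    return chain


def is_sanctioned(subject, role, filler):
    """A composition is allowed if some ancestor pair is sanctioned."""
    return any(
        (s, role, f) in SANCTIONS
        for s in ancestors(subject)
        for f in ancestors(filler)
    )


print(is_sanctioned("Enzyme", "hasFunction", "Receptor"))  # True (inherited)
print(is_sanctioned("Protein", "isComponentOf", "Motif"))  # False
```

Because the sanction on Protein is inherited by Enzyme, the more specific composition becomes available without being listed explicitly, which is one reading of "inferring more concepts by way of compositional constraints".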

6 Biologists’ burden  Construct a view of the meta-data  Resolve structural and semantic differences in the information  Locate and communicate with the sources  Interoperate between resources  Transformation process …. “fragile” process…. undoubtedly specialized

7 TAMBIS  A prototype mediation system Designed to lessen the burden as described previously Service-oriented Based on an extensive source-independent global ontology of molecular biology and bioinformatics Represented in a Description Logic Managed by a terminology server  A mixed top-down and bottom-up iterative methodology  Providing a single access point for biological information sources around the world

8 Emphasis of TAMBIS  High transparency  Read-only access  Retrieval-oriented architecture Efficiency and correctness  Heterogeneity management  Visual query interface

9 Features of TAMBIS  Very rich domain ontology (1,800 biological concepts)  Web-based… Query formation Ontology browsing  Query translation and planning process  More than GO, more than SRS

10 The TaO  Aim is to capture biological and bioinformatics knowledge in a logical conceptual framework  Constraints… or features… Only biologically sensible concepts classify correctly Can encompass different user views Makes biological concepts and their relationships computationally accessible  Could have used another ontology but this one was developed concurrently for TAMBIS

11 The TaO

12 Current state of TaO  Big Model Covers proteins, nucleic acids, their components, function, location, publishing  Baby model (Baby TaO) Covers only the protein subset of the big model Used for the “fully functional version” of TAMBIS  Reconciled model Merged version of the big and baby TAMBIS ontologies

13 Brains of TAMBIS  … Query translation and planning process  “A concept formed as a query is resolved when its extension is retrieved” Sample query: Protein which hasFunction Receptor  Takes a query phrased in terms of the conceptual layer and converts it into an executable plan in terms of the classes and methods of the physical layer.  Plans an efficient way of executing a query, i.e., evaluates the alternative paths  The various resources do not need to provide query language interfaces
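The translation-and-planning step can be sketched as a lookup followed by a cost comparison. Everything below is hypothetical: the source names (SourceA, SourceB), the method names, and the cost figures are placeholders, and the real TAMBIS planner is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    source: str  # hypothetical physical source
    method: str  # hypothetical method exposed by its wrapper
    cost: int    # made-up cost estimate


# Hypothetical mapping from (concept, role) in the conceptual layer to
# alternative sequences of physical-layer calls.
MAPPINGS = {
    ("Protein", "hasFunction"): [
        [Step("SourceA", "scan_all_proteins", 100),
         Step("SourceA", "filter_by_function", 10)],
        [Step("SourceB", "proteins_by_function_keyword", 15)],
    ],
}


def plan(concept, role, filler):
    """Pick the cheapest alternative path and render it as call strings."""
    alternatives = MAPPINGS[(concept, role)]
    best = min(alternatives, key=lambda steps: sum(s.cost for s in steps))
    return [f"{s.source}.{s.method}({filler!r})" for s in best]


# Sample query from the slide: Protein which hasFunction Receptor
print(plan("Protein", "hasFunction", "Receptor"))
# ["SourceB.proteins_by_function_keyword('Receptor')"]
```

The point of the sketch is that the sources only need callable methods; the query language lives entirely on the conceptual side.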

14 (Definitions revisited) concept relationship

15 What makes TAMBIS “service-oriented”?  Reasoning services for description logics: Subsumption, Classification, Satisfiability, Retrieval  Sanctioned term construction  Querying  Terminology Services

16 (Definitions revisited) sanction subsumption

17 An example of subsumption
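Since the figure for this slide did not survive, here is a minimal sketch of what a subsumption check could look like. It uses a simple structural test over a toy hierarchy; the concept names, the hierarchy, and the test itself are assumptions for illustration and are not GRAIL's reasoning algorithm.

```python
from dataclasses import dataclass

# Hypothetical toy hierarchy: child concept -> parent concept.
IS_A = {
    "Enzyme": "Protein",
    "Protein": "Biomolecule",
    "Receptor": "Function",
}


def is_a(child, parent):
    """True if child equals parent or descends from it in IS_A."""
    while child != parent and child in IS_A:
        child = IS_A[child]
    return child == parent


@dataclass(frozen=True)
class Concept:
    base: str
    restrictions: frozenset = frozenset()  # set of (role, filler) pairs


def subsumes(general, specific):
    """general subsumes specific if specific's base is a kind of general's
    base and every restriction on general is met by one on specific."""
    if not is_a(specific.base, general.base):
        return False
    return all(
        any(r == role and is_a(f, filler) for r, f in specific.restrictions)
        for role, filler in general.restrictions
    )


protein = Concept("Protein")
receptor_protein = Concept("Protein", frozenset({("hasFunction", "Receptor")}))

print(subsumes(protein, receptor_protein))  # True: every receptor protein is a protein
print(subsumes(receptor_protein, protein))  # False: not every protein is a receptor
```

Classification then amounts to running this test against existing concepts to find where a newly composed concept belongs in the hierarchy.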

18 GRAIL  A concept modelling language  A Description Logic in the KL-ONE family….  In this case, used to describe biological concepts  Two major services provided : Supporting transitive roles, role hierarchies, a powerful set of concept assertion axioms Novel multilayered sanctioning mechanism

19 TAMBIS Architecture  Three layers (“models”) Physical Conceptual Mapping  Five components Ontology of biological terms (A) Knowledge-driven query formulation interface (B) Sources and Services Model linking the biological ontology with the source schemas (C) Query transformation rewriting process (D) Wrapper service dealing with external sources (E)
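The five components can be read as stages of a pipeline. The sketch below wires toy stand-ins for components (A) through (E) together; all names, the mapping table, and the data are hypothetical, and no claim is made that TAMBIS is implemented this way.

```python
# (A) Ontology of biological terms: here just a set of known concept names.
ONTOLOGY = {"Protein", "Receptor", "Motif"}


# (B) Query formulation: only lets the user combine terms the ontology knows.
def formulate(concept, role, filler):
    for term in (concept, filler):
        if term not in ONTOLOGY:
            raise ValueError(f"unknown concept: {term}")
    return (concept, role, filler)


# (C) Sources and Services Model: links ontology terms to source methods.
SOURCES_AND_SERVICES = {
    ("Protein", "hasFunction"): ("ToySource", "find_by_function"),
}


# (D) Query transformation: rewrites a conceptual query into a physical plan.
def rewrite(query):
    concept, role, filler = query
    source, method = SOURCES_AND_SERVICES[(concept, role)]
    return {"source": source, "method": method, "arg": filler}


# (E) Wrapper service: executes the plan against an (invented) external source.
TOY_DATA = {"ToySource": {"find_by_function": {"Receptor": ["P1", "P2"]}}}


def wrapper_execute(plan):
    return TOY_DATA[plan["source"]][plan["method"]].get(plan["arg"], [])


def answer(concept, role, filler):
    """Run a query through stages B, D and E, using A and C as lookups."""
    return wrapper_execute(rewrite(formulate(concept, role, filler)))


print(answer("Protein", "hasFunction", "Receptor"))  # ['P1', 'P2']
```

In this reading, (A) and (B) belong to the conceptual layer, (C) and (D) to the mapping layer, and (E) to the physical layer.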

20 Query translation

21 What can you do at TAMBIS?  Browse the ontology  Build a query with a visual interface and reference to an ontology  Give values to concepts (for a query)  Identify desired concepts as results  Bookmark your queries

22 Ontology browser

23 Specific questions for TAMBIS  Find human homologues of yeast receptor proteins  Find rat proteins that have a domain with a seven-propeller domain architecture  Find the binding sites of human enzymes with zinc cofactors …. How many sources are involved per question? …. How difficult is it to find these answers without integration? …. For someone unfamiliar with the resources?

24 TAMBIS Overview
Natural language : Select motifs for antigenic human proteins that participate in apoptosis and are homologous to the lymphocyte associated receptor of death (also known as lard).
TAMBIS Translation : Select patterns in the proteins that invoke an immunological response and participate in programmed cell death that are similar in their sequence of amino acids to the protein that is associated with triggering cell death in the white cells of the immune system.
Concept expression in GRAIL : Motif which <isComponentOf (Protein which <hasOrganismClassification Species FunctionsInProcess Apoptosis HasFunction Antigen isHomologousTo Protein which )>)> (Species given value “human” and ProteinName given value “lard”)

25 Related Work  Closest work : Object-Protocol Model (OPM) No source transparency  SRS, Entrez, BioNavigator Do not handle queries as complex; TAMBIS is query-based, these are click-based  BioKleisli, DiscoveryLink Middleware solutions; TAMBIS sits on top of these  Carnot General rather than detailed ontology

26 More current work  DAML+OIL (new DL for TAMBIS) DARPA Agent Markup Language: provides a rich set of constructs to create ontologies and to mark up information so that it is machine-readable  CPL/BioKleisli (wrapper language) replaced by DiscoveryHub (commercial)  GO: more complete and more widely used  Protégé OWL Ontology editor for the Semantic Web  BioMOBY, BioConductor Complementary systems

27 Ongoing challenges to integration  Evaluation Technical efficiency User usability  Changing underlying resources Resources disappear Changes in popularity MAINTENANCE …. Widespread acceptance and use?

28 References  Goble, C.A. et al. (2001) “Transparent access to multiple bioinformatics information sources.” IBM Systems Journal 40(2), 532-551.  Baker, P.G. et al. (1999) “An ontology for bioinformatics applications.” Bioinformatics 15(6), 510-520.  Ontology definition : dli.grainger.uiuc.edu/glossary.htm  Description Logic definition : www.absoluteastronomy.com/encyclopedia/D/De/Description_Logic.htm  TAMBIS website : http://imgproj.cs.man.ac.uk/tambis/