Transparent access to multiple bioinformatics information sources (TAMBIS) Goble, C.A. et al. (2001) IBM Systems Journal 40(2), 532-551 Genome Analysis.


Transparent access to multiple bioinformatics information sources (TAMBIS)
Goble, C.A. et al. (2001) IBM Systems Journal 40(2), 532-551
Genome Analysis Paper Presentation, March 24, 2005

Presentation Overview
- Why the need to integrate
- Definitions (“MW”s)
- Biologists’ burden
- What is TAMBIS
- The TaO
- Brains of TAMBIS
- What makes TAMBIS “service-oriented”?
- GRAIL
- TAMBIS Architecture
- What can you do at TAMBIS?
- Related Work
- More current Work
- Ongoing challenges for integration

Why the need to Integrate?
- The Molecular Biology Database Collection has 500+ resources
  - 719 in the 2005 NAR DB issue
  - Adding ~150 in the past two years
- Independent development and differing scopes
- Heterogeneous formats, interfaces, inputs, outputs
- Most popular resources:
  - DNA and protein sequences (GenBank, Swiss-Prot)
  - Genome data (ACeDB)
  - Protein structure and motifs (PDB, PROSITE)
  - Similarity searching (BLAST)

Definitions (MW*)
- Extensional coverage: number of entries / instances covered by the source
- Intensional coverage: number of information fields / meta-data in each source
- Description Logic: a family of knowledge representation languages which can be used to represent the terminological knowledge of an application domain in a structured and formally well-understood way
- CPL (Collection Programming Language): a functional multidatabase language; models complex data types such as lists, sets, and variants, with drivers (wrappers) that execute requests over data sources
* MW = “misunderstood word” (from a Montessori class)
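To make the two coverage measures concrete, here is a minimal sketch that counts them over toy data. The two sources, their entry IDs, and their field names are invented for illustration and do not correspond to any real database.

```python
# Hypothetical toy sources: each maps an entry ID to a record of fields.
source_a = {
    "P1": {"name": "alpha", "sequence": "MKT", "organism": "human"},
    "P2": {"name": "beta", "sequence": "MLS", "organism": "yeast"},
    "P3": {"name": "gamma", "sequence": "MAA", "organism": "rat"},
}
source_b = {
    "P1": {
        "name": "alpha",
        "sequence": "MKT",
        "organism": "human",
        "function": "receptor",
        "motif": "zinc finger",
    },
}


def extensional_coverage(source):
    """Number of entries (instances) the source holds."""
    return len(source)


def intensional_coverage(source):
    """Number of distinct information fields used across the source's records."""
    fields = set()
    for record in source.values():
        fields.update(record)  # adds the record's field names
    return len(fields)


for label, src in (("A", source_a), ("B", source_b)):
    print(label, extensional_coverage(src), intensional_coverage(src))
```

In this toy setup source A is broad but shallow (many entries, few fields) and source B is narrow but deep, which is the kind of trade-off the two definitions are meant to separate.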

Definitions (MW*)
- Terminology server: encapsulates the reasoning services associated with the Description Logic, supporting concept reasoning, role sanctioning, thesaurus, and extrinsics services
- Sanctioning: capability of inferring more (biological) concepts by way of compositional constraints encompassed in the ontology
- Ontology: an explicit formal specification of how to represent the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them
* MW = “misunderstood word” (from a Montessori class)
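One way to picture sanctioning is as a table of permitted compositions: a new concept can be built from a concept, a role, and a filler only if the ontology allows that combination. The sketch below is a loose illustration of that idea, not GRAIL's actual mechanism; the `SANCTIONS` table and the `compose` helper are hypothetical.

```python
# Hypothetical sanction table: (concept, role) -> fillers the ontology permits.
SANCTIONS = {
    ("Protein", "hasFunction"): {"Receptor", "Enzyme"},
    ("Motif", "isComponentOf"): {"Protein"},
}


def compose(concept, role, filler):
    """Build a composite concept expression, but only if it is sanctioned."""
    allowed = SANCTIONS.get((concept, role), set())
    if filler not in allowed:
        raise ValueError(f"{concept} {role} {filler} is not sanctioned")
    return f"{concept} which {role} {filler}"


print(compose("Protein", "hasFunction", "Receptor"))
```

An unsanctioned combination such as `compose("Motif", "hasFunction", "Receptor")` raises `ValueError` in this sketch, which mirrors the idea that only biologically sensible concepts can be formed.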

Biologists’ burden
- Construct a view of the meta-data
- Resolve structural and semantic differences in the information
- Locate and communicate with the sources
- Interoperate between resources
- Transformation process
… a “fragile” process … undoubtedly specialized

TAMBIS
- A prototype mediation system
  - Designed to lessen the burden as described previously
  - Service-oriented
  - Based on an extensive source-independent global ontology of molecular biology and bioinformatics
    - Represented in a Description Logic
    - Managed by a terminology server
- A mixed top-down and bottom-up iterative methodology
- Providing a single access point for biological information sources around the world

Emphasis of TAMBIS
- High transparency
- Read-only access
- Retrieval-oriented architecture
  - Efficiency and correctness
- Heterogeneity management
- Visual query interface

Features of TAMBIS
- Very rich domain ontology (1,800 biological concepts)
- Web-based…
  - Query formation
  - Ontology browsing
- Query translation and planning process
- More than GO, more than SRS

The TaO
- Aim is to capture biological and bioinformatics knowledge in a logical conceptual framework
- Constraints… or features…
  - Only biologically sensible concepts classify correctly
  - Can encompass different user views
  - Makes biological concepts and their relationships computationally accessible
- Could have used another ontology, but this one was developed concurrently for TAMBIS

The TaO

Current state of TaO
- Big model
  - Covers proteins, nucleic acids, their components, function, location, publishing
- Baby model (Baby TaO)
  - Covers only the protein subset of the big model
  - Used for the “fully functional version” of TAMBIS
- Reconciled model
  - Merged version of the big and baby TAMBIS ontologies

Brains of TAMBIS
- … Query translation and planning process
- “A concept formed as a query is resolved when its extension is retrieved”
  - Sample query: Protein which hasFunction Receptor
- Takes a query phrased in terms of the conceptual layer and converts it into an executable plan in terms of the classes and methods of the physical layer
- Plans an efficient way of executing a query, i.e., evaluates the alternative paths
- The various resources do not need to provide query language interfaces
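The translation and planning step described above can be sketched as a lookup from a conceptual (concept, role) pair to candidate physical-layer methods, followed by picking the cheapest one. Everything here is an assumption made for illustration: the `Method` record, the `MAPPING` table, the source and method names, and the cost numbers are all hypothetical and are not taken from TAMBIS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    source: str  # hypothetical wrapped source
    call: str    # hypothetical method exposed by that source's wrapper
    cost: int    # hypothetical cost estimate used by the planner


# Hypothetical mapping layer: conceptual (concept, role) -> physical methods.
MAPPING = {
    ("Protein", "hasFunction"): [
        Method("SourceB", "keyword_search", 9),
        Method("SourceA", "proteins_by_function", 5),
    ],
}


def plan(concept, role, filler):
    """Translate a conceptual query into the cheapest executable call."""
    candidates = MAPPING.get((concept, role))
    if not candidates:
        raise LookupError(f"no source can answer {concept} {role}")
    best = min(candidates, key=lambda m: m.cost)  # compare the alternative paths
    return f"{best.source}.{best.call}({filler!r})"


# The sample query from the slide: Protein which hasFunction Receptor
print(plan("Protein", "hasFunction", "Receptor"))
```

Because the plan is expressed as wrapper method calls, the underlying sources in this sketch never need to offer a query language of their own, which matches the last bullet above.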

(Definitions revisited) concept relationship

What makes TAMBIS “service-oriented”?
- Reasoning services for description logics
  - Subsumption
  - Classification
  - Satisfiability
  - Retrieval
- Sanctioned term construction
- Querying
- Terminology services
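Subsumption asks whether one concept is more general than another. A much-simplified structural version can be sketched as follows: a concept is a base term plus a set of (role, filler) restrictions, and a general concept subsumes a specific one when its base is an ancestor of (or equal to) the specific base and its restrictions are a subset of the specific concept's. The `PARENT` hierarchy and the concept encoding are assumptions for illustration; real Description Logic reasoners do considerably more.

```python
# Hypothetical told hierarchy: child -> parent (None marks the top).
PARENT = {"Macromolecule": None, "Protein": "Macromolecule", "Enzyme": "Protein"}


def ancestors(name):
    """Return the term itself followed by all of its ancestors."""
    chain = []
    while name is not None:
        chain.append(name)
        name = PARENT.get(name)
    return chain


def subsumes(general, specific):
    """True if `general` is at least as general as `specific`.

    Each concept is a (base, restrictions) pair, where restrictions is a
    frozenset of (role, filler) tuples.
    """
    g_base, g_restrictions = general
    s_base, s_restrictions = specific
    return g_base in ancestors(s_base) and g_restrictions <= s_restrictions


protein = ("Protein", frozenset())
receptor_protein = ("Protein", frozenset({("hasFunction", "Receptor")}))

print(subsumes(protein, receptor_protein))
print(subsumes(receptor_protein, protein))
```

Classification can then be thought of as running such checks to place a newly composed concept under its most specific subsumers in the hierarchy.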

(Definitions revisited) sanction subsumption

An example of subsumption

GRAIL
- A concept modelling language
- A Description Logic in the KL-ONE family…
- In this case, used to describe biological concepts
- Two major services provided:
  - Supporting transitive roles, role hierarchies, a powerful set of concept assertion axioms
  - Novel multilayered sanctioning mechanism

TAMBIS Architecture
- Three layers (“models”)
  - Physical
  - Conceptual
  - Mapping
- Five components
  - Ontology of biological terms (A)
  - Knowledge-driven query formulation interface (B)
  - Sources and Services Model linking the biological ontology with the source schemas (C)
  - Query transformation rewriting process (D)
  - Wrapper service dealing with external sources (E)

Query translation

What can you do at TAMBIS?
- Browse the ontology
- Build a query with a visual interface and reference to an ontology
- Give values to concepts (for a query)
- Identify desired concepts as results
- Bookmark your queries

Ontology browser

Specific questions for TAMBIS
- Find human homologues of yeast receptor proteins
- Find rat proteins that have a domain with a seven-propeller domain architecture
- Find the binding sites of human enzymes with zinc cofactors
… How many sources are involved per question?
… How difficult to find these answers without integration?
… For someone unfamiliar with the resources?

TAMBIS Overview
Natural language: Select motifs for antigenic human proteins that participate in apoptosis and are homologous to the lymphocyte associated receptor of death (also known as lard).
TAMBIS translation: Select patterns in the proteins that invoke an immunological response and participate in programmed cell death that are similar in their sequence of amino acids to the protein that is associated with triggering cell death in the white cells of the immune system.
Concept expression in GRAIL: Motif which <isComponentOf (Protein which <hasOrganismClassification Species FunctionsInProcess Apoptosis HasFunction Antigen isHomologousTo Protein which )>)> (Species given value “human” and ProteinName given value “lard”)

Related Work
- Closest work: Object-Protocol Model (OPM)
  - No source transparency
- SRS, Entrez, BioNavigator
  - Do not handle queries as complex
  - TAMBIS is query-based; these are click-based
- BioKleisli, DiscoveryLink
  - Middleware solutions; TAMBIS sits on top of these
- Carnot
  - General rather than detailed ontology

More current work
- DAML+OIL (new DL for TAMBIS)
  - DARPA Agent Markup Language: provides a rich set of constructs to create ontologies and to mark up information so that it is machine-readable
- CPL/BioKleisli (wrapper language) replaced by DiscoveryHub (commercial)
- GO: more complete and widely used
- Protégé OWL
  - Ontology editor for the Semantic Web
- BioMOBY, BioConductor
  - Complementary systems

Ongoing challenges to integration
- Evaluation
  - Technical efficiency
  - User usability
- Changing underlying resources
  - Resources disappear
  - Changes in popularity
  - MAINTENANCE
… Widespread acceptance and use?

References
- Goble, C.A. et al. (2001) “Transparent access to multiple bioinformatics information sources.” IBM Systems Journal 40(2), 532-551
- Baker, P.G. et al. (1999) “An ontology for bioinformatics applications.” Bioinformatics 15(6),
- Ontology definition: dli.grainger.uiuc.edu/glossary.htm
- Description Logic defn: n_Logic.htm
- TAMBIS website: