Knowledge Modeling and its Application in Life Sciences: A Tale of two ontologies Bioinformatics for Glycan Expression Integrated Technology Resource for.

Presentation transcript:

Knowledge Modeling and its Application in Life Sciences: A Tale of two ontologies
Bioinformatics for Glycan Expression, Integrated Technology Resource for Biomedical Glycomics, NCRR/NIH
Satya S. Sahoo, Chris Thomas, Amit P. Sheth, William S. York, Samir Tartir
Paper presented at the 15th International World Wide Web Conference, Edinburgh, Scotland, May 25, 2006

Outline
o Background
o Ontology Structure
o Ontology Population: Knowledge base
o Ontology Size Measures
o Applications in Semantic Bioinformatics
o Conclusions

Background: glycomics
Glycomics is the study of the structure, function and quantity of the complex carbohydrates synthesized by an organism.
Glycosylation is the addition of carbohydrates to the basic protein structure.
[Figure: folded protein structure (schematic)]

Requirements from ontologies
Storing and sharing data, plus reasoning over biological data: calls for logical rigor.
An expressive yet decidable language: OWL-DL.
Incorporation of real-world knowledge: ontology population.
Amenability to alignment with existing biomedical ontologies.

GlycO ontology
Challenge: model hundreds of thousands of complex carbohydrate entities.
However, the differences between the entities are small (e.g. just one component).
How can all the concepts be modeled while precluding redundancy, so that maintainability and scalability are ensured?
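One way to picture the redundancy problem is to define each residue type once and let every glycan refer to it. The sketch below is only an illustration under that assumption: the `Residue`, `Node`, `CANON` and `node` names are hypothetical and are not taken from GlycO, which is an OWL ontology rather than Python code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Residue:
    """A canonical residue type, defined once and shared by every glycan."""
    name: str


@dataclass
class Node:
    """One occurrence of a canonical residue inside a glycan tree."""
    residue: Residue
    linkage: str                      # e.g. "1-4"; empty for the root
    children: list = field(default_factory=list)


# Hypothetical registry of canonical residues (names follow the slides).
CANON = {n: Residue(n) for n in ("b-D-GlcpNAc", "b-D-Manp", "a-D-Manp")}


def node(name, linkage="", *children):
    """Build a tree node that points at the shared canonical residue."""
    return Node(CANON[name], linkage, list(children))


def residues_used(tree):
    """Collect the distinct canonical residues referenced by a glycan tree."""
    used = {tree.residue}
    for child in tree.children:
        used |= residues_used(child)
    return used


# Two glycans that differ by a single component reuse the same definitions.
core = node("b-D-GlcpNAc", "",
            node("b-D-GlcpNAc", "1-4", node("b-D-Manp", "1-4")))
variant = node("b-D-GlcpNAc", "",
               node("b-D-GlcpNAc", "1-4",
                    node("b-D-Manp", "1-4", node("a-D-Manp", "1-3"))))
```

Because both trees point at the same `Residue` objects, adding a new glycan adds only nodes and never a second definition of an existing residue.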

GlycoTree
[Figure: GlycoTree of D-GlcpNAc and D-Manp residues joined by (1-2), (1-3), (1-4) and (1-6) linkages. Source: N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15]

ProPreO ontology
Two aspects of glycoproteomics:
o What is it? Identification
o How much of it is there? Quantification
The data generation process, instrumental parameters and formats are heterogeneous.
Data and process provenance are needed: ontology-mediated provenance.
Hence, ProPreO models both the glycoproteomics experimental process and the attendant data.

Ontology-mediated provenance: Mass Spectrometry (MS) Data
[Figure: ms/ms peaklist data labeled with parent ion m/z, parent ion charge, parent ion abundance, fragment ion m/z and fragment ion abundance]

Ontology-mediated provenance: Semantically Annotated MS Data
<parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
The attribute values are ontological concepts.
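As a rough illustration of how such an annotation could be read back, the sketch below parses the element with Python's standard `xml.etree.ElementTree`. It assumes the attribute values are quoted so that the element is well-formed XML; the `ontology_annotations` helper is a hypothetical name, and nothing here reflects the project's actual tooling.

```python
import xml.etree.ElementTree as ET

# The slide's element, with attribute values quoted so it parses as XML.
ANNOTATED = (
    '<parameter '
    'instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" '
    'mode="ms-ms"/>'
)


def ontology_annotations(xml_text):
    """Return the attribute -> concept-name mapping of one annotated element."""
    element = ET.fromstring(xml_text)
    return dict(element.attrib)


annotations = ontology_annotations(ANNOTATED)
for attribute, concept in annotations.items():
    print(f"{attribute} is annotated with concept {concept}")
```

Each value could then be looked up as a class name in the ontology, which is what makes the provenance machine-interpretable rather than free text.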

Compatibility with existing biomedical ontologies
Top-level classes are modeled according to the Basic Formal Ontology (BFO) approach.
A taxonomy of relationships and multiple restrictions per class provide accuracy.
Hence, both GlycO and ProPreO are compatible with ontologies that follow the BFO approach.
Alignment with the ontologies listed at Open Biomedical Ontologies (OBO) is being explored.

GlycO population
Multiple data sources are used in populating the ontology:
o KEGG (Kyoto Encyclopedia of Genes and Genomes)
o SWEETDB
o CARBANK Database
Each data source has a different schema for storing data.
There is significant overlap of instances across the data sources.
Hence, entity disambiguation and a common representational format are needed.

GlycO population
[][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] {[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}}
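The bracketed string on this slide reads as a nested `[linkage][residue]{children}` notation. A minimal sketch of turning it into a tree follows; the grammar is inferred from this one example only, and `tokenize`, `parse_node`, `parse_glycan` and `count_residues` are hypothetical helper names, not part of the project's population pipeline.

```python
import re

# A token is a bracketed field such as [(4+1)] or [Asn], or a single brace.
TOKEN = re.compile(r"\s*(\[[^\]]*\]|\{|\})")

SLIDE_GLYCAN = (
    "[][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] "
    "{[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}"
    "[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}}"
)


def tokenize(text):
    """Split the linear notation into bracketed fields and braces."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_node(tokens, i):
    """Parse one [linkage][residue]{children} unit starting at tokens[i]."""
    if (i + 2 >= len(tokens)
            or not tokens[i].startswith("[")
            or not tokens[i + 1].startswith("[")
            or tokens[i + 2] != "{"):
        raise ValueError(f"expected [linkage][residue]{{ at token {i}")
    node = {"linkage": tokens[i][1:-1],
            "residue": tokens[i + 1][1:-1],
            "children": []}
    i += 3
    while i < len(tokens) and tokens[i] != "}":
        child, i = parse_node(tokens, i)
        node["children"].append(child)
    if i >= len(tokens):
        raise ValueError("missing closing brace")
    return node, i + 1  # skip the closing brace


def parse_glycan(text):
    """Parse a whole linear string into a nested dict tree."""
    tokens = tokenize(text)
    tree, end = parse_node(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing tokens after the root node")
    return tree


def count_residues(node):
    """Count the nodes in a parsed tree, root included."""
    return 1 + sum(count_residues(c) for c in node["children"])


tree = parse_glycan(SLIDE_GLYCAN)
```

Once every source's records are in one tree form like this, comparing two entries for equality becomes a structural comparison rather than a string comparison, which is one plausible basis for entity disambiguation.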

GlycO population

ProPreO population: transformation to RDF
[Figure: Scientific Data, Computational Methods, Ontology instances]

ProPreO population: transformation to RDF
[Figure: data-flow diagram from Scientific Data through Computational Methods to RDF, with a key distinguishing the protein path from the peptide path]
Scientific data: Protein Data (amino-acid sequence).
Computational methods: Calculate Chemical Mass; Calculate Monoisotopic Mass; Determine N-glycosylation Consensus; Extract Peptide Amino-acid Sequence from Protein Amino-acid Sequence.
RDF: Chemical Mass RDF; Monoisotopic Mass RDF; Amino-acid Sequence RDF.
Protein RDF: chemical mass, monoisotopic mass, amino-acid sequence, n-glycosylation consensus.
Peptide RDF: chemical mass, monoisotopic mass, amino-acid sequence, n-glycosylation consensus, parent protein.
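Two of the computational methods named in this diagram can be sketched briefly. The code below is an illustration under stated assumptions, not the project's implementation: it uses commonly tabulated monoisotopic residue masses (worth checking against a reference table before any real use), takes the N-glycosylation consensus to be the usual N-X-S/T sequon with X not proline, and the function names are hypothetical. The average ("chemical") mass calculation is omitted.

```python
import re

# Monoisotopic residue masses in daltons (standard tabulated values,
# reproduced here as an assumption rather than taken from the slides).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def monoisotopic_mass(sequence):
    """Sum the residue masses of an amino-acid sequence and add one water."""
    return sum(MONO[aa] for aa in sequence) + WATER


def n_glycosylation_sites(sequence):
    """Return 0-based positions of N in every N-X-S/T motif (X is not P)."""
    return [match.start() for match in SEQUON.finditer(sequence)]


peptide = "ANGTNPSNKS"  # made-up sequence for illustration
sites = n_glycosylation_sites(peptide)
mass = monoisotopic_mass(peptide)
```

In the diagram's terms, `mass` and `sites` would become property values on a Peptide RDF instance alongside its amino-acid sequence and parent protein.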

Measures of ontology size

Measure                          GlycO    ProPreO
Classes                          …        …
Properties (datatype & object)   82       32
Property restrictions            …        …
Instances                        …        … million
Assertions                       19,…     … million

Biological pathways
Glycan structure and function
Pathways do not need to be explicitly defined in GlycO. The residue, glycan, enzyme and reaction descriptions contain all the knowledge necessary to infer pathways.

Zooming in a little…
The N-glycan with KEGG ID … is the substrate of reaction R05987, which is catalyzed by an enzyme of the class EC …. The product of this reaction is the glycan with KEGG ID ….
[Figure: Reaction R05987, catalyzed by enzyme, adds_glycosyl_residue N-glycan_b-D-GlcpNAc_13]
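The idea that pathways can be inferred rather than stored can be sketched by chaining reactions whose product is another reaction's substrate. This is only an illustration in plain Python, not the reasoning GlycO performs: apart from the reaction ID R05987, every identifier below (`glycan_A`, `R_hypo_2`, `enzyme_1` and so on) is a made-up placeholder, and `infer_pathways` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical reaction descriptions; only the ID R05987 comes from the slide.
REACTIONS = [
    {"id": "R05987", "substrate": "glycan_A", "product": "glycan_B",
     "enzyme": "enzyme_1"},
    {"id": "R_hypo_2", "substrate": "glycan_B", "product": "glycan_C",
     "enzyme": "enzyme_2"},
    {"id": "R_hypo_3", "substrate": "glycan_B", "product": "glycan_D",
     "enzyme": "enzyme_3"},
]


def infer_pathways(reactions, start):
    """List reaction chains from `start`, linking products to substrates."""
    by_substrate = defaultdict(list)
    for reaction in reactions:
        by_substrate[reaction["substrate"]].append(reaction)

    pathways = []

    def walk(glycan, path, seen):
        # Follow only reactions that lead to a glycan not yet on this path.
        next_steps = [r for r in by_substrate.get(glycan, [])
                      if r["product"] not in seen]
        if not next_steps:
            if path:
                pathways.append(path)
            return
        for reaction in next_steps:
            walk(reaction["product"], path + [reaction["id"]],
                 seen | {reaction["product"]})

    walk(start, [], {start})
    return pathways


pathways = infer_pathways(REACTIONS, "glycan_A")
```

No pathway object appears in the input; each chain emerges from the substrate and product fields alone, which mirrors the claim that the reaction descriptions already contain the knowledge needed.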

Semantic Web Process to incorporate provenance
[Figure: workflow diagram]
Process steps: Biological Sample Analysis by MS/MS; Raw Data to Standard Format Data; Pre-process; DB Search (Mascot/Sequest); Results Post-process (ProValt).
Data: Raw Data; Standard Format Data; Filtered Data; Search Results; Final Output.
Other labels: Agent; Storage; Biological Information; Semantic Annotation Applications; O and I markers between the steps.

Overview: integrated semantic information system
Formalized domain knowledge is in ontologies.
Data is annotated using concepts from the ontologies.
Semantic annotations enable identification and extraction of relevant information.
Relationships allow discovery of knowledge that is implicit in the data.

Conclusions
GlycO uses simple canonical entities to build complex structures, thereby avoiding redundancy and ensuring maintainability and scalability.
ProPreO is the first comprehensive ontology for data and process provenance in glycoproteomics.
A Web process for entity disambiguation and a common representational format populated the ontology from disparate data sources.
The two ontologies are among the largest populated ontologies in the life sciences.

Data, ontologies and more publications are available at the Biomedical Glycomics project web site:
Thank You