University of Michigan Medical School 1 Towards a Semantic Web application: Ontology-driven ortholog clustering analysis Yu Lin, Zuoshuang Xiang, Yongqun.

Slides:



Advertisements
Similar presentations
Contingency Tables Chapters Seven, Sixteen, and Eighteen Chapter Seven –Definition of Contingency Tables –Basic Statistics –SPSS program (Crosstabulation)
Advertisements

Asking translational research questions using ontology enrichment analysis Nigam Shah
Standardizing Metadata Associated with NIAID Genome Sequencing Center Projects Richard H. Scheuermann, Ph.D. Department of Pathology Division of Biomedical.
Gene Ontology John Pinney
Systems Biology Existing and future genome sequencing projects and the follow-on structural and functional analysis of complete genomes will produce an.
Structural Biology and Biocomputing Programme 1 Osvaldo Graña, CNIO Distributed Annotation System (DAS) part I Osvaldo Graña VIII.
Zuoshuang “Allen” Xiang, Yongqun “Oliver” He University of Michigan Medical School 12/08/2010.
Using Gene Ontology Models and Tests Mark Reimers, NCI.
He Lab Overview: Systems Biology Studies of Host-pathogen Interactions and Vaccines (Workshop on Data, Text, Web & Social Network Mining) Yongqun “Oliver”
1 Using Gene Ontology. 2 Assigning (or Hypothesizing About) Biological Meaning to Clusters What do you want to be able to to? –Identify over-represented.
1 Introduction to (Geo)Ontology Barry Smith
FOG: High-Resolution Fungal Orthologous Groups René van der Heijden Project 5.10: Comparative genomics for the prediction of protein function and pathways.
Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
 2 Outline  Review of major computational approaches to facilitate biological interpretation of  high-throughput microarray  and RNA-Seq experiments.
>>> Korean BioInformation Center >>> KRIBB Korea Research institute of Bioscience and Biotechnology GS2PATH: Linking Gene Ontology and Pathways Jin Ok.
Standardizing Metadata Associated with NIAID Genome Sequencing Center Projects and their Implementation in NIAID Bioinformatics Resource Centers Richard.
Enriching the Ontology for Biomedical Investigations (OBI) to Improve Its Suitability for Web Service Annotations Chaitanya Guttula, Alok Dhamanaskar,
Ontology Alignment/Matching Prafulla Palwe. Agenda ► Introduction  Being serious about the semantic web  Living with heterogeneity  Heterogeneity problem.
Functional Linkages between Proteins. Introduction Piles of Information Flakes of Knowledge AGCATCCGACTAGCATCAGCTAGCAGCAGA CTCACGATGTGACTGCATGCGTCATTATCTA.
Computational Biology and Informatics Laboratory Development of an Application Ontology for Beta Cell Genomics Based On the Ontology for Biomedical Investigations.
Metadata management and statistical business process at Statistics Estonia Work Session on Statistical Metadata (Geneva, Switzerland 8-10 May 2013) Kaja.
GO and OBO: an introduction. Jane Lomax EMBL-EBI What is the Gene Ontology? What is OBO? OBO-Edit demo & practical What is the Gene Ontology? What is.
EGAN: Exploratory Gene Association Networks by Jesse Paquette Biostatistics and Computational Biology Core Helen Diller Family Comprehensive Cancer Center.
Cells of E. coli bacteria Allow cells to respond to changes in their environment Only make proteins when they are needed.
Networks and Interactions Boo Virk v1.0.
Outline Quick review of GS Current problems with GS Our solutions Future work Discussion …
Open Biomedical Ontologies. Open Biomedical Ontologies (OBO) An umbrella project for grouping different ontologies in biological/medical field –a repository.
Baba Piprani (SICOM Canada) Robert Henkel (Transport Canada)
Fission Yeast Computing Workshop -1- Searching, querying, browsing downloading and analysing data using PomBase Basic PomBase Features Gene Page Overview.
1 MFI-5: Metamodel for Process models registration HE Keqing, WANG Chong State Key Lab. Of Software Engineering, Wuhan University
Contributions of the Vaccine Ontology (VO) to Immunology Research and Public Health (Buffalo Presentation, 6/11/2012)
A COMPREHENSIVE GENE REGULATORY NETWORK FOR THE DIAUXIC SHIFT IN SACCHAROMYCES CEREVISIAE GEISTLINGER, L., CSABA, G., DIRMEIER, S., KÜFFNER, R., AND ZIMMER,
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
Cell Signaling Ontology Takako Takai-Igarashi and Toshihisa Takagi Human Genome Center, Institute of Medical Science, University of Tokyo.
Sunday, July 22, 2012 Plan Areas of coverage: high-level neurological system process, inc. sensory perception, sensory processing, cognition transmission.
Integrating the Cell Cycle Ontology with the Mouse Genome Database David R. Smith Mary Dolan Dr. Judith Blake.
The Functional Genomics Experiment Object Model (FuGE) Andrew Jones, School of Computer Science, University of Manchester MGED Society.
7 Systems Analysis and Design in a Changing World, Fifth Edition.
DAVID R. SMITH DR. MARY DOLAN DR. JUDITH BLAKE Integrating the Cell Cycle Ontology with the Mouse Genome Database.
Protein and RNA Families
Core 2: Bioinformatics NCBO-Berkeley. Core 2 Specific Aims 1.Apply ontologies  Software toolkit for describing and classifying data 2.Capture, manage,
What is an Ontology? A representation of knowledge in a domain In theory Thomas Gruber (1993) “An ontology is a formal, explicit specification of a shared.
Statistical Testing with Genes Saurabh Sinha CS 466.
ESIP Semantic Web Products and Services ‘triples’ “tutorial” aka sausage making ESIP SW Cluster, Jan ed.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
1 Class exercise II: Use Case Implementation Deborah McGuinness and Peter Fox CSCI Week 8, October 20, 2008.
GO enrichment and GOrilla
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
The views expressed herein are those of the author and should not be attributed to the IMF, its Executive Board, or its management Meeting the Future Demands.
Importing KEGG pathway and mapping custom node graphics on Cytoscape Kozo Nishida Keiichiro Ono Cytoscape retreat 2010 University of Michigan Jul 18, 2010.
Preliminary Exploratory Data Analysis for Ruth’s G2P Workflow Bernice Rogowitz 4/30/2010.
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
Biocomputational Languages December 1, 2011 Greg Antell & Khoa Nguyen.
© Tata Consultancy Services ltd.12 June Metadata and Data Standards Levels of Metadata C. Anantaram Innovation Lab.
METADATA MANAGEMENT AT ISTAT: CONCEPTUAL FOUNDATIONS AND TOOLS Istituto Nazionale di Statistica ITALY.
` Comparison of Gene Ontology Term Annotations Between E.coli K12 Databases REDDYSAILAJA MARPURI WESTERN KENTUCKY UNIVERSITY.
Networks and Interactions
Development of an interactive pipeline for Genome wide association analysis Falola Damilare & Adigun Taiwo – Covenant University Bioinformatics research.
gene-to-gene relationships & networks
GSEA-Pro Tutorial Anne de Jong University of Groningen.
CCNT Lab of Zhejiang University
Stanford Medical Informatics
Mental Functioning and the Gene Ontology
Statistical Testing with Genes
Department of Genetics • Stanford University School of Medicine
Genome Annotation Continued
Karthi Sivaraman Department of Molecular and Microbiology,
OBI – Standard Semantic
Business Process Management and Semantic Technologies
Statistical Testing with Genes
Presentation transcript:

University of Michigan Medical School 1 Towards a Semantic Web application: Ontology-driven ortholog clustering analysis Yu Lin, Zuoshuang Xiang, Yongqun He

University of Michigan Medical School2 Outline ► Background of COG (Clusters of Orthologous Groups ) database ► COG-based gene set enrichment analysis ► COG Analysis Ontology (CAO) ► OntoCOG, the semantic web application for COG enrichment analysis

University of Michigan Medical School3 Ortholog & COG database ► ortholog : Orthologs are genes in different species that have evolved from a common ancestral gene by speciation. Orthologs usually share the same functions in the course of evolution. ► COG database: 1) collections of orthologs 2) clusters orthologs to functional groups. ► Entry in COG has COG ID, or may have a functional category assignment.

COG vs. GO ► ► Same: Classified categories with gene product assigned, provide gene function annotation and classification. ► ► Different:   Categories   Species: GO: model animals; COG: 66 genomes. (COG covers more bacteria.)   Only Schizosaccharomyces pombe (fission yeast), Saccharomyces cerevisiae (baker's yeast) and E. coli, have both COG and GO annotations.   In Brucella, only one gene BMEI0467 in B. melitensis has been annotated both in GO and COG. GO: : protein homodimerization activity COG0408: Coproporphyrinogen III oxidase (Coenzyme transport and metabolism H)protein homodimerization activity

University of Michigan Medical School5 COG enrichment analysis Given list Not given list Total catAqm-qM Not catA k-qt-m-(k-q)t-m totalKt-kT Fisher’s Exact Test Contingency table COG enrichment analysis is to find out the statistical significance of the distribution of the data, particularly, the p-value to test whether COG category catA annotated protein q is enriched (unevenly distributed) among the given protein list t. Given a list of k COG annotated proteins with a total of t proteins, for a given COG category A, there are q proteins within k and m proteins within t associated with it.

University of Michigan Medical School 6 A lot of GO enrichment analysis services are available, but not the COG enrichment analysis service

University of Michigan Medical School7 Design of OntoCOG OntoCOG : a Semantic Web service application for COG enrichment analysis. Input data: a list of protein defined by user for COG enrichment analysis Output data: proteins grouped by COG category with a p-value in OWL format. CAO (COG Analysis Ontology) supported

University of Michigan Medical School8 COG Analysis Ontology) COG Analysis Ontology (CAO) ► ► Scope 1) ontology-based software/service design; 2) supporting data integration and exchange in OWL format. ► ► Domain statistical analysis protein’s COG annotation

Design of CAO CAO includes models for major components of the OntoCOG application: input data transformation, Fisher’s exact test analysis, and minimum information of output data. Terms in yellow, light purple, and green boxes denote processes, generically dependent continuants, and independent continuants, respectively.

University of Michigan Medical School10 COG Analysis Ontology) : Core Terms COG Analysis Ontology (CAO) : Core Terms

Information captured by CAO ► ► The given list ► ► The proteins grouped by COG categories ► ► The size of each category in the given list * ► ► The p-value of each category in the given list * It captures more information than traditional COG enrichment analysis (non-SW technology supported) The traditional output of COG enrichment analysis. Format: Category: p-value; size; (* denotes p-value < 0.05 significant)

University of Michigan Medical School12 New relations in CAO ► denoted_by  describes a relation of an independent entity and a data item an independent entity “denoted_by” a data item  Not a reverse relation of “denotes” ► is_member_of has_member  Reverse relations  Describes relation of object and object aggregate

University of Michigan Medical School13 Axioms in CAO ► COG category clustered protein, user defined ≡ user defined protein and (denoted_by some COG Functional category) ► COG category E clustered protein, user defined ≡ COG category protein and (denoted_by min 1 COG Amino acid transport and metabolism) ► COG category E clustered protein group, user defined ≡ protein group and (has_member only COG category E protein)

University of Michigan Medical School14 Validation of CAO

Summaries on CAO ► ► An ontology to represent COG enrichment analysis ► ► An ontology to represent the COG enrichment analysis service : OntoCOG ► ► It is a use case of IAO (Information Artifact Ontology) and OBI (Ontology for Biomedical Investigation) ► ► It supports OntoCOG.

University of Michigan Medical School16 OntoCOG

University of Michigan Medical School17 OntoCOG analysis of Brucella virulence factors

Result

University of Michigan Medical School19 Final Conclusion ► OntoCOG provide a platform independent server for COG enrichment analysis ► CAO ontology supports the design and workflow of OntoCOG. ► OntoCOG is the first semantic web application used for such purpose. ► Future work: interface developing; expand to other statistical analysis; output data visualization.

Acknowledgement University of Michigan Medical School20 The OntoCOG project is supported by NIH grant 1R01AI People: Yu Lin Yongqun “Oliver” He Zuoshuang “Allen” Xiang Special thanks to ICBO Committee Thank Dr. Barry Smith for correcting the English in our manuscript.