Using Ontologies to Annotate Phenotypic Data Janan T. Eppig December 2008 www.informatics.jax.org Mouse Genome Informatics.

Presentation transcript:

Using Ontologies to Annotate Phenotypic Data. Janan T. Eppig, December 2008. Mouse Genome Informatics.

Human FOXN1 (forkhead box N1): T-CELL IMMUNODEFICIENCY, CONGENITAL ALOPECIA, AND NAIL DYSTROPHY. Frank J, et al. Nature 398, (1999).
Mouse Foxn1: homozygous “nude” mouse. One of 8 known phenotypic mutations in mouse for the forkhead box N1 gene.

Data Integration
Data sources:
- Primary literature
- Centers: mutagenesis, gene trap, etc.
- Data loads: GenBank, SNPs, clone collections, UniProt, RIKEN, etc.
- Electronic submissions (individual labs)
Processing, QC, and curation:
- Gather data from multiple sources
- Factor out common objects
- Assemble integrated objects

Integration is hard… not just a matter of combining data sources…
- Data from multiple sources can be of differing quality
- The same data can enter the system via various paths
- Naming conventions may or may not follow standards
- Some data sources don’t maintain unique accession numbers (or allow them to change)
- Periodic updates from data sources can cause problems if objects have disappeared (or reappear), or if objects have split in two
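One way to catch the update problems listed above is to compare each new load against the previous one. The following is a minimal sketch, assuming each load is simply a collection of accession IDs and that a list of previously retired IDs is kept. The function name and the ID strings are hypothetical and do not describe MGI's actual tooling.

```python
def diff_load(previous_ids, current_ids, retired_ids):
    """Compare two loads from the same source and flag ID changes.

    previous_ids: IDs present in the last load.
    current_ids:  IDs present in the new load.
    retired_ids:  IDs that had disappeared in some earlier load.
    """
    previous, current, retired = set(previous_ids), set(current_ids), set(retired_ids)
    return {
        # Present last time, missing now: needs curator attention.
        "disappeared": sorted(previous - current),
        # Missing last time, but known from an earlier load.
        "reappeared": sorted((current - previous) & retired),
        # Never seen before.
        "new": sorted(current - previous - retired),
    }


report = diff_load(
    previous_ids=["ACC:1", "ACC:2", "ACC:3"],
    current_ids=["ACC:1", "ACC:3", "ACC:4", "ACC:9"],
    retired_ids=["ACC:9"],
)
```

Detecting that one object has split in two would need extra information from the source (for example, a record of secondary IDs), which this sketch does not model.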

Data integration is hard
“Bucketizing” establishes types of correspondence between objects in the input sets.
- Allows immediate incorporation of 1:1 corresponding data.
- Sorts conflicting data into bins that allow prioritization for curator resolution.
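The bucketizing idea can be sketched as follows. This is an illustration under stated assumptions, not MGI's implementation: the two input sets are lists of IDs, the asserted correspondences arrive as (a, b) pairs, and the bucket names ("1:1", "1:n", "n:1", "n:m", "1:0", "0:1") are chosen here for the example.

```python
from collections import defaultdict


def bucket_name(n_a, n_b):
    """Name a bucket from the number of objects on each side."""
    if n_a > 1 and n_b > 1:
        return "n:m"

    def card(n):
        return "0" if n == 0 else "1" if n == 1 else "n"

    return f"{card(n_a)}:{card(n_b)}"


def bucketize(ids_a, ids_b, links):
    """Group linked objects and sort each group into a correspondence bucket."""
    a_to_b, b_to_a = defaultdict(set), defaultdict(set)
    for a, b in links:
        a_to_b[a].add(b)
        b_to_a[b].add(a)

    buckets = defaultdict(list)
    seen_a, seen_b = set(), set()
    for start in sorted(ids_a):
        if start in seen_a:
            continue
        # Walk the links to collect everything connected to `start`.
        comp_a, comp_b = {start}, set()
        stack = [("a", start)]
        while stack:
            side, node = stack.pop()
            if side == "a":
                for b in a_to_b.get(node, ()):
                    if b not in comp_b:
                        comp_b.add(b)
                        stack.append(("b", b))
            else:
                for a in b_to_a.get(node, ()):
                    if a not in comp_a:
                        comp_a.add(a)
                        stack.append(("a", a))
        seen_a |= comp_a
        seen_b |= comp_b
        name = bucket_name(len(comp_a), len(comp_b))
        buckets[name].append((sorted(comp_a), sorted(comp_b)))

    # Objects in the second set that nothing links to.
    for b in sorted(ids_b):
        if b not in seen_b:
            buckets["0:1"].append(([], [b]))
    return dict(buckets)


result = bucketize(
    ids_a=["A1", "A2", "A3"],
    ids_b=["B1", "B2", "B3", "B4"],
    links=[("A1", "B1"), ("A2", "B2"), ("A2", "B3")],
)
```

In this sketch the "1:1" bucket could be loaded immediately, while the other buckets would be queued for curator review.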

Annotation Pipeline
- Data Acquisition: Literature & Loads
- Object Identity: New Gene, Strain or Sequence?
- Standardizations: Controlled Vocabularies
- Data Associations: Evidence & Citation
- Integration with other bioinformatics resources: Co-curation of shared objects and concepts

Making semantic sense
Controlled vocabularies/nomenclatures, organized as lists or simple hierarchies:
- Strains
- Genes
- Alleles (phenotypic or variant)
- Classes of genetic markers
- Types of mutations
- Types of assays
- Developmental stages
- Tissues
- Clone libraries
- ES cell lines
- and more…

Semantics plus relationship data
Ontologies/structured vocabularies, organized as directed acyclic graphs (DAGs):
- Gene Ontology (GO): Molecular function, Biological process, Cellular component
- Mouse Anatomy (MA): Embryonic, Adult
- Mammalian Phenotype (MP)
- Sequence Ontology (SO)
- Trait Ontology
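What separates a DAG from a simple hierarchy is that a term may have more than one parent, while cycles are still forbidden. Here is a minimal sketch of both properties. The term names are borrowed from the MP example later in this talk, but the is_a edges between them are assumptions made for illustration and are not taken from the ontology file.

```python
from graphlib import CycleError, TopologicalSorter

# child -> set of parents (is_a). Edges are assumed for illustration only.
IS_A = {
    "tremors": {"abnormal involuntary movement", "abnormal muscle physiology"},
    "abnormal involuntary movement": {"behavior/neurological phenotype"},
    "abnormal muscle physiology": {"muscle phenotype"},
    "behavior/neurological phenotype": set(),
    "muscle phenotype": set(),
}


def is_acyclic(is_a):
    """Return True if the is_a graph has no cycles."""
    try:
        # static_order() raises CycleError when a cycle is present.
        list(TopologicalSorter(is_a).static_order())
        return True
    except CycleError:
        return False


def ancestors(term, is_a):
    """Collect every term reachable by following is_a edges upward."""
    found, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found
```

With these assumed edges, `ancestors("tremors", IS_A)` reaches both the behavior/neurological branch and the muscle branch, which a strict tree could not express.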

Vocabularies in MGI
- Vocabulary: DAGs, Definition, Synonyms (MP:1956)
- Genotype: Strain: AEJ, Alleles: bd/bd; Strain: C57BL/6, Alleles: Ppp1r3a tm1Adpt / Ppp1r3a tm1Adpt
- Terms: … Respiratory failure, Postnatal lethality, Dilated renal tubules, Growth retardation
- Annotations: Note …; J:65378 TAS, J:62648 IDA, J:65322 EE
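The pieces on this slide suggest a simple record shape: a genotype (strain plus allele pair) is linked to a vocabulary term, backed by an evidence code and a reference. The sketch below is one possible layout, not the MGI schema. The class and field names are assumptions, and the example values are placeholders rather than real annotations. Only the three evidence codes are taken from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass

# Evidence codes that appear on the slide.
EVIDENCE_CODES = {"TAS", "IDA", "EE"}


@dataclass(frozen=True)
class Genotype:
    strain: str
    alleles: str  # allele pair, e.g. "allele1/allele2"


@dataclass(frozen=True)
class Annotation:
    genotype: Genotype
    term: str        # vocabulary term ID or name
    evidence: str    # one of EVIDENCE_CODES
    reference: str   # literature reference ID
    note: str = ""

    def __post_init__(self):
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence}")


def terms_by_genotype(annotations):
    """Group annotated terms under the genotype they describe."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann.genotype].append(ann.term)
    return dict(grouped)


# Placeholder values, not real MGI data.
g = Genotype(strain="strain-X", alleles="GeneA<m1>/GeneA<m1>")
anns = [
    Annotation(g, "term-1", "TAS", "J:0000001"),
    Annotation(g, "term-2", "IDA", "J:0000002"),
]
grouped = terms_by_genotype(anns)
```

Making the records immutable keeps a genotype usable as a dictionary key, which is what the grouping step relies on.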

Common software for users to access vocabularies in MGI

Mammalian Phenotype Ontology
- Structured as DAG
- >6,250 terms covering physiological systems, behavior, survival, and development
- Available in web browser and in OBO and text formats from MGI ftp and OBO sites
- Each term linked to all annotations to the term or its children
- >133,00 annotations (genotype - MP)
Browser callouts: Synonyms; Term in context; Links to all mouse genotypes with this phenotype
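Linking each term to "all annotations to the term or its children" amounts to rolling annotations up the DAG. Below is a minimal sketch of that roll-up. The is_a edges and the genotype labels are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# child -> parents. Edges are assumed for illustration only.
IS_A = {
    "tremors": {"abnormal involuntary movement", "abnormal muscle physiology"},
    "myoclonus": {"abnormal involuntary movement"},
    "abnormal involuntary movement": {"behavior/neurological phenotype"},
    "abnormal muscle physiology": {"muscle phenotype"},
}

# term -> genotypes annotated directly to it (placeholder labels).
DIRECT = {
    "tremors": {"genotype-1", "genotype-2"},
    "myoclonus": {"genotype-2"},
    "abnormal involuntary movement": {"genotype-3"},
}


def build_children(is_a):
    """Invert the child -> parents map into parent -> children."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    return children


def descendants(term, children):
    """Collect every term below `term` in the DAG."""
    found, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def genotypes_for(term, is_a, direct):
    """Genotypes annotated to `term` or to any of its descendants."""
    terms = {term} | descendants(term, build_children(is_a))
    result = set()
    for t in terms:
        result |= direct.get(t, set())
    return result
```

Using sets means a genotype reached through two different paths, or annotated to two sibling terms, is still counted once at the parent.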

[Figure: fragment of the MP DAG showing the terms behavior/neurological phenotype, abnormal involuntary movement, abnormal reflex, opisthotonus, tremors, myoclonus, muscle phenotype, abnormal muscle physiology]

Mammalian Phenotype (MP) Ontology
…make phenotype & disease model data robust & accessible to researchers & computational biologists
- semantically consistent search methods
- integrated access to all phenotypic variation sources (single-gene, genomic mutations, engineered mutations, QTL, strains)
- data on human disease correlation
- access to mouse models from various approaches: genetic, phenotypic, computational

Developing the Mammalian Phenotype Ontology
- New terms from ongoing curation process
- Collaborative community efforts identify new terms and suggest improved organization of terms:
  - Rat Genome Database
  - Mutagenesis Centers
  - Human (NCBI)
  - OMIA (Online Mendelian Inheritance in Animals)
  - Proprietary Databases
  - Future (International Mouse Knockout Projects)
- Comparisons among ontologies (GO Process, Mouse Anatomy, FMA, Cell Type, MPath, etc.)
- Systematic review by domain experts

Making Mammalian Phenotype Ontology Work
- DAGs
- accommodate bio-specific terms
- computationally useful
- human accessible
- practical for curation
- cross-reference to other ontologies

Terms in MP

MP term | Entity | PATO Quality | MP def
microphthalmia | eye | small size | reduced average size of the eyes
hydrocephaly | cerebrospinal fluid | increased, excessive, accumulated | excessive accumulation of cerebrospinal fluid in the brain, especially the cerebral ventricles, often leading to increased brain size and other brain trauma
hydrocephaly (cont.) | brain | large size (dilated); trauma of brain observed |

Complex Examples:

id: MP: ! ocular albinism
intersection_of: PATO: ! lacking processual parts
intersection_of: inheres_in MA: ! eye
intersection_of: towards GO: ! melanin metabolic process
MP definition: absence of melanin (pigment) production in the eye with identifiable melanocytes present

id: MP: ! ventricular fibrillation
!intersection_of: PATO: ! asynchronous
!intersection_of: inheres_in CL: ! cardiac muscle cell
!intersection_of: towards GO: ! cardiac muscle contraction
!intersection_of: located_in MA: ! ventricle endocardium
!intersection_of: located_in MA: ! ventricle myocardium
MP definition: asynchronous contraction or quivering of individual cardiac muscle fibers in the ventricles
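The intersection_of lines above decompose an MP term into a PATO quality plus related entities from other ontologies. The sketch below pulls that structure out of such lines. It is an illustrative parser written for the lines as they appear in this transcript (where the numeric parts of the IDs are missing), not a full OBO-format reader. The function name and output layout are assumptions.

```python
import re

# Matches e.g. "intersection_of: inheres_in MA: ! eye".
# The relation is optional; the numeric part of the ID may be empty.
LINE = re.compile(
    r"^!?\s*intersection_of:\s*"
    r"(?:(?P<rel>\w+)\s+)?"
    r"(?P<prefix>[A-Za-z]+):(?P<local>\d*)\s*"
    r"!\s*(?P<label>.+)$"
)


def parse_logical_definition(lines):
    """Split intersection_of lines into a quality and related entities."""
    quality = None
    relations = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip id: and definition lines
        label = m["label"].strip()
        if m["rel"] is None:
            # No relation: this is the quality (genus) term.
            quality = (m["prefix"], label)
        else:
            relations.append((m["rel"], m["prefix"], label))
    return {"quality": quality, "relations": relations}


stanza = [
    "id: MP: ! ocular albinism",
    "intersection_of: PATO: ! lacking processual parts",
    "intersection_of: inheres_in MA: ! eye",
    "intersection_of: towards GO: ! melanin metabolic process",
]
parsed = parse_logical_definition(stanza)
```

Here `parsed["quality"]` holds the PATO term, and each entry in `parsed["relations"]` names the relation, the source ontology, and the term label.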

Status of Phenotype & Disease Data (Nov 2008)
- Phenotype terms in MP ontology: 6,355
- Phenotypic alleles cataloged: 21,996
  - number of genes represented: 8,225
  - targeted alleles: 13,549
  - number of genes targeted: 5,547
- Alleles with MP annotation: 19,458
- Genotypes with MP annotation: 27,…
- Total MP annotations: …,577
- Genotypes with OMIM associations: 2,…
- OMIM with associated genotypes: …
- QTLs: 4,015
- Strains: >10,500

Current QTL Display

Current QTL display

Changes planned for QTL Display
- Genome coordinates (MGI Mouse GBrowse)

Need for a trait ontology
What is measured
- Blood pressure
- % body fat
- Coat color
Annotation of
- QTL
- Strain characteristics / baseline
- Measurements
Some issues
- specificity vs broad
- synchronizing with MP
- “how much”
- cross-species?

OBO-Edit, curation tool for building ontologies

Working on Trait Ontology
- MGI, IMPC, MPD, RGD, Domestic Species (Animal QTL)
- Currently: approx terms, built initially by stripping MP
- working systematically on branches

MGI Phenotype Data Staff: Anna Anagnostopoulos, Randal P. Babiuk, Susan M. Bello, Donna L. Burkart, Howard Dene, Michelle Knowlton, Ira Lu, Hiroaki Onda, Cynthia L. Smith, Monika Tomczuk, Linda L. Washburn, Jonathan S. Beal, Kim L. Forthofer, Peter Frost

The End. NHGRI grant HG000330