PATO & Phenotypes: From model organisms to clinical medicine Suzanna Lewis September 4th, 2008 Signs, Symptoms and Findings Workshop First Steps Toward.

Presentation transcript:

PATO & Phenotypes: From model organisms to clinical medicine Suzanna Lewis September 4th, 2008 Signs, Symptoms and Findings Workshop First Steps Toward an Ontology of Clinical Phenotypes

Describing phenotype using ontologies will aid in the identification of models of disease & candidate causative genes
- GWAS: Genome Wide Association Studies
  - Any study of genetic variation across the entire human genome that is designed to identify genetic associations with observable traits (such as blood pressure or weight), or the presence or absence of a disease or condition.
- Given an identified gene, then what?

Animal disease models
[Figure: two parallel chains. Humans: mutant gene -> mutant or missing protein -> mutant phenotype (disease). Animal models: mutant gene -> mutant or missing protein -> mutant phenotype (disease model).]

Phenotype data mining = text searching?
- Text-based phenotype resources:
  - OMIM (NCBI)
  - DECIPHER (Sanger)
  - HGMD (Cardiff)
  - Disease-specific databases
  - MODs
  - PubMed

Information retrieval from text-based resources (OMIM) is not straightforward:

Query | # of records
"large bone" | 713
"enlarged bone" | 136
"big bones" | 16
"huge bones" | 4
"massive bones" | 28
"hyperplastic bones" | 8
"hyperplastic bone" | 34
"bone hyperplasia" | 122
"increased bone growth" | 543

Thanks to: M Ashburner
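The retrieval problem in the table above can be illustrated with a small sketch. Everything in it is made up for illustration: the records are not OMIM entries, and the identifier `EX:0001` and the synonym table are hypothetical placeholders, not terms from any real ontology. It only shows why exact-phrase search fragments one concept while a shared term identifier gathers it.

```python
# Toy illustration: free-text phrasing scatters one concept across many
# queries, while a shared term identifier gathers them. All data is made up.

# Hypothetical mapping from free-text phrases to one placeholder term id.
SYNONYMS = {
    "large bone": "EX:0001",
    "enlarged bone": "EX:0001",
    "big bones": "EX:0001",
    "bone hyperplasia": "EX:0001",
}

# Hypothetical free-text records (not real OMIM entries).
RECORDS = [
    "patient presents with enlarged bone in the femur",
    "big bones noted at birth",
    "bone hyperplasia of the skull",
    "normal skeletal development",
]


def text_search(phrase, records):
    """Return the records that contain the exact phrase (case-insensitive)."""
    return [r for r in records if phrase.lower() in r.lower()]


def annotate(record):
    """Return the set of term ids whose synonyms occur in the record."""
    lowered = record.lower()
    return {term for phrase, term in SYNONYMS.items() if phrase in lowered}


def term_search(term_id, records):
    """Return the records annotated with the given term id."""
    return [r for r in records if term_id in annotate(r)]


# Each phrase finds a different slice of the records...
hits_big = text_search("big bones", RECORDS)
hits_large = text_search("large bone", RECORDS)
# ...while the term id retrieves every record about the same concept.
hits_term = term_search("EX:0001", RECORDS)
```

In practice the mapping from text to terms is the curation step described in the rest of the talk; the dictionary lookup here is only a stand-in for it.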

Even if we can find what we are looking for in one organism, how can we associate that with phenotypes observed in different organisms? Methods to link phenotypic descriptions of human diseases to animal models currently don’t exist.

Goal: Turn text-based phenotypes into ontology-based computable annotations
- Define a model for representing phenotypes

[Figure panels: SHH -/+, SHH -/-, shh -/+, shh -/-]

Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Entities (ZFIN): eye, midface, kidney
Attributes (PATO): hypoteloric, hypoplastic, hypertrophied

Phenotype (clinical sign) = entity + attribute
Entity, drawn from:
- Anatomical ontology
- Cell & tissue ontology
- Developmental ontology
- Gene ontology (biological process, molecular function, cellular component)
Attribute, drawn from:
- PATO (phenotype and trait ontology)

Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Syndrome (disease, package) = P1 + P2 + P3 = holoprosencephaly
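As a data structure, the entity + attribute model might be sketched as follows. This is a minimal illustration, not the representation used by PATO or any tool named in the talk: the class name `Phenotype` is an assumption, and plain labels stand in for ontology term identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    """One entity + attribute statement (labels stand in for term ids)."""
    entity: str     # term from an anatomy, cell, or process ontology
    attribute: str  # term from a quality ontology such as PATO

    def __str__(self):
        return f"{self.entity} + {self.attribute}"


p1 = Phenotype("eye", "hypoteloric")
p2 = Phenotype("midface", "hypoplastic")
p3 = Phenotype("kidney", "hypertrophied")

# A syndrome is treated here as a package (set) of phenotype statements.
holoprosencephaly = frozenset({p1, p2, p3})

for phenotype in sorted(holoprosencephaly, key=str):
    print(phenotype)
```

Because the statements are hashable values, two curators who record the same entity and attribute produce equal objects, which is what makes the annotations comparable by machine.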

Phenotype annotation model
[Figure: components of an annotation. Assertion: genetic, environment, entity, quality, qualifier, relationship, units. Source: evidence. Attribution: who makes the assertion. Properties: when, what organization.]

OBD and annotations
[Figure: annotation lifecycle. An investigator (agent: human or computer; community or expert) runs an experiment/investigation on a bio-entity and publishes the observation. A direct annotation (for example Shh - absence of aorta, or Shh + heart development) is the computational representation of that observation: subject, relation (influences, participates in), object. Annotations are submitted to and consumed from local databases with multiple schemas, and support query and meta-analysis. Example source: Dev Biol 2005 Jul 15;283(2): "Sonic hedgehog is required for cardiac outflow tract and neural crest cell development".]

Goal: Turn text-based phenotypes into ontology-based computable annotations
- Define a model for representing phenotypes
- Develop and extend requisite ontologies
  - For the entities being described: anatomies, processes, …

It is critical that ontologies are developed cooperatively so that their classification strategies augment one another. Building a suite of orthogonal interoperable reference (evidence based) ontologies in the biomedical domain. Truth springs from arguments amongst friends. (David Hume)

Ontologies by RELATION TO TIME (continuant: independent or dependent; occurrent) and GRANULARITY

ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)

Requisite ontologies
- An ontology of qualities (PATO)
- Organism specific anatomies
- A controlled vocabulary of homologous and analogous anatomical structures (Uberon)
- Gene Ontology
- Cell Types

Goal: Turn text-based phenotypes into ontology-based computable annotations
- Define a model for representing phenotypes
- Develop and extend requisite ontologies
  - For the entities being described: anatomies, processes, …
- Develop an intuitive annotation environment for rigorously capturing phenotypes ("semantic authoring")

Phenote: Simple software for annotating using ontologies
- Provide tool for ontology-based annotation
- Standardized model to record annotations for increased compatibility of data between disparate communities
- Simple & intuitive user interface
  - (especially for users that don't know/care about what an ontology is)
- Easy-to-configure for different user-communities
- Pluggable architecture for external applications to interface/embed in application
- Provide interfaces with external SOAP and REST services for streamlined workflow (OBD, NCBI, EBI, etc.)

Ontologies can be utilized from various resources (CVS, BioPortal, external site, local file) in OWL and OBO format

Phenote tour

Editor

Refining terms on-the-spot
- Post-composition:
  - Join together 2 (or more) terms for specificity:
    - Apoptosis of neuron in skin (GO, CL, FMA)
    - S-phase of colon cancer cell (GO, CL)
    - Aster of human spermatocyte (GO, FMA)
  - Combine terms from different ontologies
  - Increase "information content" of an annotation
- Pre-composed:
  - Have decomposed definitions of ~2/3rds of MP terms available to incorporate mouse data
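Post-composition can be sketched as building an expression from a base term plus relation-qualified terms from other ontologies. The sketch below is illustrative only: the class names, the relation names `occurs_in` and `part_of`, and the `^relation(term)` rendering are assumptions chosen for the example, and labels are used in place of real term identifiers.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    ontology: str  # e.g. "GO", "CL", "FMA"
    label: str


@dataclass(frozen=True)
class PostComposed:
    """A base (genus) term refined by (relation, filler) pairs."""
    genus: Term
    differentia: Tuple[Tuple[str, "Expr"], ...]


Expr = Union[Term, PostComposed]


def render(expr: Expr) -> str:
    """Render an expression as text (assumed, illustrative syntax)."""
    if isinstance(expr, Term):
        return f"{expr.ontology}:{expr.label}"
    refinements = "".join(
        f"^{relation}({render(filler)})" for relation, filler in expr.differentia
    )
    return render(expr.genus) + refinements


def ontologies_used(expr: Expr) -> set:
    """Collect every ontology that contributes a term to the expression."""
    if isinstance(expr, Term):
        return {expr.ontology}
    used = ontologies_used(expr.genus)
    for _, filler in expr.differentia:
        used |= ontologies_used(filler)
    return used


# "Apoptosis of neuron in skin", composed from GO, CL and FMA terms.
neuron_in_skin = PostComposed(Term("CL", "neuron"), (("part_of", Term("FMA", "skin")),))
apoptosis_of_neuron_in_skin = PostComposed(
    Term("GO", "apoptosis"), (("occurs_in", neuron_in_skin),)
)

print(render(apoptosis_of_neuron_in_skin))
print(sorted(ontologies_used(apoptosis_of_neuron_in_skin)))
```

The trade-off noted on the slide shows up here: no single ontology has to pre-coin a term for every combination, but each composed expression is only as consistent as the curator who builds it.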

Term Info Browser

Annotation Table

Retrieve data from NCBI: OMIM, PubMed, … (SOAP plug-in)

Graphical Viewer

Goal: Turn text-based phenotypes into ontology-based computable annotations
- Define a model for representing phenotypes
- Develop and extend requisite ontologies
  - For the entities being described: anatomies, processes, …
- Develop an intuitive annotation environment for rigorously capturing phenotypes ("semantic authoring")
- Develop a set of guidelines for biocurators
- Annotate mutant phenotypes (OMIM and models)

General Annotation Standards
- Remarkable normality
- Absence
- Relative qualities (what does "small" mean?)
- Rates/frequencies
  - does it inhere in the heart or a process?
- Homeotic transformation
- Phenotypes specific to a stage or temporal duration

Testing the methodology
- Annotated 11 gene-linked human diseases described in OMIM, and their homologs in zebrafish and fruitfly.

Gene | Disease
ATP2A1 | BRODY MYOPATHY
EPB41 | ELLIPTOCYTOSIS
EXT2 | MULTIPLE EXOSTOSES
EYA1 | EYES ABSENT
FECH | PROTOPORPHYRIA
PAX2 | RENAL-COLOBOMA SYNDROME
SHH | HOLOPROSENCEPHALY
SOX9 | CAMPOMELIC DYSPLASIA
SOX10 | PERIPHERAL DEMYELINATING NEUROPATHY
TNNT2 | FAMILIAL HYPERTROPHIC CARDIOMYOPATHY
TTN | MUSCULAR DYSTROPHY

An OMIM Record

Goal: Turn text-based phenotypes into ontology-based computable annotations
- Define a model for representing phenotypes
- Develop and extend requisite ontologies
  - For the entities being described: anatomies, processes, …
- Develop an intuitive annotation environment for rigorously capturing phenotypes ("semantic authoring")
- Develop a set of guidelines for biocurators
- Annotate mutant phenotypes (OMIM and models)
- Collect & store annotations in a common resource (OBD) and make these broadly available

4355 genes and genotypes in OBD entity-quality annotations in OBD

OBD model: Requirements
- Generic
  - We can't define a rigid schema for all of biomedicine
  - Let the domain ontologies do the modeling of the domain
- Expressive
  - Use cases vary from simple 'tagging' to complex descriptions of biological phenomena
- Formal semantics
  - Amenable to logical reasoning
  - First Order Logic and/or OWL 1.1
- Standards-compatible
  - Integratable with semantic web

OBD Model: overview
- Graph-based: nodes and links
  - Nodes: Classes, instances, relations
  - Links: Relation instances
    - Connect subject and object via relation plus additional properties
  - Annotations: Posited links with attribution / evidence
- Equivalent expressivity as RDF and OWL
  - Links aka axioms and facts in OWL
  - Attributed links:
    - Named graphs
    - Reification
    - N-ary relation pattern
- Supports construction of complex descriptions through graph model
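The node-and-link model above can be pictured with a small sketch in which an annotation is simply a link that carries attribution and evidence. This is not the OBD schema or API; the class names, field names, relation names, and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """A relation instance connecting a subject node to a target node."""
    subject: str
    relation: str
    target: str
    # Optional properties; when attribution is present the link is "posited".
    source: Optional[str] = None
    asserted_by: Optional[str] = None
    evidence: Optional[str] = None

    def is_annotation(self) -> bool:
        """An annotation is a link that someone has asserted."""
        return self.asserted_by is not None


class Graph:
    """A minimal in-memory graph: just a list of links."""

    def __init__(self) -> None:
        self.links: List[Link] = []

    def add(self, link: Link) -> None:
        self.links.append(link)

    def annotations_about(self, subject: str) -> List[Link]:
        return [l for l in self.links if l.subject == subject and l.is_annotation()]


graph = Graph()
# A plain ontology link (toy example, no attribution).
graph.add(Link("hypoplastic", "is_a", "quality"))
# An annotation: the same kind of link, plus who asserted it and on what basis.
graph.add(
    Link(
        "example-genotype",
        "has_phenotype",
        "midface + hypoplastic",
        source="example-publication",
        asserted_by="example-curator",
        evidence="example-evidence-code",
    )
)

for link in graph.annotations_about("example-genotype"):
    print(link.relation, link.target, "asserted by", link.asserted_by)
```

Storing attribution as properties on the link is only one of the three options the slide lists (named graphs, reification, n-ary relations); it is used here because it is the shortest to write down.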

OBD Dataflow

Example of Annotation in OBD
[Figure key: post-composition of phenotype classes (PATO EQ formalism); post-composition of complex anatomical entity descriptions]

OBD Architecture
- Two stacks
  1. Semantic web stack
     - Built using Sesame triplestore + OWLIM
     - Future iterations: Science Commons Virtuoso
  2. OBD-SQL stack
     - Current focus
     - Traditional enterprise architecture
     - Plugs into Semantic Web stack via D2RQ

OBD-SQL Stack
- Alpha version of API implemented
- Test clients access via SOAP
- Phenote currently accesses via org.obo model & JDBC
- Wraps org.obo model and OBD schema
- Share relational abstraction layer
- org.obo wraps OWLAPI
- Phenote currently connects via JDBC connectivity in org.obo

Goal: Turn text-based phenotypes into ontology-based computable annotations
- Define a model for representing phenotypes
- Develop and extend requisite ontologies
  - For the entities being described: anatomies, processes, …
- Develop an intuitive annotation environment for rigorously capturing phenotypes ("semantic authoring")
- Develop a set of guidelines for biocurators
- Annotate mutant phenotypes (OMIM and models)
- Collect & store annotations in a common resource (OBD) and make these broadly available
- Develop tools & resources for mining data for novel discovery
  - Developed a similarity search algorithm to identify genotypes with similar phenotypes

sox9 mutations curated in PATO syntax

Human, SOX9 (Campomelic dysplasia) | Zebrafish, sox9a (jellyfish)
Male sex determination: disrupted |
Scapula: hypoplastic | Scapulocoracoid: aplastic
Lower jaw: decreased size | Cranial cartilage: hypoplastic
Heart: malformed or edematous | Heart: edematous
Phalanges: decreased length | Pectoral fin: decreased length
Long bones: bowed | Cartilage development: disrupted

Average annotation consistency
[Figure: number of annotations (total annotations, similar annotations) and congruence for EYA1, SOX10, SOX9, PAX]

Reasoning over phenotype descriptions recorded with ontologies provides linkages in annotations.

Ontologies and reasoning can reveal similarities in phenotype annotations.

A zebrafish shh similar-phenotype query returns known hedgehog pathway members

Gene | Similarity | Citation | Role in hedgehog pathway
smo | 0.445 | Ochi, et al., 2006 | Membrane protein binds shh receptor ptc1
disp1 | 0.444 | Nakano, et al | Regulates secretion of lipid modified shh from midline
prdm1a | 0.43 | Roy, et al., 2001 | Zinc-finger domain transcription factor, downstream target of shh signaling
hdac1 | 0.427 | Cunliffe and Casaccia-Bonnefil, 2006 | Transcriptional regulator required for shh mediated expression of olig2 in ventral hindbrain
scube | | Hollway et al., 2006 | May act during shh signal transduction at the plasma membrane
wnt | | Mullor et al., 2001 | Extracellular cysteine rich glycoprotein required for gli2/3 induced mesoderm development
gli2a | 0.348 | Karlstrom, et al., 1999 | Zinc finger transcription factor target of shh signaling
bmp2b | 0.303 | Ke et al., 2008 | Downstream target of gli2 gene repression
gli1 | 0.303 | Karlstrom, et al., 2003 | Zinc finger transcription factor target of shh signaling
ndr2 | 0.289 | Muller, et al., 2000 | TGFbeta family member upstream of hedgehog signaling in the ventral neural tube
hhip | 0.265 | Ochi et al., 2006 | Binds shh in membrane and modulates interaction with smo

OBD similarity query
- A computational search that enables comparison of phenotypes within and across species.
- Given a set of phenotype annotations recorded for a mutant allele we can identify other alleles in the same gene.
- We can identify other known pathway members in the same species and known gene orthologs in other species simply by comparing phenotypes alone.
- This annotation and search method provides a novel means for laboratory researchers to identify potential gene candidates participating in regulatory and/or disease pathways.
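The idea of ranking genes by phenotype similarity can be sketched as follows. The slides do not specify the scoring metric, so this example swaps in the Jaccard index over annotation sets that have been expanded with their ontology ancestors; that choice, the toy `PARENTS` hierarchy, and the gene names and profiles are all assumptions made for illustration, not the OBD algorithm or real data.

```python
def expand(annotations, parents):
    """Return the annotations plus all of their ancestors in the hierarchy."""
    seen = set()
    stack = list(annotations)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar(query, profiles, parents):
    """Rank every other profile by similarity to the query profile."""
    expanded_query = expand(profiles[query], parents)
    scores = [
        (jaccard(expanded_query, expand(annotations, parents)), name)
        for name, annotations in profiles.items()
        if name != query
    ]
    return sorted(scores, reverse=True)


# Toy hierarchy: each specific phenotype has one more general parent.
PARENTS = {
    "heart: edematous": ["heart: abnormal"],
    "heart: malformed": ["heart: abnormal"],
    "fin: decreased length": ["fin: abnormal"],
}

# Hypothetical phenotype profiles for three made-up genes.
PROFILES = {
    "gene_a": {"heart: edematous", "fin: decreased length"},
    "gene_b": {"heart: malformed", "fin: decreased length"},
    "gene_c": {"eye: absent"},
}

ranked = rank_similar("gene_a", PROFILES, PARENTS)
for score, name in ranked:
    print(f"{name}: {score:.2f}")
```

Expanding with ancestors is what lets "heart: edematous" and "heart: malformed" count as partly alike; without it, the raw overlap between `gene_a` and `gene_b` would be one shared annotation out of three.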

Summary of (some of) the challenges
- Curating the information
  - Efficiency (pre-composed vs. post-composed)
  - Consistency between curators
  - Missing contextual information (genetic background and environment)
  - Observation vs. the inference made from this
  - Representing homology (bones named by relative position)
- Those attempting this anyway: zebrafish, Drosophila, C. elegans, Cyprinoid fish (evolution), Dictyostelium, mouse (many), Xenopus, Paramecium…

Credit to
- Berkeley: Christopher Mungall, Mark Gibson, Nicole Washington, Rob Bruggner
- U of Oregon: Monte Westerfield, Melissa Haendel
- National Institutes of Health
- U of Cambridge: Michael Ashburner, George Gkoutos (PATO), David Osumi-Sutherland
- OBO Foundry: Michael Ashburner, Christopher Mungall, Alan Ruttenberg, Richard Scheuermann, Barry Smith