Ontologies in biology, biomedicine and disease genetics.



The problems with biological data
Recorded mainly in natural language, e.g. mutant phenotypes
– Language uses symbols and rules (natural language) to communicate knowledge
– Expressive
– Semantically ambiguous
– Hard to compute on
Computational language
– Precise
– Less expressive
– Allows grouping and data exploration
Why do we need to compute?
– Database searching, query extension
– Data/literature mining
– Knowledge transfer between databases and analytical packages
– Complex queries
– Data integration
– Reasoning
– Machine learning

The naming of things...
Naming and classification are essential for the capture and use of knowledge about the world.
Names (labels) are a common reference that can be used by everyone to refer to the same entity.
Labels can be attached to many things
– Physical entities in the real world
– Concepts
– Processes
– Qualities
– Relationships

The naming of things...
Aristotle ( BC)
– First systematic taxonomy of biology
– Classification of organisms by shared properties and value-based hierarchy
– Binomial genus-differentia nomenclature
Galen ( AD)
– Systematic description of diseases, signs and symptoms
– In De Febrium Differentia, his description of fever symptoms, he uses the terms intermittent, remittent and intermittent fevers, adopting the Aristotelian genus-differentia approach

A Physiological System of Nosology, John Mason Good, 1820

The problem hasn’t gone away...
OMIM query                   # Records
“large bone”                 785
“enlarged bone”              156
“big bone”                   16
“huge bones”                 4
“massive bones”              28
“hyperplastic bones”         12
“hyperplastic bone”          40
“bone hyperplasia”           134
“increased bone growth”      612
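The record counts above differ because each free-text phrasing matches a different set of records. A minimal sketch of query expansion follows, assuming for illustration only that a curator has grouped all nine phrasings under one concept. The identifier EX:0001 and the function names are hypothetical and do not come from OMIM or any real ontology.

```python
# Hypothetical synonym table. The labels are the OMIM queries from the slide;
# the term ID "EX:0001" is invented for illustration.
SYNONYMS = {
    "EX:0001": {
        "large bone", "enlarged bone", "big bone", "huge bones",
        "massive bones", "hyperplastic bones", "hyperplastic bone",
        "bone hyperplasia", "increased bone growth",
    },
}


def build_index(synonyms):
    """Map each normalised label to the term ID it belongs to."""
    index = {}
    for term_id, labels in synonyms.items():
        for label in labels:
            index[label.lower().strip()] = term_id
    return index


def expand_query(text, synonyms, index):
    """Return every label sharing a term with `text`, or just `text` if unknown."""
    term_id = index.get(text.lower().strip())
    if term_id is None:
        return {text}
    return set(synonyms[term_id])


INDEX = build_index(SYNONYMS)
expanded = expand_query("Big bone", SYNONYMS, INDEX)
print(sorted(expanded))
```

Searching with every label in `expanded` rather than the single phrase typed by the user is one simple way to make the result independent of wording.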

Classification systems
1. those that belong to the Emperor,
2. embalmed ones,
3. those that are trained,
4. suckling pigs,
5. mermaids,
6. fabulous ones,
7. stray dogs,
8. those included in the present classification,
9. those that tremble as if they were mad,
10. innumerable ones,
11. those drawn with a very fine camelhair brush,
12. others,
13. those that have just broken a flower vase,
14. those that from a long way off look like flies.
The Celestial Emporium of Benevolent Knowledge (Borges)

Lessons
Systematic, meaningful and unambiguous nomenclature is important in handling concepts.
The definitions of terms are as important as the terms themselves, if not more so.

Turning data into knowledge through concept relationships
Ontologies
– Capture a shared understanding of a domain of interest
– Provide a formal and machine-manipulable model of the domain, linking concepts through defined and scientifically meaningful relationships
– Contain semantic links between concepts, e.g. is_a, part_of, descended_from, has_symptom
The scientific knowledge implicit in an ontology can make the reasons for classification explicit using reasoning, and can detect errors.
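A minimal sketch of what reasoning over such relationships can look like, assuming a toy set of is_a and part_of assertions and one disjointness axiom. The edges, the disjointness pair and the function names are illustrative assumptions, not content from any released ontology, and a real OWL reasoner does considerably more.

```python
from collections import defaultdict

# Toy assertions (subject, relation, object). Illustrative assumptions only.
TRIPLES = [
    ("lung", "is_a", "lobular organ"),
    ("lobular organ", "is_a", "solid organ"),
    ("pulmonary alveolus", "part_of", "lung"),
    ("lung", "part_of", "respiratory system"),
]

# Pairs of classes assumed to share no members.
DISJOINT = [("solid organ", "organ system")]


def ancestors(term, relation, triples):
    """Transitive closure of one relation, starting at `term`."""
    parents = defaultdict(set)
    for subj, rel, obj in triples:
        if rel == relation:
            parents[subj].add(obj)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_errors(triples, disjoint):
    """Terms whose inferred is_a ancestors include two disjoint classes."""
    errors = []
    for term in sorted({subj for subj, _, _ in triples}):
        supers = ancestors(term, "is_a", triples)
        for a, b in disjoint:
            if a in supers and b in supers:
                errors.append((term, a, b))
    return errors


# Inferred, not asserted: lung is_a solid organ; alveolus part_of respiratory system.
print(ancestors("lung", "is_a", TRIPLES))
print(ancestors("pulmonary alveolus", "part_of", TRIPLES))

# A wrong assertion becomes detectable once the closure is computed.
BAD = TRIPLES + [("lung", "is_a", "organ system")]
print(disjointness_errors(BAD, DISJOINT))
```

The consistent set yields no errors, while the extra assertion in `BAD` is flagged because it places lung under two classes declared disjoint.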

Detection of incorrect assertion by reasoning Genome Biology :R5

Functional Genomics
Understanding the link between DNA sequence (genotype) and biology/disease (phenotype)
Modifiers: environment, drugs
[Diagram: example DNA sequence ATTCGCATGGACC, with bases C and A marked]

Sources of phenotype/genotype information
Mouse
  Mouse Genome Informatics
  – >8800 genes have phenotype annotations in the mouse
  Phenome DB
  – 1300 strains to date
  Europhenome
  International Mouse Phenotyping Consortium
  – >3,575 strains systematically phenotyped to date
Human
  OMIM
  – 3100 phenotype descriptions with molecular basis known
  – 3600 phenotype descriptions or loci with basis unknown
  Orphanet
  – 6000 diseases with phenotype descriptions
  dbGaP
  ClinVar
  GWAS Central
  GWAS Catalog

Main ontologies for diseases and phenotypes
Mammalian Pathology (MPATH)
– 900 terms
– mapped to other terminologies
– describes pathological lesions and processes
Disease Ontology (DO)
– about 9000 terms
– semantically mapped to major terminologies: UMLS, MeSH, ICD10 etc.
Experimental Factor Ontology (EFO)
– “application ontology”, 18596 terms
– imports classes from other phenotype and related ontologies (MIREOT)
Orphanet Ontology (ORDO)
– 13105 terms
– structured vocabulary for rare diseases capturing relationships between diseases, genes and other relevant features
Human Phenotype Ontology (HPO)
– 15,319 terms
– derived from OMIM clinical synopses
Mammalian Phenotype Ontology (MP)
– 11,720 classes
– used by MGI for annotating mutant strains from literature
– used by IMPC for annotating the phenotyping pipeline
Unified Medical Language System (UMLS)
– US National Library of Medicine
– terminology, classification and coding standards
– 8M normalised concepts
SNOMED-CT
– 321,000 classes
– clinical terminology
– diseases, diagnostics and procedures
– proprietary
NCI Thesaurus
– 119,000 classes
– vocabulary for clinical care, translational and basic research, and public information and administrative activities
LOINC
– medical diagnostics and observations
– 180,000 classes
ICD-10
– 12,450 classes
– disease, epidemiology, billing
– soon to be replaced with ICD-11

Current anatomy ontologies
Ontology   Domain and applicability           Computable definitions %
Uberon     Animalia                           35.13%
FMA        Homo sapiens (A)                   None
EHDAA2     Homo sapiens (AE)                  None
MA         Mus (A)                            None
EMAPA      Mus (E)                            None
ZFA        Danio rerio (zebrafish) (AE)       None
TAO        Teleostei (bony fishes) (AE)       0.59%
XAO        Xenopus (frog) (AE)                None
AAO        Amphibia (A)                       None
FBbt       Drosophila (fruit fly) (AE)        27.81%
WBbt       C. elegans (nematode) (AE)         0.14%

Anatomy ontologies

Main applications of ontologies in biomedical data
Annotation of genes, genetic variants
Annotation of disease entities
Data recovery, integration and analysis
– literature, EMRs and databases
Patient/animal data capture
Genome-Phenome relationships
– Overrepresentation analysis of phenotypes on patient or animal cohorts
– Correlation between variants and phenomes, e.g. in CNV analysis
– Establishment of disease similarity, phenotype modularity, network identity, through constituent phenotypes
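Overrepresentation analysis asks whether a phenotype term is annotated to more members of a cohort than chance would predict. The slide does not name a statistical test; the sketch below assumes the hypergeometric upper tail (equivalent to a one-sided Fisher exact test), which is a common choice. The counts and the function name are made up for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n items without replacement from a background
    of N items, K of which carry the phenotype annotation."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical numbers: 20 genes in the background, 5 annotated to the term,
# a cohort of 4 genes of which 3 carry the annotation.
p_value = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(round(p_value, 3))  # 0.032
```

In practice one such test is run per ontology term, so the resulting p-values need correction for multiple testing before any term is called overrepresented.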

Main applications of ontologies in biomedical data
Annotation of genes, genetic variants
– OMIM, Orphanet, GWAS Catalog, Mouse Genome Informatics, ZFIN, ClinVar
Annotation of disease entities
– OMIM, Orphanet, Human Phenotype database (HPO), Aber-OWL-disease
Data recovery, integration and analysis
– Literature, EMRs, and databases
Patient/animal data capture
– PhenoTips/PhenomeCentral
– International Mouse Phenotyping Consortium (IMPC)
– Mouse Genome Informatics
Genome-Phenome relationships
– Overrepresentation analysis of phenotypes on patient or animal cohorts
– Correlation between variants and phenomes, e.g. in CNV analysis
– Establishment of disease similarity, phenotype modularity, network identity, through constituent phenotypes
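Disease similarity through constituent phenotypes can be sketched by comparing the phenotype sets annotated to two diseases after adding each term's ancestors. The slide does not name a measure; plain Jaccard overlap is used here as a simple stand-in. The hierarchy edges, the two disease profiles and the function names are invented for illustration.

```python
# Toy phenotype hierarchy (child -> parents). Labels echo phenotype terms
# used elsewhere in the talk; the edges themselves are assumptions.
PARENTS = {
    "abnormal pulmonary alveolus morphology": {"abnormal lung morphology"},
    "abnormal pulmonary acinus morphology": {"abnormal lung morphology"},
    "abnormal lung morphology": {"abnormal respiratory system morphology"},
    "abnormal respiratory system morphology": set(),
}


def with_ancestors(terms, parents):
    """Return the terms together with everything they are subclasses of."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def jaccard(profile_a, profile_b, parents):
    """Overlap of two ancestor-closed phenotype profiles, between 0 and 1."""
    a = with_ancestors(profile_a, parents)
    b = with_ancestors(profile_b, parents)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical disease profiles with no term in common.
disease_x = {"abnormal pulmonary alveolus morphology"}
disease_y = {"abnormal pulmonary acinus morphology"}
print(jaccard(disease_x, disease_y, PARENTS))  # 0.5
```

The two profiles share no asserted term, yet score 0.5 because the hierarchy shows that both describe abnormal lung morphology. That is the gain from comparing through the ontology rather than by exact label match.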

Phenotypes and diseases in MGI

Phenotypes in IMPC

GWAS phenotypes, traits and diseases
GWAS study traits are annotated using the Experimental Factor Ontology (EFO) in the GWAS Catalog
– Used for many of the databases at EBI
– Application ontology
– Imports classes from other ontologies across a wide range of themes using MIREOT
GWAS study traits are annotated to MeSH or HPO in GWAS Central

OMIM, PhenomeNET, Phenomizer

PhenomeCentral: A Portal for Phenotypic and Genotypic Matchmaking of Patients with Rare Genetic Diseases. Human Mutation, Volume 36, Issue 10, 31 AUG 2015, DOI: /humu

Challenges: integrating disease and phenotype data

Bridging the gap
[Diagram labels: Patient Records; Human Variation Databases; PubMed; Clinical Trials; Model Organism Databases; Medical Informatics; Bioinformatics]
Mouse Genome Informatics
– >8800 genes have phenotype annotations in the mouse
Phenome DB
– 1300 strains to date
Europhenome
International Mouse Phenotyping Consortium
– >1300 strains systematically phenotyped to date
OMIM
– 3100 phenotype descriptions with molecular basis known
– 3600 phenotype descriptions or loci with basis unknown
Orphanet
– 6000 diseases with phenotype descriptions
dbGaP
ClinVar
GWAS Central
GWAS Catalog

Data silos
[Diagram: terms from separate ontologies linked by typed relations]
Ontologies shown: Mammalian Phenotype (MPO), Mouse Anatomy (MA), FMA, Human development (EHDAA2)
Phenotype terms: abnormal respiratory system morphology; abnormal lung morphology; abnormal pulmonary acinus morphology; abnormal pulmonary alveolus morphology
Anatomy terms: lung; lobular organ; parenchymatous organ; solid organ; pleural sac; thoracic cavity organ; thoracic cavity; lung alveolus; alveolar sac; pulmonary acinus; lower respiratory tract; respiratory system; organ system
Developmental terms: lung; lung bud; respiratory primordium; pharyngeal region
Relations: is_a (SubClassOf), develops_from, part_of, surrounded_by
Genome Biology :R5