Introduction to Biomedical Ontology for Imaging Informatics
Barry Smith, PhD, FACMI
University at Buffalo
May 11, 2015
National Center for Biomedical Ontology (NCBO)
NIH Roadmap Center
– Stanford University School of Medicine
– Mayo Clinic
– University at Buffalo
Old biology data
New biology data
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSF YEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFV EDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLF YLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIV RSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDT ERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNF GAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRL RKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVA QETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTD YNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFN HDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYAT FRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYES ATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQ WLGLESDYHCSFSSTRNAEDVDISRIVLYSYMFLNTAKGCLVEYA TFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYE SATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWI QWLGLESDYHCSFSSTRNAEDV
How to do biology across the genome?
[Slide filled with the same kind of protein sequence data, repeated many times over]
How to link the kinds of phenomena represented here
or here
to data like this? [Slide filled with protein sequence data, beginning MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSF…]
Answer: semantic enhancement (annotation, labeling, tagging, …)
An ontology is a controlled, structured vocabulary to support annotation of data.
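As a rough illustration of what "controlled vocabulary to support annotation" can mean in practice, here is a minimal Python sketch. The `Vocabulary` class, the `EX:` term IDs and the record layout are all hypothetical and are not taken from any real ontology or tool. The point is only that annotations must come from the agreed vocabulary rather than from free text.

```python
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    """A toy controlled vocabulary: term IDs mapped to preferred labels."""
    terms: dict = field(default_factory=dict)

    def add(self, term_id, label):
        self.terms[term_id] = label

    def annotate(self, record, term_id):
        # Reject tags that are not part of the agreed vocabulary.
        if term_id not in self.terms:
            raise KeyError(f"unknown term: {term_id}")
        record.setdefault("annotations", []).append(term_id)
        return record


# Hypothetical term IDs; the labels echo the three GO branches named in the talk.
vocab = Vocabulary()
vocab.add("EX:0001", "cellular component")
vocab.add("EX:0002", "molecular function")

# A raw data record (sequence fragment taken from the slide) gets a tag.
record = {"id": "seq-1", "sequence": "MKVSDRRKFEKANFDEFESALNNKNDL"}
vocab.annotate(record, "EX:0002")
labels = [vocab.terms[t] for t in record["annotations"]]
print(labels)
```

Because every tag resolves to one shared term ID, two groups annotating different data sets end up with annotations that can be compared directly.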
Questions
– How to build an ontology?
– How to bring it about that all scientists in a domain use the same ontology aggressively to annotate their data?
– How to bring it about that scientists in neighboring domains use interoperable ontologies for this purpose?
Precursor: International System of Units (SI)
By far the most successful: GO (Gene Ontology)
GO provides a controlled vocabulary of terms for use in annotating (describing, tagging) data
– multi-species, multi-disciplinary, open source
– contributing to the cumulativity of scientific results obtained by distinct research communities
Gene products involved in cardiac muscle development in humans
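A query like the one on this slide can be sketched as a lookup over annotations that also follows the vocabulary's is_a hierarchy. The following is a minimal sketch under stated assumptions: every term ID, gene product name and annotation below is invented for illustration and is not taken from GO or from any annotation database.

```python
# child term -> parent terms (is_a); all IDs are hypothetical
IS_A = {
    "EX:cardiac_muscle_development": ["EX:muscle_development"],
    "EX:cardiac_cell_differentiation": ["EX:cardiac_muscle_development"],
    "EX:skeletal_muscle_development": ["EX:muscle_development"],
}

# (gene product, species, term) triples; all hypothetical
ANNOTATIONS = [
    ("product_A", "human", "EX:cardiac_muscle_development"),
    ("product_B", "human", "EX:cardiac_cell_differentiation"),
    ("product_C", "mouse", "EX:cardiac_muscle_development"),
    ("product_D", "human", "EX:skeletal_muscle_development"),
]


def descendants(term, is_a):
    """Return the term plus every term that reaches it through is_a links."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parents in is_a.items():
            if child not in found and any(p in found for p in parents):
                found.add(child)
                changed = True
    return found


def products_for(term, species):
    """Gene products of one species annotated to the term or a more specific term."""
    wanted = descendants(term, IS_A)
    return sorted(p for p, s, t in ANNOTATIONS if s == species and t in wanted)


hits = products_for("EX:cardiac_muscle_development", "human")
print(hits)
```

The structured part of the vocabulary is what lets the query pick up `product_B`, which was annotated with a more specific term than the one asked for.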
Prerequisites for ontology success
– Aggressive use in tagging data across multiple communities
– Feedback cycle between ontology editors and ontology users to ensure continuous update
– Logically and biologically coherent definitions (logical = to allow computational reasoning and quality assurance)
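One small, concrete form of the "computational reasoning and quality assurance" mentioned above is an automatic check of the is_a hierarchy. The sketch below is an assumption about what such a check could look like, not a description of any Foundry tool. It flags parents that are never defined and terms that are their own ancestors. The function name and the toy hierarchies are hypothetical.

```python
def find_problems(is_a):
    """Report undefined parents and is_a cycles in a term -> parents mapping."""
    problems = []
    for term, parents in is_a.items():
        for parent in parents:
            if parent not in is_a:
                problems.append(("undefined parent", term, parent))

    def reaches_itself(start):
        # Walk upward through parents; a cycle exists if we return to start.
        stack = list(is_a.get(start, []))
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(is_a.get(node, []))
        return False

    for term in is_a:
        if reaches_itself(term):
            problems.append(("cycle", term, term))
    return problems


# Hypothetical hierarchies: one clean, one with a cycle and a dangling parent.
GOOD = {"root": [], "process": ["root"], "assay": ["process"]}
BAD = {"root": [], "a": ["b"], "b": ["a"], "c": ["missing"]}

print(find_problems(GOOD))
print(find_problems(BAD))
```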
– More than $100 million invested in literature curation using GO
– Over 200 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO (Gigascience 3/1/4)
– Experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO
GO is amazingly successful in overcoming problems of balkanization, but it covers only generic biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
and it does not provide representations of diseases, symptoms, anatomy, pathways, …
Ontology success stories, and some reasons for failure
So people started building the needed extra ontologies more or less at random
Reviewing ontologies which are candidates for the imaging framework
B. Smith, … J. Tomaszewski, and M. Gurcan, “Biomedical Imaging Ontologies: A Survey and Proposal for Future Work”, Journal of Pathology Informatics, in press
RadLex
QIBO
DICOM
Definition: Reaching a decision through the application of an algorithm designed to weigh the different factors involved.
– Confuses an algorithm with an act of reaching a decision
– Defines ‘algorithm’ as a special kind of application of an algorithm. (This is worse than circular.)
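The circularity criticized here is the kind of defect that even a crude automatic check can surface. The sketch below is a purely lexical test, assumed for illustration only. It flags a definition that contains the very label being defined. It is not logical reasoning and would miss circularity that runs through synonyms or through other terms. The second example definition is hypothetical.

```python
import re


def is_circular(label, definition):
    """True if the defined label appears as a whole word in its own definition."""
    pattern = r"\b" + re.escape(label.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# The definition quoted on the slide, which is offered as a definition of 'algorithm'.
slide_definition = (
    "Reaching a decision through the application of an algorithm "
    "designed to weigh the different factors involved."
)
# A hypothetical non-circular definition, for contrast.
other_definition = "A planned process that produces data about a material entity."

print(is_circular("algorithm", slide_definition))
print(is_circular("assay", other_definition))
```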
John Fox (Director, OpenClinical):
“As a user and teacher of ontological methods in medicine and engineering I have for years warned my students that the design of domain ontologies is a black art with no theoretical foundations and few practical principles.”
Original OBO Foundry ontologies (Gene Ontology in yellow)
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) by GRANULARITY

ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PATO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
– CHEBI: Chemical Entities of Biological Interest
– CL: Cell Ontology
– GO: Gene Ontology
– OBI: Ontology for Biomedical Investigations
– PATO: Phenotypic Quality Ontology
– PO: Plant Ontology
– PRO: Protein Ontology
– XAO: Xenopus Anatomy Ontology
– ZFA: Zebrafish Anatomy Ontology
Extension Strategy + Modular Organization
Levels: top level, mid-level, domain level
Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)
Ontologies shown:
– Anatomy Ontology (FMA*, CARO)
– Disease Ontology (OGMS, IDO, HDO, HPO)
– Biological Process Ontology (GO)
– Cell Ontology (CL)
– Subcellular Anatomy Ontology (SAO)
– Phenotypic Quality Ontology (PATO)
– Sequence Ontology (SO)
– Molecular Function Ontology (GO)
– Protein Ontology (PRO)
Example: The Cell Ontology
Rationale of OBO Foundry coverage
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) by GRANULARITY

ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PATO)
  Occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
  Occurrent: Cellular Process (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
Rationale of OBO Foundry coverage
Axes only: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) by GRANULARITY (ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE)
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT) by GRANULARITY

ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PATO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
Added: Environment Ontology (EnvO): Environments
OBO Foundry Principles
– The ontology is open and able to be integrated freely with other resources.
– It is instantiated in a common formal language.
– Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology, and agree in advance to collaborate with developers of ontologies in adjacent domains.
OBO Foundry Principles
– Modular development to guarantee additivity of annotations
– Single locus of authority (for editing, error tracking, …)
– Common architecture (BFO)
– Common governance (coordinating editors)
– Common training: expertise is portable, and lessons learned through practice can be pooled
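One possible reading of "additivity of annotations" is that, when each module owns its own non-overlapping set of terms, annotations made independently against different modules can be combined by simple set union without reconciliation. The sketch below illustrates that reading only. The function names and sample data are hypothetical, and although `CL` and `GO` are real ontology prefixes, the local IDs after the colon are invented.

```python
from collections import defaultdict


def merge_annotations(*sources):
    """Union annotation sets from several independently curated sources."""
    merged = defaultdict(set)
    for source in sources:
        for entity, terms in source.items():
            merged[entity] |= set(terms)
    return dict(merged)


def modules_used(terms):
    """List the ontology prefixes (modules) that a set of term IDs draws on."""
    return sorted({t.split(":", 1)[0] for t in terms})


# Two groups annotate the same sample against different modules (hypothetical IDs).
cell_annotations = {"sample-1": {"CL:EX0001"}, "sample-2": {"CL:EX0002"}}
process_annotations = {"sample-1": {"GO:EX0007"}}

merged = merge_annotations(cell_annotations, process_annotations)
print(sorted(merged["sample-1"]))
print(modules_used(merged["sample-1"]))
```

Since each prefix identifies exactly one module, the prefix also indicates which single group is responsible for editing and error tracking for any given term.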
OBO Foundry approach extended into other domains
– NIF Standard: Neuroscience Information Framework
– IDO Consortium: Infectious Disease Ontology Suite
– cROP: Common Reference Ontologies for Plants
– United Nations Environment Program: UNEP Ontology Framework
IMAGING
Recognizing a new family of processes (investigation, assay, protocol-driven process)
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT CONTINUANT, DEPENDENT CONTINUANT; OCCURRENT) by GRANULARITY

ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
Also shown: Ontology for Biomedical Investigations (OBI); OBI: Imaging Ontology Branch; Environment Ontology (ENVO); Phenotypic Quality (PATO)
Extension Strategy + Modular Organization
Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)
Ontologies shown:
– Anatomy Ontology (FMA*, CARO)
– Disease Ontology (OGMS, IDO, HDO, HPO)
– Biological Processes
– Assays (Protocol-driven processes)
– Cell Ontology (CL)
– Subcellular Anatomy Ontology (SAO)
– Phenotypic Quality Ontology (PATO)
– Sequence Ontology (SO)
– Molecular Function Ontology (GO)
– Protein Ontology (PRO)
Structure of a typical investigation as viewed by OBI (from The Ontology for Biomedical Investigations)
Recognizing a new family of attributes (data, information artifacts, including images)
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT CONTINUANT, DEPENDENT CONTINUANT, INFORMATION ARTIFACT; OCCURRENT) by GRANULARITY

ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO)
  Information artifact: Software, Algorithms …; Patient Demographic Data, EHR Data, Public Health Data, …
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
  Dependent continuant: Molecular Function (GO)
  Information artifact: Images, Image Data, Flow Cytometry Data, …
  Occurrent: Molecular Process (GO)
Also shown: OBI; OBI: Imaging; Environment Ontology (ENVO); Phenotypic Quality (PATO)
Extension Strategy + Modular Organization
Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), INFORMATION ARTIFACT (~DATA), OCCURRENT (~PROCESS)
Ontologies shown:
– Anatomy Ontology (FMA*, CARO)
– Disease Ontology (OGMS, IDO, HDO, HPO)
– Images, Image Data, Image Metadata …
– Biological Process Ontology (GO)
– Assays
– Cell Ontology (CL)
– Subcellular Anatomy Ontology (SAO)
– Phenotypic Quality Ontology (PATO)
– Sequence Ontology (SO)
– Molecular Function Ontology (GO)
– Protein Ontology (PRO)
OBI Pipeline applied to Imaging Assays
[Diagram labels: specimen extraction purification imaging data collection assay transformation]
Need to extend this pipeline also to clinical diagnosis and treatment (from OGMS*)
*Ontology for General Medical Science
[Diagram labels: specimen extraction purification imaging data collection assay transformation]
Need to extend this pipeline also to clinical diagnosis and treatment
Even here, things are not as bad as they seem
http://purl.obolibrary.org/obo/IAO_ : algorithm
IAO = Information Artifact Ontology: on-artifact-ontology/
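OBO Foundry terms are addressed by PURLs of the form http://purl.obolibrary.org/obo/ followed by the ontology prefix, an underscore and a local ID. A minimal sketch of converting between that form and the compact PREFIX:ID notation follows. The helper names are hypothetical, and the local ID `0000000` is a placeholder, not the actual identifier of the IAO term 'algorithm', which does not survive in the slide text.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie):
    """Turn a compact ID such as 'IAO:0000000' into an OBO PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"


def purl_to_curie(purl):
    """Turn an OBO PURL back into its compact PREFIX:ID form."""
    if not purl.startswith(OBO_BASE):
        raise ValueError(f"not an OBO PURL: {purl}")
    prefix, local_id = purl[len(OBO_BASE):].split("_", 1)
    return f"{prefix}:{local_id}"


# Placeholder local ID, used only to show the shape of the identifier.
purl = curie_to_purl("IAO:0000000")
print(purl)
print(purl_to_curie(purl))
```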
A list of ontologies using IAO
– Adverse Event Reporting Ontology (AERO)
– Bioinformatics Web Service Ontology
– Biological Collections Ontology (BCO)
– Chemical Methods Ontology (CHMO)
– Cognitive Paradigm Ontology (COGPO)
– Comparative Data Analysis Ontology
– Computational Neuroscience Ontology
– Core Clinical Protocol Ontology (C2PO)
– Document Act Ontology
– Eagle-I Research Resource Ontology (ERO)
– The Ontology
– Emotion Ontology (MFOEM)
– Experimental Factor Ontology (EFO)
– Exposé Ontology
– IAO-Intel
– Infectious Disease Ontology (IDO)
– Influenza Research Database (IRD)
– Information Entity Ontology
– Mental Functioning Ontology (MF)
– Ontology for Biomedical Investigations
– Ontology for Drug Discovery Investigations
– Ontology for General Medical Science (OGMS)
– Ontology for Newborn Screening Follow-up and Translational Research (ONSTR)
– Ontology of Clinical Research (OCRE)
– Ontology of Data Mining (OntoDM)
– Ontology of Medically Related Social Entities (OMRSE)
– Ontology of Vaccine Adverse Events
– Oral Health and Disease Ontology (OHDO)
– Population and Community Ontology
– Proper Name Ontology
– Semanticscience Integrated Ontology
– Software Ontology (SWO)
– Translational Medicine Ontology (TMO)
– Twitter Ontology
– Vaccine Ontology (VO)
The Ontology for Biomedical Investigations
OBI and IAO
Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)
IAO, OBI
– Patient Demographics
– Phenotype (Disease, …)
– Disease processes
– Data about all of these things including image data …
– Algorithms, software, protocols, …
– Instruments, Biomaterials, Functions
– Parameters, Assay types, Statistics …
– Anatomy
– Histology
– Genotype (GO)
– Biological processes (GO)
– Chemistry
Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)
IAO, OBI
– Patient Demographics
– Phenotype (Disease, …)
– Disease processes
– Data about all of these things including image data …
– Algorithms, software, protocols, …
– Instruments, Biomaterials, Functions
– Parameters, Assay types, Statistics …
– Anatomy
– Histology
– Genotype (GO)
– Biological processes (GO)
– Chemistry
Added: CMPO; OBI-Imaging
BFO
  Ontology for General Medical Science
    Cardiovascular Disease Ontology
    Genetic Disease Ontology
    Cancer Disease Ontology
    Immune Disease Ontology
    Environmental Disease Ontology
    Oral Disease Ontology
    Infectious Disease Ontology
      IDO Staph Aureus
        IDO MRSA
          IDO Australian MRSA
            IDO Australian Hospital MRSA
              …