Building the Ontology Landscape for Cancer Big Data Research Barry Smith May 12, 2015.

Addressing cancer big data challenges
– Session 1: through imaging ontologies (BS)
– Session 2: by capturing metadata for data integration and analysis (Chris Stoeckert)
– Session 3: through the Ontology of Disease (Lynn Schriml and Lindsay Cowell)
– Public Session: Cancer Big Data to Knowledge (BS)

National Center for Biomedical Ontology (NCBO): NIH Roadmap Center
[Diagram labels: Gene Ontology, Semantic Web, NCBO]

Old biology data

MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSF YEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFV EDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLF YLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIV RSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDT ERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNF GAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRL RKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVA QETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTD YNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFN HDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYAT FRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYES ATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQ WLGLESDYHCSFSSTRNAEDVDISRIVLYSYMFLNTAKGCLVEYA TFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYE SATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWI QWLGLESDYHCSFSSTRNAEDV New biology data 5

How to do biology across the genome?
[Slide filled with a long amino-acid sequence, repeated many times over, beginning MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSF …]

how to link the kinds of phenomena represented here

MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRK RSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSL FYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLL HVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNF GAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLD IFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDY NKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDIS RIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESA TSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVV AGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQA PPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDL YVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEK AIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKI RKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKE FVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKG ELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVAL PSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTN ASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNA TTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNT NATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDG NAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYF CPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDP VGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNL RESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRH HRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHW LDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGELIGLFYNKTFRQKLE YLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVG ELIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG RFDILLCRDSSREVGE 8 to data like this?

Answer
Tag the data with meaningful labels which together form an ontology (~ semantic enhancement).
An ontology is a controlled, structured vocabulary to support annotation of data.
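A minimal sketch, in Python, of what tagging data against a controlled vocabulary could look like. The record names (`gene_product_1`, `gene_product_2`) and the helpers `annotate` and `records_tagged_with` are hypothetical and not taken from the talk. The three Gene Ontology identifiers are real GO terms, used here only for illustration.

```python
# Toy controlled vocabulary: term ID -> label.
# The IDs are real Gene Ontology terms; everything else is illustrative.
VOCABULARY = {
    "GO:0005634": "nucleus",
    "GO:0005739": "mitochondrion",
    "GO:0006915": "apoptotic process",
}


def annotate(annotations, record_id, term_id):
    """Tag a data record with a term, rejecting IDs outside the vocabulary."""
    if term_id not in VOCABULARY:
        raise ValueError(f"{term_id} is not a term in the controlled vocabulary")
    annotations.setdefault(record_id, set()).add(term_id)
    return annotations


def records_tagged_with(annotations, term_id):
    """Return the sorted IDs of all records carrying the given tag."""
    return sorted(r for r, terms in annotations.items() if term_id in terms)


# Hypothetical records standing in for sequence data.
annotations = {}
annotate(annotations, "gene_product_1", "GO:0005634")
annotate(annotations, "gene_product_2", "GO:0005634")
annotate(annotations, "gene_product_2", "GO:0006915")
```

Because every tag must come from the shared vocabulary, `records_tagged_with(annotations, "GO:0005634")` finds both records no matter who annotated them, while a free-text tag such as "cell nucleus" is rejected.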

Questions
– How to build an ontology?
– How to bring it about that all scientists in each domain use the same ontology to annotate their data?
– How to bring it about that scientists in neighboring domains use ontologies that are interoperable?

By far the most successful: GO (Gene Ontology)

GO provides a controlled vocabulary of terms for use in annotating (describing, tagging) data
– multi-species, multi-disciplinary, open source
– built by biologists, maintained and improved by biologists
– contributes to the cumulativity of scientific results obtained by distinct research communities

International System of Units (SI)

Gene products involved in cardiac muscle development in humans

Prerequisites for ontology success
– Aggressive use in tagging data across multiple communities
– Feedback cycle between ontology editors and ontology users to ensure continuous update
– Logically and biologically coherent definitions
  – logical = to allow computational reasoning and quality assurance
  – biological = to ensure consistency between ontologies

GO is amazingly successful, but it covers only generic biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
and it does not provide representations of diseases, symptoms, anatomy, pathways, experiments …

Ontology success stories, and some reasons for failure
So people started building the needed extra ontologies more or less at random

Definition: Reaching a decision through the application of an algorithm designed to weigh the different factors involved.
– Confuses an algorithm with an act of reaching a decision
– Defines ‘algorithm’ as a special kind of application of an algorithm. (This is worse than circular.)

John Fox (Director, OpenClinical)
As a user and teacher of ontological methods in medicine and engineering I have for years warned my students that the design of domain ontologies is a black art with no theoretical foundations and few practical principles.

Ontology success stories, and some reasons for failure
Linked Open Data, from MusicBrainz to Mouse Genome Informatics

What are the criteria of success for ontologies in supporting reasoning over Big Data?
1. logically and biologically correct subsumption hierarchies
– correct: Beta cell is_a cell
– incorrect: allergy is_a allergy record (in Microsoft HealthVault)
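Why the correctness of is_a links matters for reasoning can be sketched in a few lines of Python. The hierarchy below is a toy: the labels and the intermediate class "secretory cell" are assumptions made for illustration and are not copied from any released ontology.

```python
# Toy is_a hierarchy: child label -> list of direct parent labels.
# Labels are illustrative and not copied from any released ontology.
IS_A = {
    "beta cell": ["secretory cell"],
    "secretory cell": ["cell"],
    "cell": [],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def is_subsumed_by(term, candidate, is_a=IS_A):
    """True if `term` is_a `candidate`, directly or transitively."""
    return term == candidate or candidate in ancestors(term, is_a)
```

A query for everything that is a cell picks up data tagged "beta cell" through the transitive closure. By the same mechanism, a hierarchy asserting allergy is_a allergy record would make every query for records return allergies, which is why a single incorrect link can corrupt reasoning over large annotated data sets.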

John Fox, again
As a user and teacher of ontological methods in medicine and engineering I have for years warned my students that the design of domain ontologies is a black art with no theoretical foundations and few practical principles. … I now have a much more positive story for my students. … In the journey from black art to a truly scientific theory for ontology design this book is an important milestone.

Original OBO Foundry ontologies (Gene Ontology in yellow)
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT. Rows: GRANULARITY.
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)

– CHEBI: Chemical Entities of Biological Interest
– CL: Cell Ontology
– GO: Gene Ontology
– OBI: Ontology for Biomedical Investigations
– PATO: Phenotypic Quality Ontology
– PO: Plant Ontology
– PRO: Protein Ontology
– XAO: Xenopus Anatomy Ontology
– ZFA: Zebrafish Anatomy Ontology

Extension Strategy + Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)
domain level: Anatomy Ontology (FMA*, CARO); Disease Ontology (OGMS, IDO, HDO, HPO); Biological Process Ontology (GO); Cell Ontology (CL); Subcellular Anatomy Ontology (SAO); Phenotypic Quality Ontology (PATO); Sequence Ontology (SO); Molecular Function Ontology (GO); Protein Ontology (PRO)

Example: The Cell Ontology

Rationale of OBO Foundry coverage
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT. Rows: GRANULARITY.
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
– Occurrent: Cellular Process (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT. Rows: GRANULARITY.
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)
Environments: Environment Ontology (EnvO)

OBO Foundry Principles
– The ontology is open and able to be integrated freely with other resources.
– It is instantiated in a common formal language.
– Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology,
– and agree in advance to collaborate with developers of ontologies in adjacent domains.

OBO Foundry Principles
– Modular development to guarantee additivity of annotations
– Single locus of authority (for editing, error tracking, …)
– Common architecture (BFO)
– Common governance (coordinating editors)
– Common training: expertise is portable, and lessons learned through practice can be pooled

Examples of OBO Foundry approach extended into other domains
– NIF Standard: Neuroscience Information Framework
– IDO Consortium: Infectious Disease Ontology Suite
– cROP: Common Reference Ontologies for Plants
– UNEP Ontology Framework: United Nations Environment Program Ontologies

Common Reference Ontologies for Plants (cROP)

The second important criterion of ontology success in supporting reasoning over Big Data is: keeping track of provenance = recording how data was generated and processed in a way external users can understand, to enhance
– combinability
– reproducibility
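One way to picture provenance tracking is as a record that travels with each dataset. The Python sketch below is an assumption-laden illustration, not a method from the talk: the class names, field names and example values are all hypothetical, and the free-text `process` strings stand in for what would, in an ontology-based setting, be terms for investigations and assays.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessingStep:
    """One step in how a dataset was generated or transformed."""
    process: str      # what was done, e.g. an assay or a normalisation
    agent: str        # who or what performed it
    inputs: List[str] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    provenance: List[ProcessingStep] = field(default_factory=list)

    def derive(self, new_name, process, agent):
        """Create a derived dataset that carries the full history forward."""
        step = ProcessingStep(process=process, agent=agent, inputs=[self.name])
        return Dataset(name=new_name, provenance=self.provenance + [step])

    def history(self):
        """Readable trail an external user could inspect."""
        return [f"{s.process} by {s.agent} on {', '.join(s.inputs)}"
                for s in self.provenance]


# Hypothetical example values.
raw = Dataset("raw_images")
segmented = raw.derive("segmented_images", "segmentation", "pipeline v1")
measured = segmented.derive("tumor_volumes", "volume measurement", "analyst A")
```

Calling `measured.history()` lists every step from the raw images onward, which is the kind of trail an external user would need in order to combine the data with other sources or to repeat the analysis.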

Recognizing a new family of protocol-driven processes (investigation, assay, …)
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT CONTINUANT, DEPENDENT CONTINUANT) and OCCURRENT. Rows: GRANULARITY.
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PATO)
– Occurrent: Biological Process (GO); Ontology for Biomedical Investigations (OBI)
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)
Environment Ontology (ENVO)

Extension Strategy + Modular Organization
Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)
Ontologies shown: Anatomy Ontology (FMA*, CARO); Disease Ontology (OGMS, IDO, HDO, HPO); Biological Process; Protocol-driven process (OBI); Cell Ontology (CL); Subcellular Anatomy Ontology (SAO); Phenotypic Quality Ontology (PATO); Sequence Ontology (SO); Molecular Function Ontology (GO); Protein Ontology (PRO)

Structure of a typical investigation as viewed by OBI (from The Ontology for Biomedical Investigations

Recognizing a new family of information entities: data, publications, images, algorithms …
Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT CONTINUANT, DEPENDENT CONTINUANT, INFORMATION ARTIFACT) and OCCURRENT. Rows: GRANULARITY.
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PATO)
– Information artifact (IAO): Software, Algorithms, …; Sequence Data, EHR Data …
– Occurrent: Biological Process (GO); OBI
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Information artifact: Images, Image Data, Flow Cytometry Data, …
– Occurrent: Molecular Process (GO); OBI: Imaging
Environment Ontology (ENVO)

Extension Strategy + Modular Organization
Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), INFORMATION ARTIFACT (~DATA), OCCURRENT (~PROCESS)
Ontologies shown: Anatomy Ontology (FMA*, CARO); Disease Ontology (OGMS, IDO, HDO, HPO); Data; Biological Process; Assays; Cell Ontology (CL); Subcellular Anatomy Ontology (SAO); Phenotypic Quality Ontology (PATO); Sequence Ontology (SO); Molecular Function Ontology (GO); Protein Ontology (PRO)

Even here, things are not as bad as they seem

http://purl.obolibrary.org/obo/IAO_ : algorithm

IAO = Information Artifact Ontology: on-artifact-ontology/
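Term identifiers under the `http://purl.obolibrary.org/obo/` base follow a regular pattern: the ontology prefix, an underscore, and a numeric local ID. The Python sketch below converts between that URL form and the compact `PREFIX:ID` form. The function names are hypothetical, and because the IAO identifier in the transcript is truncated, the usage example relies on a Gene Ontology term (GO:0005634, nucleus) instead.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie):
    """Turn a prefixed ID such as 'GO:0005634' into an OBO PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed identifier: {curie!r}")
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def purl_to_curie(purl):
    """Inverse of curie_to_purl for URLs under the OBO PURL base."""
    if not purl.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO PURL: {purl!r}")
    prefix, sep, local_id = purl[len(OBO_PURL_BASE):].partition("_")
    if not sep or not prefix or not local_id:
        raise ValueError(f"no term ID in {purl!r}")
    return f"{prefix}:{local_id}"
```

For example, `curie_to_purl("GO:0005634")` yields `http://purl.obolibrary.org/obo/GO_0005634`, and `purl_to_curie` maps it back. A stable URL per term is what lets annotations made by different groups refer to exactly the same entity.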

A list of ontologies using IAO
– Adverse Event Reporting Ontology (AERO)
– Bioinformatics Web Service Ontology
– Biological Collections Ontology (BCO)
– Chemical Methods Ontology (CHMO)
– Cognitive Paradigm Ontology (COGPO)
– Comparative Data Analysis Ontology
– Computational Neuroscience Ontology
– Core Clinical Protocol Ontology (C2PO)
– Document Act Ontology
– Eagle-I Research Resource Ontology (ERO)
– The Ontology
– Emotion Ontology (MFOEM)
– Experimental Factor Ontology (EFO)
– Exposé Ontology
– IAO-Intel
– Infectious Disease Ontology (IDO)
– Influenza Research Database (IRD)
– Information Entity Ontology
– Mental Functioning Ontology (MF)
– Ontology for Biomedical Investigations
– Ontology for Drug Discovery Investigations
– Ontology for General Medical Science (OGMS)
– Ontology for Newborn Screening Follow-up and Translational Research (ONSTR)
– Ontology of Clinical Research (OCRE)
– Ontology of Data Mining (OntoDM)
– Ontology of Medically Related Social Entities (OMRSE)
– Ontology of Vaccine Adverse Events
– Oral Health and Disease Ontology (OHDO)
– Population and Community Ontology
– Proper Name Ontology
– Semanticscience Integrated Ontology
– Software Ontology (SWO)
– Translational Medicine Ontology (TMO)
– Twitter Ontology
– Vaccine Ontology (VO)

aboutness
Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)
Diagram labels: Patient Demographics; Phenotype (Disease, …); Disease processes; Anatomy; Histology; Genotype (GO); Biological processes (GO); Chemistry
IAO, OBI: Data about all of these things including image data …; algorithms, software, protocols, …; Instruments, Biomaterials, Functions; Parameters, Assay types, Statistics …

biomedical imaging ontology
Basic Formal Ontology (BFO): INDEPENDENT CONTINUANT (~THING), DEPENDENT CONTINUANT (~ATTRIBUTE), OCCURRENT (~PROCESS)
Diagram labels: Patient Demographics; Phenotype (Disease, …); Disease processes; Anatomy; Histology; Genotype (GO); Biological processes (GO); Chemistry
IAO, OBI: Data about all of these things including image data …; algorithms, software, protocols, …; Instruments, Biomaterials, Functions; Parameters, Assay types, Statistics

The third important criterion of ontology success in supporting reasoning over Big Data is: use the framework of modular, general-purpose reference ontologies as starting points for creating families of purpose-specific application ontologies in ever widening circles (scalability)

BFO
– Ontology for General Medical Science (OGMS)
  – Cardiovascular Disease Ontology
  – Genetic Disease Ontology
  – Cancer Disease Ontology
  – Immune Disease Ontology
  – Environmental Disease Ontology
  – Oral Disease Ontology
  – Infectious Disease Ontology
    – IDO Staph Aureus
      – IDO MRSA
        – IDO Australian MRSA
          – IDO Australian Hospital MRSA
            – …
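The extension chain from BFO down to increasingly specific application ontologies can be sketched as modules that each extend exactly one parent. This Python sketch is a toy under stated assumptions: the `MODULES` table, the helper names, and the term lists are illustrative placeholders rather than the contents of the real ontologies.

```python
# Each module names the module it extends (None for the top level) and
# contributes its own terms. Term lists are illustrative placeholders.
MODULES = {
    "BFO": {"extends": None, "terms": ["occurrent", "continuant"]},
    "OGMS": {"extends": "BFO", "terms": ["disease"]},
    "IDO": {"extends": "OGMS", "terms": ["infectious disease"]},
    "IDO Staph Aureus": {"extends": "IDO", "terms": ["Staph. aureus infection"]},
    "IDO MRSA": {"extends": "IDO Staph Aureus", "terms": ["MRSA infection"]},
}


def lineage(name, modules=MODULES):
    """Chain of modules from the top level down to `name`."""
    chain = []
    while name is not None:
        chain.append(name)
        name = modules[name]["extends"]
    return list(reversed(chain))


def visible_terms(name, modules=MODULES):
    """All terms available in `name`: its own plus everything inherited."""
    terms = []
    for module in lineage(name, modules):
        terms.extend(modules[module]["terms"])
    return terms
```

Each new application ontology adds only its own terms and reuses everything above it, so data annotated at the most specific level stays queryable with the general terms of the reference ontologies.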

Problems with: Denys-Drash syndrome is_a rare non-neoplastic disorder
1. Denys-Drash syndrome involves nephroblastoma and is therefore neoplastic
2. X is_a rare Y does not track biology

What are the criteria of success for ontologies in supporting reasoning over Big Data?
– correct: Beta cell is_a cell
– incorrect: rare disease is_a disease
If the ontology hierarchy is to support biologically useful reasoning, it must track biology