1 Ontology (Science) vs. Ontology (Engineering) Barry Smith University at Buffalo
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES IPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVIS VMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVY TLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLER CHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKY GYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERL KRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRAC ALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVC KLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDD NNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGI SLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLK TLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPW MDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEY ATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSG TTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV How to do biology across the genome?
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDR KRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTL SLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYM FLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRA CALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCAC TARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTR RIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDP NQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCS FSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEI YMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPV RNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQS QFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMF NLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVV WIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGG LCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIE RMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTAST NVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATT TESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTS ATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTN SNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSEN MNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEAL AVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTR GKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKG GVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSM LIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG RFDILLCRDSSREVGE 3
scientists need help from ontologists 4
Uses of ‘ontology’ in PubMed abstracts 5
7 what cellular component? what molecular function? what biological process? Gene Ontology
8 what cellular component? what molecular function? what biological process? and through curation of literature
9 GO as Common Controlled Vocabulary. Databases: MouseEcotope, GlyProt, DiabetInGene, GluChem. Term: sphingolipid transporter activity
10 GO as Common Controlled Vocabulary. Databases: MouseEcotope, GlyProt, DiabetInGene, GluChem. Term: Holliday junction helicase complex
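As a rough illustration of what a common controlled vocabulary buys, the Python sketch below annotates records from several databases with the same term label and then retrieves them with one query. The database names and the two term labels come from the slides; the record identifiers and the pairing of records with terms are invented for illustration and do not describe real database content.

```python
from collections import defaultdict

# Hypothetical annotations: (database, record id, controlled-vocabulary term).
# Database names and term labels come from the slides; record ids are made up.
ANNOTATIONS = [
    ("MouseEcotope", "rec-001", "sphingolipid transporter activity"),
    ("GlyProt", "rec-002", "sphingolipid transporter activity"),
    ("DiabetInGene", "rec-003", "Holliday junction helicase complex"),
    ("GluChem", "rec-004", "sphingolipid transporter activity"),
]


def index_by_term(annotations):
    """Group (database, record id) pairs under the term that annotates them."""
    index = defaultdict(list)
    for database, record_id, term in annotations:
        index[term].append((database, record_id))
    return dict(index)


def databases_sharing(index, term):
    """Return the sorted names of databases holding a record annotated with term."""
    return sorted({database for database, _ in index.get(term, [])})


if __name__ == "__main__":
    index = index_by_term(ANNOTATIONS)
    print(databases_sharing(index, "sphingolipid transporter activity"))
```

Because every database uses the same term string, a single lookup spans all of them; no per-database mapping table is needed.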
Science is about universals We learn about universals from looking at the results of scientific experiments as expressed in the form of scientific theories – which describe what is general 12
[Figure: hierarchy of universals (substance, organism, animal, mammal, cat, siamese; frog) with instances at the bottom] 13
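The distinction between universals and instances can be sketched in a few lines of Python. The is_a links below are an assumed reading of the diagram (siamese under cat, cat under mammal, and so on, with frog placed directly under animal), and the instance name is hypothetical.

```python
# Assumed is_a links between universals (child -> parent).
IS_A = {
    "siamese": "cat",
    "cat": "mammal",
    "mammal": "animal",
    "frog": "animal",
    "animal": "organism",
    "organism": "substance",
}

# Hypothetical instance: a particular in the world and the universal it instantiates.
INSTANCE_OF = {"the siamese on my sofa": "siamese"}


def ancestors(universal):
    """Walk the is_a chain upward and return every more general universal."""
    chain = []
    while universal in IS_A:
        universal = IS_A[universal]
        chain.append(universal)
    return chain


def instantiates(instance, universal):
    """An instance of a universal is also an instance of all its ancestors."""
    direct = INSTANCE_OF[instance]
    return universal == direct or universal in ancestors(direct)
```

In this sketch the universals form the hierarchy, while the instance sits outside it and is linked to it by a separate instance_of relation.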
Ontology engineers: LET’S GENERALIZE THESE BENEFITS BY BUILDING ONTOLOGIES IN OTHER AREAS … 14
The standard engineering methodology Pragmatics (‘usefulness’) is everything Usefulness = we get to write software which runs on our machines 15
The standard engineering methodology It is easier to write useful software if one works with a simplified model (“…we can’t know what reality is like in any case; we only have our concepts…”) Engineer A: This looks like a useful model to me (One week goes by:) Engineer B: This other thing looks like a useful model to me 16
Data in Pittsburgh does not interoperate with data in Vancouver Science is siloed 17
Implications Ontology decisions should be made on strictly scientific grounds They should be relatively independent of tools and applications 18
Ontology Development 101: A Guide to Creating Your First Ontology Natalya Noy and Deborah McGuinness “An ontology together with a set of individual instances of classes constitutes a knowledge base. In reality, there is a fine line where the ontology ends and the knowledge base begins.” 19
Classes “Classes are the focus of most ontologies. Classes describe concepts in the domain. For example, a class of wines represents all wines. Specific wines are instances of this class. The Bordeaux wine in the glass in front of you … is an instance of the class of Bordeaux wines.” 20
Instances “we can create [!] an individual instance Chateau-Morgon-Beaujolais to represent a specific type of Beaujolais wine. Chateau-Morgon-Beaujolais is an instance of the class Beaujolais” How can ontology engineers create an instance? 21
An instance or a class? “Deciding whether a particular concept [e.g. the Bourgogne region] is a class in an ontology or an individual instance depends on what the potential applications of the ontology are.” 22
Wines are the instances? “Individual instances are the most specific concepts represented in a knowledge base: if we are only going to talk about pairing wine with food we will not be interested in the specific physical bottles of wine. … the Wine class is [then] a collection not of individual bottles of wines but rather of the specific wines produced by specific wineries.” 23
Or bottles “On the other hand, if we would like to maintain an inventory of wines in the restaurant … individual bottles of each wine may become individual instances in our knowledge base.” 24
Or vintages “Similarly, if we would like to record different properties for each specific vintage of the Sterling Vineyards Merlot, then the specific vintage of the wine is an instance in a knowledge base and Sterling Vineyards Merlot is a class containing instances for all its vintages.” 25
Another rule “If concepts form a natural hierarchy, then we should represent them as classes. Consider the wine regions. Initially, we may define main wine regions, such as France, United States, Germany, and so on, as classes and specific wine regions within these large regions as instances. For example, Bourgogne region is an instance of the French region class.” 26
“However, we would also like to say that the Cotes d’Or region is a Bourgogne region. Therefore, Bourgogne region must be a class (in order to have subclasses or instances). However, making Bourgogne region a class and Cotes d’Or region an instance of Bourgogne region seems arbitrary: it is very hard to clearly distinguish which regions are classes and which are instances. Therefore, we define all wine regions as classes.” 27
The Alsace region does not exist “Only classes can be arranged in a hierarchy – knowledge-representation systems do not have a notion of sub- instance. Therefore, if there is a natural hierarchy among terms, …, we should define these terms as classes even though they may not have any instances of their own.” 28
From the Protégé glossary: “Instance: Concrete occurrence of information about a domain that is entered into a knowledge base. For example, Fran Smith might be an instance for a Name slot. An instances is entered via a form generated by Protégé-2000.” “The Bordeaux wine in the glass in front of you … is an instance of the class of Bordeaux wines.” 29
From the Protégé glossary: “… individual bottles of each wine may become individual instances in our knowledge base.” 30
Why build scientific ontologies “There are many ways to create ontologies …” Multiple ontologies simply make our data silo problems worse We need to constrain ontologies so that they converge Just as bad scientific theories must die, so also bad ontologies must die 42
Science-based ontology development Q: What is to serve as constraint in order to avoid silo creation? A: Reality, as revealed, incrementally, by experimentally-based science 43
Ontological realism Find out what the world is like (= by doing science) Build representations adequate to this world, not to some simplified model in your laptop 44
Goal of the OBO Foundry: to provide a suite of controlled structured vocabularies for the calibrated annotation of data, to support integration and algorithmic reasoning across the entire domain of biomedicine. As biomedical knowledge grows, these ontologies must be evolved in tandem. 45
46
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | | Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
47 OBO FOUNDRY CRITERIA The developers of each ontology commit to its maintenance in light of scientific advance, and to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
Orthogonality one ontology for each domain no need for ‘semantic matching’ no need for ‘ontology integration’ no need for ‘mappings’ (too expensive, too fragile, too difficult to keep up-to-date as mapped ontologies change) 48
Orthogonality is our best (perhaps our only) hope of solving the data silo problem Why do computer engineers hate orthogonality so much? (and like ‘relativism’ – every project its own, new ontology?) 49
Ontology (Science) Experimental results are being described in algorithmically useful ways with the help of ontologies like the GO Such ontologies are authored and maintained by scientists to support the sharing, retrieval, integration and analysis of their data Thesis: these ontologies are part of science.
Ontologies like the GO are part of science They must be associated with computer implementations (with engineering artifacts) But the ontologies are not themselves engineering artifacts The same ontology can be associated with multiple engineering artifacts 51
Ontologies like the GO are comparable to –scientific theories –scientific databases –scientific journal publications 52
Ontologies like the GO are being used by scientific journal publications – to provide more useful access to article content via controlled structured keyword lists – to provide a basis for creating formally structured versions of journal articles themselves 53
OBO Foundry working with journal publishers to advance orthogonality by creating a methodology for expert peer review of ontologies 54
Benefits of orthogonality helps those new to ontology to find what they need to find models of good practice ensures mutual consistency of ontologies (trivially) and thereby ensures additivity of annotations 55
More benefits of orthogonality it rules out simplification and partiality brings an obligation on the part of ontology developers to commit to scientific accuracy and domain- completeness 56
More benefits of orthogonality helps to eliminate redundancy serves the division of ontological labor: allows experts to focus on their own domains of expertise makes possible the establishment of clear lines of authority 57
The goal of orthogonality is a basic goal of science it is a pillar of the scientific method that scientists should strive always to seek out and resolve conflicts between competing theories 58
Is there a problem with orthogonality? What if I need my own ontology of cellular membranes to meet my own special purposes? Strategy of application ontologies: these should be developed from the start using terms whose definitions employ the resources of orthogonal ontologies like those within the Foundry. Any other approach creates silos. 59
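One way to read this strategy is as a constraint that a program can check: every term in an application ontology must be defined by reference to terms already present in the shared reference ontologies. The sketch below is a minimal illustration, assuming toy term sets and made-up application terms; it does not reproduce real GO or CL content or any Foundry tooling.

```python
# Toy stand-ins for reference ontologies (term labels are illustrative only).
REFERENCE_TERMS = {
    "GO": {"membrane", "transport"},
    "CL": {"neuron"},
}

# Hypothetical application ontology: each local term lists the
# (ontology, term) pairs its definition is built from.
APPLICATION_TERMS = {
    "neuron membrane": [("CL", "neuron"), ("GO", "membrane")],
    "my special membrane": [("LOCAL", "special membrane")],
}


def undefined_references(app_terms, reference_terms):
    """Return, per application term, the references not found in any reference ontology."""
    problems = {}
    for term, refs in app_terms.items():
        missing = [
            (onto, label)
            for onto, label in refs
            if label not in reference_terms.get(onto, set())
        ]
        if missing:
            problems[term] = missing
    return problems
```

A term that passes the check inherits its meaning from the shared ontologies; a term that fails it, like the second one here, is the seed of a new silo.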
Better to have one consensus ontology serving multiple purposes imperfectly because multiple ontologies addressing the same domain, whether they are good ones or bad ones, create silos 60
Benefits of ontology peer review 1. will provide an impetus to the improvement of scientific knowledge over time 2. brings benefits to readers, since they need only absorb and collate vetted ontologies, as opposed to all the ontologies available e.g. on the Semantic Web 61
Peer review creates incentives for investment of effort in ontology work It gives career-related credit to both authors and reviewers (university promotions and funding based on peer review credit) Supports creation of a professional career path for ontologists It gives credit to scientific experts for investment of scientific expertise It allows measurement of citations of ontologies It magnifies the motivating potential of the factor of influence 62
For engineers, ontologies 1. can be bought and sold 2. need have no well-demarcated scientific domains 3. need not be subject to further maintenance 4. can be stand-alone products 5. are typically tied to one specific implementation Ontology (engineering) thereby makes the silo problem worse 63
Ontologies created to serve scientific purposes 1. are developed to be common resources (thus they cannot be bought or sold) 2. represent well-demarcated scientific domains 3. are subject to constant maintenance by domain experts 4. are designed to be used in tandem with other, complementary ontologies 5. are maximally independent of format and implementation 64
Background assumptions Scientific hypotheses should be formulated by scientists Scientific experiments should be carried out by scientists Scientific databases should be developed and maintained by scientists Scientific textbooks should be written by scientists 65
Question: Who should build scientific ontologies? 66
THE END 67
In the olden days people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Portugal, varas of Texas, etc., etc. 68
on June 22, 1799, in Paris, everything changed 69
we now have the International System of Units 70
The SI is a Controlled Vocabulary Each SI unit is represented by a symbol, not an abbreviation. The use of unit symbols is regulated by precise rules. The symbols are designed to be the same in every language. Use of the SI system makes scientific results comparable 71
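To make the comparison concrete: results reported in different units become comparable only once they are expressed in a shared unit. The sketch below converts lengths to metres. The conversion table is limited to the modern international inch and foot (0.0254 m and 0.3048 m); the historical units named earlier are omitted because their values varied by place and period. The function and table names are illustrative.

```python
# Conversion factors to the SI base unit of length (metre).
# Only the modern international inch and foot are included.
TO_METRES = {
    "m": 1.0,
    "in": 0.0254,
    "ft": 0.3048,
}


def to_metres(value, unit):
    """Convert a length to metres, refusing units outside the controlled list."""
    if unit not in TO_METRES:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * TO_METRES[unit]


def comparable(measurements):
    """Express every (value, unit) pair in metres so results can be compared."""
    return [to_metres(value, unit) for value, unit in measurements]
```

Rejecting unknown unit symbols, rather than guessing, mirrors the way a controlled vocabulary works: a symbol either belongs to the agreed list or it cannot be used.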
The SI is an Ontology Quantities are universals, one for each measurable dimension of reality Can we provide an analogue of the SI system for (basic dimensions of) biology? 72
First step OBO (Open Biomedical Ontologies) library comprehends some 70 ontologies now made available also on the NCBO Bioportal the majority of these ontologies are built to work well with the Gene Ontology 73
All OBO Foundry ontologies work in the same way –we have data (biosample, haplotype, clinical data, survey data,...) –we need to make this data available for not just string-based search and algorithmic processing –we create a consensus-based ontology for annotating the data
We have data: BioHealthBase: Tuberculosis Database; VFDB: Virulence Factor DB; TropNetEurop: Dengue Case Data; BioHealthBase: Influenza Database; PathPort: Pathogen Portal Project; IMBB: Malaria Data 75
We need to annotate this data to allow retrieval and integration of –sequence and protein data for pathogens –case report data for patients –clinical trial data for drugs, vaccines –epidemiological data for surveillance, prevention –... Goal: to make data deriving from different sources comparable and computable 76
We need common controlled vocabularies to describe these data in ways that will assure comparability and cumulation What content is needed to adequately cover the infectious domain? –Host-related terms (e.g. carrier, susceptibility) –Pathogen-related terms (e.g. virulence) –Vector-related terms (e.g. reservoir) –Terms for the biology of disease pathogenesis (e.g. evasion of host defense) –Population-level terms (e.g. epidemic, endemic, pandemic) 77
IDO provides a common template It contains terms (like ‘pathogen’, ‘vector’, ‘host’) which apply to organisms of all species involved in infectious disease and its transmission Disease- and organism-specific ontologies are then built as refinements of the IDO core – the common core guarantees some level of comparability of data 78
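The refinement pattern can be sketched as follows. The three core terms are those named on the slide; the extension ontologies, their term names, and the parent links are hypothetical examples, not actual IDO content.

```python
# Core terms named on the slide.
IDO_CORE = {"pathogen", "vector", "host"}

# Hypothetical disease-specific extensions: each term points to its parent.
MALARIA_EXT = {
    "malaria parasite": "pathogen",
    "malaria mosquito vector": "vector",
}
DENGUE_EXT = {
    "dengue virus": "pathogen",
    "dengue mosquito vector": "vector",
}


def core_ancestor(term, extension, core=IDO_CORE):
    """Follow parent links until a core term is reached; None if the chain never reaches the core."""
    seen = set()
    while term not in core:
        if term in seen or term not in extension:
            return None
        seen.add(term)
        term = extension[term]
    return term


def comparable_terms(ext_a, ext_b):
    """Pair up terms from two extensions that refine the same core term."""
    pairs = []
    for a in ext_a:
        for b in ext_b:
            shared = core_ancestor(a, ext_a)
            if shared is not None and shared == core_ancestor(b, ext_b):
                pairs.append((a, b, shared))
    return pairs
```

Data annotated with terms from either extension can be compared at the level of the shared core term, which is the sense in which the common core guarantees some comparability.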