1
1 Ontology Tutorial Part 1 What is Ontology and What Can It Do? Barry Smith http://ontology.buffalo.edu/smith
2
2 The problem of data integration / information fusion About 30,000 genes in a human Probably 100-200,000 proteins Individual variation in most genes 100s of cell types 100,000s of disease types
3
3 DNA Protein Organelle Cell Tissue Organ Organism Muscle tissue Nerve tissue Connective tissue Epithelial tissue Blood Musculo-skeletal system Circulatory system Respiratory system Digestive system Nervous system Urinary system Reproductive system Endocrine system Lymphoidal system Mitochondria Nucleus Endoplasmic reticulum Cell membrane
4
4 The Challenge: Each (clinical, pathological, genetic, proteomic, pharmacological …) information system uses its own terminology and category system. Biomedical research demands the ability to navigate through all such information systems. How can we overcome the incompatibilities which become apparent when data from distinct sources is combined?
5
5 Answer: “Ontology”
6
6 Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain (Semantic Web)
3. Ontology as controlled vocabulary (Gene Ontology, Open Biological Ontologies Consortium)
8
8 Ontology as a branch of philosophy seeks to establish the basic formal-ontological structures, and the kinds and structures of objects, properties, events, processes and relations in each material domain of reality
9
9 Formal ontology an analogue of pure mathematics Can be applied to different domains
10
10 Material ontology a kind of generalized chemistry or zoology (Aristotle’s ontology grew out of biological classification)
11
11 Aristotle world’s first ontologist
12
12 World’s first ontology (from Porphyry’s Commentary on Aristotle’s Categories)
13
13 Linnaean Ontology
14
14 Formal Ontology
– theory of part and whole
– theory of dependence / unity
– theory of boundary, continuity and contact
– theory of universals and instances
– theory of continuants and occurrents (objects and processes)
– theory of functions and functioning
– theory of granularity
15
15 Formal Ontology the theory of those ontological structures (such as part-whole, universal-particular) which apply to all domains whatsoever
16
16 Formal-Ontological Categories: substance, process, function, unity, plurality, site, dependent part, independent part. These are able to form complex structures in non-arbitrary ways, joined by relations such as part, dependence, location.
17
17 A Network of Domain Ontologies Material (Regional) Ontologies Basic Formal Ontology
18
18
19
19 Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain (Semantic Web)
3. Ontology as controlled vocabulary (Gene Ontology, Open Biological Ontologies Consortium)
20
20 Assumptions Communication / compatibility problems should be solved automatically (by machine) Hence ontologies must be applications running in real time
21
21 Application ontology: Ontologies are inside the computer thus subject to severe constraints on expressive power (effectively the expressive power of Description Logic)
22
22 Problem: Confusion of concepts and entities in reality Don’t construct theories of reality; construct ‘models’ of ‘concepts’
23
23 The Semantic Web Ontology in the Knowledge Engineering Sense
24
24 A new silver bullet
25
25 The Semantic Web designed to integrate the vast amounts of heterogeneous online data and services via dramatically better support at the level of metadata designed to yield the ability to query and integrate across different conceptual systems
26
26 Tim Berners-Lee, inventor of the World Wide Web, ‘sees a more powerful Web emerging, one where documents and data will be annotated with special codes allowing computers to search and analyze the Web automatically. The codes … are designed to add meaning to the global network in ways that make sense to computers’
27
27 Hyperlinked vocabularies, called ‘ontologies’, will be used by Web authors ‘to explicitly define their words and concepts as they post their stuff online.’ ‘The idea is the codes would let software "agents" analyze the Web on our behalf, making smart inferences that go far beyond the simple linguistic analyses performed by today's search engines.’
28
28 Exploiting tools such as: XML; OWL (Web Ontology Language); RDF (Resource Description Framework); DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer) (confusing syntactic integration with semantic integration)
29
29 Ontology confused with: the language of ontology ‘Ontology’ for semantic webbers is without content Philosophical ontology = build a theory of reality Semantic-web-style ontology = build a model of the data in our computers
30
30 Defining ‘gene’ GDB: a gene is a DNA fragment that can be transcribed and translated into a protein Genbank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
31
31 Example: The Enterprise Ontology. A Sale is an agreement between two Legal-Entities for the exchange of a Product for a Sale-Price. A Strategy is a Plan to Achieve a high-level Purpose. A Market is all Sales and Potential Sales within a scope of interest.
32
32 Example: Statements of Accounts. Company financial statements may be prepared under either the (US) GAAP or the (European) IASC standards. These allocate cost items to different categories depending on the laws of the countries involved.
33
33 Job: to develop an algorithm for the automatic conversion of income statements and balance sheets between the two systems. Not even this relatively simple problem has been satisfactorily resolved … why not? Because the very same terms mean different things and are applied in different ways in different cultures
34
34 The Semantic Web Initiative The Web is a vast edifice of heterogeneous data sources Needs the ability to query and integrate across different conceptual systems
35
35 How to resolve incompatibilities? Enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms, which 1. satisfy the constraints of a description logic (DL) 2. are applied as meta-tags to the content of websites
36
36 Clay Shirky: The Semantic Web is a machine for creating syllogisms.
Humans are mortal
Greeks are human
Therefore, Greeks are mortal
37
37 Lewis Carroll
- No interesting poems are unpopular among people of real taste
- No modern poetry is free from affectation
- All your poems are on the subject of soap-bubbles
- No affected poetry is popular among people of real taste
- No ancient poetry is on the subject of soap-bubbles
Therefore: All your poems are bad.
38
38 the promise of the Semantic Web it will improve all the areas of your life where you currently use syllogisms
39
39 We can use the Semantic Web to prove that Joe loves Mary: we found two documents on a trusted site, one of which said that ":Joe :loves :MJS", and another of which said that ":MJS daml:equivalentTo :Mary". We also got the checksums of the files in person from the maintainer of the site. To check this information, we can list the checksums in a local file, and then set up some FOPL rules that say "if file 'a' contains the information Joe loves mary and has the checksum md5:0qrhf8q3hfh, then record SuccessA", "if file 'b' contains the information MJS is equivalent to Mary, and has the checksum md5:0892t925h, then record SuccessB", and "if SuccessA and SuccessB, then Joe loves Mary". [http://infomesh.net/2001/swintro/]
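The three rules quoted on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document contents, the one-line triple format, and the helper names `md5_hex`, `parse` and `prove_joe_loves_mary` are hypothetical, and the trusted checksums are computed locally rather than obtained from a site maintainer.

```python
import hashlib

# Hypothetical document contents; the triples follow the quoted example.
doc_a = ":Joe :loves :MJS ."
doc_b = ":MJS daml:equivalentTo :Mary ."

def md5_hex(text: str) -> str:
    """Return the MD5 hex digest of a document's text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Stand-in for the checksums "got in person" (computed here for the sketch).
trusted = {"a": md5_hex(doc_a), "b": md5_hex(doc_b)}

def parse(text: str) -> tuple[str, str, str]:
    """Split a one-line 'subject predicate object .' statement into a triple."""
    s, p, o = text.rstrip(" .").split()
    return s, p, o

def prove_joe_loves_mary(docs: dict[str, str]) -> bool:
    """Apply the slide's rules: SuccessA, SuccessB, then the conclusion."""
    success_a = (md5_hex(docs["a"]) == trusted["a"]
                 and parse(docs["a"]) == (":Joe", ":loves", ":MJS"))
    success_b = (md5_hex(docs["b"]) == trusted["b"]
                 and parse(docs["b"]) == (":MJS", "daml:equivalentTo", ":Mary"))
    return success_a and success_b

joe_loves_mary = prove_joe_loves_mary({"a": doc_a, "b": doc_b})
tampered = prove_joe_loves_mary({"a": ":Joe :loves :Sue .", "b": doc_b})
```

The sketch checks file integrity and string-level equivalence only. Whether ":MJS" and ":Mary" really denote the same person is taken on trust from the second document.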
40
40 Merging Databases: Merging databases simply becomes a matter of recording in RDF somewhere that "Person Name" in your database is equivalent to "Name" in my database, and then throwing all of the information together and getting a processor to think about it. [http://infomesh.net/2001/swintro/] Is your "Person Name = John Smith" the same person as my "Name = John Q. Smith"? Who knows? Not the Semantic Web
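A minimal Python sketch of the point made on this slide: recording that two field names are equivalent aligns the schemas, but says nothing about whether two records describe the same individual. The record layout, `field_map` and `rename` are hypothetical names introduced for illustration.

```python
# Hypothetical records; field names follow the slide's example.
your_db = [{"Person Name": "John Smith"}]
my_db = [{"Name": "John Q. Smith"}]

# The "recorded equivalence" between the two field names.
field_map = {"Person Name": "Name"}

def rename(record: dict, mapping: dict) -> dict:
    """Rewrite a record's keys according to the field-name mapping."""
    return {mapping.get(key, key): value for key, value in record.items()}

merged = [rename(r, field_map) for r in your_db] + my_db

# Schema-level agreement: every record now uses the same field name.
same_schema = all(set(r) == {"Name"} for r in merged)

# Instance-level question: exact matching finds two distinct names,
# and nothing in the mapping says whether they denote one person.
distinct_names = {r["Name"] for r in merged}
```

After the merge there are still two records with two different name strings. Deciding whether they refer to one person needs knowledge about the people, not about the field names.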
41
41 XML-syntax does not help Jules Deryck Newco XTC Group Business Manager +32(0)3.471.99.60 +32(0)3.891.99.65 +32(0)465.23.04.34 www.newco.com Dendersesteenweg 17 2630 Aartselaar Belgium
42
42 and with correct XML-syntax: Jules Deryck Newco XTC Group Business Manager +32(0)3.471.99.60 +32(0)3.891.99.65 +32(0)465.23.04.34 www.newco.com Dendersesteenweg 17
43
43 and with correct XML-syntax: Jules Deryck Newco XTC Group Business Manager +32(0)3.471.99.60 +32(0)3.891.99.65 +32(0)465.23.04.34 www.newco.com Dendersesteenweg 17 2630 Aartselaar Belgium Is "Jules" the first name of the person, or of the business-card?
44
44 and with correct XML-syntax: Jules Deryck Newco XTC Group Business Manager +32(0)3.471.99.60 +32(0)3.891.99.65 +32(0)465.23.04.34 www.newco.com Dendersesteenweg 17 2630 Aartselaar Belgium Is Jules or Newco the member of XTC Group?
45
45 and with correct XML-syntax: Jules Deryck Newco XTC Group Business Manager +32(0)3.471.99.60 +32(0)3.891.99.65 +32(0)465.23.04.34 www.newco.com Dendersesteenweg 17 2630 Aartselaar Belgium Do the phone numbers and address belong to Jules or to the business?
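The questions raised on the last three slides can be made concrete with a small Python sketch. The markup on the original slides is not preserved in this transcript, so the tag names below (`business-card`, `first-name`, `member-of` and so on) are assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names; the slide's original markup is not preserved here.
card = """
<business-card>
  <first-name>Jules</first-name>
  <last-name>Deryck</last-name>
  <company>Newco</company>
  <member-of>XTC Group</member-of>
  <phone>+32(0)3.471.99.60</phone>
</business-card>
"""

# Parsing succeeds, so the document is well-formed XML.
root = ET.fromstring(card)

# The parser exposes the nesting and the text of each element...
children = {child.tag: child.text for child in root}

# ...but every field is simply a child of <business-card>. Nothing in the
# syntax says whether <member-of> or <phone> describes Jules or Newco.
parents = {child.tag: root.tag for child in root}
```

Well-formedness is a syntactic property. The relation between each element and the thing it describes (the person, the company, or the card itself) has to come from somewhere other than the XML.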
46
46 Shirky: The Semantic Web's philosophical argument -- the world should make more sense than it does -- is hard to argue with. The Semantic Web, with its neat ontologies and its syllogistic logic, is a nice vision. However, like many visions that project future benefits but ignore present costs, it requires too much coordination and too much energy to be effective in the real world …
47
47 Semantic Web effort thus far devoted primarily to developing systems for standardized representation of web pages and web processes (= ontology of web typography), not to the harder task of developing ontologies (term hierarchies) for the content of such web pages
48
48 Cory Doctorow A world of exhaustive, reliable metadata would be a utopia.
49
49 Problem 1: People lie Meta-utopia is a world of reliable metadata. But poisoning the well can confer benefits to the poisoners Metadata exists in a competitive world. Some people are crooks. Some people are cranks. Some people are French philosophers.
50
50 Problem 2: People are lazy Half the pages on Geocities are called “Please title this page”
51
51 Problem 3: People are stupid. The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate. Will internet users learn to accurately tag their information with whatever DL-hierarchy they're supposed to be using?
52
52 Problem 4: Ontology Impedance = semantic mismatch between ontologies being merged. This problem is recognized in the Semantic Web literature: http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf
53
53 Solution 1: treat it as (inevitable) ‘impedance’ and learn to find ways to cope with the disturbance which it brings. Suggested here: http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf
54
54 Solution 2: resolve the impedance problem on a case-by-case basis Suppose two databases are put on the web. Someone notices that "where" in the friends table and "zip" in the places table mean the same thing. http://www.w3.org/DesignIssues/Semantic.html
55
55 Both solutions fail 1. treating mismatches as ‘impedance’ ignores the problem of error propagation (and is inappropriate in an area like medicine) 2. resolving impedance on a case-by-case basis defeats the very purpose of the Semantic Web
56
56 Ontology Impedance: ‘gene’ as used in websites issued by: biotech companies involved in gene patenting; medical researchers interested in the role of genes in predisposition to smoking; insurance companies
57
57 The idea: distinguish two separate tasks:
- developing expressively rich, correct ontologies of given domains
- developing on this basis computer applications capable of running in real time
58
58 Basic Formal Ontology BFO The Vampire Slayer
59
59
60
60 BFO ontology not the ‘standardization’ or ‘specification’ of concepts (not a branch of knowledge or concept engineering) but an inventory of the types of entities existing in reality
61
61 BFO not a computer application but a reference ontology in the sense of Aristotelian philosophy - it sacrifices tractability for the sake of expressive power
62
62 Defining ‘gene’ GDB: a gene is a DNA fragment that can be transcribed and translated into a protein Genbank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
63
63 Ontology ‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’... ‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ … are ontological terms in the sense of traditional (philosophical) ontology
64
64 BFO not just a system of categories but a formal theory with definitions, axioms, theorems designed to provide formal resources for the building of reference ontologies for specific domains the latter should be of sufficient richness that terminological incompatibilities can be resolved intelligently rather than by brute force
65
65 The Reference Ontology Community: IFOMIS (Saarbrücken); Laboratories for Applied Ontology (Trento/Rome, Turin); Foundational Ontology Project (Leeds); Ontology Works (Baltimore); Department of Structural Biology (Seattle); Virtual Soldier Project (DARPA); Open Biological Ontologies Consortium (Cambridge, Berkeley, Bar Harbor)
66
66
67
67 Ontology Tutorial Part 2 The Future of Ontology in Biomedicine
68
68 Ontology Tutorial Part 2: The Future of Ontology in Buffalo
69
69 Ontology Tutorial Part 2 The Future of Ontology in Biomedicine
70
70 Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain (Semantic Web)
3. Ontology as controlled vocabulary (Gene Ontology, Open Biological Ontologies Consortium)
71
71 Philosophical Ontology Ontologies are WINDOWS ON REALITY Ontologies deal with classes/universals/invariants in reality which exist independently of our theorizing and independently of our language
72
72 What are universals? invariants in reality satisfying biological laws (there are truths about universals in biological textbooks)
73
73 A universal is not determined by its instances as a state is not determined by its citizens A universal may vary with time as an organism may vary with time (by gaining and losing molecules)
74
74 Universals are Not Sets A set is an abstract structure, existing outside time and space. The set of Romans timelessly has Julius Caesar as a member. Universals exist in time.
75
75 A Window on Reality
76
76 Medical Diagnostic Hierarchy a hierarchy in the realm of diseases
77
77 Dependence Relations: Organisms, Diseases
78
78 A Window on Reality: Organisms, Diseases
79
79 A Window on Reality
80
80 [Diagram; labels: universals (substance, organism, animal, mammal, cat, siamese, frog); instances]
81
81
82
82 Many current standard ‘ontologies’ are ramshackle because they have no counterpart of formal ontology. The Unified Medical Language System (UMLS): a compendium of source vocabularies including: HL7 RIM, SNOMED, International Classification of Diseases, MeSH (Medical Subject Headings), Gene Ontology
83
83 Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain (Semantic Web)
3. Ontology as controlled vocabulary (Gene Ontology, Open Biological Ontologies Consortium)
84
84 Problem: The different source vocabularies are incompatible with each other
85
85 Problem: They contain bad coding, which often derives from failure to pay attention to simple logical or ontological principles or to principles of good definition
86
86 Bad Coding
Plant roots is-a Plant
Plant leaves is-a Plant
Pollen is-a Plant
Both testes is-a testis
Both uterii is-a uterus
87
87 Bad definitions Heptolysis = def the cause of heptolysis Biological process = def a biological goal that requires more than one function
88
88 The Concept Orientation: Work on biomedical ontologies grew out of work on medical dictionaries and nomenclatures. It has focused almost exclusively on ‘concepts’ (sometimes confused with terms/descriptions).
89
89 The Curse of Linguistics Work on biomedical ontologies grew out of work on medical dictionaries and nomenclatures This led to the assumption that all that need be said about classes can be said without appeal to time or to instances in reality. Ontology is about meanings/terms/strings
90
90 An alternative research programme for ontology based on philosophical principles Terms in bio-ontologies refer not to ‘concepts’ but to universals in reality
91
91 already reformed Foundational Model of Anatomy Anatomy Reference Ontology
92
92
93
93 A window on reality
94
94 [Diagram; node labels: Anatomical Structure, Organ, Serous Sac, Pleural Sac, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Tissue, Organ Part, Organ Subdivision, Organ Component, Anatomical Space, Organ Cavity, Serous Sac Cavity, Pleural Cavity, Interlobar recess, Organ Cavity Subdivision, Serous Sac Cavity Subdivision]
95
95 To represent ontological relations we need to take instances into account To say A part_of B is not to say anything about Bs’ need for As as parts
96
96 part_of as a relation between universals
A part_of B =def given any x, if inst(x, A) then there is some y such that inst(y, B) and part(x, y)
Hence: human testis part_of human being. But not: heart part_of human being (not every heart is part of a human being).
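The definition can be checked mechanically against a toy set of instances. The sketch below is an illustration only: the instance names, the `inst` and `part` tables, and the reading of the heart example (a non-human heart exists, so not every heart is part of a human being) are assumptions.

```python
# Hypothetical toy data: instances, the universals they instantiate,
# and instance-level parthood as (part, whole) pairs.
inst = {
    "testis_1": "human testis",
    "heart_1": "heart",        # a human heart
    "heart_2": "heart",        # a frog's heart
    "john": "human being",
    "frog_1": "frog",
}
part = {("testis_1", "john"), ("heart_1", "john"), ("heart_2", "frog_1")}

def part_of(a: str, b: str) -> bool:
    """A part_of B: every instance of A is part of some instance of B."""
    return all(
        any(inst[y] == b and (x, y) in part for y in inst)
        for x in inst
        if inst[x] == a
    )

testis_in_human = part_of("human testis", "human being")  # holds
heart_in_human = part_of("heart", "human being")          # fails: heart_2
```

Note that the definition is universally quantified over instances of A, so it holds vacuously for a universal with no recorded instances. A fuller treatment would also index the relations by time.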
97
97 already reformed Foundational Model of Anatomy Anatomy Reference Ontology
98
98 under construction / overhaul Physiology Reference Ontology Gene Ontology OBOL
99
99 The Gene Ontology a controlled vocabulary for annotations of genes and gene products
100
100 When a gene is identified three important types of questions need to be addressed: 1. Where is it located in the cell? 2. What functions does it have on the molecular level? 3. To what biological processes do these functions contribute?
101
101 GO has three ontologies molecular functions cellular components biological processes
102
102 GO astonishingly influential used by all major species genome projects used by all major pharmacological research groups used by all major bioinformatics research groups
103
103 GO part of the Open Biological Ontologies consortium Fungal Ontology Plant Ontology Yeast Ontology Disease Ontology Mouse Anatomy Ontology Cell Ontology Sequence Ontology Relations Ontology
104
104 Each of GO’s ontologies is organized in a graph-theoretical structure involving two sorts of links or edges: is-a (= is a subtype of ) (copulation is-a biological process) part-of (cell wall part-of cell)
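A graph with two sorts of labelled edges can be sketched as an adjacency list in Python. The first two edges are the slide's own examples. The third edge and the helper name `ancestors` are assumptions added so that the traversal has more than one step.

```python
from collections import defaultdict

# Edges as (child, relation, parent). The first two come from the slide;
# the third is an illustrative assumption.
edges = [
    ("copulation", "is-a", "biological process"),
    ("cell wall", "part-of", "cell"),
    ("cell", "is-a", "cellular component"),
]

graph = defaultdict(list)
for child, relation, parent in edges:
    graph[child].append((relation, parent))

def ancestors(term: str) -> set[tuple[str, str]]:
    """Collect every (relation, term) pair reachable by following edges upward."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in graph.get(current, []):
            if (relation, parent) not in found:
                found.add((relation, parent))
                stack.append(parent)
    return found

cell_wall_ancestors = ancestors("cell wall")
```

The traversal only reports which edges were followed. What may be inferred from a mixed path of is-a and part-of links depends on how the two relations are defined.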
105
105
106
106
107
107 cellular components: 1372 component terms; molecular functions: 7271 function terms; biological processes: 8069 process terms
108
108 The Cellular Component Ontology (counterpart of anatomy) flagellum chromosome membrane cell wall nucleus
109
109 The Molecular Function Ontology ice nucleation protein stabilization kinase activity binding The Molecular Function ontology is (roughly) an ontology of actions on the molecular level of granularity
110
110 Biological Process Ontology glycolysis copulation death An ontology of occurrents on the level of granularity of cells, organs and whole organisms
111
111 GO built by biologists free of the Curse of Linguistics free of the Curse of Computer Science
112
112 but problems still remain:
menopause part_of aging
aging part_of death
menopause part_of death
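If part_of is treated as transitive, the third statement on this slide follows mechanically from the first two. A minimal Python sketch, where the function name `transitive_closure` is an assumption introduced for illustration:

```python
# The two part_of assertions quoted on the slide.
asserted = {("menopause", "aging"), ("aging", "death")}

def transitive_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

# Statements that were not asserted but follow by transitivity.
inferred = transitive_closure(asserted) - asserted
```

Here `inferred` contains only the pair ("menopause", "death"). The inference step is valid, so the counterintuitive result points back at the asserted statements.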
113
113 heptolysis Definition The causes of heptolysis …
114
114 regulation of sleep part_of sleep extrinsic to membrane part_of membrane
115
115 GO uses only two relations is_a and part_of
116
116 hence GO has only sentences of the forms A is_a B and A part_of B no way to express ‘not’ and no way to express ‘is localized at’ and no way to express ‘I don’t know’:
117
117 Holliday junction helicase complex is-a unlocalized; cellular component unknown is-a cellular component
118
118 Old GO definition of part_of A part_of B = def A can be part of B
119
119 New GO definition of part_of as part of current OBOL reform effort A part_of B = def given any x, if inst(x, A) then there is some y such that inst(y, B) and part(x, y)
120
120 Analogous problems for nearly all foundational relations of ontologies and semantic networks: A causes B A is associated with B A is located in B etc. Reference to instances is necessary to clear up these problems
121
121
122
122 The Future of Ontology in Buffalo http://ontology.buffalo.edu/bcor/
- to provide a forum within which philosophical ontologists and those involved in ontology applications can work together in high-level interdisciplinary research
- to assist in coordination and integration of projects in ontological research being pursued in Buffalo
123
123 Gary Byrd Charles Dement Randall Dipert John Eisner Daniel Fischer Louis Goldberg Jorge Gracia David Hershenov Rajiv Kishore Eric Little James Llinas David Mark Bill Rapaport Galina Rogova Ram Ramesh Stuart C. Shapiro Barry Smith Rohini Srihari Moises Sudit
124
124 College of Arts and Sciences Computer Science and Engineering School of Management Center of Excellence in Bioinformatics School of Informatics School of Dental Medicine Center for Multisource Information Fusion National Center for Geographic Information and Analysis School of Medicine and Biomedical Sciences
125
125 Computer Science and Engineering School of Management Charles Dement Pharma of the Future
126
126 Computer Science and Engineering Daniel Fischer Bill Rapaport Stuart Shapiro Rohini Srihari
127
127 School of Management Ram Ramesh Rajiv Kishore
128
128 Center of Excellence in Bioinformatics Daniel Fischer
129
129 School of Informatics / School of Medicine Gary Byrd Medical Informatics Certificate Program
130
130 School of Dental Medicine John Eisner Louis Goldberg SNODENT
131
131 Center for Multisource Information Fusion Eric Little James Llinas Galina Rogova Moises Sudit
132
132 National Center for Geographic Information and Analysis David Mark Barry Smith
133
133 Department of Philosophy Barry Smith (Director?) Randall Dipert Jorge Gracia David Hershenov Ingvar Johansson Jiyuan Yu
134
134 Goal To show how philosophical ontology can contribute to the successful application of ontologies in information systems