Ontology: A Guide for the Intelligence Analyst Barry Smith http://ontology.buffalo.edu/smith
Problem of ensuring sensible cooperation in a massively interdisciplinary community: concept, type, instance, model, representation, data
What do these mean? ‘conceptual data model’ ‘semantic knowledge model’ ‘reference information model’ ‘an ontology is a specification of a conceptualization’
help is on the way ...
national center for ontological research
ECOR Partner Institutions: Laboratory for Applied Ontology, Trento/Rome; Center for Theoretical and Applied Ontology, Turin; Foundational Ontology Group, University of Leeds; JCOR – Japanese Center for Ontological Research
Ontologies (tech.): Standardized classification systems which enable data from different sources to be combined
Ontology (phil.): The theory of being
The need: strong general-purpose classification hierarchies, created by domain specialists and thoroughly tested in real use cases, to help us navigate through oceans of data
Good ontologies should be: intelligible to human beings; computationally useful; capable of being glued together
The actuality (too often): myriad special-purpose ‘light’ ontologies, prepared by ontology engineers and deposited in internet ‘repositories’ or ‘registries’, which only create NEW oceans of data
Schemaweb ontologies (http://www.w3.org/) MusicBrainz Metadata Vocabulary Musical Baton Vocabulary Beer Ontology Kissology
‘Lite’ ontologies often do not generalize … they repeat work already done by others; are not gluable together; offer no roadmap for progressive improvement; and reproduce the very problems of communication which ontology was designed to solve
Ontology (science): The empirical study of how to build humanly useful and computationally tractable representations of entities and of the relations between them. Evidence-based terminology research
Why NCOR? NCOR will: advance ontology as science; develop measures of quality for ontologies to establish best practices
Why NCOR? NCOR will: provide coordination and support for investigators working in ontology and its applications; engage in outreach endeavors designed to foster the goals of high quality ontology in both theory and practice; advance ontology education
National Center for Biomedical Ontology $18.8 mill. NIH Roadmap Center Stanford Medical Informatics University of San Francisco Medical Center Berkeley Drosophila Genome Project Cambridge University Department of Genetics The Mayo Clinic University at Buffalo Department of Philosophy
From chromosome to disease
… legacy of Human Genome Project: genomics, transcriptomics, proteomics, reactomics, metabonomics, phenomics, behavioromics, connectomics, toxicopharmacogenomics, bibliomics
need for semantic annotation of data: where in the body? what kind of disease process? dir.niehs.nih.gov/ microarray/datamining/
Whoops: 54M already! Compare with 3M in Dec 2004, and 12M in June 2005 when I did this.
natural language labels to make the data cognitively accessible to human beings dir.niehs.nih.gov/ microarray/datamining/
compare: legends for maps http://www.ags.gov.ab.ca/GRAPHICS/uranium/athabasca_group_map_with_legend.jpg
ontologies are legends for data dir.niehs.nih.gov/ microarray/datamining/
compare: legends for cartoons
legends: help human beings use and understand complex representations of reality; help human beings create useful complex representations of reality; help computers process complex representations of reality
computationally tractable legends help human beings find things in very large complex representations of reality
ontologies are legends for images
what lesion ? what brain function ?
which period? which architectural style? which type of building?
ontologies are legends for mathematical equations: x_i = vector of measurements of gene i; k = the state of the gene (as “on” or “off”); θ_i = set of parameters of the Gaussian model ...
ontologies are legends for word lists ... and the Computer's View: name, education, CV, private, work. © 2006 Adam Pease, Articulate Software. Slide inspired by Frank van Harmelen
The Idea: GlyProt, MouseEcotope, DiabetInGene, GluChem; sphingolipid transporter activity
annotation using common ontologies yields integration of databases MouseEcotope GlyProt Holliday junction helicase complex DiabetInGene GluChem
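The integration idea can be sketched in a few lines. This is only an illustration under stated assumptions: the record ids and the `integrate` helper are hypothetical, and the assignment of terms to databases is invented for the example. Only the database names and the two annotation terms come from the slides.

```python
from collections import defaultdict

# Hypothetical records: each database tags its own entries with a term
# drawn from one shared ontology (ids and term assignments are invented).
glyprot = [
    {"id": "GP-1", "annotation": "sphingolipid transporter activity"},
    {"id": "GP-2", "annotation": "Holliday junction helicase complex"},
]
mouse_ecotope = [
    {"id": "ME-7", "annotation": "Holliday junction helicase complex"},
]
diabet_in_gene = [
    {"id": "DG-3", "annotation": "sphingolipid transporter activity"},
]


def integrate(**databases):
    """Group (database, record id) pairs under the ontology term they share."""
    index = defaultdict(list)
    for db_name, records in databases.items():
        for record in records:
            index[record["annotation"]].append((db_name, record["id"]))
    return dict(index)


linked = integrate(
    GlyProt=glyprot,
    MouseEcotope=mouse_ecotope,
    DiabetInGene=diabet_in_gene,
)
for term, hits in linked.items():
    print(term, "->", hits)
```

Because every database uses the same term for the same type, the join needs no per-database mapping table; records annotated with private vocabularies could not be grouped this way.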
Glue-ability / integration rests on the existence of a common benchmark called ‘reality’: the ontologies we want to glue together are representations of what exists in the world, not of what exists in the heads of different groups of people
truth is correspondence to reality
simple representations can be true
there are true cartoons
a cartoon can be a veridical representation of reality
a network diagram can be a veridical representation of reality
http://www.psb.ugent.be/cbd/images/research/networks.jpg
pathway maps are representations of complexes of types http://www.nature.com/msb/journal/v1/n1/images/msb4100014-f1.jpg
maps may be correct by reflecting topology, rather than geometry
an image can be a veridical representation of reality; a labelled image can be a more useful veridical representation of reality
an image labelled with computationally tractable labels can be an even more useful veridical representation of reality
annotations help us to find images
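A minimal sketch of how computationally tractable labels help retrieval: a query for a general type also finds images labelled with its subtypes. The file names, labels and the small subtype table are all hypothetical, chosen only to echo the building examples above.

```python
# Hypothetical subtype links: general type -> more specific types.
SUBTYPES = {"building": ["mosque", "fort"], "mosque": [], "fort": []}

# Hypothetical image annotations: file name -> set of type labels.
IMAGE_LABELS = {
    "img_001.jpg": {"mosque"},
    "img_002.jpg": {"fort"},
    "img_003.jpg": {"river"},
}


def expand(term):
    """Return the term together with all of its (transitive) subtypes."""
    found = {term}
    for sub in SUBTYPES.get(term, []):
        found |= expand(sub)
    return found


def find_images(term):
    """Find images labelled with the term or with any of its subtypes."""
    wanted = expand(term)
    return sorted(name for name, labels in IMAGE_LABELS.items() if labels & wanted)


print(find_images("building"))
print(find_images("mosque"))
```

A free-text label such as "old mosque, I think" would be readable by a human but would not support this kind of expansion.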
annotations using common ontologies can yield integration of image data
and link image databases together Gazetteer GlyProt ruins of Hadrami mosque CIA Factbook GluChem
if you’re going to semantically annotate piles of data, better work out how to do it right from the start
two kinds of annotations
names of types
names of instances
instances vs. types dir.niehs.nih.gov/ microarray/datamining/
instances vs. types types dir.niehs.nih.gov/ microarray/datamining/
instances
molecular images and radiographic images are representations of instances
First basic distinction: type vs. instance (science text vs. diary) (human being vs. Tom Cruise)
For ontologies it is generalizations that are important = ontologies are about types, kinds
Inventory vs. Catalog: Two kinds of representational artifact. Databases represent instances; ontologies represent types
Catalog vs. inventory: A 515287 DC3300 Dust Collector Fan; B 521683 Gilmer Belt; C 521682 Motor Drive Belt
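The catalog/inventory split maps naturally onto two record kinds. In the sketch below the part numbers and names are taken from the slide; the serial numbers, class names and the `count_in_stock` helper are hypothetical additions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A type of part: what a catalog (the ontology side) lists."""
    part_number: str
    name: str


@dataclass(frozen=True)
class InventoryItem:
    """One particular physical part: what an inventory (the database side) lists."""
    serial: str          # hypothetical serial number of this individual item
    entry: CatalogEntry  # the type that this item instantiates


catalog = {
    "515287": CatalogEntry("515287", "DC3300 Dust Collector Fan"),
    "521683": CatalogEntry("521683", "Gilmer Belt"),
    "521682": CatalogEntry("521682", "Motor Drive Belt"),
}

# Hypothetical stock: two individual Gilmer belts and one individual fan.
inventory = [
    InventoryItem("SN-0001", catalog["521683"]),
    InventoryItem("SN-0002", catalog["521683"]),
    InventoryItem("SN-0003", catalog["515287"]),
]


def count_in_stock(part_number):
    """Count the instances in the inventory of one catalog type."""
    return sum(1 for item in inventory if item.entry.part_number == part_number)


print(count_in_stock("521683"))
```

The catalog stays the same whether or not any motor drive belt is on the shelf; only the inventory changes as individual items come and go.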
Catalog vs. inventory
Catalog of types/Types
Ontology types Instances
Ontology = A Representation of types
An ontology is a representation of types. We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories: experiments relate to what is particular; science describes what is general
object types organism animal cat mammal siamese frog instances
Ontologies are here
or here
ontologies represent general structures in reality (leg)
Ontologies do not represent concepts in people’s heads
They represent types in reality
which provide the benchmark for integration
My job here. Not tools: Leo Obrst, Chris Welty. Not instances: Werner Ceusters. Ontology content: the types in reality
How to build an ontology:
1. create an initial top-level classification of your domain (the ~50 most common types of entities)
2. arrange the corresponding terms into an informal is_a hierarchy according to this universality principle: A is_a B = every instance of A is an instance of B
3. fill in missing terms to give a complete hierarchy
4. move on to populate the lower levels of the hierarchy
5. annotate your data
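The universality principle can be checked mechanically against annotated data. The sketch below is an assumption-laden illustration: the type names echo the organism example from the earlier slide, but the particular links in `IS_A`, the individuals in `INSTANCES` and all function names are hypothetical.

```python
# Asserted is_a links, child -> parent (an assumed arrangement of the terms).
IS_A = {
    "animal": "organism",
    "mammal": "animal",
    "cat": "mammal",
    "siamese": "cat",
    "frog": "animal",
}

# Hypothetical instance data: individual -> its most specific type.
INSTANCES = {"tibbles": "siamese", "frog_01": "frog"}


def ancestors(term):
    """Return every type that `term` is_a, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(a, b):
    """True if type a is b itself or a (transitive) subtype of b."""
    return a == b or b in ancestors(a)


def instances_of(type_name):
    """All individuals whose type is `type_name` or one of its subtypes."""
    return {name for name, t in INSTANCES.items() if is_a(t, type_name)}


def satisfies_universality(a, b):
    """Data check for 'A is_a B': every instance of A must be an instance of B."""
    return instances_of(a) <= instances_of(b)


print(ancestors("siamese"))
print(satisfies_universality("cat", "mammal"))
print(satisfies_universality("animal", "mammal"))
```

A check of this kind can only refute a proposed is_a link with a counterexample in the data; passing it does not show that the link holds for all instances in reality.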
Example domain: threat, vulnerability Eric Little
Example domain: The ontology of documents Hernando de Soto
valuable work on ‘documents’ in the context of XML, etc., e.g. Bob Glushko: “A document is a purposeful and self-contained collection of information.” This focuses on information content, but there is more than information here
transactional documents: passport, contract, tax form, bill of lading, shipping authorization, plane ticket, visa
the legal powers of documents; the social interactions in which they play a role; the institutional systems to which they belong; provenance (original, copy, forgery ...)
document vs. attachments signatures, seals, stamps ...
http://www.chazj.com/indent/en/1664qa.jpg
anchoring documents to reality
Countersignatures
document template vs. filled-in document; document vs. the piece of paper (or other physical carrier) upon which it is written/printed, ...
Standardized documents: filled in completely/partially, correctly/incorrectly, validly/invalidly
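The template/filled-in distinction and two of the three contrasts (complete vs. partial, correct vs. incorrect) can be sketched as data structures. Everything named here (`Template`, `FilledDocument`, the field names, the sample values) is hypothetical; validity is left out because it depends on the institutional context rather than on the entries alone.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Template:
    """A document template: a type of document with required fields."""
    name: str
    required_fields: tuple


@dataclass
class FilledDocument:
    """One filled-in document: an instance of a template."""
    template: Template
    values: dict = field(default_factory=dict)

    def missing_fields(self):
        """Required fields that have no (non-empty) entry."""
        return [f for f in self.template.required_fields if not self.values.get(f)]

    def is_complete(self):
        """Filled in completely rather than partially."""
        return not self.missing_fields()

    def incorrect_fields(self, facts):
        """Entries that disagree with the facts they are meant to record."""
        return [f for f, v in self.values.items() if f in facts and facts[f] != v]


visa_form = Template("visa application", ("name", "nationality", "date_of_birth"))
doc = FilledDocument(visa_form, {"name": "A. Example", "nationality": "Utopian"})

print(doc.is_complete())
print(doc.missing_fields())
print(doc.incorrect_fields({"nationality": "Ruritanian"}))
```

Correctness is judged against facts supplied from outside the document, which mirrors the point that documents must be anchored to reality.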
from the Shiprock Navajo fair New Mexico, September 30-October 1, 2005
Standardized documents: allow networking across time and across space (individuals linked through document systems); improve the flow of communications; allow standardized transactions
Documents are artifacts analogous to organizations, rules, prices, debts, claims and obligations ...
John Searle, The Construction of Social Reality: claims and obligations are brought into existence by the performance of speech acts
appointings, marryings, promisings change the world We perform a speech act ... the world changes, instantaneously
The de Soto thesis document systems are mechanisms for creating the institutional orders of modern societies
stock and share certificates create capital; marriage licenses create bonds of matrimony; statutes of incorporation create companies; title deeds create property rights and property owners; insurance certificates create insurance coverage
Identity documents create identity and thereby create the possibility of identity theft. What is the ontology of identity? What is the epistemology of identity (the technologies of identification)?
What you can do with a document: sign it; stamp it; witness it; fill it in; revise it; nullify it; deliver it (de facto, de jure) ...
types of document systems; types of document acts; types of document pathways
if you’re going to semantically annotate piles of data, better work out how to do it right from the start
Tomorrow: The problems, and a strategy for the future