Ontologies and foreign data models Paul Millar
Contents
What is an ontology?
What is it useful for?
More on ontologies
Foreign data models
Consensus approach
When consensus breaks down
How to use others' results
6/19/2018
What's an ontology?
“a formal, explicit description of concepts in a domain of discourse” -- huh?
Lots of things exist in our world: events, files, datasets, users, physics groups, roles, collaborations, sites, storage elements, ...
We group these things into different classes
We can define/add properties for classes
We can also add restrictions (facets) to properties
Together with specific instances (data), this is a knowledge base
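The vocabulary above (class, property, facet, instance, knowledge base) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Property`, `OntologyClass`, `Instance` and the `size_bytes` property are hypothetical and do not come from any particular ontology tool.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    name: str
    value_type: type  # facet: restricts which values are allowed


@dataclass
class OntologyClass:
    name: str
    properties: dict = field(default_factory=dict)

    def add_property(self, prop):
        self.properties[prop.name] = prop


@dataclass
class Instance:
    of_class: OntologyClass
    values: dict = field(default_factory=dict)

    def set(self, prop_name, value):
        # Raises KeyError if the class does not define this property.
        prop = self.of_class.properties[prop_name]
        if not isinstance(value, prop.value_type):
            raise TypeError(f"{prop_name} must be {prop.value_type.__name__}")
        self.values[prop_name] = value


# Hypothetical class with one property (assumption, for illustration only).
file_class = OntologyClass("File")
file_class.add_property(Property("size_bytes", int))

# Classes plus specific instances form a (tiny) knowledge base.
knowledge_base = [Instance(file_class)]
knowledge_base[0].set("size_bytes", 1024)
```

Trying `knowledge_base[0].set("size_bytes", "big")` raises `TypeError`, which is the facet doing its job.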
What's it useful for?
To share a common understanding of the structure of information amongst people or software agents
To enable reuse of domain knowledge
To make domain assumptions explicit
To separate domain knowledge from operational knowledge
To analyse domain knowledge
Taken from Ontology Development 101: A Guide to Creating Your First Ontology; Noy & McGuinness
Dataset hierarchy
The most important property of a class is the taxonomic relation (“is-a”).
Multiple inheritance is expected.
Some examples:
Production dataset is-a/is-a-kind-of Dataset
Analysis dataset is-a Dataset
Reco-source dataset is-a Dataset
RDO dataset is-a Reco-source dataset
RDO dataset is-a Monte Carlo dataset
TDAQ dataset is-a Reco-source dataset
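One way to make the “is-a” examples concrete is a small parent table with a transitive lookup. This is a sketch, not a real ontology engine: the `IS_A` table and the helper names are hypothetical, and "Monte Carlo dataset" is given no parent because the slide does not state one.

```python
# child class -> list of direct parent classes (multiple parents allowed)
IS_A = {
    "Production dataset": ["Dataset"],
    "Analysis dataset": ["Dataset"],
    "Reco-source dataset": ["Dataset"],
    "RDO dataset": ["Reco-source dataset", "Monte Carlo dataset"],
    "TDAQ dataset": ["Reco-source dataset"],
}


def ancestors(cls):
    """Return every class reachable from cls by following is-a links."""
    seen = set()
    stack = list(IS_A.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def is_a(child, parent):
    """True if child is the same class as parent or a (transitive) kind of it."""
    return child == parent or parent in ancestors(child)
```

With this table, `is_a("RDO dataset", "Dataset")` holds through Reco-source dataset, and `is_a("RDO dataset", "Monte Carlo dataset")` holds through its second parent, which is the multiple inheritance mentioned above.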
Decomposition
An item is made up of one (or more) components
Limited by cardinality
Examples:
File is-part-of a Dataset
Event is-part-of a File
Production Grid Job is-part-of a Task
Task is-part-of a transformation chain
Tier-n is-part-of a geographic region
Site is-part-of a Tier-2
CE is-part-of a Tier-n
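The “is-part-of” relation with a cardinality limit might be sketched as follows. The `Whole` class, its parameter names and the limit of two files are assumptions chosen for illustration; the slides do not give actual cardinalities.

```python
class Whole:
    """An item made up of components, with cardinality limits on how many."""

    def __init__(self, name, part_kind, min_parts=1, max_parts=None):
        self.name = name
        self.part_kind = part_kind
        self.min_parts = min_parts
        self.max_parts = max_parts  # None means no upper limit
        self.parts = []

    def add_part(self, part):
        if self.max_parts is not None and len(self.parts) >= self.max_parts:
            raise ValueError(
                f"{self.name} may hold at most {self.max_parts} {self.part_kind} parts"
            )
        self.parts.append(part)

    def is_complete(self):
        # "Made up of one (or more) components": check the lower bound.
        return len(self.parts) >= self.min_parts


# Hypothetical dataset limited to two files (the limit is an assumption).
dataset = Whole("dataset-A", part_kind="File", max_parts=2)
dataset.add_part("file-1")
dataset.add_part("file-2")
```

Adding a third file raises `ValueError`; a freshly created `Whole` with no parts reports `is_complete()` as false.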
Explicit “other” links
Other links (properties/slots) are possible:
Analysis dataset is-created-by a User
Production dataset is-result-of a physics group
File1 is-replica-of File2
Job is-requested-by a User
User is-member-of a physics group (or more than one)
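Links like these are often stored as (subject, link, object) triples. The sketch below assumes that representation; every identifier in it (`alice`, `physics-group-1`, and so on) is a made-up instance name, not data from the talk.

```python
# Each link is one (subject, link, object) triple.
triples = {
    ("analysis-dataset-1", "is-created-by", "alice"),
    ("production-dataset-1", "is-result-of", "physics-group-1"),
    ("file-1", "is-replica-of", "file-2"),
    ("job-1", "is-requested-by", "alice"),
    ("alice", "is-member-of", "physics-group-1"),
    ("alice", "is-member-of", "physics-group-2"),
}


def query(subject=None, link=None, obj=None):
    """Return all triples matching the given fields; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (link is None or t[1] == link)
        and (obj is None or t[2] == obj)
    )
```

For example, `query(subject="alice", link="is-member-of")` lists both group memberships, and `query(obj="alice")` finds everything created or requested by that user.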
Foreign data models
Two (groups of) users might have a different understanding (ontology) of a particular environment
Particle Physics is mostly about annotating datasets (private Tags)
There will be no standard dictionary.
There is no way to know that number X (recorded by Physicist A) represents the same concept that Physicist B records as number Y.
Consensus
So, the “easy” solution: build up a dictionary of well-defined words.
Not so easy. The dictionary:
should be unambiguous
should be complete
should be well adhered to
should be static
Consensus breaks down
We imagine that no one can (or has the time to) make the consensus model.
Each physicist does his/her own work, and information is passed on via peer-reviewed publications.
This is probably “good enough”, but can we do better?
An ontological solution
Bridge ontologies by specifying links between the two ontologies.
If the links are sufficient and the ontologies are compatible, then it might be possible to reason about relationships between components in different ontologies.
Speculative work is subject to a research proposal ... hopefully more news soon!
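As a rough illustration of what such bridging could look like (not the proposed research method, which the talk does not describe), the sketch below links two tiny, hypothetical ontologies with an equivalence link and reasons across it. All class names, the `BRIDGE` table and the helper functions are assumptions.

```python
# Two independent is-a hierarchies: child class -> direct parents.
ONTOLOGY_A = {"SimulatedSample": ["Sample"]}       # Physicist A (hypothetical)
ONTOLOGY_B = {"Monte Carlo dataset": ["Dataset"]}  # Physicist B (hypothetical)

# Bridge links: class in A -> equivalent class in B.
BRIDGE = {"SimulatedSample": "Monte Carlo dataset"}


def ancestors(ontology, cls):
    """Return every class reachable from cls by following is-a links."""
    seen = set()
    stack = list(ontology.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ontology.get(parent, []))
    return seen


def a_is_kind_of_b(cls_a, cls_b):
    """True if cls_a (from A) can be shown to be a kind of cls_b (from B)."""
    for candidate in {cls_a} | ancestors(ONTOLOGY_A, cls_a):
        bridged = BRIDGE.get(candidate)
        if bridged is None:
            continue  # no link for this class, nothing can be inferred from it
        if bridged == cls_b or cls_b in ancestors(ONTOLOGY_B, bridged):
            return True
    return False
```

Here `a_is_kind_of_b("SimulatedSample", "Dataset")` is true because the bridge link connects the two hierarchies, while `a_is_kind_of_b("Sample", "Dataset")` is false: without a link for `Sample`, there is not enough information to reason about it.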
Summary
A quick introduction to ontologies and how they might be useful.
An introduction to the foreign data model problem (is it a problem?)
A possible solution, using ontologies.