Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power
Summary
Motivation
Use Case
Methods and Description Generator
Results
Evaluation
Open Questions (still)
Motivation
Textual definitions are a cornerstone of good practice in ontology delivery. They are a requirement of the OBO process, but they are hard work to produce.
Logical definitions make meaning explicit to the computer and help with maintaining the ontology's structure, querying, and so on. They are also hard to produce, and they are more difficult to understand.
The information in one form should reflect the information in the other, so textual and logical definitions need to be kept synchronised.
Aim: to produce fluent textual definitions from logical definitions/descriptions in OWL.
OWL Smackdown: Computer vs Human
Our Hypotheses
Text is for humans; logical definitions are for computers (and future human-computer hybrids).
Textual definitions ≈ logical definitions.
Textual definitions tend to be more lossy than logical ones (cardinalities are often dropped, specific roles are not mentioned, etc.).
Logical definitions are often more explicit than natural language and should therefore contain sufficient content to produce a textual definition.
EFO Use Case
The Experimental Factor Ontology (EFO) is an application ontology that consumes domain ontologies to satisfy specific, application-focused use cases, primarily for gene expression data from the EBI.
Gene Expression Atlas
Related Work
Generating descriptions from ontologies is often called ‘ontology verbalisation’. A number of systems are concerned only with ABox verbalisation (Hielkema, 2009; Galanis and Androutsopoulos, 2007). Others produce only separate sentences, one for each OWL axiom (Kaljurand, 2007).
Our approach has much in common with these but differs in two ways: only a subset of OWL is considered (the simple description logic EL++), and instead of realising axioms in isolation we apply rules for organisation and aggregation to give a more natural feel.
Method Overview
An OWL ontology is just a “pile of axioms”. We can produce individual sentences from a grammar that guides the transformation from OWL to English (or another natural language).
Axioms need to be grouped (axioms with the same subject together) and aggregated (axioms with the same relationship collapsed together). Once grouped and aggregated, a paragraph of text can be produced sentence by sentence.
Example: hasPart some leg, hasPart some body, hasPart some head => Has parts leg, body and head
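The grouping and aggregation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the (subject, property, filler) triple representation, the subject `insect`, and the `PHRASES` lexicon entry are all hypothetical.

```python
# Hypothetical representation: each axiom "S SubClassOf P some F" as a (S, P, F) triple.
Axiom = tuple[str, str, str]

# Hypothetical lexicon entry giving the phrase used when a property has several fillers.
PHRASES = {"hasPart": "Has parts"}


def aggregate(axioms: list[Axiom]) -> dict[tuple[str, str], list[str]]:
    """Group axioms by subject and property, collapsing their fillers into one list."""
    grouped: dict[tuple[str, str], list[str]] = {}
    for subject, prop, filler in axioms:
        grouped.setdefault((subject, prop), []).append(filler)
    return grouped  # dicts keep insertion order, so filler order is preserved


def join_list(items: list[str]) -> str:
    """English list without a serial comma: 'a', 'a and b', 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def verbalise(axioms: list[Axiom]) -> list[str]:
    """Produce one sentence per (subject, property) group."""
    return [f"{PHRASES[prop]} {join_list(fillers)}"
            for (_subject, prop), fillers in aggregate(axioms).items()]


axioms = [("insect", "hasPart", "leg"),
          ("insect", "hasPart", "body"),
          ("insect", "hasPart", "head")]
print(verbalise(axioms))  # ['Has parts leg, body and head']
```

Three axioms sharing a subject and property collapse into a single group, so one sentence is emitted instead of three.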
Processing stages
1. Transcode OWL/XML to Prolog.
2. Construct a lexicon for atomic entities (next slide).
3. Group axioms by atomic entity.
4. Aggregate axioms with similar structure.
5. Generate sentences from the aggregated axioms.
Example input:
class(animal).
subClassOf(class(cat), class(animal)).
subClassOf(class(dog), class(animal)).
=> after grouping and aggregation:
class(animal).
subClassOf([class(cat), class(dog)], class(animal)).
=> generated text:
ANIMAL. A cat and a dog are both kinds of animals.
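The path from these facts to the generated sentence can be sketched as follows. This is an illustrative Python sketch, not the actual Prolog system: the facts are held as Python tuples, and the naive `plural` helper is an assumed stand-in for the plural forms stored in the lexicon.

```python
from collections import defaultdict

# Hypothetical Python form of the Prolog facts:
#   class(animal).  subClassOf(class(cat), class(animal)).  ...
facts = [
    ("class", "animal"),
    ("subClassOf", "cat", "animal"),
    ("subClassOf", "dog", "animal"),
]


def group_subclasses(facts):
    """Group and aggregate: collect subclasses under their superclass,
    mirroring subClassOf([class(cat), class(dog)], class(animal))."""
    groups = defaultdict(list)
    for fact in facts:
        if fact[0] == "subClassOf":
            _, sub, sup = fact
            groups[sup].append(sub)
    return dict(groups)


def plural(noun):
    # Naive rule; an assumed stand-in for the plural form stored in the lexicon.
    return noun + "s"


def generate(groups):
    """Generate a heading and one aggregated sentence per superclass (always 'a', for simplicity)."""
    lines = []
    for sup, subs in groups.items():
        lines.append(sup.upper() + ".")
        items = [f"a {s}" for s in subs]
        if len(items) == 1:
            lines.append(f"{items[0].capitalize()} is a kind of {sup}.")
        elif len(items) == 2:
            lines.append(f"{items[0].capitalize()} and {items[1]} are both kinds of {plural(sup)}.")
        else:
            lines.append(f"The following are kinds of {plural(sup)}: "
                         f"{', '.join(items[:-1])}, and {items[-1]}.")
    return lines


print(" ".join(generate(group_subclasses(facts))))
# ANIMAL. A cat and a dog are both kinds of animals.
```

The three-or-more branch produces the "The following are kinds of ..." pattern used in the generator's sample output.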
Description Generator
Input: an OWL/XML ontology. Output: text describing atomic entities. Lexicon entries are generated from each entity's label/URL.
It is assumed that the syntax of each phrase is severely constrained: individuals are expressed by proper names; classes by common nouns (with singular and plural forms); and properties by transitive verbs (simple or compound) with slots for a subject and an object.
Example output:
ANIMAL. The following are kinds of animals: a cat, a duck, a giraffe, a person, a sheep, and a tiger. An animal eats a thing. If X has as pet Y then necessarily Y is an animal.
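One way to derive class and property lexicon entries from labels under these constraints is sketched below. The word-splitting rules, the naive pluralisation, and the "has X" => "has as X" and "X_of" => "is X of" heuristics are assumptions chosen to match the example outputs in this deck. They are not the generator's documented rules.

```python
import re
from dataclasses import dataclass


@dataclass
class NounEntry:
    """Lexicon entry for a class: a common noun with singular and plural forms."""
    singular: str
    plural: str


def split_words(label: str) -> list[str]:
    """Split a label or URL fragment ('hasPet', 'derives_from', 'HeLa') into lower-case words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)  # camelCase -> camel Case
    return [w.lower() for w in re.split(r"[\s_\-]+", spaced) if w]


def noun_entry(label: str) -> NounEntry:
    """Build a common-noun entry; the plural uses naive English rules (an assumption)."""
    singular = " ".join(split_words(label))
    suffix = "es" if re.search(r"(s|x|z|ch|sh)$", singular) else "s"
    return NounEntry(singular, singular + suffix)


def verb_phrase(label: str) -> str:
    """Build a transitive verb phrase for a property using assumed heuristics."""
    words = split_words(label)
    if words[0] == "has" and len(words) > 1:
        return "has as " + " ".join(words[1:])  # hasPet -> "has as pet"
    if words[-1] == "of":
        return "is " + " ".join(words)  # bearer_of -> "is bearer of"
    return " ".join(words)  # derives_from -> "derives from"


print(noun_entry("HeLa"))          # NounEntry(singular='he la', plural='he las')
print(verb_phrase("has_quality"))  # has as quality
```

Applied to the EFO labels, these rules yield forms such as "he la" and "ara c resistant murine leukemia". The naive plural rule also yields "22rv1s", the kind of plural the user evaluation flagged as a problem.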
Results

Class label: 22rv1
OWL axioms (Manchester syntax): bearer_of some 'prostate carcinoma'; derives_from some 'Homo sapiens'; derives_from some prostate
Natural language definition extracted: A 22rv1 is a cell line. A 22rv1 is all of the following: something that is bearer of a prostate carcinoma, something that derives from a homo sapiens, and something that derives from a prostate.

Class label: HeLa
OWL axioms (Manchester syntax): bearer_of some 'cervical carcinoma'; derives_from some 'Homo sapiens'; derives_from some cervix; derives_from some 'epithelial cell'
Natural language definition extracted: A he la is a cell line. A he la is all of the following: something that is bearer of a cervical carcinoma, something that derives from a homo sapiens, something that derives from an epithelial cell, and something that derives from a cervix.

Class label: Ara-C-resistant murine leukemia
OWL axioms (Manchester syntax): has subclass b117h*; has subclass b140h*
Natural language definition extracted: A ara c resistant murine leukemia is a cell line. A b117h, and a b140h are kinds of ara c resistant murine leukemias.

Class label: GM18507
OWL axioms (Manchester syntax): derives_from some 'Homo sapiens'; derives_from some lymphoblast; has_quality some male
Natural language definition extracted: A gm18507 is all of the following: something that has as quality a male, something that derives from a homo sapiens, and something that derives from a lymphoblast.

* Axioms placed on the subclasses.
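The "all of the following" pattern in these definitions can be reproduced with a small sketch. It assumes the verb phrases and filler nouns have already been looked up in the lexicon. The a/an rule, which checks only the first letter, is a simplification.

```python
def article(noun: str) -> str:
    """'an' before a written vowel, else 'a' (a simplification that ignores pronunciation)."""
    return "an" if noun and noun[0].lower() in "aeiou" else "a"


def join_and(items: list[str]) -> str:
    """English list with a serial comma: 'x', 'x and y', 'x, y, and z'."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def define(label: str, parent: str, restrictions: list[tuple[str, str]]) -> str:
    """Verbalise 'label SubClassOf parent' plus (verb phrase, filler) existential restrictions."""
    subject = f"{article(label).capitalize()} {label}"
    sentences = [f"{subject} is {article(parent)} {parent}."]
    parts = [f"something that {verb} {article(filler)} {filler}" for verb, filler in restrictions]
    if len(parts) == 1:
        sentences.append(f"{subject} is {parts[0]}.")
    elif parts:
        sentences.append(f"{subject} is all of the following: {join_and(parts)}.")
    return " ".join(sentences)


print(define("22rv1", "cell line", [
    ("is bearer of", "prostate carcinoma"),
    ("derives from", "homo sapiens"),
    ("derives from", "prostate"),
]))
```

This prints the 22rv1 definition exactly as it appears in the results. The sketch verbalises fillers in the order given. The HeLa and GM18507 definitions list them in a different order from their axioms, so the real generator appears to apply its own ordering.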
Results
An online survey of ontology users at the EBI evaluated 10 of the 50 verbalisations. These 10 were selected to cover the widest range of axioms.
[Figure: Total Judgement]
Findings
Discovery of a dodgy class: the definition for Ara-C-resistant murine leukemia stated that its subclasses b117h and b140h are types of it, implying that they were diseases rather than cell lines.
This user group wanted simple language that avoids ontological formality (e.g. “bearer of”), especially in property names for qualities (e.g. “has as quality male”).
The initial verbalisation, which made the semantics explicit, was not liked.
Plural forms were occasionally an issue:
lex(class(EFO_ ), noun, 'cell line', 'cell lines').
lex(class(EFO_ ), noun, '22rv1', '22rv1s').
Conclusion
Initial results were largely well received and considered useful in most cases.
The discovery of an incorrect class definition demonstrates the tool's potential for class validation.
Users preferred textual definitions that are “clear and simple” over ones that are “precise and complex”.
Dependent entities could become adjectival forms of the independent entities in which they inhere (“cell has quality female” becomes “female cell”).
Formal relation names and class labels reduce understanding and should be brought closer to domain language.
Many ontologies are not amenable to text mining; this is an important use case neglected by most.
Definitions are now being imported into EFO.
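The proposed adjectival treatment of qualities can be sketched as a rewrite that moves quality fillers in front of the noun. Recognising qualities by property name (the hypothetical `QUALITY_PROPERTIES` set containing `has_quality`) is an assumption. A fuller treatment might consult an upper-level ontology, as the next steps suggest.

```python
# Hypothetical: properties whose fillers are treated as qualities of the subject.
QUALITY_PROPERTIES = {"has_quality"}


def adjectivise(noun: str,
                restrictions: list[tuple[str, str]]) -> tuple[str, list[tuple[str, str]]]:
    """Fold quality restrictions into adjectives before the noun; return the other restrictions unchanged."""
    adjectives = [filler for prop, filler in restrictions if prop in QUALITY_PROPERTIES]
    remaining = [(prop, filler) for prop, filler in restrictions if prop not in QUALITY_PROPERTIES]
    return " ".join(adjectives + [noun]), remaining


# "cell has quality female" becomes "female cell"
print(adjectivise("cell", [("has_quality", "female")]))  # ('female cell', [])
```

Non-quality restrictions are passed through untouched, so they can still be verbalised as "something that ..." clauses.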
Next Steps
Systematic study of acceptable wordings.
Different wording styles for different users.
Adjectival forms for qualities, etc.; the role of an upper-level ontology.
Moving beyond EL++.
Parsing for OBO.
Next Steps: Round Tripping
Open Questions
Should textual descriptions ≡ logical descriptions? Are discrepancies acceptable?
Acknowledgements
Sandra Williams, Richard Power and Robert Stevens are funded by the SWAT project (EPSRC grants EP/G033579/1 and EP/G032459/1); James Malone is funded by EMBL and EMERALD (project number LSHG-CT ). We would like to thank the members of the EBI’s ontology interest group and functional genomics group, and Dr Helen Parkinson, for comments and survey participation.