Introduction to the OntoLex-Lemon Model
John P. McCrae (1), Thierry Declerck (2)
(1) Insight Centre for Data Analytics, National University of Ireland Galway
(2) Austrian Centre for Digital Humanities
RDF Turtle (Terse) Syntax
Simple (!) RDF Document

<http://dbpedia.org/resource/Paris> <http://dbpedia.org/ontology/populationTotal> "2229621"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
<http://dbpedia.org/resource/Paris> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Location> .
<http://dbpedia.org/resource/Paris> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.wikidata.org/entity/Q486972> .
Prefixes

@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

Before:

<http://dbpedia.org/resource/Paris> <http://dbpedia.org/ontology/populationTotal> "2229621"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
<http://dbpedia.org/resource/Paris> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Location> .
<http://dbpedia.org/resource/Paris> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.wikidata.org/entity/Q486972> .

After:

dbp:Paris dbo:populationTotal "2229621"^^xsd:nonNegativeInteger .
dbp:Paris rdf:type dbo:Location .
dbp:Paris rdf:type <http://www.wikidata.org/entity/Q486972> .

In general:

@prefix pre: <long> .
pre:name => <long+name>
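The expansion rule pre:name => <long+name> can be sketched in a few lines of Python. This is only an illustration of the rule, not a Turtle parser; the function name `expand` and the `PREFIXES` table are hypothetical names chosen for this example, and the table simply copies the four declarations from the slide.

```python
# Prefix table copied from the @prefix declarations on the slide.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbp": "http://dbpedia.org/resource/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dbo:Location' into a full IRI.

    Hypothetical helper: it only handles plain prefixed names and
    IRIs that are already written in angle brackets.
    """
    if term.startswith("<") and term.endswith(">"):
        return term  # already a full IRI, nothing to expand
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {term!r}")
    # pre:name => <long + name>
    return f"<{prefixes[prefix]}{local}>"


print(expand("dbp:Paris"))  # <http://dbpedia.org/resource/Paris>
print(expand("rdf:type"))   # <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
```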
Continuations

dbp:Paris dbo:populationTotal "2229621"^^xsd:nonNegativeInteger .
dbp:Paris rdf:type dbo:Location .
dbp:Paris rdf:type <http://www.wikidata.org/entity/Q486972> .

Replace . with ; to repeat the subject, and with , to repeat the subject and predicate:

dbp:Paris dbo:populationTotal "2229621"^^xsd:nonNegativeInteger ; rdf:type dbo:Location , <http://www.wikidata.org/entity/Q486972> .

Or more nicely formatted:

dbp:Paris dbo:populationTotal "2229621"^^xsd:nonNegativeInteger ;
  rdf:type dbo:Location ,
           <http://www.wikidata.org/entity/Q486972> .
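As a rough sketch of how the continuation syntax relates to plain triples, the following Python groups triples by subject and predicate and joins them with ; (same subject) and , (same subject and predicate). The function `to_turtle` is a hypothetical helper for this illustration; it treats terms as opaque strings and does no escaping or validation.

```python
def to_turtle(triples):
    """Serialise (subject, predicate, object) strings with ';' and ',' continuations."""
    grouped = {}
    for s, p, o in triples:
        # subject -> predicate -> list of objects, in order of first appearance
        grouped.setdefault(s, {}).setdefault(p, []).append(o)

    blocks = []
    for s, preds in grouped.items():
        # ',' repeats subject and predicate; ';' repeats only the subject
        parts = [p + " " + " , ".join(objs) for p, objs in preds.items()]
        blocks.append(s + " " + " ;\n    ".join(parts) + " .")
    return "\n".join(blocks)


triples = [
    ("dbp:Paris", "dbo:populationTotal", '"2229621"^^xsd:nonNegativeInteger'),
    ("dbp:Paris", "rdf:type", "dbo:Location"),
    ("dbp:Paris", "rdf:type", "<http://www.wikidata.org/entity/Q486972>"),
]
print(to_turtle(triples))
```

The three input triples share one subject and two of them share a predicate, so the output is a single statement with one ; and one , in it.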
RDF Lists
[Figure: a list drawn as a graph. ex:node has rdf:first "one" and rdf:rest pointing to an unnamed node (??), which has rdf:first "two" and rdf:rest rdf:nil.]
Blank nodes

Some nodes do not have a known URI; we call these blank nodes. They are denoted with [ ] or _:id. A typical use is for lists:

ex:node rdf:first "one" ;
  rdf:rest _:n1 .
_:n1 rdf:first "two" ;
  rdf:rest rdf:nil .

Equivalently, with [ ]:

ex:node rdf:first "one" ;
  rdf:rest [ rdf:first "two" ;
             rdf:rest rdf:nil ] .

Turtle also supports an even more compact syntax for lists:

( "one" "two" )
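The rdf:first / rdf:rest chain can be generated mechanically from an ordinary list. The sketch below is illustrative only: `list_to_triples` is a hypothetical helper, terms are plain strings, and the blank node labels _:n1, _:n2, ... are an arbitrary naming choice that happens to match the slide.

```python
def list_to_triples(head, items):
    """Encode items as an rdf:first / rdf:rest chain that starts at head."""
    if not items:
        raise ValueError("an empty list is just rdf:nil; no chain is needed")
    triples = []
    node = head
    for i, item in enumerate(items):
        triples.append((node, "rdf:first", item))
        is_last = i == len(items) - 1
        # The last cell points to rdf:nil; every other cell points to a fresh blank node.
        next_node = "rdf:nil" if is_last else f"_:n{i + 1}"
        triples.append((node, "rdf:rest", next_node))
        node = next_node
    return triples


for s, p, o in list_to_triples("ex:node", ['"one"', '"two"']):
    print(s, p, o, ".")
```

For the two-element list this prints the same four triples as the _:n1 version of the example.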
URLs

http://www.example.com/path/to/file#identifer

  Protocol: http
  Domain: www.example.com
  Path: /path/to/file
  Fragment: identifer
Relative URLs

URLs may be resolved relative to the Base URL (e.g., the URL used to find the document):

<http://www.example.com/path/to/file#identifer>
<//www.example.com/path/to/file#identifer>
</path/to/file#identifer>
<file#identifer>
<#identifer>
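Relative resolution can be tried out with `urllib.parse.urljoin` from the Python standard library. The base URL below is an assumption made for this example (the document is taken to have been fetched from http://www.example.com/path/to/file); with that base, all five forms on the slide resolve to the same absolute URL.

```python
from urllib.parse import urljoin

# Assumed base URL: the location the document was retrieved from.
BASE = "http://www.example.com/path/to/file"

# The five forms from the slide, without the angle brackets.
REFERENCES = [
    "http://www.example.com/path/to/file#identifer",  # absolute
    "//www.example.com/path/to/file#identifer",       # protocol inherited from base
    "/path/to/file#identifer",                        # protocol and domain inherited
    "file#identifer",                                 # resolved against the base path
    "#identifer",                                     # fragment only
]

resolved = [urljoin(BASE, ref) for ref in REFERENCES]
for ref, url in zip(REFERENCES, resolved):
    print(f"{ref:50} -> {url}")
```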
Design of the Model
History
  LingInfo (Buitelaar, 2006)
  LexOnto (Cimiano, 2007)
  Linguistic Information Repository (Montiel-Ponsoda, 2008)
  LexInfo (2010)
  Monnet Lemon (2011)
  OntoLex CG Founded (2012)
  OntoLex Use Cases (2014)
  OntoLex Lemon Final Specification (2016)
  Lexicography Module (2019)
General Requirements
  R1. OWL and RDF
  R2. Multilinguality
  R3. Semantics by Reference
  R4. Openness
  R5. Reuse relevant standards
RDF and OWL
  RDF models are labelled directed graphs
  Representation
  Each entry has a URI
  Reuse of lexicon data
  Reasoning
Multilinguality
  Support any language
  Do not make language-specific assumptions
  Part-of-speech values
  Gender
  Translation and variation
Semantics by Reference
  Meaning of a word given by reference
  Reference captures semantic information
  Disambiguation is performed relative to the ontology
  No (traditional) word senses
Openness
  Extensible with new models
  No unnecessary choices of linguistic categories
  No payment or restrictions in using the model
Reuse standards
  Reuse as many standards as possible: OWL, RDF, SKOS, Dublin Core, LMF, TMF
The OntoLex-Lemon Model
Ontologies
[Figure: the ontology entity http://www.example.com/foo/ID1234 with the labels "Language"@en and "Teanga"@ga, shown alongside the Irish text "...Cuairt Liteartha do Theangacha Mionlaigh san Eoraip..."]
Linked Data on the Web
[Figure: the label "Edema" and the linked resources http://dbpedia.org/resource/Edema, http://de.dbpedia.org/resource/Ödem, umls:C0013604, mesh:D00487 and icd10:R60.9]
Linked Data with Language
[Figure: the labels "Edema", "Edemata" and "Dropsy" attached to the linked resources http://dbpedia.org/resource/Edema, http://de.dbpedia.org/resource/Ödem, mesh:D00487, icd10:R60.9 and umls:C0013604]
Lexical Entries
[Figure: the lexical entries EDEMA (forms "Edema", "Edemata") and DROPSY (form "Dropsy") placed between the labels and the resources dbpedia:Edema, mesh:D00487, icd10:R60.9 and umls:C0013604]
What is a lexical entry?

A lexical entry represents a unit of analysis of the lexicon that consists of a set of forms that are grammatically related and a set of base meanings that are associated with these forms. Thus, a lexical entry is a word, multiword expression or affix with a single part-of-speech, morphological pattern, etymology and set of senses.
Forms
[Figure: the entry EDEMA with two forms, "Edema" (number=singular) and "Edemata" (number=plural)]
Senses
[Figure: the entries EDEMA and DROPSY with senses referring to dbpedia:Edema and dbpedia:Fish_Dropsy; one sense is marked dating=old]
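The definition of a lexical entry as a set of forms plus a set of senses can be pictured as a small data structure. The Python sketch below is a reading aid, not part of the model: the class and field names are hypothetical, and the assignment of senses to entries (in particular, that dating=old belongs to the sense linking DROPSY to dbpedia:Edema, and that DROPSY also has a sense for dbpedia:Fish_Dropsy) is an assumption about what the figures show.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    written_rep: str
    properties: dict = field(default_factory=dict)  # e.g. {"number": "plural"}


@dataclass
class Sense:
    reference: str                                  # ontology entity the sense points to
    properties: dict = field(default_factory=dict)  # e.g. {"dating": "old"}


@dataclass
class LexicalEntry:
    name: str
    forms: list
    senses: list


edema = LexicalEntry(
    "EDEMA",
    forms=[Form("Edema", {"number": "singular"}),
           Form("Edemata", {"number": "plural"})],
    senses=[Sense("dbpedia:Edema")],
)

# Assumed reading of the Senses figure (see the note above).
dropsy = LexicalEntry(
    "DROPSY",
    forms=[Form("Dropsy")],
    senses=[Sense("dbpedia:Edema", {"dating": "old"}),
            Sense("dbpedia:Fish_Dropsy")],
)


def entries_for(reference, lexicon):
    """Names of all entries that have a sense pointing at the given reference."""
    return [e.name for e in lexicon
            if any(s.reference == reference for s in e.senses)]


print(entries_for("dbpedia:Edema", [edema, dropsy]))
```

Under these assumptions both entries are found for dbpedia:Edema, and the property on the sense (not on the entry or the ontology entity) records that one of them is dated usage.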
The Model
Simple Entry

# OntoLex namespace
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<#cat> a ontolex:Word ;
  # Lemma
  ontolex:canonicalForm [ ontolex:writtenRep "cat"@en ] ;
  # Sense
  ontolex:denotes [ skos:definition "A four-legged, furry animal"@en ] .
Simple Entry with Grammatical Information

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
# LexInfo ontology
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .

<#cat> a ontolex:Word ;
  # Part of speech
  lexinfo:partOfSpeech lexinfo:noun ;
  ontolex:canonicalForm [ ontolex:writtenRep "cat"@en ;
                          lexinfo:number lexinfo:singular ] ;
  # Inflected form
  ontolex:otherForm [ ontolex:writtenRep "cats"@en ;
                      lexinfo:number lexinfo:plural ] ;
  ontolex:denotes [ skos:definition "A four-legged, furry animal"@en ] .
Restriction on Lexical Sense

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<#bulrush> a ontolex:Word ;
  ontolex:sense [ ontolex:reference dbpedia:Typha ;
                  ontolex:usage [ rdf:value "British English" ] ] ;
  ontolex:denotes dbpedia:Typha .

# The opening of this sense was lost from the slide; it is assumed to mirror <#bulrush>.
<#cattail> a ontolex:Word ;
  ontolex:sense [ ontolex:reference dbpedia:Typha ;
                  ontolex:usage [ rdf:value "American English" ] ] .

sense ⚬ reference = denotes
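The equation sense ⚬ reference = denotes says that following ontolex:sense and then ontolex:reference from an entry leads to the same entity as ontolex:denotes. A minimal Python sketch of that composition over plain string triples follows; `infer_denotes` is a hypothetical helper, the blank node label _:s1 stands in for the anonymous sense in the example, and a real system would leave this inference to an OWL reasoner.

```python
def infer_denotes(triples):
    """Derive (entry, ontolex:denotes, entity) from sense followed by reference."""
    # sense node -> referenced entity (this sketch assumes one reference per sense)
    reference_of = {s: o for s, p, o in triples if p == "ontolex:reference"}
    return {
        (entry, "ontolex:denotes", reference_of[sense])
        for entry, p, sense in triples
        if p == "ontolex:sense" and sense in reference_of
    }


triples = [
    ("<#bulrush>", "ontolex:sense", "_:s1"),
    ("_:s1", "ontolex:reference", "dbpedia:Typha"),
    ("_:s1", "ontolex:usage", "_:u1"),
]
print(infer_denotes(triples))
```

The usage restriction stays on the sense node, which is why it cannot be expressed with ontolex:denotes alone.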
Syntax and Semantics
[Figure: the sentence "John knows Philipp" alongside the triple <http://john.mccr.ae> foaf:knows agsc:cimiano]
Syntax and Semantics
Syntactic Frames

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
# Synsem module
@prefix synsem: <http://www.w3.org/ns/lemon/synsem#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .

<#know> a ontolex:Word ;
  synsem:synBehavior <#know/transitive> .

# Frame
<#know/transitive> a synsem:SyntacticFrame ;
  lexinfo:subject <#know/subject> ;
  lexinfo:directObject <#know/directObject> .
Semantic Frames

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix synsem: <http://www.w3.org/ns/lemon/synsem#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<#know> a ontolex:Word ;
  ontolex:sense <#know/sense> ;
  synsem:synBehavior <#know/transitive> .

# Lexical sense is an ontology mapping
<#know/sense> a ontolex:LexicalSense , synsem:OntoMap ;
  synsem:ontoMap <#know/sense> ;
  ontolex:reference foaf:knows ;
  # Identifiers from syntactic frame
  synsem:subjOfProp <#know/subject> ;
  synsem:objOfProp <#know/directObject> .

# Ontological definition of semantic frame
foaf:knows a rdf:Property ;
  rdfs:domain foaf:Person ;
  rdfs:range foaf:Person .
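Syntactic-Semantic Mapping
[Figure: a Lexical Entry has a Syntactic Frame with Argument (subject) and Argument (object), and a Lexical Sense / Onto Map that refers to a Property with Class (domain) and Class (range). The subject argument is mapped to the domain side of the property and the object argument to the range side.]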
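One way to read the mapping is as a recipe for turning a parsed sentence into a triple: the argument named by subjOfProp becomes the triple's subject, the argument named by objOfProp becomes its object, and the sense's reference becomes the predicate. The Python sketch below illustrates this for "John knows Philipp". The dictionary layout and the function `frame_to_triple` are hypothetical, and it is assumed that parsing and entity linking have already produced the URIs for the two arguments.

```python
# Sense of <#know>, reduced to the three facts the mapping needs.
KNOW_SENSE = {
    "reference": "foaf:knows",
    "subjOfProp": "subject",        # syntactic argument used as the triple's subject
    "objOfProp": "directObject",    # syntactic argument used as the triple's object
}


def frame_to_triple(arguments, sense):
    """Build (subject, property, object) from filled syntactic arguments."""
    return (
        arguments[sense["subjOfProp"]],
        sense["reference"],
        arguments[sense["objOfProp"]],
    )


# "John knows Philipp", with both arguments already linked to URIs (assumed).
arguments = {
    "subject": "<http://john.mccr.ae>",
    "directObject": "agsc:cimiano",
}
print(frame_to_triple(arguments, KNOW_SENSE))
```

Swapping the two values in the sense (subjOfProp on the direct object, objOfProp on the subject) would model a verb whose syntactic subject corresponds to the property's object.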
Decomposition
Qualitätsmanagement-System = Qualität + Management + System
Decomposition
Decomposition

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix decomp: <http://www.w3.org/ns/lemon/decomp#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<#summer_school> a ontolex:MultiWordExpression ;
  decomp:subterm <#summer> , <#school> .

<#école_d’été> a ontolex:MultiWordExpression ;
  decomp:constituent <#école_d’été/école> ,
                     <#école_d’été/de> ,
                     <#école_d’été/été> ;
  # Order
  rdf:_1 <#école_d’été/école> ;
  rdf:_2 <#école_d’été/de> ;
  rdf:_3 <#école_d’été/été> .

# Component properties
<#école_d’été/de> a decomp:Component ;
  decomp:correspondsTo <#de> ;
  lexinfo:lexTermType lexinfo:contraction .

constituent ⚬ correspondsTo = subterm
Variation and Translation
Cultural Translation: "Japanese Rice Cake"@en and "もち"@ja
Variation and Translation
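How to represent translation
  (1) Shared Reference (semantic level): both senses refer to dbpedia:Rice
  (2) Translation (lexicosemantic level): a direct link from sense to sense
  (3) Stand-off (lexicosemantic level): a vartrans:Translation node relating the two senses
  (4) Translatable As (lexical level): a link between the entries Rice@en and "米"@ja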
Linguistic Metadata
[Figure: a "Magic Ontology" with the entities "Jace the Wizard" and "Erhnam the Djinn"]
LiMe - Linguistic Metadata See Manuel’s Talk
Future Directions
New Modules
  Morphology
  Lexicography (for traditional lexicographic resources)
  Frequency, Attribution and Corpus Information (FRAC)
  Etymology and Diachronicity
  Lexico-Syntactic Categories
Lexicography Module
Community Group Please join! http://www.w3.org/community/ontolex
Thanks. This work has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, co-funded by the European Regional Development Fund, and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731015, ELEXIS - European Lexical Infrastructure.
Coffee