Published by William Bell. Modified over 9 years ago.
1
ISO-PWI 24622 Lexical ontology: some loose remarks
Thierry Declerck, DFKI GmbH
2
Possible Topics (Nicoletta)
Relations between Lexicon and Ontology. Possible issues/questions:
– Is LMF enough to represent ontological links?
– How to connect the work being done in the ISO Lexical group and the ISO Ontology groups?
– Lexicon and ontologies: separation? Or lexicalised ontologies? Or ontologised lexicons?
– Lexicon, ontologies and domains
– Relation to multilinguality
– On a very different dimension: an ontology of lexical/semantic/conceptual categories? Standardised semantic categories, ontology labels?
– A general question: where best to store the semantics of words, terms, etc.: in the linguistic lexicon or in ontologies?
The items in italic are also central within the MONNET project (http://www.monnet-project.eu/).
Extension to the Linked Data Cloud for establishing links between lexical/linguistic information and the Semantic Web? See www.lexvo.org as a first start.
Place of ISO standards (Data Categories) in the cloud?
3
Relevance of the topic for HLT
Multilingual extraction of and access to information that is encoded in knowledge representation systems.
Towards (again?) knowledge-driven NLP applications.
But we need to clarify how the word „lexicon“ is understood in the Semantic Web (SW) and in the HLT communities:
– In SW, a lexicon is probably the list of all natural language expressions occurring in labels of ontologies
– In HLT, the lexicon is probably the repository of all „lexemes“ of a language
Relation between both understandings? LMF =>
4
LMF: Some definitions
http://www.lexicalmarkupframework.org/
– 3.14 form: sequence of morphs
– 3.18 graph: minimal unit in a written language, including letters, pictograms, ideograms, numerals and punctuations
– 3.24 lemma (also: lemmatised form, canonical form): conventional form chosen to represent a lexeme
– 3.25 lexeme: abstract unit generally associated with a set of forms sharing a common meaning
– 3.26 lexical entry: container for managing one or several forms and possibly one or several meanings in order to describe a lexeme
– 3.28 lexicon: resource comprising lexical entries for a given language. NOTE: A special language lexicon or a lexicon prepared for a specific NLP application can comprise a specific subset of language.
– 3.31 morph: sequence of graphs or sequence of phones. EXAMPLE: The word boys consists of two morphs: boy and s.
– 3.38 phone: minimal unit in the sound system of a language
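To make the relations between these definitions concrete, here is a minimal Python sketch. It is not the normative LMF class model: the class and field names (Form, LexicalEntry, Lexicon, morphs, meanings) are assumptions chosen only to mirror the wording of definitions 3.14, 3.24, 3.26, 3.28 and 3.31.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    # 3.14: a form is a sequence of morphs (here: written morphs as strings).
    morphs: List[str]

    def written(self) -> str:
        # Concatenate the written morphs to obtain the surface form.
        return "".join(self.morphs)


@dataclass
class LexicalEntry:
    # 3.26: container for one or several forms and possibly several meanings.
    lemma: str  # 3.24: conventional form chosen to represent the lexeme
    forms: List[Form] = field(default_factory=list)
    meanings: List[str] = field(default_factory=list)


@dataclass
class Lexicon:
    # 3.28: resource comprising lexical entries for a given language.
    language: str
    entries: List[LexicalEntry] = field(default_factory=list)


# The example from 3.31: "boys" consists of the two morphs "boy" and "s".
boy = LexicalEntry(lemma="boy", forms=[Form(["boy"]), Form(["boy", "s"])])
english = Lexicon(language="en", entries=[boy])
```

Under this reading, a plain list of ontology labels would only fill the written forms; lemma, morphs and meanings would still have to be added, which is one way to read the question on the next slide.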
5
Question to LMF: Is the list of labels of a KR system a lexicon?
6
If Yes? The focus of ISO?
Maybe give a clear definition of what is meant by „lexicalisation“ of ontologies?
Representing the linguistic information that is conveyed by the „entries“ (or labels):
– Apply LMF and LAF/GrAF to the „entries“, or
– Another model, like „lemon“ (see the later slides on the MONNET project), related to LMF
=> This is probably an item for W3C (interoperability on the Web)
Are we done then?
– I think not: we have to „compactize“ the ontology lexicon and avoid redundancy of entries. We then need to be able to link the „lexicalized“ entries to the larger terms and to the domain concepts => towards a domain-specific lexical network. A possible model is CTL (see next slide)
A basic question: would ontologizing the lexicalised ontologies (the „lexicon“) yield another model of the domain?
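One way to picture the „compactizing“ step is sketched below in Python. It is an illustration only, not the CTL model: the function name compact_lexicon, the concept identifiers with the ex: prefix and the second label are all made up for the example, and tokenisation is a naive lowercase whitespace split rather than real lemmatisation. Each distinct word is stored once and linked back to every concept whose label contains it.

```python
from collections import defaultdict
from typing import Dict, Set


def compact_lexicon(labels: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each distinct word to the ids of the concepts whose label uses it."""
    lexicon: Dict[str, Set[str]] = defaultdict(set)
    for concept_id, label in labels.items():
        # Naive tokenisation: drop parentheses, lowercase, split on whitespace.
        cleaned = label.lower().replace("(", " ").replace(")", " ")
        for token in cleaned.split():
            lexicon[token].add(concept_id)
    return dict(lexicon)


# Hypothetical concept ids; the second label is invented for illustration.
labels = {
    "ex:ProfitLossContinuing": "Profit (loss) from continuing operations",
    "ex:ProfitLossDiscontinued": "Profit (loss) from discontinued operations",
}
lexicon = compact_lexicon(labels)
```

Here the two labels contain ten word tokens in total, but the compact lexicon holds six entries, four of which („profit“, „loss“, „from“, „operations“) are shared by both concepts.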
7
CTL, Declerck & Lendvai, LREC 2010
8
A concrete Example from XBRL
9
Issues for lexicalisation
In the example on the last slide, linguistic information for a lexical item can be distributed over the labels of different classes. How to cope with this?
Or take the term (label) „Profit (loss) from continuing operations“. Should this be translated into one „ontology lexical entry“ or three:
– 1. Profit from continuing operations
– 2. Loss from continuing operations
– 3. Profit (loss) from continuing operations
Clearly: lexicalisation can lead to highly redundant „ontology lexical entries“ that miss generalisations.
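The three readings of such a label can be generated mechanically, which shows how quickly redundant entries appear. The Python sketch below is a simplification: the regular expression and the function name candidate_entries are assumptions for this example and only cover labels that start with a single word followed by a single parenthesised alternative.

```python
import re
from typing import List

# Assumed label shape: "<head> (<alternative>)<rest>", e.g. "Profit (loss) from ...".
ALTERNATIVE = re.compile(r"^(?P<head>\w+) \((?P<alt>\w+)\)(?P<rest>.*)$")


def candidate_entries(label: str) -> List[str]:
    """Return the candidate ontology lexical entries for one label."""
    match = ALTERNATIVE.match(label)
    if match is None:
        # No parenthesised alternative: the label is its own single entry.
        return [label]
    head, alt, rest = match.group("head", "alt", "rest")
    if head[0].isupper():
        # Keep the capitalisation of the label's first word.
        alt = alt.capitalize()
    return [head + rest, alt + rest, label]


entries = candidate_entries("Profit (loss) from continuing operations")
```

All three resulting entries repeat the phrase „from continuing operations“, which is exactly the kind of redundancy the slide warns about.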
10
Or has ISO another focus?
Ontologised (multilingual) lexicons?
– Rather than “lexicalised ontologies”
– Toward a domain-specific “LinguisticNet” (a combination of WordNet-like and FrameNet-like structures)
Principles for building a lexicon other than just using the terms found in labels?
Are we not too restricted if we consider only the lexicon? Should we extend to syntax, etc.?
11
MONNET project
The Monnet project is concerned mainly with ontology localisation, i.e., the translation of the lexico-terminological level of ontologies (often referred to as the ‘ontology labels’).
The project outcomes, as currently understood by the project members, can be described as a set of software components as follows, all of which can be used in combination as well as stand-alone:
– Ontology Lexicalization (with use of ISO datcats)
– Ontology Localization
– Cross-lingual Ontology-based Information Extraction
– Cross-lingual Knowledge Access & Presentation
12
MONNET project: Architecture
13
MONNET: Details
The core objective of Monnet is the provision of advanced services for the translation of the lexico-terminological level of ontologies, which will be instantiated by the ‘localization service’.
However, as ontologies often have only a very limited representation of lexico-terminological information, a first step will be to analyze a given ontology and enrich it with appropriate information on
– i) the terminological structure of ontology labels,
– ii) linguistic information on terminology items, and
– iii) analysis of implicit semantics where needed.
Together we refer to these analysis and enrichment steps as “ontology lexicalisation”, which will be instantiated by the ‘lexicalisation service’ that takes as input an ontology and outputs an ‘ontology-lexicon’ for at least one default language (depending on the language that was used in defining ontology labels).
A ‘corpus service’ will enable access to external domain corpus evidence for modelling and analyzing language use in the ontology labels.
The ontology-lexicon will be represented on the basis of the so-called ‘lemon’ format [1], a lexicon model for ontologies that has been defined by the Monnet project for the appropriate integration of lexical/linguistic and terminological information in ontologies. The different lexicons will therefore be handled by use of the ‘lemon API’ as shown.
This aspect of MONNET will possibly be standardized within W3C activities.
[1] http://lexinfo.net/
14
The lemon model, for encoding lexicalized ontologies
15
Role of ISO Standards for Lexicalization
W3C is mainly about the representation and linking of information.
ISO could design some guidelines on how to formulate labels of ontologies/semantic resources in the Linked Data (LD) cloud?