1
A System for A Semi-Automatic Ontology Annotation
Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov
BulTreeBank Group, LML, IPP, BAS
CALP 2007, RANLP, Borovets
2
Outline of the Talk Motivation Requirements to the system Parameters of semantic annotation –General overview –Problematic issues CLaRK System –Basic architecture in brief –The new functionalities Conclusions
3
Motivation (1) The creation of automatic systems for semantic annotation needs: –Reliably annotated corpora with semantic information = gold standard data
4
Motivation (2) The semantic annotation requires various types of support: –appropriate source of semantic information (domain ontology) –comprehensive annotation guidelines –a system to support semi-automatic creation of such corpora (CLaRK)
5
Motivation (3) The annotation process follows two steps: –chunk annotation - identification of the text segment which represents a given concept or relation in the text –concept selection - a chunk might represent more than one concept or relation depending on the context
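The two steps above can be sketched in Python. This is only an illustration, not the CLaRK implementation: the toy `LEXICON`, the `Chunk` class and the `choose` callback (which stands in for the human annotator) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: surface term -> candidate concepts.
LEXICON = {
    "link": ["Connection", "Hyperlink"],
    "toolbar": ["BarWithButtons"],
}

@dataclass
class Chunk:
    start: int
    end: int
    text: str
    candidates: list

def annotate_chunks(text):
    """Step 1 (chunk annotation): find segments that may carry a concept."""
    chunks = []
    for term, concepts in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            chunks.append(Chunk(m.start(), m.end(), m.group(0), list(concepts)))
    return sorted(chunks, key=lambda c: c.start)

def select_concept(chunk, choose):
    """Step 2 (concept selection): ask `choose` only when the chunk is ambiguous."""
    if len(chunk.candidates) == 1:
        return chunk.candidates[0]
    return choose(chunk)

chunks = annotate_chunks("Click the link in the toolbar.")
```

Here `select_concept(chunks[0], choose)` would hand the ambiguous "link" chunk to the annotator, while the "toolbar" chunk is resolved without any interaction.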
6
Motivation (4) We follow the ideas of Erdmann et al. 2000 that manual (or semi-automatic) semantic annotation is a cyclic process mixing: –the actual annotation, and –the evolution of the ontology In our case, the lexicon and the concept annotation grammar are also developed concurrently within this cycle.
7
Support requirements to the system (1) Search for a text segment: helps the annotator to determine the exact segment of text which is the carrier of the concept or relation from the ontology Concept selection: determines which concept/relation to be added to the annotation of the corresponding text segment
8
Support requirements to the system (2) Ontology evolution: updates the ontology in the following cases: –a new concept/relation is necessary for the annotation of a text segment –an existing concept needs to be changed in order to be more precise Lexicon/grammar evolution: updates them when: –there are changes in the ontology –there are new expressions for already existing concepts/relations
9
Support requirements to the system (3) Annotation evolution: after changes in the ontology and/or the lexicon/grammar it is necessary to update the previously done annotations In the implementation of these functionalities we follow the requirements for a semantic annotation system as they are stated in Uren et al. 2006
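A minimal sketch of annotation evolution, assuming annotations are stored as simple records and the ontology as a set of concept names. The function name, the record layout and the `renamed` mapping are hypothetical and are not taken from CLaRK.

```python
def update_annotations(annotations, ontology, renamed=None):
    """Carry old annotations over to a changed ontology.

    Renamed concepts are rewritten automatically; annotations whose concept
    no longer exists are returned separately for manual review.
    """
    renamed = renamed or {}
    updated, needs_review = [], []
    for ann in annotations:
        concept = renamed.get(ann["concept"], ann["concept"])
        if concept in ontology:
            updated.append({**ann, "concept": concept})
        else:
            needs_review.append(ann)
    return updated, needs_review

old = [
    {"text": "toolbar", "concept": "Toolbar"},
    {"text": "link", "concept": "Link"},
    {"text": "ASCII", "concept": "ASCII"},
]
ontology = {"BarWithButtons", "ASCII", "Hyperlink", "Connection"}
updated, needs_review = update_annotations(
    old, ontology, renamed={"Toolbar": "BarWithButtons"}
)
```

In this example the "link" annotation ends up in `needs_review`, because its old concept was split into two and only an annotator can decide which one applies.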
10
Parameters of semantic annotation The ideal prerequisite for semantic annotation is the interaction among the following three components: –Domain ontology: concepts –Lexicons: terms –Grammars: link of terms to concepts in domain texts
11
Domain ontologies (3) We use English as lingua franca (as usual) HOWEVER: We rely on the meanings of the concepts We aim at reconciling the discrepancy between knowledge conceptualization and language lexicalization –If there is no lexicalized term for a concept, then either one of the available terms is selected as a name (ASCII vs. ASCII code table), or a concept name is constructed as a phrase (BarWithButtons vs. Toolbar)
12
Terminological lexicons (1) Lists of the main keywords in a certain domain Free expressions are also allowed Example: AlphanumericDisplay [a display that gives the information in the form of characters (numbers or letters)] In Bulgarian: 9 spelling and lexical variants: буквеноцифров дисплей, буквено-цифров дисплей, символен дисплей, буквеноцифров монитор, буквено-цифров монитор, символен монитор, буквеноцифров екран, буквено-цифров екран, символен екран
13
Terminological lexicons (2) Generalized structure of the Lexicon (1) a representative term which carries the meaning for all the term wordings within the entry; this term usually ensures the mapping to the relevant concept (2) an explanation of the concept meaning in the lingua franca (usually English, but it might be any natural language) (3) a set of terms in a given language that have the meaning expressed by the leading term
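The three-part entry structure could be modelled as follows. This is a sketch only: the class and field names and the `"bg"` language key are assumptions, and the data is the AlphanumericDisplay example from the previous slide.

```python
from dataclasses import dataclass, field

@dataclass
class LexiconEntry:
    representative_term: str  # (1) leading term, maps the entry to a concept
    explanation: str          # (2) gloss in the lingua franca
    terms: dict = field(default_factory=dict)  # (3) language -> term wordings

entry = LexiconEntry(
    representative_term="AlphanumericDisplay",
    explanation=("a display that gives the information in the form of "
                 "characters (numbers or letters)"),
    terms={"bg": [
        "буквеноцифров дисплей", "буквено-цифров дисплей", "символен дисплей",
        "буквеноцифров монитор", "буквено-цифров монитор", "символен монитор",
        "буквеноцифров екран", "буквено-цифров екран", "символен екран",
    ]},
)

def build_index(entries):
    """Map every term wording to the representative terms it may express."""
    index = {}
    for e in entries:
        for wordings in e.terms.values():
            for wording in wordings:
                index.setdefault(wording.lower(), set()).add(e.representative_term)
    return index

index = build_index([entry])
```

The index values are sets because one wording may belong to several entries, which is exactly the ambiguity that the later disambiguation step has to resolve.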
14
Grammars Two interconnected steps: (1) concept annotation step (by cascaded regular grammars in CLaRK) (2) disambiguation step (by constraint facilities in CLaRK) The quality of the grammar determines the coverage and precision of the annotation, and hence the efficiency of the search
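The idea of a cascade, where each grammar level works on the output of the previous one, can be illustrated with plain Python regular expressions. This is not CLaRK grammar syntax; the rules in `LEVEL1` and `LEVEL2` and the one-letter category encoding are assumptions made for the example.

```python
import re

# Level 1 (hypothetical rules): word pattern -> one-letter category.
LEVEL1 = [
    (re.compile(r"alphanumeric|character"), "A"),
    (re.compile(r"display|monitor|screen"), "D"),
]

# Level 2 (hypothetical rules): pattern over level-1 categories -> concept.
LEVEL2 = [
    (re.compile(r"AD"), "AlphanumericDisplay"),
]

def level1(tokens):
    """Assign each token a category letter, or 'o' when no rule matches."""
    cats = []
    for tok in tokens:
        for pattern, cat in LEVEL1:
            if pattern.fullmatch(tok.lower()):
                cats.append(cat)
                break
        else:
            cats.append("o")
    return "".join(cats)

def level2(tokens, cats):
    """Match concept patterns over the category string produced by level 1."""
    found = []
    for pattern, concept in LEVEL2:
        for m in pattern.finditer(cats):
            # One letter per token, so match offsets are token indices.
            found.append((" ".join(tokens[m.start():m.end()]), concept))
    return found

tokens = "The character display shows an alphanumeric screen".split()
cats = level1(tokens)
annotations = level2(tokens, cats)
```

Because the second level sees only categories, adding a new wording to level 1 extends the coverage of every level-2 rule that uses that category.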
15
Interaction among modules [Diagram: Ontology, Lexicalized Terms, Free Phrases, Grammars, Domain Text]
16
Problematic issues with respect to semantic annotation Ambiguous cases need disambiguation (e.g. LINK as Connection or as Hyperlink) Due to problems with the coverage and precision of the ontology, the following operations are also needed: –addition, extension, deletion of concepts or their correction
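A context-based constraint for the LINK case might look like the sketch below. The cue words, the window size and the function name are invented for illustration; in practice such constraints would be written and revised by the annotators.

```python
# Hypothetical context cues for each candidate concept.
CUES = {
    "Hyperlink": {"click", "page", "web", "url"},
    "Connection": {"network", "cable", "server", "modem"},
}

def disambiguate(tokens, index, candidates, window=3):
    """Keep the candidates supported by cue words near the chunk at `index`."""
    lo, hi = max(0, index - window), index + window + 1
    context = {t.lower().strip(".,;:") for t in tokens[lo:hi]}
    supported = [c for c in candidates if CUES.get(c, set()) & context]
    # With no evidence, return all candidates so the annotator decides.
    return supported or list(candidates)

tokens = "Click the link on the web page".split()
resolved = disambiguate(tokens, 2, ["Connection", "Hyperlink"])

unclear = "The link is slow".split()
unresolved = disambiguate(unclear, 1, ["Connection", "Hyperlink"])
```

The fallback keeps the process semi-automatic: the constraint narrows the choice when it can, and otherwise leaves the full candidate list for manual selection.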
17
CLaRK: architecture and tools [Diagram: CLaRK built on XML, with the tools: Regular grammars, Constraints, Editing operations, Extraction, Sort, Statistics, XPath Engine, Macro Language] http://www.bultreebank.org/clark/index.html
18
The CLaRK System: previous work flow architecture Tool preparation phase –Writing grammars –Writing constraints, etc Document Processing –Application of grammars, constraints –User input – selection of constraint options, selection of grammar application Revision of the tools
19
The CLaRK System: new work flow architecture Tool preparation phase –Writing grammars –Writing constraints, etc Document Processing –Application of grammars, constraints –User input – selection of constraint options, selection of grammar application –Processing-time revision of the tools Revision of the tools
20
Conclusions We presented an architecture for the semantic annotation of XML documents in a domain, considered from two points of view: linguistic adequacy and implementation The process of semantic annotation interleaves with ontology / lexicon / grammar evolution Combining the three tasks in this way also allows the annotation process to develop from almost completely manual work towards an effective semi-automatic support module
21
Thank you! [Figure: ever moving CLaRK functionalities; user running for better tools]