A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,


A System for a Semi-Automatic Ontology Annotation
Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov
BulTreeBank Group, LML, IPP, BAS
CALP 2007, RANLP, Borovets

Outline of the Talk
– Motivation
– Requirements for the system
– Parameters of semantic annotation
  – General overview
  – Problematic issues
– CLaRK System
  – Basic architecture in brief
  – The new functionalities
– Conclusions

Motivation (1)
The creation of automatic systems for semantic annotation requires:
– reliably annotated corpora with semantic information, i.e. gold standard data

Motivation (2)
Semantic annotation requires various types of support:
– an appropriate source of semantic information (a domain ontology)
– comprehensive annotation guidelines
– a system that supports the semi-automatic creation of such corpora (CLaRK)

Motivation (3)
The annotation process consists of two steps:
– chunk annotation: identification of the text segment which represents a given concept or relation in the text
– concept selection: a chunk might represent more than one concept or relation depending on the context, so the appropriate one has to be selected
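The two steps can be pictured with a minimal Python sketch. It assumes a toy lexicon (the `LEXICON` dictionary, the `Chunk` class and the function names are all hypothetical) and uses plain longest-match lookup over tokens for chunk annotation. In concept selection, unambiguous chunks are resolved automatically, while ambiguous ones are passed to a `choose` callback that stands in for the human annotator. This is an illustration, not the CLaRK implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy lexicon: lower-cased token sequence -> candidate concepts.
LEXICON: dict[tuple[str, ...], set[str]] = {
    ("alphanumeric", "display"): {"AlphanumericDisplay"},
    ("display",): {"Display"},
    ("link",): {"Connection", "Hyperlink"},
}


@dataclass
class Chunk:
    start: int                  # index of the first token of the chunk
    end: int                    # index one past the last token
    candidates: set[str]        # concepts the chunk may represent
    selected: Optional[str] = None


def chunk_annotation(tokens: list[str]) -> list[Chunk]:
    """Step 1: find text segments that match a lexicon term (longest match first)."""
    max_len = max(len(term) for term in LEXICON)
    lowered = [t.lower() for t in tokens]
    chunks, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = tuple(lowered[i:i + n])
            if term in LEXICON:
                chunks.append(Chunk(i, i + n, set(LEXICON[term])))
                i += n
                break
        else:  # no term starts at token i
            i += 1
    return chunks


def concept_selection(chunks: list[Chunk], choose: Callable[[Chunk], str]) -> list[Chunk]:
    """Step 2: select one concept per chunk; ambiguous chunks are handed to `choose`."""
    for chunk in chunks:
        if len(chunk.candidates) == 1:
            chunk.selected = next(iter(chunk.candidates))
        else:
            chunk.selected = choose(chunk)
    return chunks
```

For example, on the tokens of "Click the link on the alphanumeric display", the sketch marks "link" (ambiguous, so `choose` decides) and the two-token term "alphanumeric display" (resolved automatically).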

Motivation (4)
Following Erdmann et al., we view manual (or semi-automatic) semantic annotation as a cyclic process that mixes:
– the actual annotation, and
– the evolution of the ontology.
In our case, the lexicon and the concept annotation grammar are also included in this concurrent development process.

Support requirements for the system (1)
– Search for a text segment: helps the annotator determine the exact text segment that carries the concept or relation from the ontology
– Concept selection: determines which concept/relation is to be added to the annotation of the corresponding text segment

Support requirements for the system (2)
– Ontology evolution: updates the ontology in the following cases:
  – a new concept/relation is necessary for the annotation of a text segment
  – an existing concept needs to be changed in order to be more precise
– Lexicon/grammar evolution: updates the lexicon and the grammar when:
  – there are changes in the ontology
  – there are new expressions for already existing concepts/relations

Support requirements for the system (3)
– Annotation evolution: after changes in the ontology and/or the lexicon/grammar, the previously made annotations need to be updated
In implementing these functionalities, we follow the requirements for a semantic annotation system as stated in Uren et al. (2006).
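Annotation evolution can be sketched as propagating a log of ontology changes to the stored annotations. In this minimal Python sketch, which assumes a hypothetical change log, a renamed or merged concept is mapped to its new name and the existing annotations are rewritten. A deleted concept is mapped to `None`, and its annotations are kept but flagged for the annotator to review. The `Annotation` class, `evolve_annotations` and the change-log format are illustrative assumptions, not part of CLaRK.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    start: int
    end: int
    concept: str
    needs_review: bool = False


def evolve_annotations(annotations: list[Annotation],
                       concept_changes: dict[str, Optional[str]]) -> list[Annotation]:
    """Propagate ontology changes to previously made annotations.

    concept_changes maps an old concept name either to its new name
    (rename or merge) or to None (the concept was deleted).
    """
    updated = []
    for ann in annotations:
        if ann.concept not in concept_changes:
            updated.append(ann)  # unaffected by this revision
            continue
        new = concept_changes[ann.concept]
        if new is None:
            # Deleted concept: keep the span, but send it back to the annotator.
            updated.append(Annotation(ann.start, ann.end, ann.concept, needs_review=True))
        else:
            updated.append(Annotation(ann.start, ann.end, new, ann.needs_review))
    return updated
```

Flagging deleted concepts is preferable to dropping them silently. The text segment was judged to carry some meaning, so a human should decide which concept now replaces it.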

Parameters of semantic annotation
The ideal prerequisite for semantic annotation is the interaction among the following three components:
– Domain ontology
– Lexicons
– Grammars
[Diagram labels: concepts; terms; link of terms to concepts; domain texts]

Domain ontologies (3)
We use English as the lingua franca (as usual). HOWEVER:
– we rely on the meanings of the concepts
– we aim at reconciling the discrepancy between knowledge conceptualization and language lexicalization
  – if there is no lexicalized term for a concept, then either one of the terms is selected as its name (ASCII vs. ASCII code table), or a concept name is constructed as a phrase (BarWithButtons vs. Toolbar)

Terminological lexicons (1)
– Lists of the main keywords in a certain domain
– Free expressions are also allowed
Example: AlphanumericDisplay [a display that gives the information in the form of characters (numbers or letters)]
In Bulgarian there are 9 spelling and lexical variants ("alphanumeric" or "character" combined with "display", "monitor" or "screen"): буквеноцифров дисплей, буквено-цифров дисплей, символен дисплей, буквеноцифров монитор, буквено-цифров монитор, символен монитор, буквеноцифров екран, буквено-цифров екран, символен екран
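The nine Bulgarian variants of AlphanumericDisplay form the full combination of three modifier forms (two spellings of "alphanumeric" plus "символен") and three head nouns. The minimal Python sketch below enumerates them in the order used on the slide. It shows why such variant sets grow multiplicatively. It is an assumption, not a claim about how the BulTreeBank lexicon was actually compiled. The mechanical combination only works here because дисплей, монитор and екран are all masculine, so the adjective form never changes.

```python
from itertools import product

# Modifier forms: "alphanumeric" (solid and hyphenated spelling) and "character".
MODIFIERS = ["буквеноцифров", "буквено-цифров", "символен"]
# Head nouns: "display", "monitor", "screen" (all masculine in Bulgarian).
HEADS = ["дисплей", "монитор", "екран"]


def term_variants(modifiers: list[str], heads: list[str]) -> list[str]:
    """Return every modifier + head combination, grouped by head noun."""
    return [f"{m} {h}" for h, m in product(heads, modifiers)]


variants = term_variants(MODIFIERS, HEADS)
print(len(variants))  # 3 modifiers x 3 heads
```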

Terminological lexicons (2)
Generalized structure of a lexicon entry:
(1) a representative term, which carries the meaning of all the term wordings within the entry; this term usually ensures the mapping to the relevant concept
(2) an explanation of the concept meaning in the lingua franca (usually English, but in fact it might be any natural language)
(3) a set of terms in a given language that have the meaning expressed by the leading term
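The three-part entry structure maps naturally onto a small record type. In this minimal Python sketch, the `LexiconEntry` fields mirror parts (1) to (3). The field names, the language codes such as "bg" and the `build_term_index` helper are illustrative assumptions. The index maps every wording to a *set* of representative terms, because one wording can belong to several entries (the LINK case discussed under problematic issues). That ambiguity is exactly what the later disambiguation step has to resolve.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    representative_term: str  # (1) leading term, usually mapped to the concept
    explanation: str          # (2) meaning explained in the lingua franca
    terms: dict[str, list[str]] = field(default_factory=dict)  # (3) language -> wordings


def build_term_index(entries: list[LexiconEntry]) -> dict[tuple[str, str], set[str]]:
    """Map each (language, lower-cased wording) to the representative terms it may express."""
    index: dict[tuple[str, str], set[str]] = {}
    for entry in entries:
        for lang, wordings in entry.terms.items():
            for wording in wordings:
                index.setdefault((lang, wording.lower()), set()).add(entry.representative_term)
    return index


entry = LexiconEntry(
    "AlphanumericDisplay",
    "a display that gives the information in the form of characters (numbers or letters)",
    {"bg": ["символен дисплей", "символен екран"]},
)
index = build_term_index([entry])
```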

Grammars
Two interconnected steps:
(1) concept annotation step (by cascaded regular grammars in CLaRK)
(2) disambiguation step (by the constraint facilities in CLaRK)
The quality of the grammar predetermines the coverage and precision of the annotation, and hence the efficiency of the search.
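CLaRK's regular grammars have their own formalism and operate over XML token sequences. The minimal Python sketch below only imitates the *cascading* idea with the standard `re` module over plain strings. Each stage compiles its terms into one alternation, tried longest first, and is applied to the output of the previous stage. Spans that an earlier stage already annotated are left untouched. The `<concept v="...">` markup, `make_stage` and `run_cascade` are hypothetical.

```python
import re

# Matches a span already annotated by an earlier stage (no nesting assumed).
TAG = re.compile(r'(<concept v="[^"]*">.*?</concept>)')


def make_stage(terms_to_concepts):
    """Compile one grammar stage from lower-cased terms; longer terms are tried first."""
    terms = sorted(terms_to_concepts, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)

    def apply(text):
        # With a capturing group, re.split keeps the separators at odd indices,
        # so even indices hold the text that no earlier stage has annotated yet.
        parts = TAG.split(text)
        for i in range(0, len(parts), 2):
            parts[i] = pattern.sub(
                lambda m: f'<concept v="{terms_to_concepts[m.group(1).lower()]}">{m.group(1)}</concept>',
                parts[i],
            )
        return "".join(parts)

    return apply


def run_cascade(text, stages):
    """Apply each stage to the output of the previous one."""
    for stage in stages:
        text = stage(text)
    return text


stages = [
    make_stage({"alphanumeric display": "AlphanumericDisplay"}),  # multiword terms first
    make_stage({"display": "Display"}),                            # then single words
]
annotated = run_cascade("The alphanumeric display and the main display.", stages)
```

Running the multiword stage first keeps "display" inside "alphanumeric display" from being claimed by the more general concept.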

Interaction among modules
[Diagram: Ontology; Lexicalized Terms; Free Phrases; Grammars; Domain Text]

Problematic issues with respect to semantic annotation
– Ambiguous cases need to be disambiguated (e.g. LINK as Connection or as Hyperlink)
– Due to problems with the coverage and precision of the ontology, the following operations are also needed:
  – addition, extension and deletion of concepts, or their correction
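The LINK case can be illustrated with a simple context heuristic. This is a swapped-in technique, not CLaRK's constraint mechanism. Each candidate concept gets a set of cue words, and the concept with the most cues in a window around the ambiguous token wins. Ties and zero scores return `None`, leaving the decision to the annotator. The `CUES` sets, the window size and `disambiguate` are hypothetical.

```python
# Hypothetical context cues per concept; not taken from the authors' grammars.
CUES = {
    "Hyperlink": {"click", "page", "web", "url", "browser"},
    "Connection": {"network", "cable", "server", "establish"},
}


def disambiguate(tokens, position, candidates, window=5):
    """Pick the candidate whose cue words occur most often near tokens[position].

    Returns None when no candidate wins outright, so a human decides.
    """
    lo, hi = max(0, position - window), position + window + 1
    context = [t.lower() for t in tokens[lo:hi]]
    scores = {c: sum(t in CUES.get(c, set()) for t in context) for c in candidates}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # no evidence for any reading
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # tie between readings
    return ranked[0][0]
```

Returning `None` rather than guessing matches the semi-automatic setting: the system resolves the clear cases and routes the rest to the annotator.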

CLaRK: architecture and tools
[Diagram components: XML; Regular grammars; Constraints; Editing operations; Extraction; Sort; Statistics; XPath Engine; Macro Language]

The CLaRK System: previous workflow architecture
– Tool preparation phase
  – writing grammars
  – writing constraints, etc.
– Document processing
  – application of grammars and constraints
  – user input: selection of constraint options, selection of grammar application
– Revision of the tools

The CLaRK System: new workflow architecture
– Tool preparation phase
  – writing grammars
  – writing constraints, etc.
– Document processing
  – application of grammars and constraints
  – user input: selection of constraint options, selection of grammar application
  – processing-time revision of the tools
– Revision of the tools
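The difference between the two workflows is where the tools get revised. In the new one, a term the annotator adds while processing a document is available immediately for the following documents, instead of waiting for a separate revision phase. The minimal Python sketch below shows this loop with deliberately naive substring matching. `annotate_corpus` and the `review` callback, which stands in for the annotator, are hypothetical.

```python
def annotate_corpus(documents, lexicon, review):
    """Annotate documents one by one, revising the lexicon during processing.

    lexicon: dict mapping lower-cased term -> concept (updated in place).
    review(doc, found): returns (term, concept) pairs the annotator adds
    because the current lexicon missed them.
    """
    results = []
    for doc in documents:
        text = doc.lower()
        # Naive matching: any lexicon term occurring as a substring.
        found = sorted((term, concept) for term, concept in lexicon.items() if term in text)
        additions = review(doc, found)
        for term, concept in additions:
            lexicon[term.lower()] = concept  # processing-time revision of the tools
        results.append(found + [(t.lower(), c) for t, c in additions])
    return results


def review(doc, found):
    """Stand-in annotator: adds 'toolbar' whenever the lexicon failed to find it."""
    missed = "toolbar" in doc.lower() and all(t != "toolbar" for t, _ in found)
    return [("toolbar", "Toolbar")] if missed else []


lexicon = {"display": "Display"}
results = annotate_corpus(["The toolbar is above the display.", "Hide the toolbar."], lexicon, review)
```

The second document is annotated with Toolbar automatically, because the term was added while the first document was being processed.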

Conclusions
– We presented an architecture for the semantic annotation of XML documents in a domain from two points of view: linguistic adequacy and implementation
– The process of semantic annotation interleaves with ontology / lexicon / grammar evolution
– Combining the three tasks in this way also allows the annotation process to develop from almost completely manual work towards an effective semi-automatic support module

Thank you!
[Image: "Ever moving CLaRK": Functionalities; User running for better tools]