Application of INTEX in refinement and validation of Serbian WordNet Ivan Obradović, Ranka Stanković Cvetana Krstev, Gordana Pavlović-Lažetić University of Belgrade

WordNet (WN)
- a semantic network of concepts represented by synsets: sets of synonymous words (nouns, verbs, adjectives and adverbs)
- contains explicitly coded descriptions of semantic relations
- inspired by research in the field of psycholinguistics
- initially developed at Princeton for the English language
Fellbaum C. (ed.) (1998) WordNet: An Electronic Lexical Database, The MIT Press

Multilingual WordNets
- featuring the InterLingual Index (ILI)
- EuroWordNet (EWN): Dutch, Italian, Spanish, German, French, Czech and Estonian
- BalkaNet (BWN): five Balkan languages (Greek, Turkish, Bulgarian, Romanian and Serbian), as well as Czech
Vossen, P. (ed.) (1998) EuroWordNet: A Multilingual Database with Lexical Semantic Networks, Kluwer Academic Publishers, Dordrecht
Stamou S., Oflazer K., Pala K., Christoudoulakis D., Cristea D., Tufis D., Koeva S., Totkov G., Dutoit D., Grigoriadou M. (2002) BALKANET: A Multilingual Semantic Network for Balkan Languages, 1st International Wordnet Conference, Mysore, India, January 2002 (elsnet-ko-accept.pdf)

The WN semantic network
- based on a grouping of synonyms into synsets, which represent the network nodes
- nodes are interconnected by arcs that describe particular semantic relations (hyperonymy, hyponymy, antonymy etc.)
- in general, every synset is accompanied by a definition (gloss) and examples of usage that specify the meaning of the concept represented by the synset
- the semantic network itself is an XML document with a precisely established set of entities

The Serbian version of WN
- developed starting from the base concepts of the English WN, using existing English/Serbian dictionaries in paper form
- synset elements represented as the elements in DELAS or DELAC dictionaries, without any additional morphosyntactic information
- lexical meanings in Serbian coded with reference to the dictionary of Matica Srpska

XML representation of a synset in Serbian WN (demonstrate, establish, prove, show)
- ENG v
- dokazati 1, dokazivati 1, pokazati 3, pokazivati 3
- Utvrditi valxanost necyega, primerom, objasxnxenxem ili eksperimentom. (Establish the validity of something by example, explanation or experiment)
- Anketa je pokazala da u tako nesxto veruje mali broj ispitanih. (The poll showed that few people believe in this)
- v ENG v hypernym 1
- Dusko 2003/04/21
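The transcript preserves only the text values of this synset, not its markup. As a rough sketch of how such a record could be read, the Python fragment below wraps the slide's literals, definition and usage example in a hypothetical layout. The tag names (SYNSET, ID, POS, SYNONYM, LITERAL, SENSE, DEF, USAGE) and the identifier placeholder are assumptions made for illustration and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names and the ID placeholder are assumptions;
# only the text values (literals, senses, definition, usage) come from the slide.
SYNSET_XML = """
<SYNSET>
  <ID>ENG-PLACEHOLDER-v</ID>
  <POS>v</POS>
  <SYNONYM>
    <LITERAL>dokazati<SENSE>1</SENSE></LITERAL>
    <LITERAL>dokazivati<SENSE>1</SENSE></LITERAL>
    <LITERAL>pokazati<SENSE>3</SENSE></LITERAL>
    <LITERAL>pokazivati<SENSE>3</SENSE></LITERAL>
  </SYNONYM>
  <DEF>Utvrditi valxanost necyega, primerom, objasxnxenxem ili eksperimentom.</DEF>
  <USAGE>Anketa je pokazala da u tako nesxto veruje mali broj ispitanih.</USAGE>
</SYNSET>
"""


def read_literals(synset_xml):
    """Return (literal, sense number) pairs in document order."""
    root = ET.fromstring(synset_xml)
    pairs = []
    for literal in root.iter("LITERAL"):
        word = (literal.text or "").strip()      # text before the nested SENSE tag
        sense = int(literal.findtext("SENSE"))   # sense number of this literal
        pairs.append((word, sense))
    return pairs


literals = read_literals(SYNSET_XML)
```

Here `literals` holds the four verb literals with their sense numbers, which is the part of the record the later steps work on.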

Problems in Serbian WN that might be solved using INTEX
- lack of morphological and syntactic information related to lexemes
- absence of precise criteria for the selection of lexemes for a particular synset
- lack of information on the relative relevance of each lexeme in a synset in terms of its lexical frequency

Incorporation of morphosyntactic information into synsets using INTEX
The DictWNSrp program
- matches literals in WN with literals in selected DELAS dictionaries and extracts morphosyntactic information from the dictionaries
- assigns morphosyntactic information to WN literals in cases of a 1-1 match
- offers the user the option to confirm or alter the assigned information and resolve cases of homography (e.g. multiple matches)
- transfers confirmed morphosyntactic information into the WN using the LNOTE element
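The matching step can be illustrated with a small Python sketch. It is not the DictWNSrp implementation. It assumes dictionary entries of the form `lemma,CODE+feature+...`, and the function names `load_delas` and `assign_info` are hypothetical. The two entries for `kosa` and their codes are invented purely to show a homography case.

```python
from collections import defaultdict

# Toy DELAS-style lines. The codes for dokazati and dokazivati appear on the
# slides; the two "kosa" entries (N600, N601) are invented for illustration.
DELAS_LINES = [
    "dokazati,V122+Perf+Tr+Iref+Ref",
    "dokazivati,V18+Imperf+Tr+Iref",
    "kosa,N600",
    "kosa,N601",
]


def load_delas(lines):
    """Index morphosyntactic codes by lemma; a lemma may have several entries."""
    index = defaultdict(list)
    for line in lines:
        lemma, _, info = line.partition(",")
        index[lemma.strip()].append(info.strip())
    return index


def assign_info(literals, index):
    """Assign codes on a 1-1 match; leave everything else for the user."""
    assigned, to_review = {}, {}
    for literal in literals:
        candidates = index.get(literal, [])
        if len(candidates) == 1:
            assigned[literal] = candidates[0]   # unambiguous match
        else:
            to_review[literal] = candidates     # no match or homography
    return assigned, to_review


index = load_delas(DELAS_LINES)
assigned, to_review = assign_info(["dokazati", "kosa", "bivstvo"], index)
```

In this sketch a literal with no dictionary entry and a literal with several entries both end up in `to_review`, mirroring the slide's point that the user confirms or resolves such cases by hand.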

Resolving homography with the DictWNSrp program

XML representation of a synset with assigned morphosyntactic information
- dokazati 1 V122+Perf+Tr+Iref+Ref
- dokazivati 1 V18+Imperf+Tr+Iref
- pokazati 3 V122+Perf+Tr+Iref+Ref
- pokazivati 3 V18+Imperf+Tr+Iref
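A minimal sketch of how a confirmed code might be stored and later taken apart, in Python. The slides state only that the LNOTE element carries the information. Placing LNOTE inside a LITERAL element, and reading the code as a leading field followed by `+`-separated features, are assumptions here, as are the helper names.

```python
import xml.etree.ElementTree as ET


def split_code(code):
    """Split 'V122+Perf+Tr+Iref+Ref' into its leading field and feature list."""
    leading, *features = code.split("+")
    return leading, features


def add_lnote(literal_elem, code):
    """Append an LNOTE child holding the code (placement is an assumption)."""
    note = ET.SubElement(literal_elem, "LNOTE")
    note.text = code
    return note


# LITERAL and SENSE are hypothetical tag names; the values come from the slide.
literal = ET.fromstring("<LITERAL>dokazati<SENSE>1</SENSE></LITERAL>")
add_lnote(literal, "V122+Perf+Tr+Iref+Ref")
serialized = ET.tostring(literal, encoding="unicode")
```

Keeping the code as one string in LNOTE and splitting it only on demand leaves the stored value identical to the dictionary entry it was copied from.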

Validation of lexemes from a synset on a corpus
Phase One: the IntexWN program
- selects and displays all synsets from WN for a given lexeme
- constructs INTEX graphs for all lexemes from the selected synsets
Phase Two: INTEX
- produces concordances from a chosen corpus for the graphs constructed by IntexWN
Phase Three: the user
- checks the validity of synonymous relations of lexemes on the concordances
- decides on removing lexemes from, or adding new lexemes to, the synset
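Phase Two can be approximated outside INTEX with a simple keyword-in-context search, sketched below in Python. This swaps in a plain regular expression over an explicit list of word forms for the INTEX graphs, which in the real workflow would presumably recognise inflected forms through the dictionaries. The function names and the `width` parameter are hypothetical.

```python
import re


def build_pattern(forms):
    """Compile one pattern that matches any of the given word forms."""
    # Longer forms first, so a shorter form never shadows a longer one.
    ordered = sorted(forms, key=len, reverse=True)
    alternatives = "|".join(re.escape(form) for form in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def concordances(text, pattern, width=30):
    """Return (left context, match, right context) for every hit in the text."""
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines


# Forms and sentence are taken from the synset example on the earlier slide.
forms = ["pokazati", "pokazala", "dokazati"]
text = "Anketa je pokazala da u tako nesxto veruje mali broj ispitanih."
hits = concordances(text, build_pattern(forms), width=10)
```

Each tuple in `hits` is one concordance line that a user could then judge in Phase Three.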

Constructing a graph for all lexemes from a synset with the IntexWN program

Validation results for synset ENG (being, beingness, existence)
Comments:
- the lexemes used in the synset have been used to denote the given concept in 24% of concordances
- the lexeme most frequently used to denote the given concept is postojanxe
- although zxivot is the most frequent lexeme in the synset, it has been used to denote the given concept only in 10% of cases
- bivstvo does not occur in the corpus and its exclusion from the synset could be considered if a similar result is obtained on a wider corpus
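Percentages of this kind could be derived from the user's judgments on individual concordance lines. The Python sketch below assumes each judgment is a pair (lexeme, whether the line denotes the concept). The function name is hypothetical, and the toy judgments are invented for illustration and do not reproduce the figures reported on the slide.

```python
from collections import Counter


def validation_shares(judgments):
    """Return the overall and per-lexeme share of lines that denote the concept."""
    total = Counter()
    confirmed = Counter()
    for lexeme, denotes_concept in judgments:
        total[lexeme] += 1
        if denotes_concept:
            confirmed[lexeme] += 1
    per_lexeme = {lexeme: confirmed[lexeme] / total[lexeme] for lexeme in total}
    all_lines = sum(total.values())
    overall = sum(confirmed.values()) / all_lines if all_lines else 0.0
    return overall, per_lexeme


# Invented toy judgments, NOT the corpus results from the slide.
toy_judgments = [
    ("zxivot", False),
    ("zxivot", False),
    ("zxivot", True),
    ("zxivot", False),
    ("postojanxe", True),
]
overall, per_lexeme = validation_shares(toy_judgments)
```

A lexeme that never occurs in the corpus, such as bivstvo in the slide's example, produces no judgments and therefore does not appear in `per_lexeme`.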

Further developments
- definition of more precise criteria for validation of lexemes in a synset based on their occurrence in corpora
- investigation of possibilities for introducing relevance information in synsets
- further development of the IntexWN program to include semantic relations, such as hyponymy/hyperonymy etc.
- introduction of near-synonym information into the Serbian WN using INTEX dictionaries (e.g. augmentatives/diminutives)
- investigation of possibilities for introducing multilingual features into INTEX using the WN (to be used for parallel corpora)