Experiences of (Lexicographers and) Computer Scientists in Validating Estonian Wordnet with Test Patterns Ahti Lohk | Kadri Vare | Heili Orav | Leo Võhandu.


Experiences of (Lexicographers and) Computer Scientists in Validating Estonian Wordnet with Test Patterns
Ahti Lohk | Kadri Vare | Heili Orav | Leo Võhandu
The 8th Meeting of the Global Wordnet Conference, Bucharest, January 27-30, 2016

Motivation: why validate?
Every expandable, evolving human-machine system needs a feedback mechanism.
The quality of a wordnet has a strong impact on the quality of the NLP tasks that use it.
Multiple inheritance cases in the semantic hierarchies of a wordnet are prone to various semantic errors.

Main aim
To prove that the semantic hierarchies of wordnet-type dictionaries contain as yet undiscovered substructures that correspond to certain descriptions (test patterns), and … that using these patterns to validate semantic hierarchies can significantly improve wordnet structure.

Previous work
Cycles (Smrž, 2004; Kubis, 2012)
Shortcuts (Fischer, 1997)
Rings (Liu et al., 2004; Richens, 2008)
Dangling uplinks (Koeva et al., 2004; Smrž, 2004)
Orphan nodes, i.e. null graphs (Čapek, 2012)
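Several of the checks listed above are simple graph queries once a wordnet is loaded as a hypernym graph. The following is a minimal Python sketch, not the tooling used in this work. It assumes a hypothetical in-memory format in which each synset ID maps to the list of its direct hypernym IDs, and all function names are illustrative. "Orphan node" is read here as a synset with neither hypernyms nor hyponyms. "Dangling uplink" is read as a hypernym pointer to a synset that does not exist.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory format: synset ID -> IDs of its direct hypernyms.
Hypernyms = Dict[str, List[str]]


def find_cycles(h: Hypernyms) -> List[List[str]]:
    """Report hypernym cycles via depth-first search with white/grey/black colouring."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {s: WHITE for s in h}
    path: List[str] = []
    cycles: List[List[str]] = []

    def visit(s: str) -> None:
        colour[s] = GREY
        path.append(s)
        for p in h[s]:
            if p not in colour:  # dangling uplink, reported separately
                continue
            if colour[p] == GREY:  # back edge: p is already on the current path
                cycles.append(path[path.index(p):] + [p])
            elif colour[p] == WHITE:
                visit(p)
        path.pop()
        colour[s] = BLACK

    for s in h:
        if colour[s] == WHITE:
            visit(s)
    return cycles


def find_dangling_uplinks(h: Hypernyms) -> List[Tuple[str, str]]:
    """Hypernym pointers whose target synset does not exist."""
    return [(s, p) for s, ps in h.items() for p in ps if p not in h]


def find_orphans(h: Hypernyms) -> List[str]:
    """Synsets with neither a hypernym nor a hyponym (isolated nodes)."""
    has_hyponym = {p for ps in h.values() for p in ps}
    return [s for s, ps in h.items() if not ps and s not in has_hyponym]
```

The recursion depth of `find_cycles` is bounded by the longest hypernym chain. That keeps a recursive search adequate for this sketch.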

An artificial hierarchy

An artificial hierarchy and specific substructures
1. Short cut
2. Heart-shaped substructure
3. Ring
4. Closed subset
5. Dense component
6. Connected roots
+ 4 substructures
Specific substructures = test patterns
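Of these patterns, short cuts and rings have fairly simple operational readings. The following is a minimal Python sketch that assumes each synset ID maps to the list of its direct hypernym IDs (a hypothetical format). A short cut is taken to be a direct hypernym link whose target is also reachable through another hypernym of the same synset. A ring is taken to be a pair of hypernyms of one synset that are not ancestors of each other but share a common ancestor. Both definitions are assumptions made for illustration. Heart-shaped substructures, closed subsets and dense components are not covered.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory format: synset ID -> IDs of its direct hypernyms.
Hypernyms = Dict[str, List[str]]


def ancestors(h: Hypernyms, s: str) -> Set[str]:
    """All synsets reachable upward from s (excluding s unless s lies on a cycle)."""
    seen: Set[str] = set()
    todo = list(h.get(s, []))
    while todo:
        p = todo.pop()
        if p not in seen:
            seen.add(p)
            todo.extend(h.get(p, []))
    return seen


def find_short_cuts(h: Hypernyms) -> List[Tuple[str, str]]:
    """Direct links s -> p where p is also an ancestor of another hypernym of s."""
    cuts = []
    for s, ps in h.items():
        for p in ps:
            if any(p in ancestors(h, q) for q in ps if q != p):
                cuts.append((s, p))
    return cuts


def find_rings(h: Hypernyms) -> List[Tuple[str, str, str]]:
    """(synset, hypernym1, hypernym2) where the two hypernyms share an ancestor."""
    rings = []
    for s, ps in h.items():
        for p, q in combinations(ps, 2):
            ap, aq = ancestors(h, p), ancestors(h, q)
            if p in aq or q in ap:
                continue  # one hypernym subsumes the other: a short cut, not a ring
            if ap & aq:
                rings.append((s, p, q))
    return rings
```

`ancestors` is recomputed on every call for clarity. On a full wordnet one would cache it per synset.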

Example 1: synset with many roots
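A synset with many roots can be detected by walking up its hypernym links and collecting the top-level synsets that are reached. The following is a minimal Python sketch that assumes a hypothetical dict from synset ID to direct hypernym IDs. The `min_roots` threshold is an illustrative parameter, not a value taken from the study.

```python
from typing import Dict, List, Set

# Hypothetical in-memory format: synset ID -> IDs of its direct hypernyms.
Hypernyms = Dict[str, List[str]]


def roots_of(h: Hypernyms, s: str) -> Set[str]:
    """Top-level synsets (no existing hypernyms) reachable upward from s."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    todo = [s]
    while todo:
        x = todo.pop()
        if x in seen:
            continue
        seen.add(x)
        parents = [p for p in h.get(x, []) if p in h]  # ignore dangling uplinks
        if not parents:
            roots.add(x)
        todo.extend(parents)
    return roots


def synsets_with_many_roots(h: Hypernyms, min_roots: int = 2) -> Dict[str, Set[str]]:
    """Map each synset that reaches at least min_roots roots to those roots."""
    result: Dict[str, Set[str]] = {}
    for s in h:
        r = roots_of(h, s)
        if len(r) >= min_roots:
            result[s] = r
    return result
```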

Example 2: dense component

Example 3: "Compound" pattern

Example 4: connected roots (side view and top view)
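One possible reading of "connected roots" is root synsets that share at least one descendant, so that multiple inheritance ties otherwise separate hierarchies together. Under that assumption, grouping such roots is a union-find problem. The sketch below is illustrative only. It uses a hypothetical dict from synset ID to direct hypernym IDs, and the function names are invented for this example.

```python
from typing import Dict, List, Set

# Hypothetical in-memory format: synset ID -> IDs of its direct hypernyms.
Hypernyms = Dict[str, List[str]]


def roots_of(h: Hypernyms, s: str) -> Set[str]:
    """Top-level synsets (no existing hypernyms) reachable upward from s."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    todo = [s]
    while todo:
        x = todo.pop()
        if x in seen:
            continue
        seen.add(x)
        parents = [p for p in h.get(x, []) if p in h]
        if not parents:
            roots.add(x)
        todo.extend(parents)
    return roots


def connected_root_groups(h: Hypernyms) -> List[Set[str]]:
    """Groups of two or more roots linked through shared descendants (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for s in h:
        rs = sorted(roots_of(h, s))
        for r in rs:
            find(r)  # register every root, including isolated ones
        for a, b in zip(rs, rs[1:]):
            union(a, b)

    groups: Dict[str, Set[str]] = {}
    for r in list(parent):
        groups.setdefault(find(r), set()).add(r)
    return [g for g in groups.values() if len(g) > 1]
```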

Table: Estonian Wordnet iterative evolution, by version. Columns: noun roots, verb roots, multiple inheritance cases, short cuts, rings, synsets with many roots, heart-shaped substructures, dense components, "compound" pattern, the largest closed subset.

Statistics of the correction operations
Over ten versions of EstWN (over 4 years):
21,911 – hypernymy and hyponymy relations removed
5,344 – lexical units in synsets changed
4,122 – hypernymy and hyponymy relations replaced by another semantic relation, mainly near synonymy or fuzzynymy
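Counts of this kind can be derived by comparing consecutive versions. The following is a minimal sketch. It assumes each version is available as a set of (source, relation, target) triples plus a mapping from synset ID to its set of lexical units. The relation labels `has_hypernym` and `has_hyponym` are hypothetical placeholders, not EstWN's actual identifiers. A hypernymy link that disappears counts as replaced if the same synset pair gains another relation, and as removed otherwise.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# (source synset, relation, target synset); relation labels are hypothetical.
Triple = Tuple[str, str, str]
HYPERNYMY = {"has_hypernym", "has_hyponym"}


def classify_changes(
    old_rel: Set[Triple],
    new_rel: Set[Triple],
    old_lex: Dict[str, Set[str]],
    new_lex: Dict[str, Set[str]],
) -> Counter:
    """Tally correction operations between two wordnet versions."""
    counts: Counter = Counter()
    # Relations holding between each (source, target) pair in the new version.
    new_pairs: Dict[Tuple[str, str], Set[str]] = {}
    for s, r, t in new_rel:
        new_pairs.setdefault((s, t), set()).add(r)
    # Hypernymy links that no longer exist: removed, or replaced by another relation.
    for s, r, t in old_rel - new_rel:
        if r not in HYPERNYMY:
            continue
        other = new_pairs.get((s, t), set()) - HYPERNYMY
        counts["replaced_by_other_relation" if other else "removed"] += 1
    # Synsets present in both versions whose lexical units differ.
    for synset in old_lex.keys() & new_lex.keys():
        if old_lex[synset] != new_lex[synset]:
            counts["lexical_units_changed"] += 1
    return counts
```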

Table: wordnets in comparison (Princeton WordNet, Finnish Wordnet, Cornetto, Polish Wordnet and Estonian Wordnet, each with its version). Columns: noun roots, verb roots, multiple inheritance cases, short cuts, rings, synsets with many roots, heart-shaped substructures, dense components, "compound" pattern, the largest closed subsets.

Summary
In this presentation we studied how to validate the semantic hierarchies of a wordnet, and we proposed using test patterns: descriptions of substructures of a specific nature. To demonstrate the effectiveness of test patterns, we partially applied them to 10 versions of EstWN. Instances of the different test patterns were extracted by our own programs and validated by lexicographers. We found that the number of multiple inheritance cases decreased by about 97 percent over the last five versions.
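Taking a multiple inheritance case to be a synset with more than one direct hypernym, the per-version count and the relative decrease between two versions can be sketched as follows. This is a minimal illustration that assumes a hypothetical dict from synset ID to direct hypernym IDs. It does not reproduce the study's figures.

```python
from typing import Dict, List

# Hypothetical in-memory format: synset ID -> IDs of its direct hypernyms.
Hypernyms = Dict[str, List[str]]


def multiple_inheritance_cases(h: Hypernyms) -> int:
    """Number of synsets with more than one distinct direct hypernym."""
    return sum(1 for ps in h.values() if len(set(ps)) > 1)


def percent_decrease(before: int, after: int) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before
```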

Future work
Applying test patterns to:
other semantic relations
other wordnets