Oana Adriana Şoica: Building and Ordering a SenDiS Lexicon Network

Presentation transcript:

Page 2 SenDiS: Outline
o SenDiS operates on a specific lexicon network (LexNet)
    - “sense tagged glosses” relations
    - lexicon networks obtained from other semantic / lexical relations
o obtaining a SenDiS LexNet:
    - build a “sense tagged glosses” LexNet (manually annotate the lexicon with a specific tool)
    - import a “sense tagged glosses” LexNet (WordNet tagged glosses, as of 2008)
o preprocessing (ordering) the SenDiS LexNet (before WSD)
    - truncation of the LexNet
    - leveling the LexNet

Page 3 SenDiS: Semantic/Lexical Relations
o hypernyms
o hyponyms
o similar to
o has part
o synonyms
o antonyms
o holonyms
o meronyms
o coordinate terms
o troponyms
o entailment

Page 4 SenDiS: Semantic/Lexical relations: WordNet
[Figure: an excerpt of the WordNet semantic network *]
* Navigli, R. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, Article 10 (2009)

Page 5 SenDiS: Semantic/Lexical relations: GRAALAN

Tail of relation        Head of relation        Relation type
{synonym}                                       Bidirectional, symmetric
{antonym}                                       Bidirectional, symmetric
{paronym}                                       Bidirectional, symmetric
{hypernym}              {hyponym}               Bidirectional, asymmetric
{connotation}           -                       Unidirectional
{holonym}               {meronym}               Bidirectional, asymmetric
{homonym}                                       Bidirectional, symmetric
{heteronym}                                     Bidirectional, symmetric
{homophone}                                     Bidirectional, symmetric
{diminutive of}         {diminutive by}         Bidirectional, asymmetric
{augmentative of}       {augmentative by}       Bidirectional, asymmetric
{extension from}        {extension into}        Bidirectional, asymmetric
{reduction from}        {reduction into}        Bidirectional, asymmetric
{generalization from}   {generalization into}   Bidirectional, asymmetric
{specialization from}   {specialization into}   Bidirectional, asymmetric
{figurative of}         {literal for}           Bidirectional, asymmetric
{reference to}          -                       Unidirectional
{derived from}          {derived into}          Bidirectional, asymmetric
{back formatted form}   {back formats}          Bidirectional, asymmetric
{abstract for}          {concretized from}      Bidirectional, asymmetric
{with variant}          {variant for}           Bidirectional, asymmetric
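
The relation table can be read as a small data structure: every relation type has a tail name, an optional head name, and a direction. The Python sketch below encodes four of the rows under that reading. The names `RelationType`, `RELATIONS` and `inverse_name` are hypothetical and chosen for illustration; they are not part of GRAALAN or SenDiS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationType:
    tail: str               # name at the tail of the relation
    head: Optional[str]     # name at the head; None where the table leaves it empty
    bidirectional: bool
    symmetric: bool


# Four rows copied from the table; the remaining rows follow the same pattern.
RELATIONS = [
    RelationType("synonym", None, bidirectional=True, symmetric=True),
    RelationType("hypernym", "hyponym", bidirectional=True, symmetric=False),
    RelationType("connotation", None, bidirectional=False, symmetric=False),
    RelationType("holonym", "meronym", bidirectional=True, symmetric=False),
]


def inverse_name(name: str) -> Optional[str]:
    """Name of the relation seen from the other end, or None if unidirectional."""
    for rel in RELATIONS:
        if name == rel.tail:
            if not rel.bidirectional:
                return None
            # A symmetric relation is its own inverse.
            return rel.tail if rel.symmetric else rel.head
        if name == rel.head:
            return rel.tail
    raise KeyError(name)
```

With this encoding, `inverse_name("hypernym")` gives `"hyponym"`, a symmetric relation such as `"synonym"` maps to itself, and a unidirectional relation such as `"connotation"` has no inverse.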

Page 6 SenDiS: Obtaining a SenDiS LexNet
o manually annotating the glosses of a lexicon (using a specific tool that eases the process)
o importing an existing “gloss tagged” lexicon net (also obtained manually or semi-automatically); this usually translates into a dependency on a specific list of meanings/glosses

Page 7 SenDiS: Creating the SenDiS LexNet
o implied a significant effort, usually measured in months, involving several trained linguists
o using a specialized collaborative tool (BuildLNTool – Build Lexicon Network Tool)
o enriching the “gloss tagged” relation with three relative degrees of importance (in the gloss context)
    - weak
    - medium
    - strong
    - or ignoring the gloss word
o SenDiS objective, two LexNets:
    - “gloss tagged” LexNet for the Romanian language
    - “gloss tagged” LexNet for the English language
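
To make the “gloss tagged” relation concrete, here is a minimal Python sketch of one annotated gloss. It assumes that each gloss word is either ignored or linked to one sense with a weak, medium or strong degree, and that a tagged word yields an arc from the defined meaning to the chosen sense. The arc direction, the sense identifiers such as `dog#1`, and the names `Relevance`, `TaggedToken` and `gloss_arcs` are illustrative assumptions, not the SenDiS data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Relevance(Enum):
    IGNORED = 0
    WEAK = 1
    MEDIUM = 2
    STRONG = 3


@dataclass
class TaggedToken:
    text: str
    sense_id: Optional[str] = None            # None: no sense selected yet
    relevance: Relevance = Relevance.MEDIUM   # the tool's default setting is Medium


def gloss_arcs(meaning_id: str,
               tokens: List[TaggedToken]) -> List[Tuple[str, str, Relevance]]:
    """Arcs (defined meaning, gloss sense, relevance) for tagged, non-ignored tokens."""
    return [
        (meaning_id, tok.sense_id, tok.relevance)
        for tok in tokens
        if tok.sense_id is not None and tok.relevance is not Relevance.IGNORED
    ]


# Hypothetical gloss for a hypothetical meaning "dog#1".
tokens = [
    TaggedToken("a", relevance=Relevance.IGNORED),
    TaggedToken("domesticated", "domesticated#1", Relevance.WEAK),
    TaggedToken("animal", "animal#1", Relevance.STRONG),
    TaggedToken("kept"),  # not tagged yet, so it contributes no arc
]
arcs = gloss_arcs("dog#1", tokens)
```

Only the two tagged, non-ignored words contribute arcs here; the untagged word stays in the gloss but adds nothing to the network until a sense is chosen.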

Page 8 SenDiS: BuildLNTool
o BuildLNTool (Build Lexicon Network Tool) provides:
    - a visual and effective mechanism to manually annotate the lexicon glosses
    - a synchronized overview of the already created relations
    - a browsing mechanism for inspecting the already tagged glosses and relations

Page 9 SenDiS: BuildLNTool - Sections
[Screenshot of the tool, with these sections labelled:]
o “Lemmas & MWEs”
o “Lemma \ MWE Info”
o “Competence & Definition Trees”
o “Root & Leaf Meanings”
o Messages and progress

Page 10 SenDiS: BuildLNTool – Sections II
o “Lemmas & MWEs”: list of lexicon entries
o “Root & Leaf Meanings”: list of roots and leaves of the lexicon network
o “Lemma/MWE Info”: current lexicon entry being analyzed
o “Competence & Definition Trees”: spanning trees for a given meaning over the current lexicon net
o section for messages and progress
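
The slides do not say how the two kinds of spanning trees are built. As a rough illustration only, the Python sketch below grows a breadth-first spanning tree from one meaning over a toy lexicon net. It assumes that a "definition tree" follows arcs from a meaning to the senses used in its gloss and that a "competence tree" follows the same arcs backwards; that reading, the toy arcs, and the function names are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Arc = Tuple[str, str]  # (defined meaning, sense used in its gloss)


def spanning_tree(arcs: List[Arc], root: str) -> Dict[str, Optional[str]]:
    """Breadth-first spanning tree from root: maps each reached meaning to its parent."""
    successors: Dict[str, List[str]] = {}
    for source, target in arcs:
        successors.setdefault(source, []).append(target)
    parent: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in parent:      # first visit fixes the tree edge
                parent[nxt] = node
                queue.append(nxt)
    return parent


def reverse_arcs(arcs: List[Arc]) -> List[Arc]:
    """Flip every arc, so a traversal walks from a sense to the meanings that use it."""
    return [(target, source) for source, target in arcs]


# Toy lexicon net with one cycle (animal#1 <-> organism#1).
arcs = [
    ("dog#1", "animal#1"),
    ("cat#1", "animal#1"),
    ("animal#1", "organism#1"),
    ("organism#1", "animal#1"),
]
definition_tree = spanning_tree(arcs, "dog#1")
competence_tree = spanning_tree(reverse_arcs(arcs), "animal#1")
```

Because a spanning tree records only the first path that reaches each meaning, the cycle in the toy net does not cause any meaning to appear twice.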

Page 11 SenDiS: BuildLNTool – Lemmas & MWEs
[Screenshot callouts:]
o selection of lexicon entry type
o selection of unfinished lexicon entries filter
o selection of viewing interval
o text filter
o lexicon entry text
o lexicon entry status

Page 12 SenDiS: BuildLNTool – Selection of a current lexicon entry
[Screenshot callout:]
o double click

Page 13 SenDiS: BuildLNTool – Browsing the meanings of the current lexicon entry
[Screenshot callouts:]
o lexicon entry text
o morphologic interpretation
o list of meanings
o filters
o meaning/gloss fully tagged
o meaning/gloss partially tagged
o meaning/gloss not tagged

Page 14 SenDiS: BuildLNTool – Selection of a current meaning for tagging
[Screenshot callout:]
o double click

Page 15 SenDiS: BuildLNTool – Gloss constituent without interpretations
[Screenshot callouts:]
o unrecognized gloss constituent
o ‘Enter’

Page 16 SenDiS: BuildLNTool – Degrees of relevance (in gloss context)
o Default setting: Medium

Page 17 SenDiS: BuildLNTool – Degrees of relevance II
[Screenshot callouts:]
o ‘Strong’ tokens
o ‘Medium’ tokens
o ‘Weak’ tokens
o Ignored (X) tokens

Page 18 SenDiS: BuildLNTool – Gloss tagging
[Screenshot callouts:]
o Unsaved annotations
o Saved annotations

Page 19 SenDiS: BuildLNTool – Gloss tagging protocol
[Flow diagram labels:]
o view of meaning tagging tree
o selection of constituent / group of gloss constituents
o set / modify relevance degree
o edit text of gloss constituent
o select / modify the sense for the gloss constituent
o further annotate meaning / save annotations
o choose the next meaning
o further on
o save annotations
o current gloss constituent without sense interpretations

Page 20 SenDiS: Built LexNets for Romanian and English

LexNets             AllTokens   OperatedTokens   OpTokensValid   OpTokensRelated   OpTokens V & R
LL_Romanian - 99%   1,528,819   1,191,942        691,010         720,420           686,210
LL_English - 2%     36,828      30,350           18,523          17,641            17,505

LexNets             Glosses   Tagged Glosses   Targeted Glosses   Tags Density
LL_Romanian - 99%   130,087   118,536          58,
LL_English - 2%     259,651   3,496            7,
[The Targeted Glosses values are cut off and the Tags Density values are missing in the transcript.]

Page 21 SenDiS: Imported WordNet tagged glosses
o WordNet (3.0) is organized in synsets
    - 117,659 synsets
    - 155,287 words (lexicon entries)
    - 206,941 word-sense pairs (gloss + usage examples)
o the synsets were split and transformed into a classical lexicon format
o the lexicon network imported:

LexNets                   Glosses   Tagged Glosses   Targeted Glosses   Tags Density
WordNet                   206,941   206,938          59,
WordNet_extendedGlosses   206,941                    83,
[Several cells are cut off or missing in the transcript; the placement of "83," under Targeted Glosses follows the pattern of the other rows.]

LexNets                   AllTokens   OperatedTokens   OpTokensValid   OpTokensRelated   OpTokens V & R
WordNet                   2,394,190 2,394,189 834,803
WordNet_extendedGlosses   3,114,968 3,114,967 936,397
[Only three of the five values per row survive; their column assignment is not recoverable.]
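
Splitting synsets into a "classical lexicon format" can be pictured as turning one record per synset into one entry per word, with a numbered sense for each synset the word belongs to. The Python sketch below does this on two toy synsets. The tuple layout, the synset identifiers, and numbering senses in order of appearance are assumptions made for illustration; the actual import procedure is not described on the slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Synset = Tuple[str, List[str], str]  # (synset id, member words, gloss)

# Two toy synsets; identifiers and glosses are illustrative only.
SYNSETS: List[Synset] = [
    ("s1", ["car", "auto"], "a motor vehicle with four wheels"),
    ("s2", ["car", "railcar"], "a wheeled vehicle adapted to the rails of a railroad"),
]


def synsets_to_lexicon(synsets: List[Synset]) -> Dict[str, List[dict]]:
    """One lexicon entry per word; one numbered sense per synset containing that word."""
    lexicon: Dict[str, List[dict]] = defaultdict(list)
    for synset_id, words, gloss in synsets:
        for word in words:
            sense_number = len(lexicon[word]) + 1  # numbered in order of appearance
            lexicon[word].append(
                {"sense": sense_number, "gloss": gloss, "synset": synset_id}
            )
    return dict(lexicon)


lexicon = synsets_to_lexicon(SYNSETS)
word_sense_pairs = sum(len(senses) for senses in lexicon.values())
```

In this layout every (word, synset) membership becomes one word-sense pair, which is how a synset inventory can yield more word-sense pairs than synsets or words, as in the WordNet 3.0 counts on the slide.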

Page 22 SenDiS: Ordering a SenDiS LexNet
o “gloss tagged” lexicon nets are large and dense graphs
    - between 100,000 and [...] vertices
    - over 1,000,000 edges / arcs
o to ease working with such graphs, “gloss tagged” lexicon nets can be preprocessed and optimized
    - truncation of a lexicon net
    - leveling of a lexicon net
o aims when optimizing a lexicon net
    - elimination of loops or strongly connected components
    - a minimum number of removed edges
    - leveling on a minimum number of levels
    - minimization/maximization of root/leaf vertices
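
The two ordering steps, breaking cycles and then leveling, can be sketched generically in Python. This is a textbook approach and not the Patentv1 or NT_eades algorithms named in the results: it drops every arc that closes a cycle during a depth-first traversal, then assigns levels by longest path so that every remaining arc points to a strictly higher level. Depth-first back-arc removal does not guarantee a minimum number of removed arcs. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str]


def remove_back_arcs(vertices: List[str], arcs: List[Arc]) -> Tuple[List[Arc], List[Arc]]:
    """Iterative DFS; arcs pointing to a vertex still on the DFS path are removed."""
    successors: Dict[str, List[str]] = {v: [] for v in vertices}
    for source, target in arcs:
        successors[source].append(target)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in vertices}
    kept: List[Arc] = []
    removed: List[Arc] = []
    for start in vertices:
        if color[start] != WHITE:
            continue
        color[start] = GREY
        stack = [(start, iter(successors[start]))]
        while stack:
            node, pending = stack[-1]
            nxt = next(pending, None)
            if nxt is None:                 # all outgoing arcs handled
                color[node] = BLACK
                stack.pop()
            elif color[nxt] == GREY:        # arc closes a cycle (or is a self-loop)
                removed.append((node, nxt))
            else:
                kept.append((node, nxt))
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(successors[nxt])))
    return kept, removed


def assign_levels(vertices: List[str], dag_arcs: List[Arc]) -> Dict[str, int]:
    """Longest-path layering on an acyclic net: roots get level 0."""
    successors: Dict[str, List[str]] = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for source, target in dag_arcs:
        successors[source].append(target)
        indegree[target] += 1
    level = {v: 0 for v in vertices}
    ready = [v for v in vertices if indegree[v] == 0]
    while ready:
        node = ready.pop()
        for nxt in successors[node]:
            level[nxt] = max(level[nxt], level[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:          # all predecessors are final
                ready.append(nxt)
    return level


# Toy net with one cycle a -> b -> c -> a.
vertices = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
kept, removed = remove_back_arcs(vertices, arcs)
levels = assign_levels(vertices, kept)
```

On the toy net a single arc is removed and the four vertices end up on four levels. Which arc is removed depends on the traversal order, which is one reason dedicated heuristics are used when the number of removed arcs should be small.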

Page 23 SenDiS: Unordered LexNet
[Figure: a minimal lexicon net in the original form, with vertices e1 to e9]

Page 24 SenDiS: Ordered (leveled) LexNet
[Figure: the same minimal lexicon net, leveled]

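Page 25 SenDiS: Results on leveling experimental LexNets

LNs      Vertices   Edges In   OLN Algorithm   Edges Out   Edges Removed   Levels   Time (s)
wn       202,361    834,803    Patentv1        821,048     13,
wn_ex    205,188    936,397    Patentv1        936,397     74,
ro_48%   72,067     318,741    Patentv1        308,592     10,
ro_78%   100,175    523,192    Patentv1        504,210     18,
ro_99%   120,472    686,784    Patentv1        659,030     27,
ro_48%   130,407    318,741    NT_eades        308,334     10,
ro_99%   130,099    686,784    NT_eades        654,025     32,
wn_ex    206,941    936,397    NT_eades        904,992     31,405          461,315
[Most Edges Removed values are cut off after the thousands separator and most Levels and Time (s) values are missing in the transcript; in the last row the two survive only as the fused string "461,315".]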
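
In the one row whose Edges Removed value survives in full (wn_ex with NT_eades), that value equals Edges In minus Edges Out: 936,397 − 904,992 = 31,405. Assuming the same relation holds for the other rows, the short Python sketch below recomputes the difference and the share of removed arcs from the two columns that are intact. These are derived figures, not values read from the slide. The wn_ex row for Patentv1 is left out because its Edges In and Edges Out values are identical in the transcript, which does not fit a removed count starting with "74,".

```python
# (LexNet, algorithm, edges in, edges out), copied from the results slide.
ROWS = [
    ("wn", "Patentv1", 834_803, 821_048),
    ("ro_48%", "Patentv1", 318_741, 308_592),
    ("ro_78%", "Patentv1", 523_192, 504_210),
    ("ro_99%", "Patentv1", 686_784, 659_030),
    ("ro_48%", "NT_eades", 318_741, 308_334),
    ("ro_99%", "NT_eades", 686_784, 654_025),
    ("wn_ex", "NT_eades", 936_397, 904_992),
]


def edges_removed(edges_in: int, edges_out: int) -> int:
    """Assumed relation: arcs removed = arcs before ordering - arcs after ordering."""
    return edges_in - edges_out


for name, algorithm, edges_in, edges_out in ROWS:
    removed = edges_removed(edges_in, edges_out)
    share = 100.0 * removed / edges_in
    print(f"{name:7s} {algorithm:9s} removed={removed:>6,d} ({share:.2f}% of arcs)")
```

Each recomputed difference begins with the same digits as the truncated Edges Removed entry in its row, which is consistent with the assumed relation.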