Generating sets of synonyms between languages


Ján Genči¹, Ondrej Dzurjuv¹, Radovan Garabík²
¹Department of Computers and Informatics, Technical University of Košice
²Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava

Introduction
- Besides classical online bilingual and multilingual dictionaries, there are EuroWordNet and Global WordNet.
- Neither EuroWordNet nor Global WordNet covers the Slovak language.
- Several earlier works dealt with the automatic creation of synsets.
- Requirement: create a computer application for building Slovak synsets.

Our goals
- Design new methods of synset generation.
- Evaluate the quality of synset generation.
- Design and implement a computer application for assisted synset editing.

WordNet
- The WordNet project started in 1990 at Princeton University.
- Lexical database of the English language: nouns, verbs, adjectives and adverbs.
- Two basic properties of WordNet:
  - Words are organized into groups, sets of synonyms (synsets).
  - Synsets are interconnected.

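Relationships in WordNet
[Diagram: the synset {auto; automobil} (car) linked to the synsets {motorové vozidlo} (motor vehicle), {nemotorové vozidlo} (non-motor vehicle), {nárazník} (bumper), {sedan}, {autobus} (bus), {dvere} (door), ..., {motor} (engine). Relation labels: antonymy, hypernymy, hyponymy, holonymy, meronymy.]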

Method A
Synonymy among the words in a synset:
- Set of words from the synset
- Translation of each word
- Result: the translations shared among the words

Example (Method A)
{kind; sort; form; variety} ->
- kind -> druh, rod, kategória
- sort -> druh, akosť, trieda, typ, forma, chlap
- form -> forma, tvar, podoba, formulár, blanketa, formula
- variety -> rozmanitosť, odroda, výber, druh, rad, množstvo, mnohotvárnosť, rôznosť
⋂ {druh, forma}
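The Method A example can be reproduced with a short sketch. One detail is an assumption here: a strict intersection of all four translation lists would be empty (druh is missing from the list for form, and forma from the lists for kind and variety), while the slide's result {druh, forma} matches keeping every Slovak word that occurs in the translations of at least two synset members. The function name `method_a` and the parameter `min_support` are hypothetical and not taken from the authors' application.

```python
from collections import Counter

# English -> Slovak translation lists copied from the slide example.
TRANSLATIONS = {
    "kind": ["druh", "rod", "kategória"],
    "sort": ["druh", "akosť", "trieda", "typ", "forma", "chlap"],
    "form": ["forma", "tvar", "podoba", "formulár", "blanketa", "formula"],
    "variety": ["rozmanitosť", "odroda", "výber", "druh", "rad",
                "množstvo", "mnohotvárnosť", "rôznosť"],
}


def method_a(synset, translations, min_support=2):
    """Keep target words that translate at least `min_support` synset members.

    min_support=2 is an assumption chosen to match the slide's result.
    """
    counts = Counter()
    for word in synset:
        # Count each target word once per source word.
        counts.update(set(translations.get(word, [])))
    return {target for target, n in counts.items() if n >= min_support}


result_a = method_a(["kind", "sort", "form", "variety"], TRANSLATIONS)
print(sorted(result_a))
```

Raising `min_support` to the size of the synset gives the strict intersection, which is empty for this example.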

Method B
Translation of univocal words (words with a single sense):
- Set of words from the synset
- Subset of univocal words
- Translation
- Result

Example (Method B)
- kind: 1 sense -> druh, rod, kategória
- sort: 4 senses
- form: 16 senses
- variety: 6 senses
Result: {druh, rod, kategória}
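A minimal sketch of Method B, using the sense counts and translations from the example. The function name `method_b` and the dictionary layout are hypothetical; in practice the sense counts would come from WordNet and the translations from a bilingual dictionary.

```python
# Sense counts and translations copied from the slide example.
SENSE_COUNTS = {"kind": 1, "sort": 4, "form": 16, "variety": 6}
TRANSLATIONS = {"kind": ["druh", "rod", "kategória"]}  # only the univocal word is needed


def method_b(synset, sense_counts, translations):
    """Translate only the univocal (single-sense) words of a synset."""
    univocal = [word for word in synset if sense_counts.get(word) == 1]
    result = set()
    for word in univocal:
        result.update(translations.get(word, []))
    return result


result_b = method_b(["kind", "sort", "form", "variety"], SENSE_COUNTS, TRANSLATIONS)
print(sorted(result_b))
```

Because a univocal word belongs to only one synset, its translations can be attached to that synset without disambiguation, which is presumably why the method restricts itself to such words.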

Method C
Uses hyponymy and hypernymy relationships for synset generation:
- Words from the source synset
- Set of hyponyms and hypernyms related to the source synset
- Translation of both sets
- Result: the translations common to both

Example (Method C)
{kind; sort; form; variety} -> {hyper,hypo}nyms: {category, type, brand, genus, species}
{kind; sort; form; variety} -> druh, rod, kategória, akosť, trieda, typ, forma, chlap, tvar, podoba, formulár, blanketa, formula, rozmanitosť, odroda, výber, rad, množstvo, mnohotvárnosť, rôznosť
{category, type, brand, genus, species} -> kategória, skupina, trieda, typ, symbol, litera, druh, odroda, značka, označenie, známka, kvalita, akosť, ohorok, rod, forma, tvar
⋂ {druh; rod; kategória; akosť; trieda; typ; forma; tvar; odroda}
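The Method C example is a plain set intersection, sketched below with the two pooled translation lists from the slide. The function name `method_c` is hypothetical; the slide does not show per-word translations for the related words, so the lists are used as given.

```python
# Pooled translations of the source synset {kind; sort; form; variety}.
SYNSET_TRANSLATIONS = {
    "druh", "rod", "kategória", "akosť", "trieda", "typ", "forma", "chlap",
    "tvar", "podoba", "formulár", "blanketa", "formula", "rozmanitosť",
    "odroda", "výber", "rad", "množstvo", "mnohotvárnosť", "rôznosť",
}

# Pooled translations of the related words {category, type, brand, genus, species}.
RELATED_TRANSLATIONS = {
    "kategória", "skupina", "trieda", "typ", "symbol", "litera", "druh",
    "odroda", "značka", "označenie", "známka", "kvalita", "akosť", "ohorok",
    "rod", "forma", "tvar",
}


def method_c(synset_translations, related_translations):
    """Keep translations of the synset that also translate a hypernym or hyponym."""
    return set(synset_translations) & set(related_translations)


result_c = method_c(SYNSET_TRANSLATIONS, RELATED_TRANSLATIONS)
print(sorted(result_c))
```

The intersection drops translations such as chlap or formulár, which belong to other senses of sort and form and are not supported by any related word.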

Method D
Uses the hypernyms and hyponyms of the source synset as the source synsets:
- Hypernyms of the source synset
- Hyponyms of the source synset
- Translation
- Target synset: the translations common to both

Example (Method D)
{kind; sort; form; variety}:
- hypernym: {category} -> kategória, skupina, trieda
- hyponym: {type, brand, genus, species} -> typ, symbol, litera, druh, odroda, značka, označenie, známka, kvalita, akosť, ohorok, druh, rod, skupina, trieda, forma, tvar
⋂ {skupina, trieda}
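Method D ignores the words of the source synset itself and intersects the translations of its hypernyms with those of its hyponyms. A minimal sketch with the lists from the example (the function name `method_d` is hypothetical):

```python
# Translations of the hypernym {category}, as listed on the slide.
HYPERNYM_TRANSLATIONS = ["kategória", "skupina", "trieda"]

# Translations of the hyponyms {type, brand, genus, species}, as listed on the slide.
HYPONYM_TRANSLATIONS = [
    "typ", "symbol", "litera", "druh", "odroda", "značka", "označenie",
    "známka", "kvalita", "akosť", "ohorok", "druh", "rod", "skupina",
    "trieda", "forma", "tvar",
]


def method_d(hypernym_translations, hyponym_translations):
    """Keep translations shared by the hypernyms and the hyponyms."""
    return set(hypernym_translations) & set(hyponym_translations)


result_d = method_d(HYPERNYM_TRANSLATIONS, HYPONYM_TRANSLATIONS)
print(sorted(result_d))
```

Converting the lists to sets also removes the duplicate druh in the hyponym translations.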

Results
WordNet contains 117659 synsets. Slovak equivalents were generated for 34.4 % (40521) of the English synsets.
- Method A: 8.7 %
- Method B: 25.7 %
- Method C: 12 %
- Method D: 1.4 %

Results for frequently used words
For a set of the 300 most frequently used English words, there are 1709 synsets in WordNet. Slovak equivalents were produced for 55.4 % (946) of these synsets.
- Method A: 32.7 %
- Method B: 23.6 %
- Method C: 29.6 %
- Method D: 6.6 %
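The coverage figures on the two results slides can be checked with a few lines of arithmetic. The helper name `coverage` is hypothetical. The per-method percentages add up to more than the overall coverage in both cases, which suggests that several methods often produce a result for the same synset; that reading is an inference from the numbers, not something the slides state.

```python
def coverage(generated, total):
    """Percentage of synsets with a generated equivalent, to one decimal place."""
    return round(100 * generated / total, 1)


# All of WordNet: 40521 of 117659 synsets.
coverage_all = coverage(40521, 117659)

# Synsets of the frequently used words: 946 of 1709 synsets.
coverage_frequent = coverage(946, 1709)

# Sums of the per-method percentages (Methods A, B, C, D).
method_sum_all = round(8.7 + 25.7 + 12 + 1.4, 1)
method_sum_frequent = round(32.7 + 23.6 + 29.6 + 6.6, 1)

print(coverage_all, method_sum_all)
print(coverage_frequent, method_sum_frequent)
```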

Inspection of Slovak synsets
The part of speech and the semantics of the generated Slovak synsets were inspected.

Results
- New computer application for manual synset translation.
- The application was "fed" with the generated data.
- About 60% of synsets were generated.
- About 75% of the generated synsets were evaluated as "not incorrect".

Conclusion
- We were able to produce Slovak synsets for 34.4% of English synsets.
- Manual inspection was required.
- The process of synset generation is influenced by many factors: the translation dictionaries used, the frequency of words, and the number of words in a synset.
- The application was used to create an en-sk-pl-de-lt dictionary (with ontology relations).

Part of speech          Nouns  Adjectives  Verbs  Adverbs  Totals
Unique strings          12941        3321   1150      982   18394
Strings with one sense  10239        2305    953      702   14199
Word-sense pairs        18740        5551   1400     1505   27196
Synsets                  9317        2329    830      549   13025
Synsets of one word      3916         773    426      141    5256

Thank you for your attention