Integrating Semantic Dictionaries for English, French and Bulgarian into the NooJ System for the Purposes of Information Retrieval Svetla Koeva, Max Silberztein.

Integrating Semantic Dictionaries for English, French and Bulgarian into the NooJ System for the Purposes of Information Retrieval Svetla Koeva, Max Silberztein 8th INTEX / NooJ Workshop, 30 May, 2005

Main research goals To provide a methodology for implementing natural language semantic relations in the NooJ system: –to create specialized Semantic Dictionaries for English, French and Bulgarian based on WordNet semantic relations; –to provide a complete formalization of the inflection of the simple and compound words included in the WordNet structure.

History The integration of semantic relations into the INTEX system was first proposed at the sixth INTEX workshop. The idea was later developed in the joint RILA research project "Information retrieval based on semantic relations": –LASELDI, Université de Franche-Comté; –Department of Computational Linguistics, IBL, Bulgarian Academy of Sciences.

Language resources Bulgarian grammatical dictionary (BGD) – over lemmas and word forms; English WordNet 2.0 – synonymous sets; Bulgarian WordNet (BalkaNet project) – synonymous sets; French WordNet (EuroWordNet project) – synonymous sets; English dictionary – over lemmas (not inflected); French dictionary – extracted with INTEX.

Implementation tasks To transform the format of the BGD into the NooJ standard; To create semantic dictionaries for Bulgarian and English; To associate lemmas from the Bulgarian semantic dictionaries with the corresponding inflection types; To add missing lemmas and inflection types in BGD, if any; To create extensive dictionaries and corresponding inflection types for compounds.

BGD – Information structure design Category information – 6 classes: Noun, Verb, Adjective, Pronoun, Numeral, Others (Adverb, Preposition, Conjunction, Particle, Interjection) ; Paradigmatic information – Personal, Transitive, Perfective, Common, …; Grammatical information – Inflection, Conjugation, Sound alternations, ….

BGD – Grammatical subclasses Nouns – 22 subclasses with respect to their Type (Common, Proper, Singularia tantum, Pluralia tantum) and Gender; Verbs – 32 subclasses with respect to Transitivity, Perfectiveness, and Personality; Adjectives – 2 subclasses; Pronouns – 26 subclasses with respect to their Type and Possessor; Numerals – 6 subclasses.

BGD – Grammatical types Noun – Number, Definiteness, Counting form, Case, Optional forms – 266 types; Verb – Person, Number, Tense, Mood, Voice, Participles, Gender, Definiteness – 257 types; Adjective – Gender, Number, Definiteness – 30 types; Pronoun – Gender, Person, Number, Definiteness, Case, Clitic, Possessing – 28 types; Numeral – Gender, Number, Definiteness, Approximate form, Male form – 20 types.

BGD – Dictionary format
Dictionary:
а, ЧА, 0
абсол`ютен, ПРИ, 7
`август, С+М, 10
авиокомп`ания, С+Ж, 1
австр`ийски, ПРИ, 3
автоб`ус, С+М, 11
автомат`ичен, ПРИ, 7
адрес`ирам, Г+Н+Т, 4
агит`ирам, Г+Н+Т, 4
Grammatical type ПРИ, 7:
sm0, Ok, ''
smh, Ok, '2RCия'
sml, Ok, '2RCият'
sf0, Ok, '2RCа'
sfd, Ok, '2RCата'
sn0, Ok, '2RCо'
snd, Ok, '2RCото'
p0, Ok, '2RCи'
pd, Ok, '2RCите'

Transforming BGD A Perl script converts the dictionary and the grammatical types into the NooJ format and transliterates the labels.

NooJ dictionary
абсол`ютен, ПРИ, 7 → абсолютен,A+FLX=A-7
`август, С+М, 10 → август,N+M+FLX=N_M-10
авиокомп`ания, С+Ж, 1 → авиокомпания,N+F+FLX=N_F-1
австр`ийски, ПРИ, 3 → австрийски,A+FLX=A-3
автоб`ус, С+М, 11 → автобус,N+M+FLX=N_M-11
автомат`ичен, ПРИ, 7 → автоматичен,A+FLX=A-7
адрес`ирам, Г+Н+Т, 4 → адресирам,V+IT+FLX=V_IT-4
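The conversion shown on this slide can be sketched in a few lines. The slides say the original transformation was done with a Perl script; the Python below is only an illustrative reconstruction from the visible examples. The function name `bgd_to_nooj` and the `LABELS` table are hypothetical, and the table covers only the four label combinations that appear on the slide.

```python
# Label transliteration, limited to the labels visible in the slide examples.
# Assumption: the real script handles many more categories than these four.
LABELS = {
    "ПРИ": "A",       # adjective
    "С+М": "N+M",     # masculine noun
    "С+Ж": "N+F",     # feminine noun
    "Г+Н+Т": "V+IT",  # verb class as transliterated on the slide
}

STRESS_MARK = "`"  # BGD marks the stressed vowel with a preceding grave accent


def bgd_to_nooj(entry: str) -> str:
    """Convert one 'lemma, category, type' BGD line into a NooJ-style entry."""
    lemma, category, type_number = (field.strip() for field in entry.split(","))
    lemma = lemma.replace(STRESS_MARK, "")
    nooj_category = LABELS[category]
    # The paradigm name joins the category labels with '_' and appends the type.
    paradigm = nooj_category.replace("+", "_") + "-" + type_number
    return f"{lemma},{nooj_category}+FLX={paradigm}"


if __name__ == "__main__":
    for line in ["абсол`ютен, ПРИ, 7", "`август, С+М, 10", "адрес`ирам,Г+Н+Т,4"]:
        print(bgd_to_nooj(line))
```

An unknown category raises `KeyError`, which seems preferable to silently emitting an entry with an untransliterated label.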

NooJ formal descriptions
sm0, Ok, '' → A-7 = /sm0 +
smh, Ok, '2RCия' → ия/smh +
sml, Ok, '2RCият' → ият/sml +
sf0, Ok, '2RCа' → а/sf0 +
sfd, Ok, '2RCата' → ата/sfd +
sn0, Ok, '2RCо' → о/sn0 +
snd, Ok, '2RCото' → ото/snd +
p0, Ok, '2RCи' → и/p0 +
pd, Ok, '2RCите' → ите/pd;

WordNet semantic relations
ILR | POS/POS | EW2.0 BulNet
HYPERONYMY | N/N V/V
NEAR ANTONYMY | N/N A/A V/V
PART MERONYMY | N/N
MEMBER MERONYMY | N/N
PORTION MERONYMY | N/N
SUBEVENT | V/V
CAUSES | V/V
SIMILAR TO | A/A V/V
VERB GROUP | V/V
ALSO SEE | A/A V/V

Other relations
ILR | POS/POS | EW2.0 BulNet
BE IN STATE | A/N
BG DERIVATIVE | N/V
DERIVED | A/N
PARTICIPLE | A/V | 40156
REGION DOMAIN | N/N V/N A/N B/N
USAGE DOMAIN | N/N V/N A/N B/N | 98322
CATEGORY DOMAIN | N/N V/N A/N B/N

Selected relations Synonymy (reflexive, symmetric, and transitive relation of equivalence); Hypernymy (inverse, asymmetric, and transitive relation between synonym sets); Meronymy (inverse, asymmetric, and transitive relation between synonym sets): Part meronymy; Member meronymy; Portion meronymy.

Selected relations Similar to (symmetric relation between similar adjectival synsets); Verb group (symmetric relation between semantically related verb synsets); Also see (symmetric relation between synsets - verbs or adjectives, that are close in meaning); Category domain (asymmetric extralinguistic relation between synsets denoting a concept and the sphere of knowledge it belongs to).
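Because hypernymy and meronymy are described as transitive, a lookup can follow the relation through several steps rather than one. The following is a minimal sketch of that idea. The synset names and the `HYPERNYM` table are hypothetical toy data rather than entries from WordNet or BulNet, and `all_hypernyms` is an illustrative helper, not part of NooJ.

```python
from collections import deque

# Hypothetical toy data: each synset maps to its direct hypernym synsets.
HYPERNYM = {
    "car": ["motor_vehicle"],
    "motor_vehicle": ["vehicle"],
    "vehicle": [],
}


def all_hypernyms(synset, hypernym=HYPERNYM):
    """Return every synset reachable through the transitive hypernymy relation."""
    reached = []
    queue = deque(hypernym.get(synset, []))
    while queue:
        current = queue.popleft()
        if current in reached:
            continue  # guards against revisiting a synset reachable by two paths
        reached.append(current)
        queue.extend(hypernym.get(current, []))
    return reached


if __name__ == "__main__":
    print(all_hypernyms("car"))
```

The symmetric relations (similar to, verb group, also see) would not need a closure of this kind; a single lookup in both directions would be enough.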

DELAF semantic dictionaries
These dictionaries consist of pairs of literals defined for the corresponding semantic relation:
– car,automobile.N
– auto,automobile.N
All possible combinations between literals in the given synsets are listed:
– car,automobile.N
– cars,automobile.N
– auto,automobile.N
– autos,automobile.N
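Listing all combinations between the literals of a synset can be sketched as follows. This is an assumption-laden illustration: the function `delaf_pairs` is hypothetical, the inflected forms are supplied by hand, and the sketch assumes that every literal is paired with every other literal in both directions, which the slide does not state explicitly.

```python
from itertools import permutations


def delaf_pairs(synset_forms, pos="N"):
    """Emit 'form,literal.POS' lines for every ordered pair of distinct literals.

    synset_forms maps each literal of one synset to its inflected forms.
    """
    lines = []
    for source, target in permutations(synset_forms, 2):
        for form in synset_forms[source]:
            lines.append(f"{form},{target}.{pos}")
    return lines


if __name__ == "__main__":
    # Forms are listed by hand here; in the project they come from inflection.
    synset = {
        "car": ["car", "cars"],
        "auto": ["auto", "autos"],
        "automobile": ["automobile", "automobiles"],
    }
    for line in delaf_pairs(synset):
        print(line)
```

A synset with n literals and k forms per literal yields n × (n − 1) × k lines, which gives an idea of how quickly such a dictionary grows.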

NooJ Semantic dictionaries
Synonymy relation: ‘a plant consisting of buildings with facilities for manufacturing’
фабрика,N+FLX=ENG n
предприятие,N+FLX=ENG n
factory,N+FLX=ENG n
mill,N+FLX=ENG n
manufacturing plant,N+FLX=ENG n
manufactory,N+FLX=ENG n

NooJ Semantic dictionaries
Hypernymy relation: ‘the organized action of making of goods and services for sale’
производство,N+FLX=ENG n
промишленост,N+FLX=ENG n
индустрия,N+FLX=ENG n
production,N+FLX=ENG n
industry,N+FLX=ENG n
manufacture,N+FLX=ENG n

Inflecting wordnet... otstranqwam (to remove) … ГНТ remove something concrete, as by lifting, pushing, taking off, etc. or remove something abstract...

NooJ Semantic descriptions
‘the organized action of making of goods and services for sale’
ENG n = /Hs0 + то/Hsd + а/Hp0 + ата/Hpd + мишленост/Ss0 + мишлеността/Ssd + мишлености/Sp0 + мишленостите/Spd + индустрия/Ss0 + индустрията/Ssd + индустрии/Sp0 + индустриите/Spd;
ENG n = /Hs + industry/Ss + industries/Sp0 + manufacture/Ss + manufactures/Sp;

After the nice solutions Lemmas which are not included in the BGD: –Classification of lemmas into existing inflection types; –Formal description of new inflection types; –Literals in Latin; –Validating WordNet. Semantic ambiguity – literals with two inflectional descriptions in BGD; Compound words: –Formal description of inflection types; –Classification of compounds.

NooJ Compound semantic descriptions ENG n = /Ss0 + та/Ssd + и (и/p0 +ите/pd) + завод ен/Ss0 + завод ния/Ssh + завод ният/Ssl + заводи ни/Sа0 + заводи ните/Sа0 + рафинерия/Ss0 + рафинерия та/Ssd + рафинерии и/Sp0 + рафинерии ите/Spd;

Applications of the Semantic Dictionaries Information retrieval by means of semantic equivalence with synonymy dictionaries; Information retrieval by means of semantic specification with hyperonymy and meronymy dictionaries; Information retrieval by means of similarity; Information retrieval by means of thematic domain affiliation; Validation of the WordNet structure with respect to its completeness and consistency.
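Retrieval by semantic equivalence can be pictured as query expansion over a synonymy dictionary. The sketch below is plain Python and does not use any NooJ API. It assumes dictionary lines in the `form,literal.POS` shape shown on the DELAF slide, and the helpers `load_pairs` and `expand_query` are hypothetical names introduced for illustration.

```python
def load_pairs(lines):
    """Index 'form,literal.POS' lines as form -> set of equivalent literals."""
    index = {}
    for line in lines:
        form, rest = line.split(",", 1)
        literal, _pos = rest.rsplit(".", 1)
        index.setdefault(form, set()).add(literal)
    return index


def expand_query(terms, index):
    """Add the equivalent literals of each query term, keeping first-seen order."""
    expanded = []
    for term in terms:
        for candidate in [term, *sorted(index.get(term, ()))]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    dictionary = ["car,automobile.N", "cars,automobile.N", "auto,automobile.N"]
    print(expand_query(["cars", "rental"], load_pairs(dictionary)))
```

The same lookup would serve the hyperonymy and meronymy dictionaries; only the pairs loaded into the index would differ.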

Future directions Extensions and enhancements of the semantic dictionaries by means of: –Extension of the dictionaries coverage; –Addition of other semantic relations; –Inclusion of additional information to the entries. Integration of multilingual semantic extraction with NooJ using the Inter-Lingual-Index relation.