SVETLA KOEVA, SVETLOZARA LESEVA, BORISLAV RIZOV. The project "Automatic information extraction based on semantic relations" (RILA, a bilateral co-operation programme).


The project "Automatic information extraction based on semantic relations" (RILA, a bilateral co-operation programme)

Objectives:
- Reliable (exhaustive and precise) multilingual lexical resources for a variety of purposes, such as machine translation, information extraction and information retrieval.

Prerequisites for carrying out such a task:
- Large-coverage linguistic resources such as comprehensive multilingual and monolingual dictionaries (designed according to certain criteria and stored in a format that ensures accessibility and manageability).
- Ancillary resources (especially for disambiguation and recognition).
- An appropriate system for the storage and management of multilingual linguistic data, as well as for the implementation of task-related procedures.

Methodology
- Systematization and unification of the existing INTEX resources, as well as their conversion to the established NooJ format.
- Expansion and enhancement of the resources, aiming at ever higher precision and recall.
- Creation of various new resources using the experience, resources and tools developed along the first two lines.

Conversion of the lexical resources in DELA format to the .nod format:
- Conversion of the BGD (Bulgarian Grammar Dictionary) [1] automata underlying the DELAF dictionaries to the .flx automata description.
- Creation of automata for the existing dictionaries of compounds, since they have been stored in DELACF format.

[1] Koeva, S. Grammar Dictionary of Bulgarian. Description of the concept of organization of the linguistic data. Bulgarian Language 6, pp
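As an illustration of the first step of such a conversion, the sketch below reads entries in the general DELAF layout (`form,lemma.CATEGORY+feature:inflection`) into structured records and groups them by lemma, which is the information an inflectional description has to be derived from. It is a minimal sketch, not the project's converter: the sample entries are made up, escaped commas and dots are not handled, and nothing here reads or writes actual .nod or .flx files.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DelafEntry:
    form: str               # inflected form
    lemma: str              # canonical form
    category: str           # part-of-speech code, e.g. "N"
    features: List[str]     # codes attached with '+'
    inflections: List[str]  # inflectional codes attached with ':'


def parse_delaf_line(line: str) -> DelafEntry:
    """Parse one DELAF-style line (assumes no escaped ',' or '.')."""
    form, rest = line.strip().split(",", 1)
    lemma, _, gram = rest.partition(".")
    gram_head, *inflections = gram.split(":")
    category, *features = gram_head.split("+")
    # An empty lemma field means the lemma is identical to the form.
    return DelafEntry(form, lemma or form, category, features, inflections)


def group_by_lemma(entries: List[DelafEntry]) -> Dict[Tuple[str, str], List[DelafEntry]]:
    """Collect all inflected forms that share a lemma and category."""
    groups: Dict[Tuple[str, str], List[DelafEntry]] = {}
    for entry in entries:
        groups.setdefault((entry.lemma, entry.category), []).append(entry)
    return groups


# Made-up sample entries, for illustration only.
sample = ["horses,horse.N+Anl:p", "horse,.N+Anl:s"]
entries = [parse_delaf_line(line) for line in sample]
paradigms = group_by_lemma(entries)
```

Grouping by lemma makes it possible to compare the forms of each paradigm and to assign lemmas with the same pattern of endings to the same inflectional description.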

Conversion of the INTEX graphs into the NooJ format:
- Preprocessing graphs:
  - Compound conjunction graphs.
  - Abbreviation and elision graphs (with possible treatment in a dictionary), etc.
- Recognition graphs developed for tasks involving the automatic treatment of syntactic phenomena.

Expanding the compound word dictionaries with new entries in a systematic way (covering large and diverse areas of the lexicon's inventory of compounds).
- Establishing the resources to be used:
  - The available specialised on-line dictionaries.
  - The lexical-semantic database: the Bulgarian WordNet.
- Developing automata for the inflection types in the established format.

Specifics:
- Restricted paradigms for certain types of compounds (especially domain-specific terms): pluralia tantum, singularia tantum, count forms, plural endings.
- Invariable forms, or forms that are not established in the Bulgarian language, especially ones introduced as transcriptions of mainly English terms (hedge, swap, bear market, bull market, etc.).

Compound extraction from the above-mentioned resources (enhanced in a complementary way):
- Extraction of thematic compound dictionaries of terms, named entities and other compound lexemes (using the semantic relations encoded in the database and employing inheritance for the task).
- Employing NooJ as the environment for compound extraction: processing the obtained material with the already designed dictionaries and encoding the appropriate candidates among the unrecognized tokens.
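The use of semantic relations and inheritance for building a thematic dictionary can be sketched as a traversal of the hyponymy relation: every synset below a chosen domain root is taken to inherit the root's domain, and its multiword literals are collected as compound candidates. This reading of "inheritance" is an assumption, and the synset identifiers and the toy data below are hypothetical; the sketch does not access the Bulgarian WordNet.

```python
from collections import deque
from typing import Dict, List


def collect_compounds(
    root: str,
    hyponyms: Dict[str, List[str]],
    literals: Dict[str, List[str]],
) -> List[str]:
    """Breadth-first walk from `root` down the hyponymy relation,
    returning every literal that consists of more than one word."""
    seen = {root}
    queue = deque([root])
    compounds: List[str] = []
    while queue:
        synset = queue.popleft()
        for literal in literals.get(synset, []):
            if " " in literal:  # multiword literal = compound candidate
                compounds.append(literal)
        for child in hyponyms.get(synset, []):
            if child not in seen:  # guard against revisiting shared hyponyms
                seen.add(child)
                queue.append(child)
    return compounds


# Hypothetical toy fragment, for illustration only.
toy_hyponyms = {"market": ["bear_market", "bull_market"]}
toy_literals = {
    "market": ["market"],
    "bear_market": ["bear market"],
    "bull_market": ["bull market"],
}
finance_terms = collect_compounds("market", toy_hyponyms, toy_literals)
```

The resulting list would then be checked against the existing dictionaries, so that only candidates not yet covered are encoded.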

Enhancing dictionary generation
- Exploring large databases and spotting the different inflection types of head words using the existing automata:
  - Using chiefly the Bulgarian WordNet, where the head words of compounds are marked unambiguously.
  - Using simple syntactic grammars (identifying NPs) to spot head words in the available domain-specific dictionaries of concepts and terms (which are more comprehensive with regard to the coverage of inflection types).
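A rough sketch of the head-word step: given compounds whose head position is already marked, look up each head in a dictionary of simple words and count which inflection classes occur, keeping aside the heads that are not covered yet. The class label `N_CLASS_A`, the data layout and the sample compounds are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A compound is a list of tokens plus the index of its head word.
Compound = Tuple[List[str], int]


def head_inflection_types(
    compounds: List[Compound],
    simple_dict: Dict[str, str],
) -> Tuple[Counter, List[str]]:
    """Count the inflection classes of compound heads and list
    the heads that have no entry in the simple-word dictionary."""
    counts: Counter = Counter()
    unknown_heads: List[str] = []
    for tokens, head_index in compounds:
        head = tokens[head_index]
        inflection_class = simple_dict.get(head)
        if inflection_class is None:
            unknown_heads.append(head)
        else:
            counts[inflection_class] += 1
    return counts, unknown_heads


# Hypothetical sample data, for illustration only.
sample_compounds = [
    (["bear", "market"], 1),
    (["bull", "market"], 1),
    (["interest", "rate", "swap"], 2),
]
sample_dict = {"market": "N_CLASS_A"}
class_counts, missing = head_inflection_types(sample_compounds, sample_dict)
```

The list of missing heads points to the simple words, and possibly the inflection types, that still have to be added before the corresponding compounds can be inflected.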

Enhancing recognition
- Development of morphological grammars covering certain classes of words that are not currently present in any dictionary, provided the source words are in the dictionary:
  - Feminine personal nouns: приятел (friend) → приятелка (female friend).
  - Diminutive nouns: детенце (a small child), кученце (a small dog), etc.
  - Verbal nouns, etc.
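The idea behind such morphological grammars can be approximated with suffix rules: an unknown word is accepted only if removing a derivational suffix yields a word that is already in the dictionary. The sketch below is a simplification in plain Python rather than a NooJ grammar; the two rules cover only the examples above, and the tag strings are hypothetical.

```python
from typing import List, Set, Tuple

# (suffix to strip, string to restore, tag for the derived word).
# Tags are hypothetical labels, not codes from the actual dictionaries.
RULES: List[Tuple[str, str, str]] = [
    ("ка", "", "N+feminine_personal"),  # приятел -> приятелка
    ("нце", "", "N+diminutive"),        # дете -> детенце, куче -> кученце
]


def analyze_unknown(word: str, lexicon: Set[str]) -> List[Tuple[str, str]]:
    """Return (source word, tag) pairs for every rule whose
    reconstructed source word is present in the lexicon."""
    analyses: List[Tuple[str, str]] = []
    for suffix, restore, tag in RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[: -len(suffix)] + restore
            if base in lexicon:  # accept only derivations of known words
                analyses.append((base, tag))
    return analyses


lexicon = {"приятел", "дете", "куче"}
feminine = analyze_unknown("приятелка", lexicon)
diminutive = analyze_unknown("кученце", lexicon)
```

Requiring the source word to be in the lexicon keeps the rules from accepting arbitrary strings that merely end in one of the suffixes.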

Present-day and future directions:
- Information retrieval, machine translation, etc.
- Facilitating linguistic tasks by supplying the prerequisites (large resources as input data) for the exploration of linguistic phenomena and the validation of linguistic hypotheses on language material.
- Education (facilitating the acquisition of knowledge and skills in NLP).