LACONEC: A Large-scale Multilingual Semantics-based Dictionary

LACONEC: A Large-scale Multilingual Semantics-based Dictionary, by Lê Khánh Hùng, Trần Cảnh, Võ Công Minh, Lê Hồng Minh

Contents: Purposes of LACONEC; General Design; Functions; Conclusion

Purposes: cross-lingual lookup and reference; preserving languages at risk; a multilingual knowledge base for NLP tasks.

Multilingual reference and MT: Google Translate & Dictionary, 91 languages; Babylon Dictionary, 75 languages; Yandex Translate, 63 languages; Microsoft Bing Translator, 51 languages. By comparison, Vietnam alone has more than 100 languages, and the world more than 7,100.

Many languages are dying
Distribution of world languages by number of first-language speakers (https://www.ethnologue.com/statistics/size):

Population range           Languages      %   Cum. %        Speakers          %      Cum. %
100,000,000 or more                8    0.1     0.1%   2,529,403,578   40.20547   40.20547%
10,000,000 to 99,999,999          82    1.2     1.3%   2,480,078,977   39.42144   79.62691%
1,000,000 to 9,999,999           304    4.3     5.5%     915,659,448   14.55462   94.18154%
100,000 to 999,999               943   13.3    18.8%     296,136,843    4.70717   98.88870%
10,000 to 99,999               1,822   25.7    44.5%      61,802,734    0.98237   99.87107%
1,000 to 9,999                 1,982   27.9    72.4%       7,633,408    0.12133   99.99241%
100 to 999                     1,065   15.0    87.4%         464,299    0.00738   99.99979%
10 to 99                         338    4.8    92.1%          12,777    0.00020   99.99999%
1 to 9                           140    2.0    94.1%             560    0.00001  100.00000%
0                                206    2.9    97.0%               0    0.00000  100.00000%
Unknown                          212    3.0   100.0%
Totals                         7,102  100.0            6,291,192,624  100.00000

Languages in Vietnam: 5 dying languages; 37 languages in trouble; 50 vigorous languages; 16 developing languages.

NLP applications: machine translation; proofing; multilingual full-text content search; text mining; question answering; OCR and ASR; information retrieval.

Limitations of current MT systems: a small number of languages; low quality; the absence of a reliable reference dictionary; no semantic processing; no language understanding.

The overall design of LACONEC: problems to be solved; multilingual solution; cloud computing.

Problems: a huge database; daily changes to the lexicon; quality control of the dictionary.

Bilingual approach (languages linked pairwise): Vietnamese, English, French, Japanese, Thai

Semantics-based solution: Vietnamese, Japanese, Thai, English, and French, each linked to a shared Meaning
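The bilingual and semantics-based slides contrast two designs. A rough way to see why the second scales better is to count links: pairwise bilingual dictionaries grow quadratically with the number of languages, while linking every language to one shared meaning layer grows linearly. The slides do not give these figures; the sketch below is plain combinatorics, counting undirected pairs only (if each direction needs its own dictionary, double the first number). The function names are illustrative, not part of LACONEC.

```python
def bilingual_dictionary_count(n_languages: int) -> int:
    """Undirected language pairs, i.e. separate bilingual dictionaries needed."""
    return n_languages * (n_languages - 1) // 2


def meaning_link_count(n_languages: int) -> int:
    """Language-to-meaning mappings needed when all languages share one meaning layer."""
    return n_languages


for n in (5, 100, 7100):
    print(f"{n} languages: {bilingual_dictionary_count(n)} bilingual dictionaries "
          f"vs {meaning_link_count(n)} language-to-meaning mappings")
```

For the five languages on the slides this gives 10 bilingual dictionaries against 5 mappings; at 100 languages it is 4,950 against 100.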

Example: words sharing the meaning "require as useful, just, or proper"
English: involve, postulate, require, demand, necessitate, take
French: exiger, nécessiter, demander, requérir
Vietnamese: quy định, đề nghị, yêu cầu, đòi hỏi
Japanese: 要求, 求める, 要する, 要求+する
Thai: ต้องการ, ปรารถนา, วางเป้าหมาย
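The example slide shows one meaning linked to words in five languages. A minimal sketch of that idea, assuming a simple in-memory model: the class `MeaningHub`, its methods, the language codes, and the meaning ID `M1` are hypothetical illustrations, not LACONEC's actual schema or API. Looking up a word finds the meanings it expresses and then reads off the words attached to those meanings in each target language, so no dedicated Vietnamese-French or Vietnamese-Japanese dictionary is needed.

```python
from __future__ import annotations

from collections import defaultdict


class MeaningHub:
    """Hypothetical model: meanings sit in the middle; words in each language attach to them."""

    def __init__(self) -> None:
        self.glosses: dict[str, str] = {}                                   # meaning ID -> gloss
        self.words: dict[str, dict[str, list[str]]] = defaultdict(dict)     # meaning ID -> lang -> words
        self.index: dict[tuple[str, str], set[str]] = defaultdict(set)      # (lang, word) -> meaning IDs

    def add_meaning(self, meaning_id: str, gloss: str) -> None:
        self.glosses[meaning_id] = gloss

    def add_word(self, meaning_id: str, lang: str, word: str) -> None:
        self.words[meaning_id].setdefault(lang, []).append(word)
        self.index[(lang, word)].add(meaning_id)

    def lookup(self, lang: str, word: str, targets: list[str]) -> dict[str, dict[str, list[str]]]:
        """For each meaning of `word`, return its words in the target languages."""
        result = {}
        for meaning_id in sorted(self.index.get((lang, word), set())):
            per_lang = self.words[meaning_id]
            result[meaning_id] = {t: per_lang.get(t, []) for t in targets}
        return result


# Usage with a subset of the words from the example slide.
hub = MeaningHub()
hub.add_meaning("M1", "require as useful, just, or proper")
for lang, words in {
    "en": ["require", "demand", "necessitate"],
    "fr": ["exiger", "nécessiter", "requérir"],
    "vi": ["yêu cầu", "đòi hỏi"],
    "ja": ["要求", "要する"],
    "th": ["ต้องการ"],
}.items():
    for w in words:
        hub.add_word("M1", lang, w)

print(hub.lookup("vi", "yêu cầu", ["fr", "ja"]))
# {'M1': {'fr': ['exiger', 'nécessiter', 'requérir'], 'ja': ['要求', '要する']}}
```

Adding a sixth language in this model means attaching its words to existing meanings; every other language becomes reachable from it at once.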

Functions of LACONEC: lookup; editing; knowledge and user rating; future developments.

Lookup: selection of languages; monolingual lookup; bilingual and multilingual lookup; lookup on the web; lookup in applications (Windows app, Android app, [iOS, Mac]).

Select Languages

Unilingual Lookup

Bilingual/Multilingual Lookup

Android App

Web version: http://laconec.com

Editing: editing directly in the lookup window; Knowledge Editor; creating relationships (lexical, semantic, and pragmatic relations); adding new languages.

Quick Editing

Knowledge Editor

Knowledge ranking: automatic data gathering (WordNets, open-source bilingual dictionaries, news, encyclopaedias, multilingual sites); manual updates from the community; automatic content ranking (remembering users' actions, user scoring).

User Scoring
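The slides name the inputs to content ranking (remembered user actions and user scores) but not how they are combined. One plausible scheme, purely an assumption for illustration: each remembered action endorses (+1) or rejects (-1) a candidate entry, the vote is weighted by the acting user's score, and candidates are sorted by total weight. `UserAction`, `rank_candidates`, the vote encoding, and the default score of 1.0 are all hypothetical, not LACONEC's documented behaviour.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAction:
    """One remembered action (hypothetical): a user endorsed or rejected a candidate entry."""
    user: str
    candidate: str
    vote: int  # +1 = endorse, -1 = reject (assumed encoding)


def rank_candidates(actions: list[UserAction], user_scores: dict[str, float],
                    default_score: float = 1.0) -> list[tuple[str, float]]:
    """Sort candidates by the sum of votes, each weighted by its user's score.

    Users without a score get `default_score`; ties are broken alphabetically
    so the order is deterministic.
    """
    totals: dict[str, float] = defaultdict(float)
    for a in actions:
        totals[a.candidate] += a.vote * user_scores.get(a.user, default_score)
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


actions = [
    UserAction("alice", "yêu cầu", +1),
    UserAction("bob", "yêu cầu", +1),
    UserAction("carol", "quy định", +1),
    UserAction("bob", "quy định", -1),
]
scores = {"alice": 3.0, "bob": 1.0, "carol": 2.0}
print(rank_candidates(actions, scores))
# [('yêu cầu', 4.0), ('quy định', 1.0)]
```

Weighting by user score lets a few trusted editors outweigh many low-reputation votes; a real system would also need to decide how those scores themselves are earned.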

Future developments: dictionary expansion (more languages, multimedia, external links); a multilingual corpus; interlingual machine translation and proofing; multilingual full-text content search.

Conclusion (http://laconec.com): LACONEC can be used as a multilingual general-purpose dictionary; a database for NLP tasks; a resource for preserving languages at risk around the world; and an environment for research in contrastive linguistics.

 Thank you!