

Asian WordNet: Development and Service in Collaborative Approach
Virach Sornlertlamvanich
Thai Computational Linguistics Laboratory (TCL), NICT, and National Electronics and Computer Technology Center (NECTEC), Thailand
The 5th International Conference of the Global WordNet Association (GWC-2010), Mumbai, India, Jan 31–Feb 4, 2010

Motivation
• Need for a computational ontology
• Quick-start approach
• Online collective environment
• Cross-language web service

Approaches
• Asian WordNet development
• Use of existing bilingual dictionaries
• Synset assignment
• KUI for collaborative editing
• WNMS (WordNet Management System)
• Distributed WordNet service
• Service for cross-language WordNet retrieval

Asian WordNet Development
[Figure: development overview. Labels: GWN, AWN; applications: Dictionary, Ontology, CL-Search, MT, Summarization, IE/IR, …; KUI functions: Lookup, Discussion, Addition, Correction, Voting, Translation; resources: WN, merged-WN, X-English, Thai-English, Indonesian-English]

Synset Assignment (CS=4)
Rule: accept the synset that includes more than one English equivalent, with a confidence score (CS) of 4.
Example (L0 is the target-language entry, E0 and E1 its English equivalents, S0 to S2 the candidate WordNet synsets):
  L0: เป้าหมาย ("aim, target")
  E0: aim
  E1: target
  S0: purpose, intent, intention, aim, design
  S1: aim, object, objective, target
  S2: aim
Only S1 contains both E0 (aim) and E1 (target), so S1 is assigned to L0 with CS=4.

Synset Assignment (CS=3)
Rule: accept the synset that includes more than one English equivalent when the equivalents are gathered from synonyms of the target-language word, with a confidence score of 3.
Example:
  L0: จ้อง ("stare")
  L1: เพ่งมอง ("gaze"), a Thai synonym of L0
  E0: stare
  E1: gaze
  S0: stare
  S1: gaze, stare
S1 contains both E0 (stare) and E1 (gaze), which come from L0 and its synonym L1, so S1 is assigned with CS=3.

Synset Assignment (CS=2)
Rule: accept the only synset that includes the English equivalent, with a confidence score of 2.
Example:
  L0: สูติแพทย์ ("obstetrician")
  E0: obstetrician
  S0: obstetrician, accoucheur
S0 is the only synset containing E0 (obstetrician), so it is assigned with CS=2.

Synset Assignment (CS=1)
Rule: when more than one synset includes one of the English equivalents, accept each of them with a confidence score of 1.
Example:
  L0: ช่อง ("hole, channel")
  E0: hole
  E1: canal
  S0: hole, hollow
  S1: hole, trap, cakehole, maw, yap, gap
  S2: canal, duct, epithelial duct, channel
No synset contains both E0 and E1; S0 and S1 contain E0 (hole) and S2 contains E1 (canal), so all three are accepted with CS=1.
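The four rules can be read as a cascade, tried from CS=4 down to CS=1. The slides do not say how the rules are combined, so the following Python sketch assumes that order. The `Synset` class, the helper names, the in-memory lemma index, and the reading of the CS=3 rule (pool the entry's equivalents with those of its target-language synonyms, then require more than one match) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    synset_id: str      # hypothetical label such as "S0"; a real system would use the WordNet offset
    lemmas: frozenset   # English lemmas in the synset


def build_index(synsets):
    """Map each English lemma to the list of synsets that contain it."""
    index = {}
    for synset in synsets:
        for lemma in synset.lemmas:
            index.setdefault(lemma, []).append(synset)
    return index


def candidates_for(words, index):
    """Every synset containing at least one of `words`, ordered by synset ID."""
    found = {s.synset_id: s for w in words for s in index.get(w, [])}
    return [found[key] for key in sorted(found)]


def assign_synsets(equivalents, synonym_equivalents, index):
    """Return (synset_id, confidence score) pairs for one target-language entry.

    equivalents: English equivalents of the entry itself (E0, E1, ...).
    synonym_equivalents: English equivalents of the entry's target-language synonyms.
    """
    own = set(equivalents)
    candidates = candidates_for(own, index)

    # CS=4: a synset holding more than one of the entry's own English equivalents.
    cs4 = [s for s in candidates if len(s.lemmas & own) > 1]
    if cs4:
        return [(s.synset_id, 4) for s in cs4]

    # CS=3 (assumed reading): add the equivalents of target-language synonyms, same test.
    pooled = own | set(synonym_equivalents)
    cs3 = [s for s in candidates_for(pooled, index) if len(s.lemmas & pooled) > 1]
    if cs3:
        return [(s.synset_id, 3) for s in cs3]

    # CS=2: only one synset contains an English equivalent.
    if len(candidates) == 1:
        return [(candidates[0].synset_id, 2)]

    # CS=1: several synsets each contain one of the equivalents; accept them all.
    return [(s.synset_id, 1) for s in candidates]


# Usage with the CS=4 example (เป้าหมาย -> aim, target).
aim_index = build_index([
    Synset("S0", frozenset({"purpose", "intent", "intention", "aim", "design"})),
    Synset("S1", frozenset({"aim", "object", "objective", "target"})),
    Synset("S2", frozenset({"aim"})),
])
print(assign_synsets(["aim", "target"], [], aim_index))  # [('S1', 4)]
```

Under this reading, each of the four worked examples yields the score given on its slide.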

Participation (Translate)
• Enter a word to search for
• Enter a translated word and select a degree of confidence
• Enter a comment or memo, if any
• Delete

Participation (Vote)
• Read the comment or memo
• Vote up or vote down
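The slides do not say how up and down votes are combined. A minimal Python sketch, assuming each candidate translation keeps separate up and down counts and that candidates are ranked by net votes (up minus down), with ties broken by the contributor's stated degree of confidence. The `Translation` class, `rank`, and this ranking policy are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    word: str          # translated word entered by a contributor
    confidence: int    # contributor's self-assessed degree of confidence (scale assumed)
    up: int = 0        # number of "vote up" clicks
    down: int = 0      # number of "vote down" clicks

    def vote(self, up: bool) -> None:
        """Record one vote up (True) or one vote down (False)."""
        if up:
            self.up += 1
        else:
            self.down += 1

    @property
    def net(self) -> int:
        return self.up - self.down


def rank(candidates):
    """Order candidates by net votes, breaking ties by stated confidence (assumed policy)."""
    return sorted(candidates, key=lambda t: (t.net, t.confidence), reverse=True)


# Usage: two hypothetical candidate translations for the same synset.
a = Translation("word_a", confidence=1)
b = Translation("word_b", confidence=3)
a.vote(True)
a.vote(True)
b.vote(True)
b.vote(False)
print([t.word for t in rank([a, b])])  # ['word_a', 'word_b']
```

Keeping the raw up and down counts, rather than a single score, leaves room to switch to a different aggregation later.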

WNMS (WordNet Management System)

Distributed WordNet Service
• Distribute the WordNet service across nodes
• Each service node can be maintained locally
• The synset ID (synset offset) is the key that links nodes
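Because every language node uses the same synset key, node contents can be merged on that key alone. A minimal sketch, assuming each node exposes its data as an in-memory table keyed by (pos, synset_offset). The key includes the POS because Princeton WordNet offsets are byte positions within per-POS data files and are unique only together with the POS. The function name, data layout, and the placeholder offset are hypothetical.

```python
def merge_nodes(nodes):
    """Combine per-language WordNet nodes into one cross-language view.

    nodes: {language code: {(pos, synset_offset): [words]}}, one table per node.
    Returns {(pos, synset_offset): {language code: [words]}}.
    """
    merged = {}
    for language, table in nodes.items():
        for key, words in table.items():
            # The (pos, offset) pair is the shared key that links entries across nodes.
            merged.setdefault(key, {})[language] = list(words)
    return merged


# Hypothetical node contents; "01234567" is a placeholder offset, not a real one.
nodes = {
    "en": {("n", "01234567"): ["obstetrician", "accoucheur"]},
    "th": {("n", "01234567"): ["สูติแพทย์"]},
}
print(merge_nodes(nodes)[("n", "01234567")])
```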

Representation of Synset Translation

Types of Services: ‘sense’
• Thai Sense (get the word translation by POS and SYNSET_OFFSET)
  Service URI:
  Service Name: sense
  Parameters: pos = part of speech {n, v, r, s}; synset_offset = an English Princeton WordNet 3.0 offset, represented in 8 digits
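The ‘sense’ parameters imply two input checks: pos must be one of the listed values, and the offset must be written as 8 digits, so a shorter offset needs zero-padding. A sketch in Python, assuming the parameters are sent as a URL query string. The service URI is not given in the slides, so `https://example.org/sense` is a placeholder, and `sense_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

POS_VALUES = {"n", "v", "r", "s"}  # the values listed on the slide


def sense_query(service_uri, pos, synset_offset):
    """Build a hypothetical GET URL for the 'sense' service.

    service_uri: the service URI (not given in the slides; supply your own).
    synset_offset: Princeton WordNet 3.0 offset as an int or a digit string.
    """
    if pos not in POS_VALUES:
        raise ValueError(f"unsupported pos: {pos!r}")
    offset = str(synset_offset)
    if not offset.isdigit() or len(offset) > 8:
        raise ValueError(f"offset must be at most 8 digits: {offset!r}")
    # Left-pad with zeros so the offset is always represented in 8 digits.
    params = {"pos": pos, "synset_offset": offset.zfill(8)}
    return f"{service_uri}?{urlencode(params)}"


print(sense_query("https://example.org/sense", "n", 1740))
# https://example.org/sense?pos=n&synset_offset=00001740
```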

Types of Services: ‘dictionary’
• E-Dictionary (get the word translation by word entry)
  Service URI: d
  Service Name: dictionary
  Parameters: type_of_dict = {en2th, th2en}; search_word = the word to search for
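A client of the ‘dictionary’ service has to choose type_of_dict. One option, sketched here under the assumption that parameters travel in a URL query string, is to infer the direction from the script of the search word: Thai text falls in the Unicode block U+0E00 to U+0E7F. The placeholder URI `https://example.org/dictionary` and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def is_thai(text):
    """True if any character lies in the Thai Unicode block (U+0E00 to U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


def dictionary_query(service_uri, search_word):
    """Build a hypothetical GET URL for the 'dictionary' service.

    The lookup direction is inferred from the script of the search word:
    Thai input uses th2en, anything else uses en2th.
    """
    type_of_dict = "th2en" if is_thai(search_word) else "en2th"
    params = {"type_of_dict": type_of_dict, "search_word": search_word}
    return f"{service_uri}?{urlencode(params)}"


print(dictionary_query("https://example.org/dictionary", "target"))
# https://example.org/dictionary?type_of_dict=en2th&search_word=target
```

`urlencode` percent-encodes Thai input as UTF-8, so Thai search words can be passed unchanged.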

Types of Services
• Autocomplete (get a list of words in WordNet that complete a given prefix)
  Service URI:
  Service Name: autocomplete
  Parameters: language = {en, th}; search_word = the prefix to autocomplete (results are limited to 50 records)
• WN-Browser (browse WordNet and its semantic relations)
  Service URI:
  Service Name: browse
  Parameters: language = {en, th}; search_word = the word whose semantic relations you want to retrieve
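The autocomplete service returns at most 50 matches for a prefix. One common way to answer such queries, shown here as an assumption rather than the service's actual implementation, is a binary search over a sorted word list: all words sharing a prefix are contiguous and start where the prefix itself would be inserted. The function name and sample words are illustrative.

```python
from bisect import bisect_left


def autocomplete(sorted_words, prefix, limit=50):
    """Return up to `limit` entries of `sorted_words` that start with `prefix`.

    In a sorted list, all words sharing a prefix are contiguous and begin at
    the position where the prefix itself would be inserted.
    """
    results = []
    i = bisect_left(sorted_words, prefix)
    while (i < len(sorted_words) and len(results) < limit
           and sorted_words[i].startswith(prefix)):
        results.append(sorted_words[i])
        i += 1
    return results


words = sorted(["aim", "aimless", "air", "canal", "gaze", "target"])
print(autocomplete(words, "ai"))           # ['aim', 'aimless', 'air']
print(autocomplete(words, "ai", limit=2))  # ['aim', 'aimless']
```

The lookup costs one binary search plus at most `limit` steps, independent of how many words follow the matching range.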

Asian WordNet (
• Asian WordNet
• Visualization of Asian WordNet
• Function
  - Cross-language visualization
  - 3 modes of visualization
• Progress
  - Thai, Lao, Japanese, Korean, Myanmar, Indonesian, Vietnamese
  - Mongolian 2283, Bengali 1775, Sinhala 117
• Collaboration
  - TCL
  - ADD members
English->Japanese, Thai->English, Thai->Indonesian

Guideline in WordNet Translation
• Each word entry must be translated into the appropriate WORD(s), avoiding phrases and explanations of meaning.
• Words in a synset must be interchangeable.

Translational Issues
• In many cases a gloss has to be expressed as a phrase or an explanation, especially for technical terms and scientific vocabulary.
  Example: chaperon
  POS: Noun
  Synset: chaperon, chaperone
  Gloss: one who accompanies and supervises a young woman or gatherings of young people
  Thai: ผู้ตามควบคุมหญิงสาว (literally "one who follows and supervises a young woman", a phrase rather than a single word)
• Such concepts are not common in Thai.

Translational Issues (cont.)
• A gloss can be expressed by two or more Thai words that share the core meaning but occur in different contexts. Should the concept be divided into more specific concepts?
  Example: appear
  POS: Verb
  Synset: appear, come out
  Gloss: be issued or published; "Did your latest book appear yet?"; "The new Woody Allen film hasn't come out yet"
  Thai: T1 = ตีพิมพ์ ("to be printed, published"); T2 = ออกฉาย ("to be released, screened")
• T1 occurs in the context of printed matter.
• T2 occurs in the context of films or movies.

Conclusion and Future Work
• Asian WordNet community
• Language resource conversion and alignment
• Language technology sharing
• Collaborative development platform
• AWN and language technology web service
• Applications, e.g., digital heritage understanding
Asian WordNet: Join us!