AFNLP 2008 Meeting: Indonesia Country Report

Presentation transcript:

AFNLP 2008 Meeting
Indonesia Country Report
Hammam Riza
Agency for the Assessment and Application of Technology (BPPT)
Ministry of Research and Technology, Republic of Indonesia

TOC
Past Activities
Activities in 2007
Activities Plan 2008, 2009
National Language Year 2008

Past NLP Research Projects in Indonesia
Indonesian Text-To-Speech (BPPT, ITB, UI)
GDA/MMA/Linguistic-DS MPEG-7 (Multimedia Annotation)
Cross-Linguistic Portal (dictionaries, corpus, tools)
Web translator (WebTRans)
Standard Indonesian Language Corpus (SILC)
Indonesian Language Dictionaries Project (KBBI)
English-Indonesia Parallel Corpus (INCI)
Speech recognition/synthesis system (Bandung Institute of Technology/Telkom RDC/University of Indonesia)
Information retrieval (ITB and University of Indonesia)
Text/Image processing tools (Gadjah Mada University)
Computational lexicon (National Language Center)
Computational morphology (Atmajaya University)

Promotion of Language Technologies (2007)
National Language Congress XII in Solo, introducing a toolkit for building speech databases for endangered languages, and the Atmajaya Language Workshop (June 2007) in Jakarta, on promoting local computing policy and speech technologies (both keynote speeches by Dr. Hammam Riza)
Promotion of the Context Sensitive Dictionary Project for Speech Translation Corpus for the Aceh tsunami region (Indonesian-Acehnese, bidirectional)

Activities in Machine Translation
A rule-based Indonesian-English translator (started in 2006) was launched to the market in June 2007 by ITB
This translator is combined with English TTS (Windows) and Indonesian TTS (proprietary)
Experiment in statistical MT using the Pharaoh decoder (Eng-Indo parallel corpus) by

Current Activities in Speech Tech
Telkom RDC & BPPT collaboration on Speech Recognition and Summarization
Indonesia Goes Open Source (IGOS) speech recognition system (funded by Ministry of Research and Technology)
Speech recognition system for Bahasa Indonesia (University of Indonesia)
  – Transcribing speech data that contains broadcast TV and radio news
  – Applications: sending short message service (SMS), IVR (health and tourism services)
Research on "intonation by example" and "automatic prosody pattern extractor" using Artificial Neural Network (ANN)
Text-to-speech system for local languages (ITB/UI)

100th Year of Bahasa Indonesia – National Language Year 2008
Series of events culminating in the International Conference on Bahasa Indonesia (Oct 2008)
  – Importance of Indonesian: its roles and functions in national life and development (policy making, business, media, education)
  – Language planning (shaping change)
Six keynote speakers from AFNLP will be invited by the Indonesian government throughout the year

Major Activities for 2008
Local Language Resource Projects (Language Center)
Indonesian and Local Languages - Wordnet
MALINDO (Malaysia-Indonesia) joint projects
Speech to speech translation for Asian languages (A-STAR)
Speech database Telkom RDC/BPPT (APT support)
Language Resources and Translation English - Indonesia (collaboration with PAN Localization)
Speech Corpus for Local Languages (Endangered Languages) – using BLARK (ELDA)

Activities Plan
Speech Recognition and Phrase-based Statistical Machine Translation (SMT) system for bidirectional Indonesian-English and Indonesian-Japanese
Mapping and SMT for Indonesian-Regional Languages (Bahasa Nusantara) and for German, French, Chinese and Arabic (cross-border languages)
Information Retrieval (cross-language speech retrieval)
  – Searching and retrieving Indonesian speech data
Topic Detection and Tracking (TDT)
  – Identifying topics in the speech data collection
  – Classifying new data into the existing topics in the collection
Speech Synthesis
Speech Summarization
  – Summarizing Indonesian speech documents

E-dictionary project (National Language Center)
Size & Comprehensiveness:
  – 200,000 entries
  – many subject areas are covered
Method:
  – corpus-based
  – primary data for largest print dictionary
Usefulness:
  – find the words you need
  – definitions and examples are helpful
Users:
  – writers, journalists, editors, scientists, academics, teachers, students, business people, lawyers, etc.
Kamus Besar Bahasa Indonesia (KBBI), 3rd ed.
Echols & Shadily's Eng-Ind. dictionary

Indonesia has at least 13 local languages with one million or more speakers:
Javanese (75,200,000)
Sundanese (27,000,000)
Malay (20,000,000)
Madurese (13,694,000)
Minangkabau (6,500,000)
Batak (5,150,000)
Buginese (4,000,000)
Balinese (3,800,000)
Acehnese (3,000,000)
Sasak (2,100,000)
Makassarese (1,600,000)
Lampung (1,500,000)
Rejang (1,000,000)

ACEH – 32 local languages

EAST JAVA – 6 local languages

LOCAL & CROSS-BORDER LANGUAGES
Note: Cross-border languages in Indonesia: English, Arabic, Chinese, French, German, Dutch, Japanese, etc.

Language Digital Divide
Language Preservation
Survey of indigenous local languages
Local computing policy will be developed for major local languages
Endangered languages are identified and preserved by means of ICT
Language resources collection for official and major local languages

Thank You
Please mail any comments to