LREC 2006, May 24-26, Genoa, Italy

Oriental COCOSDA: Past, Present and Future
Shuichi ITAHASHI, National Institute of Informatics (NII), Tokyo, Japan, and AIST, Tsukuba, Japan
Chiu-yu TSENG, Academia Sinica, Taipei, Taiwan
Satoshi NAKAMURA, ATR Spoken Language Communication Research Laboratories, Kyoto, Japan

Contents
1. Necessity of Speech Corpora
2. Organizations for Speech Corpora
3. Asian Languages
4. Brief History
5. Goals & Strategies
6. Regional Activities
7. Conclusion

Necessity of Speech Corpora
A speech corpus combines speech data with related information. It serves:
1. Speech research: objectivity of research
2. Openness to the public
3. Preserving cultural legacy: preservation of spoken language data

Organizing the Creation and Utilization of Speech Corpora
Creating speech corpora is costly, and utilizing them requires a system for distributing them.
Organized activities started in the early 1990s: COCOSDA (1992), LDC in the U.S.A., and ELRA in Europe.

COCOSDA: International Coordinating Committee on Speech Databases and Speech I/O Systems Assessment
Workshops are held annually at Interspeech.
COCOSDA promotes the development of spoken language corpora for building and/or evaluating spoken language technology, and coordinates projects and research efforts to make them more efficient.

Features of Asian Languages
1. Many languages, belonging to different language families
2. A variety of writing systems, using various letters and characters
3. Some tonal languages
4. No spaces between words in some languages
5. Multiple, non-unique romanization systems

Language Families of Asian Languages (number of languages per family, from Ethnologue.com)
1. Austronesian (1268): Malay, Indonesian, etc.
2. Sino-Tibetan (403): Chinese, Tibetan, Burmese, etc.
3. Austro-Asiatic (169): Khmer, Vietnamese, etc.
4. Tai-Kadai (76): Thai, Lao, etc.
5. Dravidian (73): Tamil, Telugu, etc.
6. Altaic (66): Mongolian, Turkic, Korean, etc.
7. Japanese (12): Japanese, Ryukyuan, etc.
cf. Indo-European (449)

Letters, Tone and Word Order
1. Language-specific scripts: Burmese, Chinese, Japanese, Khmer, Korean, Thai, etc.
2. Latin letters: Indonesian, Malay, Vietnamese, etc.
3. Tonal languages: Burmese, Chinese, Lao, Thai, Vietnamese, etc.
4. Word orders: SOV, SVO, VSO, VOS

Word Boundaries in Text
1. No spaces between words: Burmese, Chinese, Japanese, Khmer, Lao, Thai, etc.
2. Spaces between words: Indonesian, Malay, Mongolian, Vietnamese, etc.

Asian Activities
1994, 1997: Oriental COCOSDA
1999: GSK (Language Resource Association) in Japan
2001: SITEC (Speech Information Technology & Industry Promotion Center) in Korea
2002: Chinese LDC; CCC (Chinese Corpus Consortium) in China
2006: NII-SRC (National Institute of Informatics, Speech Resources Consortium) in Japan

Oriental COCOSDA
Proposed in 1994 to exchange ideas, share information, and discuss regional issues in spoken language processing (SLP).
A preparatory meeting was held in Hong Kong.
Annual workshops have been held since 1998 in Japan, Taiwan, China, Korea, Thailand, Singapore, India, and Indonesia.

Necessity of Oriental COCOSDA
Asia is a multilingual region whose languages are more diverse than those of Europe.
Speech research was emerging, and speech corpora were required.
Cooperation among countries was necessary, and organizations for speech corpora were needed.

Oriental COCOSDA Mission
To exchange ideas, share information, and discuss regional matters concerning the creation, utilization, and dissemination of spoken language corpora of oriental languages and assessment methods for speech input/output systems; and to promote speech research on oriental languages.

Goals of Oriental COCOSDA
1. Initiating a speech resources consortium in each country
2. Establishing an Asian network among the consortia
3. Creating a multilingual corpus with semantically similar contents

Strategies of Oriental COCOSDA
1. Founding Oriental COCOSDA as a forum on speech corpora
2. Establishing regional consortia: GSK, SITEC, Chinese LDC, CCC, NII-SRC
3. Collaboration among the consortia

Oriental COCOSDA Organization
Convenor: Chiu-yu TSENG (2006-); S. ITAHASHI (…)
Advisory members: three, from China, Japan, and Korea
Committee members: 21 from 10 regions: China, Hong Kong, India, Indonesia, Japan, Korea, Mongolia, Singapore, Taiwan, and Thailand

International Workshop on East-Asian Language Resources and Evaluation (Oriental COCOSDA Workshop)
1st Meeting, Tsukuba, Japan (30 papers, 54 participants)
2nd Meeting, Taipei, Taiwan (44 papers, 120 participants)
3rd Meeting, Beijing, China (8 papers, 20 participants)
4th Meeting, Taejon, Korea (11 papers, 25 participants)
5th Meeting, Hua Hin, Thailand (24 papers, 96 participants), with SNLP
6th Meeting, Sentosa, Singapore (28 papers, 60 participants), with PACLIC
7th Meeting, Delhi, India (55 papers, 150 participants), with iSTEPS and iSTRANS
8th Meeting, Jakarta, Indonesia (24 papers, 65 participants)

Oriental COCOSDA Organizers
T. F. Zheng (China), S. S. Agrawal (India), Thanaruk T. (Thailand), K. T. Lua (Singapore), S. Itahashi (Japan), L. S. Lee (Taiwan), C. K. Chan (Hong Kong), H. Riza (Indonesia), Y.-J. Lee (Korea)

Participation by Meeting
0. China, Japan, Korea, Taiwan (CJKTw), Hong Kong (HK)
1. CJKTw
2. CJKTw, Thailand (Th), France (F), U.S.A.
3. CJKTw, Th, Mongolia (Mg)
4. CJKTw, Th, Australia (Au)
5. CJKTw, Th, India (Id), Indonesia (Is), Guam
6. CJKTw, Th, Id, Is, Singapore (S)
7. CJKTw, Id, Is, S, Au, F, U.S.A.
8. CJKTw, Th, Is, Malaysia, Mg, HK

Some Regional Activities: Japan, Korea, China, Hong Kong, Mongolia, Singapore, Taiwan, Thailand, India, Indonesia

Japanese Activities
GSK (Language Resource Association): launched in 1999 and reorganized as an NPO in 2003; a three-year project was accepted in 2005. GSK emphasizes written text corpora.
NII-SRC: launched in 2006 for speech corpora.

Standardization in Japan
1) Open software tools: Julius, Galatea, etc.
2) Standard on performance evaluation methods for speech synthesis systems, by JEITA (2003)
3) Standard symbols for Japanese text-to-speech synthesizers, by JEIDA (2000)
JEITA: Japan Electronics and Information Technology Industries Association
JEIDA: Japan Electronic Industry Development Association

Korea: SITEC (Speech Information Technology & Industry Promotion Center)
Founded in 2001 as a Korean counterpart of LDC/ELRA
Hosted by Wonkwang University (7 full-time staff)

Chinese LDC
Launched in 2002
Creation of linguistic corpora
Management and distribution of language resources
Promotion of sharing language resources
* Chinese Corpus Consortium (CCC)

Future Prospects: Global Speech Corpus
Items common to all languages: digits, digit strings, days of the week, months, time expressions, salutations, yes/no, well-known proper nouns (person names, cities, companies), well-known stories, phonetically balanced sentences, etc.

Utterance Content
Items widely understood throughout the world: 10 digits, 12 months of the year, 7 days of the week, 4 words on weather, 6 greeting phrases, 3 reply words, 4 words on time.
"The North Wind and the Sun" from Aesop's Fables
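The utterance list describes a fixed inventory of semantic items that every language records. One illustrative way to encode it is the Python sketch below. Only the per-category counts and the single "North Wind" passage come from the slide. The category names, the item-ID scheme (`digits_01`, ...), and the helpers `build_inventory` and `recording_plan` are hypothetical choices made for this sketch, not part of the proposal.

```python
# Hypothetical prompt inventory for a corpus with shared semantic content.
# Counts follow the "Utterance Content" slide; names and layout are assumptions.
from dataclasses import dataclass

CATEGORY_COUNTS = {
    "digits": 10,
    "months": 12,
    "days_of_week": 7,
    "weather_words": 4,
    "greetings": 6,
    "replies": 3,
    "time_words": 4,
}


@dataclass(frozen=True)
class PromptItem:
    item_id: str   # language-independent semantic ID, e.g. "months_03"
    category: str


def build_inventory(counts: dict[str, int]) -> list[PromptItem]:
    """Create one semantic item per slot; every language records the same IDs."""
    items = []
    for category, n in counts.items():
        for i in range(1, n + 1):
            items.append(PromptItem(f"{category}_{i:02d}", category))
    # The fable passage is treated here as one long read-speech item.
    items.append(PromptItem("north_wind_passage", "story"))
    return items


def recording_plan(languages: list[str], items: list[PromptItem]) -> list[tuple[str, str]]:
    """Pair every language with every semantic item, so recordings align by ID."""
    return [(lang, item.item_id) for lang in languages for item in items]


if __name__ == "__main__":
    inventory = build_inventory(CATEGORY_COUNTS)
    print(len(inventory))  # 46 short items + 1 passage = 47
    plan = recording_plan(["Japanese", "Thai", "Mongolian"], inventory)
    print(len(plan))  # 3 languages x 47 items = 141
```

Keying each recording by a language-independent item ID is one way to make the "same semantic content" property checkable. For any two languages, the recordings that share an ID should carry the same meaning.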

Features of the Proposed Corpus
Contains various Asian languages
Same semantic content across languages
Recorded in a sound-proof room

Future of Oriental COCOSDA
1. Collaboration among regional activities
2. Cooperative creation of speech corpora
3. Promotion of speech research in Asia
Future conference sites: Malaysia, Vietnam, Mongolia, and the Xinjiang Uyghur Autonomous Region of China

Conclusion
1. Importance of speech corpora for promoting speech research
2. Role of organizations in speech corpus creation and distribution
4. GSK, SRC (NII-SRC), SITEC, Chinese LDC, and CCC are expected to further speech corpus creation and distribution, together with Oriental COCOSDA, in East Asia.

Oriental COCOSDA, Dec., Universiti Sains Malaysia, Penang, Malaysia
Abstract submission: Aug. 5
Notification of acceptance: Aug. 26
Final manuscript: Sep. 30