Infrastructures in Taiwan and for the Chinese Languages
Chu-Ren Huang
Institute of Linguistics, Academia Sinica
ACL 2000 Workshop: Infrastructures for Global Collaboration
Saturday, October 7, Hong Kong
Types of Infrastructures
- Sharable resources (for Chinese computational linguistics)
- Mechanisms for international collaboration
- Mechanisms for scholarly exchange
Host Institutes
- The Association for Computational Linguistics and Chinese Language Processing (ACLCLP, a.k.a. ROCLING)
- Academia Sinica
- National Science Council (NSC)
Sharable Resources for Chinese Computational Linguistics
- Corpora
- Lexicons
- Procedures
Sharable Resources for Chinese Computational Linguistics: Corpora
- Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus)
- Sinica Treebank
- Standard Segmentation Corpus
- ROCLING Corpus
- Mandarin-Across-Taiwan (MAT) Speech Database
Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus)
- 5 million words, segmented and tagged
- Direct WWW Access: words/modern-words/index.html
- License Information
Sinica Treebank
- ,725 Trees
- 239,532 Words
- Direct WWW Access (1000 sample trees)
- License Information
Mandarin-Across-Taiwan (MAT) Speech Database
- Speech files are collected through telephone networks.
- The content includes spontaneous speech (short answering statements) and read speech (numbers, Mandarin syllables, words of 2 to 4 syllables, phonetically balanced sentences).
- MAT-160 (160 speakers)
- MAT
Sharable Resources for Chinese Computational Linguistics: Procedures
- Segmentation Standard for Chinese Language Processing
  - Segmentation Standard
  - Standard Segmentation Corpus (2 million words, segmented)
  - Standard Segmentation Lexicon (42,138 entries, with frequency)
  - Segmentation Program (free download)
Sharable Resources in Languages Other than Modern Mandarin
- Classical Chinese Corpora
- Corpus of Formosan Austronesian Languages
  - Under construction, part of the National Digital Archive Initiative
- Lexical Databases of other Sino-Tibetan and Tibeto-Burman Languages
Mechanisms for International Collaboration
Major Sponsors of International Collaboration Involving Taiwan
- The Chiang Ching-kuo Foundation for International Scholarly Exchange
- The National Science Council
- Academia Sinica
Synchronic and Diachronic Chinese Corpora
Three Projects Sponsored by the CCK Foundation ( )
- Chu-Ren Huang, Keh-jiann Chen and Pei-chuan Wei, Academia Sinica
- Paul Thompson, SOAS, University of London
- Chaofen Sun, Stanford University
Mechanisms for Scholarly Exchange and Collaboration
Department of International Programs, NSC
- Canada: NRC
- France: CNRS
- Japan: EAACST
- Germany: DFG, DAAD, DKFG
- Netherlands: NWO, IIAS
- USA: NSF, NIH
- UK: Royal Society of London, etc.
An NSF/NSC International Joint Project
- NSF: Asian Language Digital Library Project (Ching-Chih Chen, Simmons College)
- NSC International Digital Library Collaborative Projects
  - Lexicon-based Knowledge Linking: Approaches Towards a WordNet Infrastructure for Multilingual Digital Library (Chu-Ren Huang, Academia Sinica)
  - Linguistic Technology and Resources for English-Chinese Bilingual Information System (Hsin-Hsi Chen, National Taiwan University)
Mechanisms for International Collaboration: Bilateral Projects
- Case-by-case negotiation
- Academia Sinica with Hong Kong Chinese University, LDC, Stanford, UCSB, etc.
Mechanisms for Scholarly Exchange: Conferences
- ROCLING (annually since 1988)
- PACLIC [Pacific Asia Conference on Language, Information and Computation] (regional conference involving Hong Kong, Japan, Korea, Singapore, and Taiwan)
- COLING 2002
Mechanisms for Scholarly Exchange: Exchange Scholars
- Academia Sinica and EHESS: yearly exchange
- Academia Sinica and University of Pennsylvania (under negotiation)
- NSC and CNRS, NSC and NWO: Cognitive Science
Mechanisms for Scholarly Exchange: Post-doctoral Fellows
- Academia Sinica Post-doctoral Fellowships
  - Application through project PIs or directly by applicants
- NSC Post-doctoral Fellowships
Mechanisms for Scholarly Exchange: International Students
- Computational Linguistics and Chinese Language Processing: an international graduate (PhD) program (proposal under review)
- Visiting Students
- Internships