Published by Elena Palfreyman. Modified over 9 years ago.
1
Infrastructures in Taiwan and for the Chinese Languages
Chu-Ren Huang
Institute of Linguistics, Academia Sinica
churen@sinica.edu.tw
ACL 2000 WORKSHOP: Infrastructures for Global Collaboration
Saturday, October 7, Hong Kong
2
Types of Infrastructures
- Sharable resources (for Chinese computational linguistics)
- Mechanisms for international collaboration
- Mechanisms for scholarly exchange
3
Host Institutes
- The Association for Computational Linguistics and Chinese Language Processing (ACLCLP, a.k.a. ROCLING)
- Academia Sinica
- National Science Council (NSC)
4
Sharable Resources for Chinese Computational Linguistics
- Corpora
- Lexicons
- Procedures
http://rocling.iis.sinica.edu.tw/ROCLING/
5
Sharable Resources for Chinese Computational Linguistics: Corpora
- Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus)
- Sinica Treebank
- Standard Segmentation Corpus
- ROCLING Corpus
- Mandarin-Across-Taiwan (MAT) Speech Database
6
Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus)
- 5 million words, segmented and tagged
- Direct WWW Access
  - http://www.sinica.edu.tw/~tibe/2-words/modern-words/index.html OR
  - http://www.sinica.edu.tw/ftms-bin/kiwi.sh
- License Information
  - http://rocling.iis.sinica.edu.tw/ROCLING/corpus98/sinicor_E.htm
7
Sinica Treebank 1.0
- 38,725 trees
- 239,532 words
- Direct WWW Access (1000 sample trees)
  - http://godel.iis.sinica.edu.tw/CKIP/trees1000.htm
- License Information
  - http://rocling.iis.sinica.edu.tw/ROCLING/Treebank/Treebank-E.htm
8
Mandarin-Across-Taiwan (MAT) Speech Database
- Speech files are collected through telephone networks.
- The content includes spontaneous speech (short answering statements) and read speech (numbers, Mandarin syllables, words of 2 to 4 syllables, phonetically balanced sentences).
- MAT-160 (160 speakers)
- MAT-2000
http://rocling.iis.sinica.edu.tw/ROCLING/MAT/index_cf.htm
9
Sharable Resources for Chinese Computational Linguistics: Procedures
Segmentation Standard for Chinese Language Processing
- Segmentation Standard
  - http://godel.iis.sinica.edu.tw/ROCLING/juhuashu1.htm
- Standard Segmentation Corpus (2 million words, segmented)
  - http://godel.iis.sinica.edu.tw/ROCLING/corpus98/segcorp_E.htm
- Standard Segmentation Lexicon (42,138 entries, w/ frequency)
  - http://godel.iis.sinica.edu.tw/ROCLING/corpus98/segdic_E.htm
- Segmentation Program (free download)
  - http://godel.iis.sinica.edu.tw/CKIP/ws/
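To illustrate how a segmentation lexicon of this kind can drive word segmentation, here is a minimal sketch of greedy forward maximum matching. It is a textbook baseline chosen for illustration, not the algorithm of the CKIP segmentation program, which the slides do not describe. The function name `forward_max_match`, the `max_len` parameter, and the toy lexicon with its frequencies are all hypothetical; the frequencies are carried along only to mirror the "entries with frequency" format and are not used by this sketch.

```python
from typing import Dict, List


def forward_max_match(text: str, lexicon: Dict[str, int], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation.

    At each position, take the longest lexicon entry (up to max_len
    characters). A single character is always accepted as a fallback,
    so unknown characters become one-character words.
    """
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon; the frequency values are made up for illustration.
toy_lexicon = {
    "中央": 120,
    "研究": 300,
    "研究院": 80,
    "中央研究院": 40,
    "語言": 150,
    "語言學": 60,
    "研究所": 90,
}

segmented = forward_max_match("中央研究院語言學研究所", toy_lexicon, max_len=5)
print(" / ".join(segmented))
```

Greedy matching is simple but can pick the wrong boundary when entries overlap; a fuller approach would typically use the lexicon frequencies to rank competing segmentations.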
10
Sharable Resources in Languages Other than Modern Mandarin
- Classical Chinese Corpora
  - http://www.sinica.edu.tw/~tibe/2-words/old-words/index.html
- Corpus of Formosan Austronesian Languages
  - Under construction, part of the National Digital Archive Initiative
- Lexical Databases of other Sino-Tibetan and Tibeto-Burmese Languages
11
Mechanisms for International Collaboration
Major Sponsors of International Collaboration Involving Taiwan
- The Chiang Ching-kuo Foundation for International Scholarly Exchange
  - http://www.cckf.org
  - http://www.cckf.org.tw
- The National Science Council
- Academia Sinica
12
Synchronic and Diachronic Chinese Corpora
Three Projects Sponsored by the CCK Foundation (1990-1995)
- Chu-Ren Huang, Keh-jiann Chen and Pei-chuan Wei, Academia Sinica
- Paul Thompson, SOAS, University of London
- Chaofen Sun, Stanford University
13
Mechanisms for Scholarly Exchange and Collaboration
Department of International Programs, NSC
http://www.nsc.gov.tw/int/2_cooperation/index_02.html
- Canada: NRC
- France: CNRS
- Japan: EAACST
- Germany: DFG, DAAD, DKFG
- Netherlands: NWO, IIAS
- USA: NSF, NIH
- UK: Royal Society of London, etc.
14
An NSF/NSC International Joint Project
NSF: Asian Language Digital Library Project
- Ching-Chih Chen, Simmons College
NSC International Digital Library Collaborative Projects
- Lexicon-based Knowledge Linking: Approaches Towards a WordNet Infrastructure for Multilingual Digital Library
  - Chu-Ren Huang, Academia Sinica
- Linguistic Technology and Resources for English-Chinese Bilingual Information System
  - Hsin-Hsi Chen, National Taiwan University
15
Mechanisms for International Collaboration: Bilateral Projects
- Case-by-case negotiation
- Academia Sinica with Hong Kong Chinese University, LDC, Stanford, UCSB, etc.
16
Mechanisms for Scholarly Exchange: Conferences
- ROCLING (annually since 1988)
- PACLIC [Pacific Asia Conference on Language, Information and Computation] (regional conference involving Hong Kong, Japan, Korea, Singapore, and Taiwan)
  - http://www.rcl.cityu.edu.hk/paclic15
- COLING2002
  - http://www.COLING2002.sinica.edu.tw
17
Mechanisms for Scholarly Exchange: Exchange Scholars
- Academia Sinica and EHESS: Yearly exchange
- Academia Sinica and University of Pennsylvania (under negotiation)
- NSC and CNRS, NSC and NWO: Cognitive Science
18
Mechanisms for Scholarly Exchange: Post-doctoral Fellows
- Academia Sinica Post-doctoral Fellowships
  - Application through project PIs or directly by applicants
- NSC Post-doctoral Fellowships
19
Mechanisms for Scholarly Exchange: International Students
- Computational Linguistics and Chinese Language Processing
  - An international graduate (PhD) program (proposal under review)
- Visiting Students
- Internships