GSK: Development and Distribution of Resources
Hitoshi ISAHARA
GSK: Gengo Shigen Kyokai (Language Resource Association)
National Institute of Information and Communications Technology (NICT)
Licensing and Distribution of Resources and Applications

Regional Conference on Localized ICT Development and Dissemination across Asia, Jan. 15, Vientiane, Laos

Organizing the Creation and Utilization of Language Corpora
Creating language corpora is costly, and using them requires a system for distributing them. Some such activities started in the early 1990s, e.g., LDC in the U.S.A. and ELRA in Europe.

Japanese Activities
GSK: Gengo Shigen Kyokai (Language Resource Association) was launched in 1999 and reformed as an NPO in 2003; a project of GSK was accepted in 2005 for 3 years. Text corpora are its main concern at present. NII-SRC distributes speech corpora.

GSK and NII-SRC
Language Resource Association (GSK): a nonprofit organization that collects and distributes text and speech corpora.
NII Speech Resources Consortium (NII-SRC): collects and distributes most of the major speech corpora.
Together, these two organizations aim to play the central role in collecting and distributing speech and language corpora in Japan.

[Diagram of related organizations. Labels: JEITA (Japan Electronics and Information Technology Industries Association), Knowledge Information Processing Technologies Committee, Language Resource Sub-committee; Natural Language Processing Portal Site; SHACHI: Language Resource Metadata DB; NICT: National Institute of Information and Communications Technology; GSK; NII-SRC; TCL; NII: National Institute of Informatics]

Purpose of GSK
Collection, distribution, investigation, research, and standardization of the electronic data and software tools needed to promote science, technology, education, and industry related to natural language.

GSK Organization
A president, two vice presidents, 11 board members, and 25 steering committee members. All of them are volunteers.

No-fee Distribution
[Diagram: Provider, GSK, and User, linked by agreement, distribution permission, corpus, and payment.]
As a rule, the corpus itself is free of charge, but the user bears the cost of handling and shipping it.

Agency Commission
[Diagram: Provider, GSK, and User, linked by request form, agreement, and payment.]
Corpus providers entrust GSK with handling the requests they receive from users, and GSK mediates between users and providers.

Advertising
[Diagram: Provider, GSK, and User, linked by ad request, ad rate, payment, agreement, and publicity.]
Corpus providers entrust GSK with advertising useful information about their data or corpora.

Some Examples of GSK Corpora
JEITA Multimodal Corpus
Japanese Web N-gram Version 1
CICC Multilingual Dictionary
IPAL Lexicon of Basic Japanese

JEITA Multimodal Corpus
A corpus of person-to-person, task-oriented dialogues. It includes 80 minutes of video covering 9 conversations on the topics "faces" and "travel". The speech data is transcribed and annotated with morphemes, dialogue structure, and prosody. Contained on 1 DVD-R (800 MB).

Japanese Web N-gram Version 1
N-grams extracted from publicly available Japanese web pages crawled by Google. Pages that require special permission to browse, or that are marked noarchive/noindex, are not included. N-grams (n = 1 to 7) with a frequency greater than 20 were extracted from approximately 20 billion sentences. Contained on 6 DVD-Rs (26 GB after gzip compression).
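The extraction step described above (count every 1- to 7-gram, keep only those above a frequency cutoff) can be sketched as follows. This is a minimal single-machine illustration under stated assumptions, not the procedure actually used to build the corpus: it assumes each sentence has already been segmented into tokens (Japanese text has no spaces, so some tokenizer is assumed to run beforehand), and the function name `extract_ngrams` and its `threshold` parameter are hypothetical. At the scale of roughly 20 billion sentences, the counting would need a distributed or disk-backed job rather than one in-memory `Counter`.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def extract_ngrams(
    sentences: Iterable[List[str]],
    max_n: int = 7,
    threshold: int = 20,
) -> Dict[Tuple[str, ...], int]:
    """Hypothetical sketch: count all 1..max_n-grams over pre-tokenized
    sentences and keep those whose frequency is strictly greater than
    `threshold` (mirroring "frequency greater than 20")."""
    counts: Counter = Counter()
    for tokens in sentences:
        # Slide a window of each length n over the sentence's tokens.
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    # Apply the frequency cutoff only after all counts are complete.
    return {gram: c for gram, c in counts.items() if c > threshold}


# Usage with a toy, already-tokenized input and a low cutoff.
sentences = [["今日", "は", "晴れ"], ["今日", "は", "雨"]]
print(extract_ngrams(sentences, max_n=2, threshold=1))
# {('今日',): 2, ('は',): 2, ('今日', 'は'): 2}
```

Applying the cutoff after counting, rather than pruning during the scan, keeps the sketch correct: an n-gram that looks rare early in the data may still exceed the threshold later.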

CICC Multilingual Dictionary
A collection of Malay, Indonesian, Chinese, and Thai dictionaries containing 50,000 basic words with POS tags; some also contain English translations. A technical term dictionary is also available for each language. Contained on 1 CD-ROM per language.
CICC: Center of the International Cooperation for Computerization

IPAL Lexicon of Basic Japanese
Contains 861 verbs, 136 adjectives, and 1,081 nouns, plus a glossary. English translations are also provided for the nouns in the glossary. Contained on 1 CD-ROM.

Summary
1. There are several distributors of language resources in Japan.
2. GSK is the only language resource consortium in Japan qualified as an NPO.
3. GSK plans to collaborate with the Language Grid Project.