Korea Terminology Research Center for Language and Knowledge Engineering Infrastructures in Korea and for the Korean Language Key-Sun Choi.

Presentation transcript:

Academic Society
- SIG-Korean Language Computing under Korea Information Science Society
  - 300 members
- Korea Information Society
  - linguistics oriented

KIBS: Korea Information Base and Systems
- Purpose:
  - To improve Korean language processing technology
  - To promote the Korean software industry
- In the planning phase (1993), targeted at Hangul word processors, machine translation, and Korean linguistic research
- (Phase 1): “word”
  - Two-ministry joint project + industry: Ministry of Science & Technology, Ministry of Culture
- (Phase 2): “sentence”
  - Only by Ministry of Science & Technology + industry
  - Will be evaluated in October 2000
- (Phase 3): “discourse” - not decided

King Sejong Project
- Purpose
  - To promote Korean language research on the linguistics side
  - To prepare for language planning
    - for unification of South and North Korea
    - for international use of Korean
- Sponsor: Ministry of Culture
- Period: (10 years)
- Items
  - corpus, dictionary, internationalization, terminology, education, font, old Korean

KIBS: Architecture
[Figure: layered architecture diagram; labels listed by level]
- Application Level: Word processor, MT system, Information Retrieval System, Automatic Speech Translation
- Engine Level: MT engine, IR engine, Spell checker, Style checker, UI engine
- Engine Module Level: MA1, MA2, TA1, TA2, PA1, PA2, WSD1, WSD2, DA1, DA2, RM1, RM2
- Knowledge Level: Ontology, Common Knowledge, Domain Knowledge, Electronic Dictionary, Terminology DB
- Knowledge Source Level: Basic DB, corpus, MRD, Knowledge extractor
- Users: End User, User (Programmer), User (Lexicographer), User (Dictionary)
- Other labels: Quality Management System, System Terminology, Distributed Resource Management System, Master DB, Tagging Support Tool

KIBS: Introduction
- Title of Project
  - KIBS I: Integrated Korean Information Base
  - KIBS II: On Development of Deep-Level Processing and Quality Management Technology for Very Large Korean Information Base
- Outline
  - Term: ~ (10 years)
  - Sponsor: Ministry of Science and Technology
  - Staff: 50 persons/year

The Goal of First Step
- Goals
  - The Standardization & the Specification for Korean Information Base
  - The Development of an Integrated Environment and Support Management System
  - The Construction of Korean Information Base
- Items
  - Standard Module Interface
  - Corpus and Electronic Dictionary Development and Management System
  - Korean Part-of-Speech Tagging System
  - Korean Syntactic Tagging System
  - Korean/English Alignment System
- Items
  - Terminological Data Base Development and Management System
  - Standard Korean Input/Output Environment
  - Standardized Methodology for the Construction of a Balanced Corpus
  - Part-Of-Speech Transfer Dictionary Rules and an Example Package
- Items
  - Tree-Tagged Corpus
  - Word-Level Narrative Speech Data Base
  - Hand-written Hangul scripts of high frequency

The Goal of Second Step
- Goals
  - Quality Management System for Language Information Processing
  - Terminology Dictionary and Development/Management System
  - Development/Management System of Electronic Dictionary for Sentence Analysis/Generation (100,000 entries)
- Items
  - Terminology Entries
  - Domain-specific Corpus for Terminology Building
  - Sublanguage Analysis and Extraction of Terminology
- Items
  - Development/Management System for Information Base
  - Development of Integrated Management System for Distributed Resources
- Items
  - Syntactic Information Base for Syntactic Analysis/Generation
  - Semantic Information Base for Semantic Analysis/Generation
  - Additional Information on Language and GUI for Developing Applications

Development Tools
- Korean Concordance Program (KCP)
- Compound Noun Browser
- Corpus Browser
- Corpus Browser by Category
- Automatic English-to-Korean Transliteration System (TLEK)
- KAIST Ontology Browser
- Korean Morphological Analyser
- Korean Tagger
- Korean Syntactic Analyser
- Editing Support Tools for Electronic Dictionary

Results & Distribution
- Major Results
  - The first (KIBS I): ~ present (80 sites)
    - Text corpus: 10 million word phrases
    - POS-tagged corpus: 1 million word phrases
    - Syntactic-structure-tagged corpus: 10 thousand sentences
    - TDMS, speech DB samples, hand-written character DB samples
  - The second (KIBS II): ~ present (140 sites)
    - Raw corpus: 10 million word phrases
    - POS-tagged corpus: 200 thousand word phrases
  - The third (KIBS III): 2000 (pending)
    - Proper nouns: 10 thousand entries
    - Compound nouns: 20 thousand entries
    - Verb sentence pattern dictionary: 3 thousand entries, ...
- Plan to maintain and distribute ...

KORTERM Korea Terminology Center for Language and Knowledge Engineering

Goals of KORTERM
- Through worldwide terminology collection and its standardization and harmonization in local society, distribution, publication, and application in language and knowledge engineering are promoted.
- Through education and consultation on terminology R&D methodology for each subject field, high-quality, highly reliable terminology and its infrastructure and systems are achieved.
Center of Terminology and Knowledge Engineering

Phases and Subjects of KORTERM
- Phase 1 ( ): R&D Environment and Basic Data Collection
  - Integration of Working Terminology
  - Terminology Collection (Basic S&T, Industry Standard, Economics)
  - Electronic Terminology (Publication)
  - R&D Environment (System Standardization)
  - Terminology Theory and Education Infrastructure
- Phase 2 ( ): Value-Added Working System
  - Value-Added Terminology Integration
  - Terminology Collection (Extended S&T)
  - Extension & Maintenance (Industry Standards)
  - High-Quality Terminology
  - Application in Language Industry
  - Verification for High-Reliability and Distribution
- Phase 3 ( ): Operation
  - Multi-lingual Terminology Integration
  - Terminology Collection (Humanities and Social Science)
  - Maintenance and Extension
  - Large-Scale Knowledge Base for Terminology
  - Terminology Education Curriculum Development
  - Application Product Development
- Phase 4 ( ): Maintenance and Extension
  - Continuous Extension and Management
  - Terminology Study Promotion
  - Distribution of Terminology Information Base
  - Continuous Terminology Extension and Management

R & D (1)
- Basic Data (Corpus)
  - Corpus for each subject domain
- Electronic Dictionary for Basic Vocabulary
  - Everyday vocabulary consists of general vocabulary and everyday terminology
- Internationalization of Korean Language
  - South-North Korean terminology standardization, Korean language input methods
- Korean Language Engineering
  - Standardized term use for information retrieval, machine translation, and document classification

R & D (2)
- Language Engineering
  - Information Retrieval:
    - Effective Internet information creation and information/knowledge acquisition
    - Multilingualism
  - Machine Translation:
    - Efficient information generation through terminology and vocabulary collection and standardization
  - Word processor:
    - High productivity through spelling correction, summarization, and efficient use

R & D (3)
- Language, Information and Terminology
  - Language Education:
    - Technical thinking and technical communication
    - Terminology-based education
  - Language Study:
    - Domain-specific language study

Terminology Sponsors
- Support from government, organizations, and industry according to each specialty
  - Ministry of Culture and Tourism (KORTERM Center Operation)
  - Ministry of Science and Technology (R&D Fund)
  - Ministry of Information and Telecommunication (R&D Fund)
  - Ministry of Diplomacy and Trade
  - Ministry of Industry and Resource
  - Ministry of Education
  - Korea Science and Technology Foundation (Event Support)

Task Configuration
[Figure: task configuration diagram; labels as they appear on the slide]
- Terminology Base (Collection): Non-standards, International Term Standard, Terminology Standard
- Language & Knowledge Product
- Language Education Environment, Terminology Information Environment, R&D Environment
- Application Use
- Terminology Symbolization, Terminology Access Standard
- Channel Grid Size Controller
- Application-Specific Dictionary
- Language Education Adaptable to Student
- R&D, Industry, Living Communication
- Standardization & Harmonization
- Terminological Conceptual Space

Large-Scale Speech/Language/Image DB Construction and Evaluation
Supported by Ministry of Science and Technology
Two Year Project ( )

Goals
[Figure: goals diagram; labels grouped by stage and modality]
- Final Goal: Speech/Language/Image Evaluation Standardization
- Stages: Organization, Specification Standardization, Test Suite
- Organization: Working Group Organization, Survey and Planning
- Language
  - IR Test Suite and Evaluation Model Recommendation
  - MT Test Suite and Evaluation Model Recommendation
  - IR/QA: 90 queries / 200K documents; MT: 5,000 sentences
- Speech
  - Sentence-unit Speech DB
  - Prosody for Speech Synthesis
  - Word-unit telephone speech DB: 100 token * 500
- Image
  - Image Attribute Format
  - Color-Lexical Entry
  - MPEG7 Specification
  - 300 kinds - Meta Data

Question-Answering IR Test Suites
- Test suites for IR/QA
  - Documents
    - 207,067 records (370 MB)
    - Newspapers
  - Query generation
    - 90 queries (through analysis of 300 quiz queries)
    - Queries for WH-questions and other various types of answers
    - For NLP problem solving
  - Relevant document set that includes the answer
    - Built by using four kinds of commercial IR systems with 16 kinds of methods
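One plausible reading of the last item is pooling: the top results of several retrieval runs are merged, and only documents that contain the answer are kept as relevant. The slide does not say how the set was actually built, so the sketch below is an assumption, and the function names, run labels, pool depth, and toy documents are all hypothetical.

```python
from typing import Dict, List, Set


def pool_candidates(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Union of the top-`depth` document IDs from every run (hypothetical helper)."""
    pooled: Set[str] = set()
    for ranked_ids in runs.values():
        pooled.update(ranked_ids[:depth])
    return pooled


def answer_bearing(pooled: Set[str], docs: Dict[str, str], answer: str) -> List[str]:
    """Keep pooled documents whose text contains the answer string."""
    return sorted(doc_id for doc_id in pooled if answer in docs[doc_id])


# Toy collection and runs, invented for illustration only.
docs = {
    "d1": "King Sejong created Hangul in 1443.",
    "d2": "Hangul Day is celebrated in October.",
    "d3": "The alphabet was promulgated in 1446.",
    "d4": "King Sejong was born in 1397.",
}
runs = {
    "system_a/method_1": ["d1", "d2", "d3", "d4"],
    "system_b/method_1": ["d3", "d1", "d4", "d2"],
}

pooled = pool_candidates(runs, depth=2)
relevant = answer_bearing(pooled, docs, answer="1443")
print(sorted(pooled), relevant)
```

A simple substring match stands in for the human judgement a real test suite would need; it only illustrates the shape of the procedure.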

English-Korean MT Test Suites
- Type classification: about 300 kinds
- Test sentences and test queries: 5,000 records
  - Extracted from textbooks and grammar books ( )
  - Will be extracted from real usage such as the web and newspapers ( )
- Evaluation by yes/no question
- Tested on 4 commercial English-Korean MT systems
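Yes/no evaluation over typed test sentences lends itself to a per-type pass rate. The following is a minimal sketch under that assumption; the slide does not describe the scoring procedure, and the function name, type labels, and judgements are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rate_by_type(judgements: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of 'yes' judgements for each sentence type (hypothetical helper)."""
    totals: Dict[str, int] = defaultdict(int)
    passed: Dict[str, int] = defaultdict(int)
    for sentence_type, is_yes in judgements:
        totals[sentence_type] += 1
        if is_yes:
            passed[sentence_type] += 1
    return {t: passed[t] / totals[t] for t in totals}


# Invented judgements for one MT system: (sentence type, yes/no answer).
judgements = [
    ("relative clause", True),
    ("relative clause", False),
    ("passive", True),
    ("passive", True),
]

rates = pass_rate_by_type(judgements)
for sentence_type, rate in sorted(rates.items()):
    print(f"{sentence_type}: {rate:.2f}")
```

Running the same function once per system would give a table of types against systems, which makes weak constructions easy to spot.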

MT Evaluation Workbench

Image Meta Data Editor
- Metadata input workbench by XML

Image Retrieval by Meta data