Korea Terminology Research Center for Language and Knowledge Engineering
Infrastructures in Korea and for the Korean Language
Key-Sun Choi
Academic Society
  SIG-Korean Language Computing, under the Korea Information Science Society: 300 members
  Korea Information Society: linguistics oriented
KIBS: Korea Information Base and Systems
Purpose
  To improve Korean language processing technology
  To promote the Korean software industry
In the planning phase (1993), targeted at Hangul word processors, machine translation and Korean linguistic research
Phase 1: "word"
  Joint project of two ministries (Ministry of Science & Technology, Ministry of Culture) plus industry
Phase 2: "sentence"
  Ministry of Science & Technology only, plus industry
  To be evaluated in October 2000
Phase 3: "discourse"
  Not decided
King Sejong Project
Purpose
  To promote Korean language research on the linguistics side
  To prepare language planning for the unification of South and North Korea and for international use of Korean
Sponsor: Ministry of Culture
Period: (10 years)
Items: corpus, dictionary, internationalization, terminology, education, font, old Korean
KIBS: Architecture
[Figure: layered architecture diagram; labels recovered from the figure]
  Levels: Application Level, Engine Level, Engine Module Level, Knowledge Level, Knowledge Source Level
  Users: End User, User (Programmer), User (Lexicographer), User (Dictionary)
  Applications: Word processor, MT system, Information Retrieval System, Automatic Speech Translation
  Engines: MT engine, IR engine, Spell checker, Style checker, UI engine
  Engine modules: MA1, MA2, TA1, TA2, PA1, PA2, WSD1, WSD2, DA1, DA2, RM1, RM2
  Knowledge: Ontology, Common Knowledge, Domain Knowledge, Electronic Dictionary, Terminology DB
  Knowledge sources: Basic DB, corpus, MRD, Knowledge extractor
  Management and support: Quality Management System, System Terminology, Distributed Resource Management System, Master DB, Tagging Support Tool
KIBS: Introduction
Title of Project
  KIBS I: Integrated Korean Information Base
  KIBS II: On Development of Deep-Level Processing and Quality Management Technology for Very Large Korean Information Base
Outline
  Term: ~ (10 years)
  Sponsor: Ministry of Science and Technology
  Staff: 50 person/year
The Goal of First Step
Goals
  The Standardization and the Specification for Korean Information Base
  The Development of an Integrated Environment and Support Management System
  The Construction of Korean Information Base
Deliverables
  Standard Module Interface
  Corpus and Electronic Dictionary Development and Management System
  Korean Part-of-Speech Tagging System
  Korean Syntactic Tagging System
  Korean/English Alignment System
  Terminological Data Base Development and Management System
  Standard Korean Input/Output Environment
  Standardized Methodology for the Construction of a Balanced Corpus
  Part-of-Speech Transfer Dictionary Rules and an Example Package
  Tree-Tagged Corpus
  Word-Level Narrative Speech Data Base
  Hand-written Hangul scripts of high frequency
The Goal of Second Step
Goals
  Quality Management System for Language Information Processing
  Terminology Dictionary and Development/Management System
  Development/Management System for Information Base
  Development/Management System of Electronic Dictionary for Sentence Analysis/Generation (100,000 entries)
Deliverables
  Terminology Entries
  Domain-specific Corpus for Terminology Building
  Sublanguage Analysis and Extraction of Terminology
  Development of Integrated Management System for Distributed Resources
  Syntactic Information Base for Syntactic Analysis/Generation
  Semantic Information Base for Semantic Analysis/Generation
  Additional Information on Language and GUI for Developing Applications
Development Tools
  Korean Concordance Program (KCP)
  Compound Noun Browser
  Corpus Browser
  Corpus Browser by Category
  Automatic English-to-Korean Transliteration System (TLEK)
  KAIST Ontology Browser
  Korean Morphological Analyser
  Korean Tagger
  Korean Syntactic Analyser
  Editing Support Tools for Electronic Dictionary
Results & Distribution
Major Results
  First (KIBS I): ~ present (80 sites)
    Text corpus: 10 million word phrases
    POS-tagged corpus: 1 million word phrases
    Syntactic-structure-tagged corpus: 10 thousand sentences
    TDMS, speech DB samples, hand-written character DB samples
  Second (KIBS II): ~ present (140 sites)
    Raw corpus: 10 million word phrases
    POS-tagged corpus: 200 thousand word phrases
  Third (KIBS III): 2000 (pending)
    Proper nouns: 10 thousand entries
    Compound nouns: 20 thousand entries
    Verb sentence pattern dictionary: 3 thousand entries, ...
Plan to maintain and distribute...
KORTERM Korea Terminology Center for Language and Knowledge Engineering
Goals of KORTERM
  To promote distribution, publication and application in language and knowledge engineering, through world-wide terminology collection and its standardization and harmonization in local society.
  To achieve high-quality, highly reliable terminology, together with its infrastructure and systems, through education and consultation on terminology R&D methodology for each subject field.
  Center of Terminology and Knowledge Engineering
Phases and Subjects of KORTERM
Phase 1 ( ): R&D Environment and Basic Data Collection
  Integration of Working Terminology
  Terminology Collection (Basic S&T, Industry Standard, Economics)
  Electronic Terminology (Publication)
  R&D Environment (System Standardization)
  Terminology Theory and Education Infrastructure
Phase 2 ( ): Value-Added Working System
  Value-Added Terminology Integration
  Terminology Collection (Extended S&T)
  Extension & Maintenance (Industry Standards)
  High-Quality Terminology Application in Language Industry
  Verification for High-Reliability and Distribution
Phase 3 ( ): Operation
  Multi-lingual Terminology Integration
  Terminology Collection (Humanity and Social Science)
  Maintenance and Extension
  Large-Scale Knowledge Base for Terminology
  Terminology Education Curriculum Development
Phase 4 ( ): Maintenance and Extension
  Application Product Development
  Continuous Extension and Management
  Terminology Study Promotion
  Distribution of Terminology Information Base
  Continuous Terminology Extension and Management
R & D (1)
Basic Data (Corpus)
  Corpus for each subject domain
  Electronic dictionary for basic vocabulary
  Everyday vocabulary consists of general vocabulary and everyday terminology
Internationalization of Korean Language
  South-North Korean terminology standardization
  Korean language input methods
Korean Language Engineering
  Standardized term use for information retrieval, machine translation and document classification
R & D (2)
Language Engineering
  Information Retrieval: effective Internet information creation and information/knowledge acquisition
  Multi-lingualism
  Machine Translation: efficient information generation through terminology and vocabulary collection and standardization
  Word processor: high productivity through spelling correction, summarization and efficient use
R & D (3)
Language, Information and Terminology
  Language Education: technical thinking and technical communication; terminology-based education
  Language Study: domain-specific language study
Terminology Sponsors
Support from government, organizations and industry, according to each specialty
  Ministry of Culture and Tourism (KORTERM Center Operation)
  Ministry of Science and Technology (R&D Fund)
  Ministry of Information and Telecommunication (R&D Fund)
  Ministry of Diplomacy and Trade
  Ministry of Industry and Resource
  Ministry of Education
  Korea Science and Technology Foundation (Event Support)
Task Configuration
[Figure: task configuration diagram; labels recovered from the figure]
  Terminology Base (Collection): Non-standards, International Term Standard, Terminology Standard
  Standardization & Harmonization
  Terminological Conceptual Space
  Terminology Symbolization
  Terminology Access Standard
  Channel Grid Size Controller
  Environments: Language Education Environment, Terminology Information Environment, R&D Environment
  Language & Knowledge Product
  Application Use: Application-Specific Dictionary; Language Education Adaptable to Student
  Users: R&D, Industry, Living Communication
Large-Scale Speech/Language/Image DB Construction and Evaluation
Supported by Ministry of Science and Technology
Two Year Project ( )
Goals
Final Goal: Speech/Language/Image Evaluation Standardization
Organization
  Test Suite Working Group Organization
  Survey and Planning
Specification Standardization
  Language: IR Test Suite and Evaluation Model Recommendation; MT Test Suite and Evaluation Model Recommendation
  Image: Image Attribute Format; Color-Lexical Entry; MPEG7 Specification
  Speech: Sentence-unit Speech DB; Prosody for Speech Synthesis
Data
  Language: IR/QA 90 queries / 200K documents; MT 5,000 sentences
  Speech: word-unit telephone speech DB, 100 tokens * 500
  Image: 300 kinds, with meta data
Question-Answering IR Test Suites
Test Suites for IR/QA
  Documents: 207,067 records (370 MB), newspapers
  Query generation: 90 queries (through analysis of 300 quiz queries)
    Queries for WH-questions and various other types of answers, for NLP problem solving
  Relevant document set: built to include the answer, using four kinds of commercialized IR systems with 16 kinds of methods
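The relevant document set above is drawn from the output of several IR systems and methods. A minimal sketch of one common way to do this (pooling the top-ranked documents of every run for later judgment) is given here; the function name `pool_candidates`, the pool depth, and the run and document identifiers are all hypothetical and not taken from the project.

```python
from typing import Dict, List, Set


def pool_candidates(runs: Dict[str, List[str]], depth: int = 10) -> Set[str]:
    """Return the union of the top-`depth` document IDs from every run.

    `runs` maps a run label (for example one system/method pair) to its
    ranked list of document IDs, best first. The labels are illustrative.
    """
    pool: Set[str] = set()
    for ranked_ids in runs.values():
        pool.update(ranked_ids[:depth])
    return pool


# Hypothetical runs for a single query.
runs = {
    "systemA/method1": ["d3", "d1", "d7"],
    "systemB/method1": ["d1", "d9", "d2"],
}
candidates = pool_candidates(runs, depth=2)
```

With four systems and 16 methods, `runs` would simply hold one entry per system/method combination; the pooled candidates would then be checked for whether they contain the answer.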
English-Korean MT Test Suites
  Type classification: about 300 kinds
  Test sentences and test queries: 5,000 records
    Extracted from textbooks and grammar books ( )
    To be extracted from real usage such as the web and newspapers ( )
  Evaluation by yes/no questions
  Tested on 4 commercialized English-Korean MT systems
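Evaluation by yes/no questions lends itself to a simple per-type score. The sketch below assumes each judgment is a pair of a type label and a yes/no answer, and reports the share of "yes" answers per type; the function name `pass_rate_by_type` and the type labels are hypothetical, and the project's actual scoring scheme is not described in the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rate_by_type(judgments: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of 'yes' answers for each test-sentence type."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for type_id, answer_is_yes in judgments:
        totals[type_id] += 1
        if answer_is_yes:
            passes[type_id] += 1
    return {type_id: passes[type_id] / totals[type_id] for type_id in totals}


# Hypothetical judgments for one MT system.
judgments = [
    ("relative-clause", True),
    ("relative-clause", False),
    ("passive", True),
]
rates = pass_rate_by_type(judgments)
```

Running the same function over the judgments of each MT system would give comparable per-type figures across systems.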
MT Evaluation Workbench
Image Meta Data Editor
  Meta data input workbench using XML
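As a rough illustration of XML-based image meta data input, the sketch below builds one record with Python's standard `xml.etree.ElementTree`. The element and attribute names (`image`, `id`, `color`, `kind`) are hypothetical; they are not the project's format and do not follow the MPEG-7 schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_image_record(image_id: str, attributes: Dict[str, str]) -> str:
    """Serialize one image's meta data as a small XML fragment."""
    root = ET.Element("image", id=image_id)
    for name, value in attributes.items():
        # One child element per attribute; names are illustrative only.
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


record = build_image_record("img001", {"color": "red", "kind": "flower"})
```

A retrieval tool could parse such records back with `ET.fromstring` and filter on the child elements.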
Image Retrieval by Meta Data