UNL Document Summarization Virach Sornlertlamvanich, Tanapong Potipiti and Thatsanee Charoenporn Information Research and Development Division National.

Slides:



Advertisements
Similar presentations
From the UNL hypergraph to GETA's multilevel tree Etienne BLANC GETA, CLIPS-IMAG BP 53, F Grenoble cedex 09
Advertisements

Knowledge indicators: measuring information societies in Asia-Pacific International Telecommunication Society Asia-Australian Regional Conference Perth,
Coding Transit Itineraries Frank Milthorpe Transport and Population Data Centre.
Multi-Document Person Name Resolution Michael Ben Fleischman (MIT), Eduard Hovy (USC) From Proceedings of ACL-42 Reference Resolution workshop 2004.
Chapter 3: Modules, Hierarchy Charts, and Documentation
The Universal Networking Language UNL Foundation United Nations University Institute of Advanced Studies United Networking Language ® UNU/IAS.
Copyright ©2011 Pearson Education 1-1 Statistics for Managers using Microsoft Excel 6 th Global Edition Chapter 1 Introduction.
Programming Logic and Design Fourth Edition, Introductory
1.10 Report Findings to Communicate Research Information to Others
Your Interactive Guide to the Digital World Discovering Computers 2012.
Made with OpenOffice.org 1 Sentiment Classification using Word Sub-Sequences and Dependency Sub-Trees Pacific-Asia Knowledge Discovery and Data Mining.
Features and Uses of a Multilingual Full-Text Electronic Theses and Dissertations (ETDs) System Yin Zhang Kent State University Kyiho Lee, Bumjong You.
 Asian WordNet: Development and Service in Collaborative Approach Virach Sornlertlamvanich Thai Computational Linguistics Laboratory (TCL), NICT, and.
Video Summarization Using Mutual Reinforcement Principle and Shot Arrangement Patterns Lu Shi Oct. 4, 2004.
About the use of UNL in key words, key images and key concepts transcultural analysis.
Chapter 6 Methodology Conceptual Databases Design Transparencies © Pearson Education Limited 1995, 2005.
1 CS1001 Lecture Overview Java Programming Java Programming Midterm Review Midterm Review.
1 Towards a better transcultural understanding through distance education. Towards a better transcultural understanding through.
Living in a Digital World Discovering Computers 2010.
Designing clustering methods for ontology building: The Mo’K workbench Authors: Gilles Bisson, Claire Nédellec and Dolores Cañamero Presenter: Ovidiu Fortu.
Universal Networking Language
1 Distance education : What could technology offer ? Gérard CHOLLET ENST/CNRS-LTCI 46 rue Barrault PARIS cedex 13
CP Multimedia Internet Communication1 CP Multimedia Computer Communication Lecture 1 - Communication.
REFLECTIONS ON NOTECARDS: SEVEN ISSUES FOR THE NEXT GENERATION OF HYPERMEDIA FRANK G. HALASZ.
Version 4 for Windows NEX T. Welcome to SphinxSurvey Version 4,4, the integrated solution for all your survey needs... Question list Questionnaire Design.
Overview of Search Engines
ANSWERING CONTROLLED NATURAL LANGUAGE QUERIES USING ANSWER SET PROGRAMMING Syeed Ibn Faiz.
GL12 Conf. Dec. 6-7, 2010NTL, Prague, Czech Republic Extending the “Facets” concept by applying NLP tools to catalog records of scientific literature *E.
1 NLP in Thailand by Asanee Kawtrakul Kasetsart University.
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.
Universal Networking Language (UNL) by Pantha Kanti Nath (05IT6021) Under the Guidance of Prof. Debasis Samanta School of Information Technology Indian.
Artificial Intelligence for Universal Networking Language (UNL) (Perspective Bengali Language) By Deen Islam Muslim ID: Ariful Hoque Tuhin ID:
1 Chapter 15 Methodology Conceptual Databases Design Transparencies Last Updated: April 2011 By M. Arief
1 e-Learning Movement in Thailand and Proposal for Multi-lingual e-Learning Development Virach Sornlertlamvanich 1 Pornchai Tummarattananont 2 1 Thai Computational.
NLP Related Activities in Thailand Virach Sornlertlamvanich Information Research and Development Division National Electronics and Computer Technology.
Querying Structured Text in an XML Database By Xuemei Luo.
Summary Report Survey on Research and Development of Machine Translation in Asian Countries Virach Sornlertlamvanich Information Research and Development.
System Analysis of Virtual Team Collaboration Management System based on Cloud Technology Panita Wannapiroon, Ph.D. Assistant Professor Division of Information.
WordStat & Yoshikoder T.M. & M.S.. WordStat About WordStat Must be run as part of SimStat Designed to process text such as open ended responses, journal.
Towards an Intelligent Multilingual Keyboard System Tanapong Potipiti, Virach Sornlertlamvanich, Kanokwut Thanadkran Information Research and Development.
Internationalized Domain Names Dr. Cary Karp MUSENIC Project Manager Second MUSENIC Project Workshop Stockholm, March 2004 MUSENIC – The Museum Network.
Methodology - Conceptual Database Design. 2 Design Methodology u Structured approach that uses procedures, techniques, tools, and documentation aids to.
Presented by Madhuriya Kumar Dutta Trade and Investment Facilitation Department Mekong Institute, Thailand 16 May 2012.
Dimitrios Skoutas Alkis Simitsis
Intelligent Database Systems Lab N.Y.U.S.T. I. M. A Web 2.0-based collaborative annotation system for enhancing knowledge sharing in collaborative learning.
Virach Sornlertlamvanich Information R&D Division (iTech) National Electronics and Computer Technology Center (NECTEC) THAILAND 19 January 2001 Symposium.
CALL for University Lecturers KMUTT January 2004.
A roadmap for MT : four « keys » to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand) International.
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
© Ch. Boitet & Wang-Ju Tsai (GETA, CLIPS) ICUKL-2002, Goa, 25-29/11/02 1 Proposals for solving some problems in UNL encoding International Conference on.
1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.
CS : NLP, Speech and Web-Topics-in-AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 35: Semantic Relations; UNL; Towards Dependency Parsing.
ICUKL-2002, Nov, Goa, India Universal Word and UNL Knowledge Base Meiying Zhu Hiroshi Uchida UNL Center UNDL Foundation.
Summarizing Encyclopedic Term Descriptions on the Web from Coling 2004 Atsushi Fujii and Tetsuya Ishikawa Graduate School of Library, Information and Media.
Intelligent Key Prediction by N-grams and Error-correction Rules Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti Information Research.
The study on the impact of the promulgation of English language as Thai’s second language Virach Sornlertlamvanich Director Information Research and Development.
11/23/00UNU/IAS/UNL Centre1 The Universal Networking Language United Nations University Institute of Advanced Studies United Networking Language ® UNU/IAS.
Preparing for the 2008 Beijing Olympics : The LingTour and KNOWLISTICS projects. MAO Yuhang, DING Xiao-Qing, NI Yang, LIN Shiuan-Sung, Laurence LIKFORMAN,
An evolutionary approach for improving the quality of automatic summaries Constantin Orasan Research Group in Computational Linguistics School of Humanities,
MMM2005The Chinese University of Hong Kong MMM2005 The Chinese University of Hong Kong 1 Video Summarization Using Mutual Reinforcement Principle and Shot.
Linear Analysis and Optimization of Stream Programs Masterworks Presentation Andrew A. Lamb 4/30/2003 Professor Saman Amarasinghe MIT Laboratory for Computer.
Building Resources for Teaching Computer Architecture Through Peer Review Edward F. Gehringer Dept. of Electrical & Computer Engineering Dept. of Computer.
Information Sharing on the Social Semantic Web Aman Shakya* and Hideaki Takeda National Institute of Informatics, Tokyo, Japan The Second NEA-JC Workshop.
LingWear Language Technology for the Information Warrior Alex Waibel, Lori Levin Alon Lavie, Robert Frederking Carnegie Mellon University.
The UNL Program A program created by the United Nations University / Institute of Advanced Studies Now carried out by the UNDL Foundation
Standardization of Lexicon
Tomás Murillo-Morales and Klaus Miesenberger
How to Summarize.
Presented by Nick Janus
Presentation transcript:

UNL Document Summarization Virach Sornlertlamvanich, Tanapong Potipiti and Thatsanee Charoenporn Information Research and Development Division National Electronics and Computer Technology Center (NECTEC), THAILAND The First International Workshop on MultiMedia Annotation January 2001, Tokyo, Japan

Overview UNL project UNL specification UNL document summarization - Sentence score - N-best sentences - Removal of redundant words - Merging sentences Conclusion

UNL project Initiated by the United Nations University in 1996 Collaboration of research institutions from 16 countries International semantic annotation standard for multilingual communication Interlingua-based data archiving

UNL and existing MT Existing interlingual MT UNL Source language Interlingua Target language analysis generation Errors in analysis are propagated into the generation process. No errors in analysis is propagated into the generation process. User UNL document preparing generation Target language

UNL specification Interlingual representation -Nodes : UWs (interlingual acceptations) -Links : UNL semantic relations such as agt, obj, pur... obj pur qua book room(icl>space) 2 The UNL graph representing ‘The bachelor books a room for 2 persons.’ Headword Restriction Attribute Relation agt UW

UW specification UW format: ( ) e.g. book(icl>do, obj>room) Headword: an English word roughly describes the UW sense. Restrictions: - Inclusion (icl), field(fld) e.g. car(icl>movable thing) - Relations UW Class Hierarchy

Multilinguality UNL is an interlingua for multilingual application. Unambiguity UNL is designed to contain no semantic ambiguity. Semantic information Employing UNL, semantic information are employed for high quality summarization. UNL document summarization

Four steps in UNL summarization 1: Calculating sentence scores 2: Selecting n-best sentences 3: Removing redundant words 4: Merging sentences Summary UNL document

1: Calculating sentence scores A sentence score is calculated as follows: where Ssentence scoring function sconsidered sentence W weighting function uw i universal word Tfterm frequency Idfinverted document frequency

2: Selecting n-best sentences Five sentences with the highest scores are selected from the original 100-sentences (2,000 words) text. UNL represents the means to facilitate multilingual communication on the information network. The language exists only on the information network. UNL is a global-scale common language, being transparent to all languages. Information encoded in UNL is converted to an equivalent counterpart written in the target language, through a language generator "deconvertor" prepared for each language. Complying with the same technical standard, these computer networks comprise the Internet.

3: Removing redundant words-1 Insignificant modifier words are removed. Modifier relations are man, mod, ben and such. Where,Con contribution function l()considered UNL relation W uw weighting function uw 1 head uw uw 2 dependent uw The links are removed if the contribution score is less than a threshold (1.5 in the experiment).

3: Removing redundant words-2 Threshold of the contribution score is 1.5 “UNL represents the means to facilitate multilingual communication on the information network.” = 4.27 Con(mod(communication, = 1.81 Con(mod(communication, = 1.78 information)) = 0.47 removed

3: Removing redundant words-3 Removed words information only, information common, all Through a language generator deconvertor" prepared for each language same, computer Sentences UNL represents the means to facilitate multilingual communication on the information network. The language exists only on the information network. UNL is a global-scale common language, being transparent to all languages. Information encoded in UNL is converted to an equivalent counterpart written in the target language, through a language generator "deconvertor" prepared for each language. Complying to the same technical standard, these computer networks comprise the Internet.

4: Merging sentences-1 The UNL sentences sharing the same UW are possibly merged to produce a more complex sentence

4: Merging sentences-2 The first sentence generated in English The language exists on the network. The second sentence generated in English The UNL language is a global-scale language. The merged sentence generated in English The UNL language is a global-scale language existing on the network.

Text vs. UNL summarization Plain text summarization UNL represents the means to facilitate multilingual communication on the information network. The language exists only on the information network. UNL is a global-scale common language, being transparent to all languages. Information encoded in UNL is converted to an equivalent counterpart written in the target language, through a language generator "deconvertor" prepared for each language. Complying with the same technical standards, these computer networks comprise the Internet. 5 sentences, 67 words. UNL document summarization UNL represents the means to facilitate multilingual communication on the network. UNL is a global- scale language, being transparent to languages, existing on the network. Information encoded in UNL is converted to counterpart written in the target language. These networks comprise the Internet, complying with the technical standard. 4 sentences, 47 words.

Conclusion The process of summarization by UNL has been presented UNL provides many advantages in summarization Our experiment shows that UNL can improve the quality of summarization Applicable to any semantic representation Further research Considering UW class hierarchy as well as the attributes and relations