Published by Alannah Walton. Modified over 9 years ago.
Intelligent Database Systems Lab N.Y.U.S.T. I. M.
Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary
Presenter: Chun-Ping Wu
Authors: Yeohoon Yoon, Choong-Nyoung Seon, Songwook Lee, Jungyun Seo
IPM 2007
國立雲林科技大學 National Yunlin University of Science and Technology
Outline
- Motivation
- Objective
- Methodology
- Experiments
- Conclusion
- Comments
Motivation
- Word sense disambiguation (WSD) is a common problem in natural language processing.
- Traditional approaches consider only co-occurrence probability.
Sample: I deposit some money in the bank.
Options:
- bank = financial institution (銀行)?
- bank = embankment; shore (堤; 岸)?
- bank = a row; a set [(一)排; (一)組]?
Objective
- To construct a WSD system that is easy to implement because it learns all polysemous words at once, and that covers every polysemous word listed in the machine-readable dictionary (MRD).
- To consider the relation between each sense of the context words and the sense of the target word.
Sample: I deposit some money in the bank.
Answer: bank = financial institution (銀行)
Methodology
- Learning step
  - Similarity matrix
  - Word vector
  - Vector representations of sense definitions in the MRD
- Disambiguation step
  - The definition of the acyclic weighted digraph
  - Selecting context words
  - Constructing the acyclic weighted digraph
  - Searching for the optimal path on the acyclic weighted digraph
Methodology: Learning step
- Similarity matrix
- Word vector
- Vector representations of sense definitions in the MRD
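The slides name the three learning artifacts but do not give their formulas. The sketch below is one possible reading, not the authors' method. It assumes that a word vector is a row of a sentence-level co-occurrence matrix, that a sense definition vector is the sum of the vectors of the words in the definition, and that similarity is cosine similarity. The function names, the toy corpus, and the toy definitions are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def cooccurrence_vectors(sentences: Iterable[List[str]]) -> Dict[str, Counter]:
    """Map each word to a sparse row of sentence-level co-occurrence counts.

    Assumption: this word-by-word table stands in for the "similarity matrix".
    """
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        unique = set(sentence)
        for word in unique:
            for other in unique:
                if word != other:
                    vectors[word][other] += 1
    return vectors


def definition_vector(definition: List[str], vectors: Dict[str, Counter]) -> Counter:
    """Assumption: a definition is represented by the sum of its word vectors."""
    total: Counter = Counter()
    for word in definition:
        total.update(vectors.get(word, Counter()))
    return total


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Toy corpus and toy dictionary definitions (hypothetical data).
corpus = [
    ["deposit", "money", "bank"],
    ["bank", "loan", "money"],
    ["river", "bank", "water"],
    ["water", "river", "flood"],
]
vectors = cooccurrence_vectors(corpus)
financial_sense = definition_vector(["money", "loan"], vectors)
river_sense = definition_vector(["river", "water"], vectors)
context = definition_vector(["deposit"], vectors)

print(cosine(context, financial_sense) > cosine(context, river_sense))
```

On this toy data the context word "deposit" lies closer to the financial definition than to the river definition, which is the kind of signal the disambiguation step would consume.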
Methodology: Disambiguation step
- The definition of the acyclic weighted digraph
- Selecting context words
- Constructing the acyclic weighted digraph
- Searching for the optimal path on the acyclic weighted digraph
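The optimal-path search can be illustrated with a small Viterbi-style dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the digraph is layered (one layer per selected word, one node per candidate sense, edges only between adjacent layers) and that the best path maximizes the sum of edge weights. The name `best_sense_path`, the sense labels, and the weights are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Sense = Hashable


def best_sense_path(
    layers: Sequence[Sequence[Sense]],
    weight: Callable[[Sense, Sense], float],
) -> Tuple[float, List[Sense]]:
    """Pick one sense per layer so that the summed edge weight is maximal."""
    if not layers or any(len(layer) == 0 for layer in layers):
        raise ValueError("every word needs at least one candidate sense")

    # score[s] = best total weight of any path ending at sense s in the current layer
    score: Dict[Sense, float] = {s: 0.0 for s in layers[0]}
    back: List[Dict[Sense, Sense]] = []

    for prev_layer, layer in zip(layers, layers[1:]):
        new_score: Dict[Sense, float] = {}
        pointers: Dict[Sense, Sense] = {}
        for s in layer:
            # Keep only the best predecessor for each node (the Viterbi step).
            best_prev = max(prev_layer, key=lambda p: score[p] + weight(p, s))
            new_score[s] = score[best_prev] + weight(best_prev, s)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Follow the back-pointers from the best final node to recover the path.
    last = max(score, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


# Toy graph for "I deposit some money in the bank" (hypothetical weights).
layers = [
    ["deposit/put_money", "deposit/sediment"],
    ["money/currency"],
    ["bank/financial", "bank/river", "bank/row"],
]
W = {
    ("deposit/put_money", "money/currency"): 0.9,
    ("deposit/sediment", "money/currency"): 0.1,
    ("money/currency", "bank/financial"): 0.8,
    ("money/currency", "bank/river"): 0.1,
    ("money/currency", "bank/row"): 0.05,
}
total, path = best_sense_path(layers, lambda a, b: W.get((a, b), 0.0))
print(path)
```

Because each node keeps only its best predecessor, the work grows with the number of edges between adjacent layers rather than with the number of complete sense combinations.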
Experiments
- System results
Experiments: Experiment on English
- The accuracy of the system is 30.7% on average.
- This accuracy is very low, for the following reasons:
  - The selected context words were not appropriate, even though context words are crucial because they decide which sense of the target word is best.
  - Mapping English senses to Korean through an English-Korean dictionary loses some information.
  - Errors in the stemming process prevented the system from finding the correct verb root in the MRD.
Conclusion
- The method considers the relationship between each sense of the context words and the sense of the target word.
- It uses the Viterbi algorithm to reduce computational complexity.
- The system performed poorly on English (30.7% accuracy) but reached a suitable 76.4% accuracy on semantically ambiguous Korean words.
- Next step: apply this method to other languages by studying their language characteristics.
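The complexity reduction attributed to the Viterbi algorithm can be made concrete with a short count. This is a sketch under an assumption the slides do not state explicitly: the graph is layered, with m words, n_i candidate senses for word i, and edges only between adjacent words. The symbols m and n_i are introduced here for illustration.

```latex
\[
\underbrace{\prod_{i=1}^{m} n_i}_{\text{paths scored by exhaustive search}}
\quad\text{versus}\quad
\underbrace{\sum_{i=1}^{m-1} n_i\, n_{i+1}}_{\text{edges evaluated by Viterbi}}
\]
% Example: m = 5 words with n_i = 4 senses each
\[
4^{5} = 1024 \ \text{paths}
\qquad\text{versus}\qquad
4 \cdot (4 \cdot 4) = 64 \ \text{edge evaluations}
\]
```

Exhaustive search grows exponentially with the number of words, while the dynamic program grows linearly in the number of words for a fixed number of senses per word.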
Comments
- Advantage
  - Considers the relationship between each sense of the context words and the sense of the target word.
  - Uses the Viterbi algorithm to reduce computational complexity.
- Drawback
  - The system performs better on Korean than on English (76.4% vs. 30.7% accuracy).
- Application
  - Word sense disambiguation