Download presentation
Presentation is loading. Please wait.
Published byShon Chandler Modified over 9 years ago
1
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Graduate : Chun Kai Chen Author : Qing Ma, Kyoko Kanzaki, Yujie Zhang, Masaki Murata, Hitoshi Isahara Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel Neural Networks 17 (2004) 1241–1253
2
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Outline Motivation Objective Introduction Self-organizing monolingual semantic maps Experimental Results Conclusions Personal Opinion
3
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Motivation A number of corpus-based statistical approaches have been used to compute word similarity It is difficult to recognize the relationships between groups or the relationships between words within groups
4
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Objective We need a technique that can map words from a very large lexicon into a small semantic space A visible representation where words with similar meanings are placed at the same or neighboring points so that the distance between the points represents the semantic similarity in the words Semantic maps can be automatically constructed with self-organization
5
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Introduction Presents a method of self-organizing monolingual semantic maps for Chinese and Japanese using SOM for specific purpose To construct semantic maps of nouns from the point of view of the adnominal constituents Extended to the construction of Japanese–Chinese bilingual semantic maps
6
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Self-organizing monolingual semantic maps Data coding Baseline method Frequency term-weighting method TFIDF term-weighting method d ij is the word similarity 。。。。。。
7
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Data coding Word w i can be defined by a set of its co-occurring words as V(w i ) is the input to the SOM only reflects the relationships between a pair of words
8
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Data coding method Baseline method ─ d ij : word similarity ─ a i & a j : are the numbers of co-occurring words of w i and w j ─ c ij : is the number of co-occurring words that both w i and w j have in common Frequency term-weighting method TFIDF term-weighting method
9
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Table 1 Comparative results for various coding methods and clustering
10
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Evaluation methods Numerical evaluation ─ precision ─ recall ─ F-measure Intuitive evaluation ─ our ‘common sense’ Comparison with other methods ─ multivariate statistical analyses
11
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Experimental Results
12
Intelligent Database Systems Lab N.Y.U.S.T. I. M.
13
Intelligent Database Systems Lab N.Y.U.S.T. I. M. TFIDF comparison with PCA Fig. 2. Chinese semantic map using principal component analysis Fig. 1. Chinese semantic map based on TFIDF term-weighted coding
14
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Table 2 Clustering results with TFIDF term-weighted coding The underlined words are those classified into incorrect areas.
15
Intelligent Database Systems Lab N.Y.U.S.T. I. M.
16
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Semantic map comparison with PCA Fig. 3. Japanese semantic map based on the TFIDF term-weighted coding method Fig. 4. Japanese semantic map using principal component analysis
17
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Self-organizing bilingual semantic maps(1/2) When a translation pair of sentences like Each Japanese word can therefore be automatically aligned to a Chinese word from this map by measuring its distance If the Chinese word keyi (can) is closest to the Japanese word seta (can), then the Japanese word seta (can) is regarded as being aligned to the Chinese word keyi (can) (Japanese) keiei toppu ga tei seichou jidai teichaku wo jikkan shite iru koto wo ukagawa seta. (Chinese) youci keyi kanchu, zuigao jingyingzhe shengan jingji ren tingliu zai dishu zengzhang shidai. (English) We can see that upper management has realized that the economy is fixed in an eras of slow growth.
18
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Self-organizing bilingual semantic maps(2/2) A small-scale (10 translation pairs) experimental comparison with the baseline method Comparison with hierarchical clustering and multivariate statistical analysis
19
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Data coding(1/2) (Japanese) keiei toppu ga tei seichou jidai teichaku wo jikkan shite iru koto wo ukagawa seta. (Chinese) youci keyi kanchu, zuigao jingyingzhe shengan jingji ren tingliu zai dishu zengzhang shidai. (English) We can see that upper management has realized that the economy is fixed in an eras of slow growth. Ji (i=1,.,m) are Japanese words forming the Japanese sentence Ci (i=1,.,n) are Chinese words forming the translated Chinese sentence
20
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Data coding(2/2) is a co-occurring word of Ji is the normalized co-occurrence frequency is a co-occurring word of either or severals of J j1 ;.; J j;ni is the normalized co-occurrence frequency
21
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Semantic map comparison with PCA
22
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Semantic map comparison with Baseline Table 3 Word alignment result obtained from semantic map Table 4 Baseline word alignment results
23
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Conclusions and Future Work Proposed a method of self-organizing monolingual semantic maps for Japanese and Chinese Experimental results proved that these maps were generally consistent with our intuition Comparison demonstrated that the hierarchical clustering technique is inferior to SOM in terms of classifying ability Furthermore, multivariate statistical analysis such as principal component analysis and factor analysis gave worse results
24
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Conclusions and Future Work An extension to the automatic construction of bilingual semantic maps of Japanese and Chinese Develop an automatic method of transforming both Japanese and Chinese words
25
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Personal Opinion …..
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.