Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
A Study of Learning a Merge Model for Multilingual Information Retrieval
Presenter: Cheng-Hui Chen
Authors: Ming-Feng Tsai, Yu-Ting Wang, Hsin-Hsi Chen
SIGIR 2008

Outline
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments

Motivation
In multilingual information retrieval (MLIR), the merged result list usually includes more irrelevant results.
Traditional merging methods for MLIR assume that relevant documents are homogeneously distributed over monolingual result lists.

Objectives
Merge the monolingual result lists into a single result list despite the varying translation and retrieval quality across collections.
Propose a merge method that does not assume relevant documents are homogeneously distributed over monolingual result lists.
Improve the quality of the merge model.

Methodology
Traditional MLIR merging methods:
─ Raw-score
─ Round-robin
─ Normalized-by-top1
─ Normalized-by-top-k
Proposed learning-based method:
─ FRank

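The traditional merging methods listed on this slide can be sketched as follows. This is a minimal illustration under the usual textbook definitions, not the authors' code: each monolingual result list is assumed to be a list of `(doc_id, score)` pairs sorted by descending score, normalized-by-top-k is assumed to divide each score by the mean of the top-k scores in its own list, and all function names are hypothetical.

```python
from itertools import zip_longest

# Assumed input: one list of (doc_id, score) pairs per language,
# each already sorted by descending retrieval score.


def raw_score_merge(result_lists):
    """Merge by the original scores, ignoring which list they came from."""
    merged = [pair for results in result_lists for pair in results]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


def round_robin_merge(result_lists):
    """Interleave the lists, taking one document from each list in turn."""
    merged = []
    for rank_group in zip_longest(*result_lists):
        merged.extend(pair for pair in rank_group if pair is not None)
    return merged


def normalized_by_top_k_merge(result_lists, k=1):
    """Divide each score by the mean of its list's top-k scores, then merge.

    With k=1 this is normalized-by-top1 (assumed definition).
    """
    normalized = []
    for results in result_lists:
        if not results:
            continue
        top_scores = [score for _, score in results[:k]]
        norm = sum(top_scores) / len(top_scores)
        if norm == 0:
            norm = 1.0  # avoid division by zero for degenerate lists
        normalized.extend((doc_id, score / norm) for doc_id, score in results)
    return sorted(normalized, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    zh = [("zh1", 12.0), ("zh2", 6.0)]
    en = [("en1", 3.0), ("en2", 2.4)]
    print([d for d, _ in raw_score_merge([zh, en])])
    print([d for d, _ in round_robin_merge([zh, en])])
    print([d for d, _ in normalized_by_top_k_merge([zh, en], k=1)])
```

In the toy data, raw-score merging lets the Chinese list dominate because its scores are on a larger scale, which is the kind of incomparability the normalization variants try to reduce.
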
MLIR merge process
Feature set
1. Query level
2. Document level
3. Translation level
Construction of a merge model
1. FRank ranking algorithm
2. BM25

Feature set
Query level
─ Terms within a query are manually classified into several pre-defined categories:
  Location/country names (Loc)
  Organization names (Org)
  Event names (EN)
  Technical terms (TT)
Document level
─ The extracted document length (Dlength) and title length (Tlength).

Feature set
Translation level
─ The size of the bilingual dictionary used for each language (i.e., DictSize).
─ The average number of translation equivalents per query term (i.e., AvgTAD). For example, if a query has two query terms, each with three translation equivalents, the AvgTAD of the query is (3 + 3)/2 = 3.
Example (English -> Chinese translation of the query terms "Order" and "Park"):
  Order: 訂單, 順序, 命令, 隊形
  Park: 公園, 停車
  AvgTAD = (number of Chinese translations) / (number of query terms) = (4 + 2)/2 = 3

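The AvgTAD arithmetic on this slide can be checked with a short sketch. The dictionary structure (query term mapped to its list of translation equivalents) and the function name `avg_tad` are assumptions made for illustration; the Order/Park data is the English -> Chinese example from the slide.

```python
def avg_tad(translations):
    """Average number of translation equivalents per query term.

    `translations` maps each query term to the list of translation
    equivalents found in the bilingual dictionary (hypothetical structure).
    """
    if not translations:
        return 0.0
    total = sum(len(equivalents) for equivalents in translations.values())
    return total / len(translations)


if __name__ == "__main__":
    query = {
        "Order": ["訂單", "順序", "命令", "隊形"],  # 4 equivalents
        "Park": ["公園", "停車"],                    # 2 equivalents
    }
    print(avg_tad(query))  # (4 + 2) / 2
```
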
Construction of the merge model
Following FRank's generalized additive model, a merge model can be represented as a weighted sum of weak learners, $\sum_t \alpha_t m_t(x)$, where:
─ $m_t(x)$ is a weak learner
─ $\alpha_t$ is the learned weight
─ t is the number of selected weak learners
The merge model is then combined with a retrieval model (BM25) by linear combination.

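A minimal sketch of how such an additive model could be scored and combined with BM25 is given below. Several details are assumptions, since the slide does not spell them out: the weak learners are taken to be binary threshold functions on a single feature, the combination is taken to be the convex form λ · merge score + (1 − λ) · BM25 score, and the two scores are assumed to be on comparable scales. All class and function names are hypothetical, and the weights are made up rather than learned.

```python
from dataclasses import dataclass
from typing import Dict, List

Features = Dict[str, float]


@dataclass
class WeakLearner:
    feature: str      # feature name, e.g. "AvgTAD" or "Dlength"
    threshold: float  # assumed threshold form of m_t(x)
    alpha: float      # learned weight alpha_t (made up here)

    def __call__(self, x: Features) -> float:
        # Binary weak learner: fires when the feature exceeds the threshold.
        return 1.0 if x.get(self.feature, 0.0) > self.threshold else 0.0


def merge_model_score(learners: List[WeakLearner], x: Features) -> float:
    """Additive model: sum over t of alpha_t * m_t(x)."""
    return sum(learner.alpha * learner(x) for learner in learners)


def combined_score(learners: List[WeakLearner], x: Features,
                   bm25_score: float, lam: float = 0.5) -> float:
    """Assumed linear combination of the merge model and BM25."""
    return lam * merge_model_score(learners, x) + (1.0 - lam) * bm25_score


if __name__ == "__main__":
    learners = [
        WeakLearner("AvgTAD", threshold=2.0, alpha=0.6),
        WeakLearner("Dlength", threshold=500.0, alpha=0.3),
    ]
    doc = {"AvgTAD": 3.0, "Dlength": 320.0}
    print(merge_model_score(learners, doc))
    print(combined_score(learners, doc, bm25_score=0.8, lam=0.5))
```

Setting `lam` to 0 falls back to BM25 alone and setting it to 1 uses only the merge model, so sweeping `lam` is one way to study the effect of the combination coefficient.
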
Experiments
Data set
─ Details of the experimental collections
─ Percentage of retrieved documents

Experiments
Mean Average Precision (MAP)

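For reference, MAP under its standard definition can be computed as in the sketch below: average precision is the mean of the precision values at each rank where a relevant document appears, divided by the total number of relevant documents, and MAP averages this over queries. The function names and the toy data are illustrative only and do not come from the experiments.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """`runs` is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
        (["d5", "d6"], {"d6"}),
    ]
    print(mean_average_precision(runs))
```
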
Experiments
Experimental results of our method using different combination coefficients λ.

Experiments
Feature analysis

Conclusions
The proposed merge model can significantly improve merging quality.
The merge model indicates that the key factors are the number of translatable terms and compound words.

Conclusions
Future work
─ Use other learning-based ranking algorithms, such as RankSVM and RankNet.
─ Extract more representative features, such as linguistic features, to construct a merge model.
─ Discover more relations among query terms, such as query term association and substitution.

Comments
Advantage
─ Improves merging quality.
Drawback
Application
─ Multilingual information retrieval.