Published by Stephany Cameron. Modified over 9 years ago.
1
Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
Validating Transliteration Hypotheses Using the Web: Web Counts vs. Web Mining
Presenter: You Lin Chen
Authors: Hikaridai, Seika-cho, Soraku-gun, Kyoto
2007.WI.7
2
Outline
Motivation
Objective
Methodology
Experiments
Conclusion
Comments
3
Motivation
Web counts (search-engine hit counts) only approximate Web frequency, which causes two problems. First, some Web search engines disregard punctuation and capitalization when matching a search term. Second, it is not easy to consider the contexts of transliteration hypotheses with Web counts.
4
Objectives
To address these problems, we propose a novel method for validating transliteration hypotheses based on Web mining.
5
Methodology
[Flow diagram: English terms → machine transliteration system → transliteration hypotheses (e.g., Clinton → 克林頓) → Web query (Clinton, 克林頓) → Web pages → data set generation, with contextual information extracted as features → trained SVM / trained MEM → ranking of transliteration hypotheses]
6
Methodology
$\mathrm{freq}(tc_i, W_l)$: the number of occurrences of $tc_i$ in $W_l$. Example: $\mathrm{freq}(tc_i, W_1) = 6$.
$\mathrm{freq}_d(SW, tc_i, W_l, d)$: co-occurrence of $SW$ and $tc_i$ within distance $d$. Example: $\mathrm{freq}_d(SW, tc_i, W_1, d = 10) = 5$.
$\mathrm{freq}_p(SW, tc_i, W_l, d)$: co-occurrence of $SW$ and $tc_i$ as parenthetical expressions within distance $d$. Example: $\mathrm{freq}_p(SW, tc_i, W_1, d = 10) = 5$.
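The three frequency features can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that SW is the source-language word, that a page set is a list of plain-text strings, that distance d is measured in characters, and that a "parenthetical expression" means one term inside ASCII or full-width parentheses with the other term at most d characters before the opening parenthesis. The function names and the sample pages are hypothetical.

```python
import re


def find_all(text, term):
    """Start offsets of every non-overlapping occurrence of term in text."""
    return [m.start() for m in re.finditer(re.escape(term), text)]


def freq(tc, pages):
    """freq(tc, W): total number of occurrences of tc in the page set."""
    return sum(len(find_all(page, tc)) for page in pages)


def gap(a_start, a_len, b_start, b_len):
    """Number of characters between two spans (0 if they touch or overlap)."""
    if a_start <= b_start:
        return max(0, b_start - (a_start + a_len))
    return max(0, a_start - (b_start + b_len))


def freq_d(sw, tc, pages, d):
    """freq_d: occurrences of tc with some occurrence of sw within d characters."""
    total = 0
    for page in pages:
        sw_starts = find_all(page, sw)
        for t in find_all(page, tc):
            if any(gap(t, len(tc), s, len(sw)) <= d for s in sw_starts):
                total += 1
    return total


def freq_p(sw, tc, pages, d):
    """freq_p: patterns like 'tc (sw)' or 'sw (tc)' (assumed reading of the slide)."""
    a, b = re.escape(sw), re.escape(tc)
    open_p, close_p = r"[(（]", r"[)）]"
    patterns = [
        re.compile(rf"{a}.{{0,{d}}}?{open_p}\s*{b}\s*{close_p}", re.S),
        re.compile(rf"{b}.{{0,{d}}}?{open_p}\s*{a}\s*{close_p}", re.S),
    ]
    return sum(len(p.findall(page)) for page in pages for p in patterns)


# Hypothetical page set, invented for illustration only.
pages = [
    "克林頓 (Clinton) visited. Clinton spoke. 克林頓",
    "Bill Clinton（克林頓）",
]
print(freq("克林頓", pages))
print(freq_d("Clinton", "克林頓", pages, d=10))
print(freq_p("Clinton", "克林頓", pages, d=10))
```

Counting per occurrence of the candidate (rather than per page) is a design choice of this sketch; the slide does not say which unit the original counts use.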
7
Methodology
Let $x_i \in X$ be the feature vector of $tc_i \in TC$.
$g_{SVM}(x_i) = w \cdot x_i + b$, where $x_{cor}$ is a positive sample and the others are negative samples.
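As a rough illustration of how the linear score $g_{SVM}(x_i) = w \cdot x_i + b$ could rank hypotheses, the sketch below applies an already-trained weight vector to each candidate's feature vector and sorts by score. Training is not shown. The weights, the bias, the second candidate, and its feature values are invented for the example; only the values 6, 5, 5 echo the examples on the previous slide.

```python
def g_svm(w, x, b):
    """Linear decision value w . x + b for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def rank_hypotheses(features, w, b):
    """Return candidates sorted by decreasing decision value."""
    return sorted(features, key=lambda tc: g_svm(w, features[tc], b), reverse=True)


# Hypothetical feature vectors [freq, freq_d, freq_p] per candidate.
features = {
    "柯林頓": [2, 1, 0],
    "克林頓": [6, 5, 5],
}
w = [0.1, 0.3, 0.6]  # assumed weights, not learned from real data
b = -1.0             # assumed bias

ranking = rank_hypotheses(features, w, b)
print(ranking)
```

A larger decision value places a candidate further on the positive side of the hyperplane, so sorting by it puts the hypothesis most like the positive (correct) training samples first.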
8
Methodology
$g_{MEM}(x_i) = \Pr(tc_{cor} \mid x_i)$
The maximum entropy model (MEM) is a widely used probability model that can incorporate heterogeneous information effectively. An event ($ev$) is usually composed of a target event ($te$) and a history event ($he$); say $ev = \langle te, he \rangle$.
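One common way to write a conditional maximum entropy model is as a normalized exponential of weighted features. The sketch below uses that standard form to compute $\Pr(te \mid he)$ for two assumed target events, "cor" (the hypothesis is correct) and "incor", with the feature vector $x_i$ playing the role of the history event. The weight values are invented, and the slide does not confirm that the authors parameterize their model this way.

```python
import math


def mem_prob(weights, x, target="cor"):
    """Pr(target | x) under a conditional maximum entropy (log-linear) model.

    weights maps each target event to one weight per feature in x.
    """
    scores = {
        te: sum(lam * xi for lam, xi in zip(lams, x))
        for te, lams in weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[target] - top) / z


# Assumed weights for two target events over two features.
weights = {"cor": [0.5, 1.0], "incor": [0.0, 0.0]}
x = [1.0, 0.0]

p_cor = mem_prob(weights, x, "cor")
p_incor = mem_prob(weights, x, "incor")
print(p_cor, p_incor)
```

Ranking then amounts to sorting the hypotheses by their probability of being the correct one, analogous to sorting by the SVM decision value.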
9
Experiments
10
Conclusion
Experiments showed that our Web mining-based transliteration validation method was consistently better than systems based on Web counts.
11
Comments
Advantage …
Drawback …
Application …