Statistical Translation Language Model
Maryam Karimzadehgan
University of Illinois at Urbana-Champaign

Outline
- Motivation & Background
  - Language model (LM) for IR
  - Smoothing methods for IR
- Statistical Machine Translation – Cross-Lingual
  - Motivation
  - IBM Model 1
- Statistical Translation Language Model – Monolingual
  - Synthetic queries
  - Mutual information-based approach
  - Regularization of self-translation probabilities
- Smoothing in the Statistical Translation Language Model

The Basic LM Approach ([Ponte & Croft 98], [Hiemstra & Kraaij 98], [Miller et al. 99])
Estimate a language model for each document, e.g.:
- a text mining paper: text ?, mining ?, association ?, clustering ?, ...
- a food nutrition paper: food ?, nutrition ?, healthy ?, diet ?, ...
Given the query "data mining algorithms", ask: which model would most likely have generated this query?

Ranking Docs by Query Likelihood
(figure: each document d_1, ..., d_N gets its own language model θ_{d_1}, ..., θ_{d_N}; documents are ranked by the query likelihood p(q|θ_{d_1}), p(q|θ_{d_2}), ..., p(q|θ_{d_N}))

Retrieval as LM Estimation
Document ranking based on query likelihood: for a query q = w_1 w_2 ... w_n,
log p(q|d) = Σ_{i=1..n} log p(w_i|d)
where p(w_i|d) is the document language model. The retrieval problem thus reduces to the estimation of p(w_i|d). Smoothing is an important issue, and distinguishes different approaches.

How to Estimate p(w|d)?
Simplest solution: the maximum likelihood estimator
- p_ml(w|d) = c(w,d)/|d|, the relative frequency of word w in d
- What if a word doesn't appear in the text? Then p(w|d) = 0.
In general, what probability should we give a word that has not been observed? If we want to assign non-zero probabilities to such words, we'll have to discount the probabilities of observed words. This is what "smoothing" is about.

Language Model Smoothing
(figure: p(w) plotted against words w; the smoothed LM lies below the maximum likelihood estimate on seen words and assigns non-zero probability to unseen words)

Smoothing Methods for IR (Zhai & Lafferty 01)
Method 1 (linear interpolation, Jelinek-Mercer), with smoothing parameter λ:
p(w|d) = (1 - λ) p_ml(w|d) + λ p(w|C)
Method 2 (Dirichlet prior / Bayesian), with smoothing parameter μ:
p(w|d) = (c(w,d) + μ p(w|C)) / (|d| + μ)
where p_ml(w|d) is the ML estimate and p(w|C) is the collection language model.
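
To make the two schemes concrete, here is a minimal Python sketch (not from the original slides; the toy corpus and the parameter values λ = 0.5 and μ = 2000 are illustrative) that estimates a smoothed p(w|d) both ways and ranks documents by query log-likelihood:

import math
from collections import Counter

# Toy collection; in practice these are full documents.
docs = {
    "d1": "text mining algorithms for text data".split(),
    "d2": "food nutrition and healthy diet".split(),
}

# Collection (background) model p(w|C), estimated from all documents pooled.
coll = Counter(w for d in docs.values() for w in d)
coll_total = sum(coll.values())

def p_jm(w, doc, lam=0.5):
    # Jelinek-Mercer: (1 - lambda) * p_ml(w|d) + lambda * p(w|C)
    p_ml = Counter(doc)[w] / len(doc)
    return (1 - lam) * p_ml + lam * coll[w] / coll_total

def p_dir(w, doc, mu=2000):
    # Dirichlet prior: (c(w,d) + mu * p(w|C)) / (|d| + mu)
    return (Counter(doc)[w] + mu * coll[w] / coll_total) / (len(doc) + mu)

def log_likelihood(query, doc, model):
    # log p(q|d) = sum_i log p(w_i|d); smoothing keeps every term finite
    return sum(math.log(model(w, doc)) for w in query.split())

for name, doc in docs.items():
    print(name, round(log_likelihood("data mining algorithms", doc, p_jm), 3))

With maximum likelihood alone, d2 would receive probability zero for every query word; the collection term lets unseen words contribute a small but non-zero probability.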

Outline
- Motivation & Background
  - Language model (LM) for IR
  - Smoothing methods for IR
- Statistical Machine Translation – Cross-Lingual
  - Motivation
  - IBM Model 1
- Statistical Translation Language Model – Monolingual
  - Synthetic queries
  - Mutual information-based approach
  - Regularization of self-translation probabilities
- Smoothing in the Statistical Translation Language Model

A Brief History
Machine translation was one of the first applications envisioned for computers. Warren Weaver (1949): "I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text." Machine translation was first demonstrated by IBM in 1954 with a basic word-for-word translation system.

Interest in Machine Translation
Commercial interest:
- The U.S. has invested in MT for intelligence purposes.
- MT is popular on the web: it is the most used of Google's special features.
- The EU spends more than $1 billion on translation costs each year.
- (Semi-)automated translation could lead to huge savings.

Interest in Machine Translation
Academic interest:
- One of the most challenging problems in NLP research.
- Requires knowledge from many NLP sub-areas, e.g., lexical semantics, parsing, morphological analysis, statistical modeling, ...
- Being able to establish links between two languages allows for transferring resources from one language to another.

Word-Level Alignments
Given a parallel sentence pair, we can link (align) words or phrases that are translations of each other:
(figure: an aligned English-French sentence pair, with lines connecting each word to its translation)

Machine Translation – Concepts
We are trying to model p(e|f):
- I give you a French sentence f.
- You give me back an English sentence e.
How are we going to model this?
- The maximum likelihood estimate of p(e|f) is freq(e,f)/freq(f).
- This is way too specific to get any reasonable frequencies! The vast majority of unseen data will have zero counts.

Machine Translation – An Alternative Way
We could use Bayes' rule:
p(e|f) = p(f|e) p(e) / p(f), so argmax_e p(e|f) = argmax_e p(f|e) p(e)
Why use Bayes' rule instead of directly estimating p(e|f)? It is important that our model for p(e|f) concentrates its probability as much as possible on well-formed English sentences. But it is not important that our model for p(f|e) concentrate its probability on well-formed French sentences. Given a French sentence f, we can then search for the e that maximizes p(e|f), i.e., p(f|e) p(e).

Statistical Machine Translation
The noisy channel model, with e an English sentence (|e| = l) and f a French sentence (|f| = m):
- Language model: p(e)
- Translation model: p(f|e)
- Decoder: finds the best e for a given f
Assumptions:
- An English word can be aligned with multiple French words, while each French word is aligned with at most one English word.
- The individual word-to-word translations are independent.
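
The division of labor between the two models can be seen in a tiny, purely illustrative Python sketch (the candidate set and all probabilities below are made up, not real model estimates): the translation model finds both candidates equally adequate, and the language model p(e) picks the well-formed one.

import math

# Noisy channel: choose e maximizing p(e) * p(f|e) over candidate translations.
lm = {"the cat": 0.020, "cat the": 0.001}                        # p(e)
tm = {("le chat", "the cat"): 0.3, ("le chat", "cat the"): 0.3}  # p(f|e)

def decode(f, candidates):
    # argmax_e log p(e) + log p(f|e)
    return max(candidates, key=lambda e: math.log(lm[e]) + math.log(tm[(f, e)]))

print(decode("le chat", ["the cat", "cat the"]))  # -> "the cat"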

Estimation of Probabilities – IBM Model 1
- The simplest of the IBM models (there are 5 models).
- Does not consider word order (bag-of-words approach).
- Does not model one-to-many alignments.
- Computationally inexpensive.
- Useful for parameter estimates that are passed on to more elaborate models.

IBM Model 1
Three important components are involved:
- Language model: gives the probability p(e).
- Translation model: estimates the translation probability p(f|e).
- Decoder: searches for the most probable translation.

IBM Model 1 – Translation Model
With |e| = l and |f| = m, Model 1 sums over all possible alignments a, where a_j is the English word that French word f_j is aligned with:
p(f|e) = Σ_a p(f, a|e) = ε/(l+1)^m · Π_{j=1..m} Σ_{i=0..l} t(f_j|e_i)
where t(f|e) is the word translation probability. The EM algorithm is used to estimate the translation probabilities.
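
A minimal sketch of this EM procedure in Python (the toy parallel corpus is ours; real training also adds a NULL source word and iterates over large corpora):

from collections import defaultdict

corpus = [  # (English, French) sentence pairs
    ("the house".split(), "la maison".split()),
    ("the book".split(), "le livre".split()),
    ("a book".split(), "un livre".split()),
]
e_vocab = {e for es, _ in corpus for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))  # t(f|e), uniform initialization

for _ in range(10):  # EM iterations
    count = defaultdict(float)  # expected alignment counts c(f, e)
    total = defaultdict(float)  # expected counts c(e)
    for es, fs in corpus:
        for f in fs:
            # E-step: f distributes its mass over the e it might be
            # aligned with, in proportion to the current t(f|e).
            z = sum(t[(f, e)] for e in es)
            for e in es:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    for (f, e), c in count.items():  # M-step: renormalize expected counts
        t[(f, e)] = c / total[e]

print(round(t[("maison", "house")], 3))  # converges toward 1.0

Because "maison" co-occurs only with "the" and "house", while "the" must also explain other words, EM shifts probability mass until t(maison|house) dominates.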

Outline
- Motivation & Background
  - Language model (LM) for IR
  - Smoothing methods for IR
- Statistical Machine Translation – Cross-Lingual
  - Motivation
  - IBM Model 1
- Statistical Translation Language Model – Monolingual
  - Synthetic queries
  - Mutual information-based approach
  - Regularization of self-translation probabilities
- Smoothing in the Statistical Translation Language Model

The Problem of Vocabulary Gap
Query = "auto wash"
- d1: "auto wash ..." (exact match)
- d2: "auto ... buy ... auto" (matches "auto", but is about buying)
- d3: "car wash ... vehicle" (relevant, but shares no query word)
Exact matching scores each document by P("auto")P("wash"), so the relevant d3 gets no credit. How can we support inexact matching, e.g. {"car", "vehicle"} ≈ "auto"?

Translation Language Models for IR [Berger & Lafferty 99]
Idea: "translate" document words into query words, so that d3 = "car wash ... vehicle" can match the query "auto wash" as if it were "car wash":
p("auto"|d3) = p("car"|d3) · pt("auto"|"car") + p("vehicle"|d3) · pt("auto"|"vehicle")
In general, p(w|d) = Σ_u pt(w|u) p(u|d). How do we estimate the translation probabilities pt(w|u)?
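
A minimal Python sketch of this computation (the translation table pt below is hand-made for illustration, not a trained model):

from collections import Counter

pt = {  # pt(w|u): probability that document word u "translates" to query word w
    ("auto", "car"): 0.4,
    ("auto", "vehicle"): 0.3,
    ("auto", "auto"): 0.9,
}

def p_translate(w, doc):
    # p(w|d) = sum_u pt(w|u) * p_ml(u|d)
    tf, n = Counter(doc), len(doc)
    return sum(pt.get((w, u), 0.0) * c / n for u, c in tf.items())

d3 = "car wash vehicle".split()
print(round(p_translate("auto", d3), 3))  # 0.4/3 + 0.3/3 ≈ 0.233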

Estimation of the Translation Model pt(w|u)
- When relevance judgments are available, the (q, d) pairs serve as data to train the translation model.
- Without relevance judgments, we can use synthetic data [Berger & Lafferty 99], [Jin et al. 02].
The basic translation model p(w|d) = Σ_u pt(w|u) p_ml(u|d) combines the translation model pt(w|u) with the regular document LM p_ml(u|d).

Estimation of Translation Model – Synthetic Queries ([Berger & Lafferty 99])
Training data: for each document d, sample a synthetic query q from d, and use the resulting (q, d) pairs to train the translation model.
Limitations:
1. Can't translate into words not seen in the training queries.
2. Computational complexity.

A simpler and more efficient method for estimating pt(w|u) with higher coverage was proposed in: M. Karimzadehgan and C. Zhai. Estimation of Statistical Translation Models Based on Mutual Information for Ad Hoc Information Retrieval. ACM SIGIR, 2010.

Estimation of Translation Model Based on Mutual Information
1. Calculate the mutual information for each pair of words in the collection (measuring co-occurrence), using the presence/absence indicator X_w of word w in a document:
I(w; u) = Σ_{X_w ∈ {0,1}} Σ_{X_u ∈ {0,1}} p(X_w, X_u) log [ p(X_w, X_u) / (p(X_w) p(X_u)) ]
2. Normalize the mutual information scores to obtain a translation probability:
pt(w|u) = I(w; u) / Σ_{w'} I(w'; u)

Computation Detail
Estimate the joint probabilities p(X_w, X_u) by counting, over the N documents in the collection, how many contain both w and u, only one of them, or neither (a 2x2 contingency table over the indicators X_w, X_u). Exploit the inverted index to speed up this computation.
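
The following Python sketch (toy collection of our own; the paper's exact smoothing of the MI estimate is omitted) computes the document-level mutual information between two words and normalizes it into pt(w|u):

import math
from collections import Counter
from itertools import product

docs = [
    "car wash vehicle".split(),
    "auto wash".split(),
    "auto buy auto".split(),
    "food nutrition".split(),
]
N = len(docs)
vocab = sorted({w for d in docs for w in d})
df = Counter()  # document frequency: in how many docs does w appear?
co = Counter()  # co-document frequency: docs containing both w and u
for d in docs:
    seen = set(d)
    df.update(seen)
    co.update(product(seen, seen))

def mi(w, u):
    # I(Xw; Xu) over the 2x2 presence/absence contingency table
    cells = {(1, 1): co[(w, u)],
             (1, 0): df[w] - co[(w, u)],
             (0, 1): df[u] - co[(w, u)],
             (0, 0): N - df[w] - df[u] + co[(w, u)]}
    total = 0.0
    for (xw, xu), n in cells.items():
        if n == 0:
            continue
        pj = n / N                             # p(Xw = xw, Xu = xu)
        pw = (df[w] if xw else N - df[w]) / N  # p(Xw = xw)
        pu = (df[u] if xu else N - df[u]) / N  # p(Xu = xu)
        total += pj * math.log(pj / (pw * pu))
    return total

def pt_mi(w, u):
    # Normalize MI scores into a translation probability pt(w|u)
    z = sum(mi(v, u) for v in vocab)
    return mi(w, u) / z if z > 0 else 0.0

print(round(pt_mi("car", "vehicle"), 3))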

Sample Translation Probabilities (AP90)
(table: words q ranked by p(q|w) for w = "everest", under each estimation method)
Mutual Information: everest (0.079), climber (0.042), climb, mountain, mount (0.033), reach, expedit, summit, whittak (0.016), peak
Synthetic Query: everest, climber, mount, expedit, peak, himalaya, nepal (0.015), sherpa, hillari

Regularizing the Self-Translation Probability
The self-translation probability pt(w|w) can be under-estimated, so an exact match would be counted for less than it should be. Solution: interpolation with "1.0 self-translation", using a parameter α:
pt'(w|u) = α · 1[w = u] + (1 - α) · pt(w|u)
- α = 1: basic query likelihood model
- α = 0: original MI estimate
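
A one-line Python sketch of this interpolation (the toy pt values and the default α = 0.3 are illustrative):

def pt_regularized(w, u, pt, alpha=0.3):
    # alpha = 1 -> basic query likelihood (exact match only);
    # alpha = 0 -> the original MI estimate
    return alpha * (1.0 if w == u else 0.0) + (1 - alpha) * pt(w, u)

toy_pt = lambda w, u: {("auto", "auto"): 0.5, ("auto", "car"): 0.3}.get((w, u), 0.0)
print(pt_regularized("auto", "auto", toy_pt))  # 0.3 * 1.0 + 0.7 * 0.5 = 0.65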

Query Likelihood and the Translation Language Model
Document ranking based on query likelihood:
log p(q|d) = Σ_i log p(w_i|d)
with the document language model replaced by the translation language model:
p(w|d) = Σ_u pt(w|u) p_ml(u|d)
Do you see any problem?

Further Smoothing of the Translation Model for Computing Query Likelihood
Linear interpolation (Jelinek-Mercer):
p(w|d) = (1 - λ) Σ_u pt(w|u) p_ml(u|d) + λ p(w|C)
Bayesian interpolation (Dirichlet prior):
p(w|d) = ( |d| Σ_u pt(w|u) p_ml(u|d) + μ p(w|C) ) / (|d| + μ)
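
Putting the pieces together, a minimal Python sketch of the smoothed translation-model query likelihood under Jelinek-Mercer interpolation (the toy documents, translation table, and λ = 0.3 are illustrative; the Dirichlet variant only changes the mixing weights):

import math
from collections import Counter

docs = {"d1": "auto wash".split(), "d3": "car wash vehicle".split()}
pt = {("auto", "car"): 0.4, ("auto", "vehicle"): 0.3,
      ("auto", "auto"): 0.9, ("wash", "wash"): 0.9}

coll = Counter(w for d in docs.values() for w in d)
coll_total = sum(coll.values())

def p_w_d(w, doc, lam=0.3):
    # (1 - lambda) * sum_u pt(w|u) p_ml(u|d) + lambda * p(w|C)
    tf, n = Counter(doc), len(doc)
    p_trans = sum(pt.get((w, u), 0.0) * c / n for u, c in tf.items())
    return (1 - lam) * p_trans + lam * coll[w] / coll_total

def log_likelihood(query, doc):
    return sum(math.log(p_w_d(w, doc)) for w in query.split())

for name, d in docs.items():
    print(name, round(log_likelihood("auto wash", d), 3))

Note that d3 now receives substantial credit for "auto" even though the word never occurs in it, while the collection term keeps every probability non-zero.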

Experiment Design
- MI vs. synthetic-query estimation
  - Data sets: Associated Press (AP90) and San Jose Mercury News (SJMN) + TREC topics
  - Relatively small data sets, in order to compare our results with the synthetic queries of [Berger & Lafferty 99]
- MI translation model vs. basic query likelihood
  - Larger data sets: TREC7, TREC8 (plus AP90, SJMN)
  - TREC topics for TREC7 and for TREC8
- Additional issues
  - Regularization of self-translation?
  - Influence of smoothing on translation models?
  - Translation model + pseudo feedback?

Mutual information outperforms synthetic queries
(figure: MAP for Syn. Query vs. MI on AP90 + TREC queries, with Dirichlet prior smoothing)

Upper Bound Comparison of Mutual Information and Synthetic Queries
(table: MAP for Mutual Info vs. Syn. Query on AP and SJMN, under Dirichlet prior smoothing and under JM smoothing; SJMN: 0.197* with Dirichlet prior, 0.2* with JM, where * marks statistical significance)

Mutual information translation model outperforms basic query likelihood
(table: MAP for basic QL vs. the MI translation model on AP, SJMN, TREC7, and TREC8, under Dirichlet prior smoothing and under JM smoothing; * marks statistically significant improvements)

Translation model appears to need less collection smoothing than basic QL
(figure: retrieval performance as a function of the smoothing parameter, for the translation model and for basic query likelihood)

Translation model and pseudo feedback exploit word co-occurrences differently
(table: MAP on AP, SJMN, TREC7, and TREC8 with JM smoothing, for the baseline (BL), pseudo feedback (PFB, query model from pseudo FB), and pseudo feedback combined with the smoothed translation model (PFB+TM))

Regularization of self-translation is beneficial
(figure: retrieval performance as a function of the self-translation parameter α; AP data set, Dirichlet prior)

Summary
- Statistical translation language models are effective for bridging the vocabulary gap.
- Mutual information is more effective and more efficient than synthetic queries for estimating translation probabilities.
- Regularization of self-translation is beneficial.
- The translation model outperforms basic query likelihood on small and large collections, and is more robust.
- The translation model and pseudo feedback exploit word co-occurrences differently, and can be combined to further improve performance.

References
[1] A. Berger and J. Lafferty. Information Retrieval as Statistical Translation. ACM SIGIR, pages 222–229, 1999.
[2] P. Brown, S. A. D. Pietra, V. J. D. Pietra, and R. Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263–311, 1993.
[3] M. Karimzadehgan and C. Zhai. Estimation of Statistical Translation Models Based on Mutual Information for Ad Hoc Information Retrieval. ACM SIGIR, 2010.