Multi-Style Language Model for Web Scale Information Retrieval. Kuansan Wang, Xiaolong Li and Jianfeng Gao, SIGIR 2010. Min-Hsuan Lai, Department of Computer Science & Information Engineering, National Taiwan Normal University.

Presentation transcript:

Multi-Style Language Model for Web Scale Information Retrieval Kuansan Wang, Xiaolong Li and Jianfeng Gao SIGIR 2010 Min-Hsuan Lai Department of Computer Science & Information Engineering National Taiwan Normal University

2 Outline: Introduction; Web language style analysis; Current state of open vocabulary language model; Component model smoothing using CALM; Mixture language model; Web search experiments; Summary.

3 Introduction. Ponte and Croft introduced language modeling (LM) techniques to information retrieval (IR), and LM-based retrieval has since become an important research area. The motivation is simple: just as a speech recognition system should transcribe speech into the most likely uttered text, an IR system should retrieve the documents that have the highest probability of meeting the information need encoded in the query.

4 Introduction. A question still at the center of LM for IR research is what the document language model is and how it can be obtained. Given a query Q, a minimum-risk retrieval system should rank each document D by the product of the likelihood of the query under the document language model, P(Q|D), and the prior of the document, P(D).
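A minimal sketch of this ranking rule with unigram document models and a user-supplied document prior; the function names and the 1e-9 floor (standing in for the smoothing discussed on later slides) are illustrative, not from the paper.

import math

def score(query_terms, doc_lm, log_prior=0.0):
    # Rank value log P(D) + sum over query terms of log P(q | D).
    # doc_lm maps term -> probability; the 1e-9 floor is a stand-in for
    # the smoothing discussed on the later slides.
    return log_prior + sum(math.log(doc_lm.get(t, 1e-9)) for t in query_terms)

def rank(query_terms, docs):
    # docs maps a document id to its unigram LM; higher score ranks first.
    return sorted(docs, key=lambda d: score(query_terms, docs[d]), reverse=True)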

5 Introduction. Such a mixture distribution has been widely used for LM in speech and language processing as well as in IR. In this paper, they propose an information-theoretically motivated method towards open vocabulary LMs. Applying this principle to document modeling, let D_i denote the i-th stream of D and P(q|D_i) the corresponding component LM for that stream, so that the document model is the mixture P(q|D) = Σ_i λ_i P(q|D_i).

6 Web language style analysis. The cross entropy between a model P and a model Q is H(P; Q) = -Σ_x P(x) log Q(x). Since the negative logarithm is convex, it can easily be shown that the cross entropy is smallest when the two models are identical. The cross entropy can therefore be viewed as a measurement that quantifies how different the two models are.

7 Web language style analysis. The entropy of a model P is its cross entropy with itself, H(P) = H(P; P) = -Σ_x P(x) log P(x). To avoid confusion about the base of the logarithm, they further convert the entropy into the corresponding linear-scale perplexity measurement, namely PPL(P; Q) = b^H(P; Q), where b is the base of the logarithm.
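Both measurements can be computed directly from unigram models. A sketch, assuming Q assigns probability to every term that P supports (an open-vocabulary model); the 1e-12 floor stands in for the OOV mass and is an illustrative choice.

import math

def cross_entropy(p, q, base=2.0):
    # H(P; Q) = -sum_x P(x) * log_base Q(x); smallest when Q equals P.
    # The 1e-12 floor stands in for the OOV mass of an open-vocabulary Q.
    return -sum(px * math.log(q.get(x, 1e-12), base) for x, px in p.items() if px > 0)

def perplexity(p, q, base=2.0):
    # Linear-scale counterpart of the cross entropy: PPL = base ** H(P; Q).
    return base ** cross_entropy(p, q, base)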

8 Web language style analysis. As their main objective is to investigate how these language sources can be used to model user queries, they study the cross entropy between the query LM and the LMs of the other streams; the results are shown in Figure 1 of the paper.

9 Web language style analysis. Up to trigrams, the increased modeling power of a higher order uniformly improves the perplexities of all streams against queries, although this added capacity can also amplify the style mismatch, which eventually leads to a perplexity increase at higher orders. For the document body and title, the payoff of using more powerful LMs seems to taper off at bigrams, whereas trigrams may still be worthwhile for the anchor text.

10 Current state of open vocabulary language model. For any unigram LM with vocabulary V, the probabilities of all in-vocabulary and OOV tokens sum to 1: Σ_{w in V} P(w) + P(OOV) = 1. An open-vocabulary LM is a model that reserves a non-zero probability mass for OOVs, i.e., P(OOV) > 0.

11 Current state of open vocabulary language model. When an open-vocabulary model is used to evaluate a text corpus and encounters k additional distinct OOV tokens, the maximum entropy principle is often applied to distribute the OOV mass evenly among these newly discovered OOVs, i.e., each receives P(OOV)/k. The key question is how much probability mass one should take away from V and reserve for OOVs.

12 Current state of open vocabulary language model. By applying the Good-Turing formula for r = 0, the total probability mass that should be reserved for all unseen tokens is P(OOV) = n_1/N, where n_1 is the number of tokens that occur exactly once and N denotes the total number of tokens in the corpus.
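A sketch of this zero-frequency estimate: reserve n_1/N for OOVs and discount the in-vocabulary probabilities accordingly. The helper names are illustrative, not from the paper.

def oov_mass(counts):
    # Good-Turing at r = 0: reserve n1 / N, where n1 is the number of terms
    # seen exactly once and N is the total number of tokens in the corpus.
    n1 = sum(1 for c in counts.values() if c == 1)
    n_total = sum(counts.values())
    return n1 / n_total if n_total else 0.0

def open_vocab_unigram(counts):
    # Discount the maximum likelihood estimates so that oov_mass is left
    # over for unseen tokens; when k new OOV tokens are later observed,
    # each can receive p_oov / k by the maximum entropy argument above.
    p_oov = oov_mass(counts)
    n_total = sum(counts.values())
    lm = {t: (1.0 - p_oov) * c / n_total for t, c in counts.items()}
    return lm, p_oov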

13 Component model smoothing using CALM. The goal of adaptation is to find a target model P_A such that P_A and the observation P_O are reasonably close while the modification of the background LM P_B is minimized. It can easily be verified that such a constraint can be met if the adjustment vector is chosen along the direction of the difference vector (P_O - P_B), namely P_A = P_B + α(P_O - P_B), where α is the adaptation coefficient.
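Under this reading, the adaptation step is a per-term interpolation between background and observation. A sketch in which the adaptation coefficient alpha is supplied externally rather than computed with the closed-form formula of the next slide; the function name is illustrative.

def calm_adapt(p_background, p_observation, alpha):
    # Move the background LM toward the observation along the difference
    # vector: P_A(t) = P_B(t) + alpha * (P_O(t) - P_B(t)) for every term t.
    terms = set(p_background) | set(p_observation)
    return {t: p_background.get(t, 0.0)
               + alpha * (p_observation.get(t, 0.0) - p_background.get(t, 0.0))
            for t in terms}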

14 Component model smoothing using CALM. A significant contribution of CALM is deriving how the adaptation coefficient can be calculated analytically when the underlying LM is an N-gram model with a multinomial distribution. They show that the adaptation coefficient for a given set of observations O can be computed in closed form from the document length and the term frequency n(t) of each term t.

15 Component model smoothing using CALM. They further extend the idea down to N = 1, where the observation P_O and the background P_B are the unigram and zero-gram LMs, respectively. Conventionally, the zero-gram LM refers to the least informative LM that treats every token as OOV, i.e., its probability mass is allocated exclusively to OOV. Given an observation with vocabulary size |V_O|, a zero-gram LM would distribute its probability mass uniformly over that vocabulary, P(w) = 1/|V_O| for every w in V_O.

16 Component model smoothing using CALM. The adaptation coefficient can then be converted from the logarithmic to the linear scale, which expresses the discount factor in terms of the perplexity and the vocabulary size (equation (6) of the paper).

17 Component model smoothing using CALM. For each document D and each stream i, they first create a closed-vocabulary maximum likelihood model as the observation. The vocabulary for the stream and the closed-vocabulary stream collection model are obtained from these observations. The discount factor is computed with (6) and is used to attenuate the in-vocabulary probabilities, yielding the open-vocabulary stream collection model.

18 Component model smoothing using CALM. Finally, the stream collection model is used as the background to obtain the smoothed document stream model through linear interpolation between the document stream model and the stream collection model.
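The final step is a standard linear interpolation of the document stream model with its stream collection background. A sketch in which the interpolation weight lam is simply a free parameter; how the weight is set is not specified here.

def smooth_stream_lm(p_doc_stream, p_collection_stream, lam):
    # P(t | D_i) = (1 - lam) * P_ML(t | D_i) + lam * P(t | C_i)
    terms = set(p_doc_stream) | set(p_collection_stream)
    return {t: (1.0 - lam) * p_doc_stream.get(t, 0.0)
               + lam * p_collection_stream.get(t, 0.0)
            for t in terms}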

19 Mixture language model. They note that the CALM adaptation formula can be applied to compute the weights of the multi-component mixture (2) by first re-arranging the distribution into a two-component mixture. Accordingly, it seems appropriate to choose the query stream as D_0 in (8), so that the CALM formula functions as adapting the other mixture components to the query.

20 Mixture language model. As shown in (7), each mixture component is itself a two-component mixture with parameters to be determined. The re-estimation formulas for these parameters are obtained with the Expectation-Maximization (EM) algorithm.
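A sketch of the generic EM update for the stream mixture weights λ_i in P(q|D) = Σ_i λ_i P(q|D_i). This shows the standard mixture-weight re-estimation, not necessarily the paper's exact formula, which also re-estimates the within-component interpolation parameters.

def em_mixture_weights(query_terms, stream_lms, iters=20):
    # stream_lms: one smoothed LM per stream, each mapping term -> probability.
    # Assumes a non-empty query; the 1e-12 floor keeps the mixture positive.
    k = len(stream_lms)
    lam = [1.0 / k] * k                                  # start from uniform weights
    for _ in range(iters):
        resp = [0.0] * k
        for t in query_terms:
            mix = [lam[i] * stream_lms[i].get(t, 1e-12) for i in range(k)]
            z = sum(mix)
            for i in range(k):
                resp[i] += mix[i] / z                    # E-step: posterior of stream i
        lam = [r / len(query_terms) for r in resp]       # M-step: renormalize
    return lam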

21 Web search experiments Single Style LM

22 Web search experiments Multi-Style LM

23 Summary. The key question in using LMs for IR is how to create an LM for each document that best models the queries used to retrieve that document. The immediate question is how LMs with different vocabulary sets can be combined in a principled way. They propose an alternative based on rigorous mathematical derivations with few assumptions. Their experiments show that the proposed approach can produce retrieval performance close to the high-end oracle results.