
Language Models for TR
Rong Jin
Department of Computer Science and Engineering, Michigan State University

What is a Statistical LM?
A probability distribution over word sequences:
p("Today is Wednesday") ≈ 0.001
p("Today Wednesday is") ≈ 0.0000000000001
p("The eigenvalue is positive") ≈ 0.00001
Context-dependent! An LM can also be regarded as a probabilistic mechanism for "generating" text, and is thus also called a "generative" model.

Why is an LM Useful?
It provides a principled way to quantify the uncertainties associated with natural language, and allows us to answer questions like:
- Given that we see "John" and "feels", how likely is "happy" as the next word, as opposed to "habit"? (speech recognition)
- Given that we observe "baseball" three times and "game" once in a news article, how likely is the article to be about "sports"? (text categorization, information retrieval)
- Given that a user is interested in sports news, how likely is the user to use "baseball" in a query? (information retrieval)

The Simplest Language Model (Unigram Model)
Generate a piece of text by generating each word INDEPENDENTLY:
p(w1 w2 ... wn) = p(w1) p(w2) ... p(wn)
Parameters: {p(wi)}, with p(w1) + ... + p(wN) = 1 (N is the vocabulary size); essentially a multinomial distribution over words.
A piece of text can be regarded as a sample drawn according to this word distribution.
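
To make the independence assumption concrete, here is a minimal sketch (with hypothetical word probabilities) of computing a text's likelihood under a unigram model:

```python
import math

# Hypothetical unigram probabilities; a real model assigns a probability
# to every vocabulary word, and the probabilities sum to 1.
p = {"today": 0.002, "is": 0.04, "wednesday": 0.001}

def unigram_log_likelihood(words, p):
    # p(w1 w2 ... wn) = p(w1) p(w2) ... p(wn), so the log-likelihood is
    # the sum of per-word log probabilities.
    return sum(math.log(p[w]) for w in words)

print(unigram_log_likelihood(["today", "is", "wednesday"], p))
```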

Text Generation with Unigram LM
A (unigram) language model θ gives p(w|θ); sampling from it generates a document.
Topic 1 ("Text mining"): text 0.2, mining 0.1, association 0.01, clustering 0.02, ..., food 0.00001 → a text mining paper.
Topic 2 ("Health"): food 0.25, nutrition 0.1, healthy 0.05, diet 0.02, ... → a food nutrition paper.
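
A minimal sampling sketch, using topic 1's probabilities from the slide (renormalized over the few words shown, since a real model covers the whole vocabulary):

```python
import random

# Topic 1 ("text mining") word distribution from the slide; only a few
# words are shown, so random.choices renormalizes the weights.
topic1 = {"text": 0.2, "mining": 0.1, "association": 0.01,
          "clustering": 0.02, "food": 0.00001}

def sample_document(lm, length):
    # Each word is drawn INDEPENDENTLY from the same distribution.
    return random.choices(list(lm), weights=list(lm.values()), k=length)

print(" ".join(sample_document(topic1, 10)))
```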

Estimation of Unigram LM
Given a document, estimate the language model θ with p(w|θ) = ? for each word. For a "text mining paper" d (total #words = 100) with counts text 10, mining 5, association 3, database 3, algorithm 2, ..., query 1, efficient 1, the relative-frequency estimates are p("text"|d) = 10/100, p("mining"|d) = 5/100, p("association"|d) = 3/100, ..., p("query"|d) = 1/100.
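
A sketch of the maximum-likelihood estimate p(w|d) = c(w,d)/|d| used above (the document text here is an illustrative stand-in):

```python
from collections import Counter

def ml_estimate(doc_words):
    # Maximum-likelihood unigram estimate: p(w|d) = c(w,d) / |d|.
    counts = Counter(doc_words)
    return {w: c / len(doc_words) for w, c in counts.items()}

doc = "text mining text mining text association database query".split()
print(ml_estimate(doc))  # p("text"|d) = 3/8, p("mining"|d) = 2/8, ...
```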

Language Models for Retrieval (Ponte & Croft 98)
Query = "data mining algorithms". Given the word probabilities (text ?, mining ?, association ?, clustering ?, food ?, nutrition ?, healthy ?, diet ?, ...) estimated from each document: which model, the text mining paper's or the food nutrition paper's, would most likely have generated this query?

Ranking Docs by Query Likelihood
Estimate a language model for each document d1, d2, ..., dN, then rank the documents by the likelihood of the query q under each document's LM: p(q|d1), p(q|d2), ..., p(q|dN).
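
A sketch of query-likelihood ranking with unsmoothed ML document models; note that any query word unseen in a document drives p(q|d) to zero, which is exactly the problem smoothing addresses later:

```python
import math

def query_log_likelihood(query_words, doc_lm):
    # log p(q|d) = sum over query words of log p(w|d); a single unseen
    # word makes the whole query impossible under an unsmoothed model.
    score = 0.0
    for w in query_words:
        p = doc_lm.get(w, 0.0)
        if p == 0.0:
            return float("-inf")
        score += math.log(p)
    return score

def rank_docs(query_words, doc_lms):
    # doc_lms: {doc_id: {word: p(w|d)}}; higher log-likelihood ranks first.
    return sorted(doc_lms,
                  key=lambda d: query_log_likelihood(query_words, doc_lms[d]),
                  reverse=True)
```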

But where is the relevance? And what's good about this approach?

The Notion of Relevance
- Similarity between representations, (Rep(q), Rep(d)): vector space model (Salton et al., 75); probabilistic distribution model (Wong & Yao, 89); ...
- Probability of relevance, P(r=1|q,d) with r in {0,1}:
  - Generative models, document generation: classical probabilistic model (Robertson & Sparck Jones, 76)
  - Generative models, query generation: LM approach (Ponte & Croft, 98; Lafferty & Zhai, 01a)
  - Regression (Fox 83)
- Probabilistic inference, P(d → q) or P(q → d): inference network model (Turtle & Croft, 91) and other inference systems; probabilistic concept space model (Wong & Yao, 95)

Refining P(R=1|Q,D), Method 2: Generative Models
Basic idea: define P(Q,D|R), then compute P(R|Q,D) using Bayes' rule.
Special cases:
- Document "generation": P(Q,D|R) = P(D|Q,R) P(Q|R), where P(Q|R) can be ignored when ranking documents D
- Query "generation": P(Q,D|R) = P(Q|D,R) P(D|R)

Query likelihood p(q| d) Query Generation Query likelihood p(q| d) Document prior Assuming uniform prior, we have Now, the question is how to compute ? Generally involves two steps: (1) estimate a language model based on D (2) compute the query likelihood according to the estimated model

Retrieval as Language Model Estimation
Document ranking is based on the query likelihood under the document language model:
log p(q|d) = log p(w1|d) + ... + log p(wn|d), where q = w1 w2 ... wn
The retrieval problem thus reduces to estimating p(wi|d). Smoothing of this estimate is an important issue, and it distinguishes different approaches.

A General Smoothing Scheme
All smoothing methods try to:
- discount the probability of words seen in a doc, and
- re-allocate the extra probability so that unseen words have a non-zero probability.
Most use a reference model (the collection language model) to discriminate unseen words:
p(w|d) = p_seen(w|d) if w is seen in d (the discounted ML estimate)
p(w|d) = α_d p(w|C) otherwise (the collection language model, scaled by a document-dependent constant α_d)
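
A sketch of the scheme; p_seen and α_d are supplied by a concrete smoothing method (the slide after next instantiates them three ways):

```python
def smoothed_prob(w, doc_counts, p_coll, p_seen, alpha_d):
    # General scheme: discounted ML estimate for words seen in the doc,
    # alpha_d * p(w|C) for unseen words, so every word gets non-zero mass.
    if doc_counts.get(w, 0) > 0:
        return p_seen(w)           # discounted ML estimate
    return alpha_d * p_coll[w]     # re-allocated mass via collection LM
```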

Smoothing & TF-IDF Weighting
Plugging the general smoothing scheme into the query likelihood retrieval formula, we obtain:
log p(q|d) = Σ_{w in q, seen in d} c(w,q) log[ p_seen(w|d) / (α_d p(w|C)) ] + n log α_d + Σ_{w in q} c(w,q) log p(w|C)   (n = |q|, the query length)
- The first sum acts like TF weighting combined with IDF weighting: dividing by p(w|C) penalizes common words.
- n log α_d provides doc length normalization (a long doc is expected to have a smaller α_d).
- The last sum does not depend on d, so it can be ignored for ranking.
Thus smoothing with p(w|C) ≈ TF-IDF weighting + length normalization.

Three Smoothing Methods (Zhai & Lafferty 01)
- Simplified Jelinek-Mercer: shrink uniformly toward p(w|C): p(w|d) = (1 − λ) p_ml(w|d) + λ p(w|C)
- Dirichlet prior (Bayesian): assume μ p(w|C) pseudo counts: p(w|d) = (c(w,d) + μ p(w|C)) / (|d| + μ)
- Absolute discounting: subtract a constant δ from each seen word's count: p(w|d) = (max(c(w,d) − δ, 0) + δ |d|_u p(w|C)) / |d|, where |d|_u is the number of unique words in d
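
Minimal sketches of the three methods (the parameter values for λ, μ, and δ are illustrative defaults, not tuned settings):

```python
def jelinek_mercer(c_wd, doc_len, p_c, lam=0.5):
    # p(w|d) = (1 - lam) * c(w,d)/|d| + lam * p(w|C)
    return (1 - lam) * c_wd / doc_len + lam * p_c

def dirichlet_prior(c_wd, doc_len, p_c, mu=2000):
    # p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu)
    return (c_wd + mu * p_c) / (doc_len + mu)

def absolute_discount(c_wd, doc_len, n_unique, p_c, delta=0.7):
    # p(w|d) = (max(c(w,d) - delta, 0) + delta * |d|_u * p(w|C)) / |d|
    return (max(c_wd - delta, 0.0) + delta * n_unique * p_c) / doc_len
```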

Comparison of Three Methods

The Need for Query Modeling (Dual Role of Smoothing)
Retrieval performance is sensitive to the smoothing parameter in different ways for keyword queries and for verbose queries [sensitivity plots omitted], suggesting that smoothing plays a second role, modeling the query, beyond estimating the document model.

Another Reason for Smoothing
Query q = "the algorithms for data mining"
             the     algorithms   for     data    mining
p(w|d1):    0.04    0.001        0.02    0.002   0.003
p(w|d2):    0.02    0.001        0.01    0.003   0.004
Here p("algorithms"|d1) = p("algorithms"|d2), p("data"|d1) < p("data"|d2), and p("mining"|d1) < p("mining"|d2), yet p(q|d1) = 4.8 × 10^-12 > p(q|d2) = 2.4 × 10^-12, because d1 assigns higher probability to the common words "the" and "for". We should smooth so that p("the"|d) and p("for"|d) are less different across docs.

Two-Stage Smoothing
P(w|d) = (1 − λ) · (c(w,d) + μ p(w|C)) / (|d| + μ) + λ p(w|U)
Stage 1 (Dirichlet prior, Bayesian): smooth the document model with the collection LM p(w|C) to explain unseen words.
Stage 2 (two-component mixture): interpolate with a user background model p(w|U) to explain noise in the query.
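
A sketch combining the two stages (μ and λ are illustrative here; the next two slides estimate them automatically):

```python
def two_stage_prob(c_wd, doc_len, p_coll, p_user, mu=2000, lam=0.1):
    # Stage 1 (Dirichlet prior): smooth the document model toward the
    # collection LM p(w|C) to explain unseen words.
    stage1 = (c_wd + mu * p_coll) / (doc_len + mu)
    # Stage 2 (two-component mixture): interpolate with the user
    # background model p(w|U) to explain noise in the query.
    return (1 - lam) * stage1 + lam * p_user
```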

Estimating μ Using Leave-One-Out
Hold out each word occurrence w_i from its document in turn and compute P(w_i | d − w_i) under the Dirichlet-smoothed model; choose μ as the maximum likelihood estimator of the resulting leave-one-out log-likelihood over the collection C, solved with Newton's method:
l(μ|C) = Σ_d Σ_w c(w,d) log[ (c(w,d) − 1 + μ p(w|C)) / (|d| − 1 + μ) ]
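
A sketch of the leave-one-out objective; the slide maximizes it with Newton's method, and the coarse grid search here is a simpler stand-in for illustration:

```python
import math

def loo_log_likelihood(mu, docs, p_coll):
    # docs: list of {word: c(w,d)} count dicts; each occurrence of w is
    # predicted from its document with that single occurrence held out.
    ll = 0.0
    for counts in docs:
        d_len = sum(counts.values())
        for w, c in counts.items():
            ll += c * math.log((c - 1 + mu * p_coll[w]) / (d_len - 1 + mu))
    return ll

def estimate_mu(docs, p_coll, grid=(100, 500, 1000, 2000, 5000)):
    return max(grid, key=lambda mu: loo_log_likelihood(mu, docs, p_coll))
```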

Estimating λ Using a Mixture Model
Stage 1: estimate Dirichlet-smoothed document models p(w|d_1), ..., p(w|d_N) with the μ found above.
Stage 2: model each query word as drawn from the two-component mixture (1 − λ) p(w|d_i) + λ p(w|U).
λ is then the maximum likelihood estimator under this mixture, computed with the Expectation-Maximization (EM) algorithm.
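
A sketch of the EM update for the single mixture weight λ, fit to (query, document model) pairs; pairing each query with one document model is this sketch's simplification of the slide's setup:

```python
def estimate_lambda(pairs, p_user, n_iter=50, lam=0.5):
    # pairs: list of (query_words, doc_lm), where doc_lm maps w -> p(w|d_i)
    # (already smoothed by stage 1) and p_user maps w -> p(w|U).
    for _ in range(n_iter):
        posterior_sum, n_words = 0.0, 0
        for query_words, doc_lm in pairs:
            for w in query_words:
                noise = lam * p_user[w]
                # E-step: posterior that w was generated by the user
                # background model rather than the document model.
                posterior_sum += noise / (noise + (1 - lam) * doc_lm[w])
                n_words += 1
        lam = posterior_sum / n_words  # M-step: expected noise fraction
    return lam
```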

Automatic 2-Stage Results ≈ Optimal 1-Stage Results
Average precision across 3 databases and 4 query types (150 topics).

Acknowledgement
Many thanks to ChengXiang Zhai, who generously shared his slides on the language modeling approach to information retrieval.