Language Models for Information Retrieval


Language Models for Information Retrieval Andy Luong and Nikita Sudan

Outline Language Model Types of Language Models Query Likelihood Model Smoothing Evaluation Comparison with other approaches

Language Model A language model is a function that puts a probability measure over strings drawn from some vocabulary.

Language Models P(q|Md) instead of P(R=1|q,d) A traditional generative model of a language, of the kind familiar from formal language theory, can be used either to recognize or to generate strings. The full set of strings that can be generated is called the language of the automaton.

Example Doc1: "frog said that toad likes frog" Doc2: "toad likes frog"

Unigram models estimated from each document (STOP probability .2, continue .8):

      frog  said  that  toad  likes  STOP
M1:   1/3   1/6   1/6   1/6   1/6    .2
M2:   1/3   0     0     1/3   1/3    .2

Example Continued q = "frog likes toad" P(q | M1) = (1/3)*(1/6)*(1/6)*0.8*0.8*0.2 P(q | M2) = (1/3)*(1/3)*(1/3)*0.8*0.8*0.2 So P(q | M1) < P(q | M2): the query is more likely under Doc2's model.
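The arithmetic on this slide can be reproduced with a short Python sketch (illustrative only; the function names are my own, and the STOP/continue probabilities follow the example above):

```python
from collections import Counter

def unigram_lm(doc):
    """Maximum-likelihood unigram model: P(t|Md) = count(t) / |d|."""
    counts = Counter(doc.split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def query_likelihood(query, lm, p_stop=0.2):
    """P(q|Md): product of term probabilities, with a 'continue'
    factor (1 - p_stop) between terms and one STOP at the end."""
    terms = query.split()
    p = 1.0
    for t in terms:
        p *= lm.get(t, 0.0)
    return p * (1 - p_stop) ** (len(terms) - 1) * p_stop

m1 = unigram_lm("frog said that toad likes frog")
m2 = unigram_lm("toad likes frog")
p1 = query_likelihood("frog likes toad", m1)  # (1/3)(1/6)(1/6)(.8)(.8)(.2)
p2 = query_likelihood("frog likes toad", m2)  # (1/3)^3 (.8)(.8)(.2)
```

Note that with no smoothing, any query term absent from the document drives the whole product to zero.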

Types of Language Models CHAIN RULE, UNIGRAM LM, BIGRAM LM For IR purposes, we use unigrams. Why? Structure is not as important; a single document makes it difficult to gather sufficient training data; and sparseness outweighs the benefit of added complexity.
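The three model types named on this slide can be written out for a four-term sequence (reconstructed in standard notation; the slide's own formula images are not in the transcript):

```latex
\begin{aligned}
\text{chain rule:} \quad & P(t_1 t_2 t_3 t_4) = P(t_1)\,P(t_2 \mid t_1)\,P(t_3 \mid t_1 t_2)\,P(t_4 \mid t_1 t_2 t_3)\\
\text{unigram:}    \quad & P(t_1 t_2 t_3 t_4) = P(t_1)\,P(t_2)\,P(t_3)\,P(t_4)\\
\text{bigram:}     \quad & P(t_1 t_2 t_3 t_4) = P(t_1)\,P(t_2 \mid t_1)\,P(t_3 \mid t_2)\,P(t_4 \mid t_3)
\end{aligned}
```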

Multinomial distribution Bag of words == unigram model: term order does not matter, so the multinomial coefficient (which only counts orderings) has no effect on ranking. Pretend d is only a representative sample of text drawn from a model distribution; we then estimate a language model from this sample, and use that model to calculate the probability of observing any word sequence, such as a query. M is the size of the term vocabulary.

Query Likelihood Model Construct a language model for each document. Our goal is to rank documents by P(d|q), where the probability of a document is interpreted as the likelihood that it is relevant to the query. By Bayes' rule, P(d|q) = P(q|d)P(d)/P(q). P(q) is the same for all documents and can thus be ignored. Similarly, P(d) is treated as uniform across all d and so it can also be ignored, although a genuine prior could include criteria like authority, length, genre, newness, and the number of people who have previously read the document. Documents are therefore ranked by the probability that the query would be observed as a random sample from the respective document model: each document is treated as a separate class.

Query Likelihood Model Infer an LM for each document, estimate P(q | Md(i)), and rank documents by these probabilities. Intuition about users: a user knows that certain terms will appear in a document of interest, and poses a query that distinguishes those documents from the rest of the collection.

MLE Term probabilities are estimated by maximum likelihood: P(t | Md) is the number of occurrences of t in d divided by the total number of tokens in d.

Smoothing Basic intuition: a new or unseen word in the document gives P(t | Md) = 0, and a single zero probability makes P(q | Md) = 0. Why else should we smooth? To discount the weights of very uncommon words, not just to remove zero probabilities.

Smoothing Continued What should we do about non-occurring terms? Add 1, ½, or epsilon to each count, or use collection information: the linear interpolation language model bounds a non-occurring term's probability below by its collection frequency. A large lambda puts more emphasis on the collection, i.e. more smoothing. We can also imagine varying lambda with document size: a small document may need more smoothing, while an infinitely large document would need none.
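The linear interpolation estimate referred to above, reconstructed in standard notation (the sign convention on λ is chosen to match the slide's remark that a large λ means more smoothing):

```latex
P(t \mid d) = (1-\lambda)\,\hat{P}_{\mathrm{mle}}(t \mid M_d) + \lambda\,\hat{P}_{\mathrm{mle}}(t \mid M_c)
```

With λ = 1 the document is ignored entirely; with λ = 0 there is no smoothing at all.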

Example Doc1: "frog said that toad likes frog" Doc2: "toad likes frog"

Document models and the collection model C (estimated over both documents):

      frog  said  that  toad  likes
M1:   1/3   1/6   1/6   1/6   1/6
M2:   1/3   0     0     1/3   1/3
C:    1/3   1/9   1/9   2/9   2/9

Example Continued q = "frog said", λ = ½ P(q | M1) = [(1/3 + 1/3)*(1/2)] * [(1/6 + 1/9)*(1/2)] ≈ .046 P(q | M2) = [(1/3 + 1/3)*(1/2)] * [(0 + 1/9)*(1/2)] ≈ .018 P(q | M1) > P(q | M2)
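A sketch in Python that reproduces these numbers (function names are mine; with λ = ½ the two interpolation conventions coincide):

```python
from collections import Counter

def smoothed_likelihood(query, doc, collection, lam=0.5):
    """Jelinek-Mercer smoothing: interpolate the document MLE with
    the collection MLE, then multiply over the query terms."""
    d, c = Counter(doc.split()), Counter(collection.split())
    ld, lc = sum(d.values()), sum(c.values())
    p = 1.0
    for t in query.split():
        p *= (1 - lam) * d[t] / ld + lam * c[t] / lc
    return p

doc1 = "frog said that toad likes frog"
doc2 = "toad likes frog"
collection = doc1 + " " + doc2
p1 = smoothed_likelihood("frog said", doc1, collection)  # 5/108 ≈ .046
p2 = smoothed_likelihood("frog said", doc2, collection)  # 1/54  ≈ .018
```

Smoothing rescues Doc2 from a zero score ("said" never occurs in it) while still ranking Doc1 higher.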

Evaluation Precision = (relevant documents ∩ retrieved documents)/ retrieved documents Recall = (relevant documents ∩ retrieved documents)/ relevant documents
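The set definitions above translate directly into code (a toy illustration; the document IDs are invented):

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall over document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

# 2 of 4 retrieved docs are relevant; 2 of 3 relevant docs were retrieved
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```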

Tf-Idf A term's importance increases proportionally with the number of times it appears in the document, but is offset by the frequency of the term in the corpus. For tf, the numerator is the number of occurrences of the term in the document and the denominator is the total number of term occurrences in the document. For idf, the numerator is the total number of documents and the denominator is the number of documents in which the term occurs.
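These definitions in code (one common variant; idf formulations differ in log base and smoothing, so treat this as a sketch):

```python
import math
from collections import Counter

def tf_idf(term, doc, docs):
    """tf: occurrences of term in doc over total term occurrences in doc.
    idf: log of (total documents / documents containing the term)."""
    tokens = doc.split()
    tf = Counter(tokens)[term] / len(tokens)
    df = sum(1 for d in docs if term in d.split())
    idf = math.log(len(docs) / df)
    return tf * idf

docs = ["frog said that toad likes frog", "toad likes frog"]
w_said = tf_idf("said", docs[0], docs)  # (1/6) * log(2/1)
w_frog = tf_idf("frog", docs[0], docs)  # idf = log(2/2) = 0, so weight 0
```

A term that occurs in every document gets idf 0 and so carries no discriminative weight, however frequent it is.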

Ponte and Croft’s Experiments They use a multivariate Bernoulli model instead of a mixture of two multinomials, evaluated on TREC topics 202-250 over TREC disks 2 and 3.

Pros and Cons “Mathematically precise, conceptually simple, computationally tractable and intuitively appealing.” However, relevance is not explicitly captured: the LM approach assumes that documents and expressions of information needs are objects of the same type, and assesses their match by importing the tools and methods of language modeling from speech and natural language processing.

Query vs. Document Model Why is query likelihood more appealing than document likelihood? There is far more data available in the document than in the query. Three variants: (a) Query Likelihood (b) Document Likelihood (c) Model Comparison

KL Divergence Kullback-Leibler divergence has been shown to give better results than query likelihood or document likelihood alone, but its scores are not comparable across queries. KL divergence is asymmetric: it asks how bad Md is at modeling Mq, i.e. the risk of using one model in place of the other. A large divergence means the models do not agree.
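A minimal sketch of the divergence itself (names are mine; the epsilon floor is a crude stand-in for smoothing the document model, which is needed in practice):

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """D(p || q) = sum_t p(t) * log(p(t) / q(t)); note the asymmetry."""
    support = set(p) | set(q)
    return sum(
        p[t] * math.log(p[t] / max(q.get(t, 0.0), eps))
        for t in support
        if p.get(t, 0.0) > 0
    )

mq = {"frog": 0.5, "said": 0.5}                  # query model
md = {"frog": 1/3, "said": 1/6, "that": 1/6,
      "toad": 1/6, "likes": 1/6}                 # document model

d_qd = kl_divergence(mq, md)   # how badly Md models Mq
d_dq = kl_divergence(md, mq)   # the reverse; generally different
```

The divergence of a model from itself is zero, and swapping the arguments changes the score, which is the asymmetry the slide mentions.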

Thank you.

Questions?