Information Retrieval and Organisation Chapter 12 Language Models for Information Retrieval Dell Zhang Birkbeck, University of London.

Presentation transcript:

Standard Probabilistic IR
[Diagram: an information need is expressed as a query, which is matched against the documents d1, d2, …, dn in the collection.]

IR based on Language Models (LM)
[Diagram: instead of matching, each document d1, d2, …, dn in the collection is viewed as generating the query that expresses the information need.]
A common search heuristic is to use words that you expect to find in matching documents as your query. The LM approach directly exploits that idea!

What is a Statistical LM?
- A probability distribution over word sequences, e.g. P("Today is Wednesday") vs. P("Today Wednesday is") vs. P("The eigenvalue is positive"): a grammatical, on-topic sentence should receive a much higher probability, so the distribution is context/topic dependent!
- Can also be regarded as a probabilistic mechanism for "generating" text, thus also called a "generative" model: a piece of text can be regarded as a sample drawn according to the word distribution.

Why is a LM Useful?
Allows us to answer questions like:
- Given that we see "John" and "feels", how likely is "happy", as opposed to "habit", as the next word?
- Given that we observe "baseball" three times and "game" once in a news article, how likely is the article to be about sports?
- Given that a user is interested in sports news, how likely is the user to use "baseball" in a query?

The Simplest Language Model: the Unigram Model
- Generate a piece of text by generating each word independently.
- Thus P(t1 t2 … tn) = P(t1) P(t2) … P(tn).
- Essentially a multinomial distribution over words.

Unigram LM
Model M: P(the) = 0.2, P(a) = 0.1, P(man) = 0.01, P(woman) = 0.01, P(said) = 0.03, P(likes) = 0.02, …
s = "the man likes the woman"
P(s|M) = 0.2 × 0.01 × 0.02 × 0.2 × 0.01 = 0.00000008 (multiply the probabilities of the individual words).
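
A minimal sketch (Python, reusing the toy probabilities from this slide): P(s|M) is just the product of the unigram probabilities of the words in s.

```python
from functools import reduce

# Toy unigram model M from the slide above.
M = {"the": 0.2, "a": 0.1, "man": 0.01, "woman": 0.01, "said": 0.03, "likes": 0.02}

def unigram_prob(sentence, model):
    """P(s|M): multiply the unigram probabilities of the words in s."""
    return reduce(lambda p, w: p * model.get(w, 0.0), sentence.split(), 1.0)

print(unigram_prob("the man likes the woman", M))  # 0.2*0.01*0.02*0.2*0.01 ≈ 8e-08
```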

Unigram LM
Two models, scored on the string s = "the class pleaseth yon maiden":
Model M1: P(the) = 0.2, P(class) = 0.01, P(sayst) = …, P(pleaseth) = …, P(yon) = …, P(maiden) = …, P(woman) = 0.01
Model M2: P(the) = 0.2, P(class) = …, P(sayst) = 0.03, P(pleaseth) = 0.02, P(yon) = 0.1, P(maiden) = 0.01, P(woman) = …
Result: P(s|M2) > P(s|M1).

Text Generation with Unigram LM
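
A minimal sketch of what "generation" means here, reusing the toy model M above: each word is drawn independently from the distribution, so the output is topical but not grammatical.

```python
import random

M = {"the": 0.2, "a": 0.1, "man": 0.01, "woman": 0.01, "said": 0.03, "likes": 0.02}

def generate(model, length=5):
    """Sample `length` words i.i.d. from the unigram distribution."""
    words, weights = zip(*model.items())  # random.choices normalizes the weights
    return " ".join(random.choices(words, weights=weights, k=length))

print(generate(M))  # e.g. "the a the said likes" -- word order carries no meaning
```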

Estimation of Unigram LM

Using LMs for IR

Relevance of a document: P(d|q)
Using Bayes' rule: P(d|q) = P(q|d) P(d) / P(q)
- P(q) is the same for all documents, so it can be ignored.
- P(d), the prior, is often treated as uniform for all d, but we could use criteria like authority, length, genre, etc.
- P(q|d), the likelihood, is the probability of q under d's language model M_d.
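
In practice this scoring is usually done in log space for numerical stability; a sketch, assuming a uniform prior (so P(d) drops out along with P(q)):

```python
import math

def log_score(query_terms, doc_lm):
    """Rank-equivalent to P(d|q) under a uniform prior:
    log P(q|d) = sum over query terms t of log P(t|M_d).
    Assumes doc_lm assigns non-zero probability to every query term,
    which is exactly what the smoothing discussed below guarantees."""
    return sum(math.log(doc_lm[t]) for t in query_terms)
```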

Using LMs for IR
[Diagram: each document d1, …, dN induces its own document LM M1, …, MN; the query q is scored against each by the query likelihood P(q|M1), P(q|M2), …, P(q|MN).]
Rank documents by query likelihood.

Using LMs for IR
This actually models the query generation process: documents are ranked by the probability that the query would be observed as a random sample from the respective document model.
Intuition: the user has a prototype document in mind and generates a query based on words that appear in this document. Users often have a reasonable idea of the terms that are likely to occur in documents of interest, and will choose query terms that distinguish these documents from others in the collection.

How to Construct LMs
Simplest solution: Maximum Likelihood Estimation (MLE)
- P(t|M_d) = relative frequency of word t in d.
Problems with MLE:
- What if a word doesn't appear in the text? P(t|M_d) = 0 would be problematic.
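
A sketch of the MLE (the helper name is hypothetical; the document d1 is borrowed from the worked example further below):

```python
from collections import Counter

def mle_unigram(doc_tokens):
    """MLE: P(t|M_d) = count(t, d) / |d|."""
    total = len(doc_tokens)
    return {t: c / total for t, c in Counter(doc_tokens).items()}

d1 = "xerox reports a profit but revenue is down".split()
print(mle_unigram(d1)["revenue"])        # 1/8 = 0.125
print(mle_unigram(d1).get("loss", 0.0))  # 0.0 -- the zero-probability problem
```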

Smoothing for LMs
- In general, what probability should we give a word that has not been observed?
- If we want to assign non-zero probabilities to such words, we'll have to discount the probabilities of observed words.
- This is what "smoothing" is about…

Smoothing for LMs: Linear Interpolation LMs
A simple idea that works well in practice is to use a mixture of the document distribution and the collection distribution:
P(t|d) = λ P(t|M_d) + (1 − λ) P(t|M_c)
where M_d is the document model and M_c is the collection model. Correctly setting λ (0 < λ < 1) is very important for good performance.
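
A sketch of the interpolation, assuming MLE models for the document and the collection (e.g. as estimated above):

```python
def jm_prob(t, doc_lm, coll_lm, lam=0.5):
    """Jelinek-Mercer / linear interpolation smoothing:
    P(t|d) = lam * P(t|M_d) + (1 - lam) * P(t|M_c)."""
    return lam * doc_lm.get(t, 0.0) + (1 - lam) * coll_lm.get(t, 0.0)
```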

Smoothing for LMs
[Figure: P(t|d) plotted against words t — smoothing discounts the probabilities of words observed in d and reallocates that mass to unseen words.]

Example
Document collection:
d1: Xerox reports a profit but revenue is down
d2: Lucent narrows quarter loss but revenue decreases further
Model: unigram LM with linear interpolation (λ = 1/2)
Query q: revenue down

Example
Ranking:
P(q|d1) = [(1/8 + 2/16)/2] × [(1/8 + 1/16)/2] = 1/8 × 3/32 = 3/256
P(q|d2) = [(1/8 + 2/16)/2] × [(0 + 1/16)/2] = 1/8 × 1/32 = 1/256
Result: d1 > d2
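
The whole example can be reproduced end to end; a self-contained sketch that recovers the numbers above:

```python
from collections import Counter

docs = {
    "d1": "xerox reports a profit but revenue is down".split(),
    "d2": "lucent narrows quarter loss but revenue decreases further".split(),
}
collection = [t for toks in docs.values() for t in toks]

def mle(tokens):
    n = len(tokens)
    return {t: c / n for t, c in Counter(tokens).items()}

coll_lm, lam = mle(collection), 0.5

def query_likelihood(query, doc_tokens):
    doc_lm = mle(doc_tokens)
    p = 1.0
    for t in query.split():
        p *= lam * doc_lm.get(t, 0.0) + (1 - lam) * coll_lm.get(t, 0.0)
    return p

for name, toks in docs.items():
    print(name, query_likelihood("revenue down", toks))
# d1 0.01171875 (= 3/256)
# d2 0.00390625 (= 1/256)  -> d1 ranks above d2
```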

More Sophisticated LMs
- N-gram models: in general, an n-gram model conditions each word only on the past n−1 words; e.g. a bigram model uses P(t1 t2 … tn) = P(t1) P(t2|t1) … P(tn|tn−1).
- Remote-dependence language models, e.g. the maximum entropy model.
- Structured language models, e.g. probabilistic context-free grammars.
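
For comparison, a minimal sketch of MLE for a bigram model, where each word is conditioned on its predecessor (it needs its own smoothing in practice, since bigram counts are even sparser than unigram counts):

```python
from collections import Counter

def bigram_mle(tokens):
    """MLE bigram model: P(b|a) = count(a b) / count(a)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    left_counts = Counter(tokens[:-1])
    return {(a, b): c / left_counts[a] for (a, b), c in pair_counts.items()}

toks = "xerox reports a profit but revenue is down".split()
print(bigram_mle(toks)[("revenue", "is")])  # 1.0 -- "is" always follows "revenue" here
```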

Why Just Unigram Models?
Difficulty in moving toward more complex models:
- They involve more parameters, so they need more data to estimate (a document is an extremely small sample).
- They significantly increase computational complexity, in both time and space.
- Capturing word order or structure may not add much value for "topical inference".
But using more sophisticated models can still be expected to improve performance…

Appraisal: Pros
- A novel way of looking at the problem of IR, based on probabilistic language modelling: mathematically precise, conceptually simple, computationally tractable, and intuitively appealing.
- Imports techniques from speech recognition and natural language processing.
- Often beats TF×IDF and BM25 in IR experiments.

Appraisal: Cons
- The assumption of equivalence between document and query representation is unrealistic.
- Usually very simple models of language (e.g., unigram) are used.
- Without an explicit notion of relevance, relevance feedback is difficult to integrate into the model, as are user preferences.
- Can't easily accommodate notions of phrase and passage matching, or Boolean retrieval operators.

Tools