Introduction to Information Retrieval
Lecture 14: Language Models for IR



Using language models (LMs) for IR

We view the document as a generative model that generates the query. What we need to do:
- Define the precise generative model we want to use
- Estimate parameters (different parameters for each document's model)
- Smooth to avoid zeros
- Apply to the query and find the document most likely to have generated the query
- Present the most likely document(s) to the user

What is a language model?

We can view a finite state automaton as a deterministic language model. Example: an automaton that generates "I wish I wish I wish I wish ..." It cannot generate "wish I wish" or "I wish I". Our basic model: each document was generated by a different automaton like this, except that these automata are probabilistic.

A probabilistic language model

This is a one-state probabilistic finite-state automaton – a unigram language model – with a state emission distribution for its one state q1. STOP is not a word, but a special symbol indicating that the automaton stops. With emission probabilities P(frog) = 0.01, P(said) = 0.03, P(that) = 0.04, P(toad) = 0.01, P(likes) = 0.02, P(STOP) = 0.02:

P(frog said that toad likes frog STOP) = 0.01 · 0.03 · 0.04 · 0.01 · 0.02 · 0.01 · 0.02 = 4.8 · 10^-13

A different language model for each document

For the string "frog said that toad likes frog STOP":

P(string|M_d1) = 0.01 · 0.03 · 0.04 · 0.01 · 0.02 · 0.01 · 0.02 = 4.8 · 10^-13
P(string|M_d2) = 0.01 · 0.03 · 0.05 · 0.02 · 0.02 · 0.01 · 0.02 = 12 · 10^-13

P(string|M_d1) < P(string|M_d2): thus, document d2 is "more relevant" to the string than d1 is.
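
A minimal Python sketch of this computation (the emission tables carry only the probabilities the slide shows; the rest of each distribution is left unspecified):

```python
# Emission tables for the two document models; only the probabilities
# visible on the slide are filled in (the remaining mass is unspecified).
M_d1 = {"frog": 0.01, "said": 0.03, "that": 0.04, "toad": 0.01,
        "likes": 0.02, "STOP": 0.02}
M_d2 = {"frog": 0.01, "said": 0.03, "that": 0.05, "toad": 0.02,
        "likes": 0.02, "STOP": 0.02}

def string_prob(tokens, model):
    """Unigram probability of a token sequence: product of per-token emissions."""
    p = 1.0
    for t in tokens:
        p *= model[t]
    return p

s = "frog said that toad likes frog STOP".split()
print(string_prob(s, M_d1))  # ~4.8e-13
print(string_prob(s, M_d2))  # ~1.2e-12, i.e. 12 * 10^-13
```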

Using language models in IR

- Each document is treated as (the basis for) a language model.
- Given a query q, rank documents based on P(d|q). By Bayes' rule:

  P(d|q) = P(q|d) P(d) / P(q)

- P(q) is the same for all documents, so we can ignore it.
- P(d) is the prior – often treated as the same for all d, but we can give a prior to "high-quality" documents, e.g., those with high PageRank.
- P(q|d) is the probability of q given d.
- So, to rank documents according to relevance to q, ranking according to P(q|d) and P(d|q) is equivalent.

Where we are

- In the LM approach to IR, we attempt to model the query generation process.
- Then we rank documents by the probability that a query would be observed as a random sample from the respective document model.
- That is, we rank according to P(q|d).
- Next: how do we compute P(q|d)?

How to compute P(q|d)

We make the same conditional independence assumption as for Naive Bayes (|q|: length of q; t_k: the token occurring at position k in q):

P(q|M_d) = ∏_{1 ≤ k ≤ |q|} P(t_k|M_d)

This is equivalent to a product over the distinct terms of q (tf_{t,q}: term frequency, i.e., # occurrences, of t in q):

P(q|M_d) = ∏_{distinct t in q} P(t|M_d)^{tf_{t,q}}

This is the multinomial model (omitting the constant factor).
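
A small sketch of the multinomial form, assuming a document model is given as a term-to-probability dict:

```python
from collections import Counter

def query_likelihood(query_tokens, model):
    """Multinomial query likelihood: P(q|M_d) as the product over distinct
    query terms t of P(t|M_d) raised to tf(t,q)."""
    p = 1.0
    for t, tf in Counter(query_tokens).items():
        p *= model.get(t, 0.0) ** tf
    return p
```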

Parameter estimation

- Missing piece: where do the parameters P(t|M_d) come from?
- Start with the maximum likelihood estimate (|d|: length of d; tf_{t,d}: # occurrences of t in d):

  P̂(t|M_d) = tf_{t,d} / |d|

- But then we have a problem with zeros: a single t with P(t|M_d) = 0 will make P(q|M_d) zero.
- We would give a single term "veto power". For example, for the query [Michael Jackson top hits], a document about "top songs" (but not using the word "hits") would have P(hits|M_d) = 0 and hence P(q|M_d) = 0. That's bad.
- We need to smooth the estimates to avoid zeros.
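
The maximum likelihood estimate as a sketch (a plain relative-frequency count over the document's tokens):

```python
from collections import Counter

def mle_model(doc_tokens):
    """Maximum likelihood unigram estimate: P(t|M_d) = tf(t,d) / |d|.
    Terms absent from the document implicitly get probability 0,
    which is exactly the zero problem smoothing has to fix."""
    n = len(doc_tokens)
    return {t: tf / n for t, tf in Counter(doc_tokens).items()}
```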

Smoothing

- Key intuition: a non-occurring term is possible (even though it didn't occur) ... but no more likely than would be expected by chance in the collection.
- Notation: M_c: the collection model; cf_t: the number of occurrences of t in the collection; T: the total number of tokens in the collection.
- We will use P̂(t|M_c) = cf_t / T to "smooth" P(t|d) away from zero.

Mixture model

P(t|d) = λ P(t|M_d) + (1 − λ) P(t|M_c)

- Mixes the probability from the document with the general collection frequency of the word.
- High value of λ: "conjunctive-like" search – tends to retrieve documents containing all query words.
- Low value of λ: more disjunctive, suitable for long queries.
- Correctly setting λ is very important for good performance.
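
A minimal sketch of the mixture-model scorer, assuming tokenized documents and treating the concatenated documents as the collection:

```python
from collections import Counter

def lm_score(query_tokens, doc_tokens, collection_tokens, lam=0.5):
    """Query likelihood under the mixture model:
    P(q|d) = product over query tokens t of
             lam * tf(t,d)/|d| + (1 - lam) * cf(t)/T."""
    d_tf, c_tf = Counter(doc_tokens), Counter(collection_tokens)
    d_len, c_len = len(doc_tokens), len(collection_tokens)
    p = 1.0
    for t in query_tokens:
        p *= lam * d_tf[t] / d_len + (1 - lam) * c_tf[t] / c_len
    return p
```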

Mixture model: Summary

What we model: the user has a document in mind and generates the query from this document. The equation

P(d|q) ∝ P(d) ∏_{t ∈ q} ( λ P(t|M_d) + (1 − λ) P(t|M_c) )

represents the probability that the document the user had in mind was in fact this one.

Example

- Collection: d1 and d2
- d1: Jackson was one of the most talented entertainers of all time
- d2: Michael Jackson anointed himself King of Pop
- Query q: Michael Jackson
- Use the mixture model with λ = 1/2:
- P(q|d1) = [(0/11 + 1/18)/2] · [(1/11 + 2/18)/2] ≈ 0.003
- P(q|d2) = [(1/7 + 1/18)/2] · [(1/7 + 2/18)/2] ≈ 0.013
- Ranking: d2 > d1
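
Plugging the example into the lm_score sketch above reproduces the slide's numbers:

```python
d1 = "Jackson was one of the most talented entertainers of all time".split()
d2 = "Michael Jackson anointed himself King of Pop".split()
q = "Michael Jackson".split()

print(lm_score(q, d1, d1 + d2))  # ~0.0028
print(lm_score(q, d2, d1 + d2))  # ~0.0126 -> d2 ranks above d1
```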

Exercise: Compute ranking

- Collection: d1 and d2
- d1: Xerox reports a profit but revenue is down
- d2: Lucene narrows quarter loss but revenue decreases further
- Query q: revenue down
- Use the mixture model with λ = 1/2:
- P(q|d1) = [(1/8 + 2/16)/2] · [(1/8 + 1/16)/2] = 1/8 · 3/32 = 3/256
- P(q|d2) = [(1/8 + 2/16)/2] · [(0/8 + 1/16)/2] = 1/8 · 1/32 = 1/256
- Ranking: d1 > d2
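
The same lm_score sketch verifies the arithmetic:

```python
d1 = "Xerox reports a profit but revenue is down".split()
d2 = "Lucene narrows quarter loss but revenue decreases further".split()
q = "revenue down".split()

print(lm_score(q, d1, d1 + d2))  # 3/256 ~ 0.0117
print(lm_score(q, d2, d1 + d2))  # 1/256 ~ 0.0039 -> d1 ranks above d2
```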

Vector space (tf-idf) vs. LM

[Results table: precision of tf-idf vs. the LM approach at varying recall levels.]

The language modeling approach always does better in these experiments, but note that where it shows significant gains is at higher levels of recall.

LMs vs. vector space model (1)

- LMs have some things in common with vector space models.
- Term frequency is directly in the model, but it is not scaled in LMs.
- Probabilities are inherently "length-normalized"; cosine normalization does something similar for vector space.
- Mixing document and collection frequencies has an effect similar to idf: terms rare in the general collection, but common in some documents, will have a greater influence on the ranking.

LMs vs. vector space model (2)

Commonalities:
- Term frequency is directly in the model.
- Probabilities are inherently "length-normalized".
- Mixing document and collection frequencies has an effect similar to idf.

Differences:
- LMs: based on probability theory.
- Vector space: based on similarity, a geometric/linear algebra notion.
- Collection frequency vs. document frequency.
- Details of term frequency, length normalization, etc.

Language models for IR: Assumptions

- Simplifying assumption: queries and documents are objects of the same type. Not true! (The vector space model makes the same assumption.)
- Simplifying assumption: terms are conditionally independent. (Again, the vector space model – and Naive Bayes – makes the same assumption.)
- Cleaner statement of assumptions than vector space, and thus a better theoretical foundation ... but "pure" LMs perform much worse than "tuned" LMs.

Three ways of developing the language modeling approach

- Query likelihood: rank by P(q|M_d), the probability of the query under the document's model (the approach above).
- Document likelihood: rank by P(d|M_q), the probability of the document under a model estimated from the query.
- Model comparison: estimate a model for the query and a model for the document, and rank by how close they are, e.g., by Kullback-Leibler divergence:

  R(d; q) = KL(M_q ‖ M_d) = Σ_t P(t|M_q) log [ P(t|M_q) / P(t|M_d) ]
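
A minimal sketch of KL-based ranking over term-to-probability dicts (names are illustrative):

```python
import math

def kl_rank(q_model, d_model):
    """KL(M_q || M_d) = sum_t P(t|M_q) * log(P(t|M_q) / P(t|M_d)).
    Lower divergence means the document model is closer to the query
    model, so documents are ranked by ascending divergence. Assumes
    d_model is smoothed, so P(t|M_d) > 0 for every query term."""
    return sum(p_q * math.log(p_q / d_model[t])
               for t, p_q in q_model.items() if p_q > 0)
```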

Query Expansion in Language Modeling

Basic idea: we assume that the translation model can be represented by a conditional probability distribution T(·|·) between vocabulary terms. The form of the translation query generation model:

P(q|M_d) = ∏_{t ∈ q} Σ_{v ∈ V} P(v|M_d) T(t|v)

Because T can assign probability to generating a query term t from a different document term v, a document can match query terms it does not itself contain – a form of query expansion.
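
A sketch of the translation query generation model, assuming T is given as a nested dict (T[t][v] = probability of generating query term t from document term v; both the structure and names are illustrative):

```python
def translation_likelihood(query_tokens, d_model, T):
    """Translation query generation model:
    P(q|M_d) = product over query terms t of
               sum over vocabulary terms v of P(v|M_d) * T[t][v]."""
    p = 1.0
    for t in query_tokens:
        p *= sum(p_v * T.get(t, {}).get(v, 0.0)
                 for v, p_v in d_model.items())
    return p
```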