Language Models
Hongning Wang



Two-stage smoothing [Zhai & Lafferty 02]

P(w|d) = (1 - λ) * [c(w,d) + μ·p(w|C)] / (|d| + μ) + λ·p(w|U)

Stage 1 (Dirichlet prior, Bayesian): smooth the document LM with the collection LM p(w|C) to explain unseen words.
Stage 2 (two-component mixture): interpolate with the user background model p(w|U) to explain noise in the query; p(w|U) can be approximated by p(w|C).

6501: Information Retrieval
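The two-stage formula can be sketched in code. A minimal illustration with a hypothetical toy collection; `mu` (Dirichlet prior) and `lam` (noise weight) are set to arbitrary illustrative values:

```python
from collections import Counter

def two_stage_prob(w, d_counts, d_len, p_C, p_U, mu=2000.0, lam=0.1):
    # Stage 1: Dirichlet-prior smoothing of the document LM with the collection LM
    stage1 = (d_counts.get(w, 0) + mu * p_C.get(w, 0.0)) / (d_len + mu)
    # Stage 2: interpolate with the user background model to absorb query noise
    return (1 - lam) * stage1 + lam * p_U.get(w, 0.0)

# Hypothetical toy collection of two tiny documents
docs = [["apple", "pie", "apple"], ["pie", "crust"]]
coll = Counter(w for doc in docs for w in doc)
total = sum(coll.values())
p_C = {w: c / total for w, c in coll.items()}   # collection LM
p_U = p_C                                       # approximate p(w|U) by p(w|C)

d = Counter(docs[0])
score = two_stage_prob("apple", d, len(docs[0]), p_C, p_U, mu=10.0, lam=0.2)
```

Note that the smoothed probabilities still sum to one over the vocabulary, since each stage is a proper mixture of distributions.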

Estimating λ using the EM algorithm [Zhai & Lafferty 02]
Query Q = q_1 … q_m. Consider query generation as a mixture over all the relevant documents d_1 … d_N: each query term is drawn from (1 - λ)·p(w|d_i) + λ·p(w|U).
Stage 1: the smoothed document models p(w|d_1) … p(w|d_N) (Dirichlet prior μ, estimated in stage 1).
Stage 2: the noise weight λ of the user background model, estimated by EM.

Introduction to EM
In most cases, directly maximizing the data likelihood is intractable because we have missing (hidden) data.

Background knowledge

Expectation Maximization
Jensen's inequality gives a lower bound on the data likelihood that is easier to compute and has many good properties!
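The lower bound referred to here is the standard EM bound; as a sketch, for observed data X, hidden data Z, and any distribution q(Z):

```latex
\log p(X \mid \theta)
  = \log \sum_{Z} q(Z)\,\frac{p(X, Z \mid \theta)}{q(Z)}
  \;\ge\; \sum_{Z} q(Z)\,\log \frac{p(X, Z \mid \theta)}{q(Z)}
```

by Jensen's inequality (log is concave), with equality when q(Z) = p(Z | X, θ).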

Intuitive understanding of EM
Data likelihood p(X|θ): EM maximizes a lower bound that is easier to optimize and is guaranteed to improve the data likelihood.

Expectation Maximization (cont'd)
Jensen's inequality

Expectation Maximization (cont'd)

Expectation Maximization (cont'd)

Expectation Maximization (cont'd)
Expectation of the complete-data likelihood

Expectation Maximization
Key step!

Intuitive understanding of EM
Data likelihood p(X|θ): from the current guess θ_n, build a lower bound (the Q function); the next guess maximizes it.
E-step = computing the lower bound
M-step = maximizing the lower bound
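The two steps named above can be written out in the standard formulation, with θ_n the current guess:

```latex
\text{E-step:}\quad Q(\theta;\theta_n) = \mathbb{E}_{Z \mid X, \theta_n}\!\left[\log p(X, Z \mid \theta)\right]
\qquad
\text{M-step:}\quad \theta_{n+1} = \arg\max_{\theta}\, Q(\theta;\theta_n)
```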

Convergence guarantee (proof of EM)
The change of the log data likelihood between EM iterations decomposes into two terms: one that the M-step guarantees to be non-negative, and a cross-entropy term that is also non-negative.
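As a sketch of this guarantee: write H(θ; θ_n) = −Σ_Z p(Z|X,θ_n) log p(Z|X,θ) (a cross-entropy), so that log p(X|θ) = Q(θ; θ_n) + H(θ; θ_n). Then

```latex
\log p(X \mid \theta_{n+1}) - \log p(X \mid \theta_n)
= \underbrace{Q(\theta_{n+1};\theta_n) - Q(\theta_n;\theta_n)}_{\ge 0 \text{ (M-step)}}
+ \underbrace{H(\theta_{n+1};\theta_n) - H(\theta_n;\theta_n)}_{\ge 0 \text{ (Gibbs' inequality)}}
```

The first bracket is non-negative because the M-step maximizes Q; the second because a cross-entropy is never smaller than the entropy.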

What is not guaranteed
EM converges only to a local maximum of the likelihood, not necessarily the global optimum.

Estimating λ using a mixture model [Zhai & Lafferty 02]
Query Q = q_1 … q_m; stage-1 document models p(w|d_1) … p(w|d_N) (estimated in stage 1); stage-2 mixtures (1 - λ)·p(w|d_i) + λ·p(w|U).
The origin of each query term is unknown: does it come from the user background or from the topic? This is exactly the missing data that EM handles.
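A minimal sketch of the EM iteration for the noise weight, simplified to a single document; the toy numbers are illustrative, not from the paper:

```python
def estimate_lambda(query, p_d, p_U, lam=0.5, iters=50):
    """EM for the noise weight lambda in the two-component mixture
    (1 - lam) * p(w|d) + lam * p(w|U), simplified to one document."""
    for _ in range(iters):
        # E-step: posterior probability that each query word is background noise
        z = [lam * p_U[w] / ((1 - lam) * p_d[w] + lam * p_U[w]) for w in query]
        # M-step: new lambda = average background responsibility
        lam = sum(z) / len(z)
    return lam

# Hypothetical toy numbers: the document LM explains the query words well
p_d = {"a": 0.6, "b": 0.3}
p_U = {"a": 0.1, "b": 0.1}
lam = estimate_lambda(["a", "a", "b"], p_d, p_U)  # driven toward 0
```

When the document model explains the query words much better than the background, EM drives λ toward zero; in the full model the E-step responsibilities are computed against the mixture over all relevant documents.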

Variants of the basic LM approach
- Different smoothing strategies
  - Hidden Markov Models (essentially linear interpolation) [Miller et al. 99]
  - Smoothing with an IDF-like reference model [Hiemstra & Kraaij 99]
  - Performance tends to be similar to the basic LM approach
  - Many other possibilities for smoothing [Chen & Goodman 98]
- Different priors
  - Link information as prior leads to significant improvement of Web entry-page retrieval performance [Kraaij et al. 02]
  - Time as prior [Li & Croft 03]
  - PageRank as prior [Kurland & Lee 05]
- Passage retrieval [Liu & Croft 02]

Improving language models
- Capturing limited dependencies
  - Bigrams/trigrams [Song & Croft 99]
  - Grammatical dependency [Nallapati & Allan 02, Srikanth & Srihari 03, Gao et al. 04]
  - Generally insignificant improvement compared with other extensions such as feedback
- Full Bayesian query likelihood [Zaragoza et al. 03]
  - Performance similar to the basic LM approach
- Translation model for p(Q|D,R) [Berger & Lafferty 99, Jin et al. 02, Cao et al. 05]
  - Addresses polysemy and synonymy
  - Improves over the basic LM methods, but computationally expensive
- Cluster-based smoothing/scoring [Liu & Croft 04, Kurland & Lee 04, Tao et al. 06]
  - Improves over the basic LM, but computationally expensive
- Parsimonious LMs [Hiemstra et al. 04]
  - Use a mixture model to "factor out" non-discriminative words

A unified framework for IR: risk minimization
- Long-standing IR challenges
  - Improve IR theory: develop theoretically sound and empirically effective models; go beyond the limited traditional notion of relevance (independent, topical relevance)
  - Improve IR practice: optimize retrieval parameters automatically
- Language models are promising tools...
  - How can we systematically exploit LMs in IR?
  - Can LMs offer anything hard/impossible to achieve in traditional IR?

Idea 1: Retrieval as decision making (a more general notion of relevance)
Given a query:
- Which documents should be selected? (D)
- How should these docs be presented to the user? (π)
Choose (D, π): a ranked list? an unordered subset? clustering?

Idea 2: Systematic language modeling
- Doc modeling → document language models
- Query modeling → query language model
- User modeling → loss function
Retrieval decision: combine the query model, document models, and loss function.

Generative model of document & query [Lafferty & Zhai 01b]
- User U (partially observed) generates query q (observed)
- Source S (partially observed) generates document d (observed)
- Relevance R is inferred

Applying Bayesian decision theory [Lafferty & Zhai 01b, Zhai 02, Zhai & Lafferty 06]
- Observed: query q, document set C
- Hidden: user U with query model θ_q, source S with document models θ_1 … θ_N
- Loss function L
Among the possible choices (D_1, π_1), (D_2, π_2), …, (D_n, π_n), risk minimization selects the choice (D, π) with the smallest Bayes risk.

Special cases
- Set-based models (choose D): Boolean model
- Ranking models (choose π)
  - Independent loss
    - Relevance-based loss: probabilistic relevance model, Generative Relevance Theory
    - Distance-based loss: vector-space model, two-stage LM, KL-divergence model
  - Dependent loss
    - Maximal Marginal Relevance (MMR) loss, Maximal Diverse Relevance loss: subtopic retrieval model

Optimal ranking for independent loss
- Decision space = {rankings}
- Sequential browsing + independent loss → independent risk → independent scoring
- "Risk ranking principle" [Zhai 02]
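One way to sketch why independent risk yields independent scoring, assuming a sequential-browsing model with non-increasing viewing probabilities w_1 ≥ w_2 ≥ … (these weights are an illustrative formalization, not notation from the slide):

```latex
\mathrm{Risk}(\pi) \;=\; \sum_{i=1}^{N} w_i \,\mathbb{E}\!\left[\,L(d_{\pi(i)}) \mid q\,\right]
```

By the rearrangement inequality, this risk is minimized by ranking documents in non-decreasing order of their individual expected loss, so each document can be scored independently.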

Automatic parameter tuning
- Retrieval parameters are needed to
  - model different user preferences
  - customize a retrieval model to specific queries and documents
- Retrieval parameters in traditional models
  - EXTERNAL to the model, hard to interpret
  - Parameters are introduced heuristically to implement "intuition"
  - No principles to quantify them; must be set empirically through many experiments
  - Still no guarantee for new queries/documents
- Language models make it possible to estimate parameters...

Parameter setting in risk minimization
- Estimate: query model parameters (from the query), doc model parameters (from the documents)
- Set: user model parameters (via the loss function)

Summary of risk minimization
- Risk minimization is a general probabilistic retrieval framework
  - Retrieval as a decision problem (= risk minimization)
  - Separate/flexible language models for queries and docs
- Advantages
  - A unified framework for existing models
  - Automatic parameter tuning due to LMs
  - Allows for modeling complex retrieval tasks
- Lots of potential for exploring LMs...

What you should know
- EM algorithm
- Risk minimization framework