Introduction to Statistical Modeling

Introduction to Statistical Modeling Rong Jin

Why Statistical Modeling?
The vector space model for information retrieval represents both documents and queries as vectors in the term space, and measures relevance by the similarity between the document vectors and the query vector. The vector space model has many problems: ad-hoc term weighting schemes, ad-hoc basis vectors, and ad-hoc similarity measures. We need something much more principled!

A Simple Example (I)
Suppose you have three coins: C1, C2, C3. Alex picked one of the coins and flipped it six times. You did not see which coin he picked, but you observed the outcomes of the flips: t, h, t, h, t, t.
Question: how can we guess which coin Alex chose?

A Simple Example (II)
You experimented with each of the three coins, say six flips each:
C1: h, h, h, t, h, h
C2: t, t, h, t, t, t
C3: t, h, t, t, t, h
Given the observed sequence t, h, t, h, t, t, which coin do you think Alex chose?

A Simple Example (III)
Estimate the bias (probability of heads) of each sequence:
q: t, h, t, h, t, t → bias bq = 1/3
C1: h, h, h, t, h, h → bias b1 = 5/6
C2: t, t, h, t, t, t → bias b2 = 1/6
C3: t, h, t, t, t, h → bias b3 = 1/3
So, which coin do you think Alex selected? A more principled approach: compute the likelihood p(q|Ci) for each coin.

A Simple Example (IV)
p(q|C1) = p(t, h, t, h, t, t | C1) = p(t|C1) * p(h|C1) * p(t|C1) * p(h|C1) * p(t|C1) * p(t|C1) = 1/6 * 5/6 * 1/6 * 5/6 * 1/6 * 1/6 ≈ 5.3 * 10^-4
Computing p(q|C2) and p(q|C3) in the same way gives p(q|C2) ≈ 0.013 and p(q|C3) ≈ 0.02.
Which coin has the largest likelihood? C3, with p(q|C3) ≈ 0.02.
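
A minimal sketch of this calculation (not from the original slides): estimate each coin's bias by simple counting, then compute the likelihood of the observed sequence under each coin.

    # Maximum-likelihood bias (probability of heads) of a coin from observed flips.
    def bias(flips):
        return flips.count('h') / len(flips)

    # Likelihood of a sequence under a coin with heads-probability b.
    def likelihood(seq, b):
        p = 1.0
        for f in seq:
            p *= b if f == 'h' else (1.0 - b)
        return p

    q = ['t', 'h', 't', 'h', 't', 't']
    coins = {
        'C1': ['h', 'h', 'h', 't', 'h', 'h'],  # bias 5/6
        'C2': ['t', 't', 'h', 't', 't', 't'],  # bias 1/6
        'C3': ['t', 'h', 't', 't', 't', 'h'],  # bias 1/3
    }
    for name, flips in coins.items():
        print(name, likelihood(q, bias(flips)))  # ~5.4e-4, ~0.013, ~0.022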

An Information Retrieval View
Treat q as a query and the three coins as documents over the two-word vocabulary {h, t}, scored by the inner product of normalized term-frequency vectors:
Query (q): t, h, t, h, t, t
Doc1 (C1): h, h, h, t, h, h → sim(D1) = 1/3 * 5/6 + 2/3 * 1/6 ≈ 0.39
Doc2 (C2): t, t, h, t, t, t → sim(D2) = 1/3 * 1/6 + 2/3 * 5/6 ≈ 0.61
Doc3 (C3): t, h, t, t, t, h → sim(D3) = 1/3 * 1/3 + 2/3 * 2/3 ≈ 0.56
Which document is ranked first if we use the vector space model? D2 has the highest similarity, even though C3 had the highest likelihood.
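
A sketch of that vector space scoring, assuming normalized term-frequency vectors over {h, t} scored by inner product:

    # Normalized term-frequency vector over the vocabulary ['h', 't'].
    def tf_vector(seq):
        n = len(seq)
        return [seq.count('h') / n, seq.count('t') / n]

    # Inner-product similarity.
    def sim(u, v):
        return sum(a * b for a, b in zip(u, v))

    q = ['t', 'h', 't', 'h', 't', 't']
    docs = {
        'D1': ['h', 'h', 'h', 't', 'h', 'h'],
        'D2': ['t', 't', 'h', 't', 't', 't'],
        'D3': ['t', 'h', 't', 't', 't', 'h'],
    }
    for name, d in docs.items():
        print(name, sim(tf_vector(q), tf_vector(d)))  # 0.39, 0.61, 0.56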

A Simple Example: Summary
q: t, h, t, h, t, t
Step 1: estimate the bias of each coin by counting:
C1: h, h, h, t, h, h → b1 = 5/6
C2: t, t, h, t, t, t → b2 = 1/6
C3: t, h, t, t, t, h → b3 = 1/3
Step 2: estimate the likelihood p(q|bias) for each coin and rank the coins by it.

A Probabilistic Framework for Information Retrieval
q: 'Bush Kerry'
Estimate some statistics θ for each document d1 … d1000, then estimate the likelihood p(q|θ) for each document and rank the documents by it.

A Probabilistic Framework for Information Retrieval
Three fundamental questions:
What statistics θ should be chosen to describe the characteristics of documents?
How do we estimate these statistics?
How do we compute the likelihood of generating a query given the statistics?

Unigram Language Model
A unigram language model assigns a probability to each single word: θ = {p(w) for every word w in the vocabulary V}.
Estimating a unigram language model by simple counting: given a document d, count the term frequency c(w,d) for each word w; then p(w) = c(w,d)/|d|.
How do we estimate the likelihood p(q|θ)?
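
A minimal sketch of this counting estimate (the toy document text is an assumption, not from the slides):

    from collections import Counter

    # Maximum-likelihood unigram model: p(w) = c(w, d) / |d|.
    def unigram_lm(doc_tokens):
        counts = Counter(doc_tokens)
        n = len(doc_tokens)
        return {w: c / n for w, c in counts.items()}

    d = 'kerry met bush and kerry spoke'.split()
    print(unigram_lm(d))  # e.g. p('kerry') = 2/6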

Estimate p(q|θ)
Let q = {w1, w2, …, wk}. Computing p(q|θ) is similar to the coin-flipping example.
E.g., q = {'bush', 'kerry'} and d = {p('bush') = 0.001, p('kerry') = 0.02} give p(q|d) = 0.001 * 0.02 = 2 * 10^-5.
What if the document never mentions the word 'bush', but instead uses the phrase 'president of united states'?
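
A sketch of the query-likelihood computation; the second document below is a hypothetical one that never mentions 'bush' (its probabilities are assumed toy values), illustrating the zero-probability problem raised above.

    # Query likelihood under a document's unigram model.
    def query_likelihood(query_tokens, lm):
        p = 1.0
        for w in query_tokens:
            p *= lm.get(w, 0.0)  # unseen words get probability 0
        return p

    q = ['bush', 'kerry']
    d1 = {'bush': 0.001, 'kerry': 0.02}                    # from the slide
    d2 = {'president': 0.01, 'of': 0.05, 'united': 0.01,
          'states': 0.01, 'kerry': 0.02}                   # assumed toy values
    print(query_likelihood(q, d1))  # 2e-05
    print(query_likelihood(q, d2))  # 0.0 -- 'bush' never occurs in d2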

Illustration of Language Models for Information Retrieval
q: t, h, t, h, t, t
Step 1: estimate a language model for each document by counting:
d1: h, h, h, t, h, h → θ1 = {p(h) = 5/6, p(t) = 1/6}
d2: t, t, h, t, t, t → θ2 = {p(h) = 1/6, p(t) = 5/6}
d3: t, h, t, t, t, h → θ3 = {p(h) = 1/3, p(t) = 2/3}
Step 2: estimate the likelihood p(q|θ) = [p(h)]^2 * [p(t)]^4 for each document and rank the documents by it.
Problems?

Problems With Unigram LM
Unigram probabilities are insufficient for representing true documents.
Simple counting for estimating unigram probabilities has two problems: it does not account for variance in documents (if you ask the same person to write the same story twice, the two versions will differ), and most words will have zero probability (the sparse data problem).

Sparse Data Problems
Common remedies: shrinkage (smoothing), maximum a posteriori (MAP) estimation, and the Bayesian approach.

Shrinkage: Jelinek-Mercer Smoothing
Linearly interpolate between the document language model and the collection language model:
p(w|d) = λ * p_ml(w|d) + (1 - λ) * p(w|C)
where p_ml(w|d) is estimated from the individual document, p(w|C) is estimated from the corpus, and 0 < λ < 1 is a smoothing parameter.
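
A minimal sketch of Jelinek-Mercer smoothing under the formula above (the toy document and collection models are assumptions):

    # Jelinek-Mercer smoothed probability:
    # p(w|d) = lam * p_ml(w|d) + (1 - lam) * p(w|C), with 0 < lam < 1.
    def jm_prob(w, doc_lm, coll_lm, lam=0.5):
        return lam * doc_lm.get(w, 0.0) + (1 - lam) * coll_lm.get(w, 0.0)

    def smoothed_query_likelihood(query_tokens, doc_lm, coll_lm, lam=0.5):
        p = 1.0
        for w in query_tokens:
            p *= jm_prob(w, doc_lm, coll_lm, lam)
        return p

    coll_lm = {'bush': 0.001, 'kerry': 0.002, 'president': 0.005}
    doc_lm = {'kerry': 0.02, 'president': 0.01}  # document never mentions 'bush'
    print(smoothed_query_likelihood(['bush', 'kerry'], doc_lm, coll_lm))
    # nonzero now, because the collection model backs off the unseen word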

Smoothing & TF-IDF Weighting
Are smoothing and TF-IDF weighting totally unrelated? No. With Jelinek-Mercer smoothing, the query log-likelihood decomposes as
log p(q|d) = sum over query words w that occur in d of log(1 + λ * p_ml(w|d) / ((1 - λ) * p(w|C))) + sum over all query words w of log((1 - λ) * p(w|C)).
The first sum behaves like TF-IDF weighting (term frequency through p_ml(w|d), inverse document frequency through 1/p(w|C)); the second sum is the same for every document and so is irrelevant to the document ranking.
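
A small numeric check of that decomposition (the toy probabilities are assumptions): the direct smoothed log-likelihood equals the TF-IDF-like part plus the document-independent part.

    import math

    lam = 0.5
    coll_lm = {'bush': 0.001, 'kerry': 0.002}  # p(w|C)
    doc_lm = {'kerry': 0.02}                   # p_ml(w|d); 'bush' unseen in d
    q = ['bush', 'kerry']

    # Direct form: sum of logs of the smoothed probabilities.
    direct = sum(math.log(lam * doc_lm.get(w, 0.0) + (1 - lam) * coll_lm[w])
                 for w in q)

    # Decomposed form: TF-IDF-like part over matched terms + document-independent part.
    tf_idf_part = sum(math.log(1 + lam * doc_lm[w] / ((1 - lam) * coll_lm[w]))
                      for w in q if w in doc_lm)
    doc_independent = sum(math.log((1 - lam) * coll_lm[w]) for w in q)

    print(direct, tf_idf_part + doc_independent)  # the two values agree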