
1 Language Models for TR (Text Retrieval) Rong Jin, Department of Computer Science and Engineering, Michigan State University

2 What is a Statistical LM? A probability distribution over word sequences:
– p("Today is Wednesday") ≈ 0.001
– p("Today Wednesday is") ≈ 0.0000000000001
– p("The eigenvalue is positive") ≈ 0.00001
Context-dependent! Can also be regarded as a probabilistic mechanism for "generating" text, and is thus also called a "generative" model.

3 Why is a LM Useful? Provides a principled way to quantify the uncertainties associated with natural language. Allows us to answer questions like:
– Given that we see "John" and "feels", how likely will we see "happy" as opposed to "habit" as the next word? (speech recognition)
– Given that we observe "baseball" three times and "game" once in a news article, how likely is it about "sports"? (text categorization, information retrieval)
– Given that a user is interested in sports news, how likely would the user use "baseball" in a query? (information retrieval)

4 The Simplest Language Model (Unigram Model) Generate a piece of text by generating each word INDEPENDENTLY. Thus, p(w_1 w_2 ... w_n) = p(w_1) p(w_2) ... p(w_n). Parameters: {p(w_i)}, with p(w_1) + ... + p(w_N) = 1 (N is the vocabulary size). Essentially a multinomial distribution over words. A piece of text can be regarded as a sample drawn according to this word distribution.
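A minimal sketch of this independence assumption (the vocabulary and probabilities below are made up for illustration):

```python
import math

# Toy unigram LM: a probability distribution over a tiny vocabulary.
unigram_lm = {"today": 0.01, "is": 0.05, "wednesday": 0.002, "the": 0.06}

def sequence_log_prob(words, lm, unseen=1e-12):
    """p(w_1 ... w_n) = p(w_1) * ... * p(w_n), computed in log space
    to avoid numerical underflow on long sequences."""
    return sum(math.log(lm.get(w, unseen)) for w in words)

print(sequence_log_prob("today is wednesday".split(), unigram_lm))
```

Note that word order does not matter under this model: "today is wednesday" and "wednesday is today" get the same probability, unlike the context-dependent examples on slide 2.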

5 Text Generation with Unigram LM Sampling from a unigram language model θ (a word distribution p(w|θ)) generates a document.
– Topic 1 (text mining): text 0.2, mining 0.1, association 0.01, clustering 0.02, ..., food 0.00001, ... → generates a text mining paper.
– Topic 2 (health): food 0.25, nutrition 0.1, healthy 0.05, diet 0.02, ... → generates a food nutrition paper.
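A small sketch of this sampling view (the topic distribution is abridged from the slide; the "<other>" token is an added stand-in for the rest of the vocabulary):

```python
import random

# Topic 1 ("text mining") word distribution, abridged from the slide.
text_mining_lm = {"text": 0.2, "mining": 0.1, "association": 0.01,
                  "clustering": 0.02, "food": 0.00001}
# Lump the remaining probability mass into a placeholder token.
text_mining_lm["<other>"] = 1.0 - sum(text_mining_lm.values())

def sample_document(lm, length=10, seed=0):
    """Generate a document by drawing each word i.i.d. from p(w|theta)."""
    rng = random.Random(seed)
    words, probs = zip(*lm.items())
    return " ".join(rng.choices(words, weights=probs, k=length))

print(sample_document(text_mining_lm))
```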

6 Estimation of Unigram LM Given a document, estimate the model parameters p(w|θ). For a "text mining paper" with 100 words total and counts text 10, mining 5, association 3, database 3, algorithm 2, ..., query 1, efficient 1, ..., the estimates are p(text|θ) = 10/100, p(mining|θ) = 5/100, p(association|θ) = 3/100, ..., p(query|θ) = 1/100, ...
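This is the maximum likelihood estimate, p(w|θ_d) = c(w,d)/|d|. A minimal sketch (the toy document is padded with a placeholder token to reach the slide's 100 words):

```python
from collections import Counter

def mle_unigram(doc_words):
    """Maximum likelihood estimate: p(w|theta_d) = c(w,d) / |d|."""
    counts = Counter(doc_words)
    total = len(doc_words)
    return {w: c / total for w, c in counts.items()}

# Toy "text mining paper" matching the slide's counts (100 words total).
doc = (["text"] * 10 + ["mining"] * 5 + ["association"] * 3 +
       ["database"] * 3 + ["algorithm"] * 2 + ["query"] * 1 +
       ["efficient"] * 1 + ["<other>"] * 75)
lm = mle_unigram(doc)
print(lm["text"], lm["mining"], lm["query"])  # 0.1 0.05 0.01
```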

7 Language Models for Retrieval (Ponte & Croft 98) Estimate a language model for each document: for a text mining paper, p(text|θ), p(mining|θ), p(association|θ), p(clustering|θ), ...; for a food nutrition paper, p(food|θ), p(nutrition|θ), p(healthy|θ), p(diet|θ), ... Given the query "data mining algorithms", ask: which model would most likely have generated this query?

8 Ranking Docs by Query Likelihood For each document d_1, d_2, ..., d_N, estimate a document LM θ_{d_1}, θ_{d_2}, ..., θ_{d_N}; then rank the documents by the query likelihoods p(q|θ_{d_1}), p(q|θ_{d_2}), ..., p(q|θ_{d_N}).
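A minimal sketch of this ranking (unsmoothed models with a crude probability floor for unseen words; proper smoothing comes on later slides):

```python
import math

def query_log_likelihood(query_words, doc_lm, unseen=1e-12):
    """log p(q|theta_d): sum of per-word log probabilities."""
    return sum(math.log(doc_lm.get(w, unseen)) for w in query_words)

def rank_docs(query_words, doc_lms):
    """Rank documents by query likelihood, best first.
    doc_lms maps a doc id to its estimated word distribution."""
    return sorted(doc_lms,
                  key=lambda d: query_log_likelihood(query_words, doc_lms[d]),
                  reverse=True)

docs = {"text-mining-paper": {"text": 0.2, "mining": 0.1, "data": 0.02},
        "food-paper": {"food": 0.25, "nutrition": 0.1}}
print(rank_docs("data mining algorithms".split(), docs))
```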

9 But where is the relevance? And what's good about this approach?

10 The Notion of Relevance Three broad ways to formalize relevance:
– Relevance as similarity Δ(Rep(q), Rep(d)), with different representations and similarity measures: vector space model (Salton et al., 75); prob. distr. model (Wong & Yao, 89); ...
– Relevance as probability of relevance P(r=1|q,d), r ∈ {0,1}:
  • Regression model (Fox, 83)
  • Generative models: document generation, e.g., the classical prob. model (Robertson & Sparck Jones, 76); query generation, e.g., the LM approach (Ponte & Croft, 98; Lafferty & Zhai, 01a)
– Relevance as probabilistic inference P(d→q) or P(q→d), with different inference systems: prob. concept space model (Wong & Yao, 95); inference network model (Turtle & Croft, 91)

11 Refining P(R=1|Q,D): Method 2, generative models. Basic idea: define P(Q,D|R), then compute P(R|Q,D) using Bayes' rule. Special cases:
– Document "generation": P(Q,D|R) = P(D|Q,R) P(Q|R), where P(Q|R) does not depend on D and is ignored when ranking documents.
– Query "generation": P(Q,D|R) = P(Q|D,R) P(D|R).

12 Query Generation P(R=1|Q,D) ∝ P(Q|D,R=1) P(D|R=1), i.e., the query likelihood p(q|θ_d) times a document prior. Assuming a uniform prior P(D|R=1), ranking reduces to ranking by the query likelihood p(q|θ_d). Now, the question is how to compute p(q|θ_d). Generally this involves two steps: (1) estimate a language model θ_d based on document D; (2) compute the query likelihood according to the estimated model.

13 Retrieval as Language Model Estimation Document ranking based on query likelihood: for a query q = w_1 w_2 ... w_n, log p(q|d) = Σ_i log p(w_i|d), where p(w_i|d) is the document language model. Thus the retrieval problem ≈ estimation of p(w_i|d). Smoothing is an important issue, and it distinguishes different approaches.

14 A General Smoothing Scheme All smoothing methods try to (1) discount the probability of words seen in a doc and (2) re-allocate the extra probability so that unseen words have a non-zero probability. Most use a reference model (the collection language model) to discriminate unseen words:
p(w|d) = p_seen(w|d) if w is seen in d (the discounted ML estimate)
p(w|d) = α_d p(w|C) otherwise (α_d times the collection language model)
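A sketch of the scheme as a function (the seen-word estimator p_seen and the coefficient α_d are method-specific; slide 16 instantiates them):

```python
def smoothed_prob(w, doc_counts, doc_len, p_seen, alpha_d, coll_lm):
    """General smoothing scheme:
    - seen words get a discounted ML estimate p_seen(w|d);
    - unseen words fall back to alpha_d * p(w|C)."""
    if doc_counts.get(w, 0) > 0:
        return p_seen(w, doc_counts, doc_len)   # discounted ML estimate
    return alpha_d * coll_lm.get(w, 1e-12)      # collection-model fallback
```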

15 Smoothing & TF-IDF Weighting Plugging the general smoothing scheme into the query likelihood retrieval formula, we obtain, for a query q of length n:
log p(q|d) = Σ_{w ∈ q seen in d} log [ p_seen(w|d) / (α_d p(w|C)) ] + n log α_d + Σ_{w ∈ q} log p(w|C)
The last term is the same for all documents and can be ignored for ranking. The first term behaves like TF-IDF weighting: p_seen(w|d) grows with term frequency (TF), while dividing by p(w|C) penalizes common words (IDF). The n log α_d term provides doc length normalization (a long doc is expected to have a smaller α_d). So smoothing with p(w|C) ≈ TF-IDF weighting + length normalization.
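A quick numerical check of this rank-equivalent decomposition, using Jelinek-Mercer smoothing (where p_seen(w|d) = (1-λ) p_ml(w|d) + λ p(w|C) and α_d = λ; all numbers below are made up):

```python
import math

lam = 0.5
coll = {"data": 0.01, "mining": 0.005, "algorithms": 0.002}  # p(w|C)
counts, dlen = {"data": 2, "mining": 1}, 10                  # c(w,d), |d|
q = ["data", "mining", "algorithms"]

def p_w(w):
    """Jelinek-Mercer smoothed p(w|d)."""
    return (1 - lam) * counts.get(w, 0) / dlen + lam * coll[w]

direct = sum(math.log(p_w(w)) for w in q)
seen = [w for w in q if w in counts]
decomposed = (sum(math.log(p_w(w) / (lam * coll[w])) for w in seen)
              + len(q) * math.log(lam)               # n * log(alpha_d)
              + sum(math.log(coll[w]) for w in q))   # doc-independent term
print(abs(direct - decomposed) < 1e-12)              # True
```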

16 Three Smoothing Methods (Zhai & Lafferty 01)
– Simplified Jelinek-Mercer: shrink uniformly toward p(w|C): p_λ(w|d) = (1-λ) p_ml(w|d) + λ p(w|C)
– Dirichlet prior (Bayesian): assume μ p(w|C) pseudo counts for each word: p_μ(w|d) = (c(w,d) + μ p(w|C)) / (|d| + μ)
– Absolute discounting: subtract a constant δ from each seen word's count: p_δ(w|d) = (max(c(w,d) - δ, 0) + δ |d|_u p(w|C)) / |d|, where |d|_u is the number of unique words in d
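A minimal sketch of the three estimators (parameter defaults are illustrative only):

```python
def jelinek_mercer(w, counts, dlen, coll, lam=0.5):
    """(1 - lambda) * p_ml(w|d) + lambda * p(w|C)."""
    return (1 - lam) * counts.get(w, 0) / dlen + lam * coll.get(w, 1e-12)

def dirichlet_prior(w, counts, dlen, coll, mu=2000.0):
    """(c(w,d) + mu * p(w|C)) / (|d| + mu)."""
    return (counts.get(w, 0) + mu * coll.get(w, 1e-12)) / (dlen + mu)

def absolute_discount(w, counts, dlen, coll, delta=0.7):
    """(max(c(w,d) - delta, 0) + delta * |d|_u * p(w|C)) / |d|,
    where |d|_u = number of unique words in d."""
    return (max(counts.get(w, 0) - delta, 0)
            + delta * len(counts) * coll.get(w, 1e-12)) / dlen
```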

17 Comparison of Three Methods

18 The Need for Query Modeling (Dual Role of Smoothing) [Figure: retrieval performance as a function of the smoothing parameter, plotted separately for verbose queries and keyword queries; the two query types behave differently, suggesting smoothing plays a second role beyond estimation: modeling the query.]

19 Another Reason for Smoothing Query = "the algorithms for data mining". Per-word probabilities:
d1: p(the)=0.04, p(algorithms)=0.001, p(for)=0.02, p(data)=0.002, p(mining)=0.003
d2: p(the)=0.02, p(algorithms)=0.001, p(for)=0.01, p(data)=0.003, p(mining)=0.004
Here p(algorithms|d1) = p(algorithms|d2), p(data|d1) < p(data|d2), and p(mining|d1) < p(mining|d2), but p(q|d1) > p(q|d2)! The function words dominate. We should make p("the") and p("for") less different across all docs.
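The slide's arithmetic, reproduced (probabilities copied from above):

```python
# Per-word probabilities for q = "the algorithms for data mining".
p_d1 = {"the": 0.04, "algorithms": 0.001, "for": 0.02,
        "data": 0.002, "mining": 0.003}
p_d2 = {"the": 0.02, "algorithms": 0.001, "for": 0.01,
        "data": 0.003, "mining": 0.004}

def likelihood(p):
    prod = 1.0
    for v in p.values():
        prod *= v
    return prod

# 4.8e-12 vs 2.4e-12: d1 wins purely on the function words "the" and "for".
print(likelihood(p_d1), likelihood(p_d2))
```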

20 Two-Stage Smoothing
P(w|d) = (1-λ) (c(w,d) + μ p(w|C)) / (|d| + μ) + λ p(w|U)
Stage 1: explain unseen words with a Dirichlet prior (Bayesian), i.e., (c(w,d) + μ p(w|C)) / (|d| + μ).
Stage 2: explain noise in the query with a 2-component mixture, mixing the stage-1 model with a user background model p(w|U).
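A minimal sketch of the combined formula (the default μ and λ are illustrative):

```python
def two_stage(w, counts, dlen, coll, user_lm, mu=2000.0, lam=0.1):
    """Stage 1: Dirichlet-prior smoothing against the collection model p(w|C).
    Stage 2: mix the result with a user/query background model p(w|U)."""
    stage1 = (counts.get(w, 0) + mu * coll.get(w, 1e-12)) / (dlen + mu)
    return (1 - lam) * stage1 + lam * user_lm.get(w, 1e-12)
```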

21 Estimating μ Using Leave-One-Out Leave each word occurrence w_i out of its document in turn and measure how well the smoothed model predicts it, i.e., P(w_1|d - w_1), P(w_2|d - w_2), ..., P(w_n|d - w_n). The leave-one-out log-likelihood is
l_{-1}(μ|C) = Σ_d Σ_{w in d} c(w,d) log( (c(w,d) - 1 + μ p(w|C)) / (|d| - 1 + μ) )
μ is set by maximum likelihood, solving for the maximizer with Newton's method.
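A sketch of this estimator, assuming the leave-one-out likelihood has the form shown above (Newton's method is applied to its first derivative):

```python
def estimate_mu(docs, coll, mu=1.0, iters=20):
    """Maximize l(mu) = sum_d sum_w c(w,d) *
    log((c(w,d) - 1 + mu*p(w|C)) / (|d| - 1 + mu)) by Newton's method.
    docs is a list of {word: count} dicts; coll maps word -> p(w|C)."""
    for _ in range(iters):
        g = h = 0.0                       # first and second derivatives
        for counts in docs:
            dlen = sum(counts.values())
            for w, c in counts.items():
                pc = coll[w]
                g += c * (pc / (c - 1 + mu * pc) - 1 / (dlen - 1 + mu))
                h += c * (1 / (dlen - 1 + mu) ** 2
                          - pc ** 2 / (c - 1 + mu * pc) ** 2)
        mu = max(mu - g / h, 1e-6)        # Newton step, kept positive
    return mu
```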

22 Estimating λ Using a Mixture Model Treat the query as drawn from a two-component mixture: for each document d_i (i = 1, ..., N), a query word w comes from the stage-1 smoothed document model p(w|d_i) with probability (1-λ) and from the user background model p(w|U) with probability λ, i.e., (1-λ) p(w|d_i) + λ p(w|U). λ is then fit by maximum likelihood with the Expectation-Maximization (EM) algorithm.
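A sketch of one plausible EM loop for λ (the stage-1 smoothed document models and the user model p(w|U) are inputs; this is an illustrative reading of the slide, not necessarily the paper's exact procedure):

```python
def estimate_lambda(query_words, doc_lms, user_lm, lam=0.5, iters=50):
    """EM for the mixture weight lambda in
    (1 - lambda) * p(w|d_i) + lambda * p(w|U)."""
    for _ in range(iters):
        z_total, n = 0.0, 0
        for lm in doc_lms:                # stage-1 smoothed p(w|d_i)
            for w in query_words:
                noise = lam * user_lm.get(w, 1e-12)
                post = noise / ((1 - lam) * lm.get(w, 1e-12) + noise)
                z_total += post           # E-step: P(w came from p(w|U))
                n += 1
        lam = z_total / n                 # M-step: refit the mixture weight
    return lam
```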

23 Automatic 2-stage results ≈ optimal 1-stage results, measured by average precision (3 DBs + 4 query types, 150 topics).

24 Acknowledgement Many thanks to Chengxiang Zhai, who generously shared his slides on the language modeling approach to information retrieval.

