Language Modeling Again So are we smooth now? Courtesy of Chris Jordan

So what did we talk about last week? Language models represent documents as multinomial distributions –What is a multinomial? The Maximum Likelihood Estimate calculates the document model –What is the Maximum Likelihood Estimate? Smoothing document models
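
As a refresher, here is a minimal sketch of the Maximum Likelihood Estimate for a unigram document model; the function name and toy document are illustrative, not from the original slides:

```python
from collections import Counter

def mle_model(doc_tokens):
    """Maximum Likelihood Estimate of a unigram document model:
    p(t|d) = count(t, d) / |d|."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

# Any term not in the document gets probability 0 under MLE.
model = mle_model("the cat sat on the mat".split())
print(model["the"])         # 2/6
print(model.get("dog", 0))  # 0 -- the problem smoothing addresses
```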

Why is smoothing so important? Maximum Likelihood Estimate gives 0 probabilities –Why is that an issue? What does smoothing do? What types of smoothing are there?

Challenge questions What is common in every smoothing technique that we have covered? What does smoothing really do? Does it make for a more accurate document model? –Does it replace the need for more data?

A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval Thoughts? What is Additive? What is Interpolation? What is Backoff?

Laplace / Additive Smoothing Just increasing raw term frequencies Is that representative of the document model? How hard is this to implement? What happens if the constant added is really large?
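
A hedged sketch of additive smoothing, assuming the standard form p(t|d) = (c(t,d) + δ) / (|d| + δ|V|); `delta` and `vocab` are illustrative parameters:

```python
from collections import Counter

def additive_model(doc_tokens, vocab, delta=1.0):
    """Additive smoothing (Laplace when delta = 1):
    p(t|d) = (count(t,d) + delta) / (|d| + delta * |V|)."""
    counts = Counter(doc_tokens)
    denom = sum(counts.values()) + delta * len(vocab)
    return {t: (counts[t] + delta) / denom for t in vocab}

# If delta is very large, the real counts are swamped and every term's
# probability approaches the uniform 1/|V| -- the smoothed model stops
# representing the document at all.
```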

Interpolation Jelinek Mercer: p_s(t) = λp(t|d) + (1 − λ)p(t|corpus) Dirichlet –Anyone know what this is? –Remember Gaussian? Poisson? Beta? Gamma? The Beta is the distribution for binomials –The Dirichlet is the analogous distribution for multinomials
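
The interpolation formula above, as a minimal sketch; `lam` plays the role of λ and the two inputs are unigram models as term-to-probability dicts:

```python
def jelinek_mercer(p_doc, p_corpus, lam=0.7):
    """Jelinek-Mercer: p_s(t) = lam * p(t|d) + (1 - lam) * p(t|corpus)."""
    vocab = set(p_doc) | set(p_corpus)
    return {t: lam * p_doc.get(t, 0.0) + (1 - lam) * p_corpus.get(t, 0.0)
            for t in vocab}
```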

Dirichlet / Absolute Discounting What does Absolute Discounting do? –How is it different from Laplace? From Jelinek Mercer? What is the key difference between the α_d in Jelinek Mercer and the α_d in Dirichlet and Absolute Discounting? –α_d determines how much probability mass is subtracted from seen terms and added to unseen ones
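
A minimal sketch of Dirichlet-prior smoothing, assuming the usual pseudo-count form with parameter μ (here `mu`). The key point from the slide: the corpus coefficient α_d = μ / (|d| + μ) shrinks as the document gets longer, unlike the constant (1 − λ) in Jelinek-Mercer:

```python
from collections import Counter

def dirichlet_model(doc_tokens, p_corpus, mu=2000.0):
    """Dirichlet-prior smoothing:
    p(t|d) = (count(t,d) + mu * p(t|C)) / (|d| + mu)."""
    counts = Counter(doc_tokens)
    n = sum(counts.values())
    return {t: (counts[t] + mu * p_c) / (n + mu)
            for t, p_c in p_corpus.items()}
```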

Back off What is the idea here? –Do not pad the probability of seen terms Any idea why this doesn’t work? –The seen terms have their probabilities decreased –Too much smoothing?
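
A sketch of one backoff scheme, using absolute discounting; this is an illustration of the idea, not the exact formulation from the paper. Seen terms keep a discounted ML estimate, and only the freed mass flows to unseen terms via the corpus model:

```python
from collections import Counter

def backoff_model(doc_tokens, p_corpus, discount=0.5):
    """Absolute-discount backoff sketch: seen terms keep a discounted
    ML estimate; the freed mass is redistributed over unseen terms in
    proportion to the corpus model."""
    counts = Counter(doc_tokens)
    n = sum(counts.values())
    freed = discount * len(counts) / n  # mass taken from seen terms
    unseen_mass = sum(p for t, p in p_corpus.items() if t not in counts)
    def p(t):
        if t in counts:
            return max(counts[t] - discount, 0.0) / n
        if unseen_mass == 0:
            return 0.0
        return freed * p_corpus.get(t, 0.0) / unseen_mass
    return p
```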

Pause… Review Why do we smooth? Does smoothing make sense? What is Laplace? What is Jelinek Mercer? What is Dirichlet smoothing? What is Absolute Discounting? What is Back off?

Let’s beat this horse some more! Does everyone know what mean average precision is? Let’s have a look at the results –Are these really improvements? –What does an increase of 0.05 in precision really mean? –Will that matter to the user?
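
For reference, a sketch of average precision for a single query, using illustrative names; mean average precision (MAP) is just the mean of this value over all queries:

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: the mean of precision@k taken
    at each rank k where a relevant document appears, divided by the
    total number of relevant documents."""
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0
```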

And now we come full circle What is a real performance improvement? –Cranfield paradigm evaluation Corpus Queries Qrels –User trials Satisfaction Effectiveness Efficiency

Cluster Based Smoothing What will clustering give us? –Cluster the corpus –Find clusters for each document Mixture model now involves –Document model –Cluster model –Corpus model Some performance gains –Significant but not so special
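
The cluster-based mixture can be sketched as a three-way interpolation over document, cluster, and corpus models; the weights here are illustrative and would be tuned in practice:

```python
def cluster_smoothed(p_doc, p_cluster, p_corpus,
                     w_d=0.6, w_cl=0.3, w_c=0.1):
    """Three-way mixture of document, cluster, and corpus unigram
    models. The weights must sum to 1."""
    vocab = set(p_doc) | set(p_cluster) | set(p_corpus)
    return {t: w_d * p_doc.get(t, 0.0)
               + w_cl * p_cluster.get(t, 0.0)
               + w_c * p_corpus.get(t, 0.0)
            for t in vocab}
```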

Relevance Modeling Blind Relevance Feedback approach –Top documents in the result set used as feedback –A language model is constructed from these top ranked documents for each query –This model is used as the relevance model for probabilistic retrieval
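
A much-simplified sketch of building a feedback model from the top-ranked documents. Lavrenko and Croft's actual relevance models weight documents by query likelihood; the uniform pooling below is only meant to show the shape of the idea:

```python
from collections import Counter

def relevance_model(top_doc_token_lists):
    """Blind-feedback sketch: pool the top-ranked documents and
    estimate a single unigram model over the pooled text."""
    pooled = Counter()
    for tokens in top_doc_token_lists:
        pooled.update(tokens)
    total = sum(pooled.values())
    return {t: c / total for t, c in pooled.items()}
```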

On the topic of Blind Relevance Feedback How can we use Relative Entropy here? –Derive a model that minimizes the relative entropy between the documents in the top rank Does Relevance Modeling make sense? Does using Relative Entropy make sense?
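
Relative entropy (KL divergence) between two unigram models, as a sketch; the `eps` floor is one more place where unsmoothed zero probabilities would break things:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) = sum_t p(t) * log(p(t) / q(t)).
    Finite only when q gives nonzero mass wherever p does, hence the
    eps guard -- another reason smoothing matters."""
    return sum(pt * math.log(pt / max(q.get(t, 0.0), eps))
               for t, pt in p.items() if pt > 0)
```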

The big assumption Top ranked documents are a good source of relevant text –This obviously is not always true –There is a lot of noise –Are the top-ranked documents representative of the relevant set? Relevance modeling and Relative Entropy BRF approaches have been shown to improve performance –But not really…

Review What is average precision? What is the Cranfield paradigm? What alternative sources can be used for smoothing? Does Blind Relevance Feedback make sense? –Why does it work?

You have been a good class We have covered –Language Modeling for ad-hoc document retrieval –Unigram model –Maximum Likelihood Estimate –Smoothing Techniques –Different mixture models –Blind Relevance Feedback for Language Modeling

Questions for me?

Questions for you Why do we work with the unigram model? Why is smoothing important? How does a language model represent a document? What is interpolation?

Let’s talk more about me

Another application of language modeling Unsupervised Morphological Analysis A morpheme is a basic unit of meaning in a language pretested : pre - test - ed English is a relatively easy language Turkish, Finnish, German are agglutinative –Very hard

Morfessor All terms in the vocabulary are candidate morphemes Terms are recursively split –Build up the candidate morpheme set –Repeatedly analyze the whole vocabulary until the candidate morpheme set can no longer be improved

Swordfish Ngram based unsupervised morpheme analyzer –Character Ngrams Substrings A language model is constructed over all ngrams of all lengths –Maximum Likelihood Estimate Terms are recursively split based on the likelihood of their ngrams
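
A rough sketch of this kind of recursive, likelihood-driven splitting; `ngram_logprob` is a hypothetical scoring function over character ngrams, and this greedy procedure is an illustration of the idea, not Swordfish's exact algorithm:

```python
def best_split(term, ngram_logprob, min_len=2):
    """Greedy recursive split: try every boundary; if splitting the
    term into two substrings is more likely under the character-ngram
    model than keeping it whole, recurse on each half."""
    whole = ngram_logprob(term)
    best_i, best_gain = None, 0.0
    for i in range(min_len, len(term) - min_len + 1):
        gain = ngram_logprob(term[:i]) + ngram_logprob(term[i:]) - whole
        if gain > best_gain:
            best_i, best_gain = i, gain
    if best_i is None:
        return [term]  # no split improves the likelihood
    return (best_split(term[:best_i], ngram_logprob, min_len)
            + best_split(term[best_i:], ngram_logprob, min_len))
```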

Swordfish Results Reasonable results Character ngrams are useful in finding morphemes –All morphemes are ngrams but not all ngrams are morphemes –The most prominent ngrams appear to be morphemes How one defines prominent is an open question. Check out the PASCAL Morpho-Challenge