Conceptual grounding
Nisheeth
26th March 2019
The similarity function
- Objects represented as points in some coordinate space
- Metric distances between points reflect observed similarities
- But what reason do we have to believe the similarity space is endowed with a metric distance?
[Figure: example of "near" and "far" points in a similarity space]
https://pdfs.semanticscholar.org/9bc7/f2f2dd8ca2d5bdc03c82db264cbe0f2c0449.pdf
What makes a measure metric?
- Minimality: D(a,b) ≥ D(a,a) = 0
- Symmetry: D(a,b) = D(b,a)
- Triangle inequality: D(a,b) + D(b,c) ≥ D(c,a)
Do similarity judgments satisfy any of these properties?
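This question can be checked directly against data. Below is a minimal sketch, assuming pairwise dissimilarity judgments have been collected into a matrix D (the numbers here are invented for illustration), that tests the three axioms:

```python
import numpy as np

def check_metric_axioms(D, tol=1e-9):
    """Test minimality, symmetry, and the triangle inequality
    for a matrix of pairwise dissimilarity judgments D[i, j]."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    minimality = np.allclose(np.diag(D), 0.0) and bool(np.all(D >= -tol))
    symmetry = np.allclose(D, D.T, atol=tol)
    triangle = all(D[a, b] + D[b, c] >= D[a, c] - tol
                   for a in range(n) for b in range(n) for c in range(n))
    return {"minimality": minimality, "symmetry": symmetry, "triangle": triangle}

# Invented judgment data with an asymmetry (item 0 judged closer to item 2
# than item 2 is to item 0), which violates the symmetry axiom.
D = [[0.0, 0.3, 0.6],
     [0.3, 0.0, 0.8],
     [0.9, 0.8, 0.0]]
print(check_metric_axioms(D))
```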
Tversky's set-theoretic similarity
Assumptions:
- Matching: s(a,b) = f(a∩b, a−b, b−a)
- Monotonicity: s(a,b) ≥ s(a,c) whenever a∩c is a subset of a∩b, a−b is a subset of a−c, and b−a is a subset of c−a
- Independence: the joint effect on similarity of any two feature components is unaffected by the impact of other components
A model satisfying these assumptions: add up the matching features, subtract out the distinctive features
What counts as a feature is left unspecified
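A minimal sketch of the resulting contrast model, under the common assumption that the salience function f is simply set cardinality; the weights theta, alpha, beta and the feature sets below are illustrative choices, not anything fixed by the theory:

```python
def tversky_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky's contrast model over feature sets:
    s(a, b) = theta*f(a ∩ b) - alpha*f(a - b) - beta*f(b - a),
    with the salience function f taken here to be set cardinality."""
    a, b = set(a), set(b)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)

# Invented feature sets; what counts as a feature is up to the modeller.
robin = {"flies", "feathers", "small", "sings"}
penguin = {"swims", "feathers", "flightless"}
print(tversky_similarity(robin, penguin))                       # matching feature: "feathers"
print(tversky_similarity(penguin, robin, alpha=0.8, beta=0.2))  # asymmetric weighting
```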
What Gives Concepts Their Meaning? (Goldstone and Rogosky, 2002)
- External grounding: a concept's meaning comes from its connection to the external world
- Conceptual web: a concept's meaning comes from its connections to other concepts in the same conceptual system
Examples of the "conceptual web" approach:
- Semantic networks
- Probabilistic language models
Semantic Networks
[Figure: semantic network, from Hofstadter, Gödel, Escher, Bach]
Language Model
- Unigram language model: a probability distribution over the words in a language; generation of text consists of pulling words out of a "bucket" according to that distribution and replacing them
- N-gram language model: some applications use bigram and trigram language models, in which the probability of a word depends on the preceding one or two words
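A minimal sketch of the unigram "bucket" picture, using an invented toy corpus; the model is just the relative word frequencies, and generation is sampling with replacement:

```python
import random
from collections import Counter

# Invented toy corpus, purely for illustration.
corpus = "the president spoke and the president left the room".split()

# Unigram model: a probability distribution over the words in the corpus.
counts = Counter(corpus)
total = sum(counts.values())
unigram = {w: c / total for w, c in counts.items()}

# Generation: pull words out of the "bucket" according to the
# distribution, replacing them each time.
words, probs = zip(*unigram.items())
print(" ".join(random.choices(words, weights=probs, k=10)))
```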
Language Model
- A topic in a document or query can be represented as a language model, i.e., words that tend to occur often when discussing a topic will have high probabilities in the corresponding language model
- The basic assumption is that words cluster in semantic space
- Multinomial distribution over words: text is modeled as a finite sequence of words, where there are t possible words at each point in the sequence
  - commonly used, but not the only possibility
  - doesn't model burstiness
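A sketch of what the multinomial assumption buys and what it ignores: under a hypothetical topic distribution (the probabilities below are invented), the likelihood of a text depends only on its word counts, so word order and bursty repetition carry no information:

```python
from collections import Counter
from math import log

def sequence_log_likelihood(text_words, word_probs):
    """Log-probability of a word sequence under the unigram/multinomial model.
    Every position is an independent draw from word_probs, so the result
    depends only on word counts: order and burstiness are not modeled."""
    counts = Counter(text_words)
    return sum(c * log(word_probs[w]) for w, c in counts.items())

# Invented topic model over a tiny vocabulary.
probs = {"president": 0.2, "lincoln": 0.1, "the": 0.5, "spoke": 0.2}
print(sequence_log_likelihood("the president spoke".split(), probs))
print(sequence_log_likelihood("spoke the president".split(), probs))  # identical score
```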
Language models in information retrieval
Three possibilities:
- the probability of generating the query text from a document language model
- the probability of generating the document text from a query language model
- comparing the language models representing the query and document topics
Commonly used in NLP applications
Implicit psychological premise
[Figure: a query and several documents arranged as points in semantic space; semantic network image from Hofstadter, Gödel, Escher, Bach]
Rank documents by the closeness of the topics they represent in semantic space to the topic represented by the search query
Query-Likelihood Model
- Rank documents by the probability that the query could be generated by the document model (i.e., that they concern the same topic)
- Given a query Q, start with P(D|Q)
- Using Bayes' Rule: P(D|Q) = P(Q|D) P(D) / P(Q); since P(Q) is the same for every document, rank by P(Q|D) P(D)
- Assuming the prior P(D) is uniform and a unigram model, rank by P(Q|D) = ∏_i P(q_i|D)
Estimating Probabilities
- The obvious estimate for the unigram probabilities is the maximum likelihood estimate P(q_i|D) = f_{q_i,D} / |D|, where f_{q_i,D} is the number of times query word q_i occurs in document D and |D| is the document length
- The maximum likelihood estimate makes the observed value of f_{q_i,D} most likely
- If query words are missing from the document, the score will be zero
- Missing 1 out of 4 query words is scored the same as missing 3 out of 4
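A sketch of the maximum likelihood estimate and the zero-score problem it creates; the document and queries below are invented:

```python
from collections import Counter
from math import prod

def mle_query_likelihood(query, doc_words):
    """Maximum likelihood unigram query likelihood:
    P(Q|D) = prod_i f_{q_i,D} / |D|."""
    counts = Counter(doc_words)
    d_len = len(doc_words)
    return prod(counts[q] / d_len for q in query)

doc = "president lincoln spoke to the press about the war".split()
print(mle_query_likelihood(["president", "lincoln"], doc))  # non-zero
print(mle_query_likelihood(["president", "kennedy"], doc))  # 0.0: one missing query word zeroes the score
```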
Smoothing
- Document texts are a sample from the language model, so missing words should not have zero probability of occurring
- Smoothing is a technique for estimating probabilities for missing (or unseen) words:
  - lower (or discount) the probability estimates for words that are seen in the document text
  - assign that "left-over" probability to the estimates for the words that are not seen in the text
Estimating Probabilities
- Estimate for unseen words: α_D P(q_i|C)
  - P(q_i|C) is the probability for query word i in the collection language model for collection C (the background probability)
  - α_D is a parameter
- Estimate for words that do occur in the document: (1 − α_D) P(q_i|D) + α_D P(q_i|C)
- Different forms of estimation come from different choices of α_D
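As one concrete instance (not the choice developed on the next slide), a constant α_D = λ gives Jelinek-Mercer smoothing; a sketch of the general smoothed estimate, with invented counts:

```python
from collections import Counter

def smoothed_prob(word, doc_counts, doc_len, coll_counts, coll_len, alpha_d):
    """General smoothed estimate for a query word:
    (1 - alpha_d) * P(w|D) + alpha_d * P(w|C).
    A constant alpha_d = lambda is Jelinek-Mercer smoothing; the Dirichlet
    choice on the next slide makes alpha_d depend on the document length."""
    p_doc = doc_counts[word] / doc_len if doc_len else 0.0
    p_coll = coll_counts[word] / coll_len
    return (1 - alpha_d) * p_doc + alpha_d * p_coll

# Invented counts: a word unseen in the document ("kennedy") now gets a
# non-zero probability from the collection (background) model.
doc = Counter("president lincoln spoke".split())
coll = Counter({"president": 160_000, "lincoln": 2_400, "kennedy": 5_000})
print(smoothed_prob("lincoln", doc, 3, coll, 10**9, alpha_d=0.1))
print(smoothed_prob("kennedy", doc, 3, coll, 10**9, alpha_d=0.1))
```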
Dirichlet Smoothing
- α_D depends on the document length: α_D = μ / (|D| + μ)
- This gives the probability estimate
  P(q_i|D) = (f_{q_i,D} + μ P(q_i|C)) / (|D| + μ)
  and the document score
  score(Q,D) = Σ_i log[ (f_{q_i,D} + μ c_{q_i} / |C|) / (|D| + μ) ]
  where c_{q_i} is the number of times q_i occurs in the collection and |C| is the collection size in words
- Take-home question: what is Dirichlet about this smoothing method?
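A sketch of the resulting Dirichlet-smoothed scoring function, assuming term and collection counts are available as plain dictionaries (the counts in the usage lines are invented):

```python
from math import log

def dirichlet_score(query, doc_counts, doc_len, coll_counts, coll_len, mu=2000):
    """Dirichlet-smoothed query-likelihood score:
    score(Q, D) = sum_i log( (f_{q_i,D} + mu * c_{q_i} / |C|) / (|D| + mu) )."""
    score = 0.0
    for q in query:
        p = (doc_counts.get(q, 0) + mu * coll_counts.get(q, 0) / coll_len) / (doc_len + mu)
        score += log(p)
    return score

# Invented counts, just to show the call signature.
print(dirichlet_score(["apple", "pie"],
                      doc_counts={"apple": 4, "pie": 2}, doc_len=500,
                      coll_counts={"apple": 900_000, "pie": 120_000},
                      coll_len=10**9, mu=2000))
```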
Query Likelihood Example
- For the term "president": f_{q_i,D} = 15, c_{q_i} = 160,000
- For the term "lincoln": f_{q_i,D} = 25, c_{q_i} = 2,400
- The document is assumed to be |D| = 1,800 words long
- The collection is assumed to be |C| = 10^9 words long (500,000 documents times an average of 2,000 words)
- μ = 2,000
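Plugging these numbers into the Dirichlet-smoothed score from the previous slide (natural logarithms assumed; a quick check rather than anything authoritative):

```python
from math import log

mu, d_len, coll_len = 2000, 1800, 10**9

f_president, c_president = 15, 160_000
f_lincoln, c_lincoln = 25, 2_400

p_president = (f_president + mu * c_president / coll_len) / (d_len + mu)
p_lincoln = (f_lincoln + mu * c_lincoln / coll_len) / (d_len + mu)

score = log(p_president) + log(p_lincoln)
print(round(log(p_president), 2), round(log(p_lincoln), 2), round(score, 2))
# roughly -5.51, -5.02, and -10.54
```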
Query Likelihood Example
The score is a negative number because we are summing logs of small probabilities
Extension: Google Distance
The normalized Google distance (NGD) measures the association between terms x and y from search-engine page counts: with hit counts f(x), f(y), f(x,y) and an index of N pages,
NGD(x,y) = [max(log f(x), log f(y)) − log f(x,y)] / [log N − min(log f(x), log f(y))]
(Cilibrasi and Vitányi, 2007)
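A sketch of the computation, assuming the page counts f(x), f(y), f(x,y) and the index size N have already been obtained from a search engine (the numbers in the example call are invented):

```python
from math import log

def normalized_google_distance(f_x, f_y, f_xy, n_pages):
    """Normalized Google distance from page-hit counts:
    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y)))."""
    lx, ly, lxy = log(f_x), log(f_y), log(f_xy)
    return (max(lx, ly) - lxy) / (log(n_pages) - min(lx, ly))

# Invented hit counts: terms that co-occur often get a small distance.
print(normalized_google_distance(f_x=8_000_000, f_y=5_000_000,
                                 f_xy=2_500_000, n_pages=50_000_000_000))
```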
Strong correlation with human similarity ratings
[Figure: scatter plot of scaled NGD against human similarity ratings]
Can operationalize creativity