IR Challenges and Language Modeling

IR Achievements
Search engines
  – Meta-search
  – Cross-lingual search
  – Factoid question answering
  – Filtering
Statistical approach to language
Evaluation methodology
  – Effectiveness and efficiency
The importance of users

Current Status
Everyone is a Web or language technology person today...
  – SIGMOD, VLDB
  – WWW
  – ACL, EMNLP
  – ICML
  – KDD
  – IJCAI, AAAI, ...
Funding agencies have declared some problems “solved”

Defining the Research Challenges
What are the driving forces?
What should we work on?
What are the grand challenges?
What should be funded?
cf. the Asilomar Report produced by the database community

Language Modeling
One challenge: Defining the formal basis for IR
  – retrieval models
  – indexing models
Lots of papers, any consensus?
Relationship to real systems?
Language models are an attempt to provide a different perspective for retrieval models
  – shown promise in describing a range of IR “tasks”
  – potential for better integration with other language technologies

Why retrieval models?
“Why do we need new retrieval models now that we have Google?”
  – Web search ≠ IR
  – Typical web queries ≠ information needs
Google shows that, for some types of queries, effective ranking can be obtained by combining an AND query with a number of other features
  – effect of scale: ranking within the top group
  – features such as links, anchor text, tagging used
Retrieval models provide frameworks for improving effectiveness in more general contexts

LM for IR
What is a language model?
Query-likelihood and document models
Document-likelihood and query models
KL divergence comparison of models
Other models
Applications

© Victor Lavrenko, Aug
What is a Language Model?
A statistical model for generating text
  – Probability distribution over strings in a given language
P(w_1 w_2 ... w_n | M) = P(w_1 | M) P(w_2 | M, w_1) ... P(w_n | M, w_1 ... w_(n-1))

Unigram and higher-order models
Unigram Language Models
  P(w_1 w_2 w_3 w_4) = P(w_1) P(w_2) P(w_3) P(w_4)
N-gram Language Models (bigram case shown)
  P(w_1 w_2 w_3 w_4) = P(w_1) P(w_2 | w_1) P(w_3 | w_2) P(w_4 | w_3)
Other Language Models
  – Grammar-based models, etc.
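
The difference between the unigram and bigram factorizations can be sketched in a few lines of Python. The probability tables below are toy values invented for illustration (they are not from the slides), and the function names are hypothetical.

```python
from math import prod

# Toy probabilities, invented for illustration only.
P_UNIGRAM = {"information": 0.2, "retrieval": 0.1, "models": 0.05}
P_BIGRAM = {("information", "retrieval"): 0.5, ("retrieval", "models"): 0.3}

def unigram_prob(words):
    """P(w_1 ... w_n) as a product of independent word probabilities."""
    return prod(P_UNIGRAM.get(w, 0.0) for w in words)

def bigram_prob(words):
    """P(w_1 ... w_n) where each word is conditioned on its predecessor."""
    first = P_UNIGRAM.get(words[0], 0.0)
    rest = prod(P_BIGRAM.get(pair, 0.0) for pair in zip(words, words[1:]))
    return first * rest
```

The unigram model ignores word order, so any permutation of the same words gets the same probability. The bigram model is order-sensitive, and with these unsmoothed toy tables a word pair that was never listed gets probability zero.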

The fundamental problem of LMs
Usually we don’t know the model M
  – But have a sample of text representative of that model
Estimate a language model from a sample
Then compute the observation probability P(observation | M(sample))
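
As a minimal sketch of "estimate, then compute the observation probability", the code below assumes a unigram model estimated by maximum likelihood (relative frequencies). The slide does not prescribe an estimator, so this choice and the function names are assumptions.

```python
from collections import Counter

def estimate_unigram(sample):
    """Maximum-likelihood unigram model M(sample): relative word frequencies."""
    counts = Counter(sample)
    total = len(sample)
    return {w: c / total for w, c in counts.items()}

def observation_prob(observation, model):
    """P(observation | M(sample)) under the unigram independence assumption."""
    p = 1.0
    for w in observation:
        p *= model.get(w, 0.0)  # a word unseen in the sample gets probability 0
    return p
```

With a sample such as `"the cat sat on the mat".split()`, any observation containing a word outside the sample receives probability zero. That is the practical reason smoothing matters when the model is estimated from a short text such as a single document.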

Models of Text Generation
Searcher → Query Model → Query
Writer → Doc Model → Doc
Is this the same model?

Retrieval Using Language Models
Query Model → Query
Doc Model → Doc
Retrieval: Query likelihood (1), Document likelihood (2), Model comparison (3)

Query Likelihood
P(Q | D_m)
Major issue is estimating document model
  – i.e. smoothing techniques instead of tf.idf weights
cf. Van Rijsbergen’s P(D → Q) and InQuery’s P(I | D)
Good retrieval results
  – e.g. UMass, BBN, Twente, CMU
Problems dealing with relevance feedback, query expansion, structured queries
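
Query-likelihood ranking with a smoothed document model might look like the sketch below. The slide does not name a smoothing method; this example assumes Jelinek-Mercer smoothing (linear interpolation between the document model and the collection model), and the parameter `lam` and all function names are illustrative.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(Q | D_m), with D_m smoothed by Jelinek-Mercer interpolation."""
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_doc = doc_tf[w] / len(doc) if doc else 0.0
        p_coll = coll_tf[w] / len(collection)
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # word unseen in the whole collection
        score += math.log(p)
    return score

def rank(query, docs, lam=0.5):
    """Return document indices ordered by decreasing query likelihood."""
    collection = [w for d in docs for w in d]
    scores = [(query_log_likelihood(query, d, collection, lam), i)
              for i, d in enumerate(docs)]
    return [i for _, i in sorted(scores, reverse=True)]
```

Because the collection model contributes to every document, a document that lacks a query word still receives a finite score; the collection probability plays a role similar to idf in tf.idf weighting.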

Document Likelihood
Rank by likelihood ratio P(D | R) / P(D | N)
  – treat as a generation problem
  – P(w | R) is estimated by P(w | Q_m)
  – Q_m is the query or relevance model
  – P(w | N) is estimated by collection probabilities P(w)
Issue is estimation of query model
  – Treat query as generated by mixture of topic and background
  – Estimate relevance model from related documents (query expansion)
  – Relevance feedback is easily incorporated
Good retrieval results
  – e.g. UMass at SIGIR 01
  – inconsistent with heterogeneous document collections
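
The likelihood-ratio ranking can be sketched as follows. The query model here is a deliberately simple stand-in: an interpolation of the query's empirical distribution with the collection model, controlled by an assumed weight `mu`. It is not the relevance-model estimation from related documents that the slide refers to, and all names are hypothetical.

```python
import math
from collections import Counter

def collection_probs(docs):
    """Collection probabilities P(w), used as the estimate of P(w | N)."""
    tf = Counter(w for d in docs for w in d)
    total = sum(tf.values())
    return {w: c / total for w, c in tf.items()}

def query_model(query, coll_prob, mu=0.5):
    """P(w | Q_m): query words mixed with the background (assumed form).

    Sums to 1 only if every query word occurs in the collection.
    """
    q_tf = Counter(query)
    return {w: (1 - mu) * q_tf[w] / len(query) + mu * p
            for w, p in coll_prob.items()}

def log_likelihood_ratio(doc, q_model, coll_prob):
    """log [P(D | R) / P(D | N)] with P(w | R) = P(w | Q_m), P(w | N) = P(w)."""
    return sum(math.log(q_model[w] / coll_prob[w]) for w in doc)
```

Each document word contributes one term to the sum, so the raw score depends on document length. That dependence is one plausible reading of why the slide flags this approach as inconsistent on heterogeneous collections.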

Model Comparison
Estimate query and document models and compare
Obvious measure is KL divergence D(Q_m || D_m)
  – equivalent to query-likelihood approach if simple empirical distribution used for query model
More general risk minimization framework has been proposed
  – Zhai and Lafferty
Consistently better results than query-likelihood or document-likelihood approaches
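
The equivalence noted on this slide can be checked numerically. With an empirical query model, D(Q_m || D_m) = -H(Q_m) - (1/|Q|) log P(Q | D_m), and the entropy term H(Q_m) does not depend on the document, so ranking by increasing KL divergence matches ranking by decreasing query likelihood. The sketch below assumes Jelinek-Mercer smoothing for the document model and that every query word occurs somewhere in the collection; the names are illustrative.

```python
import math
from collections import Counter

def collection_probs(docs):
    tf = Counter(w for d in docs for w in d)
    total = sum(tf.values())
    return {w: c / total for w, c in tf.items()}

def empirical_model(tokens):
    """Simple empirical distribution (relative frequencies)."""
    tf = Counter(tokens)
    return {w: c / len(tokens) for w, c in tf.items()}

def smoothed_doc_model(doc, coll_prob, lam=0.5):
    """D_m: document model interpolated with the collection model."""
    tf = Counter(doc)
    return {w: (1 - lam) * tf[w] / len(doc) + lam * p
            for w, p in coll_prob.items()}

def kl_divergence(q_model, d_model):
    """D(Q_m || D_m), summed over words with non-zero query probability."""
    return sum(pq * math.log(pq / d_model[w])
               for w, pq in q_model.items() if pq > 0)

def query_log_likelihood(query, d_model):
    return sum(math.log(d_model[w]) for w in query)
```

A richer query model (for example one expanded through relevance feedback) can be passed to `kl_divergence` unchanged, which is the flexibility the model-comparison view adds over plain query likelihood.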

Other Approaches
HMMs (BBN)
Probabilistic Latent Semantic Indexing (Hofmann)
  – assume documents are generated by a mixture of “aspect” models
  – estimation more difficult
Translation model (Berger and Lafferty)

Applications
CLIR
TDT
Novelty and redundancy
Links
Distributed retrieval
QA
Filtering
Summarization

The Future of IR and LM