Gravitation-Based Model for Information Retrieval Shuming Shi, Ji-Rong Wen, Qing Yu, Ruihua Song, Wei-Ying Ma Microsoft Research Asia SIGIR 2005

INTRODUCTION Most IR models fail to satisfy some basic intuitive heuristic constraints, and they commonly lack an intuitive physical interpretation. This work views information retrieval from the perspective of physics (the theory of gravitation).

INTRODUCTION Contributions: ranking formulas are derived that satisfy all the heuristic constraints; the BM25 term weighting function can be easily derived from the basic model; and a more reasonable approach for structured document retrieval is obtained.

Newton's Theory of Gravitation Two objects with masses m1 and m2 at distance d attract each other with force F = G * m1 * m2 / d^2, where G is the universal gravitational constant.

Okapi's BM25 formula The original form of w(t) can produce negative IDF values, so ln((N+1)/df(t)) is used instead. Amati and van Rijsbergen proposed nonparametric term weighting functions and claim that the BM25 function with certain parameters (k1=1.2, b=0.75; or k1=2, b=0.75) can be approximated numerically by one of their generated functions.
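For reference in the later derivation, here is a minimal Python sketch of BM25 scoring with the ln((N+1)/df(t)) IDF variant mentioned above; the function and variable names (bm25_score, doc_tf, avg_doc_len) are illustrative, not from the paper.

import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, N, k1=1.2, b=0.75):
    # BM25 with w(t) = ln((N+1)/df(t)) to avoid negative IDF
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or t not in df:
            continue
        idf = math.log((N + 1) / df[t])
        K = k1 * (1 - b + b * doc_len / avg_doc_len)   # document length normalization
        score += idf * tf * (k1 + 1) / (tf + K)
    return score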

Structured Document Retrieval The most commonly used approach for structured document retrieval may be linear combination of per-field scores or ranks. Robertson et al. instead combine term frequencies across fields rather than field scores.
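The difference between the two strategies, for a single query term, can be sketched as follows (length normalization omitted for brevity; the helper names are illustrative, not from the cited work).

def saturate(tf, k1=1.2):
    # BM25-style term-frequency saturation
    return tf * (k1 + 1) / (tf + k1)

def score_combination(field_tfs, field_weights, idf):
    # score each field separately, then combine the per-field scores linearly
    return sum(w * idf * saturate(tf) for tf, w in zip(field_tfs, field_weights))

def tf_combination(field_tfs, field_weights, idf):
    # Robertson-style: combine weighted frequencies first, then saturate once
    pseudo_tf = sum(w * tf for tf, w in zip(field_tfs, field_weights))
    return idf * saturate(pseudo_tf)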

GRAVITATION-BASED MODEL A term is viewed as a physical object composed of some amount of particles. A particle has two attributes: type (determined by the term it belongs to) and mass.

Term Object A term is composed of particles arranged in a specified structure (or shape): a sphere, or an ideal cylinder, i.e., a cylinder whose radius is small enough that it can be viewed as a line. Three attributes are associated with a term: type, mass, and diameter.
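A minimal data-structure sketch of the three attributes listed above (the field names are illustrative, not from the paper):

from dataclasses import dataclass

@dataclass
class TermObject:
    term: str         # type: determined by the term the particles compose
    mass: float       # total mass of the term's particles
    diameter: float   # length of the ideal cylinder (or diameter of the sphere)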

Term Object Sphere-shaped terms; ideal-cylinder-shaped terms.

Document Object There are two categories of term objects in each document: explicit (or seen) terms, i.e., all the terms that occur in the content of the document, and implicit (or hidden) terms, which do not occur in the document and are used to represent its hidden meaning. |D| and |H(D)| denote the numbers of explicit and implicit terms, respectively.

Document Object The mass of a document is the total mass of all its seen and hidden terms. The diameter of a document is defined as the sum of the diameters of all its terms. A query is modeled as an object composed of its terms (no hidden terms are assumed, for simplicity).
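In symbols, a direct rendering of the slide text above (writing m(t,D) and di(t,D) for a term's mass and diameter in D, as in the later slides):

m(D) = \sum_{t \in D} m(t,D) + \sum_{h \in H(D)} m(h,D), \qquad di(D) = \sum_{t \in D} di(t,D)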

Basic Model: Discrete Version

Basic Model: Continuous Version
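The formulas on these two slides are not preserved in the transcript. As a rough sketch of the general shape only (an assumption, not the paper's exact parameterization): in the discrete version the relevance score is the total Newtonian attraction between the query's term objects and the matching term objects of the document, while in the continuous version each matching term object is treated as a continuous mass distribution and the attraction is obtained by integrating a gravitational field function f(x) over it:

score(Q,D) \approx \sum_{t \in Q \cap D} G \, \frac{m(t,Q)\, m(t,D)}{d(t)^2} \quad \text{(discrete)}
score(Q,D) \approx \sum_{t \in Q \cap D} G \, m(t,Q) \int \rho_t(x)\, f(x)\, dx \quad \text{(continuous, } \rho_t \text{ = mass density of } t\text{'s object)}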

Mass and Diameter Estimation Assumption 1: for any two terms, their mass ratio in any document is equal to the ratio of their average masses in the whole collection. Assumption 2: the average mass of a term depends on, and only on, its inverse document frequency.

Mass and Diameter Estimation Assumption 3: the average global importance of the terms in a document is constant. With these assumptions, the mass of a term in a document can be estimated.

Mass and Diameter Estimation Assumption 4: the number of hidden terms in a document depends only on the statistics of the whole collection.
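Written as formulas, a direct rendering of Assumptions 1, 2, and 4 as stated above (\bar{m}(t) denotes the average mass of term t over the collection; g and h are unspecified functions):

\frac{m(t,D)}{m(s,D)} = \frac{\bar{m}(t)}{\bar{m}(s)} \;\; \text{(A1)}, \qquad \bar{m}(t) = g(\mathrm{idf}(t)) \;\; \text{(A2)}, \qquad |H(D)| = h(\text{collection statistics}) \;\; \text{(A4)}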

Model Analysis Continuous model

Model Analysis Discrete model

Relationship with the BM25 formula From the previous section, if m(D) and di(D) are constant, it is easy to prove that the continuous model is equivalent to a simplified BM25 formula.

Field Functions

Structured Document Retrieval Because the masses of terms in field 1 are larger than those in field 2, field 1's terms are nearer to the query.

Analysis and Comparison Score combination: a document matching a single query term across many fields could get an unreasonably higher score than one matching all the query terms in a few fields. TF combination: with field weights F1=5 and F2=1, the score of d3, which contains t in its high-weight field and enough occurrences of the term in its low-weight field, should be larger than that of d4.

EXPERIMENTS Experiments are conducted on two corpora and seven query sets used in TREC from 2000 to 2004.

Term Weighting Experiments VSM-Piv: the pivoted-normalization VSM formula. LM-Dir: the language model formula with Dirichlet prior smoothing. GBM-Inv: the GBM formula derived with the gravitational field function 1/x. GBM-Exp: the GBM formula derived with e^-x. GBM-Std: equivalent to the BM25 formula.

Field Structure Experiments The scores for the baseline, ScoreComb, and FreqComb are all generated by BM25 (b=0.75).

CONCLUSION AND FUTURE WORK In this paper, a gravitation-based IR model (GBM) is proposed. The GBM not only gives a physical interpretation of BM25 but also yields new, effective ranking formulas. The derived term weighting functions satisfy all the heuristic constraints.

CONCLUSION AND FUTURE WORK A novel approach for structured document retrieval is also proposed. Future work: use static document ranks (e.g., PageRank) to derive m(D), and study the relationship and possible combinations between this model and existing models.