Integrating term dependencies according to their utility
Jian-Yun Nie, University of Montreal

Need for term dependency
– The meaning of a term often depends on other terms used in the same context (term dependency), e.g. computer architecture, hot dog, …
– A unigram model is unable to capture term dependency: hot + dog ≠ "hot dog"
– Dependency: a group of terms (here, a pair of terms)
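
A minimal sketch (not part of the slides) of the point above: under a smoothed unigram model, a document containing the phrase "hot dog" and a document that merely contains both words score identically, because each term is scored independently. The smoothing scheme and toy documents are illustrative assumptions.

```python
# Illustrative only: a Jelinek-Mercer-smoothed unigram language model.
import math
from collections import Counter

def unigram_score(query, doc, collection, lam=0.5):
    """log P(query | doc): each query term is scored independently."""
    d, c = Counter(doc), Counter(collection)
    score = 0.0
    for q in query:
        p_doc = d[q] / len(doc)
        p_col = c[q] / len(collection)
        score += math.log(lam * p_doc + (1 - lam) * p_col)
    return score

doc1 = "buy a hot dog".split()    # contains the phrase "hot dog"
doc2 = "hot day a dog".split()    # same words, no phrase
collection = doc1 + doc2
query = ["hot", "dog"]

print(unigram_score(query, doc1, collection))  # same value as below:
print(unigram_score(query, doc2, collection))  # the phrase is invisible
```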

Previous approaches: phrase + unigram
– Two representations: a phrase model and a unigram model
– Interpolation (each model with a fixed weight)
– Assumption: phrases represent useful dependencies between terms for IR
– E.g. Q = the price of hot dog
  P_unigram: price, hot, dog
  P_phrase: price, hot_dog
  P(price hot dog|D) = λ P_phrase(price hot dog|D) + (1 - λ) P_unigram(price hot dog|D)
  or: score = λ score_phrase + (1 - λ) score_unigram
– Effect: documents with the phrase "hot dog" have a higher score
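
A hedged sketch of this fixed-weight interpolation. The smoothing constant, vocabulary size and λ value are assumptions, not the settings behind the slide.

```python
import math
from collections import Counter

MU = 1.0        # add-mu smoothing constant (assumed)
VOCAB = 1000    # pretend vocabulary size (assumed)

def unigram_logprob(term, doc):
    counts = Counter(doc)
    return math.log((counts[term] + MU) / (len(doc) + MU * VOCAB))

def phrase_logprob(phrase, doc):
    """Phrase written as w1_w2, counted as consecutive occurrences of its words."""
    words = phrase.split("_")
    n = len(words)
    hits = sum(doc[i:i + n] == words for i in range(len(doc) - n + 1))
    return math.log((hits + MU) / (len(doc) + MU * VOCAB))

def interpolated_score(terms, phrases, doc, lam=0.3):
    """score = lam * score_phrase + (1 - lam) * score_unigram,
    with the same fixed lam for every query (the limitation discussed later)."""
    s_uni = sum(unigram_logprob(t, doc) for t in terms)
    s_phr = sum(phrase_logprob(p, doc) for p in phrases)
    return lam * s_phr + (1 - lam) * s_uni

doc = "the price of a hot dog at the hot dog stand".split()
print(interpolated_score(["price", "hot", "dog"], ["hot_dog"], doc))
```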

Dependency model
– Dependence language model (Gao et al. 2005)
– Determine the strongest dependencies among the query terms (a parsing process), e.g. for "price hot dog" the link hot-dog is selected
– The determined dependencies define an additional requirement on documents:
  Documents have to contain the unigrams
  Documents have to contain the required dependencies
  The two criteria are linearly interpolated
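
A rough sketch of the idea, with a crude corpus co-occurrence heuristic standing in for the actual parsing step of Gao et al.; the counts, smoothing, and link-matching approximation are all assumptions.

```python
import itertools
import math
from collections import Counter

def strongest_links(query_terms, corpus_docs, k=1):
    """Crude stand-in for the parsing step: keep the k query-term pairs that
    co-occur in the most corpus documents."""
    pair_counts = Counter()
    for d in corpus_docs:
        present = set(d) & set(query_terms)
        for pair in itertools.combinations(sorted(present), 2):
            pair_counts[pair] += 1
    return [p for p, _ in pair_counts.most_common(k)]

def dependence_score(query_terms, links, doc, lam=0.2, mu=1.0, vocab=1000):
    """Linear interpolation of the unigram criterion and the dependency
    criterion (link matching approximated by the weaker of the two counts)."""
    counts, n = Counter(doc), len(doc)
    uni = sum(math.log((counts[t] + mu) / (n + mu * vocab)) for t in query_terms)
    dep = sum(math.log((min(counts[a], counts[b]) + mu) / (n + mu * vocab))
              for a, b in links)
    return (1 - lam) * uni + lam * dep

corpus = ["the price of a hot dog".split(),
          "a hot dog with mustard".split(),
          "the price of crude oil".split()]
q = ["price", "hot", "dog"]
links = strongest_links(q, corpus, k=1)      # -> [("dog", "hot")]
print(links, dependence_score(q, links, corpus[0]))
```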

Markov Random Field (MRF) model (Metzler & Croft)
– Graph variants: sequential dependence and full dependence, scored through potential functions (graphs shown on the slide)
– Sequential model: interpolation of the unigram model, ordered bigrams and unordered bigrams
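
A hedged sketch of the sequential-dependence style of scoring; the interpolation weights, window size and smoothing are assumptions, not the published settings.

```python
import math
from collections import Counter

def count_ordered(a, b, doc):
    """Occurrences of a immediately followed by b."""
    return sum(doc[i] == a and doc[i + 1] == b for i in range(len(doc) - 1))

def count_unordered(a, b, doc, window=8):
    """Co-occurrences of a and b within an unordered window."""
    pos_a = [i for i, t in enumerate(doc) if t == a]
    pos_b = [i for i, t in enumerate(doc) if t == b]
    return sum(abs(i - j) < window for i in pos_a for j in pos_b)

def sdm_score(query, doc, l_t=0.8, l_o=0.1, l_u=0.1, mu=1.0, vocab=1000):
    counts, n = Counter(doc), len(doc)
    smooth = lambda c: math.log((c + mu) / (n + mu * vocab))
    s_t = sum(smooth(counts[q]) for q in query)                       # unigrams
    pairs = list(zip(query, query[1:]))                               # adjacent pairs only
    s_o = sum(smooth(count_ordered(a, b, doc)) for a, b in pairs)     # ordered bigrams
    s_u = sum(smooth(count_unordered(a, b, doc)) for a, b in pairs)   # unordered windows
    return l_t * s_t + l_o * s_o + l_u * s_u

doc = "sony announced a new digital camera model".split()
print(sdm_score(["sony", "digital", "camera"], doc))
```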

Limitations
– The importance of a (type of) dependency is fixed in the combined model, in the same way for all queries: a fixed weight is assigned to each component model
  price-dog is as important as hot-dog (dependency model)
  price-hot is as important as hot-dog (MRF ordered model)
– Are they equally strong dependencies? No: hot-dog > price-dog, price-hot
– Intuition: a stronger dependency should form a stronger constraint

Limitations
– Can a phrase model solve this problem?
  Some phrases form a semantically stronger dependency than others:
    hot-dog > cute-dog
    Sony digital-camera > Sony-digital camera, Sony-camera digital
– Is a semantically stronger dependency more useful for IR? Not necessarily: digital-camera could be less useful than Sony-camera
– The importance of a dependency in IR depends on its usefulness for retrieving better documents

Limitations
– MRF sequential model:
  Only considers consecutive pairs of terms; no dependency between distant terms
  Sony digital camera: only Sony-digital and digital-camera
– Full model:
  Can cover long-distance dependencies, but with a large increase in complexity

Proximity: more flexible dependency
– Tao & Zhai 2007; Zhao & Yun 2009
– Prox_B(w_i): proximity centrality, the min/average/sum distance to the other query terms
– However, the weight given to proximity is still fixed
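
A small sketch of what a proximity-centrality measure might look like; the distance convention, the handling of missing terms and the function names are our assumptions.

```python
def nearest_distance(a, b, doc):
    """Distance between the closest occurrences of terms a and b in doc."""
    pos_a = [i for i, t in enumerate(doc) if t == a]
    pos_b = [i for i, t in enumerate(doc) if t == b]
    if not pos_a or not pos_b:
        return len(doc)                     # treat a missing term as "far away"
    return min(abs(i - j) for i in pos_a for j in pos_b)

def proximity_centrality(term, query, doc, agg=min):
    """Prox(term): aggregate (min / sum / average) distance to the other query terms."""
    dists = [nearest_distance(term, other, doc) for other in query if other != term]
    return agg(dists)

doc = "corporate pension plans and other retirement funds".split()
query = ["corporate", "pension", "funds"]
for t in query:
    print(t, proximity_centrality(t, query, doc, agg=min))
```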

A recent extension to the MRF model
– Bendersky, Metzler, Croft, 2010: weighted dependencies
– w_j^uni and w_j^bi: the importance of the different features
– g_j^uni and g_j^bi: the weight of each unigram and bigram according to its utility
– However, f_o and f_u are mixed up
– Only considers dependencies between pairs of adjacent terms

Go further
– Use a discriminative model instead of the MRF: it can consider dependencies between more distant terms without the exponential growth in complexity
– We only consider pairwise dependencies
  Assumption: pairwise dependencies capture the most important part of term dependencies
– Consider several types of dependency between query terms:
  Ordered bigram
  Unordered pair of terms within some distance (2, 4, 8, 16)
– Dependencies at different distances have different strengths
– Co-occurrence dependency ~ variable proximity
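
A sketch of extracting these pairwise dependency features from a document; the exact counting conventions are assumptions.

```python
import itertools

def pair_features(query, doc, windows=(2, 4, 8, 16)):
    """For every pair of query terms: an ordered-bigram count plus unordered
    co-occurrence counts within distances 2, 4, 8 and 16."""
    positions = {t: [i for i, tok in enumerate(doc) if tok == t] for t in query}
    feats = {}
    for a, b in itertools.combinations(query, 2):
        feats[(a, b, "bi")] = sum(doc[i] == a and doc[i + 1] == b
                                  for i in range(len(doc) - 1))
        for w in windows:
            feats[(a, b, f"co{w}")] = sum(abs(i - j) <= w
                                          for i in positions[a] for j in positions[b])
    return feats

doc = "corporate pension plans funds and other corporate funds".split()
print(pair_features(["corporate", "pension", "funds"], doc))
```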

General discriminative model
– Break down each component model to take the strength/usefulness of each term dependency into account
– λ_U, λ_B, λ_C_w: importance of a unigram, a bigram and a co-occurrence pair within distance w in documents
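
The slide's formula is not reproduced in this transcript; one plausible form consistent with the description above (a hedged reconstruction, not the authors' exact notation) is:

```latex
\[
\mathrm{score}(Q, D) =
  \lambda_U \sum_{q_i \in Q} f_U(q_i, D)
  + \sum_{(q_i, q_j)} \lambda_B(q_i, q_j)\, f_B(q_i, q_j, D)
  + \sum_{w \in \{2,4,8,16\}} \sum_{(q_i, q_j)} \lambda_{C_w}(q_i, q_j)\, f_{C_w}(q_i, q_j, D)
\]
```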

An example: query "corporate pension plans funds"
(Figure: dependency graph over corporate, pension, plans, funds with bigram (bi) and co-occurrence (co2, co4, co8) edges; co16 omitted.)

Further development
– Set λ_U to 1 and vary the other weights
– Features: (listed on the slide)

How to determine the usefulness of a bigram (λ_B) and a co-occurrence pair (λ_C_w)?
– Using a learning method based on some features
– Cross-validation

Learning method
– Parameters and goal:
  T_i: training data
  R_i: document ranking obtained with the parameters
  E: measure of effectiveness (MAP)
– Training data:
  {x_i, z_i}: a bigram or a pair of terms within distance w, and its best weight value for the query
  The best value is found by coordinate ascent
– Epsilon-SVM with a radial basis kernel function
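
A hedged sketch of that last step with scikit-learn; the feature vectors and target weights below are fabricated placeholders, not data from the experiments.

```python
import numpy as np
from sklearn.svm import SVR

# x_i: features of one bigram or co-occurrence pair (e.g. corpus statistics);
# z_i: its best weight found by coordinate ascent on MAP over training queries.
# All values here are placeholders.
X_train = np.array([[0.2, 1.3, 5.0],
                    [0.9, 0.4, 2.0],
                    [0.1, 2.2, 7.5]])
z_train = np.array([0.05, 0.30, 0.00])

model = SVR(kernel="rbf", epsilon=0.01, C=1.0)   # epsilon-SVR with an RBF kernel
model.fit(X_train, z_train)

# Predict the weight of a new dependency from its features.
print(model.predict(np.array([[0.5, 1.0, 3.0]])))
```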

Features (table shown on the slide)

Test collections (table shown on the slide)

Results with other models (table shown on the slide)

With our model (results table shown on the slide)

Analysis
Some intuitively strong dependencies should not be considered important in the retrieval process:
– Disk1, query 088: "crude oil price trends"
  Ideal weights (bi, co2,4,8,16) = 0, AP = 0.103
  Learnt bi = 0.2, co2..16 = 0, AP = 0.060
– Disk1, query 003: "joint ventures"
  Ideal weights (bi, co2,4,8,16) = 0, AP = 0.086
  Learnt bi = 0.07, co2..16 = 0, AP = 0.084
– Disk1, query 094: "computer aided crime"
  Ideal weights (bi, co2,4,8,16) = 0, AP = 0.223
  Learnt bi = 0.3, co2..16 = 0, AP =

Analysis
Some intuitively weakly connected words should be considered as strong dependencies:
– Disk1, query 184: "corporate pension plans funds"
  Ideal wt. bi = 0.5, co2 = 0.7, co4 = 0.2, AP = 0.253
  Learnt wt. bi = 0.2, co8 = 0.01, co16 = 0.001, AP = 0.201 (Uni = 0.131)
– Disk1, query 115: "impact 1986 immigration law"
  Ideal wt. co2 = 0.1, co4 = 0.35, co8 = 0.05, AP = 0.511
  Learnt wt. bi = 0, co16 = 0.01, AP = 0.492 (Uni = 0.437)

Disk1, query 115: "impact 1986 immigration law"
Ideal AP = 0.511, uni = 0.437, learnt = 0.492
(Table: learnt weights wt.bi, wt.co2, wt.co4, wt.co8 for the term pairs imp-1986, imp-imm, imp-law, 1986-imm, 1986-law, imm-law; dependency graph with bi, co2, co4, co8 edges, co16 omitted.)

Disk1, query 184: "corporate pension plans funds"
AP: ideal = 0.253, uni = 0.132, learnt = 0.201
(Table: learnt weights wt.bi, wt.co2, wt.co4, wt.co8 for the term pairs corp-pen, corp-plan, corp-fund, pen-plan, pen-fund, plan-fund; dependency graph with bi, co2, co4, co8 edges, co16 omitted.)

Typical case 1: weak bigram dependency, weak co-occurrence dependency

Typical case 2: strong dependencies

Typical case 3: weak bigram dependency, strong co-occurrence dependency

Conclusions
– Different types of dependency between query terms should be considered
– They have variable importance/usefulness for IR and should be integrated into the IR model with different weights
  This usefulness does not necessarily correlate with semantic dependency
– The new model is better than the existing models in most cases (with statistical significance in some cases)