Compact Query Term Selection Using Topically Related Text

Presentation transcript:

Compact Query Term Selection Using Topically Related Text K. Tamsin Maxwell, W. Bruce Croft SIGIR 2013

Outline Introduction Related Work Principle for Term Selection PhRank Algorithm Evaluation Framework Experiments Conclusion

Introduction Recent query reformulation techniques usually use pseudo-relevance feedback (PRF) in their approaches. However, because they consider words that are not in the original query, the expansion may include peripheral words and cause query drift. PhRank also uses PRF, but applies it to in-query term selection. Each candidate term contains 1-3 words and is ranked with a score derived from a co-occurrence graph. Advantages of PhRank: It is the first method to use PRF for in-query term selection. Only a small number of terms are selected, retaining the flexibility to use more or longer terms if required. The affinity graph captures aspects of both syntactic and non-syntactic word associations.

Related Work Markov chain framework The Markov chain framework uses the stationary distribution of a random walk over an affinity graph $G$ to estimate the importance of vertices in the graph. A random walk describes a succession of random or semi-random steps between vertices $v_i$ and $v_j$ in $G$. If we define the transition probability between $v_i$ and $v_j$ as $h_{ij}$, and $\pi_j^t$ as the affinity score of $v_j$ at time $t$, then $\pi_j^{t+1}$ is the sum of the scores propagated from each $v_i$ connected to $v_j$: $\pi_j^{t+1} = \sum_i h_{ij}\,\pi_i^t$.

Related Work Sometimes $v_i$ may need to step to a vertex $v_j$ that is unconnected, so a minimum probability $u = 1/n$ is defined, where $n$ is the number of vertices in $G$. A factor $\alpha$ is then used to control the balance between the transition probability and the minimum probability.
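Below is a minimal Python sketch of the damped random walk described above. The function name affinity_scores, the default alpha = 0.85, and the exact damped-update form are illustrative assumptions, not PhRank's implementation; PhRank's specific edge weighting is introduced later.

```python
import numpy as np

def affinity_scores(H, alpha=0.85, tol=1e-4, max_iter=100):
    """Damped random walk over an affinity graph (sketch).

    H     : n x n matrix of transition probabilities h_ij, rows normalized to sum to one.
    alpha : balance between following an edge and jumping with minimum probability u = 1/n.
    tol   : stop when no vertex score changes by more than tol (the slides use 0.0001).
    """
    n = H.shape[0]
    u = 1.0 / n                        # minimum probability for unconnected vertices
    pi = np.full(n, u)                 # start from a uniform distribution
    for _ in range(max_iter):
        # pi_j^{t+1} = sum_i (alpha * h_ij + (1 - alpha) * u) * pi_i^t
        pi_next = alpha * (H.T @ pi) + (1.0 - alpha) * u * pi.sum()
        if np.max(np.abs(pi_next - pi)) <= tol:
            return pi_next
        pi = pi_next
    return pi
```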

Principle for Term Selection For an informative word: Is informative relative to a query: a word should represent the meaning of the query, but a query alone usually does not carry enough information, so PRF is used to enrich the query representation. Is related to other informative words: the Association Hypothesis states that "if one index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this". With an affinity graph, we can capture this by looking at how many words connect to a target word and the strength of those connections.

Principle for Term Selection For an informative term: Contains informative words: we assume good terms must contain informative words, so individual words are considered when ranking terms. Is discriminative in the retrieval collection: a term that occurs many times within a small number of documents gives a pronounced relevance signal, so terms are weighted with a normalized tf.idf-inspired weight.

The PhRank Algorithm Graph construction For a query $Q$, we first retrieve the top $k$ documents. The set $N$ is then defined as the query itself plus its pseudo-relevant documents. Words in $N$ are stemmed, and each unique stem becomes a vertex in graph $G$. An edge connects vertices $v_i$ and $v_j$ if words $i$ and $j$ are adjacent somewhere in $N$. Edge weights Transition probabilities are based on a linear combination of how often words $i$ and $j$ co-occur in windows of size 2 and 10.
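As a rough illustration of this construction, the sketch below builds the edge set and the window co-occurrence counts from the set $N$. The function name, the caller-supplied stem function, and the exact window handling are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict

def build_affinity_graph(texts, stem):
    """Sketch of PhRank-style graph construction over N (query + top-k documents).

    Returns the edge set (stems adjacent somewhere in N) and the co-occurrence
    counts c_ij^w2 / c_ij^w10 used later for the edge weights.
    """
    edges = set()
    c_w2, c_w10 = defaultdict(int), defaultdict(int)
    for tokens in texts:                          # each element: the query or one document
        stems = [stem(t) for t in tokens]
        for i, s_i in enumerate(stems):
            # look ahead at most 9 positions: a window of size 10
            for j in range(i + 1, min(i + 10, len(stems))):
                pair = tuple(sorted((s_i, stems[j])))
                c_w10[pair] += 1
                if j == i + 1:                    # window of size 2 = adjacent words
                    c_w2[pair] += 1
                    edges.add(pair)               # adjacency defines the graph edges
    return edges, c_w2, c_w10
```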

The PhRank Algorithm Edge weights are defined using the quantities below (formula shown on the slide): $p(d_k|Q)$ is the probability, given $Q$, of a document in which words $i$ and $j$ co-occur; $c_{ij}^{w2}$ and $c_{ij}^{w10}$ are the counts of co-occurrences in windows of size 2 and 10; and $r$ is an idf-style weight that reflects the importance of the association between $i$ and $j$ in $N$.
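The exact edge-weight formula appears only on the slide image, so the sketch below shows one plausible combination of the quantities defined above. The idf smoothing, the mixing weight lam, and the multiplicative form are assumptions, not the paper's definition.

```python
import math

def edge_weight(pair, c_w2, c_w10, doc_prob_sum, df, n_docs, lam=0.5):
    """Hedged sketch of an edge weight consistent with the slide's description.

    doc_prob_sum : sum over feedback documents of p(d_k | Q) for documents
                   in which the pair co-occurs.
    df, n_docs   : document frequency of the pair and collection size,
                   used for the idf-style factor r.
    lam          : assumed mixing weight between window-2 and window-10 counts.
    """
    r = math.log((n_docs + 1) / (df.get(pair, 0) + 0.5))       # idf-style association weight
    co = lam * c_w2.get(pair, 0) + (1.0 - lam) * c_w10.get(pair, 0)
    return r * doc_prob_sum * co
```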

The PhRank Algorithm Random walk A random walk over $G$ proceeds as described in Related Work. The edge weights are normalized so that they sum to one. The iteration stops when the change at every vertex does not exceed 0.0001. Vertex weights Words are also weighted by how exhaustively they represent the query. Some words such as "make" would score highly in the affinity graph, but they are not particularly informative.

The PhRank Algorithm We define $s$ as a factor that balances exhaustiveness with global saliency, to identify stems that are poor discriminators between relevant and non-relevant documents. For a word $w_n$, $s(w_n) = w_n^{f_{avg}} \cdot idf(w_n)$, where $w_n^{f_{avg}}$ is the average frequency of $w_n$ in $N$ and $idf(w_n)$ is the idf of $w_n$ in $N$.
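A small sketch of the vertex weight $s(w_n)$ described above; how the average frequency and the idf over $N$ are computed below is an assumption for illustration.

```python
import math

def vertex_weight(tf_in_N, df_in_N, n_texts):
    """Sketch of s(w_n) = f_avg(w_n) * idf(w_n) from the slide.

    tf_in_N : total frequency of the (stemmed) word across the set N
    df_in_N : number of texts in N that contain the word
    n_texts : number of texts in N (query + top-k documents)
    """
    f_avg = tf_in_N / n_texts                          # average frequency of w_n in N
    idf = math.log((n_texts + 1) / (df_in_N + 0.5))    # smoothed idf of w_n over N
    return f_avg * idf
```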

The PhRank Algorithm Term ranking For a term $x$, the factor $z$ represents the degree to which the term is discriminative in the collection (formula shown on the slide): $x^e$ is the frequency with which the words in $x$ co-occur in the collection within a window of 4 times the number of words in the term, $idf(x^e)$ is defined analogously to $idf(w_n)$, and $l_x$ is a normalization factor based on the length of $x$. Finally, the rank of a term $x$ for $Q$ combines the affinity scores, the vertex weights and $z$ (formula shown on the slide).
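The final ranking formula is likewise on the slide image; the sketch below shows one hedged way to combine the random-walk affinity scores, vertex weights and the factor $z$ for a candidate term. The product form is an assumption, not the paper's exact definition.

```python
def term_score(term_words, pi, s, z_x):
    """Hedged sketch of PhRank term ranking.

    term_words : the 1-3 (stemmed) words making up candidate term x
    pi         : dict of random-walk affinity scores per word
    s          : dict of vertex weights per word (see vertex_weight above)
    z_x        : collection-level discrimination factor for the term
    """
    score = z_x
    for w in term_words:
        score *= pi[w] * s[w]      # weight each word by affinity * saliency
    return score
```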

The PhRank Algorithm After ranking, some selected terms still contain uninformative words. This is because terms are ranked by their overall score, so several terms may contain similar words and reduce diversity. We apply a simple filter with top-down constraints, as sketched below: for a term $x$, if a higher-ranked term contains all the words in $x$, or $x$ contains all the words of a higher-ranked term, we discard $x$.
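A sketch of the top-down filter just described: terms are scanned in rank order and a term is dropped when its word set contains, or is contained in, that of an already-kept higher-ranked term.

```python
def filter_terms(ranked_terms):
    """Top-down filtering of ranked candidate terms.

    ranked_terms : list of terms (each a tuple/list of words), best first.
    """
    kept = []
    for term in ranked_terms:
        words = set(term)
        redundant = any(words <= set(k) or words >= set(k) for k in kept)
        if not redundant:
            kept.append(term)
    return kept
```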

Evaluation Framework Robustness Precision Succinctness Robustness We compare with the sequential dependence variant of the Markov random field model. This model uses a linear combination of query likelihood with bigram features in windows of size 2 and 8. Precision The subset distribution model achieves high mean average precision. Succinctness We use Key Concepts as the succinctness baseline. This approach linearly combines a bag-of-words query representation with a weighted bag-of-words query representation.

Evaluation Framework Word dependence We refer to four models of phrase belief, as shown in the figure.

Experiments We use Indri on Robust04, WT10G and GOV2 for evaluation. Feature analysis Here we list the results of using the individual features in PhRank (results table shown on the slide).

Experiments (further results tables shown on the slide)

Experiments Comparison with other models.

Conclusion PhRank is a novel method for selecting succinct terms within a query, built on the Markov chain framework. Although the selected terms are succinct, the strategy is risky and causes a decrease in MAP compared with sequential dependence.