Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.

Presentation transcript:

Web Image Retrieval Re-Ranking with Relevance Model
Wei-Hao Lin, Rong Jin, Alexander Hauptmann
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
International Conference on Web Intelligence (WIC ’03)
Presented by Chu Huei-Ming, 2005/02/24

2 Reference
Relevance Models in Information Retrieval
– Victor Lavrenko and W. Bruce Croft
– Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts (UMass)
– Kluwer Academic Publishers. Printed in the Netherlands

3 Outline
Introduction
Relevance Model
Web image retrieval re-ranking
Estimating a Relevance Model
– Estimation from a set of examples
– Estimation without examples
Ranking Criterion
Experiment
Conclusion

4 Introduction (1/2)
Most current large-scale web image search engines exploit text and link structure to “understand” the content of web images.
This paper proposes a re-ranking method that improves web image retrieval by reordering the images retrieved from an image search engine.
The re-ranking process is based on a relevance model.

5 Introduction (2/2)
[Figure: web image retrieval with relevance model re-ranking]

6 Relevance Model (1/2)
Mathematical formalism
– V is a vocabulary in some language
– C is a large collection of documents
– Define the relevant class R to be the subset of documents in C that are relevant to some particular information need
– Define the relevance model to be the probability distribution P(w|R)
– For every word w in V, the relevance model gives the probability of observing w if we randomly select a document D from the relevant class R and then pick a random word from D

7 Relevance Model (2/2)
An important issue in IR is capturing the topic discussed in a sample of text, and to that end unigram models fare quite well.
The choice of estimation technique has a particularly strong influence on the quality of relevance models.

8 Web image retrieval re-ranking (1/2)
For each image I in the ranked list returned by a web image search engine, there is one associated HTML document D.
Can we estimate the probability that the image is relevant given the text of document D, i.e. Pr(R|D)?
By Bayes’ theorem, Pr(R|D) = Pr(D|R) Pr(R) / Pr(D)
– Pr(D) is the same for all documents if we assume every document is equally likely
– Pr(D|R) must be estimated if we want to know the relevance of the document

9 Web image retrieval re-ranking (2/2)
Suppose the document D consists of the words w_1, ..., w_n.
Applying the common word independence assumption: Pr(D|R) ≈ ∏_{i=1..n} Pr(w_i|R)
Pr(w|R) can be estimated without training data.
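As an illustration of the word independence assumption on this slide, the sketch below scores a document by summing log Pr(w|R) over its words. The function name, the toy relevance model, and the `floor` value used for unseen words are all assumptions made for this example, not part of the original slides.

```python
import math

def log_likelihood(doc_words, rel_model, floor=1e-12):
    """log Pr(D|R) under word independence: the sum of log Pr(w|R) over the words of D.

    `floor` is a hypothetical guard so that unseen words do not produce log(0).
    """
    return sum(math.log(max(rel_model.get(w, 0.0), floor)) for w in doc_words)

# Toy relevance model (made-up numbers, for illustration only).
rel_model = {"tiger": 0.5, "stripe": 0.3, "zoo": 0.2}
short_doc = ["tiger", "zoo"]
long_doc = ["tiger", "zoo", "tiger", "stripe"]

print(log_likelihood(short_doc, rel_model))
print(log_likelihood(long_doc, rel_model))
```

Every additional word multiplies the likelihood by a value below 1, so the longer toy document receives the lower score even though all of its words are on topic.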

10 Estimating a Relevance Model
Estimation from a set of examples
– We have full information about the set R of relevant documents
Estimation without examples
– We have no examples from which the relevance model could be estimated directly

11 Estimation from a set of examples (1/2)
We have perfect knowledge of the entire relevant class R.
P(w|R) is the probability that a randomly picked word from a random relevant document will be the word w.
Let P(D|R) denote the probability of randomly picking document D from the relevant set R. Assume each relevant document is equally likely to be picked at random: P(D|R) = 1/|R| if D is in R, and 0 otherwise.
|R| is the total number of documents in R.

12 Estimation from a set of examples (2/2)
The probability of randomly picking a document D and then observing the word w is P(w, D|R) = P(w|D, R) P(D|R)
Assume that the document model of D completely determines word probabilities: once D is fixed, the probability of observing w is independent of the relevant class R and depends only on D, so P(w, D|R) = P(w|D) P(D|R)
Summing over documents gives P(w|R) = Σ_{D in R} P(w|D) P(D|R)
Smoothing is achieved by interpolating the maximum-likelihood probability from (5) with some background distribution P(w) over the vocabulary.
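The estimate from examples can be sketched in a few lines: average the smoothed document models of the relevant documents with the uniform weight 1/|R|. The function names and toy data are hypothetical. Placing the weight `lam` on the maximum-likelihood term is an assumption; the slides give a smoothing parameter of 0.6 but do not say which term it multiplies.

```python
from collections import Counter

def smoothed_doc_model(doc, background, lam):
    """P(w|D) = lam * count(w, D) / |D| + (1 - lam) * P(w), for every w in the background.

    Assumption: `background` covers the whole vocabulary, and lam weights
    the maximum-likelihood term.
    """
    counts = Counter(doc)
    n = len(doc)
    return {w: lam * counts[w] / n + (1 - lam) * p_bg for w, p_bg in background.items()}

def relevance_model_from_examples(relevant_docs, background, lam=0.6):
    """P(w|R) = sum over D in R of P(w|D) * P(D|R), with P(D|R) = 1/|R|."""
    uniform = 1.0 / len(relevant_docs)
    model = dict.fromkeys(background, 0.0)
    for doc in relevant_docs:
        for w, p in smoothed_doc_model(doc, background, lam).items():
            model[w] += uniform * p
    return model

# Toy data (made up for illustration).
relevant_docs = [["tiger", "stripe"], ["tiger", "zoo"]]
background = {"tiger": 0.25, "stripe": 0.25, "zoo": 0.25, "car": 0.25}
model = relevance_model_from_examples(relevant_docs, background)
print(model)
```

Because each smoothed document model sums to 1 and the document weights sum to 1, the resulting relevance model is itself a probability distribution over the vocabulary.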

13 Estimation without examples (1/6)
In ad hoc information retrieval, we have only a short 2-3 word query indicative of the user’s information need, and no examples of relevant documents.

14 Estimation without examples (2/6)
Assume that for every information need there exists an underlying relevance model R.
The relevance model assigns the probabilities P(w|R) to word occurrences in the relevant documents.
Given a large collection of documents and a user query q_1 ... q_k, approximate P(w|R) by P(w|q_1 ... q_k) = P(w, q_1 ... q_k) / P(q_1 ... q_k)

15 Estimation without examples (3/6)
Method 1: i.i.d. (random) sampling
Assume that the query words and the word w in relevant documents are sampled identically.
Pick a distribution D with probability P(D) and sample from it k+1 times. The total probability of observing w together with q_1 ... q_k is then P(w, q_1 ... q_k) = Σ_{D in C} P(D) P(w, q_1 ... q_k|D)
Assume w and all q_i are sampled independently and identically: P(w, q_1 ... q_k|D) = P(w|D) ∏_{i=1..k} P(q_i|D)
Final estimate: P(w, q_1 ... q_k) = Σ_{D in C} P(D) P(w|D) ∏_{i=1..k} P(q_i|D)
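A minimal sketch of Method 1, assuming the joint probability takes the form P(w, q_1 ... q_k) = Σ_D P(D) P(w|D) ∏_i P(q_i|D) with a uniform P(D), and that each document model is already smoothed over a shared vocabulary. The function name and the toy document models are hypothetical. The final normalization turns the joint into P(w|q_1 ... q_k).

```python
import math

def relevance_model_iid(doc_models, query):
    """Method 1 (i.i.d. sampling) with a uniform document prior.

    doc_models: list of dicts mapping each vocabulary word to a smoothed P(w|D).
    query: list of query words, all assumed to be in the vocabulary.
    """
    prior = 1.0 / len(doc_models)
    joint = {}
    for model in doc_models:
        # Probability of sampling every query word from this document model.
        query_likelihood = math.prod(model[q] for q in query)
        for w, p in model.items():
            joint[w] = joint.get(w, 0.0) + prior * p * query_likelihood
    total = sum(joint.values())  # equals P(q_1 ... q_k)
    return {w: v / total for w, v in joint.items()}

# Toy smoothed document models (made-up numbers).
doc_models = [
    {"tiger": 0.6, "stripe": 0.3, "car": 0.1},
    {"tiger": 0.1, "stripe": 0.1, "car": 0.8},
]
rm1 = relevance_model_iid(doc_models, ["tiger"])
print(rm1)
```

Documents that explain the query well dominate the sum, so words that co-occur with the query terms in those documents receive more probability mass.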

16 Estimation without examples (4/6)
Method 2: conditional sampling
Using the chain rule and assuming that the query words are independent given the word w: P(w, q_1 ... q_k) = P(w) ∏_{i=1..k} P(q_i|w)
To estimate the conditional probabilities P(q_i|w), we compute the expectation over the universe C of our unigram models.

17 Estimation without examples (5/6)
Make the additional assumption that q_i is independent of w once we have picked a distribution D_i.
The final estimate for the joint probability of w and the query is then P(w, q_1 ... q_k) = P(w) ∏_{i=1..k} Σ_{D_i in C} P(D_i|w) P(q_i|D_i)

18 Estimation without examples (6/6)
The word prior probability is P(w) = Σ_{D in C} P(w|D) P(D)
The probability of picking a distribution D_i based on w is P(D_i|w) = P(w|D_i) P(D_i) / P(w)
P(D_i) is kept uniform over all the documents in C.
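Method 2 can be sketched along the same lines, assuming the standard forms P(w) = Σ_D P(w|D) P(D), P(D|w) = P(w|D) P(D) / P(w), and P(w, q_1 ... q_k) = P(w) ∏_i Σ_D P(q_i|D) P(D|w), with a uniform P(D). The function name and the toy document models are hypothetical, and the models are assumed to be smoothed so that P(w) is never zero.

```python
def relevance_model_conditional(doc_models, query):
    """Method 2 (conditional sampling) with a uniform document prior.

    doc_models: list of dicts mapping each vocabulary word to a smoothed P(w|D).
    query: list of query words, all assumed to be in the vocabulary.
    """
    prior = 1.0 / len(doc_models)
    joint = {}
    for w in doc_models[0]:
        # Word prior: P(w) = sum_D P(w|D) P(D).
        p_w = sum(m[w] * prior for m in doc_models)
        p_joint = p_w
        for q in query:
            # P(q|w) = sum_D P(q|D) P(D|w), where P(D|w) = P(w|D) P(D) / P(w).
            p_joint *= sum(m[q] * (m[w] * prior / p_w) for m in doc_models)
        joint[w] = p_joint
    total = sum(joint.values())
    return {w: v / total for w, v in joint.items()}

# Toy smoothed document models (made-up numbers).
doc_models = [
    {"tiger": 0.6, "stripe": 0.3, "car": 0.1},
    {"tiger": 0.1, "stripe": 0.1, "car": 0.8},
]
rm2 = relevance_model_conditional(doc_models, ["tiger", "stripe"])
print(rm2)
```

Unlike Method 1, each query word may be explained by a different document here, because the sum over documents sits inside the product over query words.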

19 Comparison of Estimation Methods 1 and 2
Probability Ranking Principle: documents are ranked by decreasing probability ratio
Cross-entropy

20 Ranking Criterion (1/2)
Ranking by (2) will favor short documents.
Use Kullback-Leibler (KL) divergence to avoid the short-document bias: KL(D_i || R) = Σ_w Pr(w|D_i) log( Pr(w|D_i) / Pr(w|R) )
Pr(w|D_i) is the unigram model from the document associated with the rank-i image in the list.
Pr(w|R) is the aforementioned relevance model.
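The KL-based criterion can be sketched as follows. The direction of the divergence (document model first, relevance model second) and the ascending sort are assumptions consistent with the slide's ordering of the two models; the function names and toy distributions are hypothetical. The relevance model is assumed to be smoothed so that it assigns non-zero probability to every word a document model uses.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_w p(w) * log(p(w) / q(w)), skipping words with p(w) = 0."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def rerank(doc_models, rel_model):
    """Return document indices ordered from smallest to largest divergence."""
    return sorted(range(len(doc_models)),
                  key=lambda i: kl_divergence(doc_models[i], rel_model))

# Toy distributions (made-up numbers).
rel_model = {"tiger": 0.5, "stripe": 0.3, "car": 0.2}
doc_models = [
    {"tiger": 0.1, "stripe": 0.1, "car": 0.8},  # original rank 1, off topic
    {"tiger": 0.5, "stripe": 0.3, "car": 0.2},  # original rank 2, matches the model
]
order = rerank(doc_models, rel_model)
print(order)
```

Because both arguments are normalized distributions, the score compares the shape of the word distributions rather than accumulating one factor per word, which is why document length no longer dominates.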

21 Ranking Criterion (2/2)

22 Experiment (1/4)
The re-ranking idea is tested on six text queries to a large-scale web image search engine, Google Image Search.
From July 2001 to March 2003, it indexed 425 million images.
The six queries are chosen from image categories in the Corel Image Database.
Each text query is typed into Google Image Search and the top 200 entries are saved for evaluation.
The 1,200 images for the six queries are fetched and manually labeled into three categories: relevant, ambiguous, irrelevant.

23 Experiment (2/4)

24 Experiment (3/4)
For each query, send the same keywords to Google Web Search and obtain a list of relevant documents via the Google Web APIs.
For the top-ranked 200 web documents, all HTML tags are removed, words appearing in the INQUERY stop-word list are filtered out, and the remaining words are stemmed using the Porter algorithm.
The smoothing parameter is 0.6.
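The preprocessing steps on this slide (tag removal, stop-word filtering, stemming) can be sketched with the Python standard library. The stop-word set below is a tiny stand-in, not the INQUERY list, and `stem` defaults to the identity function as a placeholder; a real Porter stemmer (for example from a third-party library) would be passed in instead. All names here are hypothetical.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects the text between tags; the tags themselves are discarded."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# Tiny illustrative stop list (an assumption; NOT the INQUERY stop-word list).
STOP_WORDS = frozenset({"a", "an", "the", "of", "in", "and", "is"})

def preprocess(html, stop_words=STOP_WORDS, stem=lambda w: w):
    """Strip tags, lowercase, tokenize, drop stop words, then stem each token."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks).lower()
    tokens = re.findall(r"[a-z]+", text)
    return [stem(t) for t in tokens if t not in stop_words]

page = "<html><body><h1>The Tiger</h1><p>A tiger in the zoo</p></body></html>"
print(preprocess(page))
```

This simple extractor also keeps the contents of script and style elements, so a production pipeline would need to skip those explicitly.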

25 Experiment (4/4)
[Figure: the average precision at DCP over six queries]
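To make the evaluation measure concrete, here is a small sketch of precision at a cut-off averaged over queries, assuming DCP stands for document cut-off point. The function names and labels are made up for illustration, and treating each result as a binary relevant or not-relevant label (for instance, counting "ambiguous" as not relevant) is an assumption rather than something the slides specify.

```python
def precision_at(labels, cutoff):
    """Fraction of the top `cutoff` results that are relevant (labels are 0 or 1)."""
    return sum(labels[:cutoff]) / cutoff

def mean_precision_at(labels_per_query, cutoff):
    """Precision at `cutoff`, averaged over all queries."""
    scores = [precision_at(labels, cutoff) for labels in labels_per_query]
    return sum(scores) / len(scores)

# Made-up relevance labels for two ranked lists.
labels_per_query = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
print(precision_at([1, 0, 1, 1, 0], 4))
print(mean_precision_at(labels_per_query, 2))
```

Comparing this value for the original ranking and the re-ranked list at the same cut-off gives the kind of before and after comparison the experiment reports.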

26 Conclusion
Average precision at the top 50 documents improves from the original 30-35% to 45%.
Internet users usually have limited time and patience, so high precision among the top-ranked documents saves them a lot of effort and helps them find relevant images more easily and quickly.