Relevance and Reinforcement in Interactive Browsing

Similar presentations
Relevance Feedback User tells system whether returned/disseminated documents are relevant to query/information need or not Feedback: usually positive sometimes.

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Relevance Feedback Retrieval of Time Series Data Eamonn J. Keogh & Michael J. Pazzani Prepared By/ Fahad Al-jutaily Supervisor/ Dr. Mourad Ykhlef IS531.
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
Introduction to Information Retrieval (Part 2) By Evren Ermis.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
K nearest neighbor and Rocchio algorithm
Morris LeBlanc.  Why Image Retrieval is Hard?  Problems with Image Retrieval  Support Vector Machines  Active Learning  Image Processing ◦ Texture.
Query Operations: Automatic Local Analysis. Introduction Difficulty of formulating user queries –Insufficient knowledge of the collection –Insufficient.
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
Learning Techniques for Information Retrieval Perceptron algorithm Least mean.
Modern Information Retrieval Chapter 5 Query Operations.
Reference Collections: Task Characteristics. TREC Collection Text REtrieval Conference (TREC) –sponsored by NIST and DARPA (1992-?) Comparing approaches.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
SIMS 202 Information Organization and Retrieval Prof. Marti Hearst and Prof. Ray Larson UC Berkeley SIMS Tues/Thurs 9:30-11:00am Fall 2000.
Query Reformulation: User Relevance Feedback. Introduction Difficulty of formulating user queries –Users have insufficient knowledge of the collection.
An investigation of query expansion terms Gheorghe Muresan Rutgers University, School of Communication, Information and Library Science 4 Huntington St.,
MARS: Applying Multiplicative Adaptive User Preference Retrieval to Web Search Zhixiang Chen & Xiannong Meng U.Texas-PanAm & Bucknell Univ.
Learning Techniques for Information Retrieval We cover 1.Perceptron algorithm 2.Least mean square algorithm 3.Chapter 5.2 User relevance feedback (pp )
Important Task in Patents Retrieval Recall is an Important Factor Given Query Patent -> the Task is to Search all Related Patents Patents have Complex.
A Comparative Study of Search Result Diversification Methods Wei Zheng and Hui Fang University of Delaware, Newark DE 19716, USA
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Modern Information Retrieval: A Brief Overview By Amit Singhal Ranjan Dash.
Glasgow 02/02/04 NN k networks for content-based image retrieval Daniel Heesch.
Querying Structured Text in an XML Database By Xuemei Luo.
Estimating Topical Context by Diverging from External Resources SIGIR’13, July 28–August 1, 2013, Dublin, Ireland. Presenter: SHIH, KAI WUN Romain Deveaud.
Probabilistic Query Expansion Using Query Logs Hang Cui Tianjin University, China Ji-Rong Wen Microsoft Research Asia, China Jian-Yun Nie University of.
Relevance Feedback Hongning Wang What we have learned so far Information Retrieval User results Query Rep Doc Rep (Index) Ranker.
Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session Summarized.
Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm Chen, Yi-wen( 陳憶文 ) Graduate Institute of Computer Science & Information Engineering.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)
21/11/20151Gianluca Demartini Ranking Clusters for Web Search Gianluca Demartini Paul–Alexandru Chirita Ingo Brunkhorst Wolfgang Nejdl L3S Info Lunch Hannover,
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Externally growing self-organizing maps and its application to database visualization and exploration.
NTNU Speech Lab Dirichlet Mixtures for Query Estimation in Information Retrieval Mark D. Smucker, David Kulp, James Allan Center for Intelligent Information.
Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.
INAOE at GeoCLEF 2008: A Ranking Approach based on Sample Documents Esaú Villatoro-Tello Manuel Montes-y-Gómez Luis Villaseñor-Pineda Language Technologies.
Learning to Estimate Query Difficulty Including Applications to Missing Content Detection and Distributed Information Retrieval Elad Yom-Tov, Shai Fine,
26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
The Loquacious ( 愛說話 ) User: A Document-Independent Source of Terms for Query Expansion Diane Kelly et al. University of North Carolina at Chapel Hill.
Relevance Feedback Hongning Wang
Indri at TREC 2004: UMass Terabyte Track Overview Don Metzler University of Massachusetts, Amherst.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
1 Random Walks on the Click Graph Nick Craswell and Martin Szummer Microsoft Research Cambridge SIGIR 2007.
Learning Kernel Classifiers 1. Introduction Summarized by In-Hee Lee.
Relevance Feedback Prof. Marti Hearst SIMS 202, Lecture 24.
Ch 1. Introduction Pattern Recognition and Machine Learning, C. M. Bishop, Updated by J.-H. Eom (2 nd round revision) Summarized by K.-I.
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Microsoft Research Cambridge,
Queensland University of Technology
Chapter 7. Classification and Prediction
Information Retrieval in Practice
Evaluation of IR Systems
An Empirical Study of Learning to Rank for Entity Search
Compact Query Term Selection Using Topically Related Text
Relevance Feedback Hongning Wang
Applying Key Phrase Extraction to aid Invalidity Search
Authors: Wai Lam and Kon Fan Low Announcer: Kyu-Baek Hwang
Semantic Similarity Methods in WordNet and their Application to Information Retrieval on the Web Yizhe Ge.
Retrieval Utilities Relevance feedback Clustering
Zhixiang Chen & Xiannong Meng U.Texas-PanAm & Bucknell Univ.
INF 141: Information Retrieval
Learning to Rank with Ties
Restructuring Sparse High Dimensional Data for Effective Retrieval
Lab 2: Information Retrieval
Topic: Semantic Text Mining
Chi-An (Rocky) Wu, Cadence Design Systems, Inc.
Presentation transcript:

Relevance and Reinforcement in Interactive Browsing. Anton Leuski. Proceedings of CIKM '00, Washington, DC, November 6-11, 2000, pp. 119-126. Summarized by Seung-Joon Yi.

Introduction
The document selection procedure can be implemented as a "wizard" that comes up right after the documents are retrieved. The wizard examines the relevance value of each unexamined document and highlights the documents most likely to be relevant. As the user examines the documents and marks them as relevant or not, the wizard re-evaluates its estimates and changes the highlighting accordingly.

Introduction (cont'd)
This paper concentrates on the analysis of documents that are already present in the retrieved set. In addition, it focuses only on the inter-document similarity information obtained after the original retrieval session and ignores all term-level statistics. The feedback problem is formulated in terms of reinforcement learning.

Related Work
- Relevance feedback
- Rocchio's algorithm
- Incremental relevance feedback
- Connectionist approach

Search Strategy
The problem of navigating the retrieved document set can be naturally expressed as a reinforcement learning problem.
- Environment state Dt: defined by the inter-document similarities, which documents have been examined, and what relevance judgments were assigned to them (a representation is sketched below).
- Action d: the next document to examine.
- Reward: whether the examined document is relevant or not.
- Goal: find all relevant documents as quickly as possible.
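A minimal Python sketch of how the state Dt could be represented, under the assumption that only the inter-document similarity matrix, the query-document similarities, and the user's judgments are available (the class and field names are hypothetical, not from the paper):

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class BrowsingState:
    """State Dt: inter-document similarities plus the judgments collected so far."""
    sim: np.ndarray        # sim[i, j] = similarity between documents i and j
    query_sim: np.ndarray  # query_sim[i] = query-document similarity (e.g., a belief score)
    relevant: set = field(default_factory=set)     # indices the user judged relevant
    nonrelevant: set = field(default_factory=set)  # indices the user judged non-relevant

    def unexamined(self):
        """Documents the wizard may still propose."""
        examined = self.relevant | self.nonrelevant
        return [i for i in range(len(self.query_sim)) if i not in examined]
```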

Search Strategy (cont'd)
The agent is defined at each time step by a mapping from a state representation combined with an action to a numeric value, F(Dt, d), called the search strategy function. The agent computes this mapping for each unexamined document and selects the document d with the highest value of F(Dt, d). The parameters of F are adjusted with a temporal-difference (TD) learning rule.
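The following sketch, reusing the BrowsingState above, shows one plausible reading of this loop: act greedily on a linear F(Dt, d) = w · x(Dt, d) and adjust w with a generic TD(0)-style update using the learning rate η and discount factor ρ reported later in the slides. The `features` and `judge` callables are hypothetical placeholders, and the exact update used in the paper may differ:

```python
import numpy as np

def browse_episode(state, features, weights, judge, eta=0.1, rho=0.4):
    """Greedy browsing with a TD(0)-style update of a linear strategy F(Dt, d) = w . x(Dt, d).

    features(state, d) -> np.ndarray feature vector x(Dt, d)
    judge(d)           -> 1 if the user marks document d relevant, else 0
    weights            -> np.ndarray, updated in place and returned
    """
    while state.unexamined():
        candidates = state.unexamined()
        values = [float(weights @ features(state, d)) for d in candidates]
        idx = int(np.argmax(values))
        d = candidates[idx]                       # examine the highest-valued document
        reward = judge(d)                         # 0/1 relevance judgment as the reward
        (state.relevant if reward else state.nonrelevant).add(d)

        # TD(0) target: reward plus discounted value of the best remaining document
        remaining = state.unexamined()
        next_value = max((float(weights @ features(state, nd)) for nd in remaining), default=0.0)
        td_error = reward + rho * next_value - values[idx]
        weights += eta * td_error * features(state, d)   # gradient step on the linear F
    return weights
```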

Simple Rocchio
The search strategy function F1(Dt, d) is designed as a single perceptron unit with four inputs: a bias (constant) input, the document's similarity to the query, the average similarity between the document and all examined relevant documents, and the average similarity between the document and all examined non-relevant documents.
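A sketch of those four inputs and the resulting perceptron score, again assuming the BrowsingState above (the function names are illustrative):

```python
import numpy as np

def rocchio_features(state, d):
    """x(Dt, d) = [bias, sim(query, d), avg sim to judged relevant, avg sim to judged non-relevant]."""
    avg_rel = np.mean([state.sim[d, r] for r in state.relevant]) if state.relevant else 0.0
    avg_non = np.mean([state.sim[d, n] for n in state.nonrelevant]) if state.nonrelevant else 0.0
    return np.array([1.0, state.query_sim[d], avg_rel, avg_non])

def f1(state, d, w):
    """Single perceptron unit: a weighted sum of the four inputs."""
    return float(w @ rocchio_features(state, d))
```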

Application Coefficients
A secondary search strategy function F2(Dt, d) is designed as a linear combination of three instances of the search strategy function from the previous section, F1(Dt, d), where the coefficients, called application coefficients, are smooth functions of the number of examined documents.
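A sketch of one way F2 could be assembled from three F1 instances, reusing the f1 and BrowsingState sketches above. The Gaussian form of the application coefficients is an assumption here; the values μ = 1, 25, 50 and σ = 6 are taken from the training-parameters slide further below:

```python
import numpy as np

def f2(state, d, w_list, mus=(1.0, 25.0, 50.0), sigma=6.0):
    """Linear combination of three F1 instances, each weighted by a smooth (here Gaussian)
    application coefficient of the number of documents examined so far."""
    n_examined = len(state.relevant) + len(state.nonrelevant)
    coeffs = [np.exp(-((n_examined - mu) ** 2) / (2.0 * sigma ** 2)) for mu in mus]
    return sum(c * f1(state, d, w) for c, w in zip(coeffs, w_list))
```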

Tile Coding
The feature space is partitioned with a regular grid, and a single number is assigned to each cell of the partition. A set of such tilings defines the final function F3(Dt, d): given a point in the feature space, the tile containing the point is selected from each grid and the average of the corresponding numbers is returned. This representation can approximate more complex functions. The feature space has five dimensions: four are the same features used in F2, and the fifth is the number of examined documents squared.
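A minimal tile-coding sketch for intuition: several offset grids each partition the 5-dimensional feature space, every cell stores one learned number, and the estimate is the average over the tiles containing the query point. The grid resolution, number of tilings, and offsets below are illustrative, not the paper's settings (the slides only mention 256 tiles):

```python
import numpy as np

class TileCoder:
    """F3 via tile coding: average the stored values of the tiles that contain a point."""

    def __init__(self, n_tilings=4, bins_per_dim=4, n_dims=5, low=0.0, high=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.width = (high - low) / bins_per_dim
        self.low = low
        self.bins = bins_per_dim
        self.offsets = rng.uniform(0.0, self.width, size=(n_tilings, n_dims))
        # one learned number per cell, per tiling
        self.values = np.zeros((n_tilings,) + (bins_per_dim + 1,) * n_dims)

    def _cells(self, x):
        idx = np.floor((np.asarray(x, dtype=float) - self.low + self.offsets) / self.width)
        return np.clip(idx.astype(int), 0, self.bins)

    def value(self, x):
        cells = self._cells(x)
        return float(np.mean([self.values[(t, *c)] for t, c in enumerate(cells)]))

    def update(self, x, target, eta=0.1):
        """Move the contributing tiles toward a target value (e.g., a TD target)."""
        cells = self._cells(x)
        for t, c in enumerate(cells):
            self.values[(t, *c)] += eta * (target - self.values[(t, *c)]) / len(cells)
```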

Experiments
Retrieved document sets for the experiments are generated by running the Inquery retrieval engine on two standard TREC collections. The engine assigns what is called a belief score to each document in the collection, which is used as the query-document similarity value.

Document Representation
To compute the inter-document similarities, a vector-space approach is used in which each document is represented by a vector of term weights. The weight of the i-th term is computed using the Inquery weighting formula. The inter-document similarity is measured by the cosine of the angle between two vectors.
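A sketch of the similarity computation under this representation; the Inquery weighting formula itself is not reproduced here, so any term-weight vectors (e.g., tf-idf) stand in for it:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two term-weight vectors (0 if either vector is all zeros)."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0

def similarity_matrix(doc_vectors):
    """Pairwise inter-document similarities for the retrieved set."""
    n = len(doc_vectors)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            sim[i, j] = sim[j, i] = cosine_similarity(doc_vectors[i], doc_vectors[j])
    return sim
```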

Experimental Setup
TREC ad-hoc queries are used with their corresponding collections and the relevance judgments supplied by NIST assessors. For each TREC topic, queries of the following types are formed:
- a query constructed by extensive analysis and expansion,
- the description field of the topic,
- the title of the topic,
- a query constructed from the title by expanding it using Local Context Analysis (LCA).

Training and Evaluation Procedure
- Model: a situation where a user has located the first relevant document by following the ranked list.
- Experimental task: given the highest-ranked relevant document as the starting point, find the rest of the relevant information.
- Performance measure: average precision on the unexamined portion of the document set (sketched below).
- Data sets: training, testing, and evaluation subsets drawn from 8 data sets, one for each query type on each collection.
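A small sketch of the measure as described here: average precision computed over the order in which documents are examined, given 0/1 relevance labels in that order (how the paper handles ties or unjudged documents is not specified in the slides):

```python
def average_precision(relevance_in_order):
    """Average precision over a ranked/examination order of 0/1 relevance labels."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance_in_order, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Example: the strategy examines five documents; the 1st and 4th turn out to be relevant.
print(average_precision([1, 0, 0, 1, 0]))  # (1/1 + 2/4) / 2 = 0.75
```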

Training and Evaluation Procedure (cont'd)
Parameters:
- Each search strategy function began with all parameters initialized to zero.
- Learning rate η: 0.1
- Discount factor ρ: 0.4
- Application coefficients: μ = 1, 25, 50 and σ = 6
- Tiles: 256 tiles
The learning process terminated when the average precision failed to improve for several iterations.

Baseline Evaluation Procedure
Starting with the ranked list, follow it until the first relevant document is found. At that point all examined documents are analyzed, and a new query is created by expanding the old query with several top-ranked terms from the examined documents. The remaining unexamined documents are re-ordered using the modified query, and the process continues until all documents are examined.
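A hedged sketch of that baseline loop; `expand_query` and `rank` are hypothetical stand-ins for the expansion and re-ranking steps, which the slides describe only as adding several top-ranked terms and re-ordering with the modified query:

```python
def baseline_browse(ranked_docs, query, is_relevant, expand_query, rank):
    """Incremental relevance feedback baseline: follow the list, and after each relevant
    document expand the query from the examined documents and re-rank what remains."""
    examined, remaining = [], list(ranked_docs)
    while remaining:
        doc = remaining.pop(0)                      # examine the next document in the current order
        examined.append(doc)
        if is_relevant(doc):
            query = expand_query(query, examined)   # add top-ranked terms from the examined documents
            remaining = rank(query, remaining)      # re-order the unexamined documents
    return examined
```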

Results
- 1st strategy: 5% improvement
- 2nd strategy: 9% improvement
- 3rd strategy: 10% improvement

Results (cont'd)
When more documents (100) were used in the retrieved set, similar improvements were observed.

Results (cont'd)
No-starting-point case: the search strategies had to start without any relevance information and explore the whole retrieved set. A small but significant improvement was observed.

Discussions
Reward function:
- Using the relevance value as the reward means RL maximizes the total discounted relevance, while in this paper performance was evaluated using average precision.
- Alternative reward function: use the precision measure at the end of the search, with all intermediate rewards set to zero.
- This alternative learned a similar search strategy.

Conclusions
The relevance feedback problem was formalized in terms of reinforcement learning. The technique is very successful when only inter-document similarity data is available and no term information is provided.