Evaluating Information Retrieval Systems


Sergei Kostousov (sergkosto94@gmail.com), Hannover, 14 June 2016

I. Novelty and Diversity in Information Retrieval Evaluation

An IR system should penalize redundancy (reward novelty) and handle ambiguity (reward diversity). Standard evaluation measures such as MAP, bpref, and nDCG may produce unsatisfactory results when redundancy and ambiguity are taken into account.

Web Search Example

Question Answering Example

Evaluation Framework

Principle: "If an IR system's response to each query is a ranking of documents in order of decreasing probability of relevance, the overall effectiveness of the system to its user will be maximized."

- Relevance is modeled as a binary random variable over documents, given the information need that occasions a user to formulate the query q.
- Information nuggets are any binary property of a document: an answer to a query, topicality, an indicator of part of a site, or a specific fact.

Objective Function and Relevance Judgments

- J(d, i) = 1 if the assessor has judged that document d contains nugget n_i, and J(d, i) = 0 otherwise.
- The framework allows for the possibility of assessor error: a positive judgment is taken to be correct only with some probability α < 1.
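The nugget model can be sketched in a few lines of Python. The independence assumption across nuggets, the user-interest probability `gamma`, and the judgment-correctness probability `alpha` follow the framework's setup, but the function name and the example values below are illustrative, not taken from the paper:

```python
# Sketch of a nugget-based relevance model.
# Assumptions: nuggets are independent, the user wants each nugget with
# probability gamma, and a positive judgment is correct with probability alpha.

def prob_relevant(judgments, alpha=0.5, gamma=0.5):
    """P(document relevant) = 1 - prod_i (1 - gamma * alpha * J(d, i))."""
    p_miss = 1.0
    for j in judgments:          # j is J(d, i) in {0, 1}
        p_miss *= 1.0 - gamma * alpha * j
    return 1.0 - p_miss

# A document judged to contain 2 of 3 nuggets:
print(prob_relevant([1, 1, 0], alpha=0.5, gamma=1.0))  # 0.75
```

Because the product shrinks with every additional judged nugget, a document covering more distinct nuggets gets a strictly higher relevance probability, which is what makes novelty rewardable.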

- Ambiguity and Diversity: "queries are linguistically ambiguous", so a ranking should cover the plausible interpretations of the query.
- Redundancy and Novelty: a document that repeats nuggets already seen earlier in the ranking contributes less.

Normalized Discounted Cumulative Gain (nDCG)

1. Gain vector
2. Cumulative gain vector
3. Discounted cumulative gain vector
4. Computing ideal gain
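These steps can be sketched directly in Python, using the standard log2 rank discount; the gain values in the example are illustrative:

```python
import math

def dcg(gains):
    """Discounted cumulative gain vector: gain at rank k is divided by log2(k) for k >= 2."""
    total, out = 0.0, []
    for k, g in enumerate(gains, start=1):
        total += g / (math.log2(k) if k >= 2 else 1.0)
        out.append(total)
    return out

def ndcg(gains):
    """Normalize each rank's DCG by the DCG of the ideal (descending-gain) ordering."""
    actual = dcg(gains)
    ideal = dcg(sorted(gains, reverse=True))
    return [a / i if i > 0 else 0.0 for a, i in zip(actual, ideal)]

print(ndcg([3, 1, 2]))  # 1.0 at rank 1; below 1.0 where the ranking deviates from ideal
```

The ideal gain vector is simply the same gains sorted in decreasing order, so nDCG is 1.0 exactly when the system's ranking matches that ordering.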

Conclusion

- The goal was to define a workable evaluation framework for information retrieval that accounts for novelty and diversity in a sound fashion.
- Serious criticism could be applied to many links in the chain of assumptions.
- Despite these concerns, substantial progress has been made toward the goal. Unusual features of the approach include recognition of judging error and the ability to incorporate a user model.

II. Adaptive Effort for Search Evaluation Metrics

- Searchers wish to find more but spend less.
- We need to accurately measure both the amount of relevant information found (gain) and the effort spent (cost).
- Metrics considered: nDCG, GAP, RBP, and ERR.
- Two suggested approaches: a parameter for the ratio of effort between relevant and non-relevant entries, and a time-based approach that measures effort by the expected time to examine the results.

Existing IR Evaluation Metrics

Two ways of combining gain and effort:
- M1: E(gain) / E(effort)
- M2: E(gain / effort)
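The distinction matters because the two expectations generally differ for the same ranking. A minimal sketch, assuming a hypothetical stopping-rank distribution `p_stop` (the probability that the user stops after examining rank k; the values in the example are made up):

```python
# Two ways to combine gain and effort over a stopping-rank distribution.

def m1(gains, efforts, p_stop):
    """M1 = E(gain) / E(effort): ratio of the two expectations."""
    e_gain = sum(p * sum(gains[:k + 1]) for k, p in enumerate(p_stop))
    e_effort = sum(p * sum(efforts[:k + 1]) for k, p in enumerate(p_stop))
    return e_gain / e_effort

def m2(gains, efforts, p_stop):
    """M2 = E(gain / effort): expectation of the per-session gain-to-effort ratio."""
    return sum(p * (sum(gains[:k + 1]) / sum(efforts[:k + 1]))
               for k, p in enumerate(p_stop))

# Same ranking, different answers:
print(m1([1, 0], [1, 2], [0.5, 0.5]))  # 0.5
print(m2([1, 0], [1, 2], [0.5, 0.5]))  # ~0.667
```

M2 weights each possible session by its own gain-to-effort ratio, so early relevant results count for more than under M1.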


Adaptive Effort Metrics

1. Parameter-based: a parameter sets the ratio of effort between relevant and non-relevant entries.
2. Time-based: relevance grades 0, 1, …, r_max are mapped to an effort vector [e_0, e_1, e_2, ..., e_rmax]; for example, with relevance grades r = 0, 1, 2 the effort vector has three entries.
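A sketch of the time-based variant: each grade is assigned the expected time to examine a result of that grade, and a ranking's effort is accumulated rank by rank. The effort values below are illustrative placeholders, not those measured in the paper:

```python
# Time-based adaptive effort: each relevance grade r in {0, ..., r_max}
# is assigned the expected time e_r to examine a result of that grade.

EFFORT = [1.0, 2.0, 3.0]   # e_0, e_1, e_2 for grades r = 0, 1, 2 (hypothetical values)

def ranking_effort(grades):
    """Cumulative effort to examine the top-k results, for each k."""
    total, out = 0.0, []
    for r in grades:
        total += EFFORT[r]
        out.append(total)
    return out

print(ranking_effort([2, 0, 1]))  # [3.0, 4.0, 6.0]
```

These cumulative efforts can then replace the uniform per-rank cost in gain/effort metrics such as M1 and M2.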

Computation: each result's relevance grade selects its effort from the effort vector.


Experiment

Conclusion

- Adaptive effort metrics can better indicate users' search experience compared with static ones.
- Future research: a broader set of queries of different types; exploring the effect of different effort levels.