Super Awesome Presentation
Dandre Allison and Devin Adair

Comparing the Sensitivity of Information Retrieval Metrics
Filip Radlinski, Microsoft, Cambridge, UK
Nick Craswell, Microsoft, Redmond, WA, USA

How do you evaluate Information Retrieval effectiveness?
– Precision (P)
– Mean Average Precision (MAP)
– Normalized Discounted Cumulative Gain (NDCG)

Precision
– For a given query, count the relevant documents in the top 5 results and divide by 5 (Precision@5)
– Average over all queries
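
A minimal sketch of the Precision@5 computation described above, assuming hypothetical inputs: `results` maps each query to its ranked list of document ids, and `relevant` maps each query to its set of relevant document ids.

    def precision_at_5(results, relevant):
        per_query = []
        for query, ranking in results.items():
            # Relevant documents in the top 5, divided by 5.
            hits = sum(1 for doc in ranking[:5] if doc in relevant[query])
            per_query.append(hits / 5)
        # Average over all queries.
        return sum(per_query) / len(per_query)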

Mean Average Precision
– For a given query, for each relevant document in the top 10, compute the precision up to its rank
– Sum these precisions and normalize by the number of known relevant documents
– Average over all queries
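
A minimal sketch of MAP with the top-10 cutoff from the slide, reusing the same hypothetical `results` and `relevant` inputs as the Precision@5 example.

    def mean_average_precision(results, relevant, cutoff=10):
        per_query = []
        for query, ranking in results.items():
            rel = relevant[query]
            hits, precisions = 0, []
            for rank, doc in enumerate(ranking[:cutoff], start=1):
                if doc in rel:
                    hits += 1
                    precisions.append(hits / rank)  # precision up to this rank
            # Normalize by the number of known relevant documents for the query.
            per_query.append(sum(precisions) / len(rel) if rel else 0.0)
        # Average over all queries.
        return sum(per_query) / len(per_query)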

Normalized Discounted Cumulative Gain
– For a given query, normalize the Discounted Cumulative Gain (DCG) by the Ideal Discounted Cumulative Gain (IDCG)
– Average over all queries

Normalized Discounted Cumulative Gain
Discounted Cumulative Gain
– Gives more emphasis to highly relevant documents by using a gain exponential in the relevance grade (2^relevance)
– Gives more emphasis to earlier ranks by applying a logarithmic discount
– Sums over the top 5 results
Ideal Discounted Cumulative Gain
– Computed the same way as DCG, but over the results sorted by relevance
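
A minimal NDCG@5 sketch following the slide, using the common 2^relevance − 1 gain and a log2 rank discount (an assumption about the exact gain form); `relevances` is a hypothetical list of graded relevance labels in the order the system returned the documents.

    import math

    def dcg(relevances, k=5):
        # Emphasize relevant documents (2^rel - 1) and earlier ranks (log discount).
        return sum((2 ** rel - 1) / math.log2(rank + 1)
                   for rank, rel in enumerate(relevances[:k], start=1))

    def ndcg(relevances, k=5):
        # Normalize by the ideal DCG: the same labels sorted by relevance.
        ideal = dcg(sorted(relevances, reverse=True), k)
        return dcg(relevances, k) / ideal if ideal > 0 else 0.0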

What’s the problem?
– Sensitivity: might reject small but significant improvements
– Bias: judges are removed from the search process
– Fidelity: evaluation should reflect user success!

Alternative Evaluation
– Use actual user searches
– Judges become actual users
– Evaluation becomes user success

Interleaving
– System A results + System B results
– Team-Draft algorithm

[Slide figure: the ranked result lists of two example systems, Captain Ahab and Captain Barnacle]

[Slide figure: Captain Ahab's and Captain Barnacle's results combined into an interleaved list]

Crediting
– Whichever system contributes more distinct clicked results is considered “better”
– Ties are ignored
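
A sketch of Team-Draft interleaving and the click-crediting rule above, assuming two ranked lists of document ids and the set of ids the user clicked. It follows the usual Team-Draft formulation (a per-round coin flip decides which system picks first), which the slides do not spell out, so treat it as an illustration rather than the authors' exact procedure.

    import random

    def team_draft_interleave(ranking_a, ranking_b):
        interleaved, team = [], {}
        while True:
            # Each round, a coin flip decides which system picks first.
            order = ['A', 'B'] if random.random() < 0.5 else ['B', 'A']
            progressed = False
            for label in order:
                ranking = ranking_a if label == 'A' else ranking_b
                doc = next((d for d in ranking if d not in team), None)
                if doc is not None:
                    team[doc] = label        # remember which system contributed it
                    interleaved.append(doc)
                    progressed = True
            if not progressed:               # both lists exhausted
                return interleaved, team

    def credit(clicked_docs, team):
        clicks_a = sum(1 for d in clicked_docs if team.get(d) == 'A')
        clicks_b = sum(1 for d in clicked_docs if team.get(d) == 'B')
        if clicks_a == clicks_b:
            return None                      # tie: ignored
        return 'A' if clicks_a > clicks_b else 'B'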

Retrieval System Pairs
Major improvements
– majorAB
– majorBC
– majorAC
Minor improvements
– minorE
– minorD

Evaluation
– 12,000 queries
– Sample n queries with replacement; count the sampled queries where the rankers differ
– Ignore ties
– Report the percentage of samples where the better ranker scores better
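
A sketch of one sampling round, assuming a hypothetical list `outcomes` with one entry per query: +1 if the known-better ranker won that query's comparison, -1 if it lost, 0 for a tie.

    import random

    def better_ranker_win_rate(outcomes, n):
        sample = [random.choice(outcomes) for _ in range(n)]  # sample n queries with replacement
        decided = [o for o in sample if o != 0]               # ignore ties
        if not decided:
            return None
        # Fraction of decided queries where the better ranker scored better.
        return sum(1 for o in decided if o > 0) / len(decided)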

Interleaving Evaluation

Credit Assignment Alternatives
Shared top k
– Ignore?
– Lower clicks treated the same
Not all clicks are created equal (see the sketch below)
– log(rank)
– 1/rank
– Top
– Bottom
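
A sketch of the rank-weighted crediting alternatives above, assuming `clicked_ranks_a` and `clicked_ranks_b` hold the interleaved-list ranks (1-based) of the clicks attributed to each system; swapping the `weight` function switches between the 1/rank and log(rank) variants.

    import math

    def weighted_credit(clicked_ranks_a, clicked_ranks_b, weight=lambda r: 1.0 / r):
        score_a = sum(weight(r) for r in clicked_ranks_a)
        score_b = sum(weight(r) for r in clicked_ranks_b)
        if score_a == score_b:
            return None                      # ties still ignored
        return 'A' if score_a > score_b else 'B'

    # weight=lambda r: 1.0 / math.log2(r + 1) gives the log(rank) variant.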

Conclusions
Performance measured by:
– Judgment-based
– Usage-based
Surprise surprise, small sample size is stupid
– (check out that alliteration)
Interleaving is transitive