Ben Carterette, Paul Clough, Evangelos Kanoulas, Mark Sanderson

• Put the user in the evaluation loop

Cranfield Paradigm
◦ Simple user model
◦ Controlled experiments
◦ Reusable but static test collections

Interactive Evaluation / Live Labs
◦ Full user participation
◦ Many degrees of freedom
◦ Unrepeatable experiments

Session Track: a middle ground between the two

• Simulate an interactive experiment
◦ Compare IIR systems by controlling user interactions
◦ Build a collection that is portable and reusable
◦ Devise measures to evaluate the utility a user obtains throughout a session
• Test systems in a large variety of cases that lead to sessions

• Categorization on a "who's to blame" basis?
◦ Corpus
  - No single document fulfils the whole information need
◦ User
  - Users cannot express their need with the appropriate query
  - Users learn about their need throughout the session
◦ System

• Accumulate information through the session
◦ Learn from the past: improve on the current query
◦ Look into the future

(G1) Test the ability of retrieval systems to use the history of user interactions (past queries, ...) to improve performance on the current query.
(G2) Evaluate system performance over an entire session instead of a single query.
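
A rough formalisation of the two goals (the notation is ours, not the track's): let a session contain past queries q_1, ..., q_n followed by the current query q_{n+1}, let RL_i be the ranked list returned for q_i, let C_i be the clicks on RL_i, and let M be any per-query effectiveness measure (e.g. nDCG@10).

```latex
\text{G1:}\quad M\bigl(RL_{n+1} \mid q_1, RL_1, C_1,\; \dots,\; q_n, RL_n, C_n\bigr)
\qquad
\text{G2:}\quad \operatorname{Agg}\bigl(M(RL_1),\, M(RL_2),\, \dots,\, M(RL_{n+1})\bigr)
```

Here Agg is deliberately left open: a plain average is the simplest choice, while discounted aggregations such as the sDCG-style sketch later in these slides weight queries by their position in the session.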

• All participants should be provided with a fixed history of user interactions.
[Slide figure: a session timeline of past queries Q1, ..., Qn with their ranked lists RL1, ..., RLn and the results clicked in each list, followed by the current query Qn+1.]
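
A minimal sketch of how such a fixed interaction history might be represented. The field names and the example session are illustrative only, not the track's official data format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interaction:
    """One past query in the session, with its ranked list and clicks."""
    query: str
    ranked_urls: List[str]                                   # RL_i as shown to the user
    clicked: List[int] = field(default_factory=list)         # 1-based ranks that were clicked
    dwell_times: List[float] = field(default_factory=list)   # seconds spent on each click

@dataclass
class Session:
    """Fixed history of user interactions plus the current query."""
    session_id: str
    past: List[Interaction]          # Q_1 ... Q_n with RL_1 ... RL_n
    current_query: str               # Q_{n+1}, the query to be answered

# Hypothetical example: a two-query history followed by the current query.
example = Session(
    session_id="s-001",
    past=[
        Interaction("low carb diet", ["url-a", "url-b", "url-c"],
                    clicked=[1], dwell_times=[42.0]),
        Interaction("low carb diet risks", ["url-d", "url-e"]),
    ],
    current_query="low carb diet long term effects",
)
```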

• For each session, participants submitted four ranked lists (RLs):
(RL1) using the current query only
(RL2) using the current query and the past queries in the session
(RL3) using the current query, the past queries, and the ranked URLs returned for them
(RL4) using the current query, the past queries, the ranked URLs, and the clicks and dwell times
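
To make the four conditions concrete, here is a minimal sketch (reusing the Session/Interaction sketch above) of how a participant might derive the inputs for RL1, RL2 and RL4. Term concatenation and a 30-second dwell threshold are naive illustrative assumptions; actual participants used far more sophisticated methods.

```python
from collections import Counter
from typing import List

def rl1_query(session) -> str:
    """RL1: use only the current query."""
    return session.current_query

def rl2_query(session) -> str:
    """RL2: naive use of past queries -- add their terms to the current query,
    putting terms that recur across the session first (a crude proxy for weighting)."""
    counts = Counter(session.current_query.lower().split())
    for inter in session.past:
        counts.update(inter.query.lower().split())
    return " ".join(term for term, _ in counts.most_common())

def rl4_feedback_docs(session, min_dwell: float = 30.0) -> List[str]:
    """RL4: also exploit clicks -- e.g. treat clicked URLs with a long dwell time
    as pseudo-relevant feedback documents for the current query."""
    docs = []
    for inter in session.past:
        for rank, dwell in zip(inter.clicked, inter.dwell_times):
            if dwell >= min_dwell and 1 <= rank <= len(inter.ranked_urls):
                docs.append(inter.ranked_urls[rank - 1])
    return docs
```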

• Come up with topics that can lead to sessions with more than one query.
◦ In 2011 we generated about 100 topics and collected about 1000 sessions: 90% were single-query sessions.
◦ In 2012 we got about the same percentage of single-query sessions.

• Come up with topics that can lead to different search task types.
◦ In 2011 we generated factual tasks with specific goal(s).
◦ In 2012 we generated a variety of task types:
  - factual / intellectual tasks
  - specific / amorphous goals

• Evaluating retrieval systems
◦ The notion of relevance may change throughout the session
◦ Novelty needs to be considered
◦ Different levels of diversification should be applied at different points in the session
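
To make the whole-session evaluation concern concrete, here is a minimal sketch in the spirit of the session DCG (sDCG) measure of Järvelin et al., with one naive novelty adjustment: a document contributes gain only the first time it is returned in the session. The discount choices and the novelty rule are illustrative assumptions, not the official track measures.

```python
import math
from typing import Dict, List

def dcg(gains: List[float]) -> float:
    """One common DCG variant: the gain at rank r is discounted by log2(r + 1)."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))

def session_dcg(ranked_lists: List[List[str]],
                relevance: Dict[str, float],
                bq: float = 4.0) -> float:
    """Session-level DCG sketch:
    - the list returned for the i-th query is discounted by 1 / (1 + log_bq(i)),
      so gain obtained late in the session counts for less;
    - a document only earns gain the first time it appears in the session
      (naive novelty rule, an assumption for illustration)."""
    seen = set()
    total = 0.0
    for i, ranked in enumerate(ranked_lists, start=1):
        gains = []
        for doc in ranked:
            gains.append(relevance.get(doc, 0.0) if doc not in seen else 0.0)
            seen.add(doc)
        total += dcg(gains) / (1.0 + math.log(i, bq))
    return total
```

For example, session_dcg([rl_for_q1, rl_for_q2], qrels) returns a single score for the whole two-query session rather than one score per query.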

• Whole-session evaluation

• Simulate user interactions
◦ Given a topic, a ranked list, and relevance information:
  - simulate browsing behavior
  - simulate clicks, dwell times, etc.
  - simulate query reformulation
• Quantify utility
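
A minimal sketch of the kind of simulation the slide alludes to: given a ranked list and relevance judgments, generate clicks with a simple cascade-style model. The click and stop probabilities are illustrative assumptions, not parameters prescribed by the track.

```python
import random
from typing import Dict, List, Optional

def simulate_clicks(ranked: List[str],
                    relevance: Dict[str, float],
                    p_click_rel: float = 0.7,
                    p_click_nonrel: float = 0.1,
                    p_stop_after_click: float = 0.5,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Cascade-style click simulation: the simulated user scans the list top-down,
    clicks relevant documents more often than non-relevant ones, and may stop
    scanning after a click. Returns the 1-based ranks that were clicked."""
    rng = rng or random.Random(0)
    clicks = []
    for rank, doc in enumerate(ranked, start=1):
        relevant = relevance.get(doc, 0.0) > 0
        if rng.random() < (p_click_rel if relevant else p_click_nonrel):
            clicks.append(rank)
            if rng.random() < p_stop_after_click:
                break
    return clicks
```

Dwell times and query reformulations could be simulated in the same spirit, e.g. by sampling dwell time from a distribution conditioned on relevance.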

• Built a static test collection
• Evaluated retrieval performance when session information is available
• Still far from a whole-session evaluation
◦ Is it possible to build reusable and portable but dynamic test collections?
◦ Is it worth it?