Lessons Learned from Information Retrieval
Chris Buckley, Sabir Research



Legal E-Discovery
- Important, growing problem
- Current solutions not fully understood by people using them
- Imperative to find better solutions that scale
- Evaluation required
  – How do we know we are doing better?
  – Can we prove a level of performance?

Lack of Shared Context
- The basic problem of both search and e-discovery
- The searcher does not necessarily know beforehand the "vocabulary" or background of either the author or the intended audience of the documents to be searched

Relevance Feedback
- A human judges some documents as relevant; the system finds others based on those judgements (see the sketch below)
- The only general technique for improving a system's knowledge of context that has proven successful
  – Works from the small collections of the 1970s to the large collections of the present (TREC HARD track)
- Difficult to apply to discovery
  – Requires changing the entire discovery process
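Relevance feedback is often illustrated with a Rocchio-style update, in which judged-relevant documents pull the query vector toward them and judged-nonrelevant documents push it away. The sketch below is a minimal illustration under that assumption; it is not the SMART/Sabir implementation, and the coefficient values, function names, and toy data are invented.

```python
# Minimal Rocchio-style relevance feedback sketch. Vectors are dicts of
# term -> weight; alpha/beta/gamma and all data are illustrative only
# (this is not the SMART/Sabir implementation).

def rocchio(query, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward judged-relevant documents and away from
    judged-nonrelevant ones; terms that end up non-positive are dropped."""
    new_query = {term: alpha * w for term, w in query.items()}

    def add_centroid(docs, coeff):
        if not docs:
            return
        for doc in docs:
            for term, w in doc.items():
                new_query[term] = new_query.get(term, 0.0) + coeff * w / len(docs)

    add_centroid(relevant_docs, beta)       # pull toward relevant documents
    add_centroid(nonrelevant_docs, -gamma)  # push away from non-relevant ones
    return {t: w for t, w in new_query.items() if w > 0}


if __name__ == "__main__":
    query = {"discovery": 1.0, "email": 1.0}
    judged_relevant = [{"discovery": 0.8, "litigation": 0.6, "email": 0.4}]
    judged_nonrelevant = [{"email": 0.9, "spam": 0.7}]
    print(rocchio(query, judged_relevant, judged_nonrelevant))
    # -> roughly {'discovery': 1.6, 'email': 1.17, 'litigation': 0.45}
```

The point of the update is the one made on the slide: the judged documents supply context (vocabulary the searcher did not know beforehand) that the original query lacked.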

Toolbox of Other Techniques
- Many other aids to search
  – Ontologies, linguistic analysis, semantic analysis, data mining, term relationships
- Good techniques for IR uniformly:
  – Give big wins for some searches
  – Give mild losses for others
- Need a set of techniques, a toolbox
- In practice for IR research, the issue is not finding big wins but avoiding the losses

Implications of the Toolbox
- No silver-bullet AI solution is expected
- Boolean search will not expand to accommodate combinations of solutions
- Test collections are critical

Test Collection Importance
- Needed to develop tools
- Needed to develop decision procedures for when to use tools
- The toolbox requirement means we need to distinguish a good overall system from one with a good tool (see the sketch below)
  – All systems are able to show searches on which individual tools work well
  – A good system shows a performance gain on the entire set of searches
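One way to make the "entire set of searches" point concrete is to look at per-topic score changes rather than hand-picked wins. The sketch below is hypothetical: the topic ids and scores are invented, and they stand in for any per-topic measure such as average precision.

```python
# Hypothetical per-topic comparison: a tool with one big win can look good in
# a demo yet should be judged by its behaviour over the whole topic set.
# All topic ids and scores are invented.

baseline  = {"t01": 0.30, "t02": 0.25, "t03": 0.40, "t04": 0.35, "t05": 0.20}
with_tool = {"t01": 0.55, "t02": 0.22, "t03": 0.36, "t04": 0.31, "t05": 0.18}

deltas = {t: with_tool[t] - baseline[t] for t in baseline}
wins = [t for t, d in deltas.items() if d > 0]
losses = [t for t, d in deltas.items() if d < 0]
mean_delta = sum(deltas.values()) / len(deltas)

print(f"wins: {wins}")
print(f"losses: {losses}")
print(f"mean change over all topics: {mean_delta:+.3f}")
# The single large win on t01 is what a demo would show; the mean over the
# entire set of searches (ideally with a significance test) is what a test
# collection lets you check.
```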

Test Collection Composition
- A large set of realistic documents
- A set (at least 30) of topics or information needs
- A set of judgements: which documents are responsive (or non-responsive) to each topic (a minimal sketch follows)
  – Judgements are expensive and limit how test collection results can be interpreted
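A test collection can be thought of as three parallel pieces of data. The sketch below is a minimal in-memory model in the spirit of TREC-style documents, topics, and qrels; the class name, field names, and toy data are assumptions for illustration, not a standard format.

```python
# Minimal in-memory model of a test collection (illustrative names only,
# in the spirit of TREC documents / topics / qrels files).
from dataclasses import dataclass, field

@dataclass
class TestCollection:
    documents: dict   # doc_id -> document text
    topics: dict      # topic_id -> statement of the information need
    # qrels: topic_id -> {doc_id: 1 if responsive, 0 if not}.
    # Documents that were never judged simply do not appear here,
    # which is exactly why the judgements are incomplete.
    qrels: dict = field(default_factory=dict)

    def judged(self, topic_id):
        return set(self.qrels.get(topic_id, {}))

    def responsive(self, topic_id):
        return {d for d, r in self.qrels.get(topic_id, {}).items() if r > 0}


collection = TestCollection(
    documents={"d1": "text ...", "d2": "text ...", "d3": "text ..."},
    topics={"t1": "Documents discussing review costs in e-discovery"},
    qrels={"t1": {"d1": 1, "d2": 0}},   # d3 was never judged
)
print(collection.responsive("t1"))      # {'d1'}
```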

Incomplete Judgements
- Judgements are too time-consuming and expensive to be complete (i.e., to judge every document)
- Instead, pool the retrieved documents from a variety of systems (sketched below)
- Feasible, but:
  – Known to be incomplete
  – We can't even accurately estimate how incomplete
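Pooling is commonly described as judging only the union of the top-k documents from each participating system; everything outside that pool stays unjudged. A minimal sketch under that description, with invented run names, rankings, and a toy pool depth (real pools go much deeper):

```python
# Depth-k pooling sketch: only the union of each run's top-k documents is
# sent to the assessors. Run names, rankings, and the depth are invented.

def build_pool(runs, depth):
    """runs maps run_name -> ranked list of doc_ids; returns the set to judge."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool

runs = {
    "sys_a": ["d3", "d1", "d7", "d9"],
    "sys_b": ["d1", "d5", "d2", "d3"],
    "sys_c": ["d8", "d3", "d4", "d6"],
}

pool = build_pool(runs, depth=2)
print(sorted(pool))   # ['d1', 'd3', 'd5', 'd8'] -- only these are judged
# d2, d4, d6, d7, d9 are never judged, so the judgements are known to be
# incomplete, and nothing in the pool tells us how many of the unjudged
# documents would have been responsive.
```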

Inexact Judgements
- Humans differ substantially in their judgements
- Standard TREC collections:
  – Topics include 1-3 paragraphs describing what makes a document relevant
  – Given the same pool of documents, two humans overlap on 70% of their relevant sets (see the overlap sketch below)
- 76% agreement on the small TREC Legal test
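The 70% figure is an agreement number between two assessors' relevant sets; one common way to compute such an overlap is intersection over union, which is what the sketch below does with invented judgement sets.

```python
# Overlap between two assessors' relevant sets, computed as
# |intersection| / |union|. The judgement sets are invented.

def overlap(rel_a, rel_b):
    union = rel_a | rel_b
    return len(rel_a & rel_b) / len(union) if union else 1.0

assessor_a = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9"}
assessor_b = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d10"}

print(f"overlap = {overlap(assessor_a, assessor_b):.2f}")   # 0.70
# An overlap around 0.7 means there is no single "true" relevant set to
# measure a system against.
```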

Implications of Judgements
- No gold standard of perfect performance is even possible
- Any system claiming better than 70% precision at 70% recall is working on a problem other than general search (precision and recall are worked through below)
- Almost impossible to get useful absolute measures of performance
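For reference, precision is the fraction of what was retrieved that is relevant, and recall is the fraction of what is relevant that was retrieved. A tiny worked example with invented sets:

```python
# Set-based precision and recall (all document ids are invented).
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant  = {"d2", "d4", "d6", "d7"}

hits = retrieved & relevant
precision = len(hits) / len(retrieved)   # 2 / 5 = 0.40
recall    = len(hits) / len(relevant)    # 2 / 4 = 0.50
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# Against a single assessor's judgements, with roughly 70% agreement between
# assessors, measured figures much above 0.7 precision at 0.7 recall say more
# about the judgements than about the system.
```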

Comparative Evaluation
- Comparisons between systems on moderate-size collections (several GBytes) are solid
- Comparative results on larger collections (500 GBytes) are showing strains
  – Believable, but with a larger error margin
  – An active area of research
- The overall goal for e-discovery has to be comparative evaluation (a paired-comparison sketch follows)
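Comparative evaluation normally means scoring both systems on the same topics and comparing them topic by topic, often with a significance test over the topic set. The sketch below uses a simple two-sided sign test as one such test; the per-topic scores are invented, and real studies often prefer a paired t-test or randomization test instead.

```python
# Paired comparison of two systems over the same topics, with a two-sided
# sign test. Per-topic scores (e.g. average precision) are invented.
from math import comb

sys_a = {"t1": 0.32, "t2": 0.41, "t3": 0.25, "t4": 0.38, "t5": 0.29, "t6": 0.44}
sys_b = {"t1": 0.35, "t2": 0.47, "t3": 0.22, "t4": 0.42, "t5": 0.33, "t6": 0.49}

diffs = [sys_b[t] - sys_a[t] for t in sys_a]
wins = sum(d > 0 for d in diffs)
n = sum(d != 0 for d in diffs)        # ignore exact ties

# Probability of a split at least this lopsided if the systems were really
# interchangeable (binomial with p = 0.5, doubled for a two-sided test).
k = max(wins, n - wins)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)

print(f"B beats A on {wins} of {n} topics, sign-test p = {p_value:.3f}")
# B wins 5 of 6 topics here, yet p is well above 0.05: comparative results
# always carry an error margin, and they say nothing absolute about either
# system.
```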

Sabir TREC Legal Results
- Submitted 7 runs
  – Very basic approach (1995 technology)
  – 3 tools from my toolbox
  – 3 query variations
- One of the top systems
- All results basically the same
  – Tools did not help on average