Deliverable #3: Document and Passage Retrieval
Ling 573 NLP Systems and Applications
May 10, 2011

Main Components
Document retrieval
  Evaluated with Mean Average Precision (MAP)
Passage retrieval / re-ranking
  Evaluated with Mean Reciprocal Rank (MRR)

Mean Average Precision (MAP)
Traverse the ranked document list:
  Compute precision each time a relevant doc is found
  Average these precisions up to some fixed cutoff r:
    AP(q) = (1/|R_r|) Σ_{d ∈ R_r} Precision(d)
  R_r: set of relevant documents at or above rank r
  Precision(d): precision at the rank where doc d is found
Mean Average Precision: average these per-query averages over all queries
Precision-oriented measure
Single crisp measure: common in TREC Ad-hoc evaluation
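As a rough illustration, MAP could be computed from per-query ranked lists and relevance judgments along the following lines (function and variable names are illustrative; the averaging follows the slide's definition and divides by the number of relevant documents retrieved up to the cutoff, whereas trec_eval divides by the total number of relevant documents):

def average_precision(ranked_docs, relevant, cutoff=1000):
    """AP for one query: average of the precision values at the ranks
    where relevant documents appear, up to the cutoff."""
    hits = 0
    precisions = []
    for rank, doc in enumerate(ranked_docs[:cutoff], start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(runs, qrels, cutoff=1000):
    """runs: {qid: [docid, ...]} ranked lists; qrels: {qid: set of relevant docids}."""
    aps = [average_precision(docs, qrels.get(qid, set()), cutoff)
           for qid, docs in runs.items()]
    return sum(aps) / len(aps) if aps else 0.0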

Baselines
Indri, default settings: #combine
2003:
  No processing: MAP: 0.13
  Simple concatenation: MAP: 0.35
  Conservative pseudo-relevance feedback (5 top docs, 5 terms, default weighting): MAP: 0.35
Per-query variation
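Sketched below is one way such a pseudo-relevance-feedback run could be set up with IndriRunQuery; the index path and question text are placeholders, and the parameter names (fbDocs, fbTerms) are Indri's as best recalled, so they should be checked against the documentation for the version in use:

import subprocess

# Write an illustrative IndriRunQuery parameter file: default #combine retrieval
# plus pseudo-relevance feedback using the top 5 documents and 5 expansion terms.
params = """<parameters>
  <index>/path/to/aquaint.index</index>
  <count>1000</count>
  <fbDocs>5</fbDocs>
  <fbTerms>5</fbTerms>
  <query>
    <number>1894</number>
    <text>#combine( placeholder question terms )</text>
  </query>
</parameters>
"""

with open("prf_baseline.param", "w") as f:
    f.write(params)

# Run Indri on the parameter file; the output would then be scored with MAP as above.
subprocess.run(["IndriRunQuery", "prf_baseline.param"], check=True)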

MRR
Classical QA evaluation: return a ranked list of answer candidates
Idea: the higher the correct answer appears in the list, the higher the score
Measure: Mean Reciprocal Rank (MRR)
  For each question, take the reciprocal of the rank of the first correct answer
    E.g. first correct answer at rank 4 => 1/4
    No correct answer => 0
  Average over all questions
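A comparable sketch for MRR, assuming per-question ranked candidate lists and a caller-supplied correctness test (for example, the pattern matching described on the following slides):

def mean_reciprocal_rank(candidates_per_question, is_correct):
    """candidates_per_question: {qid: [candidate, ...]} ranked lists.
    is_correct(qid, candidate) -> bool decides correctness.
    Per-question score: 1/rank of the first correct candidate, 0 if none."""
    scores = []
    for qid, candidates in candidates_per_question.items():
        score = 0.0
        for rank, candidate in enumerate(candidates, start=1):
            if is_correct(qid, candidate):
                score = 1.0 / rank
                break
        scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0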

Baselines
2004: Indri passage retrieval, 100-term passages
  Strict: MRR: 0.19
  Lenient: MRR: 0.28

Pattern Matching
Litkowski pattern files:
  Derived from NIST relevance judgments on system responses
  Format: Qid answer_pattern doc_list
  A passage where answer_pattern matches is correct if it appears in one of the documents in the list
MRR scoring:
  Strict: pattern matches and the passage comes from a document in the official list
  Lenient: pattern matches, regardless of document
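As a sketch, a single passage could be judged strictly and leniently against one pattern-file line as follows; this assumes the Qid answer_pattern doc_list format above with a whitespace-free pattern, so real pattern files with multi-word patterns would need more careful field splitting:

import re

def judge_passage(passage_text, passage_docid, pattern_line):
    """Judge one answer passage against a 'Qid answer_pattern doc_list' line.
    Returns (strict, lenient): lenient requires only a pattern match;
    strict additionally requires the passage to come from a listed document."""
    fields = pattern_line.split()
    qid, answer_pattern, doc_list = fields[0], fields[1], fields[2:]
    lenient = re.search(answer_pattern, passage_text) is not None
    strict = lenient and passage_docid in doc_list
    return strict, lenient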

Examples
Example patterns:
  1894 (190|249|416|440)(\s|\-)million(\s|\-)miles? APW NYT NYT NYT NYT
  million-kilometer APW
  million - mile NYT
Ranked list of answer passages:
  APW the casta way weas
  APW million miles
  APW million miles
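The answer patterns are ordinary regular expressions; a quick check against made-up strings shows what the first pattern above accepts:

import re

pattern = r"(190|249|416|440)(\s|\-)million(\s|\-)miles?"
for text in ["249 million miles", "416-million-mile", "500 million miles"]:
    print(text, "->", bool(re.search(pattern, text)))
# 249 million miles -> True
# 416-million-mile -> True   (the trailing 's' is optional)
# 500 million miles -> False  (number not in the allowed set)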

Evaluation Issues
Exhaustive vs. pooled scoring
Exhaustive:
  Every document/passage is evaluated for every query
Pooled scoring:
  All participant responses are collected and scored
  All correct responses form the basis for patterns/qrels
  Scores usually correlate well with exhaustive scoring
Exhaustive: more thorough; MUCH more expensive
Pooled: cheaper, faster; penalizes non-conforming systems
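To make the pooling idea concrete, an assessment pool could be built from the union of each participating system's top-k results, roughly as follows (illustrative only):

def build_pool(runs, depth=100):
    """runs: {system_name: {qid: [docid, ...]}} ranked lists per system.
    Returns {qid: set of docids}: the union of each system's top-`depth`
    results, which assessors judge instead of the whole collection."""
    pool = {}
    for ranking_by_qid in runs.values():
        for qid, ranked_docs in ranking_by_qid.items():
            pool.setdefault(qid, set()).update(ranked_docs[:depth])
    return pool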

Presentations
Group 8: Mowry, Srinivasan, Wong
Group 4: Alotayq, Pham, Wang
Group 9: Hermsen, Lushtak, Lutes