Text REtrieval Conference (TREC)

Presentation transcript:

Text REtrieval Conference (TREC)
Rekha Ramesh (114388002), Abhishek Mallik (113050019), Aliabbas Petiwala (114380003)

Motivation
You are experimenting in information retrieval. You have built a retrieval system, you want to test it on realistically sized test data, and you want to compare your results against other, similar systems. Where do you go? The answer is TREC.

What is TREC?
A workshop series that provides the infrastructure for large-scale testing of (text) retrieval technology:
- realistic test collections
- uniform, appropriate scoring procedures
- a forum for the exchange of research methodology

TREC Philosophy
TREC is a modern example of the Cranfield tradition: system evaluation based on test collections. The emphasis is on advancing the state of the art from evaluation results; TREC's primary purpose is not competitive benchmarking. It is an experimental workshop.

Yearly Conference Cycle

A Brief History of TREC
1992: first TREC conference
- Started by Donna Harman and Charles Wayne as one of three evaluations in DARPA's TIPSTER program
- The first 3 CDs of documents date from this era
- Open to IR groups not funded by DARPA
- 25 groups submitted runs
- Two tasks: ad hoc retrieval and routing
- 2 GB of text, 50 topics
- Primarily an exercise in scaling up systems

Ad Hoc and Routing Tasks
Routing task: the same questions are always being asked, but new data is being searched. Examples: news clipping services, library profiling systems.
Ad hoc task: new questions are being asked against a static set of data. Example: a researcher using a library.

Goals of TREC
- To encourage research in text retrieval based on large test collections
- To increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas
- To speed the transfer of technology from research labs into commercial products
- To increase the availability of appropriate evaluation techniques for use by industry and academia

TREC Tracks
TREC-1 (1992) and TREC-2 (1993) concentrated on the ad hoc and routing tasks. Starting in TREC-3, a variety of other "tracks" were introduced. The tracks invigorate TREC by focusing research on new areas or particular aspects of text retrieval, and they now represent the majority of the work done in TREC.

TREC Tracks
The set of tracks in a particular TREC depends on:
- the interests of participants
- the appropriateness of the task to TREC
- the needs of sponsors
- resource constraints
Proposals for new tracks must be submitted in writing to NIST.

Tasks Performed in TREC

Use of only static text
- Million Query Track: explores ad hoc retrieval on a large document set.
- Robust Track: focuses research on improving the consistency of retrieval technology by concentrating on poorly performing topics.
Use of streamed text
- Filtering: for each document in a document stream, decide whether to retrieve it in response to a standing query (routing, batch filtering, adaptive filtering).

Human in the Loop
- Interactive Track: investigates searching as an interactive task by examining the process as well as the outcome; an observational study of subjects using the live web to perform a search task, followed by controlled laboratory experiments on hypotheses suggested by the observations.
- HARD Track (High Accuracy Retrieval from Documents): aims to improve retrieval performance by targeting retrieval results to the specific user.

Beyond Just English
- CLIR (Cross-Lingual Information Retrieval): documents are in one language and topics are in another.
- Multilingual (Spanish and Chinese): investigates retrieval performance when both documents and topics are in a language other than English.

Beyond Text
- Confusion Track: investigates how retrieval performance is affected by noisy, corrupted data, such as text produced by OCR.
- SDR Track: retrieval methodology for spoken documents.
- Video Track: promotes progress in content-based retrieval from video.

Web Searching
- Web Track: builds a test collection that more closely mimics the retrieval environment of the WWW; the documents are a collection of web pages.
- Terabyte Track: develops an evaluation methodology for terabyte-scale document collections. The collection contains a large proportion of the crawlable pages in .gov, including HTML and text, plus the extracted text of PDF, Word, and PostScript files.
- Enterprise Track: satisfying a user who is searching the data of an organization to complete some task. Enterprise data generally consists of diverse types such as published reports, intranet web sites, and email.

Answers, Not Docs
- Novelty Track: investigates systems' abilities to locate relevant and new (non-redundant) information within an ordered set of documents.
- Q&A Track: systems return actual answers, as opposed to ranked lists of documents, in response to a question. The test set of questions consisted of factoid, list, and definition questions, for example:
  DEFINITION: What is a comet?
  FACTOID: How often does it approach the earth?
  LIST: In what countries was the comet visible on its last return?
  OTHER

Retrieval in Domains
- Legal Track: focuses specifically on the problem of e-discovery, the effective production of digital or digitized documents as evidence in litigation.
- Genomics Track: a forum for evaluation of information access systems in the genomics domain; the document collections were sets of full-text articles from several biomedical journals.

Personal Documents
- Blog Track: explores information-seeking behavior in the blogosphere, to discover the similarities and differences between blog search and other types of search.
- Spam Track: evaluates how well systems separate spam from ham (non-spam) when given an email sequence. The task involved classifying email messages as ham or spam, with subtasks differing in the amount and frequency of the feedback the system received.

Participation in TREC

TREC Impacts
- Test collections: papers in the general literature use TREC collections
- Incubator for new research areas: PhD theses resulting from CLIR, SDR, and QA
- Common evaluation methodology and improved measures for text retrieval
- Documented best practices in IR research methodology for new researchers

TREC Impacts
- Open forum for the exchange of research: TREC papers figure prominently in IR syllabi on the web, and publication of all results prevents unsuccessful research from being duplicated
- Technology transfer: the impact is far greater than just on those who actually participate

TREC Approach

Creating a Test Collection for the Ad Hoc Task

Documents
Documents must be representative of the real task of interest. Considerations: genre, diversity (subject, style, and vocabulary), amount, full text vs. abstract.
In TREC: generally newswire/newspaper documents, general-interest topics, full text.

Topics
Distinguish between a statement of user need (the topic) and the system data structure (the query). Topics give the criteria for relevance and allow for different query construction techniques.
TREC topics are NOT all created equal:
- 1-150: very detailed, rich content
- 151-250: the method of topic creation resulted in focused and easy topics
- 201-250: single sentence only
- 301-450: title is a set of hand-picked keywords
A topic generally consists of four sections: an identifier, a title (up to three words that best describe the topic), a description (a one-sentence description of the topic area), and a narrative (a concise description of what makes a document relevant).

An Example Topic
<num> Number: 951
<title> Mutual Funds
<desc> Description:
Blogs about mutual funds performance and trends.
<narr> Narrative:
Ratings from other known sources (Morningstar) correlative to key performance indicators (KPI) such as inflation, currency markets and domestic and international vertical market outlooks. News about mutual funds, mutual fund managers and investment companies. Specific recommendations should have supporting evidence or facts linked from known news or corporate sources. (Not investment spam or pure, uninformed conjecture.)
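As a rough illustration of how such a topic can be consumed programmatically, here is a minimal parsing sketch. It assumes only the tagged layout shown above (real TREC topic files wrap each topic in <top>...</top> and vary by year), and the function name and regular expressions are mine, not an official TREC tool.

```python
import re

def parse_topic(raw: str) -> dict:
    """Split a TREC-style topic into its num, title, desc, and narr fields.

    Assumes the tagged layout shown in the example topic above.
    """
    fields = {}
    # Each section starts with a tag such as <title>; capture text up to the next tag.
    pattern = re.compile(r"<(num|title|desc|narr)>(.*?)(?=<(?:num|title|desc|narr)>|$)", re.S)
    for tag, text in pattern.findall(raw):
        # Drop the human-readable label ("Number:", "Description:", "Narrative:") if present.
        text = re.sub(r"^\s*(Number|Description|Narrative)\s*:", "", text.strip())
        fields[tag] = text.strip()
    return fields
```

For the example above, parse_topic(...)["title"] would return "Mutual Funds" and ["desc"] the one-sentence description, which is typically what a system turns into a query.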

Relevance Judgments (1)
The relevance judgments turn a set of documents and topics into a test collection. The ideal system would retrieve all of the relevant documents and none of the irrelevant ones. TREC usually uses binary relevance judgments.

Relevance Judgments (2)
In test collections, judgments are usually binary, static, and assumed to be complete. But:
- relevance is idiosyncratic
- relevance does not entail utility
- documents have different degrees of relevance
- relevance can change over time, even for the same user
- for realistic collections, judgments cannot be complete

Relevance Judgments (3)
- Consistency: the idiosyncratic nature of relevance judgments does not affect comparative results.
- Incompleteness: judgments cannot be complete, but the incomplete judgments should be unbiased; TREC pooling has been adequate to produce unbiased judgments.
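Pooling, mentioned above, merges the top-ranked documents from every submitted run and has assessors judge only that pool; anything outside the pool is treated as not relevant. The sketch below is only an illustration of that idea: the pool depth of 100 and the data structures are assumptions, not the exact NIST procedure.

```python
def build_pool(runs, depth=100):
    """Form the judgment pool for one topic.

    runs  -- list of ranked document-ID lists, one per submitted run
    depth -- how many top-ranked documents to take from each run
             (100 is used here purely as an illustrative depth)
    Returns the set of unique document IDs that assessors would judge;
    documents outside the pool are assumed not relevant.
    """
    pool = set()
    for ranking in runs:
        pool.update(ranking[:depth])
    return pool
```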

Creating relevance judgments

Evaluation at TREC
In TREC, ad hoc tasks are evaluated using the trec_eval package. The package reports about 85 different numbers for a run, including recall and precision at various cut-off levels and single-valued summary measures derived from recall and precision.
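trec_eval compares a system's ranked results against the relevance judgments for each topic. As a rough sketch, the helper below writes results in the six-column run layout commonly used for TREC submissions (topic, "Q0", document ID, rank, score, run tag); the function name is mine, and the exact format requirements should be checked against the relevant track guidelines.

```python
def write_run(results, run_tag, path):
    """Write ranked results in the common TREC run layout:
    topic_id  Q0  doc_id  rank  score  run_tag

    results maps a topic id to a list of (doc_id, score) pairs,
    already sorted by decreasing score.
    """
    with open(path, "w") as out:
        for topic_id, ranking in sorted(results.items()):
            for rank, (doc_id, score) in enumerate(ranking, start=1):
                out.write(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}\n")
```

Given such a run file and the qrels (judgment) file for the topics, trec_eval prints the recall/precision measures described in the following slides.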

Evaluation Measure Criteria
A good evaluation measure should be:
- related to user satisfaction
- interpretable
- able to be averaged or aggregated over topics
- of high discriminating power
- able to be analyzed

Ranked Retrieval Chart

Evaluation Contingency Table

                 Relevant     Non-relevant
Retrieved           r            n - r
Not retrieved     R - r       N - n - R + r

N: number of documents in the collection
n: number of documents retrieved
R: number of relevant documents
r: number of relevant documents retrieved
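The two basic measures follow directly from this table. A minimal sketch, with variable names mirroring the table:

```python
def precision(r: int, n: int) -> float:
    """Fraction of retrieved documents that are relevant: r / n."""
    return r / n if n else 0.0

def recall(r: int, R: int) -> float:
    """Fraction of relevant documents that are retrieved: r / R."""
    return r / R if R else 0.0

# Illustrative numbers: 100 documents retrieved, 40 of them relevant,
# 80 relevant documents in the collection.
# precision(40, 100) == 0.4 and recall(40, 80) == 0.5
```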

Recall Precision Graph

Uninterpolated R-P Curve for a Single Topic

Interpolated R-P Curve for a Single Topic
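The interpolated curve replaces the precision at each recall level with the maximum precision observed at that or any higher recall, which removes the sawtooth shape of the raw curve. A minimal sketch of that interpolation rule (the eleven recall points 0.0, 0.1, ..., 1.0 follow the usual TREC convention; the function name is mine):

```python
def interpolated_precision(points, recall_levels=None):
    """Interpolate a recall-precision curve for one topic.

    points        -- (recall, precision) pairs observed as the ranked list
                     is scanned (one pair per retrieved relevant document)
    recall_levels -- recall points at which to report interpolated precision;
                     defaults to the 11 standard levels 0.0, 0.1, ..., 1.0
    Interpolated precision at recall x is the maximum precision observed
    at any recall >= x.
    """
    if recall_levels is None:
        recall_levels = [i / 10 for i in range(11)]
    curve = []
    for level in recall_levels:
        candidates = [p for r, p in points if r >= level]
        curve.append((level, max(candidates) if candidates else 0.0))
    return curve
```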

Single-Number Summary Scores
- Precision at cutoff n: Prec(n) = r / n
- Recall at cutoff n: Recall(n) = r / R
- Average precision: the average, over the relevant documents rd, of Prec(rank of rd)
- R-precision: Prec(R)
- Recall at 0.5 precision (use Prec(10) if precision is below 0.5 in the top 10)
- Rank of the first relevant document (expected search length)
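A minimal sketch of two of these measures, average precision and R-precision, computed from a ranked list and the set of relevant document IDs, following the definitions above; the function names are illustrative, not trec_eval's.

```python
def average_precision(ranking, relevant):
    """Average of Prec(rank) taken at the rank of each retrieved relevant
    document, divided by the total number of relevant documents
    (unretrieved relevant documents therefore contribute zero)."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank          # precision at this rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranking, relevant):
    """Precision after retrieving R documents, where R = number of relevant docs."""
    R = len(relevant)
    retrieved_relevant = sum(1 for doc_id in ranking[:R] if doc_id in relevant)
    return retrieved_relevant / R if R else 0.0
```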

Average Precision
Advantages:
- Sensitive to the entire ranking: changing a rank changes the final score
- Stable: a small change in the ranking makes a relatively small change in the score
- Has both precision- and recall-oriented factors
- Ranks closest to 1 receive the largest weight
- Computed over all relevant documents
Disadvantage:
- Less easily interpretable

Summary
- TREC emphasizes individual experiments evaluated against a benchmark task
- It leverages modest government investments into substantially more R&D than could be funded directly
- It improves the state of the art and accelerates technology transfer

References (1)
- Harman, D. (Ed.). The First Text REtrieval Conference (TREC-1). NIST Special Publication 500-207, Gaithersburg, MD 20899, 1993.
- Harman, D. Overview of the Fourth Text REtrieval Conference (TREC-4). In D. K. Harman, editor, Proceedings of the Fourth Text REtrieval Conference (TREC-4), pages 1-23, October 1996. NIST Special Publication 500-236.
- Voorhees, E. M. Overview of the Sixteenth Text REtrieval Conference (TREC-16). In E. M. Voorhees, editor, Proceedings of the Sixteenth Text REtrieval Conference (TREC-16), pages 1-16, November 6-9, 2007. NIST Special Publication.
- All 19 subsequent Text REtrieval Conference (TREC) publications.
- Voorhees, E. M. Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing and Management, 36:697-716, 2000.

References (2)
- Buckley, C., Dimmick, D., Soboroff, I., and Voorhees, E. Bias and the limits of pooling for large collections. Information Retrieval, 10:491-508, 2007.
- Järvelin, K. and Kekäläinen, J. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4):422-446, 2002.
- Buckley, C. trec_eval IR evaluation package. Available from http://trec.nist.gov/trec-eval/