Reference Collections: Task Characteristics

TREC Collection

Text REtrieval Conference (TREC)
– Sponsored by NIST and DARPA, run annually since 1992

Compares approaches to information retrieval over large text collections:
– Uniform scoring procedures
– Large corpus of news and technical texts
– Texts tagged in SGML (includes some metadata and document structure)
– Specified tasks

Example Task

Number: 168
Topic: Financing AMTRAK
Description: A document will address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK).
Narrative: A relevant document must provide information on the government’s responsibility to make AMTRAK an economically viable entity. It could also discuss the privatization of AMTRAK as an alternative to continuous government subsidies. Documents comparing government subsidies given to air and bus transportation with those provided to AMTRAK would also be relevant.
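To make the structure of such a topic statement concrete, here is a minimal parsing sketch in Python. The field labels come from the example above; the function name, the regular expression, and the plain-text input format are illustrative assumptions (actual TREC topics are distributed with SGML-style tags).

```python
# Minimal sketch: split a plain-text topic statement like the one above into
# its labelled fields. The plain-text format is an assumption; real TREC
# topics use SGML-style markup.
import re

FIELDS = ("Number", "Topic", "Description", "Narrative")
FIELD_RE = re.compile(
    r"(" + "|".join(FIELDS) + r"):\s*(.*?)\s*(?=(?:" + "|".join(FIELDS) + r"):|\Z)",
    re.S,
)


def parse_topic(text: str) -> dict:
    """Return a dict mapping field name -> field text."""
    return dict(FIELD_RE.findall(text))


example = ("Number: 168 Topic: Financing AMTRAK "
           "Description: A document will address the role of the Federal "
           "Government in financing the operation of AMTRAK. "
           "Narrative: A relevant document must provide information on the "
           "government's responsibility to make AMTRAK economically viable.")
fields = parse_topic(example)
print(fields["Topic"])   # Financing AMTRAK
print(fields["Number"])  # 168
```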

Deciding What is Relevant

Pooling method:
– A set (pool) of potentially relevant documents is obtained by combining the top N results from various retrieval systems.
– Human assessors then examine these documents to determine which are truly relevant.
– Assumes that relevant documents will be in the pool and that documents not in the pool are not relevant.
– These assumptions have been verified (at least for evaluation purposes).
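As an illustration of how such a pool might be assembled, here is a minimal sketch; the run data layout, the `build_pool` name, and the pool depth are assumptions for illustration rather than the actual TREC tooling.

```python
# Minimal sketch of pool construction. Assumed data layout: each run maps a
# topic id to that system's ranked list of document ids.
from typing import Dict, List, Set


def build_pool(runs: List[Dict[str, List[str]]], topic: str, depth: int = 100) -> Set[str]:
    """Union of the top-`depth` documents retrieved by each system for `topic`."""
    pool: Set[str] = set()
    for run in runs:
        pool.update(run.get(topic, [])[:depth])
    return pool


# Documents in the pool are sent to human assessors; anything outside the
# pool is treated as non-relevant for evaluation purposes.
run_a = {"168": ["doc12", "doc7", "doc3"]}
run_b = {"168": ["doc7", "doc99", "doc3"]}
print(sorted(build_pool([run_a, run_b], "168", depth=2)))  # ['doc12', 'doc7', 'doc99']
```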

Types of TREC Tasks

Ad hoc tasks:
– New queries against a static collection
– IR systems return ranked results
– Systems get the task and the collection

Routing tasks:
– Standing queries against a changing collection
– Basically a batch-mode filtering task
– Example: identifying documents on a topic from the AP newswire
– Results must be ranked
– Systems get the task and two collections, one for training and one for evaluation

Secondary Tasks at TREC

Chinese
– Documents and queries in Chinese

Filtering
– Determine whether each new document is relevant (no rank order)

Interactive
– Human searcher interacts with the system to find relevant documents (no rank order)

NLP
– Examining the value of NLP techniques in IR

Secondary Tasks at TREC (continued)

Cross-Language
– Documents in one language, tasks in another

High Precision
– Retrieve 10 documents that answer a given information request within 5 minutes

Spoken Document Retrieval
– Documents are transcripts of radio broadcasts

Very Large Corpus
– Collection larger than 20 GB

Evaluation Measures

Summary table statistics
– Number of requests in the task, number of documents retrieved, number of relevant documents retrieved, total number of relevant documents

Recall-precision averages
– Precision averaged at the 11 standard recall levels (0.0, 0.1, ..., 1.0)

Document level averages
– Average precision for specified numbers of retrieved documents (including R, the number of relevant documents for the request)

Average precision histogram
– Graph showing how an algorithm did on each request compared to the average over all algorithms
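To make two of these measures concrete, the sketch below computes precision at a fixed cutoff and interpolated precision at the 11 standard recall levels for a single request. The function names and toy data are illustrative; official TREC figures are produced by NIST's trec_eval program over all requests.

```python
# Minimal sketch of two evaluation measures for one ranked result list and a
# set of known relevant documents. Names and data are illustrative only.
from typing import List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def interpolated_precision_11pt(ranking: List[str], relevant: Set[str]) -> List[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    hits = 0
    points = []  # (recall, precision) after each relevant document is retrieved
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    # Interpolated precision at level r = max precision at any recall >= r.
    return [max((p for rec, p in points if rec >= r / 10), default=0.0)
            for r in range(11)]


ranking = ["d3", "d1", "d9", "d2", "d7"]
relevant = {"d1", "d2", "d5"}
print(precision_at_k(ranking, relevant, 5))            # 0.4
print(interpolated_precision_11pt(ranking, relevant))  # 0.5 up to recall 0.6, then 0.0
```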

Reference Collections: Collection Characteristics

CACM Collection

3204 Communications of the ACM articles
Focus of collection: computer science

Structured subfields:
– Author names
– Date information
– Word stems from title and abstract
– Categories from a hierarchical classification
– Direct references between articles
– Bibliographic coupling connections
– Number of co-citations for each pair of articles

CACM Collection (continued)

3204 Communications of the ACM articles

Test information requests:
– 52 information requests in natural language, each with two Boolean query formulations
– Average of 11.4 terms per query
– Requests are rather specific, with an average of about 15 relevant documents each
– Result in relatively low precision and recall

ISI Collection

1460 documents from the Institute for Scientific Information
Focus of collection: information science

Structured subfields:
– Author names
– Word stems from title and abstract
– Number of co-citations for each pair of articles

ISI Collection (continued)

1460 documents from the Institute for Scientific Information

Test information requests:
– 35 information requests in natural language with Boolean query formulations
– Average of 8.1 terms per query
– 41 additional information requests in natural language without Boolean query formulations
– Requests are fairly general, with an average of about 50 relevant documents each
– Result in relatively high precision and recall

Observation

Collection    # of Docs    # of Terms    Terms/Doc
CACM          3204
ISI           1460

The number of distinct terms increases slowly with the number of documents.
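A quick way to observe this effect on any collection is to track vocabulary size as documents are added, as in the sketch below; the whitespace tokenization and toy documents are stand-ins, not the indexing actually used for CACM or ISI.

```python
# Minimal sketch: count distinct terms as documents are added, using naive
# whitespace tokenization (a stand-in for real term extraction).
from typing import Iterable, List, Tuple


def vocabulary_growth(docs: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (documents seen, distinct terms seen) after each document."""
    vocab = set()
    growth = []
    for n, doc in enumerate(docs, start=1):
        vocab.update(doc.lower().split())
        growth.append((n, len(vocab)))
    return growth


docs = ["information retrieval evaluation",
        "retrieval of relevant documents",
        "evaluation of retrieval systems"]
print(vocabulary_growth(docs))  # [(1, 3), (2, 6), (3, 7)]
```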

Cystic Fibrosis Collection

1239 articles indexed under “Cystic Fibrosis” in MEDLINE

Structured subfields:
– MEDLINE accession number
– Author
– Title
– Source
– Major subjects
– Minor subjects
– Abstract (or extract)
– References in the document
– Citations to the document

Cystic Fibrosis Collection (continued)

1239 articles indexed under “Cystic Fibrosis” in MEDLINE

Test information requests:
– 100 information requests
– Relevance assessed by four experts on a scale of 0 (not relevant), 1 (marginally relevant), and 2 (highly relevant)
– Overall relevance is the sum of the four judgements (0 to 8)
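A minimal sketch of combining the four graded judgements follows; the binarization threshold is an assumption added for illustration (useful when standard precision/recall measures require a yes/no judgement), not part of the collection's definition.

```python
# Minimal sketch: combine four graded judgements (each 0, 1, or 2) into the
# overall 0-8 score described above. The binarization threshold of 1 is an
# assumption for illustration, not part of the collection's definition.
from typing import Sequence


def overall_relevance(judgements: Sequence[int]) -> int:
    assert len(judgements) == 4 and all(j in (0, 1, 2) for j in judgements)
    return sum(judgements)


def is_relevant(judgements: Sequence[int], threshold: int = 1) -> bool:
    return overall_relevance(judgements) >= threshold


print(overall_relevance([2, 1, 2, 0]))  # 5
print(is_relevant([0, 0, 1, 0]))        # True (at least one marginal judgement)
```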

Discussion Questions

In developing a search engine:
– How would you use metadata (e.g. author, title, abstract)?
– How would you use document structure?
– How would you use references, citations, co-citations?
– How would you use hyperlinks?