Topic by Topic Performance of Information Retrieval Systems
Walter Liggett, National Institute of Standards and Technology
TREC-7 (1999)


2 Abstract
Formulation of topic properties is the goal of this paper. Applying statistical methods to both TREC-6 and TREC-7 IR results, we identify topic pairs that exemplify topic properties useful in relating topic statements to system performance. We formulate topic properties by relating the corresponding topic statements to what is known about IR systems. Some properties apparent in the identified topic pairs are linked to topic expansion, and these pairs exemplify both the need for expansion and the danger in automatic expansion.

3 Topic Properties
System developers would like to be able to judge topic difficulty from reasoning involving topic properties.
- Example: is there a set of topic properties that describes whether several sentences are needed to narrow a topic sufficiently?
There is some doubt that formulation of topic properties is even possible.
- "Little is known about what makes a topic difficult," conclude Voorhees and Harman in their TREC-6 overview.
Our analysis of the ad hoc task shows that one can connect system performance to generic topic properties.

4 Topic Properties
The effect of topic properties on system performance depends on the document collection.
- Our statistical methods might connect two topics not because the topic statements share some property but because the document collection has some unexpected characteristic.
- Topic properties formulated through the TREC ad hoc task might not be as useful with other document collections.
For each topic in the ad hoc task, the TREC evaluation provides system performances, a collection of numbers that might be termed a performance profile.
- A partial answer to the question "How do topics differ?"

5 Topic Properties
This paper presents a pair of TREC-6 topics and three pairs of TREC-7 topics for the reader to study.
- It is hoped that the reader will be able to offer an opinion on what each pair has in common.
Presentation of the four pairs involves two alternative measures of system performance and a method for decomposing the system-by-topic table of performance measurements:
- Average precision, which partially depends on the number of relevant documents
- Depth at 25 percent recall
- Two-way analysis of variance
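As a concrete reference point, here is a minimal Python sketch of average precision over a ranked list; the function name and example data are illustrative, not from the paper.

```python
def average_precision(relevant: set, ranking: list) -> float:
    """Mean of the precision values at each rank where a relevant
    document appears, divided by the total number of relevant
    documents for the topic (so unretrieved relevant docs count)."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Illustrative: 3 relevant documents, two retrieved early.
print(average_precision({"d1", "d4", "d9"}, ["d1", "d2", "d4", "d5"]))
# (1/1 + 2/3) / 3 ≈ 0.556
```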

6 Topic Pairs for Study
Statistical analysis suggests the following two pairs of TREC-7 topics for careful study:
- 372 and 379, for study with respect to the 40 best systems
- 372 and 391, with respect to the 14 best automatic systems that use all parts of the topic statement

7 Topic Pairs for Study

8 The reasons that statistical analysis of system performance connects these topics:
- The appearance of common system successes and common system failures in expansion of these topics.
- Manual systems seem to do relatively well with topics 372 and 379, whereas automatic systems do relatively poorly.
In trying to conceive of the topic property common to the members of these pairs, one must also recognize other topic properties in which these topics differ.

9 Data Analysis
Comparison of two topics in search of a common topic property involves, for each topic:
- The topic statement
- The performance for a group of systems (average precision, depth at 25 percent recall)
Performance is divided into components:
- The overall component compares two topics in terms of the overall abilities of the systems.
- The distinctive component compares two topics in terms of deviations from these overall abilities.
The topic-by-topic similarity of the distinctive component is then used to identify candidate pairs.

10 Computation of the Components
Let $N_s$ be the number of systems, $N_t$ the number of topics, and $y_{ij}$ the performance measure for system $i$ and topic $j$.
The difficulty of topic $j$ is:
$m_j = \frac{1}{N_s} \sum_{i=1}^{N_s} y_{ij}$
The (centered) average performance of system $i$ is:
$a_i = \frac{1}{N_t} \sum_{j=1}^{N_t} y_{ij} - \frac{1}{N_s N_t} \sum_{i'=1}^{N_s} \sum_{j=1}^{N_t} y_{i'j}$
The variation from topic to topic in the effect of overall system abilities is captured by a per-topic coefficient:
$b_j = \frac{\sum_{i=1}^{N_s} (y_{ij} - m_j)\, a_i}{\sum_{i=1}^{N_s} a_i^2}$

11 Computation of the Components
The overall component is given by:
$\hat{y}_{ij} = m_j + b_j a_i$
The remainder is:
$r_{ij} = y_{ij} - m_j - b_j a_i$
The remainder reflects interactions: cases where, after adjustment for overall performance, one system is better than another for one topic but not for another.
- The remainder is not easy to interpret because it is noisy.
The distinctive component is the remainder with this noise excluded.
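A minimal numerical sketch of this decomposition, assuming the row-linear form written above; the array names are illustrative and the data are random placeholders, not TREC scores.

```python
import numpy as np

# y: N_s x N_t table of performance measures (systems x topics).
# Random placeholder data; in practice these are per-topic TREC scores.
rng = np.random.default_rng(0)
y = rng.random((40, 50))

m = y.mean(axis=0)                    # m_j: topic difficulty
a = y.mean(axis=1) - y.mean()         # a_i: centered system ability
# b_j: per-topic regression of (y_ij - m_j) on the abilities a_i
b = ((y - m) * a[:, None]).sum(axis=0) / (a ** 2).sum()
overall = m + np.outer(a, b)          # overall component: m_j + b_j * a_i
remainder = y - overall               # interaction residuals r_ij
```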

12 Overall Component for Pair-1
Topics: 372 = "Native American casino", 379 = "mainstreaming".
Legend: "t" = title, "d" = description, "s" = title + description, "l" = "s" + narrative, "m" = manual.

13 Distinctive Component for Pair-1
Too many query types …

14 Interpretation of the Data
System-to-system variation in the overall component reflects the average performance.
There is a discrepancy between the two measures:
- Average precision depends on the number of relevant documents.
There is little to suggest that one topic is easier than the other.

15 Interpretation of the Data
Figure 4 shows that the two topics share one or more topic properties related to challenges in topic expansion.
- Manual vs. automatic systems

16 Components for Topics 372 and 391
Automatic systems that use the entire topic statement.
What features of these systems cause them to favor topics 372 and 391 when the overall performance of these systems is variable, both better and worse than the average?

17 The Distinctive Component
An $N_s \times N_t$ matrix with elements $r_{ij}$.
Analysis of the residual matrix for choosing topic pairs:
- Singular value decomposition
- Approximation
- Optimization
- …
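The slide names singular value decomposition; a rough sketch of how the leading factor of the residual matrix could flag topics that deviate together follows. The selection rule shown is an illustrative guess, not the paper's approximation and optimization steps.

```python
import numpy as np

def leading_topic_factor(remainder: np.ndarray) -> np.ndarray:
    """Return the leading right-singular vector of the residual
    (system x topic) matrix; topics with large loadings of the
    same sign deviate from overall abilities in a similar way."""
    _, _, vt = np.linalg.svd(remainder, full_matrices=False)
    return vt[0]

# Topics whose loadings are both large and like-signed are
# candidate pairs for side-by-side study.
loadings = leading_topic_factor(np.random.default_rng(1).random((40, 50)) - 0.5)
candidates = np.argsort(-np.abs(loadings))[:5]
```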

18 Topics 352 and 385

19 Components for Topics 352 and 385
The title-only systems perform relatively better than the description-only systems in terms of the distinctive component.
The key noun phrase differs between the title and the description.

20 Topics 312 and 316

21 Components for Topics 312 and 316
Very specific keywords: "hydroponics" and "polygamy".
The manual systems that did well seem better able to take advantage of these keywords than the automatic systems.

22 Conclusions
This paper offers four pairs of topics chosen by statistical methods.
Faced with the challenge of hypothesizing what each pair has in common, it seems that one has some basis for a response.
One topic property that seems to have surfaced is the need for parsimonious expansion.

23 Measure of depth at 25 percent recall = $-\log(r_{.25})$
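A hedged sketch of one plausible reading of this measure, assuming $r_{.25}$ is the fraction of the ranked list that must be examined to reach 25 percent recall; the transcript does not show the paper's exact normalization.

```python
import math

def depth_at_25_recall(relevant: set, ranking: list) -> float:
    """-log of the fraction of the ranked list needed to reach
    25 percent recall; larger values mean relevant documents sit
    nearer the top.  This normalization is an assumption."""
    target = 0.25 * len(relevant)
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        hits += doc in relevant
        if hits >= target:
            return -math.log(rank / len(ranking))
    return 0.0  # 25 percent recall never reached in the list

# Illustrative: 25% recall of 4 relevant docs needs 1 hit.
print(depth_at_25_recall({"a", "b", "c", "d"}, ["x", "a", "y", "z"]))
# hit at rank 2 of 4 -> -log(0.5) ≈ 0.693
```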