Even More TopX: Relevance Feedback Ralf Schenkel Joint work with Osama Samodi, Martin Theobald.

TopX Results with INEX
~660,000 XMLified English Wikipedia articles
107 topics, each with
–structural query (CAS)
–nonstructural (aka keyword) query (CO)
–informal description of the information need
–assessed answers (text passages)
Evaluation metric based on recall/precision: the fraction of relevant characters retrieved. Walk the result list until 1% recall is reached; with C = #characters retrieved and R = #relevant characters retrieved, P[0.01] = R/C.
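To make the metric concrete, here is a minimal Python sketch of P[0.01], assuming each ranked result is given as a pair of (characters retrieved, relevant characters among them); this input format is an illustration, not the actual INEX evaluation format.

```python
def precision_at_one_percent_recall(ranked_results, total_relevant_chars):
    """P[0.01]: precision at the point where 1% of all relevant
    characters has been retrieved. Each result is a pair
    (chars_retrieved, relevant_chars_in_result)."""
    target = 0.01 * total_relevant_chars
    C = R = 0
    for chars, relevant in ranked_results:
        C += chars          # characters retrieved so far
        R += relevant       # relevant characters retrieved so far
        if R >= target:     # 1% recall reached
            return R / C
    return R / C if C else 0.0  # list exhausted before 1% recall
```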

Results with INEX 2007
(Chart: structure queries vs. keyword queries, with runs for document retrieval, improved structure queries, and improved keyword queries.)
Takeaway: structural constraints can improve result quality.

Users vs. Structural XML IR
//professor[contains(., "SB") and contains(.//course, "IR")]
"I need information about a professor in SB who teaches IR."
Structural query languages do not work in practice:
–Schema is unknown or heterogeneous
–Language is too complex
–Humans don't think in XPath
–Results are often unsatisfying
System support to generate good structured queries:
–User interfaces (advanced search)
–Natural language processing
–Interactive query refinement

Relevance Feedback for Interactive Query Refinement
1. User submits a query (e.g., "query evaluation")
2. User marks relevant and nonrelevant docs
3. System finds the best terms to distinguish relevant from nonrelevant docs
4. System submits the expanded query (e.g., "query evaluation XML not(Fagin)")
Feedback for XML IR: start with a keyword query, find structural expansions, create a structural query. (A sketch of one feedback round follows below.)
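As a minimal sketch of one feedback round (step 3 above), assuming documents are plain lists of terms: expansion terms are scored here by a simple frequency difference between the relevant and nonrelevant sets, a stand-in for the Robertson-Sparck-Jones weighting described later.

```python
from collections import Counter

def expand_query(query_terms, relevant_docs, nonrelevant_docs, k=10):
    """Append the k terms that best separate relevant from nonrelevant
    documents. Frequency-difference scoring is a simplification of the
    RSJ-weight selection TopX actually uses."""
    rel = Counter(t for d in relevant_docs for t in set(d))
    non = Counter(t for d in nonrelevant_docs for t in set(d))
    score = {t: rel[t] / len(relevant_docs)
                - non[t] / max(len(nonrelevant_docs), 1)
             for t in rel}
    best = sorted(score, key=score.get, reverse=True)
    return list(query_terms) + [t for t in best if t not in query_terms][:k]
```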

Structural Features
(Diagram: an example article with frontmatter, body, and backmatter; the body contains sections, subsections, and paragraphs with text such as "Semistructured data…", "XML has evolved…", and "With the advent of XSLT…", plus author Baeza-Yates. The user marks one element as a relevant result.)
Possible features:
–C: content of the result, e.g. XML
–D: tag+content of descendants, e.g. p[XSLT]
–A: tag+content of ancestors, e.g. sec[data]
–AD: tag+content of descendants of ancestors, e.g. article//author[Baeza]
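A sketch of how these four classes could be collected for a marked result, using ElementTree purely for illustration (TopX extracts features from its own index; term extraction and the overlap between the classes are simplified here):

```python
import xml.etree.ElementTree as ET

def feature_classes(root, result):
    """Collect C, D, A, AD features for a marked result element.
    Simplified: ancestor content includes the result's own text."""
    # ElementTree has no parent pointers, so build a parent map first.
    parent = {c: p for p in root.iter() for c in p}

    def terms(elem):
        return " ".join(elem.itertext()).split()

    C = set(terms(result))                                  # own content
    D = {(d.tag, t) for d in result.iter()                  # descendants
         if d is not result for t in terms(d)}
    ancestors, node = [], result
    while node in parent:
        node = parent[node]
        ancestors.append(node)
    A = {(a.tag, t) for a in ancestors for t in terms(a)}   # ancestors
    AD = {(d.tag, t) for a in ancestors for d in a.iter()   # desc. of anc.
          if d is not a for t in terms(d)}
    return C, D, A, AD
```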

Feature Selection
Compute the Robertson-Sparck-Jones weight for each feature f (also used as the feature's weight in the query):
w_f = log [ (r_f + 0.5)(E - ef_f - R + r_f + 0.5) / ((R - r_f + 0.5)(ef_f - r_f + 0.5)) ]
where
–r_f: number of relevant results with f
–R: number of relevant results
–ef_f: number of elements that contain f
–E: number of all elements
Order features by the Robertson Selection Value
RSV_f = w_f · (p_f - q_f)
where p_f is the probability that f occurs in a relevant result and q_f the probability that f occurs in a nonrelevant result.
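In code, the weight and the selection value look as follows; estimating q_f from the nonrelevant part of the feedback sample is one plausible choice and an assumption here, since the slide only defines the two probabilities:

```python
import math

def rsj_weight(r_f, R, ef_f, E):
    """Robertson-Sparck-Jones weight with the usual 0.5 smoothing.
    r_f: relevant results with f, R: relevant results,
    ef_f: elements containing f, E: all elements."""
    return math.log((r_f + 0.5) * (E - ef_f - R + r_f + 0.5)
                    / ((R - r_f + 0.5) * (ef_f - r_f + 0.5)))

def selection_value(r_f, R, nr_f, N, ef_f, E):
    """RSV_f = w_f * (p_f - q_f), with p_f = r_f / R and
    q_f = nr_f / N (N nonrelevant results, nr_f of them with f)."""
    return rsj_weight(r_f, R, ef_f, E) * (r_f / R - nr_f / N)
```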

Query Construction
Initial query: //*[query evaluation]
The selected features extend the query tree:
–C extends the target's content: *[query evaluation XML]
–D adds a descendant condition: p[XSLT]
–A adds an ancestor condition: sec[data]
–AD adds an ancestor-descendant condition: article//author[Baeza]
(Notes on the slide: needs schema information!; descendant-or-self axis.)
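A toy sketch of the assembly step, building a NEXI-style string from selected features; the function and the exact placement of the A conditions are illustrative assumptions (as the next slide notes, ancestor conditions strictly need XPath rather than NEXI):

```python
def build_expanded_query(keywords, d_feats, a_feats):
    """Turn content keywords, D features, and A features into a
    NEXI-style query string. AD features and feature weights are
    omitted in this sketch."""
    ancestors = "".join(f"//{tag}[about(., {term})]"
                        for tag, term in a_feats)
    conds = [f"about(., {' '.join(keywords)})"]
    conds += [f"about(.//{tag}, {term})" for tag, term in d_feats]
    return f"{ancestors}//*[{' and '.join(conds)}]"

# build_expanded_query(["query", "evaluation", "XML"],
#                      [("p", "XSLT")], [("sec", "data")])
# -> '//sec[about(., data)]//*[about(., query evaluation XML)
#     and about(.//p, XSLT)]'
```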

More Fancy Query Construction
(The same expanded query tree: *[query evaluation XML] with p[XSLT], sec[data], and article//author[Baeza].)
–No valid NEXI query, but XPath (ancestor axis)
–DAG queries in TopX need disjunctive evaluation

Example: pyramids of egypt

Architecture
(Diagram: the query and user feedback go to the TopX Search Engine; query results are fed into the C, D, A, and AD modules, which produce candidate classes; Weighting + Selection turns these into an expanded query sent back to TopX. INEX tools & assessments are attached for evaluation.)

RF in the TopX 2.0 Interface

Evaluation Methodology
Goal: avoid training on the data
–Freeze known results at the top
–Remove known results+X from the collection:
–resColl-result: remove results only (~doc retrieval)
–resColl-desc: remove results+descendants
–resColl-anc: remove results+ancestors
–resColl-path: remove results+desc+anc
–resColl-doc: remove whole docs with known results
(A sketch of this pool filtering follows below.)
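The removal variants could be sketched as below; is_descendant, is_ancestor, and the doc attribute are hypothetical helpers standing in for the real evaluation tooling:

```python
def filter_pool(all_elements, known, mode):
    """Drop elements already used for feedback from the evaluation pool,
    following the resColl-* variants above."""
    drop = set(known)  # resColl-result: the known results themselves
    if mode in ("resColl-desc", "resColl-path"):
        drop |= {e for e in all_elements
                 if any(is_descendant(e, k) for k in known)}
    if mode in ("resColl-anc", "resColl-path"):
        drop |= {e for e in all_elements
                 if any(is_ancestor(e, k) for k in known)}
    if mode == "resColl-doc":
        docs = {k.doc for k in known}  # docs holding known results
        drop |= {e for e in all_elements if e.doc in docs}
    return [e for e in all_elements if e not in drop]
```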

Evaluation: INEX 2003 & 2004
INEX collection (IEEE-CS journal and conference articles):
–12,107 XML docs with 12 million elements
–queries with manual relevance assessments
52 keyword queries from 2003 & 2004, run with our TopX Search Engine [VLDB05]
Baseline run with MAP ~0.1; automatic feedback for the top-k from the relevance assessments
Evaluation ignores results used for feedback and their descendants (resColl-desc)

INEX 2003 & 2004, resColl-desc (results chart). All dimensions together are best. Reasonable results for the INEX 2005 RF Track.

Results for the INEX 2005 RF Track
–INEX IEEE collection (scientific articles)
–Feedback for the top-20 from the assessments (strict quantisation -> only relevant and nonrelevant)
–Top 10 expansion features
–Runs with top 1500 results
–MAP with inex_eval (strict quantisation)

(Some) Results for the INEX 2006 RF Track
–INEX Wikipedia collection
–Feedback for the top-20 from the assessments (generalised quantisation -> graded relevance)
–Top 10 expansion features
–Runs with top 100 results for the first 50 topics (time…)
–MAP with inex_eval (generalised quantisation)
–Significance tests (Wilcoxon signed-rank, t-test)

Conclusions
–Queries with structural constraints can improve result quality
–Relevance feedback can create such queries
–The structure of the collection matters a lot