
INEX 2002-2006: Understanding XML Retrieval Evaluation
Mounia Lalmas and Anastasios Tombros, Queen Mary, University of London
Norbert Fuhr, University of Duisburg-Essen

XML retrieval vs. document retrieval (retrieval of structured vs. unstructured documents)
– No predefined unit of retrieval
– Dependency of retrieval units
Aims of XML retrieval:
– not only to find relevant elements
– but those at the appropriate level of granularity (e.g. book, chapters, sections, subsections)
XML retrieval allows users to retrieve document components that are more focused, e.g. a subsection of a book instead of an entire book.

Outline
– Collections
– Topics
– Retrieval tasks
– Relevance and assessment procedures
– Metrics

Evaluation of XML retrieval: INEX
Promotes research and stimulates development of XML information access and retrieval, through:
– creation of an evaluation infrastructure and organisation of regular evaluation campaigns for system testing
– building of an XML information access and retrieval research community
– construction of test suites
Collaborative effort: participants contribute to the development of the collection
Each campaign ends with a yearly workshop, in December, in Dagstuhl, Germany
INEX has allowed a new community in XML information access to emerge

INEX: Background
– Since 2002
– Sponsored by the DELOS Network of Excellence for Digital Libraries under the FP5 and FP6 IST programmes
– Mainly dependent on voluntary efforts
– Coordination is distributed across tasks and tracks
– 64 participants in 2005; 80+ in 2006
Main institutions involved in coordination for 2006: University of Amsterdam, NL; University of Otago, NZ; University of Waterloo; CWI, NL; Carnegie Mellon University, USA; IBM Research Lab, IL; University of Minnesota Duluth, USA; University of Paris 6, FR; Queensland University of Technology, AUS; University of California, Berkeley, USA; Royal School of LIS, DK; Queen Mary, University of London, UK; University of Duisburg-Essen, DE; INRIA-Rocquencourt, FR; Yahoo! Research; Microsoft Research Cambridge, UK; Max-Planck-Institut für Informatik, DE

Document collections
Year      | Corpus                | Documents | Elements | Size
2002–2004 | IEEE journal articles | 12,107    | 8M       | 494 MB
2005      | IEEE journal articles | 16,819    | 11M      | 764 MB
2006      | Wikipedia             | 659,388   | 30M      | 60 GB (4.6 GB without images)
(The slide also reported the average number of elements per document and the average element depth.)

Topics

Two types of topics in INEX
Content-only (CO) topics
– ignore document structure
– simulate users who do not have any knowledge of the document structure, or who choose not to use such knowledge
Content-and-structure (CAS) topics
– contain conditions referring both to the content and to the structure of the sought elements
– simulate users who do have some knowledge of the structure of the searched collection

CO topics "Information Exchange", +"XML", "Information Integration" How to use XML to solve the information exchange (information integration) problem, especially in heterogeneous data sources? Relevant documents/components must talk about techniques of using XML to solve information exchange (information integration) among heterogeneous data sources where the structures of participating data sources are different although they might use the same ontologies about the same content.

CAS topics (example)
Title: //article[(./fm//yr = '2000' OR ./fm//yr = '1999') AND about(., '"intelligent transportation system"')]//sec[about(., 'automation +vehicle')]
Description: Automated vehicle applications in articles from 1999 or 2000 about intelligent transportation systems.
Narrative: To be relevant, the target component must be from an article on intelligent transportation systems published in 1999 or 2000, and must include a section which discusses automated vehicle applications, proposed or implemented, in an intelligent transportation system.

CO+S topics (example)
Title: markov chains in graph related algorithms
Structured title (CAS): //article//sec[about(., +"markov chains" +algorithm +graphs)]
Description: Retrieve information about the use of markov chains in graph theory and in graph-related algorithms.
Narrative: I have just finished my MSc in mathematics, in the field of stochastic processes. My research was in a subject related to Markov chains. My aim is to find possible implementations of my knowledge in current research. I am mainly interested in applications in graph theory, that is, algorithms related to graphs that use the theory of Markov chains. I am interested in at least a short specification of the nature of the implementation (e.g. what is the exact theory used, and to which purpose), hence the relevant elements should be sections, paragraphs or even abstracts of documents, but in any case should be part of the content of the document (as opposed to, say, vt or bib).

Expressing structural constraints: NEXI
Narrowed Extended XPath I
Used for INEX Content-and-Structure (CAS) queries
Specifically targeted at content-oriented XML search (i.e. "aboutness")
//article[about(.//title, apple) and about(.//sec, computer)]
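As a rough illustration of the shape of such queries, the sketch below splits a simple single-filter NEXI expression into its target path and its about() clauses. It is a hypothetical helper written for this summary, not part of any INEX tool, and real NEXI (nested predicates, comparisons such as ./fm//yr = '2000') requires a proper grammar.

```python
import re

# Matches one about(path, terms) clause; illustrative only.
ABOUT = re.compile(r"about\(\s*([^,]+)\s*,\s*([^)]+)\)")

def parse_simple_nexi(query: str):
    """Split a single-filter NEXI query into (target path, about clauses)."""
    if "[" in query:
        target, _, rest = query.partition("[")
        filters = [(path.strip(), terms.strip())
                   for path, terms in ABOUT.findall(rest)]
    else:
        target, filters = query, []
    return target.strip(), filters

print(parse_simple_nexi(
    "//article[about(.//title, apple) and about(.//sec, computer)]"))
# -> ('//article', [('.//title', 'apple'), ('.//sec', 'computer')])
```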

Retrieval Tasks

Retrieval tasks
Ad hoc retrieval: "a simulation of how a library might be used, which involves the searching of a static set of XML documents using a new set of topics"
– ad hoc retrieval for CO topics
– ad hoc retrieval for CAS (+S) topics
Core task: "identify the most appropriate granularity of XML elements to return to the user, with or without structural constraints"

CO retrieval task ( )
Specification:
– make use of the CO topics
– retrieve the most specific elements, and only those, that are relevant to the topic
– no structural constraints regarding the appropriate granularity
– must identify the most appropriate XML elements to return to the user
Two main strategies: the focused strategy and the thorough strategy

Focused strategy ( )
Specification: "find the most exhaustive and specific element on a path within a given document containing relevant information, and return to the user only this most appropriate unit of retrieval"
– no overlapping elements (see the sketch below)
– preference for specificity over exhaustivity
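A minimal sketch of how the no-overlap requirement of the focused strategy could be enforced over a ranked element list: a greedy filter that keeps an element only if none of its ancestors or descendants has already been kept. The path representation and scores are invented for illustration; this is not the method of any particular INEX participant.

```python
def remove_overlap(ranked):
    """Greedily filter a score-sorted list of (element_path, score) pairs so
    that no kept element is an ancestor or descendant of another kept one.
    element_path is a tuple such as ('article', 'sec[2]', 'ss1')."""
    selected = []
    for path, score in ranked:
        overlaps = any(path[:len(p)] == p or p[:len(path)] == path
                       for p, _ in selected)
        if not overlaps:
            selected.append((path, score))
    return selected

ranked = [(("article", "sec[2]"), 0.9),
          (("article", "sec[2]", "ss1"), 0.8),   # descendant of sec[2], dropped
          (("article", "sec[3]"), 0.5)]
print(remove_overlap(ranked))
# -> [(('article', 'sec[2]'), 0.9), (('article', 'sec[3]'), 0.5)]
```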

Thorough strategy ( )
Specification: "the core system task underlying most XML retrieval strategies, which is to estimate the relevance of potentially retrievable elements in the collection"
– the overlap problem is viewed as an interface and presentation issue
– the challenge is to rank elements appropriately
This is the task that most XML retrieval approaches performed in INEX up to 2004.

Fetch & Browse Document ranking, and in each document, element ranking Query: wordnet information retrieval

Fetch & Browse Document ranking, and in each document –All in context task: rank relevant elements, no overlap allowed (actual refinement of fetch & Browse) –Best in context task: identify the one element from where to start reading in the document Likely to be the two tasks in INEX 2007

Retrieval strategies – to recap
– Focused: assumes that the user prefers a single element that is the most relevant.
– Thorough: assumes that the user prefers all highly relevant elements.
– All in context: assumes that the user is interested in highly relevant elements that are contained only within highly relevant articles.
– Best in context: assumes that the user is interested in the best entry points, one per article, of highly relevant articles (see the sketch below).
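To make the best-in-context idea concrete, here is a small sketch that reduces element-level results to a single entry point per document and then ranks documents by that element's score. The tuple layout and scores are assumptions made for illustration, not the format of actual INEX runs.

```python
from operator import itemgetter

def best_in_context(element_results):
    """From (doc_id, element_path, score) triples, keep the highest-scoring
    element per document as its entry point, then rank documents by it."""
    best = {}
    for doc, path, score in element_results:
        if doc not in best or score > best[doc][1]:
            best[doc] = (path, score)
    return sorted(((doc, path, score) for doc, (path, score) in best.items()),
                  key=itemgetter(2), reverse=True)

results = [("d1", ("article", "sec[1]"), 0.4),
           ("d1", ("article", "sec[3]"), 0.7),
           ("d2", ("article", "sec[2]"), 0.6)]
print(best_in_context(results))
# -> [('d1', ('article', 'sec[3]'), 0.7), ('d2', ('article', 'sec[2]'), 0.6)]
```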

Relevance and assessment procedures

Relevance in XML retrieval
Aim: the smallest component (specificity) that is highly relevant (exhaustivity)
– specificity: the extent to which a document component is focused on the information need, while being an informative unit
– exhaustivity: the extent to which the information contained in a document component satisfies the information need
[Figure: an example article tree, with sections s1–s3 and subsections ss1, ss2, assessed against the query "XML retrieval evaluation"]

Relevance in XML retrieval: INEX
Relevance = (exhaustivity, specificity), one of (0,0), (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)
– exhaustivity: how much the section discusses the query (0, 1, 2 or 3)
– specificity: how focused the section is on the query (0, 1, 2 or 3)
If a subsection is relevant, so must be its enclosing section, …
[Figure: the same example article tree, with (exhaustivity, specificity) assessments per element]

Relevance assessment task
Pooling technique
Completeness
– rules that force assessors to assess related elements
– e.g. if an element is assessed relevant, its parent element and child elements must also be assessed
– …
Consistency
– rules to enforce consistent assessments
– e.g. the parent of a relevant element must also be relevant, although possibly to a different extent
– e.g. exhaustivity increases going up the tree; specificity increases going down (sketched below)
– …
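The rule that exhaustivity must not decrease when moving from an element to its parent can be phrased as a simple check over a tree of assessments. The sketch below is a hypothetical validator with an invented data layout; the official INEX assessment interface enforced a richer rule set automatically.

```python
def exhaustivity_violations(assessments, parent_of):
    """Return (element, parent) pairs where the parent's exhaustivity is
    lower than the child's, violating the 'increases going up' rule.
    assessments: element id -> (exhaustivity, specificity)
    parent_of:   element id -> parent id (None at the article root)"""
    violations = []
    for elem, (e, _s) in assessments.items():
        parent = parent_of.get(elem)
        if parent is not None and parent in assessments:
            parent_e, _ = assessments[parent]
            if parent_e < e:
                violations.append((elem, parent))
    return violations

assessments = {"sec[2]": (2, 2), "sec[2]/ss1": (3, 3)}
parent_of = {"sec[2]": None, "sec[2]/ss1": "sec[2]"}
print(exhaustivity_violations(assessments, parent_of))
# -> [('sec[2]/ss1', 'sec[2]')]
```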

Quality of assessments
– Assessment is a very laborious task, which eventually impacts the quality of the assessments
– An interactive study showed that assessor agreement levels are high only at the extreme ends of the relevance scale (highly relevant vs. not relevant)
– A statistical analysis of the 2004 data showed that comparisons of approaches would lead to the same outcomes using a reduced scale
→ a simplified assessment procedure based on highlighting

Relevance in XML retrieval (2005)
Specificity: now defined on a continuous scale, as the ratio (in characters) of the highlighted text to the element size
Exhaustivity:
– Highly exhaustive (2)
– Partly exhaustive (1)
– Not exhaustive (0)
– Too small (?)
The new assessment procedure led to better quality assessments
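Under the highlighting procedure, specificity becomes a purely mechanical computation: the fraction of an element's characters that fall inside highlighted passages. A minimal sketch, assuming non-overlapping highlighted passages given as character-offset spans (the data layout is an assumption, not the format of the INEX assessment tool):

```python
def specificity(element_span, highlighted_spans):
    """Ratio (in characters) of highlighted text to element size.
    element_span is (start, end); highlighted_spans is a list of
    non-overlapping (start, end) offsets within the same document."""
    e_start, e_end = element_span
    size = e_end - e_start
    if size <= 0:
        return 0.0
    covered = 0
    for h_start, h_end in highlighted_spans:
        overlap = min(e_end, h_end) - max(e_start, h_start)
        if overlap > 0:
            covered += overlap
    return covered / size

print(specificity((100, 200), [(150, 250)]))  # -> 0.5
```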

Latest analysis
Statistical analysis of the INEX 2005 data:
– the 3+1-point exhaustivity scale is not needed in most scenarios to compare XML retrieval approaches
– "too small" may be simulated by some threshold on element length
INEX 2006 used only the specificity dimension to "measure" relevance
– the same highlighting approach is used
Use of a highlighting procedure simplifies everything and is enough to "properly" compare the effectiveness of XML retrieval systems

Metrics

Measuring effectiveness: metrics
A research problem in itself!
Quantizations reflect preference scenarios
Metrics:
– inex_eval: official INEX metric through 2004
– inex_eval_ng: considers overlap and size
– ERR: expected ratio of relevant units
– XCG: XML cumulative gain, official INEX metric in 2005
– t2i: tolerance to irrelevance
– PRUM: precision/recall with user modelling
– HiXEval
– …

Near-misses
XML retrieval allows users to retrieve document components that are more focused, e.g. a section of a book instead of an entire book.
BUT: what if the chapter, or one of the subsections, is returned instead?
Near-misses: (3,3) (3,2) (3,1) (1,3), with (exhaustivity, specificity) as defined in 2004

Retrieve the best XML elements according to content and structure criteria (2004 scale):
– best: the most exhaustive and the most specific = (3,3)
– near misses rewarding specificity: (3,3) + (2,3), (1,3)
– near misses rewarding exhaustivity: (3,3) + (3,2), (3,1)
– near misses in general: (3,3) + (2,3), (1,3), (3,2), (3,1), (1,2), …

Quantization functions – reward near misses (2004 scale)
– Strict: no reward for near misses
– General: some reward for near misses
(Both are sketched below.)
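As a sketch of what these quantization functions look like on the 2004 (exhaustivity, specificity) scale: the strict function credits only fully exhaustive and fully specific elements, as stated on the slide, while the general function gives partial credit to near misses. The partial-credit weights below follow one published INEX formulation and are illustrative only, since the exact values varied between campaigns.

```python
def quant_strict(e, s):
    """Strict quantization: only (exhaustivity, specificity) = (3, 3) counts."""
    return 1.0 if (e, s) == (3, 3) else 0.0

def quant_general(e, s):
    """Generalized quantization: near misses earn partial credit.
    Weights are illustrative (one published formulation), not canonical."""
    if (e, s) == (3, 3):
        return 1.0
    if (e, s) in {(2, 3), (3, 2), (3, 1)}:
        return 0.75
    if (e, s) in {(1, 3), (2, 2), (2, 1)}:
        return 0.5
    if (e, s) in {(1, 2), (1, 1)}:
        return 0.25
    return 0.0
```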

Other INEX tracks
– Interactive ( )
– Relevance feedback ( )
– Natural language query processing ( )
– Heterogeneous collections ( )
– Multimedia ( )
– Document mining ( ), together with the PASCAL network
– Use case studies (2006)
– XML entity ranking ( )
Other tracks under discussion for 2007, including a book search track

Looking forward
Much recent work on evaluation
Larger, more realistic collection – Wikipedia
– more assessed topics!
– a better suite for analysis and reusability
Better understanding of
– information needs and retrieval scenarios
– measuring effectiveness
Introduction of a passage retrieval task in INEX 2007
Questions?