“INEX 2005: Playground for XML-retrieval” Sergey Chernov

Presentation transcript:

Why Do We Need XML Retrieval?
(Slide taken from Prabhakar Raghavan.)
Sergey Chernov, Info Lunch at L3S 22/11/18

A Scenario for Desktop Search
Xuan searches for “the articles about multimedia conferences and workshops which are titled ‘call for papers’ or ‘upcoming events’ and were recommended by Mounia”.
Query: multimedia workshop /title upcoming events /receivedFrom Mounia
[Figure: metadata graph for the scenario. Labels recovered from the figure, in extraction order: affiliatedTo; fn; uid:123; Queen Mary Uni; Mounia Lalmas; family; receivedFrom; given; http://inex.is.informatik.uni-duisburg.de/2005/index.html; Lalmas; accessedFrom; msgid:00465; Mounia; Upcoming Events; storedFrom; publication; title; type; publishedIn; c:\inex1.8\xml\mu\1998\u40c2.xml; IEEE MULTIMEDIA 1999; issn; 1070-986X; Multimedia Computing and Networking 1999 (MMCN 99); … This conference … multimedia systems…; year; text; 1998]
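The query on this slide mixes free-text terms with terms bound to a metadata field by a leading slash. A minimal sketch of how such a query could be split into its parts follows. The function name `parse_desktop_query` and the `content` key are hypothetical, and the rule that a `/field` token applies to all terms up to the next `/field` token is an assumption read off the single example above, not a documented syntax.

```python
def parse_desktop_query(query: str) -> dict:
    """Split a query into free-text terms and field-restricted terms.

    Assumption: a token starting with "/" names a metadata field, and every
    following token belongs to that field until the next "/field" token.
    """
    fields = {"content": []}  # "content" holds the unrestricted terms
    current = "content"
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            current = token[1:]            # switch to the named field
            fields.setdefault(current, [])
        else:
            fields[current].append(token)  # attach the term to the active field
    return fields


parsed = parse_desktop_query(
    "multimedia workshop /title upcoming events /receivedFrom Mounia"
)
print(parsed)
```

Under this reading, "multimedia workshop" is matched against the document text, "upcoming events" against the title, and "Mounia" against the receivedFrom metadata.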

What is INEX?
(Slide taken from Norbert Fuhr.)

INEX in the Pictures
People shown: Paul Ogilvie, Gabriella Kazai, Saadia Malik, Börkur Sigurbjörnsson, Arjen P. de Vries, Ray Larson, Patrick Gallinari, Roelof van Zwol, Birger Larsen, Andrew Trotman, Norbert Fuhr, Mounia Lalmas, Shlomo Geva, Ludovic Denoyer, Benjamin Piwowarski

INEX in Numbers
  community: 58 research groups participated in 2005
  collection: 17,000 IEEE articles from 1995-2004, 740 MB
  topics (queries): 87 in total, 40 CO+S and 47 CAS topics
  tracks: 7 (Adhoc, Relevance Feedback, Natural Language Processing, Heterogeneous, Interactive, Document Mining, Multimedia)
  publications over 4 years: >125
  important dates: April (start), November (finish)

Adhoc Track: Collection and Queries
Collection: IEEE collection (journals and transactions)
Language used for structural conditions: NEXI
Topics (queries):
  Content-only + Structure (CO+S): the structural part is OPTIONAL
  Content and Structure (CAS): the structural part is MANDATORY
Example content: "call for papers" conference workshop +multimedia
Example structure: //article[about(.//atl,"upcoming events") OR about(.//atl,"call for papers")]//sec[about(., +multimedia conference workshop)]
Target element: //article//sec
Support elements:
  //article[about(.//atl,"upcoming events")]
  //article[about(.//atl,"call for papers")]
  //article//sec[about(., +multimedia conference workshop)]
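To illustrate how the target element relates to the full structural query, the sketch below strips every bracketed predicate from the example and keeps only the bare path. The function name `target_path` is hypothetical and not part of any INEX tool. It assumes predicates are delimited by square brackets and string literals by double quotes, which holds for the example above but is not a full NEXI parser.

```python
def target_path(nexi_query: str) -> str:
    """Drop all [...] predicates from a NEXI-style path expression."""
    out = []
    depth = 0          # nesting depth of square brackets
    in_quotes = False  # brackets inside "..." literals are not structural
    for ch in nexi_query:
        if ch == '"':
            in_quotes = not in_quotes
            if depth == 0:
                out.append(ch)
        elif ch == "[" and not in_quotes:
            depth += 1
        elif ch == "]" and not in_quotes:
            depth -= 1
        elif depth == 0:
            out.append(ch)  # keep only characters outside predicates
    return "".join(out)


example = (
    '//article[about(.//atl,"upcoming events") OR '
    'about(.//atl,"call for papers")]'
    "//sec[about(., +multimedia conference workshop)]"
)
print(target_path(example))
```

For the example structure this yields the target element given on the slide, //article//sec.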

Adhoc Track: Relevance Assessment Methodology
Select the top 1500 components in a topic’s retrieval results.
Assess each component w.r.t. two dimensions:
  Exhaustivity (E): the extent to which the document component discusses the topic.
  Specificity (S): the extent to which the document component focuses on the topic.
Assessment labels shown on the slide: highly exhaustive, partially exhaustive, too small

Online Relevance Assessment System X-Rai

Adhoc: CO Retrieval Strategies
CO.Focussed: find the most exhaustive and specific element in a path. Retrieved elements must not overlap.
CO.Thorough: find all highly exhaustive and specific elements. Overlap is treated as an interface and results presentation issue.
CO.FetchBrowse: first identify relevant articles, then identify the most exhaustive and specific elements within the fetched articles.
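The no-overlap requirement of CO.Focussed can be sketched as a filter over a scored result list. The greedy rule used here (keep the best-scored element, then skip any ancestor or descendant of an element already kept) is an assumption chosen for illustration, not a strategy prescribed by INEX. The document ids, element paths and scores are made up.

```python
def overlaps(a: str, b: str) -> bool:
    """True if the paths are identical or one is an ancestor of the other."""
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")


def focussed(ranked):
    """Filter (doc_id, element_path, score) tuples down to non-overlapping ones."""
    kept = []
    for doc, path, score in sorted(ranked, key=lambda r: r[2], reverse=True):
        # Elements can only overlap when they come from the same document.
        if all(doc != d or not overlaps(path, p) for d, p, _ in kept):
            kept.append((doc, path, score))
    return kept


# Illustrative scores only.
results = [
    ("u40c2", "/article[1]", 0.40),
    ("u40c2", "/article[1]/bdy[1]/sec[2]", 0.90),
    ("u40c2", "/article[1]/bdy[1]/sec[2]/p[1]", 0.70),
    ("u40c2", "/article[1]/bdy[1]/sec[21]", 0.60),
    ("other", "/article[1]", 0.50),
]
for row in focussed(results):
    print(row)
```

A CO.Thorough run would simply return the full ranked list, since overlap is left to the presentation layer there.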

Adhoc: CAS Retrieval Strategies
VVCAS: structural constraints in both the target elements and the support elements are interpreted as vague.
SVCAS: target strict, support vague.
VSCAS: target vague, support strict.
SSCAS: target and support both strict.
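The four names follow one pattern: the first letter gives the interpretation of the target element constraints and the second that of the support element constraints. A small sketch of that decoding, with the hypothetical helper name `cas_interpretation`:

```python
def cas_interpretation(strategy: str) -> dict:
    """Decode a CAS strategy name such as 'SVCAS' into its two settings."""
    modes = {"V": "vague", "S": "strict"}
    name = strategy.upper()
    valid = (
        len(name) == 5
        and name.endswith("CAS")
        and name[0] in modes
        and name[1] in modes
    )
    if not valid:
        raise ValueError(f"not a CAS strategy name: {strategy!r}")
    # First letter: target elements. Second letter: support elements.
    return {"target": modes[name[0]], "support": modes[name[1]]}


for name in ("VVCAS", "SVCAS", "VSCAS", "SSCAS"):
    print(name, cas_interpretation(name))
```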

Adhoc: Relevance Values (RV)

Adhoc: Metrics
Consider:
  Two dimensions of relevance
  Independence assumption does not hold
  No predefined retrieval unit
  Overlap
Metric: Extended Cumulative Gain (xCG) and its normalised version (nxCG)
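A rough sketch of the cumulative-gain idea behind xCG and nxCG: the gain values of a run are summed rank by rank, and the normalised version divides that sum by the cumulated gain of an ideal ranking at the same rank. This is a simplification under stated assumptions. The per-element gains are taken as given (in INEX they come from quantizing the E and S assessments), the ideal ranking is just the assessed gains sorted in descending order, and the overlap handling of the official metric is left out. The numbers are made up.

```python
from itertools import accumulate


def xcg(gains):
    """Cumulated gain at each rank: xCG[i] = sum of gains up to rank i."""
    return list(accumulate(gains))


def nxcg(run_gains, assessed_gains):
    """Run's cumulated gain divided by the ideal cumulated gain, per rank."""
    ideal = sorted(assessed_gains, reverse=True)[: len(run_gains)]
    ideal += [0.0] * (len(run_gains) - len(ideal))  # pad a short ideal list
    run_cum = xcg(run_gains)
    ideal_cum = xcg(ideal)
    return [r / i if i > 0 else 0.0 for r, i in zip(run_cum, ideal_cum)]


run = [1.0, 0.0, 0.5]        # gains of the elements a system returned, in rank order
assessed = [0.5, 1.0, 0.5]   # gains of all assessed relevant elements
print(xcg(run))
print(nxcg(run, assessed))
```

A value of 1.0 at a rank means the run has collected as much gain as the ideal ranking could have by that rank.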

Adhoc: Competition
[Figure: nxCG curves of runs in the CO.Thorough task with generalized quantization]

Other Tracks
Relevance Feedback
  Collection: IEEE
  Goal: investigate relevance feedback in the context of XML retrieval. The approach should ideally consider not only the content but also the structural features of XML documents.
Interactive
  Goal: investigate the behaviour of users when interacting with components of XML documents, and evaluate approaches for XML retrieval that are effective in user-based environments.
Heterogeneous
  Collection: Berkeley bib, FIZ Karlsruhe, Duisburg-Essen bib, DBLP, HCI resources, QMUL db, ZDNet
  Goal: create a heterogeneous test collection, run retrieval experiments with a small number of both CO and CAS queries, and analyse the results qualitatively.

Other Tracks (continued)
Multimedia
  Collection: Lonely Planet document collection
  Goal: an evaluation platform/forum for structured document retrieval systems that do not only include text in the retrieval process.
Document Mining
  Collection: IMDb collection
  Goal: generic tasks of classification and clustering.
Natural Language Processing
  Collection: any
  Goal: design and build software that will analyse, understand, and generate results in response to queries that humans express naturally.

A Scenario for Desktop Search (repeated)
Xuan searches for “the articles about multimedia conferences and workshops which are titled ‘call for papers’ or ‘upcoming events’ and were recommended by Mounia”.
Query: multimedia workshop /title upcoming events /receivedFrom Mounia
[Figure: the same metadata graph as on the earlier scenario slide]

Desktop Metadata Missing from INEX
StoredFrom: Web links as sources of publications
ReceivedFrom: email activity information, emails containing publications
EmailAnnotations: email annotations (from sender)
SearchKeyword: search keywords that were used at a Web search engine to find the document
OpenLast, MovedFrom: user action history with regard to the publications
Annotation: user annotations
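One way to picture these fields is as an optional record attached to each publication. The sketch below is only an illustration: the class name `DesktopMetadata`, the Python attribute names, the field types and the sample values are all assumptions, and the slides do not define a schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DesktopMetadata:
    """Hypothetical per-document record for the metadata listed above."""
    stored_from: Optional[str] = None       # StoredFrom: source Web link
    received_from: Optional[str] = None     # ReceivedFrom: email sender
    email_annotations: List[str] = field(default_factory=list)
    search_keywords: List[str] = field(default_factory=list)
    open_last: Optional[str] = None         # OpenLast: time of last opening
    moved_from: Optional[str] = None        # MovedFrom: previous location
    annotations: List[str] = field(default_factory=list)

    def present_fields(self) -> List[str]:
        """Names of the fields that actually carry a value."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


# Illustrative values only.
record = DesktopMetadata(
    stored_from="http://inex.is.informatik.uni-duisburg.de/2005/index.html",
    received_from="Mounia",
    search_keywords=["multimedia", "workshop"],
)
print(record.present_fields())
```

A plain INEX article would correspond to a record in which every field is empty.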

Challenges for Designing a Dataset for Desktop
Data obtained through logging
  Pros: real data
  Cons: privacy issues, a high level of user cooperation is required, low scalability
Data created through simulations
  Pros: scalable, easy to modify, cheap, fewer restrictions regarding privacy
  Cons: can be based on wrong assumptions

Thanks a lot and Merry Christmas!