Published by Günter Arnold. Modified over 6 years ago.
1
“INEX 2005: Playground for XML-retrieval” Sergey Chernov
2
Why Do We Need XML Retrieval?*
*Slide is taken from Prabhakar Raghavan
Sergey Chernov, Info Lunch at L3S 22/11/18
3
Why Do We Need XML Retrieval??*
*Slide is taken from Prabhakar Raghavan
4
A Scenario for Desktop Search
Xuan searches for “the articles about multimedia conferences and workshops, which are titled ‘call for papers’ or ‘upcoming events’ and were recommended by Mounia”.
Query: multimedia workshop /title upcoming events /receivedFrom Mounia
[Figure: metadata graph linking the sender, the message and the publication. Labels recovered from the figure: affiliatedTo; fn; uid:123; Queen Mary Uni; Mounia Lalmas; family; receivedFrom; given; Lalmas; accessedFrom; msgid:00465; Mounia; Upcoming Events; storedFrom; publication; title; type; publishedIn; c:\inex1.8\xml\mu\1998\u40c2.xml; IEEE MULTIMEDIA; 1999; issn; X; Multimedia Computing and Networking 1999 (MMCN 99); “… This conference … multimedia systems…”; year; text; 1998]
5
What is INEX?*
*Slide is taken from Norbert Fuhr
6
INEX in the Pictures
Pictured: Paul Ogilvie, Gabriella Kazai, Saadia Malik, Börkur Sigurbjörnsson, Arjen P. de Vries, Ray Larson, Patrick Gallinari, Roelof van Zwol, Birger Larsen, Andrew Trotman, Norbert Fuhr, Mounia Lalmas, Shlomo Geva, Ludovic Denoyer, Benjamin Piwowarski
7
INEX in Numbers
community: 58 research groups participated in 2005
collection: IEEE articles from […], 740 MB
topics (queries): 87 in total, 40 CO+S and 47 CAS topics
tracks: 7 (Adhoc, Relevance Feedback, Natural Language Processing, Heterogeneous, Interactive, Document Mining, Multimedia)
publications over 4 years: >125
important dates: April (start), November (finish)
8
Adhoc Track: Collection and Queries
Collection: IEEE (journals and transactions)
Language used for structural conditions: NEXI
Topics (queries):
  Content-only + Structure (CO+S): the structural part is OPTIONAL
  Content and Structure (CAS): the structural part is MANDATORY
Example content: "call for papers" conference workshop +multimedia
Example structure: //article[about(.//atl,"upcoming events") OR about(.//atl,"call for papers")]//sec[about(., +multimedia conference workshop)]
Target element: //article//sec
Support elements: //article[about(.//atl,"upcoming events")] ; //article[about(.//atl,"call for papers")]
//article//sec[about(., +multimedia conference workshop)]
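The split of the example structure query into a target element and its about() conditions can be sketched in a few lines. This is only an illustration for a simplified subset of NEXI (path steps with optional bracketed filters, no brackets or parentheses inside keyword strings); the function name `split_nexi` is hypothetical and this is not a full NEXI parser.

```python
import re

# One about(path, keywords) clause. Simplification: the keyword part is
# assumed to contain no closing parenthesis.
ABOUT_RE = re.compile(r'about\(\s*([^,]+?)\s*,\s*([^)]*?)\s*\)')


def split_nexi(query):
    """Split a simplified NEXI query into (target_path, clauses).

    Each clause is (context_path, relative_path, keywords), where
    context_path is the path of the step that carries the filter.
    """
    path = ""
    clauses = []
    pos = 0
    while pos < len(query):
        bracket = query.find("[", pos)
        if bracket == -1:
            # No more filters: the rest is plain path.
            path += query[pos:]
            break
        path += query[pos:bracket]
        # Find the ']' that closes this filter.
        depth = 0
        for end in range(bracket, len(query)):
            if query[end] == "[":
                depth += 1
            elif query[end] == "]":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError("unbalanced brackets in query")
        filter_text = query[bracket + 1:end]
        for relative_path, keywords in ABOUT_RE.findall(filter_text):
            clauses.append((path, relative_path, keywords))
        pos = end + 1
    return path, clauses


QUERY = ('//article[about(.//atl,"upcoming events") OR '
         'about(.//atl,"call for papers")]'
         '//sec[about(., +multimedia conference workshop)]')

target, clauses = split_nexi(QUERY)
print("target element:", target)
for context, relative_path, keywords in clauses:
    print(context, relative_path, keywords)
```

The clauses whose context path equals the target path constrain the target element; the remaining ones act as support conditions.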
9
Adhoc Track: Relevance Assessment Methodology
Select the top 1500 components in a topic’s retrieval results.
Assess each component w.r.t. two dimensions:
  Exhaustivity (E): the extent to which the document component discusses the topic.
  Specificity (S): the extent to which the document component focuses on the topic.
Exhaustivity labels: highly exhaustive, partially exhaustive, too small
10
Online Relevance Assessment System X-Rai
11
Adhoc: CO Retrieval Strategies
CO.Focussed: find the most exhaustive and specific element in a path. Retrieved elements must not overlap one another.
CO.Thorough: find all highly exhaustive and specific elements. Overlap is treated as an interface and results-presentation issue.
CO.FetchBrowse: first identify relevant articles, then identify the most exhaustive and specific elements within the fetched articles.
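The no-overlap constraint of CO.Focussed can be illustrated with a greedy filter: walk the results in descending score order and keep an element only if it is neither an ancestor nor a descendant of an element already kept from the same document. This is a minimal sketch under assumptions, not a method prescribed by INEX: elements are identified by a document id plus a path such as `/article[1]/sec[2]`, and the scores are made up.

```python
from typing import List, Tuple

Element = Tuple[str, str, float]  # (document id, element path, score)


def overlaps(path_a: str, path_b: str) -> bool:
    """True if the paths are equal or one is an ancestor of the other."""
    a = path_a.rstrip("/") + "/"
    b = path_b.rstrip("/") + "/"
    return a.startswith(b) or b.startswith(a)


def remove_overlap(ranked: List[Element]) -> List[Element]:
    """Greedily keep the best-scored elements that do not overlap."""
    kept: List[Element] = []
    for doc, path, score in sorted(ranked, key=lambda e: e[2], reverse=True):
        if any(doc == d and overlaps(path, p) for d, p, _ in kept):
            continue  # an ancestor or descendant is already in the result
        kept.append((doc, path, score))
    return kept


results = [
    ("d1", "/article[1]", 0.5),
    ("d1", "/article[1]/sec[2]", 0.9),
    ("d1", "/article[1]/sec[2]/p[1]", 0.7),
    ("d1", "/article[1]/sec[3]", 0.6),
    ("d2", "/article[1]", 0.4),
]
for element in remove_overlap(results):
    print(element)
```

Appending a trailing slash before the prefix test keeps `sec[1]` from being mistaken for an ancestor of `sec[10]`.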
12
Adhoc: CAS Retrieval Strategies
VVCAS: structural constraints in both the target elements and the support elements are interpreted as vague.
SVCAS: target strict, support vague.
VSCAS: target vague, support strict.
SSCAS: target and support both strict.
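The four strategies differ only in two switches, which can be written down as a small lookup table. The sketch below is an illustration under assumptions: the names `CasInterpretation` and `target_allowed` are hypothetical, and "strict" is simplified to an exact tag-name match, while "vague" lets any element through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CasInterpretation:
    target_strict: bool   # must the returned element match the target path?
    support_strict: bool  # must the support conditions match structurally?


# First letter: target element, second letter: support elements.
CAS_STRATEGIES = {
    "VVCAS": CasInterpretation(target_strict=False, support_strict=False),
    "SVCAS": CasInterpretation(target_strict=True, support_strict=False),
    "VSCAS": CasInterpretation(target_strict=False, support_strict=True),
    "SSCAS": CasInterpretation(target_strict=True, support_strict=True),
}


def target_allowed(strategy: str, requested_tag: str, candidate_tag: str) -> bool:
    """Strict target: only the requested tag qualifies. Vague: any tag may."""
    interpretation = CAS_STRATEGIES[strategy]
    return (not interpretation.target_strict) or candidate_tag == requested_tag


# A paragraph is a valid answer to a //article//sec query only when the
# target is interpreted vaguely.
print(target_allowed("SVCAS", "sec", "p"))
print(target_allowed("VSCAS", "sec", "p"))
```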
13
Adhoc: Relevance Values (RV)
14
Adhoc: Metrics
Consider:
  Two dimensions of relevance
  Independence assumption does not hold
  No predefined retrieval unit
  Overlap
Extended Cumulated Gain (xCG) and its normalised version (nxCG)
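The idea behind xCG and nxCG can be sketched as follows: xCG at rank i is the sum of the gain values of the first i returned elements, and nxCG divides it by the cumulated gain of an ideal ranking at the same rank. The code is a simplified illustration, assuming each element has already been mapped to a single gain value and taking the ideal ranking to be the known gains sorted in descending order; it does not model the overlap handling listed above, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import List


def xcg(gains: List[float]) -> List[float]:
    """Cumulated gain at each rank of a result list."""
    return list(accumulate(gains))


def nxcg(run_gains: List[float], ideal_gains: List[float]) -> List[float]:
    """Cumulated gain of the run divided by that of the ideal ranking."""
    n = len(run_gains)
    # Ideal ranking: best gains first, padded or cut to the run length.
    ideal = (sorted(ideal_gains, reverse=True) + [0.0] * n)[:n]
    run_cum = xcg(run_gains)
    ideal_cum = xcg(ideal)
    return [r / i if i > 0 else 0.0 for r, i in zip(run_cum, ideal_cum)]


run = [0.5, 1.0, 0.0]        # gains of the returned elements, in rank order
recall_base = [1.0, 0.5, 1.0]  # gains of all relevant elements
print(xcg(run))
print(nxcg(run, recall_base))
```

A value of 1.0 at some rank means the run has collected as much gain by that rank as the ideal ranking would have.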
15
Adhoc: Competition
[Figure: nxCG curves of runs in the CO.Thorough task with generalized quantization]
16
Other Tracks
Relevance Feedback
  Collection: IEEE
  Goal: investigation of relevance feedback in the context of XML retrieval. The approach should ideally consider not only content but also the structural features of XML documents.
Interactive
  Goal: investigation of the behaviour of users when interacting with components of XML documents, and evaluation of approaches for XML retrieval that are effective in user-based environments.
Heterogeneous
  Collection: Berkeley bib, FIZ Karlsruhe, Duisburg-Essen bib, DBLP, HCI resources, QMUL db, ZDNet
  Goal: creation of a heterogeneous test collection, retrieval experiments with a small number of both CO and CAS queries, qualitative analysis of the results.
17
Other Tracks (continued)
Multimedia
  Collection: Lonely Planet document collection
  Goal: an evaluation platform/forum for structured document retrieval systems that include more than text in the retrieval process.
Document Mining
  Collection: IMDb collection
  Goal: generic tasks of classification and clustering.
Natural Language Processing
  Collection: any
  Goal: design and build software that will analyse, understand, and generate results in response to queries that humans express naturally.
18
A Scenario for Desktop Search
Xuan searches for “the articles about multimedia conferences and workshops, which are titled ‘call for papers’ or ‘upcoming events’ and were recommended by Mounia”.
Query: multimedia workshop /title upcoming events /receivedFrom Mounia
[Figure: metadata graph linking the sender, the message and the publication. Labels recovered from the figure: affiliatedTo; fn; uid:123; Queen Mary Uni; Mounia Lalmas; family; receivedFrom; given; Lalmas; accessedFrom; msgid:00465; Mounia; Upcoming Events; storedFrom; publication; title; type; publishedIn; c:\inex1.8\xml\mu\1998\u40c2.xml; IEEE MULTIMEDIA; 1999; issn; X; Multimedia Computing and Networking 1999 (MMCN 99); “… This conference … multimedia systems…”; year; text; 1998]
19
Desktop Metadata Missing from INEX
StoredFrom: Web links as sources of publications
ReceivedFrom: email activity information, emails containing publications
Annotations: annotations (from sender)
SearchKeyword: search keywords used at a Web search engine to find the document
OpenLast, MovedFrom: user action history with regard to the publications
Annotation: user annotations
20
Challenges for Designing a Dataset for Desktop
Data obtained through logging
  Pros: real data
  Cons: privacy issues, a high level of user cooperation is required, low scalability
Data created through simulations
  Pros: scalable, easy to modify, cheap, fewer privacy restrictions
  Cons: can be based on wrong assumptions
21
Thanks a lot and Merry Christmas!