1
Navigation-Aided Retrieval
Shashank Pandit and Christopher Olston
Presentation by Yang Yu
CSE 450 Web Data Mining
2
Outline
Introduction; Related Work; System Model; Prototype System; Evaluation; Summary & Future Work
3
Introduction
Background (reasons for this work): difficulty in formulating appropriate queries; open-ended search tasks; a preference for orienteering.
Navigation-Aided Retrieval (NAR)
4
Introduction
Organic versus synthetic structure:
Synthetic: structure is synthesized automatically into the query results.
Organic: structure that naturally exists in the documents is used.
Advantages of organic NAR: human oversight; a familiar user interface; a single view of the document collection; robust implementation by a third party.
Contributions: a formal model of navigation-aided retrieval; an overview of techniques for a NAR-based retrieval system; an empirical evaluation via a user study.
5
Related Work
Selecting starting points:
Best Trails system: uses an ad hoc scoring function for starting points; restricts starting points to documents that themselves match the query; does not take navigability factors into account; its user interface departs substantially from the traditional interface.
Topic distillation, which mainly uses HITS: only effective for broad topic areas for which there are many hubs and authorities.
Guiding navigation:
WebWatcher highlights hyperlinks along paths taken by previous users who had posed similar queries.
6
System Model
Generic model: query submodel; navigation submodel; generic scoring function.
Assumption: every member of the relevance set St is a singleton set, so St can be "flattened" into {d1, d2, …, dn}.
7
System Model
Instantiations of the generic model: the conventional probabilistic IR model; the navigation-conscious model.
In the navigation-conscious model, the two terms embody the two key factors: the number of documents reachable from d that are relevant to the search task, and the ease and accuracy with which the user is able to navigate to those documents.
8
Prototype System
Preprocessing
Content engine
Connectivity engine: <d1, d2, dW, W(N(d2), d1, d2)>
Intermediary
9
Prototype System
10
Prototype System
Selecting starting points:
1. Retrieve from the content engine all documents d’ relevant to q.
2. For each relevant document d’ retrieved in Step 1, retrieve from the connectivity engine all documents d from which the user can navigate to d’.
3. For each unique document d found in Step 2, compute the starting point score.
4. Sort the documents in decreasing order of this score and truncate after the top k documents.
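The four starting-point steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Volant implementation: the two engines are stood in for by hypothetical in-memory dictionaries, and the starting point score is assumed to be the sum, over reachable relevant documents d', of relevance times a navigability weight, since the slide's actual formula did not survive in this transcript.

```python
from collections import defaultdict

# Toy stand-ins (hypothetical data, not from the presentation).
# Content engine: query -> {document d': relevance R(d', q)}.
CONTENT_ENGINE = {
    "q": {"p1": 0.9, "p2": 0.6},
}
# Connectivity engine: document d' -> [(document d that can reach d', weight)].
# Assumption: every document reaches itself with weight 1.0.
CONNECTIVITY_ENGINE = {
    "p1": [("p1", 1.0), ("hub", 0.8)],
    "p2": [("p2", 1.0), ("hub", 0.8)],
}


def select_starting_points(query, k):
    """Return the top-k (document, score) pairs for the query."""
    # Step 1: all documents d' relevant to the query.
    relevant = CONTENT_ENGINE.get(query, {})

    scores = defaultdict(float)
    for d_prime, relevance in relevant.items():
        # Step 2: all documents d from which d' can be reached.
        for d, weight in CONNECTIVITY_ENGINE.get(d_prime, []):
            # Step 3: accumulate the (assumed) starting point score for d.
            scores[d] += relevance * weight

    # Step 4: sort by decreasing score (ties broken by name), keep the top k.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

With this toy data, the document "hub" is not itself returned by the content engine, yet it ranks first because it leads to both relevant pages.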
11
Prototype System: Adding Navigation Guidance, Efficiency and Scalability
Adding navigation guidance:
1. Retrieve from the content engine all documents d’ for which R(d’, q) >= T.
2. For each document d’ retrieved in Step 1, retrieve from the connectivity engine the tuple corresponding to <d, d’>, if it exists.
3. For each <d1, d2, dW, W(N(d2), d1, d2)> tuple retrieved in Step 2, highlight links on d that point to dW.
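The three navigation-guidance steps can be sketched in Python as follows. This is a hedged illustration rather than the prototype's code: the `ConnectivityTuple` type, the function name, and the toy data are hypothetical, and dW is assumed to be the document that a link on d should point to in order to move the user toward d'.

```python
from typing import NamedTuple


class ConnectivityTuple(NamedTuple):
    """One <d1, d2, dW, W(N(d2), d1, d2)> record (field names are assumptions)."""
    d1: str        # document the user is viewing
    d2: str        # relevant destination document
    d_w: str       # assumed: document whose links on d1 get highlighted
    weight: float  # W(N(d2), d1, d2)


def links_to_highlight(current_doc, relevance, tuples, threshold):
    """Return the sorted documents whose links on current_doc are highlighted."""
    # Step 1: documents d' with R(d', q) >= T.
    relevant = [d for d, r in relevance.items() if r >= threshold]

    # Index the connectivity tuples by the pair <d, d'>.
    index = {(t.d1, t.d2): t for t in tuples}

    highlighted = set()
    for d_prime in relevant:
        # Step 2: fetch the tuple for <d, d'>, if it exists.
        record = index.get((current_doc, d_prime))
        if record is not None:
            # Step 3: links on d that point to dW are highlighted.
            highlighted.add(record.d_w)
    return sorted(highlighted)


# Hypothetical example data.
TUPLES = [
    ConnectivityTuple("hub", "p1", "mid", 0.8),
    ConnectivityTuple("hub", "p2", "p2", 0.9),
    ConnectivityTuple("hub", "p3", "other", 0.5),
]
RELEVANCE = {"p1": 0.9, "p2": 0.6, "p3": 0.1}
```

Calling `links_to_highlight("hub", RELEVANCE, TUPLES, 0.5)` skips "p3" because its relevance falls below the threshold, so only links toward "mid" and "p2" would be highlighted.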
12
Evaluation
Experimental hypotheses:
1. In query-only scenarios, Volant does not perform significantly worse.
2. In combined query/navigation scenarios, Volant performs better.
3. The best organic starting point is of higher quality than one that can be synthesized using existing techniques.
Search task test sets: unambiguous; ambiguous.
Performance on unambiguous queries
13
Evaluation
Performance on ambiguous queries
Four criteria: breadth; accessibility; appeal; usefulness.
14
Summary and Future Work
Summary: effectiveness; relationship to conventional IR; relationship to synthetic approaches.
Future work: add redundancy to corpora; tune the scoring function to be applicable to synthetic starting points; develop a unified method that supports both exploration and directly returning documents.
15
Thank you! Questions or Comments?