Navigation Aided Retrieval
Shashank Pandit & Christopher Olston
Carnegie Mellon & Yahoo
Search & Navigation Trends
Users often search and then supplement the search by navigating extensively beyond the result page to locate relevant information. Why?
- Query formulation problems
- Open-ended search tasks
- Preference for orienteering
Search & Navigation Trends
User behaviour in IR tasks is often not fully exploited by search engines:
- Content-based – words
- PageRank – in- and out-links for popularity
- Collaborative – clicks on results
Search engines do not examine these navigation patterns (though the authors fail to mention SearchGuide, Coyle et al., which does).
NAR – Navigation-Aided Retrieval
A new retrieval paradigm that incorporates post-query user navigation as an explicit component. A query is seen as a means to identify starting points for further navigation. The starting points are presented to the user in a result list and permit easy navigation to many documents that match the user's query.
Existing Context Navigators
Synthetic-structure navigation-aided retrieval:
- Serves as a contextual backdrop for query results and provides semantically meaningful avenues for exploration
- Does not rely on the presence of good starting points
NAR – Navigation Retrieval with Organic Structure
Organic structure is naturally present in pre-existing web documents. Advantages:
- Human oversight – human-generated categories, etc.
- Familiar user interface – a list of documents (i.e. a result list)
- Single view of the document collection
- Robust implementation – no semantic knowledge required
The Model
D – the set of documents in the corpus; T – the user's search task; S_T – the answer set for task T; Q_T – the set of valid queries for task T.
- Query submodel – a belief distribution over the answer set given a query: how likely is it that document d solves the task? (relevance)
- Navigation submodel – the likelihood that a user starting at a particular document will be able to navigate (under guidance) to a document that solves the task.
Conventional Probabilistic IR Model
- No outward navigation is considered
- The probability of solving the task depends on whether some document in the collection solves it
- The probability that a document solves the task is based on its "relevance" to the query
Navigation-Conscious Model
Considers browsing as part of the search task.
- Query submodel – any probabilistic IR relevance-ranking model
- Navigation submodel – a stochastic model of user navigation, WUFIS (Chi et al.)
WUFIS
W(N, d1, d2) – the probability that a user with information need N will navigate from d1 to d2. "Scent" is provided by anchor text and its surrounding text. The probability that a link is followed depends on how well the user's need matches the scent – the similarity between a weighted vector of need terms and a weighted vector of scent terms.
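WUFIS itself builds TF·IDF-weighted vectors and propagates probabilities through the link graph; the following is only a minimal sketch of the need/scent matching step described on this slide, assuming cosine similarity over sparse term-weight dictionaries. The normalization of similarities into link-follow probabilities is illustrative, not the paper's exact formulation, and all names here are invented.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def link_follow_probs(need, link_scents):
    """Turn raw need/scent similarities into a distribution over a page's
    outgoing links (illustrative normalization, not WUFIS's exact one)."""
    sims = [cosine(need, scent) for scent in link_scents]
    total = sum(sims)
    return [s / total if total else 1.0 / len(sims) for s in sims]
```

A link whose anchor-region terms overlap the need vector more strongly absorbs a larger share of the probability mass.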
Final Model
A document's starting-point score = query submodel × navigation submodel.
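Written out, and assuming P(d' | q) denotes the query submodel's relevance belief and W the navigation submodel from the earlier slides, the combination might be (a reconstruction consistent with these slides, not necessarily the paper's exact notation):

```latex
\mathrm{score}(d) \;=\; \sum_{d' \in D}
  \underbrace{P(d' \mid q)}_{\text{query submodel}} \cdot
  \underbrace{W\bigl(N(d'),\, d,\, d'\bigr)}_{\text{navigation submodel}}
```

That is, a candidate starting point d is credited with the relevance of every document d' it can lead to, discounted by the probability of actually navigating there.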
Volant - Prototype
Volant – Preprocessing
- Content engine: R(d, q) is estimated by the Okapi BM25 scoring function
- Connectivity engine: estimates the probability of a user with need N(d2) navigating from d1 to d2; Dijkstra's algorithm is used to generate the ⟨d1, d2, w⟩ tuples
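Okapi BM25 is a standard bag-of-words scoring function; a minimal self-contained sketch follows. The parameter defaults k1 = 1.2, b = 0.75 are conventional choices, not necessarily Volant's configuration.

```python
from math import log

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.
    doc_tf: term -> frequency in the document; df: term -> document freq."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0:
            continue
        idf = log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)  # smoothed IDF
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```

Term-frequency saturation (the k1 term) and length normalization (the b term) are what distinguish BM25 from raw TF·IDF.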
Volant – Starting Points
Query entered → ranked list of starting points:
1. Retrieve from the content engine all documents d' that are relevant to the query.
2. For each document retrieved in step 1, retrieve from the connectivity engine all documents d for which W(N(d'), d, d') > 0.
3. For each unique d, compute the starting-point score.
4. Sort in decreasing order of starting-point score.
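The four steps above can be sketched as one aggregation loop. The `content_engine` / `connectivity_engine` interfaces are hypothetical (invented here): the first returns {d': relevance}, the second returns, for a target d', the precomputed {d: W(N(d'), d, d')} entries.

```python
def rank_starting_points(query, content_engine, connectivity_engine):
    """Rank candidate starting points: credit each candidate d with the
    relevance of every relevant target d', weighted by the probability of
    navigating from d to d'."""
    relevant = content_engine(query)             # step 1: {d': R(d', q)}
    scores = {}
    for d_prime, rel in relevant.items():        # step 2: docs that reach d'
        for d, w in connectivity_engine(d_prime).items():
            scores[d] = scores.get(d, 0.0) + rel * w   # step 3: accumulate
    return sorted(scores.items(), key=lambda kv: -kv[1])  # step 4: sort
```

A document that is itself relevant still ranks well, since W(N(d), d, d) can be taken as 1 for a zero-length navigation.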
Volant – Navigation Guidance
When a user is navigating, Volant intercepts each document and highlights links that lead to documents relevant to the query q:
1. Retrieve from the content engine all documents d' that are relevant to q.
2. For each d' retrieved, get from the connectivity engine the documents d that can lead to d', i.e. W(N(d'), d, d') > 0.
3. For each tuple retrieved in step 2, highlight the links on the current page that point to d.
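Stripped of the engine plumbing, the guidance step reduces to a set intersection: highlight the outgoing links whose targets can lead onward to some relevant document. A sketch with a hypothetical `leads_to` accessor over the connectivity engine's tuples:

```python
def links_to_highlight(page_links, relevant_docs, leads_to):
    """page_links: link targets on the current page.
    relevant_docs: documents d' relevant to the query.
    leads_to(d'): documents d with W(N(d'), d, d') > 0, i.e. pages from
    which a guided user can reach d'.
    Returns the link targets worth highlighting, in page order."""
    useful = set()
    for d_prime in relevant_docs:
        useful.update(leads_to(d_prime))
    return [d for d in page_links if d in useful]
```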
Evaluation Hypotheses
1. In query-only scenarios, Volant does not perform significantly worse than conventional approaches.
2. In combined query/navigation scenarios, Volant selects high-quality starting points.
3. In a significant fraction of query/navigation scenarios, the best organic starting point is of higher quality than one that can be synthesized using existing techniques.
Search Task Test Sets
Navigation-prone scenarios are difficult to predict, so the Simplified Clarity Score was used to split queries into ambiguous and unambiguous sets:
- Unambiguous – the 20 search tasks with the highest clarity from TREC 2000
- Ambiguous – 48 randomly selected tasks from TREC 2003
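A common formulation of the Simplified Clarity Score (after He & Ounis) is the KL divergence of the query's maximum-likelihood term distribution from the collection language model; higher scores indicate more specific, less ambiguous queries. A sketch under that assumption:

```python
from math import log2

def simplified_clarity_score(query_terms, coll_prob):
    """Simplified Clarity Score: sum over query terms of
    P(w|q) * log2(P(w|q) / P(w|collection)), with P(w|q) estimated
    as term frequency in the query over query length.
    coll_prob: term -> P(term | collection)."""
    n = len(query_terms)
    p_q = {}
    for t in query_terms:
        p_q[t] = p_q.get(t, 0) + 1 / n
    return sum(p * log2(p / coll_prob[t]) for t, p in p_q.items())
```

Queries made of collection-frequent terms ("stopword-like" queries) score low, matching the ambiguous/unambiguous split described above.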
Performance on Unambiguous Queries
Mean average precision showed no significant difference. Why? Relevant documents tended not to be siblings or close cousins of each other, so Volant deemed that the best starting points were the relevant documents themselves.
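For reference, mean average precision (the metric used here) is computed in the standard way: average the precision at each rank where a relevant document appears, then average across queries.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list: mean of precision@k over
    the ranks k at which a relevant document appears."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```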
Performance on Ambiguous Queries
User study – 48 judges rated the suitability of documents as starting points. 30 starting points were generated:
- 10 from the TREC 2003 winner (CSIRO)
- 10 from Volant with user guidance
- 10 from Volant without user guidance (the same documents as the guided Volant set)
Performance on Ambiguous Queries
Rating criteria:
- Breadth – the spectrum of people and interests served
- Accessibility – how easy it is to navigate and find information
- Appeal – presentation of the material
- Usefulness – whether people would be able to complete their task from this point
Each judge spent 5 hours on the task.
Results
Summary & Future Work
- Effectiveness – responds to users by positioning them at a suitable starting point for their task and guiding them to further information in a query-driven fashion.
- Relationship to conventional IR – generalizes the conventional probabilistic IR model and succeeds in scenarios where pure IR techniques fail, e.g. ambiguous queries.
Discussion
- Cold-start problem
- Scalability
- Bias in the evaluation