1
Ranking using Multiple Document Types in Desktop Search
Jinyoung Kim, W. Bruce Croft (SIGIR '10). Speaker: Hsin-Lan Wang. Date: 2011/05/02
2
Outline
Introduction
Retrieval Model: Type-specific Retrieval, Type Prediction, Result Merging
Experiment
Conclusion
3
Introduction
People have many types of documents on their desktop, with a different set of metadata fields for each type: email → sender, receiver; office document → filename, author. A desktop search system should therefore be able to predict which type of document a user is looking for given a query.
4
Introduction Main goal: to show how the retrieval effectiveness of a desktop search system can be enhanced by improving type prediction performance.
5
Retrieval Model
A query Q = (q_1, …, q_m).
Each collection C contains documents of n field types (F_1, …, F_n); each document d may include fields (f_1, …, f_n).
6
Type-specific Retrieval
Goal: to rank documents within each sub-collection.
Probabilistic Retrieval Model for Semi-structured Data (PRM-S):
P_QL(q_i | f_j) = (1 - λ) P(q_i | f_j) + λ P(q_i | F_j)
i.e., the query-term likelihood in a document field, smoothed with collection-level statistics for the same field type.
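A minimal sketch of the smoothing formula above in Python; the function name and toy data are illustrative, and lam plays the role of λ:

```python
from collections import Counter

def field_query_likelihood(term, field_tokens, collection_field_tokens, lam=0.5):
    """Smoothed field-level query likelihood:
    P_QL(q_i|f_j) = (1 - lam) * P(q_i|f_j) + lam * P(q_i|F_j)."""
    p_doc = Counter(field_tokens)[term] / max(len(field_tokens), 1)
    p_coll = Counter(collection_field_tokens)[term] / max(len(collection_field_tokens), 1)
    return (1 - lam) * p_doc + lam * p_coll

# Toy example: the "subject" field of one email, smoothed against
# the subject fields of the whole email collection.
subject = "meeting notes for the sigir paper".split()
all_subjects = "meeting agenda sigir deadline paper review notes lunch".split()
print(field_query_likelihood("sigir", subject, all_subjects))
```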
7
Type Prediction
Goal: to score each collection given a user query.
Methods for Type Prediction: Query-likelihood of Collection, Query-likelihood of Query Log.
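A sketch of the first method, query-likelihood of a collection: treat each sub-collection (email, office documents, web pages, ...) as one language model and score it against the query. The smoothing choices here are assumptions, not the paper's exact estimator; query-likelihood of the query log would presumably be analogous, with the model built from logged queries instead of document text.

```python
import math
from collections import Counter

def collection_query_likelihood(query_terms, coll_tokens, bg_tokens, lam=0.8):
    """Score a sub-collection by the log-likelihood of its language model
    generating the query, smoothed with a background model built from the
    whole desktop."""
    coll = Counter(coll_tokens)
    bg = Counter(bg_tokens)
    score = 0.0
    for q in query_terms:
        p_coll = coll[q] / max(len(coll_tokens), 1)
        p_bg = (bg[q] + 1) / (len(bg_tokens) + len(bg) + 1)  # add-one smoothing
        score += math.log(lam * p_coll + (1 - lam) * p_bg)
    return score

# Toy example: score the email sub-collection against a two-term query.
emails = "meeting sigir paper deadline sender croft".split()
desktop = emails + "photo vacation report budget".split()
print(collection_query_likelihood(["sigir", "paper"], emails, desktop))
```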
8
Type Prediction
Methods for Type Prediction (continued): Geometric Average, ReDDE, Query Clarity.
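Geometric Average and ReDDE come from the resource-selection literature in federated search. Query clarity is usually computed as the KL divergence between a query language model and the collection model; a sketch under that reading (estimation of the query model from top-ranked documents is omitted here):

```python
import math
from collections import Counter

def query_clarity(query_model, coll_tokens):
    """Clarity score: KL divergence between a query language model P(w|Q)
    and the collection model P(w|C). A higher score means the query is
    more focused on this collection's vocabulary."""
    coll = Counter(coll_tokens)
    n, v = len(coll_tokens), len(coll) + 1
    clarity = 0.0
    for w, p_q in query_model.items():
        p_c = (coll[w] + 1) / (n + v)  # add-one smoothing keeps KL finite
        clarity += p_q * math.log(p_q / p_c)
    return clarity

# query_model would normally be estimated from top-ranked documents;
# a uniform model over query terms works as a simple baseline.
print(query_clarity({"sigir": 0.5, "paper": 0.5}, "sigir paper review agenda".split()))
```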
9
Type Prediction
Methods for Type Prediction (continued): Dictionary-based Matching, Using Document Metadata Fields.
Dictionary-based Matching: build a dictionary for each sub-collection from the names of the collection and its metadata fields.
Using Document Metadata Fields: a new method, field-based collection query likelihood (FQL).
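A rough sketch of the FQL idea as stated here: score a collection by how well its per-field language models explain the query. Letting each query term match its best field is an assumption for illustration; the paper's exact combination may differ.

```python
from collections import Counter

def fql_score(query_terms, field_models):
    """Field-based collection query likelihood (illustrative): score a
    sub-collection via its metadata-field language models, letting each
    query term match its best field (an assumption, see lead-in)."""
    score = 1.0
    for q in query_terms:
        best = max(tf[q] / max(sum(tf.values()), 1) for tf in field_models.values())
        score *= best + 1e-9  # small floor so one unseen term does not zero the score
    return score

# Toy email collection with two metadata fields.
email_fields = {
    "sender": Counter("croft kim alice".split()),
    "subject": Counter("sigir draft meeting notes".split()),
}
print(fql_score(["croft", "sigir"], email_fields))
```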
10
Type Prediction
Combining Type Prediction Methods:
Grid search of parameter values; Golden Section Search
Multi-class classification: MultiSVM (LIBLINEAR toolkit)
Rank-learning method: RankSVM
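A sketch of the simplest combination strategy, grid search over interpolation weights for the individual predictors (golden section search refines one weight at a time; MultiSVM and RankSVM instead learn the combination from training data). Names and the data layout are illustrative:

```python
import itertools

def combine(weights, method_scores):
    """Linear combination of per-method type scores.
    method_scores: list of {doc_type: score} dicts, one per predictor."""
    types = method_scores[0]
    return {t: sum(w * m.get(t, 0.0) for w, m in zip(weights, method_scores))
            for t in types}

def grid_search(training_data, n_methods, step=0.25):
    """training_data: list of (method_scores, true_type) pairs.
    Returns the weight vector with the best type-prediction accuracy."""
    values = [i * step for i in range(int(1 / step) + 1)]
    best_w, best_acc = None, -1.0
    for w in itertools.product(values, repeat=n_methods):
        correct = 0
        for method_scores, true_type in training_data:
            scores = combine(w, method_scores)
            if max(scores, key=scores.get) == true_type:
                correct += 1
        acc = correct / max(len(training_data), 1)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w
```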
11
Result Merging
C: collection score (from type prediction); D: document score (from type-specific retrieval). The final ranking combines C and D for every document in every sub-collection.
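A sketch of the merging step, assuming a simple weighted product of the two scores; the exponent k is an illustrative knob, and the paper compares specific combination methods rather than this exact form:

```python
def merge(collection_scores, type_rankings, k=1.0):
    """Merge type-specific rankings into one list: each document's final
    score combines its collection score C and document score D as C**k * D."""
    merged = []
    for doc_type, ranking in type_rankings.items():
        c = collection_scores[doc_type]
        merged.extend((doc, (c ** k) * d) for doc, d in ranking)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

# Toy merge: two sub-collections with pre-scored documents.
print(merge({"email": 0.7, "pdf": 0.3},
            {"email": [("e1", 0.9), ("e2", 0.4)], "pdf": [("p1", 0.8)]}))
```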
12
Experiment: Pseudo-desktop Collection
Generation method: collect documents with characteristics similar to a real desktop, then generate known-item queries by statistically sampling terms from each target document (see the sketch below).
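A sketch of the query-generation step: sample query terms from a target document in proportion to term frequency. The paper's sampling model (e.g., field choices and noise handling) is more elaborate than this.

```python
import random
from collections import Counter

def generate_query(doc_tokens, length=3, seed=0):
    """Generate a known-item query by sampling terms from the target
    document, weighted by term frequency."""
    tf = Counter(doc_tokens)
    terms, weights = zip(*tf.items())
    return random.Random(seed).choices(terms, weights=weights, k=length)

doc = "ranking multiple document types desktop search ranking types".split()
print(generate_query(doc))
```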
13
Experiment: Pseudo-desktop Collection
Prediction accuracy and retrieval performance, the latter measured by Mean Reciprocal Rank (MRR).
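For reference, Mean Reciprocal Rank over known-item queries, where each query has exactly one target document:

```python
def mean_reciprocal_rank(result_lists, targets):
    """MRR: average of 1/rank of the target document in each result list
    (contributes 0 when the target is not retrieved)."""
    total = 0.0
    for ranked, target in zip(result_lists, targets):
        for rank, doc in enumerate(ranked, start=1):
            if doc == target:
                total += 1.0 / rank
                break
    return total / max(len(result_lists), 1)

# Targets found at ranks 1 and 3 -> MRR = (1 + 1/3) / 2 ≈ 0.667
print(mean_reciprocal_rank([["a", "b"], ["x", "y", "a"]], ["a", "a"]))
```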
14
Experiment
Best: use the retrieval method with the best aggregate performance for each sub-collection.
Uniform: each collection has the same chance of containing the relevant document.
Oracle: perfect knowledge of the collection that contains the relevant document.
15
Experiment: CS Collection
Generation method: queries collected via the DocTrack game (a human computation game); prediction accuracy results.
16
Experiment: CS Collection
Retrieval performance; leave-one-out prediction accuracy.
17
Conclusion
Suggest a retrieval model for desktop search where type-specific retrieval results are merged into the final ranked list based on type prediction scores. Introduce FQL, a new type prediction method.
18
Conclusion
Develop a human computation game (DocTrack) for collecting queries in a more realistic setting. Show that combining type prediction methods can improve type prediction performance.