1
Ranking using Multiple Document Types in Desktop Search
Jinyoung Kim, W. Bruce Croft (SIGIR '10). Speaker: Hsin-Lan Wang. Date: 2011/05/02
2
Outline
Introduction
Retrieval Model: Type-specific Retrieval, Type Prediction, Result Merging
Experiment
Conclusion
3
Introduction
People have many types of documents on their desktop, with a different set of metadata fields for each type: email → sender, receiver; office document → filename, author. A desktop search system should therefore be able to predict which type of document a user is looking for given a query.
4
Introduction Main goal: to show how the retrieval effectiveness of a desktop search system can be enhanced by improving type prediction performance.
5
Retrieval Model
A query Q = (q_1, …, q_m).
Each collection C contains documents of n field types (F_1, …, F_n); each document d may include fields (f_1, …, f_n).
6
Type-specific Retrieval
Goal: to rank documents within each sub-collection.
Probabilistic Retrieval Model for Semi-structured Data (PRM-S):
P_QL(q_i | f_j) = (1 - λ) P(q_i | f_j) + λ P(q_i | F_j)
i.e., the query-term likelihood in a document field, smoothed with collection-level statistics for the same field type.
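A minimal sketch of the smoothing formula above in Python; the function name and toy data are illustrative, and lam plays the role of λ:

```python
from collections import Counter

def field_query_likelihood(term, field_tokens, collection_field_tokens, lam=0.5):
    """Smoothed field-level query likelihood:
    P_QL(q_i|f_j) = (1 - lam) * P(q_i|f_j) + lam * P(q_i|F_j)."""
    p_doc = Counter(field_tokens)[term] / max(len(field_tokens), 1)
    p_coll = Counter(collection_field_tokens)[term] / max(len(collection_field_tokens), 1)
    return (1 - lam) * p_doc + lam * p_coll

# Toy example: the "subject" field of one email, smoothed against
# the subject fields of the whole email collection.
subject = "meeting notes for the sigir paper".split()
all_subjects = "meeting agenda sigir deadline paper review notes lunch".split()
print(field_query_likelihood("sigir", subject, all_subjects))
```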
7
Type Prediction
Goal: to score each collection given a user query.
Methods for Type Prediction: Query-likelihood of Collection, Query-likelihood of Query Log.
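A sketch of the first method, query-likelihood of a collection: treat each sub-collection (email, office documents, web pages, ...) as one language model and score it against the query. The smoothing choices here are assumptions, not the paper's exact estimator; query-likelihood of the query log would presumably be analogous, with the model built from logged queries instead of document text.

```python
import math
from collections import Counter

def collection_query_likelihood(query_terms, coll_tokens, bg_tokens, lam=0.8):
    """Score a sub-collection by the log-likelihood of its language model
    generating the query, smoothed with a background model built from the
    whole desktop."""
    coll = Counter(coll_tokens)
    bg = Counter(bg_tokens)
    score = 0.0
    for q in query_terms:
        p_coll = coll[q] / max(len(coll_tokens), 1)
        p_bg = (bg[q] + 1) / (len(bg_tokens) + len(bg) + 1)  # add-one smoothing
        score += math.log(lam * p_coll + (1 - lam) * p_bg)
    return score

# Toy example: score the email sub-collection against a two-term query.
emails = "meeting sigir paper deadline sender croft".split()
desktop = emails + "photo vacation report budget".split()
print(collection_query_likelihood(["sigir", "paper"], emails, desktop))
```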
8
Type Prediction
Methods for Type Prediction (continued): Geometric Average, ReDDE, Query Clarity.
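Geometric Average and ReDDE come from the resource-selection literature in federated search. Query clarity is usually computed as the KL divergence between a query language model and the collection model; a sketch under that reading (estimation of the query model from top-ranked documents is omitted here):

```python
import math
from collections import Counter

def query_clarity(query_model, coll_tokens):
    """Clarity score: KL divergence between a query language model P(w|Q)
    and the collection model P(w|C). A higher score means the query is
    more focused on this collection's vocabulary."""
    coll = Counter(coll_tokens)
    n, v = len(coll_tokens), len(coll) + 1
    clarity = 0.0
    for w, p_q in query_model.items():
        p_c = (coll[w] + 1) / (n + v)  # add-one smoothing keeps KL finite
        clarity += p_q * math.log(p_q / p_c)
    return clarity

# query_model would normally be estimated from top-ranked documents;
# a uniform model over query terms works as a simple baseline.
print(query_clarity({"sigir": 0.5, "paper": 0.5}, "sigir paper review agenda".split()))
```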
9
Type Prediction
Methods for Type Prediction (continued): Dictionary-based Matching, Using Document Metadata Fields.
Dictionary-based Matching: build a dictionary for each sub-collection from the names of the collection and its metadata fields.
Using Document Metadata Fields: a new method, field-based collection query likelihood (FQL).
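A rough sketch of the FQL idea as stated here: score a collection by how well its per-field language models explain the query. Letting each query term match its best field is an assumption for illustration; the paper's exact combination may differ.

```python
from collections import Counter

def fql_score(query_terms, field_models):
    """Field-based collection query likelihood (illustrative): score a
    sub-collection via its metadata-field language models, letting each
    query term match its best field (an assumption, see lead-in)."""
    score = 1.0
    for q in query_terms:
        best = max(tf[q] / max(sum(tf.values()), 1) for tf in field_models.values())
        score *= best + 1e-9  # small floor so one unseen term does not zero the score
    return score

# Toy email collection with two metadata fields.
email_fields = {
    "sender": Counter("croft kim alice".split()),
    "subject": Counter("sigir draft meeting notes".split()),
}
print(fql_score(["croft", "sigir"], email_fields))
```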
10
Type Prediction
Combining Type Prediction Methods:
Grid search of parameter values; Golden Section Search
Multi-class classification: MultiSVM (LIBLINEAR toolkit)
Rank-learning method: RankSVM
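A sketch of the simplest combination strategy, grid search over interpolation weights for the individual predictors (golden section search refines one weight at a time; MultiSVM and RankSVM instead learn the combination from training data). Names and the data layout are illustrative:

```python
import itertools

def combine(weights, method_scores):
    """Linear combination of per-method type scores.
    method_scores: list of {doc_type: score} dicts, one per predictor."""
    types = method_scores[0]
    return {t: sum(w * m.get(t, 0.0) for w, m in zip(weights, method_scores))
            for t in types}

def grid_search(training_data, n_methods, step=0.25):
    """training_data: list of (method_scores, true_type) pairs.
    Returns the weight vector with the best type-prediction accuracy."""
    values = [i * step for i in range(int(1 / step) + 1)]
    best_w, best_acc = None, -1.0
    for w in itertools.product(values, repeat=n_methods):
        correct = 0
        for method_scores, true_type in training_data:
            scores = combine(w, method_scores)
            if max(scores, key=scores.get) == true_type:
                correct += 1
        acc = correct / max(len(training_data), 1)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w
```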
11
Result Merging
C: collection score (from type prediction); D: document score (from type-specific retrieval). The final ranking combines C and D for every document in every sub-collection.
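A sketch of the merging step, assuming a simple weighted product of the two scores; the exponent k is an illustrative knob, and the paper compares specific combination methods rather than this exact form:

```python
def merge(collection_scores, type_rankings, k=1.0):
    """Merge type-specific rankings into one list: each document's final
    score combines its collection score C and document score D as C**k * D."""
    merged = []
    for doc_type, ranking in type_rankings.items():
        c = collection_scores[doc_type]
        merged.extend((doc, (c ** k) * d) for doc, d in ranking)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

# Toy merge: two sub-collections with pre-scored documents.
print(merge({"email": 0.7, "pdf": 0.3},
            {"email": [("e1", 0.9), ("e2", 0.4)], "pdf": [("p1", 0.8)]}))
```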
12
Experiment: Pseudo-desktop Collection
Generation method: collect documents with characteristics similar to a real desktop, then generate known-item queries by statistically sampling terms from each target document (see the sketch below).
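A sketch of the query-generation step: sample query terms from a target document in proportion to term frequency. The paper's sampling model (e.g., field choices and noise handling) is more elaborate than this.

```python
import random
from collections import Counter

def generate_query(doc_tokens, length=3, seed=0):
    """Generate a known-item query by sampling terms from the target
    document, weighted by term frequency."""
    tf = Counter(doc_tokens)
    terms, weights = zip(*tf.items())
    return random.Random(seed).choices(terms, weights=weights, k=length)

doc = "ranking multiple document types desktop search ranking types".split()
print(generate_query(doc))
```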
13
Experiment: Pseudo-desktop Collection
Prediction accuracy and retrieval performance, the latter measured by Mean Reciprocal Rank (MRR).
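For reference, Mean Reciprocal Rank over known-item queries, where each query has exactly one target document:

```python
def mean_reciprocal_rank(result_lists, targets):
    """MRR: average of 1/rank of the target document in each result list
    (contributes 0 when the target is not retrieved)."""
    total = 0.0
    for ranked, target in zip(result_lists, targets):
        for rank, doc in enumerate(ranked, start=1):
            if doc == target:
                total += 1.0 / rank
                break
    return total / max(len(result_lists), 1)

# Targets found at ranks 1 and 3 -> MRR = (1 + 1/3) / 2 ≈ 0.667
print(mean_reciprocal_rank([["a", "b"], ["x", "y", "a"]], ["a", "a"]))
```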
14
Experiment
Best: use the retrieval method with the best aggregate performance for each sub-collection.
Uniform: each collection has the same chance of containing the relevant document.
Oracle: perfect knowledge of the collection that contains the relevant document.
15
Experiment: CS Collection
Generation method: queries collected via the DocTrack game (a human computation game); prediction accuracy results.
16
Experiment: CS Collection
Retrieval performance; leave-one-out prediction accuracy.
17
Conclusion
Suggest a retrieval model for desktop search where type-specific retrieval results are merged into the final ranked list based on type prediction scores. Introduce FQL, a new type prediction method.
18
Conclusion
Develop a human computation game (DocTrack) for collecting queries in a more realistic setting. Show that combining type prediction methods can improve type prediction performance.