Search Result Diversification
by M. Drosou and E. Pitoura
Presenter: Bilge Koroglu
June 14, 2011
Introduction
Result diversification
– a solution to the over-specialization problem: retrieval of too homogeneous results
– personalization: complementing user preferences
Problem to be solved (formalized below)
– the full item set X, with |X| = n
– select k divergent items to include in S
– so that the diversity among the items of S is maximized
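A minimal formalization of this selection problem, under the usual assumption that diversity is defined through a pairwise distance d (the concrete choice of div is left open; the content-, novelty-, and coverage-based views on the following slides instantiate it differently):

S^{*} = \operatorname*{arg\,max}_{S \subseteq X,\ |S| = k} \mathrm{div}(S), \qquad \text{e.g. } \mathrm{div}(S) = \min_{s_i, s_j \in S,\ i \neq j} d(s_i, s_j)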
Introduction (cont'd)
Ways of diversification
1. Content: (dis)similarity between items
2. Novelty: items most dissimilar to previously seen ones
3. Coverage: items from different categories
Approaches in diversification algorithms
1. Greedy
2. Interchange
Content-based Diversification
p-dispersion problem [1]
– choose p out of n points s.t. the minimum distance between any pair is maximized (a greedy sketch follows below)
The objective function in web search diversification:
– maximizing the average intra-list dissimilarity (pairwise distance)
Extension of k-nearest neighbors: the KNDN problem, using the Gower coefficient [2]
– results spatially closest to the query, yet divergent enough from each other
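As a concrete illustration of the content-based view, here is a minimal sketch of the standard greedy (farthest-point) heuristic for the max-min dispersion objective; the function and variable names are illustrative and not taken from [1]. Start from one item (e.g. the most relevant) and repeatedly add the candidate whose minimum distance to the already selected set is largest.

def greedy_max_min(items, dist, k):
    """Greedy heuristic for max-min dispersion:
    pick k items whose minimum pairwise distance is (approximately) maximized."""
    if not items or k <= 0:
        return []
    selected = [items[0]]                 # seed, e.g. the most relevant item
    candidates = set(range(1, len(items)))
    while len(selected) < k and candidates:
        # choose the candidate farthest from its nearest selected item
        best = max(candidates,
                   key=lambda i: min(dist(items[i], s) for s in selected))
        selected.append(items[best])
        candidates.remove(best)
    return selected

With a metric distance this greedy procedure is a well-known 2-approximation for the max-min dispersion objective, which is why it is the usual baseline among the p-dispersion heuristics compared in [1].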
Novelty-based Diversification
Novelty vs. diversity
– novelty: avoiding redundancy
– diversity: resolving ambiguity
Information nuggets: the intents or classes behind a query [3]
Another measure, from adaptive filtering, treats an item as redundant iff it is too similar to some previously delivered item [4] (a sketch of this idea follows below)
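A minimal sketch of the threshold-based redundancy idea described above; the similarity function sim and the threshold value are placeholders and not taken from [4]. An incoming item is delivered only if its similarity to everything delivered so far stays below the threshold.

def novelty_filter(stream, sim, threshold=0.8):
    """Keep an item only if it is not redundant w.r.t. previously delivered items,
    i.e. its similarity to every delivered item stays below `threshold`."""
    delivered = []
    for item in stream:
        if all(sim(item, d) < threshold for d in delivered):
            delivered.append(item)   # novel enough: deliver it
    return delivered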
Coverage-based Diversification
Typical example, employing classes (categories) of queries and documents [5]
– maximizes the probability that each relevant category is represented by at least one document in the diversified result list (a greedy sketch follows below)
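A compact sketch in the spirit of the greedy selection used for this coverage objective; the probability tables p_cat (query-to-category) and quality (document-to-category) are illustrative names, not the notation of [5]. At each step the document with the largest marginal probability of satisfying a not-yet-covered category is added.

def coverage_greedy(docs, k, p_cat, quality):
    """Greedy coverage-based selection:
    p_cat[c]      : probability that the query belongs to category c
    quality[d][c] : probability that document d satisfies category c
    """
    residual = dict(p_cat)            # probability mass of each still-uncovered category
    selected, remaining = [], list(docs)
    while len(selected) < k and remaining:
        def marginal_gain(d):
            return sum(residual[c] * quality[d].get(c, 0.0) for c in residual)
        best = max(remaining, key=marginal_gain)
        selected.append(best)
        remaining.remove(best)
        # discount the categories that `best` already covers
        for c in residual:
            residual[c] *= (1.0 - quality[best].get(c, 0.0))
    return selected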
Greedy Heuristics in Diversification
Item-set distance: the distance of a new item to the current result set S
Flow of the recommender algorithm
1. Calculate the item-set distance of each new item to S
2. Sort the new items by relevance to the query and by item-set distance
3. Combine the ranks of these two sorted lists; the item with the minimum combined rank is added to S by removing the last one (one iteration is sketched below)
4. Continue with Step 1 until k new items have been added
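A minimal sketch of one iteration of the flow above, assuming a relevance score per item and a pairwise distance function; the mean-distance definition of item-set distance, the rank-combination rule, and the helper names are illustrative assumptions rather than the exact method of the cited work.

def item_set_distance(item, S, dist):
    """Distance of a candidate item to the current result set S (mean pairwise distance)."""
    return sum(dist(item, s) for s in S) / len(S) if S else 0.0

def greedy_step(candidates, S, relevance, dist):
    """One iteration: rank candidates by relevance and by item-set distance,
    combine the two ranks, and return the candidate with the best combined rank."""
    by_rel = sorted(candidates, key=lambda x: -relevance[x])
    by_div = sorted(candidates, key=lambda x: -item_set_distance(x, S, dist))
    rel_rank = {x: r for r, x in enumerate(by_rel)}
    div_rank = {x: r for r, x in enumerate(by_div)}
    return min(candidates, key=lambda x: rel_rank[x] + div_rank[x])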
Interchange Heuristics in Diversification
Flow of the algorithm [6] (sketched below)
1. S is initialized with the k most relevant items
2. The item that contributes the least to diversity is interchanged with the most relevant item in X \ S
Structured Search Results [7]
– identify the subset of features that differentiates the instances more than the others
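A minimal sketch of the interchange idea, assuming a set-diversity function and a candidate list ordered by decreasing relevance; the improvement check, stopping criterion, and helper names are assumptions, since [6] studies several swap variants.

def interchange(S, rest, set_div, max_swaps=100):
    """Start from the k most relevant items in S and repeatedly swap the item that
    contributes least to diversity with the most relevant unused item,
    as long as the swap improves the set diversity."""
    S, rest = list(S), list(rest)        # rest is ordered by decreasing relevance
    for _ in range(max_swaps):
        if not rest:
            break
        # item whose removal hurts diversity the least
        worst = min(S, key=lambda x: set_div(S) - set_div([y for y in S if y != x]))
        candidate = rest[0]              # most relevant unused item
        trial = [y for y in S if y != worst] + [candidate]
        if set_div(trial) > set_div(S):
            S = trial
            rest.pop(0)
        else:
            break                        # no improving swap: stop
    return S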
Evaluation
Redundancy-aware precision and recall [8]
For the NDCG calculation, the gain is updated so that nuggets already covered by higher-ranked documents contribute less (as in the α-nDCG gain of [3]):
G[k] = \sum_{i} J(d_k, i)\,(1 - \alpha)^{r_{i,k-1}}
where J(d_k, i) = 1 iff document d_k contains nugget i, and r_{i,k-1} is the number of documents ranked above k that contain nugget i (a worked sketch follows below)
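A minimal sketch of this gain computation, assuming the nugget assignments are given as a set of nuggets per ranked document; variable names are illustrative.

def alpha_gain(ranked_nuggets, alpha=0.5):
    """Per-rank gain for alpha-nDCG: each nugget's contribution is discounted by
    (1 - alpha) for every earlier document that already contained it."""
    seen = {}                            # nugget -> how many earlier docs contained it
    gains = []
    for doc_nuggets in ranked_nuggets:   # one set of nuggets per ranked document
        gains.append(sum((1 - alpha) ** seen.get(n, 0) for n in doc_nuggets))
        for n in doc_nuggets:
            seen[n] = seen.get(n, 0) + 1
    return gains

Dividing the discounted cumulative sum of these gains by that of an ideal ordering yields the redundancy-aware α-nDCG score.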
Conclusion
3 factors
– Content-based
– Novelty-based
– Coverage-based
2 approaches
– Greedy heuristics
– Interchange heuristics
An approach may employ more than one factor
Updated evaluation metrics are used to measure diversity
References
[1] E. Erkut, Y. Ulkusal, and O. Yenicerioglu. A comparison of p-dispersion heuristics. Computers and Operations Research, 21(10), 1994.
[2] J. R. Haritsa. The KNDN problem: A quest for unity in diversity. IEEE Data Eng. Bull., 32(4):15–22, 2009.
[3] C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Buttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR, pages 659–666, 2008.
[4] Y. Zhang, J. P. Callan, and T. P. Minka. Novelty and redundancy detection in adaptive filtering. In SIGIR, pages 81–88, 2002.
[5] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, pages 5–14, 2009.
[6] C. Yu, L. V. S. Lakshmanan, and S. Amer-Yahia. It takes variety to make a world: diversification in recommender systems. In EDBT, pages 368–378, 2009.
[7] Z. Liu, P. Sun, and Y. Chen. Structured search result differentiation. PVLDB, 2(1):313–324, 2009.
[8] Y. Zhang, J. P. Callan, and T. P. Minka. Novelty and redundancy detection in adaptive filtering. In SIGIR, pages 81–88, 2002.