An Empirical Study of Learning to Rank for Entity Search
Jing Chen, Chenyan Xiong, Jamie Callan
Language Technologies Institute, Carnegie Mellon University

Abstract
Our work investigates the effectiveness of learning to rank methods for entity search. Entities are represented as multi-field documents, and field-based text similarity features are extracted from them. We explore state-of-the-art learning to rank methods, and our experiments achieve state-of-the-art performance on a DBpedia entity search test collection.

Ad-hoc Entity Retrieval
Ad-hoc entity retrieval uses a query to retrieve one or more entities that satisfy some underlying information need. The pipeline consists of entity representation, entity retrieval, and entity ranking; entity ranking is our focus.

Entity Representation
The RDF triples (Web knowledge representation) of each entity are grouped into five fields, following Zhiltsov et al. [1]:
Name: names of the entity
Cat: categories of the entity
Attr: all attributes of the entity except its names
RelEn: names of the entity's neighbor entities
SimEn: names of the entity's aliases

Learning to Rank Entities
Entities are ranked using state-of-the-art learning to rank (LeToR) algorithms, whose main idea is to apply machine learning techniques to re-rank an initial ranked list of entities. The models used include RankSVM, a pairwise model, and Coordinate Ascent, a gradient-based list-wise model that directly optimizes evaluation metrics such as mean average precision (MAP).
Query-entity features: we adopt traditional IR features, which favor keyword-based queries.
FSDM: fielded sequential dependence model score [1] (1 dimension)
SDM on all fields: sequential dependence model score (5 dimensions)
BM25 on all fields: BM25 model score with default parameters (5 dimensions)
LM on all fields: language model score (5 dimensions)
Coordinate match on all fields: coordinate match score (5 dimensions)
Cosine on all fields: cosine correlation score (5 dimensions)

Experiment Setup
Dataset: DBpedia version 3.7.
Query sets: 4 query sets from the test collection of Balog and Neumayer [2], 485 queries in total, with binary relevance judgments.

Overall Retrieval Performance
Baseline models: SDM-CA and MLM-CA. Prior state-of-the-art: FSDM.
Statistical significance markers: † over SDM-CA, ‡ over MLM-CA, § over FSDM.

Field and Feature Study
Different queries benefit from different fields and features.
Query-level analysis: the x-axis lists all queries, ordered by relative performance; the y-axis is the relative performance of RankSVM compared with FSDM, where positive values indicate improvement and negative values indicate loss.
Field and feature ablation: the x-axis lists the fields or feature groups; the y-axis is the relative difference between RankSVM used with all fields or feature groups and RankSVM without the corresponding field or feature group. Larger values indicate a greater contribution.

Conclusion
Experiments confirm that LeToR methods are as powerful for ranking entities as they are for ranking documents. Our work establishes a new state-of-the-art accuracy on the benchmark dataset used. We are extending this work to investigate better entity representations constructed from DBpedia and more effective learning features for the ad-hoc entity search task.

References
[1] N. Zhiltsov, A. Kotov, and F. Nikolaev. Fielded sequential dependence model for ad-hoc entity retrieval in the web of data. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2015), pages 253–262. ACM, 2015.
[2] K. Balog and R. Neumayer. A test collection for entity search in DBpedia. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2013), pages 737–740. ACM, 2013.
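To make the field-based feature extraction concrete, the following is a minimal Python sketch. It computes two of the per-field features listed above (coordinate match and cosine correlation) for one query-entity pair. The field names follow the table above, but the tokenizer, the example entity, and the scoring functions are simplified illustrations, not the authors' implementation.

```python
# Minimal sketch: per-field coordinate match and cosine features for one
# query-entity pair. The entity, tokenizer, and scoring are illustrative
# assumptions, not the authors' code.
import math
from collections import Counter

FIELDS = ["Name", "Cat", "Attr", "RelEn", "SimEn"]

def tokenize(text):
    return text.lower().split()

def coordinate_match(query_terms, field_terms):
    # Number of distinct query terms that appear in the field.
    field_set = set(field_terms)
    return sum(1 for t in set(query_terms) if t in field_set)

def cosine(query_terms, field_terms):
    # Cosine similarity between raw term-frequency vectors.
    q, f = Counter(query_terms), Counter(field_terms)
    dot = sum(q[t] * f[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in f.values()))
    return dot / norm if norm > 0 else 0.0

def extract_features(query, entity_fields):
    # One coordinate-match score and one cosine score per field
    # (10 dimensions for the five fields).
    q_terms = tokenize(query)
    features = []
    for field in FIELDS:
        f_terms = tokenize(entity_fields.get(field, ""))
        features.append(coordinate_match(q_terms, f_terms))
        features.append(cosine(q_terms, f_terms))
    return features

# Hypothetical entity whose RDF triples have been grouped into five fields.
entity = {
    "Name": "Carnegie Mellon University",
    "Cat": "Private universities Research universities",
    "Attr": "Founded 1900 Located in Pittsburgh Pennsylvania",
    "RelEn": "Pittsburgh Andrew Carnegie",
    "SimEn": "CMU",
}
print(extract_features("carnegie mellon", entity))
```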
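The poster does not include code for the re-ranking step, so the sketch below shows one way a pairwise RankSVM-style ranker could be trained and applied: relevance-ordered pairs within each query are turned into feature-difference examples, a linear SVM is fit on them, and candidate entities are re-ranked by the learned weight vector. The training data layout, scikit-learn's LinearSVC, and the hyperparameters are assumptions for illustration only.

```python
# Sketch of pairwise RankSVM-style re-ranking with scikit-learn.
# Training data, feature dimensionality, and C are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC

def make_pairs(features, labels):
    """Build pairwise difference vectors within one query's candidate list."""
    X_pairs, y_pairs = [], []
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:            # entity i is more relevant than j
                X_pairs.append(features[i] - features[j])
                y_pairs.append(1)
                X_pairs.append(features[j] - features[i])
                y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

# Toy per-query training data: rows are candidate entities, labels are the
# binary relevance judgments used by the test collection.
rng = np.random.default_rng(0)
train_queries = [
    (rng.random((6, 10)), np.array([1, 0, 0, 1, 0, 0])),
    (rng.random((5, 10)), np.array([0, 1, 0, 0, 1])),
]

X_train, y_train = [], []
for feats, labels in train_queries:
    Xp, yp = make_pairs(feats, labels)
    X_train.append(Xp)
    y_train.append(yp)
X_train, y_train = np.vstack(X_train), np.concatenate(y_train)

model = LinearSVC(C=1.0)        # linear pairwise ranker
model.fit(X_train, y_train)

# Re-rank an initial candidate list: score each entity with the learned
# weight vector and sort in descending order of score.
candidates = rng.random((8, 10))
scores = candidates @ model.coef_.ravel()
reranked = np.argsort(-scores)
print(reranked)
```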