ISWC 2013 Entity Recommendations in Web Search

Roi Blanco, Berkant Barla Cambazoglu, Peter Mika, and Nicolas Torzec

Introduction: Web Search
Some web search users know exactly what they are looking for. Others are willing to explore topics related to an initial interest.

Hypothesis: Often, the user’s initial interest can be uniquely linked to an entity in a knowledge base. In this case, it is natural to recommend the explicitly linked entities for further exploration. In real-world knowledge bases, however, the number of linked entities may be very large, and not all related entities are equally relevant. Thus, related entities need to be ranked.

Entity Recommendation task
Ranking task: Given the large number of related entities in the knowledge base, we need to select the most relevant ones to show, based on the user’s current query.

Why pivot around a single entity?
Previous analysis has shown that over 50% of web search queries pivot around a single entity that is explicitly named in the query. Pound, J., Mika, P., Zaragoza, H.: Ad-hoc object retrieval in the web of data. In: Proceedings of the 19th International Conference on World Wide Web, pp. 771–780. ACM, New York (2010)

Spark: An Entity Recommender System
Data sources: Wiki / Freebase / domain-specific

1-Knowledge Base (Yahoo! knowledge graph)
All of the entities, relations, and information that we extract are integrated and managed centrally in a unified knowledge base. The ontology was developed over 2 years by the Yahoo! editorial team and is aligned with schema.org. It consists of 250 classes of entities and 800 properties for modeling the information associated with them. The entities are enriched offline. The graph that Spark uses as input consists of 3.5M entities and 1.4B direct and indirect relations from the Movie, TV, Music, Sport, and Geo domains.

2-Feature Extraction
For every triple in the knowledge base, Spark extracts over 100 features. The extracted features can be grouped under three main headings: co-occurrence, popularity, and graph-theoretic features. Spark also extracts a few additional features.
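As a sketch of one graph-theoretic feature (PageRank on the entity graph), the snippet below builds an entity graph from a few hypothetical triples and runs power-iteration PageRank. Treating every relation as an undirected edge, the damping factor of 0.85, and the names `TRIPLES`, `build_entity_graph`, and `pagerank` are assumptions for illustration, not Spark's actual code.

```python
from collections import defaultdict

# Hypothetical knowledge-base triples (subject, relation, object); not Spark's data.
TRIPLES = [
    ("Brad_Pitt", "spouse", "Angelina_Jolie"),
    ("Brad_Pitt", "starred_in", "Fight_Club"),
    ("Edward_Norton", "starred_in", "Fight_Club"),
    ("Angelina_Jolie", "starred_in", "Mr._&_Mrs._Smith"),
    ("Brad_Pitt", "starred_in", "Mr._&_Mrs._Smith"),
]


def build_entity_graph(triples):
    """Assumption: every relation becomes an undirected edge between two entities."""
    graph = defaultdict(set)
    for subj, _rel, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; every node in an undirected edge list has a neighbour."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleportation mass is spread uniformly over all entities.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # Each entity passes its damped rank evenly to its neighbours.
            share = damping * rank[node] / len(graph[node])
            for neighbour in graph[node]:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    scores = pagerank(build_entity_graph(TRIPLES))
    for entity, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{entity:20s} {score:.3f}")
```

The resulting score is a unary feature of each entity; in a real system it would be computed once offline over the full graph.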

2-Feature Extraction: Feature extraction from text
Text sources: query terms, query sessions, Flickr tags, tweets
Common representation:
Input tweet: Brad Pitt married to Angelina Jolie in Las Vegas
Output events: Brad Pitt + Angelina Jolie; Brad Pitt + Las Vegas; Angelina Jolie + Las Vegas
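A minimal sketch of the common representation: detect known entity names in a piece of text and emit one co-occurrence event per pair of mentioned entities, in order of appearance. The dictionary `ENTITY_NAMES` and the simple case-insensitive whole-word matching are hypothetical simplifications; the slides do not describe how Spark actually recognizes entities.

```python
import re
from itertools import combinations

# Hypothetical entity dictionary (Spark's real entity matcher is not described here).
ENTITY_NAMES = ["Brad Pitt", "Angelina Jolie", "Las Vegas", "Fight Club"]


def find_entities(text, names=ENTITY_NAMES):
    """Return dictionary entities mentioned in text, ordered by first appearance."""
    found = []
    for name in names:
        match = re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), name))
    return [name for _pos, name in sorted(found)]


def to_events(text):
    """Turn one text (query, session, tag set, tweet) into pairwise co-occurrence events."""
    return [f"{a} + {b}" for a, b in combinations(find_entities(text), 2)]


if __name__ == "__main__":
    for event in to_events("Brad Pitt married to Angelina Jolie in Las Vegas"):
        print(event)
```

Run on the tweet from the slide, this prints the same three events listed above.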

2-Feature Extraction: Features
Unary
Popularity features from text: probability, entropy, wiki id popularity …
Graph features: PageRank on the entity graph
Type features: entity type
Binary
Co-occurrence features from text: conditional probability, joint probability …
Graph features: common neighbors …
Type features: relation type
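To make a few of these features concrete, the sketch below computes a unary probability P(e1), the binary joint probability P(e1, e2), the conditional probability P(e2 | e1) = P(e1, e2) / P(e1), and the common-neighbor count on an entity graph. Estimating probabilities as the fraction of text units (for example query sessions or tweets) that mention the entities is an assumption, and `DOCS`, `GRAPH`, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical text units, each reduced to the set of entities it mentions.
DOCS = [
    {"Brad Pitt", "Angelina Jolie"},
    {"Brad Pitt", "Fight Club"},
    {"Brad Pitt", "Angelina Jolie", "Las Vegas"},
    {"Angelina Jolie"},
]

# Hypothetical adjacency sets of an entity graph.
GRAPH = {
    "Brad Pitt": {"Angelina Jolie", "Fight Club", "Mr. & Mrs. Smith"},
    "Angelina Jolie": {"Brad Pitt", "Mr. & Mrs. Smith"},
}


def cooccurrence_counts(docs):
    """Count how many units mention each entity and each unordered entity pair."""
    single, pair = Counter(), Counter()
    for entities in docs:
        single.update(entities)
        pair.update(frozenset(p) for p in combinations(sorted(entities), 2))
    return single, pair


def text_features(e1, e2, docs):
    n = len(docs)
    single, pair = cooccurrence_counts(docs)
    p_e1 = single[e1] / n                         # unary: P(e1)
    joint = pair[frozenset((e1, e2))] / n         # binary: P(e1, e2)
    return {
        "probability_e1": p_e1,
        "joint_probability": joint,
        "conditional_probability": joint / p_e1 if p_e1 else 0.0,  # P(e2 | e1)
    }


def common_neighbors(graph, a, b):
    """Binary graph feature: number of neighbours shared by a and b."""
    return len(graph.get(a, set()) & graph.get(b, set()))


if __name__ == "__main__":
    print(text_features("Brad Pitt", "Angelina Jolie", DOCS))
    print(common_neighbors(GRAPH, "Brad Pitt", "Angelina Jolie"))
```

Note that the conditional probability is asymmetric: P(Angelina Jolie | Brad Pitt) generally differs from P(Brad Pitt | Angelina Jolie), which is useful when ranking entities for a fixed query entity.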

3-Ranking
Ranking approaches that can accommodate a large number of features benefit from automated (learning-to-rank) methods that derive how to combine feature values into a single score.
Training data created by editors (five grades):
Editor | Related entity | Query entity | Query entity type | Grade
Brandi | adriana lima | Brad Pitt | person | Bad
David H. | andy garcia | Brad Pitt | person | Fair
Jennifer | benicio del toro | Brad Pitt | person | Good
Jennifer | fight club movie | Brad Pitt | person | Perfect
Sarah | burn after reading | Brad Pitt | person | Excellent
The editorial data is joined with the feature file, and a regression model is trained for entity ranking using GBDT (Stochastic Gradient Boosted Decision Trees).
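The join-and-train step can be sketched as follows, assuming a hypothetical feature file with two made-up features per (query entity, related entity) pair and the conventional ordering Bad < Fair < Good < Excellent < Perfect mapped to labels 0 to 4. The booster is a deliberately tiny stand-in for stochastic GBDT: each round fits a depth-1 regression tree (a stump) to the residuals of a random subsample. Tree depth, number of trees, learning rate, subsample rate, and all names are illustrative guesses, not Spark's settings.

```python
import random

# Assumed grade-to-label mapping (conventional five-grade scale).
GRADE_TO_LABEL = {"Bad": 0, "Fair": 1, "Good": 2, "Excellent": 3, "Perfect": 4}

# Hypothetical editorial judgments: (query entity, related entity, grade).
JUDGMENTS = [
    ("Brad Pitt", "adriana lima", "Bad"),
    ("Brad Pitt", "andy garcia", "Fair"),
    ("Brad Pitt", "benicio del toro", "Good"),
    ("Brad Pitt", "burn after reading", "Excellent"),
    ("Brad Pitt", "fight club", "Perfect"),
]
# Hypothetical feature file: (query, entity) -> [co-occurrence, popularity].
FEATURES = {
    ("Brad Pitt", "adriana lima"): [0.01, 0.40],
    ("Brad Pitt", "andy garcia"): [0.05, 0.30],
    ("Brad Pitt", "benicio del toro"): [0.10, 0.35],
    ("Brad Pitt", "burn after reading"): [0.30, 0.50],
    ("Brad Pitt", "fight club"): [0.45, 0.90],
}


def join(judgments, features):
    """Inner join of editorial grades with feature vectors on (query, entity)."""
    return [(features[(q, e)], GRADE_TO_LABEL[g])
            for q, e, g in judgments if (q, e) in features]


def fit_stump(xs, residuals):
    """Best single-split regression tree under squared error."""
    best = None
    for f in range(len(xs[0])):
        for threshold in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[f] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, f, threshold, lmean, rmean)
    if best is None:  # no valid split: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, f, threshold, lmean, rmean = best
    return lambda x: lmean if x[f] <= threshold else rmean


def train_sgbdt(rows, n_trees=100, learning_rate=0.1, subsample=0.8, seed=0):
    rng = random.Random(seed)
    base = sum(y for _, y in rows) / len(rows)
    trees = []

    def predict(x):
        return base + sum(learning_rate * tree(x) for tree in trees)

    for _ in range(n_trees):
        # "Stochastic": each tree sees only a random subsample of the rows.
        sample = rng.sample(rows, max(1, int(subsample * len(rows))))
        xs = [x for x, _ in sample]
        # Residuals are the negative gradient of the squared loss.
        residuals = [y - predict(x) for x, y in sample]
        trees.append(fit_stump(xs, residuals))
    return predict


if __name__ == "__main__":
    model = train_sgbdt(join(JUDGMENTS, FEATURES))
    for query, entity in sorted(FEATURES, key=lambda k: -model(FEATURES[k])):
        print(query, "->", entity)
```

At serving time, candidate related entities for a query entity would simply be sorted by the model's predicted score.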

4-Disambiguation and Serving
In practice, certain entity strings may match multiple entities (e.g., “brad pitt” may refer to the actor entity “Brad Pitt” or the boxer entity “Brad Pitt (boxer)”). Disambiguation signal: how many times was a given wiki id retrieved for queries containing the entity name?
Brad Pitt Brad_Pitt Brad Pitt Brad_Pitt_(boxer) 247
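A minimal sketch of count-based disambiguation: aggregate how often each wiki id was retrieved for queries containing an entity string, then resolve the string to the most frequently retrieved candidate and report its share. The counts in `RETRIEVAL_COUNTS` are invented for illustration, and picking the arg-max candidate is an assumed decision rule rather than the documented Spark behavior.

```python
from collections import defaultdict

# Hypothetical log aggregates: (entity string, wiki id, retrieval count).
RETRIEVAL_COUNTS = [
    ("brad pitt", "Brad_Pitt", 900),
    ("brad pitt", "Brad_Pitt_(boxer)", 100),
    ("fight club", "Fight_Club", 500),
]


def build_index(rows):
    """Map a lower-cased entity string to {wiki id: total retrieval count}."""
    index = defaultdict(dict)
    for name, wiki_id, count in rows:
        key = name.lower()
        index[key][wiki_id] = index[key].get(wiki_id, 0) + count
    return index


def disambiguate(name, index):
    """Return (most retrieved wiki id, its share of retrievals), or None if unknown."""
    candidates = index.get(name.lower())
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best, candidates[best] / sum(candidates.values())


if __name__ == "__main__":
    index = build_index(RETRIEVAL_COUNTS)
    print(disambiguate("Brad Pitt", index))
```

The share could also serve as a confidence score, for example to suppress recommendations when no candidate clearly dominates.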

Evaluation: Relevance Assessment
Normalized Discounted Cumulative Gain (NDCG) is used as the final performance metric. Overall performance is high, but some entity types are more difficult than others. Locations, for example: editors downgrade popular entities such as businesses.
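For reference, NDCG for one ranked list of editorial labels can be computed as below. The gain 2^rel - 1 and the log2(position + 1) discount are one common formulation; the slides do not say which variant or cutoff Spark used, so treat both as assumptions.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain of the top-k labels (positions counted from 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (label-sorted) ordering."""
    k = len(relevances) if k is None else k
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Labels 0..4 (Bad..Perfect) in the order the system ranked the entities.
    print(ndcg([4, 3, 2, 1, 0]))
    print(ndcg([0, 1, 2, 3, 4]))
```

A per-query NDCG like this would then be averaged over all judged queries, possibly broken down by entity type to expose harder types such as locations.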

Evaluation: Usage Evaluation
Coverage and Click-through Rate (CTR)
Coverage is defined as Views / Queries, and CTR is defined as Clicks / Views, where:
Queries: the total number of queries submitted to the search engine
Views: the number of views (queries that triggered the Spark module)
Clicks: the number of clicks on the Spark module
The coverage metric indicates the fraction of queries for which we display an entity ranking in the result page. The CTR metric indicates the likelihood that the user will click on an entity link.
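Assuming coverage = Views / Queries and CTR = Clicks / Views, as the descriptions of the two metrics imply, both can be computed from daily log aggregates as sketched below. Pooling the counts over days (rather than averaging daily ratios) and the `usage_metrics` helper are assumptions for illustration; the example counts are invented.

```python
def coverage(views, queries):
    """Fraction of queries for which the Spark module was shown."""
    return views / queries if queries else 0.0


def ctr(clicks, views):
    """Fraction of module views that led to a click on an entity link."""
    return clicks / views if views else 0.0


def usage_metrics(daily_logs):
    """daily_logs: iterable of (queries, views, clicks); returns pooled (coverage, CTR)."""
    logs = list(daily_logs)
    queries = sum(q for q, _, _ in logs)
    views = sum(v for _, v, _ in logs)
    clicks = sum(c for _, _, c in logs)
    return coverage(views, queries), ctr(clicks, views)


if __name__ == "__main__":
    # Hypothetical counts for two days.
    print(usage_metrics([(1000, 200, 10), (1000, 300, 20)]))
```

Tracking the two numbers separately matters: a release can raise coverage (the module appears more often) while CTR moves independently, as the before/after plots illustrate.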

Coverage before and after the new system
Before release: flat, lower. After release: flat, higher.

Click-through rate (CTR) before and after the new system
Before release: gradually degrading performance due to a lack of fresh data. After release: a learning effect, as users start to use the tool again.

Summary
Spark: a system for related entity recommendations
Knowledge base
Extraction of features from query logs and other user-generated content
Machine-learned ranking
Evaluation
