ISWC 2013 Entity Recommendations in Web Search

ISWC 2013 Entity Recommendations in Web Search Roi Blanco, Berkant Barla Cambazoglu, Peter Mika, and Nicolas Torzec

Introduction: Web search. Some web search users know exactly what they are looking for; others are willing to explore topics related to an initial interest.

Hypothesis: Often, the user's initial interest can be uniquely linked to an entity in a knowledge base. In this case, it is natural to recommend the explicitly linked entities for further exploration. In a real-world knowledge base, however, the number of linked entities may be very large, and not all related entities are equally relevant. Thus, related entities need to be ranked.

Entity recommendation task: a ranking task. Given the large number of related entities in the knowledge base, we need to select the most relevant ones to show, based on the user's current query.

Why pivot around a single entity? Previous analysis has shown that over 50% of web search queries pivot around a single entity that is explicitly named in the query (Pound, J., Mika, P., Zaragoza, H.: Ad-hoc object retrieval in the web of data. In: Proceedings of the 19th International Conference on World Wide Web, pp. 771–780. ACM, New York (2010)).

Spark: an entity recommender system. Data sources: Wikipedia, Freebase, and domain-specific sources.

1-Knowledge Base (Yahoo! knowledge graph): All of the entities, relations, and information that we extract are integrated and managed centrally in a unified knowledge base. The ontology was developed over 2 years by the Yahoo! editorial team and is aligned with schema.org. It consists of 250 classes of entities and 800 properties for modeling the information associated with them. The entities are enriched offline. The graph that Spark uses as input consists of 3.5M entities and 1.4B direct and indirect relations from the Movie, TV, Music, Sport, and Geo domains.
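A minimal sketch of how knowledge-base triples could be indexed to enumerate candidate related entities for a pivot entity. The triples, relation names, and helper functions are hypothetical, and only direct (one-hop) relations are shown, whereas the graph described above also contains indirect relations.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# (subject, relation, object); all names here are hypothetical examples.
Triple = Tuple[str, str, str]
Index = DefaultDict[str, Set[Tuple[str, str]]]


def build_index(triples: List[Triple]) -> Index:
    """Map each entity to the (relation, neighbour) pairs it takes part in.

    Each triple is indexed in both directions so an entity reaches its
    neighbours whichever side of the triple it appears on.
    """
    index: Index = defaultdict(set)
    for subj, rel, obj in triples:
        index[subj].add((rel, obj))
        index[obj].add((rel, subj))
    return index


def related_entities(index: Index, entity: str) -> List[str]:
    """Distinct entities directly linked to `entity`, sorted by name."""
    # .get() does not insert missing keys into the defaultdict.
    return sorted({neighbour for _, neighbour in index.get(entity, set())})


triples = [
    ("Brad_Pitt", "starred_in", "Fight_Club"),
    ("Brad_Pitt", "partner", "Angelina_Jolie"),
    ("Edward_Norton", "starred_in", "Fight_Club"),
]
index = build_index(triples)
print(related_entities(index, "Brad_Pitt"))  # ['Angelina_Jolie', 'Fight_Club']
```

Each (pivot, candidate) pair produced this way is what the later feature extraction and ranking steps would score.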

2-Feature Extraction: For every triple in the knowledge base, Spark extracts over 100 features. The extracted features can be grouped under three main headings: co-occurrence, popularity, and graph-theoretic features. Spark also extracts a few additional features.

2-Feature Extraction: feature extraction from text. Text sources: query terms, query sessions, Flickr tags, and tweets. All sources are mapped to a common representation; for example, the input tweet "Brad Pitt married to Angelina Jolie in Las Vegas" yields the output events Brad Pitt + Angelina Jolie, Brad Pitt + Las Vegas, and Angelina Jolie + Las Vegas.
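The tweet example can be reproduced with a short sketch that turns the entities found in one text unit into pairwise co-occurrence events. It assumes entity recognition has already happened upstream (the entity list below is a hypothetical tagger output), and `to_events` is an illustrative name.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def to_events(entities: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn the entities found in one text unit into pairwise co-occurrence events.

    Repeated mentions are collapsed while keeping first-appearance order.
    """
    unique = list(dict.fromkeys(entities))
    return list(combinations(unique, 2))


# Hypothetical output of an entity tagger run on the example tweet.
tweet_entities = ["Brad Pitt", "Angelina Jolie", "Las Vegas"]
for a, b in to_events(tweet_entities):
    print(f"{a} + {b}")
# Brad Pitt + Angelina Jolie
# Brad Pitt + Las Vegas
# Angelina Jolie + Las Vegas
```

Because every source (queries, sessions, tags, tweets) is reduced to the same kind of event, the downstream statistics do not need to know where a text unit came from.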

2-Feature Extraction: features. Unary features: popularity features from text (probability, entropy, wiki id popularity …), graph features (PageRank on the entity graph), and type features (entity type). Binary features: co-occurrence features from text (conditional probability, joint probability …), graph features (common neighbors …), and type features (relation type).
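Several of the listed features can be sketched over co-occurrence events like those in the tweet example. This is a minimal illustration under stated assumptions, not Spark's implementation: P(e) and P(a, b) are taken as the fraction of text units mentioning the entity or the pair; the slide does not define entropy, so `entropy` assumes the entropy of an entity's normalised co-occurrence partners; PageRank is a plain power iteration with an assumed damping factor of 0.85. Wiki id popularity and type features are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def text_stats(units: List[List[str]]) -> Tuple[Dict[str, float], Dict[Pair, float]]:
    """P(e) and P(a, b): fraction of text units mentioning e, or both a and b."""
    n = len(units)
    single: Counter = Counter()
    pair: Counter = Counter()
    for unit in units:
        ents = sorted(set(unit))  # sorting gives each pair one canonical order
        single.update(ents)
        pair.update(combinations(ents, 2))
    prob = {e: c / n for e, c in single.items()}
    joint = {p: c / n for p, c in pair.items()}
    return prob, joint


def conditional(prob: Dict[str, float], joint: Dict[Pair, float], a: str, b: str) -> float:
    """P(b | a) = P(a, b) / P(a)."""
    return joint.get(tuple(sorted((a, b))), 0.0) / prob[a]


def entropy(joint: Dict[Pair, float], a: str) -> float:
    """Assumed definition: entropy (bits) of a's normalised co-occurrence partners."""
    partners = {(q if p == a else p): w for (p, q), w in joint.items() if a in (p, q)}
    total = sum(partners.values())
    return -sum(w / total * math.log2(w / total) for w in partners.values())


def common_neighbours(adj: Dict[str, Set[str]], a: str, b: str) -> int:
    """Number of entities linked to both a and b in the entity graph."""
    return len(adj.get(a, set()) & adj.get(b, set()))


def pagerank(adj: Dict[str, Set[str]], damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; assumes every neighbour is also a key of adj."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new[w] += share
            else:  # dangling node: spread its rank uniformly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank


# Hypothetical text units, each already reduced to its list of entities.
units = [
    ["Brad Pitt", "Angelina Jolie", "Las Vegas"],
    ["Brad Pitt", "Fight Club"],
    ["Brad Pitt", "Angelina Jolie"],
    ["Las Vegas"],
]
prob, joint = text_stats(units)
print(conditional(prob, joint, "Brad Pitt", "Angelina Jolie"))  # about 0.667
print(entropy(joint, "Brad Pitt"))  # 1.5

# Hypothetical undirected entity graph (adjacency is symmetric).
adj = {
    "Brad Pitt": {"Angelina Jolie", "Fight Club"},
    "Angelina Jolie": {"Brad Pitt"},
    "Fight Club": {"Brad Pitt", "Edward Norton"},
    "Edward Norton": {"Fight Club"},
}
print(common_neighbours(adj, "Angelina Jolie", "Fight Club"))  # 1
pr = pagerank(adj)
```

Note that the binary features are asymmetric where it matters: P(Angelina Jolie | Brad Pitt) and P(Brad Pitt | Angelina Jolie) generally differ, which is one reason to extract features per directed pair.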

3-Ranking: Ranking systems such as Spark, which accommodate a large number of features, benefit from automated (machine-learned) approaches that combine feature values into a single score. Training data was created by editors using five grades (Bad, Fair, Good, Excellent, Perfect), for example:
Editor | Related entity | Query entity | Query entity type | Grade
Brandi | adriana lima | Brad Pitt | person | Bad
David H. | andy garcia | Brad Pitt | person | Fair
Jennifer | benicio del toro | Brad Pitt | person | Good
Jennifer | fight club (movie) | Brad Pitt | person | Perfect
Sarah | burn after reading | Brad Pitt | person | Excellent
The editorial data was joined with the feature file, and a regression model for entity ranking was trained using Stochastic Gradient Boosted Decision Trees (GBDT).
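A minimal, dependency-free sketch of this ranking step: join editorial labels with a feature file on (query, entity), then fit stochastic gradient boosting for squared loss, fitting each round on a random subsample of rows. To stay short it uses depth-1 trees (stumps) rather than full decision trees. The grade-to-number mapping, the two feature values per pair, and all function names are hypothetical; a production system would use a dedicated GBDT library.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Assumed numeric scale for the five editorial grades (higher is more relevant).
GRADES = {"Bad": 0, "Fair": 1, "Good": 2, "Excellent": 3, "Perfect": 4}
Key = Tuple[str, str]                    # (query entity, related entity)
Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def join(labels: Dict[Key, str], features: Dict[Key, List[float]]):
    """Inner join of editorial labels and the feature file on (query, entity)."""
    keys = [k for k in labels if k in features]
    return [features[k] for k in keys], [float(GRADES[labels[k]]) for k in keys]


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float], rows: List[int]) -> Stump:
    """Least-squares regression stump fitted to residuals r on the given rows."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in rows}):
            left = [r[i] for i in rows if X[i][j] <= t]
            right = [r[i] for i in rows if X[i][j] > t]
            if not right:  # threshold at the maximum: no real split
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: predict the mean everywhere
        m = sum(r[i] for i in rows) / len(rows)
        return (0, float("inf"), m, m)
    return best[1:]


def train_gbdt(X, y, n_trees=200, lr=0.1, subsample=0.8, seed=0):
    """Stochastic gradient boosting with squared loss and stump base learners."""
    rng = random.Random(seed)
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    k = max(1, int(subsample * len(y)))
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        rows = rng.sample(range(len(y)), k)            # random row subsample per round
        j, t, lv, rv = fit_stump(X, resid, rows)
        stumps.append((j, t, lv, rv))
        for i, x in enumerate(X):
            pred[i] += lr * (lv if x[j] <= t else rv)
    return base, lr, stumps


def predict(model, x: Sequence[float]) -> float:
    base, lr, stumps = model
    return base + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps)


# Grades from the editorial example; the feature values are made up.
labels = {
    ("brad pitt", "adriana lima"): "Bad",
    ("brad pitt", "andy garcia"): "Fair",
    ("brad pitt", "benicio del toro"): "Good",
    ("brad pitt", "burn after reading"): "Excellent",
    ("brad pitt", "fight club"): "Perfect",
}
features = {
    ("brad pitt", "adriana lima"): [0.01, 0.20],
    ("brad pitt", "andy garcia"): [0.05, 0.30],
    ("brad pitt", "benicio del toro"): [0.10, 0.35],
    ("brad pitt", "burn after reading"): [0.30, 0.50],
    ("brad pitt", "fight club"): [0.45, 0.90],
}
X, y = join(labels, features)
model = train_gbdt(X, y)
ranking = sorted(features, key=lambda k: predict(model, features[k]), reverse=True)
print([entity for _, entity in ranking])
```

Treating the grades as regression targets (rather than classes) lets the model output a single continuous score per pair, which is exactly what is needed to sort related entities.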

4-Disambiguation and Serving: In practice, certain entity strings may match multiple entities (e.g., "brad pitt" may refer to the actor entity "Brad Pitt" or the boxer entity "Brad Pitt (boxer)"). The signal used is how many times a given wiki id was retrieved for queries containing the entity name:
Entity name | Wiki id | Times retrieved
Brad Pitt | Brad_Pitt | 21158
Brad Pitt | Brad_Pitt_(boxer) | 247
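A minimal sketch, assuming the decision rule is simply to serve the wiki id retrieved most often for queries containing the entity name; the slide gives the counts but not the exact rule, and `disambiguate` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# Retrieval counts per (lower-cased) entity name, taken from the example above.
RETRIEVAL_COUNTS: Dict[str, Dict[str, int]] = {
    "brad pitt": {"Brad_Pitt": 21158, "Brad_Pitt_(boxer)": 247},
}


def disambiguate(name: str, counts: Dict[str, Dict[str, int]]) -> Optional[Tuple[str, float]]:
    """Return the most frequently retrieved wiki id for `name` and its share of retrievals."""
    candidates = counts.get(name.lower())
    if not candidates:
        return None
    wiki_id = max(candidates, key=candidates.get)
    return wiki_id, candidates[wiki_id] / sum(candidates.values())


print(disambiguate("Brad Pitt", RETRIEVAL_COUNTS))  # ('Brad_Pitt', ~0.988)
```

Returning the share alongside the id leaves room for a threshold (for example, not serving recommendations when no candidate clearly dominates), though any such threshold would be an additional assumption.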

Evaluation: relevance assessment. Normalized Discounted Cumulative Gain (NDCG) is used as the final performance metric. Overall performance is high, but some entity types are more difficult than others; for locations, for example, editors downgrade popular entities such as businesses.
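NDCG can be sketched as follows. The slide does not give the exact formula, so this assumes the common exponential-gain variant, DCG@k = sum over positions i = 1..k of (2^g_i - 1) / log2(i + 1), normalized by the DCG of the ideal (grade-sorted) ordering of the same judged list; grades use a hypothetical 0 (Bad) to 4 (Perfect) scale.

```python
import math
from typing import List


def dcg(grades: List[int], k: int) -> float:
    """Discounted cumulative gain at k with exponential gain 2^g - 1."""
    # enumerate starts at 0, so position i + 1 is discounted by log2(i + 2).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg(grades: List[int], k: int) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


# Hypothetical editorial grades of one ranked list of related entities.
print(ndcg([4, 3, 2, 1, 0], k=5))  # 1.0: already in ideal order
print(ndcg([0, 1, 2, 3, 4], k=5))  # lower: the best entities are ranked last
```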

Evaluation: usage evaluation. Usage is measured by coverage and click-through rate (CTR), defined as $\mathrm{Coverage} = \mathrm{Views} / \mathrm{Queries}$ and $\mathrm{CTR} = \mathrm{Clicks} / \mathrm{Views}$, where Queries is the total number of queries submitted to the search engine, Views is the number of views (queries that triggered the Spark module), and Clicks is the number of clicks on the Spark module. Coverage indicates the fraction of queries for which an entity ranking is displayed on the result page; CTR indicates the likelihood that the user clicks on an entity link.
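Coverage (Views / Queries) and CTR (Clicks / Views) follow directly from three log counts. A minimal sketch with hypothetical daily numbers, not Spark's actual traffic; `DailyCounts` is an illustrative name, and returning 0.0 for an empty denominator is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DailyCounts:
    """Hypothetical per-day aggregates from the search logs."""
    queries: int  # total queries submitted to the search engine
    views: int    # queries that triggered the Spark module
    clicks: int   # clicks on the Spark module

    @property
    def coverage(self) -> float:
        return self.views / self.queries if self.queries else 0.0

    @property
    def ctr(self) -> float:
        return self.clicks / self.views if self.views else 0.0


day = DailyCounts(queries=1_000_000, views=150_000, clicks=6_000)
print(day.coverage, day.ctr)  # 0.15 0.04
```

Because CTR is normalized by views rather than by all queries, a release that raises coverage does not automatically raise CTR; the two curves below are therefore tracked separately.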

Coverage before and after the new system: before release, flat and lower; after release, flat and higher.

Click-through rate (CTR) before and after the new system: before release, performance degraded gradually due to a lack of fresh data; after release, a learning effect is visible as users start to use the tool again.

Summary: Spark, a system for related entity recommendations, comprising a knowledge base, extraction of features from query logs and other user-generated content, machine-learned ranking, and evaluation.

Q&A