Improving Web Search Ranking by Incorporating User Behavior Information
Eugene Agichtein, Eric Brill, Susan Dumais
Microsoft Research

2 Web Search Ranking
Rank pages relevant for a query
– Content match: e.g., page terms, anchor text, term weights
– Prior document quality: e.g., web topology, spam features
– Hundreds of parameters
Tune ranking functions on explicit document relevance ratings

3 Query: SIGIR 2006
Users can help indicate the most relevant results

4 Web Search Ranking: Revisited
Incorporate user behavior information
– Millions of users submit queries daily
– Rich user interaction features (earlier talk)
– Complementary to content and web topology
Some challenges:
– User behavior "in the wild" is not reliable
– How to integrate interactions into ranking
– What is the impact over all queries

5 Outline
Modelling user behavior for ranking
Incorporating user behavior into ranking
Empirical evaluation
Conclusions

6 Related Work
Personalization
– Rerank results based on the user's clickthrough and browsing history
Collaborative filtering
– Amazon, DirectHit: rank by clickthrough
General ranking
– Joachims et al. [KDD 2002], Radlinski et al. [KDD 2005]: tuning ranking functions with clickthrough

7 Rich User Behavior Feature Space
Observed and distributional features
– Aggregate observed values over all user interactions for each query and result pair
– Distributional features: deviations from the "expected" behavior for the query
Represent user interactions as vectors in user behavior space
– Presentation: what a user sees before a click
– Clickthrough: frequency and timing of clicks
– Browsing: what users do after a click

8 Some User Interaction Features

Presentation
– ResultPosition: position of the URL in the current ranking
– QueryTitleOverlap: fraction of query terms in the result title

Clickthrough
– DeliberationTime: seconds between query and first click
– ClickFrequency: fraction of all clicks landing on the page
– ClickDeviation: deviation from the expected click frequency

Browsing
– DwellTime: result page dwell time
– DwellTimeDeviation: deviation from the expected dwell time for the query
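As a concrete illustration, here is a minimal Python sketch of how the ClickFrequency and ClickDeviation features might be computed from aggregated click logs. The data structures and the background model of expected click rates per position are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter

def click_features(clicks_by_url: Counter, url: str, position: int,
                   expected_rate_at_position: dict) -> dict:
    """Illustrative ClickFrequency / ClickDeviation for one (query, result)
    pair, aggregated over all users who issued the query.
    `expected_rate_at_position` is an assumed background model of how often
    a result at each rank gets clicked, regardless of the query."""
    total_clicks = sum(clicks_by_url.values())
    # ClickFrequency: fraction of all clicks for this query landing on the page
    click_frequency = clicks_by_url[url] / total_clicks if total_clicks else 0.0
    # ClickDeviation: observed frequency minus the rank-based "expected" rate
    click_deviation = click_frequency - expected_rate_at_position.get(position, 0.0)
    return {"ClickFrequency": click_frequency, "ClickDeviation": click_deviation}
```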

9 Training a User Behavior Model
Map user behavior features to relevance judgements
RankNet: Burges et al. [ICML 2005]
– Scalable neural net implementation
– Input: user behavior features + relevance labels
– Output: weights for behavior feature values
– Used as the testbed for all experiments

10–13 Training RankNet [Burges et al. 2005]
For query results 1 and 2, present a pair of feature vectors and labels, with label(1) > label(2):
– Feature Vector1 with Label1 produces NN output 1
– Feature Vector2 with Label2 produces NN output 2
– The error is a function of both outputs (we desire output1 > output2)

14 Predicting with RankNet
Present an individual feature vector and get a score (Feature Vector1 → NN output)
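To make the pairwise scheme concrete, here is a compact PyTorch sketch of one RankNet training step and the scoring step. The network architecture, feature dimensionality, and learning rate are placeholder assumptions; only the pairwise cross-entropy loss follows Burges et al. [ICML 2005].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Small scoring net: one feature vector in, one relevance score out.
net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.01)

def ranknet_pair_loss(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Pairwise loss for a pair where label(1) > label(2): we want
    output1 > output2, so penalize small or negative differences."""
    o = net(x1) - net(x2)          # difference of the two NN outputs
    return F.softplus(-o).mean()   # log(1 + exp(-o)), the RankNet cross-entropy

# Training: present a pair of vectors whose label ordering is known.
x1, x2 = torch.randn(10), torch.randn(10)   # placeholder feature vectors
loss = ranknet_pair_loss(x1, x2)
opt.zero_grad(); loss.backward(); opt.step()

# Prediction: present an individual vector and read off its score.
score = net(torch.randn(10)).item()
```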

15 Outline
Modelling user behavior
Incorporating user behavior into ranking
Empirical evaluation
Conclusions

16 User Behavior Models for Ranking
Use interactions from previous instances of the query
– General-purpose (not personalized)
– Only available for queries with past user interactions
Models:
– Rerank, clickthrough only: reorder results by number of clicks
– Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences
– Integrate directly into ranker: incorporate user interactions as features for the ranker

17 Rerank, Clickthrough Only
Promote all clicked results to the top of the result list
– Reorder the clicked results by click frequency
Retain the relative ranking of unclicked results
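This reranking rule is simple enough to state directly in code. A sketch, with assumed input structures: the original ranked list plus a per-URL click count aggregated from previous instances of the query.

```python
def rerank_by_clicks(ranked_urls: list, clicks: dict) -> list:
    """Promote clicked results above unclicked ones, ordering the clicked
    group by click count; unclicked results keep their relative order."""
    clicked = [u for u in ranked_urls if clicks.get(u, 0) > 0]
    clicked.sort(key=lambda u: clicks[u], reverse=True)   # most-clicked first
    unclicked = [u for u in ranked_urls if clicks.get(u, 0) == 0]
    return clicked + unclicked

# rerank_by_clicks(["a", "b", "c", "d"], {"c": 10, "b": 3})
# -> ["c", "b", "a", "d"]
```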

18 Rerank, Preference Predictions
Reorder results by a function of the preference prediction score
Experimented with different variants
– Using inverse of ranks
– Intuition: scores are not comparable → merge ranks
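One plausible reading of the inverse-of-ranks variant, sketched below: because the original ranker's scores and the preference model's scores live on different scales, convert each into a rank and combine reciprocal ranks instead. The combination weight is an assumption; the slide only says several variants were tried.

```python
def merge_by_inverse_ranks(orig_ranking: list, pref_ranking: list,
                           w: float = 1.0) -> list:
    """Combine two rankings of the same results via 1/rank contributions,
    sidestepping incomparable raw scores."""
    orig_rank = {u: i + 1 for i, u in enumerate(orig_ranking)}
    pref_rank = {u: i + 1 for i, u in enumerate(pref_ranking)}
    return sorted(orig_ranking,
                  key=lambda u: 1.0 / orig_rank[u] + w / pref_rank[u],
                  reverse=True)
```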

19 Integrate User Behavior Features Directly into Ranker
For a given query
– Merge the original feature set with user behavior features when available
– User behavior features are computed from previous interactions with the same query
Train RankNet on the enhanced feature set
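A sketch of assembling the enhanced feature vector. How missing behavior features are represented for unseen queries is not specified on the slide; the sentinel value here is an assumption.

```python
BEHAVIOR_FEATURES = ["ClickFrequency", "ClickDeviation", "DwellTime"]  # illustrative subset

def enhanced_features(content_feats: list, behavior_feats: dict) -> list:
    """Append behavior features when the query has past interactions;
    otherwise fill with a 'missing' sentinel so the vector length stays fixed."""
    if not behavior_feats:              # query has no interaction history
        return content_feats + [-1.0] * len(BEHAVIOR_FEATURES)
    return content_feats + [behavior_feats.get(f, -1.0) for f in BEHAVIOR_FEATURES]
```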

20 Outline
Modelling user behavior
Incorporating user behavior into ranking
Empirical evaluation
Conclusions

21 Evaluation Metrics
Precision at K: fraction of relevant results in the top K
NDCG at K: normalized discounted cumulative gain
– Top-ranked results most important
MAP: mean average precision
– Average precision for each query: mean of the precision at K values computed after each relevant document was retrieved
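For reference, straightforward Python implementations of the three metrics. Binary relevance labels are assumed for P@K and MAP; for NDCG the common 2^gain − 1 numerator with a log2 discount is assumed, since the slide does not fix the exact formulation.

```python
import math

def precision_at_k(rels: list, k: int) -> float:
    """Fraction of the top-k results that are relevant (binary labels)."""
    return sum(rels[:k]) / k

def ndcg_at_k(gains: list, k: int) -> float:
    """DCG with (2^gain - 1) / log2(rank + 1) terms, normalized by the
    DCG of the ideal ordering, so top-ranked results matter most."""
    def dcg(gs):
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gs[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def average_precision(rels: list) -> float:
    """Mean of the precision values taken at each rank where a relevant
    document is retrieved; MAP is the mean of this over all queries."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0
```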

22 Datasets
8 weeks of user behavior data from anonymized opt-in client instrumentation
Millions of unique queries and interaction traces
Random sample of 3,000 queries
– Gathered independently of user behavior
– 1,500 train, 500 validation, 1,000 test
Explicit relevance assessments for the top 10 results for each query in the sample

23 Methods Compared
Content only: BM25F
Full search engine: RN
– Hundreds of parameters for content match and document quality
– Tuned with RankNet
Incorporating user behavior:
– Clickthrough only: Rerank-CT
– Full user behavior model predictions: Rerank-All
– Integrate all user behavior features directly: +All

24 Content, User Behavior: Precision at K, queries with interactions
BM25 < Rerank-CT < Rerank-All < +All

25 Content, User Behavior: NDCG
BM25 < Rerank-CT < Rerank-All < +All

26 Full Search Engine, User Behavior: NDCG, MAP

Method     MAP     Gain
RN         0.270
RN+All             (+19.13%)
BM25       0.236
BM25+All           (+23.71%)

27 Impact: All Queries, Precision at K
Fewer than 50% of test queries had prior interactions
[Chart: precision at K over all test queries]

28 Impact: All Queries, NDCG
[Chart: NDCG over all test queries]

29 Which Queries Benefit Most
Most gains are for queries with poor ranking

30 Conclusions
Incorporating user behavior into web search ranking dramatically improves relevance
Providing rich user interaction features to the ranker is the most effective strategy
Large improvements shown for up to 50% of test queries

31 Thank you
Text Mining, Search, and Navigation group
Adaptive Systems and Interaction group
Microsoft Research

32 Content, User Behavior: All Queries, Precision at K
BM25 < Rerank-CT < Rerank-All < +All

33 Content, User Behavior: All Queries, NDCG
BM25 << Rerank-CT << Rerank-All < +All

34 Results Summary
Incorporating user behavior into web search ranking dramatically improves relevance
Incorporating user behavior features directly into the ranker is the most effective strategy
Impact on relevance is substantial
Poorly performing queries benefit most

35 Promising Extensions
Backoff (improve query coverage)
Model user intent/information need
Personalization of various degrees
Query segmentation