Optimizing search engines using clickthrough data

Presentation transcript:

Optimizing search engines using clickthrough data
National Technical University of Athens, School of Electrical and Computer Engineering, Division of Informatics and Computer Technology

Problem
- Optimization of web search result ranking
- Ranking algorithms are based mainly on similarity:
  - similarity between the query keywords and the page text keywords
  - similarity between pages (PageRank)
- No consideration of the user's personal preferences
- [Figure: "digital libraries" results ranked high, "vacation" results ranked low]

Problem
- Example: the query "delos" may refer to digital libraries, archaeology, or vacation
- Room for improvement by incorporating user behaviour data: user feedback
- Use implicit information from previous searches to enhance result ranking
- [Figure: "digital libraries" results ranked high, "vacation" results ranked low]

Types of user feedback
- Explicit feedback
  - The user explicitly judges the relevance of results to the query
  - Direct evaluation of results
  - Costly in time and resources
  - Costly for the user -> limited effectiveness
- Implicit feedback
  - Extracted from log files
  - Large amount of information
  - Real user behaviour (not expert judgement)
  - Indirect evaluation of results through click behaviour

Implicit feedback (categories)
- Clicked results
  - Absolute relevance: clicked result -> relevant
    - Risky: poor quality of user behaviour
    - Percentage of result clicks for a query
    - Frequently clicked groups of results
    - Links followed from the result page
  - Relative relevance: clicked result -> more relevant than non-clicked results
    - More reliable
- Time
  - Between clicks (e.g. fast clicks may indicate bad results)
  - Spent on a result page (e.g. a lot of time suggests a relevant page)
  - Until the first click, scrolling (e.g. may indicate confusing results)
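
As an illustration of the "percentage of result clicks for a query" signal above, a minimal sketch in Python with a made-up log format:

```python
from collections import Counter

# Hypothetical click log: (query, clicked_result) pairs.
clicks = [
    ("delos", "d2"), ("delos", "d2"), ("delos", "d5"),
    ("delos", "d2"), ("rare books", "d7"),
]

def click_share(log, query):
    """Fraction of this query's clicks that each result received --
    one of the absolute-relevance signals listed above."""
    counts = Counter(r for q, r in log if q == query)
    total = sum(counts.values())
    return {r: n / total for r, n in counts.items()}

print(click_share(clicks, "delos"))
# {'d2': 0.75, 'd5': 0.25}
```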

Implicit feedback (categories)
- Query chains: sequences of reformulated queries intended to improve on the results of the initial search
  - Detection: query similarity, result-set similarity, time between queries
  - Connections between the results of different queries:
    - enhancement of a bad query with another query from the chain
    - comparison of result relevance across the results of different queries
- Scrolling
  - Time spent scrolling (quality of results)
  - Scrolling behaviour (quality of results)
  - Percentage of the page scrolled (how many results were viewed)
- Other features
  - Saving, printing, copying etc. of a result page -> probably relevant
  - Exit type (e.g. closing the window -> poor results)

Joachims approach
- Clickthrough data
  - Relative relevance, as indicated by a user behaviour study
- Method
  - Training of SVM functions
  - Training input: inequalities over query result rankings
  - Trained function: a weight vector for the features examined
  - The trained vector is used to assign weights to the examined features
- Experiments
  - Comparison of the method with existing search engines

Clickthrough data
- Form: triplets (q, r, c) = (query, ranked results, links clicked), recorded in the log file of a proxy server
- Relative relevance (with respect to a specific query):
  - dk is more relevant than dl if dk was clicked, dl was not clicked, and dk had a lower initial ranking than dl
- Example (d1, d3 and d5 clicked among the top five results):
  - d5 more relevant than d4
  - d5 more relevant than d2
  - d3 more relevant than d2
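
A minimal sketch of how such preference pairs can be extracted from a single (q, r, c) triplet; an illustration of the rule above, not the authors' code:

```python
def skip_above_pairs(ranking, clicked):
    """A clicked document is preferred over every non-clicked
    document that was ranked above it in the initial ranking."""
    pairs = []
    for i, dk in enumerate(ranking):
        if dk not in clicked:
            continue
        for dl in ranking[:i]:           # documents ranked above dk
            if dl not in clicked:        # ...that the user skipped
                pairs.append((dk, dl))   # dk more relevant than dl
    return pairs

# The slide's example: d1, d3 and d5 clicked among the top five results.
print(skip_above_pairs(["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5"}))
# [('d3', 'd2'), ('d5', 'd2'), ('d5', 'd4')]
```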

Clickthrough data (studies)
- Why not absolute relevance? User behaviour is influenced by the initial ranking.
- Study: rank and viewership
  - Percentage of queries in which the user viewed the search result presented at a particular rank
    - Conclusion: most of the time, users view only the first few results
  - Percentage of queries in which the user clicked the result presented at a particular rank, in both normal and swapped conditions
    - Conclusion: users tend to click on higher-ranked results irrespective of content

Method (system training input)
- Data extracted from the log
- Relevance inequalities: dk <rq dl denotes that dk is more relevant than dl for query q
- Construct relevance inequalities for every query q and every result pair (dk, dl) of q
- For each link di in the results of query q, construct a feature vector Φ(q, di)
- Example inequalities:
  - d5 more relevant than d4
  - d5 more relevant than d2
  - d3 more relevant than d2

Method (system training input)
- Feature vector Φ(q, di): describes the quality of the match between document di and query q
- Φ(q, di) = [rank_X, top1_X, top10_X, ...]
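
A toy sketch of what a few entries of Φ(q, di) might look like for one source engine X; the feature names follow the slide's pattern, but the paper's full feature set is larger:

```python
def phi(q, d, engine_rankings):
    """Toy subset of Phi(q, d): per engine, an inverted-rank feature
    plus binary top-1 / top-10 membership features."""
    feats = {}
    for engine, ranking in engine_rankings.items():
        rank = ranking.index(d) + 1 if d in ranking else None
        feats[f"rank_{engine}"] = 1.0 / rank if rank else 0.0
        feats[f"top1_{engine}"] = 1 if rank == 1 else 0
        feats[f"top10_{engine}"] = 1 if rank is not None and rank <= 10 else 0
    return feats

print(phi("delos", "d2", {"X": ["d1", "d2", "d3"]}))
# {'rank_X': 0.5, 'top1_X': 0, 'top10_X': 1}
```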

Method (system training input)
- Weight vector w = [w1, w2, ..., wn]: assigns a weight to each feature in Φ(q, di)
- S(di) = w · Φ(q, di) assigns a score to document di for query q
- w is unknown: it must be trained by solving the system of relative relevance inequalities derived from the clickthrough data
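
Once w has been trained, reranking is a dot product per document followed by a sort; a toy sketch with made-up features and weights:

```python
import numpy as np

# Hypothetical 3-feature vectors Phi(q, d) for three candidate documents.
phi = {
    "d1": np.array([1.0, 0.2, 0.0]),
    "d2": np.array([0.3, 0.9, 1.0]),
    "d3": np.array([0.7, 0.5, 0.0]),
}
w = np.array([0.8, 0.4, -0.2])   # a trained weight vector (made up here)

scores = {d: float(w @ x) for d, x in phi.items()}
new_ranking = sorted(scores, key=scores.get, reverse=True)
print(new_ranking)   # documents ordered by S(di) = w . Phi(q, di)
```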

Method (system training)
- Initial input transformation:
  dk <rqm dl => S(dk) > S(dl) => w · Φ(qm, dk) > w · Φ(qm, dl) => [w1, w2, ..., wn] · Φ(qm, dk) > [w1, w2, ..., wn] · Φ(qm, dl)
- This gives a system of relevance inequalities for every query q and every result pair (dk, dl) of q
- Object of training: knowing the feature vector of every clicked link di, find a weight vector w that satisfies as many of the inequalities as possible (the optimal solution)
- With w known, we can compute a score for every link, clicked or not, and hence a new ranking
- Example inequalities:
  - d5 more relevant than d4
  - d5 more relevant than d2
  - d3 more relevant than d2
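
For reference, the standard soft-margin form of this training problem (the Ranking SVM of Joachims 2002), with slack variables ξ absorbing inequalities that cannot all be satisfied:

```latex
\min_{\mathbf{w},\,\boldsymbol{\xi} \ge 0}\;
  \tfrac{1}{2}\,\mathbf{w}\cdot\mathbf{w} \;+\; C \sum_{i,j,k} \xi_{ijk}
\quad\text{s.t.}\quad
\mathbf{w}\cdot\Phi(q_k, d_i) \;\ge\; \mathbf{w}\cdot\Phi(q_k, d_j) + 1 - \xi_{ijk}
\;\;\text{for every preference ``$d_i$ more relevant than $d_j$ for $q_k$''}
```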

Method (system training), intuitively
- Example: a 2-dimensional feature vector
  - x-axis: similarity between the query terms and the link text
  - y-axis: time spent on the link's page
- Looking for the optimal weight vector w:
  - if the training input is r1 < r2 < r3 < r4, the optimal solution w1 gives text similarity more importance in the ranking
  - if the training input is r2 < r3 < r1 < r4, the optimal solution w2 gives time more importance in the ranking
- w is optimized by maximizing the margin δ
- [Figure: links 1-4 plotted in the two-dimensional feature space, with the candidate weight vectors]

Experiments
- Based on a meta-search engine: combination of results from different search engines
- "Fair" presentation of results: for a meta-search engine combining two engines, the top z results of the meta-search contain x results from engine A and y results from engine B, with x + y = z and |x - y| <= 1
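
A simplified sketch of such a "fair" merge of two rankings (duplicates are handled naively; the interleaving actually used in the paper is more careful):

```python
def balanced_interleave(a, b, k):
    """Merge rankings from engines A and B so that the top-k contains
    x results from A and y from B with x + y = k and |x - y| <= 1."""
    merged, ia, ib = [], 0, 0
    turn_a = True
    while len(merged) < k and (ia < len(a) or ib < len(b)):
        use_a = (turn_a and ia < len(a)) or ib >= len(b)
        if use_a:
            doc, ia = a[ia], ia + 1
        else:
            doc, ib = b[ib], ib + 1
        if doc not in merged:
            merged.append(doc)
            turn_a = not turn_a   # alternate only after a successful add
    return merged

print(balanced_interleave(["p1", "p2", "p3"], ["p2", "p4", "p5"], 4))
# ['p1', 'p2', 'p3', 'p4']
```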

Experiments
- Meta-search engine
- Assumptions:
  - users click more often on more relevant links
  - preference for one of the search engines is not influenced by the ranking

Experiments (results)
- Conditions:
  - 20 users (university students)
  - 260 training queries collected
  - 20 days of training, 10 days of evaluation
- Comparison with:
  - Google
  - MSNSearch
  - Toprank (a meta-search over Google, MSNSearch, AltaVista, Excite, HotBot)
- [Results table omitted in the transcript]

Experiments (results)
- Weight vector: [table of learned feature weights omitted in the transcript]

Open issues
- Trade-off between the amount of training data and its homogeneity
  - Clustering algorithms to find homogeneous groups of users
- Adaptation to the properties of a particular document collection
- Incremental online learning/feedback algorithms
- Protection from spamming

Joachims approach II
- Clickthrough data
  - Addition of new relative relevance criteria
  - Addition of query chains
- Method
  - Modified feature vector: rank features and term/document features
  - Constraints added to the SVM optimization problem, to avoid trivial or incorrect solutions

Query chains
- Sequences of reformulated queries
  - Poor results for the first query, because of:
    - too many or too few terms
    - unrepresentative terms, e.g. q1: "special collections" -> q2: "rare books"
    - incorrect spelling, e.g. q1: "Lexis Nexus" -> q2: "Lexis Nexis"
  - Execution of a new query: addition/deletion of query terms to get better results
  - Every query in the sequence searches for the same thing
- Ways of detection
  - Term similarity between queries
  - Percentage of common results
  - Time between queries

Query chain detection method
- 1st strategy
  - Data recorded from the log for 1285 queries: query, date, IP address, results returned, number of clicks on the results, uniquely assigned session id
  - Manual grouping of the queries into query chains
  - Division of the data set into a training set and a testing set
  - Training: build a feature vector for every pair of queries; train a weight vector indicating which features matter most for a query pair to form a query chain
  - Testing: recognize query chains in the testing set and compare with the manual grouping: 94.3% accuracy
- 2nd strategy
  - Assumption: two queries form a query chain if they come from the same IP address and are less than 30 minutes apart
  - 91.6% accuracy
- The (simpler) 2nd strategy was adopted
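
A minimal sketch of the adopted heuristic, over a hypothetical log format:

```python
from datetime import datetime, timedelta

# Each log entry: (query, timestamp, ip). Hypothetical format.
log = [
    ("special collections", datetime(2024, 1, 1, 10, 0), "1.2.3.4"),
    ("rare books",          datetime(2024, 1, 1, 10, 5), "1.2.3.4"),
    ("lexis nexis",         datetime(2024, 1, 1, 12, 0), "1.2.3.4"),
]

def detect_chains(entries, gap=timedelta(minutes=30)):
    """Group queries into chains with the simple heuristic adopted
    above: same IP address and less than 30 minutes apart."""
    chains = []
    last_seen = {}   # ip -> (timestamp, chain it belongs to)
    for query, ts, ip in sorted(entries, key=lambda e: e[1]):
        prev = last_seen.get(ip)
        if prev and ts - prev[0] < gap:
            prev[1].append(query)          # continue the current chain
            last_seen[ip] = (ts, prev[1])
        else:
            chain = [query]                # start a new chain
            chains.append(chain)
            last_seen[ip] = (ts, chain)
    return chains

print(detect_chains(log))
# [['special collections', 'rare books'], ['lexis nexis']]
```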

Clickthrough data (studies)
- Conclusion: most of the time, users view only the first 2 results
- Conclusion: most of the time, users view the result immediately after the last one they clicked

Relevance criteria (query)
- Click >q Skip Above: dk is more relevant than dl if dk was clicked, dl was not clicked, and dk had a lower initial ranking than dl
  - Example: d5 more relevant than d4; d5 more relevant than d2; d3 more relevant than d2
- Click First >q No-Click Second: d1 is more relevant than d2 if d1 was clicked and d2 was not
  - Example: d1 more relevant than d2
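
The second criterion as code, companion to the skip_above_pairs sketch earlier:

```python
def click_first_pairs(ranking, clicked):
    """ "Click First > No-Click Second": if the first result was clicked
    and the second was not, prefer the first over the second."""
    if len(ranking) >= 2 and ranking[0] in clicked and ranking[1] not in clicked:
        return [(ranking[0], ranking[1])]
    return []

print(click_first_pairs(["d1", "d2", "d3"], {"d1", "d3"}))
# [('d1', 'd2')]
```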

Relevance criteria (query chain)
- Click >q' Skip Above: within a chain, the same criterion applied to a later query's results generates preferences with respect to the earlier query q'
  - dk is more relevant than dl if dk was clicked, dl was not clicked, and dk had a lower initial ranking than dl
  - Example, for query 1 (q'): d8 more relevant than d7; d8 more relevant than d6
- Click First >q' No-Click Second
  - Example, for query 1 (q'): d5 more relevant than d6
- [Figure: the result lists of query 1 and query 2 in the chain]

Relevance criteria (query chain II)
- Click >q' Skip Earlier Query
  - Example, for query 1 (q'): d6 more relevant than d1; d6 more relevant than d2; d6 more relevant than d4
- Click First >q' Top 2 Earlier Query
  - Example, for query 1 (q'): d6 more relevant than d1; d6 more relevant than d2
- [Figure: the result lists of query 1 and query 2 in the chain]
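
A simplified reading of the first criterion as code; the paper's exact definition is more restrictive, so this is illustration only:

```python
def skip_earlier_query_pairs(earlier_results, earlier_clicked, later_clicked):
    """ "Click > Skip Earlier Query" (simplified): a document clicked in a
    later query of the chain is preferred, with respect to the earlier
    query q', over every result of the earlier query the user skipped."""
    return [(dk, dl)
            for dk in later_clicked
            for dl in earlier_results
            if dl not in earlier_clicked]

# Earlier query returned d1..d4 with d3 clicked; later query's click was d6.
print(skip_earlier_query_pairs(["d1", "d2", "d3", "d4"], {"d3"}, {"d6"}))
# [('d6', 'd1'), ('d6', 'd2'), ('d6', 'd4')]
```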

Relevance criteria (experiment)
- Study of query chains (sequences of reformulated queries) from searches by 16 subjects
- Relevance inequalities produced by the previous criteria
- Comparison with explicit relevance judgements on the query chains

Method (SVM training)
- The feature vector consists of:
  - rank features φrank_fi(d, q)
  - term/document features φterms(d, q)
- Rank features
  - One φrank_fi(d, q) for every retrieval function fi examined
  - Each φrank_fi(d, q) consists of 28 binary features, one per rank threshold 1, 2, 3, ..., 10, 15, 20, ..., 100: the feature is 1 if the rank of d in the initial ranking is <= the threshold, else 0
  - These features let the method learn weights for, and combine, different retrieval functions
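
A small sketch of the 28 binary rank features for one retrieval function, following the thresholds on the slide:

```python
# Thresholds for the 28 binary rank features: 1..10, then 15..100 in steps of 5.
THRESHOLDS = list(range(1, 11)) + list(range(15, 101, 5))
assert len(THRESHOLDS) == 28

def rank_features(rank):
    """Binary rank features for one retrieval function: feature t is 1
    if the document's rank in the initial ranking is <= threshold t."""
    return [1 if rank <= t else 0 for t in THRESHOLDS]

print(rank_features(3))   # starts 0, 0, 1, 1, ... (1 from threshold 3 onward)
```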

Method (SVM training)
- Term/document features: φterms(d, q) consists of N*M binary features, where N is the number of terms and M the number of documents
  - The feature for the pair (tj, di) is 1 if di = d and tj ∈ q
- These features train the weight vector to learn associations between terms and documents
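
A toy illustration of the N*M layout, with a tiny hypothetical vocabulary and collection:

```python
# Hypothetical tiny vocabulary and collection, to illustrate the N*M layout.
terms = ["rare", "books", "delos"]   # N = 3
docs  = ["d1", "d2"]                 # M = 2

def term_doc_features(d, q):
    """One binary feature per (term, document) pair: 1 iff this feature's
    document is d and its term occurs in the query q."""
    q_terms = set(q.split())
    return [1 if (doc == d and t in q_terms) else 0
            for t in terms for doc in docs]

print(term_doc_features("d1", "rare books"))
# [1, 0, 1, 0, 0, 0]  -> the (rare, d1) and (books, d1) features fire
```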

Method (constraints)
- Problem
  - Most relevance criteria suggest that a lower-ranked link (in the initial ranking) is better than a higher-ranked one
  - Trivial solution: reverse the initial ranking by assigning negative values to the weight vector
- Solution
  - Require each w · φrank_fi(d, q) to be >= a minimum positive value

Experiments
- Based on a meta-search engine, with the initial retrieval function from Nutch
- Training
  - 9949 queries, 7429 clicks
  - Pqc: a total of 120134 relative relevance inequalities
  - Pnc: the corresponding set generated without the use of query chains
- Results (rel0 = initial ranking): [results table omitted in the transcript]

Experiments (results)
- [Table of trained weights for term/document features omitted in the transcript]

Open issues
- Tolerance of noise and malicious clicks
- Different weights for each query in a query chain
- Personalized ranking functions
- Improvement of performance

Related work (Fox et al. 2005)
- Evaluation of implicit measures, using explicit ratings of user satisfaction
- Method: Bayesian modeling and decision trees; gene analysis
- Results: importance of combining clickthrough, time, and exit type
- Features used: [table omitted in the transcript]

Related work (Agichtein, Brill, Dumais 2006)
- Incorporation of implicit feedback features directly into the initial ranking algorithm (BM25F, RankNet)
- Features used: [table omitted in the transcript]

Related work
- Query/link clustering based on features extracted from implicit feedback
- Scores based on the interrelations between queries and web pages

Possible extensions
- Utilization of time feedback
  - as a relative relevance criterion
  - as a feature
- Other types of feedback
  - scrolling
  - exit type
- Combination with absolute-relevance clickthrough feedback
  - percentage of result clicks for a query
  - links followed from the result page
- Query chains
  - improvement of the detection method
- Link association rules, for frequently clicked groups of results
- Query/link clustering
- Constant training of the ranking functions

References
- Joachims, Radlinski. Search Engines that Learn from Implicit Feedback.
- Joachims. Optimizing Search Engines Using Clickthrough Data.
- Radlinski, Joachims. Query Chains: Learning to Rank from Implicit Feedback.
- Fox et al. Evaluating Implicit Measures to Improve Web Search.
- Kelly, Teevan. Implicit Feedback for Inferring User Preference: A Bibliography.
- Agichtein, Brill, Dumais. Improving Web Search Ranking by Incorporating User Behavior.
- Xue, Zeng, Chen, Yu, Ma, Xi, Fan. Optimizing Web Search Using Web Click-through Data.