Qi Guo (Emory University); Ryen White, Susan Dumais, Jue Wang, Blake Anderson (Microsoft). Presented by Tetsuya Sakai, Microsoft Research.

Presentation transcript:

Qi Guo (Emory University); Ryen White, Susan Dumais, Jue Wang, Blake Anderson (Microsoft). Presented by Tetsuya Sakai, Microsoft Research

Motivation: Query evaluation, understanding the quality of search results for individual queries (or in the aggregate), is critical for search engines. It typically involves time-consuming and expensive human judgments, or user studies that cover only a small fraction of queries. Automated methods can enable rapid and cost-effective query performance prediction. Prior work used features of queries, results, and the collection, e.g., query clarity (Cronen-Townsend et al., 2002) and query difficulty prediction (Hauff et al., 2008).

Contribution: Our work differs from previous research in that it investigates query, results, and interaction features, and uses search engine logs (rather than standard IR test collections), since logs reflect the diversity of Web search tasks. Contributions: (1) investigate a novel and rich set of interaction features for predicting query performance; (2) determine which features and combinations of features are most important in predicting query quality; (3) understand how prediction accuracy varies with query frequency.

Predicting Query Performance

Features: Three classes. Query features, from Bing search logs (e.g., QueryLength, HasURLFragment, HasSpellCorrection). Results features, from Bing results pages (e.g., AvgNumAds, AvgNumResults, and MaxBM25F as a text-matching baseline). Interaction features, from Bing search logs and MSN toolbar logs (e.g., AvgClickPos, AvgClickDwell, AbandonmentRate); these also include search engine switching and user satisfaction estimates, where satisfaction is estimated from page dwell times. Logs were collected during one week in July 2009.
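To make the interaction features concrete, here is a minimal sketch (not the authors' code) of how per-query aggregates such as AvgClickPos, AvgClickDwell, and AbandonmentRate could be computed from click-log records; the record fields are hypothetical.

```python
from collections import defaultdict

def interaction_features(log_records):
    """Aggregate hypothetical click-log records into per-query interaction features.

    Each record is assumed to look like:
      {"query": str, "clicked_rank": int or None, "dwell_secs": float or None}
    clicked_rank=None marks an abandoned impression (no result clicked).
    """
    by_query = defaultdict(list)
    for rec in log_records:
        by_query[rec["query"]].append(rec)

    features = {}
    for query, recs in by_query.items():
        clicks = [r for r in recs if r["clicked_rank"] is not None]
        features[query] = {
            # Average rank of clicked results (lower means clicks nearer the top).
            "AvgClickPos": sum(r["clicked_rank"] for r in clicks) / len(clicks) if clicks else None,
            # Average post-click dwell time, used as a satisfaction proxy.
            "AvgClickDwell": sum(r["dwell_secs"] for r in clicks) / len(clicks) if clicks else None,
            # Fraction of impressions with no click at all.
            "AbandonmentRate": 1.0 - len(clicks) / len(recs),
        }
    return features
```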

Experiment: 2,834 queries randomly sampled from Bing query logs, a mixture of common and rare queries, split 60% training / 20% validation / 20% testing. Explicit relevance judgments were used to generate ground-truth DCG values for training and testing. Query, Results, and Interaction features were generated for each query in the set.
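As a reminder of how the ground-truth target can be computed, below is a small sketch of a standard DCG calculation over graded judgments; the exact gain and discount variant used in the original experiments is not stated in the slides, so the common 2^rel - 1 gain with a log2 discount is assumed.

```python
import math

def dcg(relevance_grades, k=None):
    """Discounted Cumulative Gain over a ranked list of graded relevance judgments.

    relevance_grades: labels in rank order, e.g. [3, 2, 0, 1, 0]
    k: optional cutoff (DCG@k); None scores the whole list.
    """
    grades = relevance_grades if k is None else relevance_grades[:k]
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(grades))

# Example: judged top-5 results for one query.
print(dcg([3, 2, 0, 1, 0], k=5))
```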

Experiment: The prediction model is a regression model, Multiple Additive Regression Trees (MART). Advantages of MART include model interpretability, facility for rapid training and testing, and robustness. Metrics used to evaluate performance are Pearson's correlation (R) and mean absolute error (MAE), comparing predictions with ground truth based on explicit human judgments. Five-fold cross-validation was used to improve result reliability.
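MART itself is an internal gradient-boosting implementation, so as a rough stand-in the setup can be sketched with an off-the-shelf gradient-boosted regression tree model; the use of scikit-learn and the placeholder arrays X and y are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

def evaluate(X, y, n_splits=5, seed=0):
    """Five-fold cross-validation of a boosted-tree DCG predictor (MART stand-in)."""
    rs, maes = [], []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        model = GradientBoostingRegressor(random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        rs.append(pearsonr(pred, y[test_idx])[0])            # Pearson's R
        maes.append(mean_absolute_error(y[test_idx], pred))  # MAE
    return float(np.mean(rs)), float(np.mean(maes))

# X: per-query feature matrix (Query / Results / Interaction features), shape (n_queries, n_features)
# y: ground-truth DCG values from explicit judgments, shape (n_queries,)
```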

Findings: All Features. The full model predicts effectively: R = 0.699, MAE = 0.160, and the correlation holds across the full range of DCG values. The most predictive feature is an interaction feature, the average rank of result clicks. Disagreements in prediction are associated with novel result presentation; e.g., instant answers (like maps and images) may influence user interaction features.

Findings: Feature Combinations. Interaction features alone perform close to all features, indicating a strong predictive signal in interaction behavior. Results features perform reasonably well. Query features perform poorly and do not add much to Results or Interaction features.

Feature set                            R         MAE
Query + Results + Interaction (full)   0.699     0.160
Results + Interaction
Query + Interaction
Interaction only                       0.667 *   0.166 *
Query + Results                        0.556 **  0.193 **
Results only                           0.522 **  0.200 **
Query only                             0.323 **  0.228 **
(Difference from the full model: * = p < .05, ** = p < .01)
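The feature-set comparison above is essentially an ablation study; a minimal sketch of how it could be run, reusing the evaluate() helper from the previous sketch, is shown below (the column indices per feature group are placeholders).

```python
from itertools import combinations

# Hypothetical column indices for each feature group in X.
GROUPS = {"Query": [0, 1, 2], "Results": [3, 4, 5], "Interaction": [6, 7, 8]}

def ablation(X, y):
    """Evaluate every non-empty combination of feature groups with evaluate()."""
    results = {}
    names = list(GROUPS)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            cols = [c for name in subset for c in GROUPS[name]]
            results[" + ".join(subset)] = evaluate(X[:, cols], y)
    return results  # e.g. {"Interaction": (R, MAE), ...}
```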

Findings: Query Frequency. Interaction features are important, but they are mostly available for frequent queries. How well can we do on infrequent queries? We examined the correlation for different frequency bins: queries were ranked by frequency, divided into equally-sized bins, and the correlation between predicted and actual DCG was computed within each bin.
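One possible implementation of the per-bin analysis, with predicted and actual DCG arrays as placeholders: queries are sorted by frequency, split into equally-sized bins, and Pearson's R is computed within each bin.

```python
import numpy as np
from scipy.stats import pearsonr

def correlation_by_frequency(freqs, predicted, actual, n_bins=10):
    """Per-bin correlation between predicted and actual DCG, binned by query frequency."""
    freqs, predicted, actual = map(np.asarray, (freqs, predicted, actual))
    order = np.argsort(freqs)[::-1]        # most frequent queries first
    bins = np.array_split(order, n_bins)   # equally-sized bins (within one query)
    return [pearsonr(predicted[b], actual[b])[0] for b in bins]
```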

Findings: Query Frequency. Linear regression revealed only a very slight relationship between query frequency and prediction accuracy (R² = .008). This is good: we can accurately predict performance even for non-popular queries. (Figure: per-bin prediction accuracy, ordered from high-frequency to low-frequency queries.)
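The "very slight relationship" check can then be a simple linear fit of per-bin accuracy against bin rank; scipy's linregress is an assumed tool here, with rvalue squared giving the R² quoted on the slide.

```python
from scipy.stats import linregress

def frequency_effect(per_bin_r):
    """R^2 of a linear fit of prediction accuracy (per-bin R) on frequency-bin rank."""
    fit = linregress(range(len(per_bin_r)), per_bin_r)
    return fit.rvalue ** 2  # near zero means accuracy barely depends on query frequency
```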

Summary: We automatically predicted search engine performance using query, results, and interaction features. There is a strong correlation (R ≈ 0.7) between predicted query performance and human relevance judgments when using all feature classes. Users' search interactions provide a strong signal of engine performance, performing well alone and adding substantially to Query and Results features.

Implications and Future Work: Accurate prediction can help search engines know when to apply different processing, ranking, or presentation methods; identify poorly-performing queries; and sample queries of different quality. Further research is required to understand the role of other features, effects related to the nature of the document collection, and the impact of engine settings on prediction effectiveness.