Understanding and Predicting Graded Search Satisfaction (Tang Yuk Yu)

Presentation transcript:

Understanding and Predicting Graded Search Satisfaction (Tang Yuk Yu)

Current Problem: Binary classification. Research to date has modeled and predicted search satisfaction on a binary scale.

Solution: Multi-level measurement. We model search satisfaction using features indicating: search outcome, search effort, and changes in both outcome and effort during a session.

Current Problem: Results are not explained. Black-box techniques take search interaction data as input and output predictions of outcomes such as satisfaction, frustration, and success. => We do not understand why searchers are satisfied, frustrated, or successful.

Solution: In-depth analysis. We show that the value of the search outcome relative to the search effort best explains, and most strongly correlates with, search satisfaction. There are rich and non-monotonic differences in search behavior across sessions with different satisfaction levels.

What is Satisfaction? A subjective measure of the search experience. Kelly: "satisfaction can be understood as the fulfillment of a specified desire or goal." In economics, satisfaction is explained in terms of utility. For example, Su [34] defined utility as a measure of the worth of search results versus the time, physical effort, and mental effort expended. This suggests that searcher satisfaction can be considered a compound measure of multiple factors, including search outcome and search effort. In a recent study, Yilmaz et al. [37] also confirmed the impact of searcher effort on document relevance and utility. => Satisfaction is framed as the comparison of search outcome against search effort.

How was the data collected? We sampled and annotated search tasks from the search logs of consenting users of the Microsoft Bing search engine in April. These logs were collected through a widely distributed browser toolbar. Log entries included a unique identifier for each searcher, a timestamp for each page view, a unique browser window identifier, and the URL of the Web page visited. Intranet and secure (https) URL visits were excluded at the source.

Data preprocessing. From these logs we extracted raw search sessions. Every raw session began with a search query and could contain further queries and visits to clicked search results. A session ended when the searcher was idle for more than 30 minutes. Throughout the paper, we use the term search session to refer to queries and other search activity belonging to the same search task. We excluded single-query sessions.
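To make the preprocessing concrete, here is a minimal sketch of the 30-minute session segmentation, assuming a simplified event schema (events as dicts with 'timestamp' and 'type' fields, which are my invention, not the paper's); it drops single-query sessions as described but omits other details such as mapping activity to search tasks.

```python
from datetime import timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def segment_sessions(events):
    """Split one user's time-ordered log events into raw search sessions.

    `events` is a hypothetical list of dicts with 'timestamp' (datetime) and
    'type' in {'query', 'click'}; the field names are illustrative only.
    """
    sessions, current = [], []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        # Close the current session after 30 minutes of inactivity.
        if current and event["timestamp"] - current[-1]["timestamp"] > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(event)
    if current:
        sessions.append(current)
    # Mirror the exclusion of single-query sessions described above.
    return [s for s in sessions if sum(e["type"] == "query" for e in s) > 1]
```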

Data processing. The search sessions in our dataset were annotated with session-level search satisfaction and query result quality by crowd workers. We assign sessions to four satisfaction levels as follows: very high (4.33 < s ≤ 5, 11.1% of sessions); high (4.0 ≤ s ≤ 4.33, 39.7%); medium (3.33 ≤ s ≤ 3.67, 33.4%); low (s ≤ 3, 15.8%). => This grouping ensures that there are sufficient sessions at each of the four satisfaction levels for statistical analysis.
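A small helper that encodes the stated thresholds; the rounding step is my assumption, added so that a mean such as 13/3 = 4.333... lands in the "high" bin as the slide's ranges imply.

```python
def satisfaction_level(mean_rating):
    """Map the mean of three annotator ratings (1-5 scale) to a graded level.

    Thresholds follow the slide; because the mean of three integer ratings is a
    multiple of 1/3, the apparent gaps between the bins are never hit.
    """
    s = round(mean_rating, 2)
    if s > 4.33:
        return "very high"
    if s >= 4.0:
        return "high"
    if s >= 3.33:
        return "medium"
    return "low"
```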

GRADED SEARCH SATISFACTION. Satisfaction: measured as the average rating of three annotators on session satisfaction (higher is better). Search outcome (gain): the value of the search results to the searcher, i.e., the quantity of useful information found (higher is better). Search effort (cost): the cost of acquiring information from the search engine, e.g., issuing queries, examining result snippets on SERPs, reading clicked results, etc. (lower is better).
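Purely illustrative effort proxies (not the paper's feature set), reusing the event schema assumed in the segmentation sketch:

```python
def effort_features(session):
    """Simple per-session effort (cost) proxies; illustrative only."""
    queries = [e for e in session if e["type"] == "query"]
    clicks = [e for e in session if e["type"] == "click"]
    duration = (session[-1]["timestamp"] - session[0]["timestamp"]).total_seconds()
    return {
        "num_queries": len(queries),   # more reformulations -> more effort
        "num_clicks": len(clicks),     # more results examined -> more effort
        "session_seconds": duration,   # longer sessions tend to cost more effort
    }
```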

Relationship between satisfaction and search outcome and effort. Outcome = session cumulated gain (sCG). There is a strong correlation (r = 0.77) between satisfaction and average search outcome per unit of effort (sCG / #queries). => This verifies our assumption that it is appropriate to explain satisfaction as a relative measure of outcome versus effort.
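A minimal sketch of the reported analysis, assuming hypothetical per-session summaries ('result_gains', 'num_queries', and 'satisfaction' are illustrative field names, not the paper's):

```python
import numpy as np

def scg(result_gains):
    """Session cumulated gain: sum of the gains of results found in the session."""
    return sum(result_gains)

def satisfaction_vs_outcome_per_effort(sessions):
    """Pearson correlation between satisfaction and sCG per query (slide reports r = 0.77)."""
    x = np.array([scg(s["result_gains"]) / s["num_queries"] for s in sessions])
    y = np.array([s["satisfaction"] for s in sessions])
    return np.corrcoef(x, y)[0, 1]
```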

Prediction. Poisson regression yielded the best results, and we report its results in this section.
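A hedged sketch of fitting such a model with scikit-learn's PoissonRegressor; the feature matrix, regularization strength, and label encoding are assumptions, not taken from the paper.

```python
from sklearn.linear_model import PoissonRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_satisfaction_model(X_train, y_train):
    """Fit a Poisson regression from session features (outcome, effort, and
    change features) to the graded satisfaction label (e.g., levels 1-4).
    Hyperparameters here are illustrative defaults, not the paper's settings."""
    model = make_pipeline(StandardScaler(), PoissonRegressor(alpha=1e-3))
    model.fit(X_train, y_train)
    return model
```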

Conclusion. Given the presence of these differences, we developed a predictive model that estimates graded search satisfaction more accurately than existing satisfaction modeling methods by considering search outcome and searcher effort, both independently and in combination.

Toward Predicting the Outcome of an A/B Experiment for Search Relevance

Old method. The standard way to estimate online click-based metrics of a ranking function is to run it in a controlled experiment on live users. => Configuring and running an online experiment is cumbersome and time-intensive.

New method. Use historical search logs to reliably predict online click-based metrics of a new ranking function, without actually running it on live users.

1st approach. Take advantage of the diversified behavior of the search engine over a long period of time to simulate randomized data collection. => Our approach can be used at very low cost.
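One way to read this slide (a sketch of the general idea, not the paper's exact estimator): because the production engine has shown different rankings for the same query over time, logged impressions whose ranking agrees with the new ranker can be reused to estimate its metric, as in exact-match offline replay; the fuzzy variant sketched later relaxes the agreement test.

```python
def offline_metric_estimate(log_impressions, new_ranker, k=3):
    """Average an observed click metric over logged impressions whose top-k
    results exactly match what the new ranker would have shown.

    `log_impressions` is a hypothetical list of dicts with 'query',
    'results' (ranked URLs), and 'metric' (e.g., a click-based score);
    `new_ranker(query)` returns a ranked list of URLs. All names are illustrative.
    """
    matched = [
        imp["metric"]
        for imp in log_impressions
        if new_ranker(imp["query"])[:k] == imp["results"][:k]
    ]
    return sum(matched) / len(matched) if matched else None
```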

2nd approach. Replace exact matching (of recommended items in previous work) with fuzzy matching (of search result pages) to increase data efficiency, via a better trade-off between bias and variance.

Principle of an A/B experiment. Incoming users are randomly split into two groups, where the control group is served by a baseline system and the treatment group by a variant. The experiment runs for a period of time (often ranging from one week to a few weeks). Online metrics of the two systems are compared. => The one with higher metric values wins.
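A toy illustration of these mechanics (not from the paper): deterministic user bucketing plus a significance test on a per-user metric.

```python
import hashlib
from scipy import stats

def bucket(user_id, experiment_id="ranker_ab"):
    """Deterministically assign a user to control or treatment (illustrative scheme)."""
    h = hashlib.md5(f"{experiment_id}:{user_id}".encode()).hexdigest()
    return "treatment" if int(h, 16) % 2 else "control"

def compare(control_metrics, treatment_metrics):
    """Welch's t-test on per-user metric values from the two groups."""
    t, p = stats.ttest_ind(treatment_metrics, control_metrics, equal_var=False)
    return {"t": t, "p_value": p}   # small p and positive t favor the treatment
```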

Limitations. The search log is not exploratory enough to cover all possible rankings for any query. Another source of bias in our approach is potential confounding in the data. => Randomized data collection can remove these two biases, and hence should be attempted whenever possible.

FUZZY MATCHING. Fuzzy matching searches a translation-memory database for a query's phrases or words, finding derivatives by suggesting words that approximately match in meaning as well as spelling. The technique applies a matching percentage: the database returns possible matches for the queried word that score between a threshold percentage and 100 percent.
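Applied to the setting of the previous slides, a sketch of fuzzy matching on the top-3 results; the similarity function and threshold below are my illustrative choices, not the paper's.

```python
def serp_similarity(results_a, results_b, k=3):
    """Fraction of top-k positions whose URLs agree (a simple similarity choice)."""
    a, b = results_a[:k], results_b[:k]
    return sum(x == y for x, y in zip(a, b)) / k

def fuzzy_offline_metric_estimate(log_impressions, new_ranker, threshold=0.67, k=3):
    """Like the exact-matching sketch earlier, but accept logged SERPs whose
    top-k similarity to the new ranking is at least `threshold` (0-1)."""
    matched = [
        imp["metric"]
        for imp in log_impressions
        if serp_similarity(new_ranker(imp["query"]), imp["results"], k) >= threshold
    ]
    return sum(matched) / len(matched) if matched else None
```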

Scatterplot [figure not included in transcript]

Conclusion. The fuzzy-matching technique based on the Top-3 results can predict actual metric values. Predicting the outcome of an experiment between control and treatment ranking techniques achieves reasonably high accuracy of up to 58.5%, significantly higher than chance (33.3%).