Personalizing Search on Shared Devices
Ryen White and Ahmed Hassan Awadallah
Microsoft Research, USA

Shared Device Search
2011 Census: 75% of U.S. households have a computer
In most homes that machine is shared among multiple people
Search engines use machine identifiers based on cookies, device IDs, etc., which assumes a 1:1 mapping from identifiers to people for analysis and personalization
[Figure: shared devices in households]
Attributing activity to people (not machines) may improve personalization
Some early indications of effectiveness in prior work (Singla et al., 2014)

Is Shared Device Searching Common?
Analyzed comScore search data (all engines, en-US), which contains both machine identifiers and person identifiers (users self-identify)
Over six months, 66% of machine identifiers were multi-user
The multi-user fraction varies with the profile window: 6 months = 66%, 3 months = 57%, 1 month = 44%
Aside: within-session shared device search is less common; 97% of sessions involve a single user
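
As an illustration, here is a minimal sketch of how the multi-user fraction might be computed from such logs; the pandas layout and the machine_id/person_id column names are assumptions for this example, not comScore's actual schema.

```python
import pandas as pd

# Hypothetical log: one row per query, with machine and (self-identified) person IDs.
log = pd.DataFrame({
    "machine_id": ["m1", "m1", "m1", "m2", "m2", "m3"],
    "person_id":  ["p1", "p2", "p1", "p3", "p3", "p4"],
})

# A machine is "multi-user" if more than one distinct person searched on it.
users_per_machine = log.groupby("machine_id")["person_id"].nunique()
multi_user_fraction = (users_per_machine > 1).mean()
print(f"Multi-user machines: {multi_user_fraction:.0%}")  # -> 33% on this toy data
```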

Handling Shared Device Use
Current search-engine solutions are limited (user sign-in)
However: signing in requires user effort, and people rarely sign out, so their signals get mixed
Some solutions exist in other domains, e.g., user profiles in streaming media services
Ideally attribution would happen automatically, without the user needing to explicitly log in
Search activity attribution methods can help with this …

Activity Attribution Challenge
Given a stream of data from a machine identifier, attribute observed historic and new behavior to the correct person
Related work in signal processing and fraud detection
Applications: personalization, advertising, privacy protection
Question: what is the upper bound on the gain from attribution-based methods? We perform an ORACLE study with perfect knowledge of who is searching
[Figure: a new query arrives; the history of search activity on the machine is partitioned into k user clusters, and the system must decide which user issued the query]
"From devices to people: Attribution of search activity in multi-user settings", White et al., WWW 2014
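
The deck leaves the attribution algorithm abstract (the oracle study assumes it is perfect), but a toy sketch of cluster-based attribution in the spirit of the diagram might look like this; the ODP count-vector features and KMeans are illustrative assumptions, not the method of White et al. (2014).

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is one unit of historic activity on the machine, represented as
# counts over ODP top-level categories (here: Sports, Arts, Computers).
history = np.array([
    [3, 0, 1],   # mostly Sports
    [4, 1, 0],
    [0, 5, 1],   # mostly Arts
    [1, 4, 0],
])

# Partition the machine's history into k putative searchers.
k = 2
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit(history)

# Attribute newly observed activity to the nearest cluster (putative user).
new_activity = np.array([[0, 4, 1]])
user = clusters.predict(new_activity)[0]
print(f"Attributed to putative user {user}")
```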

Key Contributions
1. Introduce attribution-based personalization (ABP) and estimate its value in an ORACLE study (perfect knowledge of who is searching)
2. Show that the machine vs. person distinction is meaningful for an important application: predicting searchers' future interests
3. Identify the properties of interest models and queries for which ABP works best
4. Learn a model to predict when to apply ABP on a per-query basis

Attribution-Based Personalization (ABP)
Three phases (see the sketch below):
1. Activity attribution and interest-model construction for individuals from historic activity
2. Attribution of newly observed activity to the correct searcher
3. Application of that searcher's specific interest model for personalization
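
A minimal runnable sketch of the three phases under the oracle assumption (user labels on both historic and new activity are known, as in the paper's oracle study); the function names and toy data are illustrative only.

```python
from collections import Counter

def build_model(categories):
    """Normalized distribution over ODP categories (a simple interest model)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def abp_pipeline(machine_history, new_query, oracle_user):
    # Phase 1: attribute historic activity to individuals (oracle labels here)
    # and build one interest model per searcher.
    per_user = {}
    for user, category in machine_history:
        per_user.setdefault(user, []).append(category)
    models = {u: build_model(cats) for u, cats in per_user.items()}
    # Phase 2: attribute the new activity to a searcher; the oracle study
    # assumes this step is perfect, so we use the known label.
    user = oracle_user
    # Phase 3: personalize using that searcher's model only, e.g., predict
    # the most likely ODP category of the next click.
    return max(models[user], key=models[user].get)

history = [("A", "Sports/Tennis"), ("A", "Sports/Tennis"), ("B", "Arts/Movies")]
print(abp_pipeline(history, "tennis rackets", oracle_user="A"))  # Sports/Tennis
```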

Building Interest Models
Build machine- and person-level interest profiles over the ODP category hierarchy, using result clicks
Category distributions can differ between people and machines, e.g., Sports/Tennis is the largest category for the machine, but is highest for only one searcher (B)
Some topics have broad appeal, e.g., all searchers are interested in movies
So individualized models could matter; the question is how much, and when they matter most and least (a small sketch follows)
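
A small sketch contrasting the machine-level profile with per-person profiles built from the same clicks; the toy click data and category labels are invented for illustration.

```python
from collections import Counter

def interest_model(categories):
    """Normalized distribution over ODP categories from result clicks."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

# Toy result clicks on one machine, tagged with (person, ODP category).
clicks = [("A", "Sports/Tennis"), ("A", "Sports/Tennis"), ("A", "Arts/Movies"),
          ("B", "Computers"),     ("B", "Arts/Movies")]

machine_model = interest_model(cat for _, cat in clicks)
person_models = {p: interest_model(c for q, c in clicks if q == p)
                 for p in {p for p, _ in clicks}}
print(machine_model)   # the machine blends everyone: Sports/Tennis dominates
print(person_models)   # but only searcher A is actually interested in tennis
```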

Dataset
Two years of comScore logs, divided into two subsets:
Model building: 6 months of comScore search logs (Jan 2013 - Jun 2013)
Evaluation: the 1 month immediately following, to evaluate predictions (Jul 2013)
Result clicks from each person/machine are used to construct interest models
Machine click thresholds: at least 100 clicks in the model-building period, at least 15 clicks in the evaluation period
[Figure: timeline per machine or person; a 6-month interest-model-building period followed by a 1-month evaluation period]
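
A sketch of how these click thresholds might be applied, assuming a pandas click log with hypothetical machine_id and period columns (one row per result click).

```python
import pandas as pd

# Hypothetical click log: m1 has 110 build-period and 15 eval-period clicks;
# m2 has 16 and 4, failing both thresholds.
clicks = pd.DataFrame({
    "machine_id": ["m1"] * 125 + ["m2"] * 20,
    "period": ["build"] * 110 + ["eval"] * 15 + ["build"] * 16 + ["eval"] * 4,
})

counts = clicks.pivot_table(index="machine_id", columns="period",
                            aggfunc="size", fill_value=0)
keep = counts[(counts["build"] >= 100) & (counts["eval"] >= 15)].index
print(list(keep))  # -> ['m1']
```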

Prediction Task
Given a query and an interest model, predict the ODP categories of the next click
Vary identifier type and match type:
Identifier type: machine-based or person-based
Match type: all historic activity, or on-task activity only
On-task historic activity: clicks associated with historic queries that share at least one non-stopword term with the current query (see the sketch below)
On-task models more accurately reflect the state of the art in personalization (Bennett et al., SIGIR 2012; Teevan et al., WSDM 2011)
The four conditions:
                   Machine-based   Person-based
All activity           (a)             (b)
On-task activity       (c)             (d)
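
A minimal sketch of the on-task filter (non-stopword term overlap); the stopword list here is an illustrative stub.

```python
STOPWORDS = {"the", "a", "an", "of", "for", "to", "in"}

def content_terms(query):
    """Lowercased query terms with stopwords removed."""
    return {t for t in query.lower().split() if t not in STOPWORDS}

def on_task(history_queries, current_query):
    """Historic queries sharing >= 1 non-stopword term with the current query."""
    cur = content_terms(current_query)
    return [q for q in history_queries if content_terms(q) & cur]

print(on_task(["tennis scores", "the movie times"], "wimbledon tennis schedule"))
# -> ['tennis scores']
```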

Prediction Task: Evaluation Metrics
Precision (P): Does the top predicted label match the actual label (1 or 0)?
Recall (R): Does the actual label appear anywhere in the predictions (1 or 0)?
F1 score: Harmonic mean of P and R
Reciprocal Rank (RR): If the actual label appears in the ranked predictions at rank r, the score is 1/r; otherwise 0
All metrics are averaged over all queries in the evaluation dataset (see the sketch below)
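
A sketch of the metrics as described, assuming per-query binary P and R averaged over queries, with F1 taken as the harmonic mean of the averages (the slide does not specify whether F1 is computed per query or in aggregate).

```python
def precision_at_1(preds, actual):   # top predicted label matches actual (0/1)
    return int(preds[0] == actual)

def recall(preds, actual):           # actual label appears anywhere in predictions
    return int(actual in preds)

def reciprocal_rank(preds, actual):  # 1/r if actual appears at rank r, else 0
    return next((1.0 / (i + 1) for i, p in enumerate(preds) if p == actual), 0.0)

# Toy evaluation set: (ranked predicted labels, actual label of next click).
queries = [(["Sports/Tennis", "Arts/Movies"], "Sports/Tennis"),
           (["Arts/Movies", "Computers"], "Computers")]
P = sum(precision_at_1(p, a) for p, a in queries) / len(queries)
R = sum(recall(p, a) for p, a in queries) / len(queries)
F1 = 2 * P * R / (P + R) if P + R else 0.0
RR = sum(reciprocal_rank(p, a) for p, a in queries) / len(queries)
print(P, R, F1, RR)  # -> 0.5 1.0 0.666... 0.75
```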

Evaluation Method
Given our evaluation set E, where each entry is {timestamp, machine identifier, person identifier, query, {result clicks}}:
For each identifier type in {machine, person} and each match type in {all, on-task}:
  For each query q in E:
    If match type = all: obtain all historic queries from the searcher (person) or machine from the model-building dataset
    If match type = on-task: find all historic queries from the searcher or machine with >= 1 non-stopword term in common with q in the model-building data
    Get the clicked results for each of those queries and assign ODP categories to the clicked results
    Build an interest model M comprising the normalized distribution of ODP categories from that assignment
    Select the top-weighted predicted label in M, denoted l1
    Compute the effectiveness of the method relative to the ground truth (the ODP categories of q's actual result clicks)
  Average the metric values across all q in E to compute the overall performance metrics for that condition
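
A runnable miniature of this loop over toy data, covering all four conditions for a single evaluation query; the record layout and helper names are assumptions, not the paper's implementation.

```python
from collections import Counter

STOP = {"the", "a", "an", "of", "for", "to", "in"}

def terms(q):
    return {t for t in q.lower().split() if t not in STOP}

def top_category(history_rows, query, match):
    """Build an interest model from matching rows; return the top-weighted label."""
    if match == "on-task":
        history_rows = [r for r in history_rows if terms(r["query"]) & terms(query)]
    counts = Counter(cat for r in history_rows for cat in r["click_cats"])
    return counts.most_common(1)[0][0] if counts else None

# Toy model-building data and one evaluation query with its ground truth.
history = [
    {"person": "A", "query": "tennis scores", "click_cats": ["Sports/Tennis"]},
    {"person": "B", "query": "movie times",   "click_cats": ["Arts/Movies"]},
]
eval_row = {"person": "A", "query": "tennis schedule", "click_cats": ["Sports/Tennis"]}

for ident in ("machine", "person"):
    rows = history if ident == "machine" else \
           [r for r in history if r["person"] == eval_row["person"]]
    for match in ("all", "on-task"):
        pred = top_category(rows, eval_row["query"], match)
        p_at_1 = int(pred in eval_row["click_cats"])
        print(ident, match, pred, p_at_1)
```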

Prediction Results
Recall is slightly higher for machine-based models, since machine-based models are a superset of the person-based models
We focus on machines with 2+ users in the rest of the analysis
Shared-device search activity can be attributed to individuals accurately (White et al., WWW 2014)
Machine-based models are our baselines for each of the two match types
Person-based models yield gains in precision, F1, and RR of 11-15% in overall performance; gains for on-task performance are larger (see summary: 11-19% in F1, depending on match type)
We focus on F1 for the remainder of the analysis

Impact of Additional Factors
Properties of the interest models and the query can influence the utility of ABP
Model properties:
- Model entropy: entropy of the interest model (low, medium, high; see the sketch after this list)
- Relative model size: fraction of the machine-based model
- Number of searchers on the machine
Query properties:
- Click entropy: diversity of clicks on the query (low, medium, high)
- Popularity: frequency of the query (low, medium, high)
- Topic: top-level ODP category
We focus on the two highlighted factors, model entropy and query topic (see the paper for the rest)
We control for task effects by focusing on the on-task model variants
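
A sketch of the model-entropy computation on normalized interest models; Shannon entropy in bits is an assumption here, as the deck does not specify the base or the binning into low/medium/high.

```python
import math

def model_entropy(model):
    """Shannon entropy (bits) of a normalized ODP category distribution."""
    return -sum(p * math.log2(p) for p in model.values() if p > 0)

focused = {"Sports/Tennis": 0.9, "Arts/Movies": 0.1}
diverse = {"Sports/Tennis": 0.25, "Arts/Movies": 0.25,
           "Computers": 0.25, "Home": 0.25}
print(model_entropy(focused), model_entropy(diverse))  # ~0.47 vs 2.0 bits
```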

Impact of Additional Factors (cont.)
Model entropy: diversity of the category model built from the machine's activity
Query topic: top-level ODP category of the top result for the query
When the machine-based model is more diverse, person-based methods perform better, i.e., there is more benefit from focusing on the individual
Gains differ based on features of the models and the queries, e.g., topics in which only a few specific users are interested vs. topics where interest is broad across searchers

Applying Model and Query Properties
Train a model to learn when to apply ABP on a per-query basis
Featurized properties of the model and the query, based on the additional factors above
130k evaluation queries from 2.5k people (1k machines)
6mo/1mo build/test split, MART-based classifier, 10-fold CV, 100 runs; compute F1
Labels: positive = ABP beats machine-level, negative = ABP <= machine-level
Features (a stand-in classifier sketch follows):
- MachineModelEntropy: entropy of the interest model constructed from machine activity
- RelativeModelSize: fraction of the machine interest model occupied by classified historic clicks
- NumberOfSearchers: number of distinct searchers on the machine
- QueryClickEntropy: click entropy for the query
- QueryPopularity: query frequency, computed from held-out Bing search log data
- QueryTopic: top-level ODP category of the query
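
The deck specifies a MART-based classifier; as a stand-in sketch, scikit-learn's gradient-boosted trees over the six features might look like this (the feature values and labels below are synthetic, not the paper's data).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Columns: MachineModelEntropy, RelativeModelSize, NumberOfSearchers,
#          QueryClickEntropy, QueryPopularity, QueryTopic (integer-encoded).
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = (X[:, 0] > 0.5).astype(int)  # synthetic label: does ABP beat the baseline?

clf = GradientBoostingClassifier()            # MART-style boosted trees
scores = cross_val_score(clf, X, y, cv=10)    # 10-fold CV, as in the deck
print(f"Mean CV accuracy: {scores.mean():.3f}")
```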

Selective Application of ABP
Which model is best per query? ABP best for 21% of queries, the baseline best for 9%, tied for 70%
Predicting which model is best: strong predictive performance (accuracy = 0.918, vs. a marginal baseline of 0.791)
Top features: MachineModelEntropy (max weight), RelativeModelSize (0.699 of max), QueryTopic (0.441 of max)
Applying the prediction in personalization: selective ABP attains 88-96% of oracle performance, much better than always applying ABP
This demonstrates the benefit of intelligently applying ABP for each query
[Figure: F1 for always applying ABP vs. selectively applying the predicted-best model]

Discussion
Shared device searching is common
The oracle study showed clear utility from ABP
We focused on click prediction; other applications need to be examined
Performance with automated (non-oracle) ABP methods needs to be evaluated
Alternative self-identification methods need to be examined (e.g., sign-in)
Would a closer link between people and devices itself change shared-device usage?

Summary and Takeaways
Introduced attribution-based personalization (ABP) and performed an oracle study
Observed increased accuracy in future-interest prediction (11-19% in F1, depending on match type) by applying this approach
Gains vary by model and query properties, motivating selective application of the method
Significant opportunities to enhance personalization via tailored models
Future work: more (non-oracle) studies with different ABP methods; ABP for truly personalized ranking and recommendation at scale

Shared Device Searching May Matter
Entropy of the machine-based interest model increases with the number of searchers on the machine
Different searchers have different distributions of topical categories and interests

Shared Device Searching: Distribution
Distribution of query share across users on a machine: generally there is one dominant searcher (44-83% of queries)
The dominant searcher's share decreases as more users share the machine, but they remain by far the most active, alongside many less active searchers