Predicting User Interests from Contextual Information

Slides:



Advertisements
Similar presentations
Beliefs & Biases in Web Search
Advertisements

Enhancing Personalized Search by Mining and Modeling Task Behavior
Google News Personalization: Scalable Online Collaborative Filtering
Learning to Suggest: A Machine Learning Framework for Ranking Query Suggestions Date: 2013/02/18 Author: Umut Ozertem, Olivier Chapelle, Pinar Donmez,
Using the Self Service BMC Helpdesk
Struggling or Exploring? Disambiguating Long Search Sessions
Improvements and extras Paul Thomas CSIRO. Overview of the lectures 1.Introduction to information retrieval (IR) 2.Ranked retrieval 3.Probabilistic retrieval.
Temporal Query Log Profiling to Improve Web Search Ranking Alexander Kotov (UIUC) Pranam Kolari, Yi Chang (Yahoo!) Lei Duan (Microsoft)
Advanced Web Metrics with Google Analytics By: Carley Brown.
Clickstream analysis - data collection, preprocessing and mining using the LISp-Miner system Effective placement of on-line advertisments Tomáš Kliegr.
22 nd User Modeling, Adaptation and Personalization (UMAP 2014) Time-Sensitive User Profile for Optimizing Search Personalization Ameni Kacem, Mohand Boughanem,
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Beliefs & Biases in Web Search Ryen White Microsoft Research
Studies of the Onset & Persistence of Medical Concerns in Search Logs Ryen White and Eric Horvitz Microsoft Research, Redmond
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites from User Activity Data Misha Bilenko and Ryen White presented by Matt Richardson.
Context-aware Query Suggestion by Mining Click-through and Session Data Authors: H. Cao et.al KDD 08 Presented by Shize Su 1.
Context-Aware Query Classification Huanhuan Cao 1, Derek Hao Hu 2, Dou Shen 3, Daxin Jiang 4, Jian-Tao Sun 4, Enhong Chen 1 and Qiang Yang 2 1 University.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
Ryen W. White, Microsoft Research Jeff Huang, University of Washington.
Mobile Web Search Personalization Kapil Goenka. Outline Introduction & Background Methodology Evaluation Future Work Conclusion.
Ryen White, Susan Dumais, Jaime Teevan Microsoft Research {ryenw, sdumais,
Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.
J. Chen, O. R. Zaiane and R. Goebel An Unsupervised Approach to Cluster Web Search Results based on Word Sense Communities.
Recognizing User Interest and Document Value from Reading and Organizing Activities in Document Triage Rajiv Badi, Soonil Bae, J. Michael Moore, Konstantinos.
University of Kansas Department of Electrical Engineering and Computer Science Dr. Susan Gauch April 2005 I T T C Dr. Susan Gauch Personalized Search Based.
1 Today  Tools (Yves)  Efficient Web Browsing on Hand Held Devices (Shrenik)  Web Page Summarization using Click- through Data (Kathy)  On the Summarization.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Cohort Modeling for Enhanced Personalized Search Jinyun YanWei ChuRyen White Rutgers University Microsoft BingMicrosoft Research.
1/16 Final project: Web Page Classification By: Xiaodong Wang Yanhua Wang Haitang Wang University of Cincinnati.
Information Re-Retrieval Repeat Queries in Yahoo’s Logs Jaime Teevan (MSR), Eytan Adar (UW), Rosie Jones and Mike Potts (Yahoo) Presented by Hugo Zaragoza.
TwitterSearch : A Comparison of Microblog Search and Web Search
From Devices to People: Attribution of Search Activity in Multi-User Settings Ryen White, Ahmed Hassan, Adish Singla, Eric Horvitz Microsoft Research,
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
User Browsing Graph: Structure, Evolution and Application Yiqun Liu, Yijiang Jin, Min Zhang, Shaoping Ma, Liyun Ru State Key Lab of Intelligent Technology.
Understanding and Predicting Graded Search Satisfaction Tang Yuk Yu 1.
Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Hao Wu Nov Outline Introduction Related Work Experiment Methods Results Conclusions & Next Steps.
Understanding and Predicting Personal Navigation Date : 2012/4/16 Source : WSDM 11 Speaker : Chiu, I- Chih Advisor : Dr. Koh Jia-ling 1.
Improving Cloaking Detection Using Search Query Popularity and Monetizability Kumar Chellapilla and David M Chickering Live Labs, Microsoft.
Personalizing Search on Shared Devices Ryen White and Ahmed Hassan Awadallah Microsoft Research, USA Contact:
Ryen W. White, Dan Morris Microsoft Research, Redmond, USA {ryenw,
Information Retrieval Effectiveness of Folksonomies on the World Wide Web P. Jason Morrison.
Analysis of Topic Dynamics in Web Search Xuehua Shen (University of Illinois) Susan Dumais (Microsoft Research) Eric Horvitz (Microsoft Research) WWW 2005.
Personalization with user’s local data Personalizing Search via Automated Analysis of Interests and Activities 1 Sungjick Lee Department of Electrical.
Qi Guo Emory University Ryen White, Susan Dumais, Jue Wang, Blake Anderson Microsoft Presented by Tetsuya Sakai, Microsoft Research.
Adish Singla, Microsoft Bing Ryen W. White, Microsoft Research Jeff Huang, University of Washington.
Post-Ranking query suggestion by diversifying search Chao Wang.
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
Context-Aware Query Classification Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang Microsoft Research Asia SIGIR.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Predicting User Interests from Contextual Information R. W. White, P. Bailey, L. Chen Microsoft (SIGIR 2009) Presenter : Jae-won Lee.
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
Usefulness of Quality Click- through Data for Training Craig Macdonald, ladh Ounis Department of Computing Science University of Glasgow, Scotland, UK.
1 Personalizing Search via Automated Analysis of Interests and Activities Jaime Teevan, MIT Susan T. Dumais, Microsoft Eric Horvitz, Microsoft SIGIR 2005.
Vertical Search for Courses of UIUC Homepage Classification The aim of the Course Search project is to construct a database of UIUC courses across all.
SEARCH AND CONTEXT Susan Dumais, Microsoft Research INFO 320.
Using ODP Metadata to Personalize Search Presented by Lan Nie 09/21/2005, Lehigh University.
User Modeling for Personal Assistant
Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs Ryen W. White1 Jeff Huang2 1Microsoft Research 1University of Washington.
Personalizing Search on Shared Devices
Ryen White, Ahmed Hassan, Adish Singla, Eric Horvitz
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
Using Link Information to Enhance Web Page Classification
Presentation transcript:

Predicting User Interests from Contextual Information Ryen W. White, Peter Bailey, Liwei Chen Microsoft Corporation

Motivation Information behavior is embedded in external context Context motivates the problem, influences interaction IR community theorized about context Context sensitive search, user studies of search context User interest models can enhance post-query behavior & general browsing by leveraging contextual info. e.g., personalization, information filtering, etc. Little is known about the value of different contextual sources for user interest modeling

Overview A systematic, log-based study of five contextual sources for user interest modeling during Web interaction Assume user has browsed to URL Evaluate the predictive value of five contexts of URL: Interaction: recent interactions preceding URL Collection: pages that link to URL Task: pages sharing search engine queries with URL Historic: long term interests of current user Social: combined long-term interests those who visit URL Domain is website recommendation not search results

Data Sources Anonymized URLs visited by users of a widely-distributed browser toolbar 4 months of logs (Aug 08 – Nov 08 inclusive): Past: Aug-Sep used to create user histories Present: Oct-Nov used for current behavior and future interests 250K users randomly selected from a larger user pool once most active users (top 1%) were removed Chosen users with at least 100 page visits in Past

Trails and Terminal URLs From logs we extracted millions of browse trails Temporally-ordered sequence of URLs comprising all pages visited by a user per Web browser instance Terminate with 30-minute inactivity timeout A set of 5M terminal URLs (ut) obtained by randomly-sampling all URLs in the trails Terminal URLs demarcate past and future events Task = Learn user interest models from contexts for ut, use those models to predict future user interests

Building User Interest Models Classified context URLs in the Open Directory Project human-edited Web directory (ODP, dmoz.org) Automatically assigned category labels via URL match URL back-off used if no exact match obtained Represent interests as list of ODP category labels Labels ranked in descending order by frequency For example, for a British golf enthusiast, the top of their user interest profile might resemble: ODP Category Labels Frequency Sports/Golf/Courses/Europe/United Kingdom 102 Sports/Golf/Driving Ranges 86 Sports/Golf/Instruction/Golf Schools 63

Selecting Contexts Ingwersen and Järvelin (2005) developed nested model of context stratification representing main contextual influences on people engaged in information behavior Dimensions used Others challenging to model via logs e.g., cognitive and affective state, infra- structures, etc.

Defining Contexts None (ut only): Interest model for terminal URL Interaction (ut-5 … ut-1): Interest model for five Web pages immediately preceding ut Task: Interest model for pages encountered during the same or similar tasks Walk on search engine click graph from ut to queries and then back out to pages

Defining Contexts Collection: Interest model pages linking to ut We obtained a set of in-links for each ut from a search engine index, built model from pages linking to ut Historic: Interest model for each user based on their long-term Web page visit history Social: Interest model from combination of the historic contexts of users that also visit ut What is the effectiveness of different context sources for user interest modeling?

Methodology Found instances of ut in Present set (Oct-Nov 08 logs) Used all actions after ut as source of future behavior Futures specific to each user and each ut Used to gauge predictive value of each context Created three interest models representing future interests (ranked list of ODP labels & frequencies): Short: within one hour of ut Medium: within one day of ut Long: within one week of ut Filtered {ut} to help ensure experimental integrity e.g., no more than 10 ut per user

Methodology Divided filtered {ut} into 10 equally-sized runs Each run contained at most one ut from each user Experimental procedure: For each ut in each run: Build ground truth for short-, medium-, and long-term future interest models Build interest models for different contexts (and combinations) Determine predictive accuracy of each model Used five measures to determine prediction accuracy P@1, P@3, Mean Reciprocal Rank, nDCG, and F1 F1 tracked well with others - focus on that here

Findings – Context comparison Predictive performance of contextual sources for different futures Interaction context & Task context most predictive of short-term interests Task context most predictive of medium-term interests Historic context most predictive of long-term interests

Findings – Handling near misses Near miss between prediction and ground truth regarded as total miss Use one/two/three-level back-off on both ground truth and prediction Back-off to top two ODP levels No back-off

Findings – Improved confidence Basing predictions & ground truth on small # page visits may skew results Repeat experiment & ignore labels based on < 5 page visits Predicted and ground truth labels based on ≥ 5 pages No back-off

Findings – Combining contexts Rank Short Medium Long Sources F1 score 1 n, i, t, h, s, c 0.72** 0.53** n, i, t, s, h, c 0.45** 2 n, i, s, h, c 0.71** n, i, t, h, c 0.52** 0.43** 3 n, i, t 0.49** 0.43* 4 n, i, h, c 0.48* s, h 5 n, i, s, t, c 0.69** n, i, h, t n, i, s, h, t 0.42* Overlap beats single contextual sources Key contexts still important Short = Interaction (i) and Task (t) Medium = Task (t) Long = Historic (h) Supports polyrepresentation theory (Ingwersen, 1994) Overlap between sources boosts predictive accuracy

Summary of Findings Performance of context dependent on distance between ut and end of prediction window Short-term interests predicted by task/interaction contexts Topical interest may not be highly dynamic, even if queries and information needs are Medium-term interests best predicted by task context More likely to include task variants appearing in next day Long-term interests predicted by historic/social contexts Interest may be invariant over time, users visiting same pages may have similar interests Overlap effective - many contexts reinforce key interests

Conclusions and Take-away Systematic study of context for user interest modeling Studied predictive value of five context sources Value varied with duration of prediction Short: interaction/task, Medium: task, Long: historic/social Overlap was more effective than any individual source Source must be tailored to modeling task Search/recommendation systems should not treat all contextual sources equally Weights should be assigned to each source based on the nature of the prediction task