Implicit User Modeling for Personalized Search Xuehua Shen, Bin Tan, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

2 Current Search Engines are Mostly Document-Centered… [diagram: many users send queries to one search engine over a shared collection of documents] Search is generally non-personalized…

3 Example of Non-Personalized Search (as of Oct. 17, 2005): for the query “jaguar”, results about the car, the software, and the animal are mixed together. Without knowing more about the user, it’s hard to optimize…

4 Therefore, personalization is necessary to improve existing search engines. However, many questions need to be answered…

5 Research Questions Client-side or server-side personalization? Implicit or explicit user modeling? What’s a good retrieval framework for personalized search? How to evaluate personalized search? …

6 Client-Side vs. Server-Side Personalization. So far, personalization has mostly been done on the server side. We emphasize client-side personalization, which has three advantages:
–More information about the user, thus more accurate user modeling (complete interaction history + other user activities)
–More scalable (“distributed personalization”)
–Alleviates privacy concerns (the user’s data stays on the client)

7 Implicit vs. Explicit User Modeling
Explicit user modeling
–More accurate, but users generally don’t want to provide additional information
–E.g., relevance feedback
Implicit user modeling
–Less accurate, but no extra effort for users
–E.g., implicit feedback
We emphasize implicit user modeling

8 “Jaguar” Example Revisited. Suppose we know:
1. Previous query = “racing cars”
2. “car” occurs far more frequently than “Apple” in pages browsed by the user in the last 20 days
3. User just viewed an “Apple OS” document
All this information is naturally available to an IR system

9 Remaining Research Questions Client-side or server-side personalization? Implicit or explicit user modeling? What’s a good retrieval framework for personalized search? How to evaluate personalized search? …

10 Outline A decision-theoretic framework UCAIR personalized search agent Evaluation of UCAIR

11 Implicit user information exists in the user’s interaction history. We thus need to develop a retrieval framework for interactive retrieval…

12 Modeling Interactive IR. Model interactive IR as an “action dialog”: cycles of user action (A_i) and system response (R_i):
–Submit a new query → Retrieve new documents
–View a document → Present the selected document; rerank unseen documents
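
To make the dialog concrete, here is a minimal Python sketch of the (A_i, R_i) cycle as data; the class and field names are illustrative, not from the slides:

```python
from dataclasses import dataclass, field

@dataclass
class Interaction:
    """One cycle of the 'action dialog': user action A_i and system response R_i."""
    action: str          # e.g. "query:jaguar", "view:doc42", "next_page"
    response: list[str]  # ranked document ids the system returned

@dataclass
class History:
    """H = {(A_i, R_i)}, i = 1..t-1, the raw material for implicit user modeling."""
    cycles: list[Interaction] = field(default_factory=list)

    def record(self, action: str, response: list[str]) -> None:
        self.cycles.append(Interaction(action, response))
```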

13 Retrieval Decisions. Given the user U, the document collection C, the current action A_t, and the history H = {(A_i, R_i)}, i = 1, …, t−1, choose the best response R_t ∈ r(A_t), the set of all possible responses to A_t. Examples: if A_t = query “Jaguar”, r(A_t) = all possible rankings of C and R_t should be the best ranking for the query; if A_t = click on the “Next” button, r(A_t) = all possible rankings of the unseen docs and R_t should be the best ranking of the unseen docs.

14 Decision-Theoretic Framework. Observed: user U, interaction history H, current user action A_t, document collection C. Inferred: user model M = (S, θ_U, …), where S = seen docs and θ_U = information need. Given the possible responses r(A_t) = {r_1, …, r_n} and a loss function L(r_i, A_t, M), the optimal response R_t is the one with minimum expected risk.
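
Written out in the slide’s notation (a reconstruction of the formula the diagram encodes; the integral form is the standard Bayesian expected risk):

```latex
R_t = \arg\min_{r \in r(A_t)} \int_{\mathcal{M}} L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM
```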

15 A Simplified Two-Step Decision-Making Procedure. Approximate the expected risk by the loss at the mode of the posterior distribution:
–Step 1: Compute an updated user model M* based on the currently available information
–Step 2: Given M*, choose a response to minimize the loss function
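
In symbols, the two steps replace the integral above with its mode (again a reconstruction in the slide’s notation):

```latex
M^* = \arg\max_{M} P(M \mid U, H, A_t, C), \qquad
R_t = \arg\min_{r \in r(A_t)} L(r, A_t, M^*)
```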

16 Optimal Interactive Retrieval [diagram: for each user action A_1, A_2, A_3, …, the IR system infers M*_i from P(M_i | U, H, A_i, C) over the collection C, and returns the response R_i minimizing L(r, A_i, M*_i)]

17 Refinement of the Decision-Theoretic Framework
r(A_t): decision space (depends on A_t)
–r(A_t) = all possible rankings of docs in C
–r(A_t) = all possible rankings of unseen docs
M: user model
–Essential component: θ_U = user information need
–S = seen documents
L(r_i, A_t, M): loss function
–Generally measures the utility of r_i for a user modeled as M
P(M | U, H, A_t, C): user model inference
–Often involves estimating θ_U

18 Case 1: Non-Personalized Retrieval
–A_t = “enter a query Q”
–r(A_t) = all possible rankings of docs in C
–M = θ_U, a unigram language model (word distribution)
–p(M | U, H, A_t, C) = p(θ_U | Q)

19 Case 2: Implicit Feedback for Retrieval (implicit user modeling)
–A_t = “enter a query Q”
–r(A_t) = all possible rankings of docs in C
–M = θ_U, a unigram language model (word distribution)
–H = {previous queries} + {viewed snippets}
–p(M | U, H, A_t, C) = p(θ_U | Q, H)
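
The slides do not spell out how p(θ_U | Q, H) is computed; one simple instantiation is fixed-coefficient interpolation of a query language model with a history language model. A sketch (function names and the α value are illustrative, not from the slides):

```python
from collections import Counter

def unigram_lm(texts):
    """Maximum-likelihood unigram language model over a list of strings."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(query_lm, history_lm, alpha=0.5):
    """theta_U as a fixed-coefficient mix of the query model and the history
    model -- one simple way to realize p(theta_U | Q, H)."""
    vocab = set(query_lm) | set(history_lm)
    return {w: alpha * query_lm.get(w, 0.0) + (1 - alpha) * history_lm.get(w, 0.0)
            for w in vocab}

# Example: the current query plus previously viewed snippets
theta_u = interpolate(unigram_lm(["jaguar"]),
                      unigram_lm(["racing cars", "jaguar car reviews"]))
```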

20 Case 3: More General Personalized Search with Implicit Feedback (eager feedback)
–A_t = “enter a query Q”, the “Back” button, or the “Next” link
–r(A_t) = all possible rankings of unseen docs in C
–M = (θ_U, S), S = seen documents
–H = {previous queries} + {viewed snippets}
–p(M | U, H, A_t, C) = p(θ_U | Q, H)

21 Benefit of the Framework
Traditional view of IR
–Retrieval = matching a query against documents
–Insufficient for modeling personalized search (the user and the interaction history are not part of a retrieval model)
The new framework provides a map for systematic exploration of
–Methods for implicit user modeling
–Models for eager feedback
The framework also provides guidance on how to design a personalized search agent (optimizing responses to every user action)

22 The UCAIR Toolbar

23 UCAIR Toolbar Architecture [diagram: the user’s query, results, and clickthrough flow through the UCAIR agent, which sits between the user and a search engine (e.g., Google); its components are Query Modification, Result Re-Ranking, User Modeling, a Result Buffer, and a Search History Log (e.g., past queries, clicked results)]
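
A schematic of that control flow in Python; the callables stand in for the diagram’s components and are hypothetical, not UCAIR’s actual API:

```python
def ucair_search(query, history, engine, expand, build_model, rerank):
    """One pass through the UCAIR toolbar pipeline (schematic).
    history is a plain list logging (query, ranked) pairs in this sketch."""
    expanded = expand(query, history)   # Query Modification
    results = engine(expanded)          # backend search engine, e.g., Google
    model = build_model(history)        # User Modeling from the history log
    ranked = rerank(results, model)     # Result Re-Ranking on the client
    history.append((query, ranked))     # Search History Log
    return ranked                       # goes to the Result Buffer / user
```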

24 Decision-Theoretic View of UCAIR
User actions modeled:
–A_1 = submit a keyword query
–A_2 = click the “Back” button
–A_3 = click the “Next” link
System responses:
–r(A_i) = rankings of the unseen documents
History:
–H = {previous queries, clickthroughs}
User model M = (X, S):
–X = vector representation of the user’s information need
–S = documents already seen by the user

25 Decision-Theoretic View of UCAIR (cont.)
Loss functions:
–L(r, A_2, M) = L(r, A_3, M) → reranking with the vector space model
–L(r, A_1, M) ≈ L(q, A_1, M) → query expansion, favoring a good expanded query q
Implicit user model inference:
–X* = argmax_x p(x | Q, H), computed using Rocchio feedback over the vectors of seen snippets
–S* = all seen docs in H
Newer versions of UCAIR have adopted language models
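
A minimal sketch of the Rocchio update over sparse term-weight vectors (the α and β values are illustrative; UCAIR’s exact weights are not given on the slide):

```python
def rocchio(query_vec, snippet_vecs, alpha=1.0, beta=0.5):
    """X* = alpha * query + beta * centroid(vectors of seen snippets);
    positive feedback only, as described on the slide."""
    updated = {t: alpha * w for t, w in query_vec.items()}
    if not snippet_vecs:
        return updated
    for vec in snippet_vecs:
        for t, w in vec.items():
            updated[t] = updated.get(t, 0.0) + beta * w / len(snippet_vecs)
    return updated
```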

26 UCAIR in Action
In responding to a query:
–Decide the relationship of the current query to the previous query (based on result similarity)
–Possibly do query expansion using the previous query and results
–Return a ranked list of documents using the (expanded) query
In responding to a click on “Next” or “Back”:
–Compute an updated user model based on clickthroughs (using Rocchio)
–Rerank unseen documents (using a vector space model)
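
And the reranking step as cosine similarity in the vector space model, a sketch assuming the same sparse dict vectors as above:

```python
import math

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rerank_unseen(doc_vecs, user_vec, seen_ids):
    """Rank the documents the user has not yet seen by similarity
    to the updated user model X*."""
    unseen = [d for d in doc_vecs if d not in seen_ids]
    return sorted(unseen, key=lambda d: cosine(user_vec, doc_vecs[d]), reverse=True)
```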

27 Screenshot for Result Reranking

28 A User Study of Personalized Search. Six participants used the UCAIR toolbar to do web search. Topics were selected from the TREC web track and terabyte track. Participants explicitly evaluated the relevance of the top 30 search results from Google and UCAIR.

29 UCAIR Outperforms Google: Precision at N Docs [table: precision of Google vs. UCAIR at four rank cutoffs; UCAIR’s improvement over Google: 8.0%, 17.8%, 20.2%, 21.8%]. More user interactions → better user models → better retrieval accuracy

30 UCAIR Outperforms Google: PR Curve

31 Summary
–Proposed a decision-theoretic framework to model interactive IR
–Built a personalized search agent for web search
–Conducted a user study of web search showing that the UCAIR personalized search agent can improve retrieval accuracy

32 Thank you! The End

33 Current search engines are very useful, but far from optimal. For one thing, they don’t really know about you…

34 IR as Sequential Decision Making [diagram: the user (with an information need) and the system (with a model of that information need) alternate turns]
–A_1: Enter a query → system decides which documents to present and how → R_1: Present search results
–A_2: View a document → system decides which part of the document to show and what to do with the other documents → R_2: Present document content; rerank other documents
–A_3: Click on the “Next” button (more results to view) → …

35 Case 4: User-Specific Result Summary
–A_t = “enter a query Q”
–r(A_t) = {(D, σ)}, D ⊆ C, |D| = k, σ_i ∈ {“snippet”, “overview”}
–M = (θ_U, n), n ∈ {0, 1}: “topic is new to the user”
–p(M | U, H, A_t, C) = p(θ_U, n | Q, H), M* = (θ*, n*)
Loss table: σ_i = snippet costs 1 if n* = 1 and 0 if n* = 0; σ_i = overview costs 0 if n* = 1 and 1 if n* = 0
Choose the k most relevant docs. If the topic is new (n* = 1), give an overview summary; otherwise, a regular snippet summary
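
Reading the loss table off under the mode approximation gives the decision rule (a reconstruction, not verbatim from the slide):

```latex
\sigma_i^* =
\begin{cases}
\text{overview}, & n^* = 1 \ (\text{topic new to the user})\\
\text{snippet},  & n^* = 0
\end{cases}
```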

36 User Models. Components of user model M:
–User information need θ_U
–Viewed documents S
–User actions A_t and system responses R_{t−1}
–…

37 Loss Functions [equations shown on slide]: a loss function for result reranking and a loss function for query expansion

38 Implicit User Modeling
–Update the user’s information need given a new query
–Learn a better user model when the user skips the top n documents and views the (n+1)-th document

39 Case 1: Context-Insensitive IR
–A_t = “enter a query Q”
–r(A_t) = all possible rankings of docs in C
–M = θ_U, a unigram language model (word distribution)
–p(M | U, H, A_t, C) = p(θ_U | Q)

40 Case 2: Context-Sensitive IR
–A_t = “enter a query Q”
–r(A_t) = all possible rankings of docs in C
–M = θ_U, a unigram language model (word distribution)
–H = {previous queries} + {viewed snippets}
–p(M | U, H, A_t, C) = p(θ_U | Q, H)

41 Case 3: General Context-Sensitive IR
–A_t = “enter a query Q”, the “Back” button, or the “Next” button
–r(A_t) = all possible rankings of unseen docs in C
–M = (θ_U, S), S = seen documents
–H = {previous queries} + {viewed snippets}
–p(M | U, H, A_t, C) = p(θ_U | Q, H)

42 System Characteristics: client-side personalization; implicit user modeling and eager feedback; Bayesian decision theory as a guide

43 Main Idea: Putting the User in the Center [diagram: a personalized search agent sits with the user, drawing on multiple search engines, the web, desktop files, viewed web pages, and query history to serve a query such as “jaguar”]. A search agent can know a particular user very well

44 User-Centered Adaptive IR (UCAIR). A novel retrieval strategy emphasizing:
–user modeling (“user-centered”)
–search context modeling (“adaptive”)
–interactive retrieval
Implemented as a personalized search agent that
–sits on the client side (owned by the user)
–integrates information around a user (1 user vs. N sources, as opposed to 1 source vs. N users)
–can collaborate with other agents

45 Challenges in UCAIR What’s an appropriate retrieval framework for UCAIR? How do we optimize retrieval performance in interactive retrieval? How do we develop robust and accurate retrieval models to exploit user information and search context? How do we evaluate UCAIR methods? ……

46 Non-Personalized Search [screenshot: results for “jaguar” mixing the car, Apple software, the animal, and chemistry software]

47 Possibility of Personalized Search [screenshot: results focused on Apple software]. Other context info: dwelling time, mouse movement, clickthrough, query history…