Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.


1 Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh

2 Outline
- Introduction
- Modeling Search Activity
- Study
- Conclusions

3 Introduction
Satisfying searchers’ information needs involves a thorough understanding of their interests through:
- search queries
- search engine result page (SERP) clicks
- post-SERP browsing behavior
Construct interest models of the current query that include:
- previous queries
- previous clicks on SERPs
Evaluate the predictive effectiveness of these models using future actions.

4 Modeling Search Activity
Data
- The data set contained browser logs with both searching and browsing episodes.
- Log entries include a timestamp for each page view and the URL of the Web page visited.
- Only entries from the English-speaking United States locale were included.
- Search sessions on the Bing Web search engine were extracted.

5 Modeling Search Activity
ODP Labeling
- Context is represented as a distribution across categories in the ODP topical hierarchy.
- This provides a consistent topical representation of queries and page visits from which to build the models.
- ODP category labels can also reflect topical differences in the search results for a query or in a user’s interests.
- An automatic classification technique assigns ODP category labels to each page.
- The 219 categories at the top two levels of the ODP hierarchy were used (called L).

6 Modeling Search Activity
ODP Labeling
- Strategy for labeling a page:
  1. Begin with URLs present in the ODP.
  2. Incrementally prune non-present URLs until a match is found or a miss is declared.
  3. Check for an exact match with a logistic regression classifier.
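The pruning step in the labeling strategy above can be sketched as a back-off lookup. This is only an illustration: `ODP_DIRECTORY`, its entries, and `lookup_odp_label` are hypothetical names, and the slides do not say how a URL is actually shortened, so dropping one trailing path segment at a time is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the ODP directory: URL prefix -> category label.
ODP_DIRECTORY = {
    "example.com/sports": "Sports/Basketball",
    "example.com": "News/Media",
}


def lookup_odp_label(url, directory=ODP_DIRECTORY):
    """Return the label of the longest URL prefix found in the directory, or None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [s for s in parts.path.split("/") if s]
    # Try the full URL first, then drop one trailing path segment at a time.
    while True:
        candidate = "/".join([host] + segments)
        if candidate in directory:
            return directory[candidate]
        if not segments:
            # Miss declared: nothing left to prune.
            return None
        segments.pop()
```

A miss (`None`) is the point at which a text classifier could take over; that hand-off is not shown here.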

7 Modeling Search Activity
Sources and Source Combinations
- ODP labels are automatically assigned to the following sources:
  1. Query: the top 10 search results for the query
  2. SERPClick: the search results clicked by the user during the search session
  3. NavTrail: Web pages that the user visits from a SERP click

8 Modeling Search Activity
Model Definitions – Query Model (Q)
- For each query, the category labels for the top 10 search results were obtained.
- Probabilities are assigned to the categories in L by
  1. normalized click frequencies for each of the top 10 results, taken from search-engine click log data
  2. the distribution across all ODP category labels
- ODP categories in L that are not used to label any result are assigned prior probabilities.
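A minimal sketch of how such a query model might be assembled. The slide does not give the exact formula, so the interpolation with the prior and the `smoothing` parameter are assumptions made for illustration, as are the function and argument names.

```python
def query_model(result_labels, click_counts, prior, smoothing=0.1):
    """Build a category distribution for one query.

    result_labels: ODP category label of each top-10 result
    click_counts:  click frequency of each result, in the same order
    prior:         dict mapping every category in L to its prior probability
    smoothing:     hypothetical share of probability mass given to the prior
    """
    total = sum(click_counts)
    observed = {}
    for label, clicks in zip(result_labels, click_counts):
        # Normalized click frequency; fall back to equal weights if no clicks.
        share = clicks / total if total > 0 else 1.0 / len(result_labels)
        observed[label] = observed.get(label, 0.0) + share
    # Categories never used as a label receive only their share of the prior.
    return {
        c: (1.0 - smoothing) * observed.get(c, 0.0) + smoothing * prior[c]
        for c in prior
    }
```

For example, with labels `["Sports", "Sports", "News"]`, clicks `[6, 2, 2]`, and a prior over `Sports`, `News`, and `Arts`, the unused category `Arts` ends up with `smoothing * prior["Arts"]`.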

9 Modeling Search Activity
Model Definitions – Context Model (X)
- The context model is constructed from actions preceding the current query:
  1. Queries
  2. Web pages visited through a SERP click
  3. Web pages visited on the navigational trail following a SERP click
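One way to picture the context model is as a weighted average of the category distributions of the preceding actions. The transcript does not preserve how individual actions are weighted, so the `weights` argument and its uniform default are assumptions, not the authors' formula.

```python
def context_model(action_distributions, weights=None):
    """Combine per-action category distributions into one context distribution.

    action_distributions: list of dicts (category -> probability), one per
                          previous query or page visit in the session
    weights:              optional per-action weights (assumed; uniform if omitted)
    """
    if not action_distributions:
        raise ValueError("at least one previous action is required")
    if weights is None:
        weights = [1.0] * len(action_distributions)
    total = sum(weights)
    categories = set().union(*action_distributions)
    return {
        c: sum(w * d.get(c, 0.0) for w, d in zip(weights, action_distributions)) / total
        for c in categories
    }
```

Passing larger weights for more recent actions would emphasize recent behavior; whether and how to do so is left open here.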

10 Modeling Search Activity

11 Modeling Search Activity Model Definition – Intent Model(I)

12 Modeling Search Activity
Relevance Model or Ground Truth (R)
- The relevance model contains actions that occur following the current query in the session.

13 Modeling Search Activity

14 Study

15 Study

16 Study

17 Study

18 Study
Learning Optimal Context Weights
- Steps:
  1. Identify the optimal context weight (w) for each query on a held-out training set.
  2. Create features for the query and the context that could be useful in predicting w.

19 Study
Learning Optimal Context Weights
- To create a training set, the query, context, and relevance models were used to compute the optimal context weight per query by minimizing the regularized cross-entropy for each query independently.

20 Study A regularizer that penalizes deviations from w=0.5
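The per-query optimization can be sketched as follows. Several details are assumptions because the slide formulas are not preserved in the transcript: the intent prediction is taken to be the linear mixture w·X + (1 − w)·Q, the regularizer is taken to be a squared penalty `lam * (w - 0.5) ** 2`, and a simple grid search stands in for whatever optimizer was actually used.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of `predicted` with respect to `target` (dicts over categories)."""
    return -sum(
        p * math.log(max(predicted.get(c, 0.0), eps))
        for c, p in target.items()
        if p > 0
    )


def mix(query, context, w):
    """Assumed combination: weight w on the context model, 1 - w on the query model."""
    categories = set(query) | set(context)
    return {c: w * context.get(c, 0.0) + (1.0 - w) * query.get(c, 0.0) for c in categories}


def optimal_context_weight(query, context, relevance, lam=0.1, steps=101):
    """Grid-search the w in [0, 1] that minimizes the regularized cross-entropy."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(steps):
        w = i / (steps - 1)
        loss = cross_entropy(relevance, mix(query, context, w)) + lam * (w - 0.5) ** 2
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w
```

When the relevance model resembles the context model, the returned weight moves above 0.5; when it resembles the query model, it moves below 0.5. The penalty keeps the weight from drifting to an extreme on weak evidence.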

21 Study
Generating Features of Query and Context
- Features are divided into three classes:
  1. Query class: captures characteristics of the current query and the query model.
  2. Context class: captures aspects of the pre-query interaction behavior as well as features of the context model itself.
  3. QueryContext class: captures how the query model and the context model compare.
- These features were generated for each session in the set and used to train a predictive model.

22 Study Generating Features of Query and Context - Query class

23 Study Generating Features of Query and Context - Context class

24 Study Generating Features of Query and Context - QueryContext class

25 Study

26 Study
Predicting the Optimal Context Weight
- 60% of the queries were used for training, 20% for validation, and 20% for testing.
- 10-fold cross-validation was performed to improve result reliability.
- The folds were constructed by splitting on sessions, so that all queries in a session are used for only one of training, validation, or testing.
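Splitting on sessions rather than on individual queries might look like the sketch below. The function name, the `(session_id, query)` input format, and the fixed seed are illustrative assumptions. Note that the 60/20/20 proportions are applied to sessions here, so the query proportions are only approximate when sessions differ in length.

```python
import random


def split_by_session(queries, train=0.6, validation=0.2, seed=0):
    """Partition (session_id, query) pairs so each session lands in exactly one split."""
    session_ids = sorted({sid for sid, _ in queries})
    random.Random(seed).shuffle(session_ids)
    n_train = round(len(session_ids) * train)
    n_val = round(len(session_ids) * validation)
    # Assign every session to a single partition.
    part_of = {}
    for i, sid in enumerate(session_ids):
        if i < n_train:
            part_of[sid] = "train"
        elif i < n_train + n_val:
            part_of[sid] = "validation"
        else:
            part_of[sid] = "test"
    splits = {"train": [], "validation": [], "test": []}
    for sid, query in queries:
        splits[part_of[sid]].append((sid, query))
    return splits
```

Repeating this with different seeds, or rotating which sessions are held out, is one way to build multiple folds.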

27 Study

28 Study
Predicting the Optimal Context Weight
- The best-performing features relate to the information divergence between the query model and the context model.
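As an illustration of a divergence feature between the two models, the sketch below computes the Jensen-Shannon divergence. The slide does not say which divergence measure was used, so this choice is an assumption; it is picked here because it is symmetric and stays finite when one model assigns zero probability to a category.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; q must be positive wherever p is positive."""
    return sum(pi * math.log(pi / q[c]) for c, pi in p.items() if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two category distributions (dicts)."""
    categories = set(p) | set(q)
    p = {c: p.get(c, 0.0) for c in categories}
    q = {c: q.get(c, 0.0) for c in categories}
    # The midpoint distribution is positive wherever p or q is positive.
    m = {c: 0.5 * (p[c] + q[c]) for c in categories}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

The value is 0 when the query model and the context model agree exactly and reaches ln 2 when they share no categories, so a high value signals that the current query departs from the preceding activity.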

29 Study
Predicting the Optimal Context Weight

30 Study

31 Study
Varying Context and Relevance Information

32 Conclusions
- A study investigating the effectiveness of activity-based context in predicting users’ search interests.
- Explored the value of modeling the current query, its context, their combination, and different sources.
- Intent models developed from many sources perform best overall.
- Developed techniques to learn the optimal combinations.

