
A Structured Approach to Query Recommendation With Social Annotation Data
Jiafeng Guo (ICT), Xueqi Cheng (ICT), Hua-Wei Shen (ICT), Gu Xu (MSRA)
Speaker: Rui-Rui Li
Supervisor: Prof. Ben Kao

Outline
Introduction
Structured Approach to Query Recommendation
Experimental Results
Conclusions

Introduction
In existing work, query recommendation mainly aims to provide alternative queries that stick closely to the user's search intent.

Introduction
In this way, query recommendation is mostly limited to helping users refine their queries and find what they need more quickly.

Introduction
Search users may also have vague interests, which we call exploratory interests, that they are unaware of until they are confronted with one. If we can generate recommendations from exploratory interests, we can attract users to click more on recommendations and thus spend more time on the search engine, which in turn increases advertising revenue for service providers.

Introduction
Previous work on query recommendation mostly relies on search logs, which merely capture the interactions between search users and search engines. Three kinds of log information are commonly employed to conduct query recommendation:
Click-through data
- Agglomerative clustering of a search engine query log (KDD'00)
- Query recommendation using query logs in search engines (EDBT'04)
- Query suggestion using hitting time (CIKM'08)
Search query sessions
- Generating query substitutions (WWW'06)
- Context-aware query suggestion by mining click-through and session data (KDD'08)
Query frequency
- Exploring distributional similarity based models for query spelling correction (ACL'06)
- The wild thing goes local (SIGIR'07)

Introduction
In this paper, we first introduce social annotation data into query recommendation as an additional resource. Social annotation data is basically a collection of metadata, or social tags, on URLs. Social tags are keywords created by millions of web users according to the content of the pages referred to by the URLs. We can leverage social tags to infer what people might think when reading the pages, and thus more precisely predict the exploratory interests of users.

Search interests and exploratory interests
Search interests are explicit information needs that people have when they search. Exploratory interests are vague or latent interests that can be triggered when people consume search results.

Verify the statement
We used one week of search log data to study the shift of user interests during a search session. Specifically, we collected the next queries submitted after an initial query within a short interval (10 minutes), and analyzed the causality between the initial query and the subsequent queries. The results indicate that the formulation of the next queries is determined not only by the initial query but also, significantly, by the clicks on the search results.
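
A minimal sketch of this collection step. Assumptions: log records are (user_id, query, timestamp, clicked_url) tuples with timestamps in seconds; all field and function names are illustrative, not from the paper.

```python
from collections import defaultdict

WINDOW = 10 * 60  # the 10-minute interval from the slide, in seconds

def collect_next_queries(records, initial_query):
    """Group the next distinct queries issued within WINDOW seconds of
    `initial_query` by the URL clicked for the initial query (None = no click)."""
    next_by_click = defaultdict(list)
    by_user = defaultdict(list)
    for user, query, ts, clicked in records:
        by_user[user].append((ts, query, clicked))
    for events in by_user.values():
        events.sort()  # order each user's events by time
        for i, (ts, query, clicked) in enumerate(events):
            if query != initial_query:
                continue
            for ts2, query2, _ in events[i + 1:]:
                if ts2 - ts > WINDOW:
                    break
                if query2 != initial_query:
                    next_by_click[clicked].append(query2)
                    break  # keep only the immediate next distinct query
    return next_by_click
```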

The most frequent queries issued by real users after the initial query "iphone".

We conduct statistical tests to verify whether the existence of exploratory interests is common and significant. Statement: given an initial query, the overall next queries and the next queries with clicks on a specific URL were generated from different underlying distributions, as opposed to the same distribution. These distributions can be estimated by maximum likelihood (A Statistical Comparison of Tag and Query Logs, SIGIR'09).
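
The slide does not name the exact test statistic; one standard choice is a likelihood-ratio (G) test over next-query counts, sketched below (the add-one smoothing and the 0.05 significance level are assumptions of this sketch):

```python
# Hedged sketch: test whether the next queries following clicks on a specific
# URL are drawn from a different distribution than the overall next queries.
from collections import Counter
from scipy.stats import chi2_contingency

def distributions_differ(all_next, clicked_next, alpha=0.05):
    """all_next / clicked_next: lists of next-query strings."""
    vocab = sorted(set(all_next) | set(clicked_next))
    c_all, c_click = Counter(all_next), Counter(clicked_next)
    # 2 x |vocab| contingency table; add-one smoothing avoids zero cells.
    table = [[c_all[q] + 1 for q in vocab],
             [c_click[q] + 1 for q in vocab]]
    # lambda_="log-likelihood" yields the likelihood-ratio (G) statistic.
    _, p_value, _, _ = chi2_contingency(table, lambda_="log-likelihood")
    return p_value < alpha
```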

In 80.9% of the cases, clicks indeed affect the formulation of the next queries. In 43.1% of the cases, users would issue different next queries if they clicked on different result URLs. These two tests show that the existence of exploratory interests is common and significant, and that users' interests are affected by what they have read during a search session.

Query formulation model
Typically, the user begins with an initial query that represents his information needs. After scanning the search results, he may decide to refine the query to better represent those needs, and thus issues a next query. Alternatively, he may read web pages from the search results, or even follow links to browse other web pages, before he comes up with a next query.

Query formulation model
To describe the proposed query formulation model, we further introduce social annotation data as an important resource for expressing the exploratory interests in this model.

Construction of Q-U-T Relation Graph
The forward transition from queries to tags imitates the process of reading web pages related to a query and identifying some exploratory interests. The backward transition from tags to queries imitates the process of thinking of next queries from those interests.

The query relation graph here is a weighted directed graph, with the edge weight from query node q_i to q_j defined by the transition probability under the above process:

p(q_j | q_i) = Σ_t p(q_j | t) p(t | q_i),  with  p(t | q_i) = Σ_u p(t | u) p(u | q_i)  and  p(q_j | t) = Σ_u p(q_j | u) p(u | t).

The probabilities along weighted edges are defined in the form p(y | x) = w(x, y) / d(x). The probability from tags to URLs is defined as a uniform distribution, p(u | t) = 1 / d(t), where d(·) denotes the (weighted) degree of the node.
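
A minimal sketch of one step of this query-to-query transition, assuming the tripartite graph is stored as weighted adjacency dictionaries; w_qu, w_ut, w_uq, and urls_of_tag are illustrative names, not from the paper:

```python
# w_qu[q][u]: click weight between query q and URL u;
# w_ut[u][t]: tagging weight between URL u and tag t;
# w_uq[u][q]: click weight between URL u and query q;
# urls_of_tag[t]: list of URLs carrying tag t.
def normalize(weights):
    """Turn a {node: weight} dict into a {node: probability} dict."""
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()} if total else {}

def query_to_query_prob(q_i, w_qu, w_ut, urls_of_tag, w_uq):
    """p(q_j | q_i) obtained by walking q_i -> u -> t -> u' -> q_j."""
    p_next = {}
    for u, p_u in normalize(w_qu[q_i]).items():              # p(u | q_i)
        for t, p_t in normalize(w_ut[u]).items():            # p(t | u)
            for u2 in urls_of_tag[t]:                        # p(u' | t), uniform
                p_u2 = 1.0 / len(urls_of_tag[t])
                for q_j, p_q in normalize(w_uq[u2]).items():  # p(q_j | u')
                    p_next[q_j] = p_next.get(q_j, 0.0) + p_u * p_t * p_u2 * p_q
    return p_next
```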

Rank with hitting time
Based on the query relation graph, we can apply a Markov random walk on the graph and employ hitting time as a measure to rank queries. The hitting time h(i, j) of a random walk is the expected number of steps before node j is visited, starting from node i.
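
A minimal sketch of computing hitting times by iterating the recurrence h(i) = 1 + Σ_j p(i, j) h(j), with h = 0 at the target; the fixed horizon T (a truncated hitting time, as in the CIKM'08 work cited earlier) is an assumption of this sketch:

```python
def hitting_times(P, target, T=20):
    """Expected number of steps to first reach `target`, iterated to horizon T.
    P[i] is a {j: p_ij} dict of transition probabilities out of node i."""
    h = {i: 0.0 for i in P}
    for _ in range(T):
        h_new = {}
        for i in P:
            if i == target:
                h_new[i] = 0.0  # already at the target
            else:
                h_new[i] = 1.0 + sum(p * h.get(j, 0.0) for j, p in P[i].items())
        h = h_new
    return h
```

Candidate queries with small hitting time to the initial query node are ranked highest as recommendations.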

Clustering with Modularity
Introduce the "modularity function" Q, which measures the quality of a particular clustering of nodes in a graph. It is defined as

Q = Σ_i (e_ii − a_i²),

where e_ii is the fraction of edges within cluster i, and a_i is the fraction of all ends of edges that are attached to nodes in cluster i.

Clustering with Modularity
Discovering the underlying clustering of a graph thus becomes a process of optimizing the value of modularity over all possible clusterings of the graph.
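
A minimal sketch of evaluating Q for a given clustering, assuming an undirected, unweighted edge list (a simplification; the query relation graph itself is weighted):

```python
def modularity(edges, cluster_of):
    """edges: iterable of (u, v) pairs; cluster_of: {node: cluster id}."""
    edges = list(edges)
    if not edges:
        return 0.0
    m = len(edges)
    e = {}  # e[c]: fraction of edges with both endpoints in cluster c
    a = {}  # a[c]: fraction of all edge endpoints attached to cluster c
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            e[cu] = e.get(cu, 0.0) + 1.0 / m
        a[cu] = a.get(cu, 0.0) + 0.5 / m  # each edge contributes two endpoints
        a[cv] = a.get(cv, 0.0) + 0.5 / m
    return sum(e.get(c, 0.0) - a[c] ** 2 for c in a)
```

The optimization itself is typically carried out greedily, e.g. by repeatedly merging the pair of clusters that most increases Q.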

Clustering with Modularity
With the clusters in hand, we can label each cluster with social tags for better understanding. We leverage the expected tag distribution over the URLs clicked for query q_i, p(t | q_i) = Σ_u p(t | u) p(u | q_i), to approximate q_i's interests. Then we approximate cluster c_j's interests by the average tag distribution,

p(t | c_j) = (1 / |c_j|) Σ_{q_i ∈ c_j} p(t | q_i),

where |c_j| denotes the size of the j-th cluster.
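
A minimal sketch of the labeling step, assuming each query's tag distribution p(t | q_i) has been computed as above; names are illustrative:

```python
def label_cluster(cluster_queries, p_t_given_q, top_k=3):
    """Average the member queries' tag distributions ({tag: prob} dicts)
    and return the top_k tags as the cluster label."""
    dist = {}
    for q in cluster_queries:
        for t, p in p_t_given_q[q].items():
            dist[t] = dist.get(t, 0.0) + p / len(cluster_queries)
    return sorted(dist, key=dist.get, reverse=True)[:top_k]
```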

Experimental Results
Data sets: For the query log, we use the "Spring 2006 Data Asset" distributed by Microsoft Research, which contains about 15 million records. After data cleaning, we obtained about 2.7 million unique queries and about 4.2 million unique URLs. For the social annotation data, we collected a sample of Delicious data consisting of over 167 million tagging records, which involve about 0.83 million unique users, 57.8 million unique URLs, and 5.9 million unique tags.

Experimental Results
We construct a Query-URL-Tag tripartite graph from these two data sets by connecting queries and tags through URLs (1.3 million nodes and 3.8 million edges), and obtain a query relation graph with 538,547 query nodes. We randomly sampled 300 queries for testing.

Baseline Methods
BiHit: based on the basic query formulation model; leverages only query logs (the Query-URL bipartite graph) to construct a query relation graph, and employs hitting time to generate query recommendations.
TriList: employs our proposed hitting time algorithm to rank queries, and presents the top-k recommendations in a simple flat list.
(TriStructure, our full method, additionally groups the ranked recommendations into labeled clusters.)

Examples of Recommendation Results
The recommendations generated by the BiHit method stick closely to users' search interests. Both the TriList and TriStructure methods bring in recommendations on diverse topics that are related to, but not tightly bound to, the user's search query.

Evaluation of Recommendation Results
Human judges label, for each recommendation, how likely they would be to click it given the search result list. A 6-point scale (0, 0.2, 0.4, 0.6, 0.8, 1) is used to measure this click willingness. This evaluation differs substantially from traditional query recommendation evaluation, where judges are asked to label the similarity between recommendations and the original query. Increasing the number of clicks on recommendations can also be considered an essential goal of query recommendation.

Evaluation of Recommendation Results
Each method's recommendations for each query are labeled by three judges. Each method is labeled by all the judges, on different query sets, and each query is labeled by all the judges, on different methods. No judge labels the recommendations generated by different methods for the same query, which reduces the bias arising from the labeling order of the different methods.

Evaluation of Recommendation Results
Given a query q, let {r_1, ..., r_k} denote the k recommendations generated by a certain approach, and {s_1, ..., s_k} denote the corresponding label scores on these recommendations. We define three measures for a query q:
CRN ~ Clicked Recommendation Number
CRS ~ Clicked Recommendation Score
TRS ~ Total Recommendation Score
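
The slide gives only the measure names; the sketch below assumes, based on those names, that CRN counts the recommendations with non-zero scores, CRS averages the non-zero scores, and TRS sums all k scores:

```python
def evaluate(scores):
    """scores: label scores s_1..s_k for one query's k recommendations."""
    clicked = [s for s in scores if s > 0]
    crn = len(clicked)                        # Clicked Recommendation Number
    crs = sum(clicked) / crn if crn else 0.0  # Clicked Recommendation Score
    trs = sum(scores)                         # Total Recommendation Score
    return crn, crs, trs
```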

Evaluation of Recommendation Results
By recommending queries that address both search and exploratory interests, we can improve both the click rate and the click willingness on recommendations. With the structured approach, the improvements become more significant.

Evaluation of Recommendation Results
Both the TriList and TriStructure methods receive more labels at each non-zero score level than the BiHit method. Although few recommendations are ones users would absolutely like to click, the number of score-1 recommendations under the TriStructure method is about three times that under the BiHit method.

How structure helps
We want to examine the changes in users' click pattern and click willingness on each recommendation of a given query under TriList and TriStructure. We focus only on the clicks on the recommendations (non-zero labels), not on the specific label scores. We compare the click entropy of the two methods, which measures how users' clicks distribute over the clusters of recommendations:

ClickEntropy(q) = − Σ_i (n_i / N) log(n_i / N),

where n_i is the number of clicks in the i-th cluster of the given query's recommendations and N = Σ_i n_i.
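
A minimal sketch of this entropy, given the per-cluster click counts for one query's recommendations:

```python
import math

def click_entropy(cluster_clicks):
    """cluster_clicks: list of click counts n_i, one per cluster."""
    total = sum(cluster_clicks)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in cluster_clicks if c > 0)
```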

How structure helps
The average click entropy under the TriStructure method is higher than that under the TriList method on most queries. A higher click entropy means that users' clicks are more scattered among the clusters.

How structure helps
We further analyze how the structured approach affects users' click willingness. We focus on the score range between 0 and 0.6, since 96.58% of the recommendations' average label scores fall within this range. The color represents the number of such recommendations. More points (75.2%) lie close to the y-axis in the graph, showing that, for the same recommendation of a query, more recommendations receive a higher label score under the TriStructure method than under the TriList method.

Conclusion
We introduce social annotation data as an important resource for query recommendation, and construct a Q-U-T graph from query logs and social annotation data. Based on the derived query relation graph, we employ hitting time to rank possible recommendations, leverage a modularity-based approach to group recommendations into clusters, and label each cluster with tags. Experimental results indicate that the method better satisfies users' interests and significantly enhances users' click behavior on recommendations.

Thanks!