Mining Query Subtopics from Search Log Data

Presentation transcript:

Mining Query Subtopics from Search Log Data Hu et al., SIGIR’12 Presented by Baichuan Li

Outline: Problem; Two observations; Approach; Results; Applications; Conclusion

Problem: Intention != Query. Many queries are ambiguous or multifaceted.

Problem

Problem. "Xbox": Homepage? Online game? Where to buy? "Manchester": Manchester news? Manchester tourism? Manchester weather? Manchester United?

Solution: Identify the major subtopics of a query from the search log data. Uses: personalized search; query suggestion; search result presentation; clustering; re-ranking; diversification.

Observations from the Logs: One subtopic per search (OSS). If a user clicks multiple URLs after submitting a query, then the clicked URLs tend to represent the same subtopic.
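The OSS observation suggests treating the URLs clicked within one search as evidence for a single subtopic. A minimal sketch of collecting such multi-clicks, assuming a hypothetical log of (search_id, query, url) records; the record layout, the function name `multi_clicks`, and the example.com URLs are illustrative and do not come from the slides:

```python
from collections import defaultdict

def multi_clicks(log):
    """Group clicked URLs by search and keep searches with 2+ distinct clicks.

    `log` is an iterable of (search_id, query, url) tuples (assumed layout).
    Returns a dict mapping query -> list of URL sets, one set per multi-click.
    """
    clicks = defaultdict(set)   # search_id -> set of clicked URLs
    query_of = {}               # search_id -> query string
    for search_id, query, url in log:
        clicks[search_id].add(url)
        query_of[search_id] = query

    groups = defaultdict(list)
    for search_id, urls in clicks.items():
        if len(urls) >= 2:      # a multi-click: several URLs in one search
            groups[query_of[search_id]].append(urls)
    return dict(groups)

# Made-up log records for illustration only.
log = [
    (1, "xbox", "example.com/xbox-home"),
    (1, "xbox", "example.com/xbox-games"),
    (2, "xbox", "example.com/xbox-shop"),
    (3, "manchester", "example.com/mcr-news"),
    (3, "manchester", "example.com/mcr-weather"),
]
print(multi_clicks(log))
```

Search 2 is dropped because it has only one click, so it carries no co-click evidence.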

Observations from the Logs: Subtopic clarification by additional keyword (SCAK). Users often add keywords (in most cases, one keyword) to a query in order to clarify its subtopic. The URLs clicked after searching with both the original and the expanded queries tend to represent the same subtopic. The keyword tends to be indicative of the subtopic, e.g. "harry shum microsoft".

One Subtopic per Search: Use it as the rule for subtopic identification. Examined on 10,000 groups of multi-clicks of individual queries. The rule is counted as accurate when all the URLs within a multi-click are about the same sense or facet.

Accuracy vs. Frequency

Accuracy vs. Click Position

Distribution: Queries with higher frequencies in the search log data are more likely to have multi-clicks.

Subtopic Clarification by Additional Keyword: The keywords ‘microsoft’ and ‘jr’ can be used to represent the two groups (subtopics) respectively.

Query Types: ‘Q’: the query is a single phrase; ‘Q + W’: ‘Q’ followed by a keyword; ‘W + Q’: a keyword followed by ‘Q’; ‘Others’.
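The four query forms can be told apart with a simple token comparison. A rough sketch, assuming that the keyword W is a single whitespace-delimited token and that matching is case-insensitive; both assumptions and the function name `query_type` are illustrative rather than taken from the slides:

```python
def query_type(query, base):
    """Classify `query` relative to the base query `base`.

    Returns 'Q', 'Q + W', 'W + Q', or 'Others'.
    Assumes the extra keyword W is exactly one token.
    """
    q = query.lower().split()
    b = base.lower().split()
    if q == b:
        return "Q"
    if len(q) == len(b) + 1:
        if q[:len(b)] == b:     # base first, keyword appended
            return "Q + W"
        if q[1:] == b:          # keyword first, base after it
            return "W + Q"
    return "Others"

for q in ["harry shum", "harry shum microsoft",
          "microsoft harry shum", "harry shum jr dance"]:
    print(q, "->", query_type(q, "harry shum"))
```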

Subtopic Overlap and URL Overlap: Randomly selected 500 pairs of queries with the forms ‘Q’ and ‘Q + W’. If the subtopics of an expanded query are contained in the subtopics of the original query, then there is ‘subtopic overlap’. Also check whether the two queries share identical clicked URLs (‘URL overlap’).
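The URL-overlap check is a set intersection over the clicked URLs of each query pair. A small sketch, assuming each pair is given as two collections of clicked URLs; the function name `url_overlap_rate` and the sample data are hypothetical:

```python
def url_overlap_rate(pairs):
    """Fraction of (Q, Q + W) pairs that share at least one clicked URL.

    `pairs` is a list of (urls_for_Q, urls_for_Q_plus_W) tuples (assumed layout).
    """
    if not pairs:
        return 0.0
    hits = sum(1 for urls_q, urls_qw in pairs if set(urls_q) & set(urls_qw))
    return hits / len(pairs)

# Made-up click data: the first pair shares a URL, the second does not.
pairs = [
    (["example.com/a", "example.com/b"], ["example.com/b"]),
    (["example.com/c"], ["example.com/d"]),
]
print(url_overlap_rate(pairs))
```

Subtopic overlap, by contrast, needs human judgment or mined subtopic labels and is not covered by this check.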

Distribution: The more popular (frequent) a query is, the more likely the rule is applicable.

Mining Subtopics

Preprocessing

Clustering: Similarity function. The similarity function between URLs ui and uj combines three parts. S1: based on the OSS phenomenon, computed over the vectors of multi-clicks of ui and uj. S2: based on the SCAK phenomenon. S3: based on string similarities.

SCAK Similarity (S2): computed over the vectors of keywords associated with ui and uj.
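The formulas for S1, S2 and S3 are not preserved in this transcript, so the following is only a sketch of how a three-part URL similarity of this shape might be computed. It assumes cosine similarity for the two vector-based parts, Python's `difflib.SequenceMatcher` ratio as a stand-in for the string similarity, and an equal-weight linear combination; all of these choices, and the names `cosine` and `url_similarity`, are assumptions rather than the authors' definitions:

```python
import math
from difflib import SequenceMatcher

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def url_similarity(ui, uj, multi_click_vecs, keyword_vecs,
                   weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine S1 (OSS), S2 (SCAK) and S3 (string) into one score.

    multi_click_vecs: url -> {multi_click_id: weight}   (assumed layout)
    keyword_vecs:     url -> {keyword: weight}          (assumed layout)
    weights:          assumed combination weights, not from the slides
    """
    s1 = cosine(multi_click_vecs.get(ui, {}), multi_click_vecs.get(uj, {}))
    s2 = cosine(keyword_vecs.get(ui, {}), keyword_vecs.get(uj, {}))
    s3 = SequenceMatcher(None, ui, uj).ratio()  # stand-in string similarity
    a, b, c = weights
    return a * s1 + b * s2 + c * s3
```

The resulting pairwise scores could then be fed to any clustering routine that accepts a similarity function; which clustering algorithm the deck uses is not stated in the transcript.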

Data. TREC Data: TREC search result diversification track in 2009. DataSetA and DataSetB: queries and URLs randomly sampled from the logs of a commercial search engine.

Results

Application: Search Result Reranking. The user is first asked which subtopic she is interested in, with the subtopics shown at the top of the results page. When the user selects a subtopic, the URLs belonging to that subtopic are moved to the top (re-ranked). The relative order of the URLs inside the subtopic, and of those outside it, is kept.
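The re-ranking step described above amounts to a stable partition of the result list. A minimal sketch, assuming results are given as a ranked list of URLs and the selected subtopic as a collection of URLs; the function name `rerank` is illustrative:

```python
def rerank(results, subtopic_urls):
    """Move URLs of the selected subtopic to the top.

    The original relative order is preserved within the subtopic
    and within the remaining results (a stable partition).
    """
    chosen = set(subtopic_urls)
    inside = [u for u in results if u in chosen]
    outside = [u for u in results if u not in chosen]
    return inside + outside

ranked = ["a", "b", "c", "d"]
print(rerank(ranked, {"b", "d"}))  # ['b', 'd', 'a', 'c']
```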

Example

Evaluation. Data: search log data of 20,000 randomly selected searches; each query has at least two subtopics mined by the method. Results: The average position of last clicked URLs is 3.41. Assume the cost for the user to check the subtopics and click one of them is 1.0. The average position of last clicked URLs belonging to the same subtopics is 1.80, so the total cost with re-ranking is 1.0 + 1.80 = 2.80, lower than 3.41.

Conclusion: Studied the problem of query subtopic mining. Discovered two phenomena of user search behavior: one subtopic per search, and subtopic clarification by additional keyword. Designed a novel similarity function. Applications in search result reranking (and search result clustering). Problem: can only be employed when there is enough search log data.