Mining Query Subtopics from Search Log Data. Date: 2012/12/06. Resource: SIGIR’12. Advisor: Dr. Jia-Ling Koh. Speaker: I-Chih Chiu.


Outline: Introduction; Two Phenomena; Clustering Method; Experiments; Applications; Conclusion.

Introduction: Understanding users' search intent is essential for satisfying their search needs. The intents of a query can be described by its search goals, its semantic categories or topics, and its subtopics.

Motivation: Most queries are ambiguous or multifaceted. Ambiguous: “Harry Shum” can refer to an American actor, a vice president of Microsoft, or another person. Multifaceted: “Xbox” has facets such as online game, homepage, and marketplace.

Goal: They aim to automatically mine the major subtopics (senses and facets) of queries from the search log data. The work has two parts. 1. Two Phenomena: 1) “one subtopic per search” (OSS); 2) “subtopic clarification by additional keyword” (SCAK). 2. Clustering Method: 1) preprocessing; 2) clustering; 3) postprocessing.

Outline: Introduction; Two Phenomena (One Subtopic per Search, Subtopic Clarification by Additional Keyword); Clustering Method; Experiments; Applications; Conclusion.

One Subtopic per Search: Each group of URLs actually corresponds to one sense. [Figure: clicked URLs (URL 1 to URL 5) shown in groups]

One Subtopic per Search: 1) Users are rational and do not click on search results randomly. 2) They usually have one single subtopic in mind. [Table: multi-clicks in search logs of ‘harry shum’] [Figure: accuracy of the rule vs. click position]

One Subtopic per Search: [Figure: accuracy of the rule vs. number of clicks (User)] [Figure: accuracy of the rule vs. frequency (Group)] Conclusion: The phenomenon of one subtopic per search can help query subtopic mining for head queries.
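
One way the OSS phenomenon could be turned into a clustering signal is to count how often two URLs are clicked within the same search for the same query. The sketch below is only an illustration under stated assumptions: the log format (one `(query, clicked_urls)` pair per search), the function name `co_click_counts`, and the sample URLs are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def co_click_counts(searches):
    """Count, per query, how often two URLs are clicked in the same search.

    `searches` is a hypothetical log format: an iterable of
    (query, clicked_urls) pairs, one pair per individual search.
    """
    counts = {}
    for query, clicked_urls in searches:
        per_query = counts.setdefault(query, Counter())
        # Under OSS, URLs clicked together in one search likely share a subtopic.
        for pair in combinations(sorted(set(clicked_urls)), 2):
            per_query[pair] += 1
    return counts


# Made-up multi-click records for illustration only.
log = [
    ("harry shum", ["url1", "url3"]),
    ("harry shum", ["url3", "url1", "url5"]),
    ("harry shum", ["url2", "url4"]),
]
pairs = co_click_counts(log)["harry shum"]
```

Pairs with high counts are candidates for the same subtopic; pairs that never co-occur carry no OSS evidence either way.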

Subtopic Clarification by Additional Keyword: 1) Search users are rational. 2) They add additional keywords to specify the subtopics. [Table: search logs of ‘harry shum’, ignoring click frequency] [Table: distribution of query types over 1000 randomly selected queries]

Subtopic Clarification by Additional Keyword: Relation between subtopic overlap and URL overlap for pairs of a query and its expanded query. Subtopic overlap: the subtopics of an expanded query are contained in the subtopics of the original query. URL overlap: the two queries share identical clicked URLs. No URL overlap and no subtopic overlap, e.g. ‘beijing’ and ‘beijing duck’, ‘fast’ and ‘fast food’.

Clustering Method: A clustering method that mines subtopics of queries by leveraging the two phenomena and search log data. [Figure: the flow of the clustering method]

Preprocessing (Indexing): An index consists of a prefix tree and a suffix tree. Prefix tree: query ‘Q’ and expanded queries ‘Q+W’. Suffix tree: query ‘Q’ and expanded queries ‘W+Q’. With this index, they can easily find the expanded queries of any query.
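
The indexing idea can be sketched with two word-level tries. This is a minimal illustration, not the paper's implementation: the class `WordTrie`, the helper names, and the sample queries are hypothetical, and a trie over reversed word sequences stands in for the suffix tree as a simplifying assumption.

```python
class WordTrie:
    """A word-level trie; each node maps the next word to a child node."""

    def __init__(self):
        self.children = {}
        self.is_query = False

    def insert(self, words):
        node = self
        for word in words:
            node = node.children.setdefault(word, WordTrie())
        node.is_query = True

    def extensions(self, words):
        """Yield every stored word sequence that strictly extends `words`."""
        node = self
        for word in words:
            node = node.children.get(word)
            if node is None:
                return
        stack = [(node, list(words))]
        while stack:
            current, path = stack.pop()
            for word, child in current.children.items():
                new_path = path + [word]
                if child.is_query:
                    yield new_path
                stack.append((child, new_path))


def build_index(queries):
    """Build a prefix trie (for Q+W) and a reversed-word trie (for W+Q)."""
    prefix, suffix = WordTrie(), WordTrie()
    for q in queries:
        words = q.split()
        prefix.insert(words)
        suffix.insert(words[::-1])
    return prefix, suffix


def expanded_queries(query, prefix, suffix):
    """Return all indexed queries of the form Q+W or W+Q."""
    words = query.split()
    found = {" ".join(p) for p in prefix.extensions(words)}
    found |= {" ".join(p[::-1]) for p in suffix.extensions(words[::-1])}
    return found


# Made-up queries for illustration only.
prefix, suffix = build_index(
    ["harry shum", "harry shum microsoft", "dr harry shum", "xbox"]
)
result = expanded_queries("harry shum", prefix, suffix)
```

Looking up a query costs time proportional to its length plus the size of the subtree below it, which is what makes finding expanded queries cheap once the index is built.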

Preprocessing (Pruning): If a query ‘Q’ has no URL overlap with its expanded queries, the false expanded queries are removed by a heuristic rule, and the corresponding child node is pruned. For example: ‘fast food’ and ‘fast’; ‘hot dog’ and ‘dog’. [Figure: tree nodes Q, Q+W, W+Q]
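
The slide does not spell out the heuristic rule, so the following sketch simply assumes that an expanded query is kept only if it shares at least one clicked URL with the original query. The function name `prune_expanded_queries`, the `clicks` mapping, and all sample data (including the query ‘fast car’) are hypothetical.

```python
def prune_expanded_queries(query, expanded, clicks):
    """Keep only expanded queries that share a clicked URL with `query`.

    `clicks` maps each query string to the set of URLs clicked for it.
    Assumption: "no URL overlap" alone decides whether to prune.
    """
    base = clicks.get(query, set())
    return [e for e in expanded if base & clicks.get(e, set())]


# Made-up click data for illustration only.
clicks = {
    "fast": {"fast.example/home"},
    "fast food": {"food.example/menu"},
    "fast car": {"fast.example/home", "cars.example"},
}
kept = prune_expanded_queries("fast", ["fast food", "fast car"], clicks)
```

Here ‘fast food’ is dropped because none of its clicked URLs were also clicked for ‘fast’, mirroring the example on the slide.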

Clustering

Segment a URL into tokens based on the slash symbols. Ex : “ Shum” Features: Baseline, URI Components, Length, etc. [Figure: queries q1, q2, q3, q4, ... and URL tokens t1, t2, t3, t4, ...]
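
Splitting a URL at slash symbols can be sketched as follows. The function name `url_tokens` and the example URLs are hypothetical, and dropping the scheme and any query string is an assumption of this sketch rather than something the slide states.

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Split a URL into tokens at slash symbols, dropping the scheme."""
    parts = urlsplit(url)
    # The host (if present) becomes the first token.
    tokens = [parts.netloc] if parts.netloc else []
    # Path segments between slashes become the remaining tokens.
    tokens += [segment for segment in parts.path.split("/") if segment]
    return tokens


# Made-up URLs for illustration only.
with_scheme = url_tokens("http://www.example.com/people/harry-shum/")
without_scheme = url_tokens("example.com/a/b")
```

Two URLs can then be compared by the tokens they share, which gives a string-based similarity even when click data is sparse.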

Clustering

Postprocessing: Clusters that consist of only one URL are excluded. Each remaining cluster represents one subtopic of the query. Keywords are extracted from the expanded queries and assigned to the corresponding cluster as subtopic labels.
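
The postprocessing step can be sketched as below. The cluster representation (a set of URLs paired with the expanded queries associated with them), the function name `postprocess`, and the choice of the most frequent non-query words as labels are all assumptions made for illustration; the sample data is made up.

```python
from collections import Counter


def postprocess(query, clusters, max_labels=2):
    """Drop single-URL clusters and label the rest with frequent keywords.

    `clusters` is a list of (urls, expanded_queries) pairs.
    """
    query_words = set(query.split())
    subtopics = []
    for urls, expanded in clusters:
        if len(urls) < 2:
            continue  # clusters with only one URL are excluded
        # Keywords are the words the expanded queries add to the query.
        keywords = Counter(
            word
            for e in expanded
            for word in e.split()
            if word not in query_words
        )
        labels = [word for word, _ in keywords.most_common(max_labels)]
        subtopics.append((sorted(urls), labels))
    return subtopics


# Made-up clusters for illustration only.
clusters = [
    ({"url1", "url3"}, ["harry shum microsoft", "harry shum microsoft research"]),
    ({"url2"}, ["harry shum glee"]),
]
subtopics = postprocess("harry shum", clusters)
```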

Experiments on Accuracy: Three data sets. [Table: data sets] Setting: parameter tuning used 1/3 of DataSetA; evaluation used 2/3 of DataSetA plus the entire TREC. After several rounds of tuning, α, β, γ, and θ were 0.35, 0.4, 0.25, and 0.3, respectively.
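
The transcript does not define the four parameters. One plausible reading, stated here as an assumption, is that α, β, and γ weight three similarity components between URLs (one from OSS, one from SCAK, one from URL tokens) and that θ is the similarity threshold for merging clusters. The tuned values are consistent with this reading in that the three weights sum to 1. The function names below are hypothetical.

```python
# Tuned values reported on the slide; their roles are assumed, not confirmed.
ALPHA, BETA, GAMMA, THETA = 0.35, 0.4, 0.25, 0.3


def combined_similarity(s_oss, s_scak, s_url):
    """Linear combination of three similarity scores, each in [0, 1]."""
    return ALPHA * s_oss + BETA * s_scak + GAMMA * s_url


def should_merge(s_oss, s_scak, s_url):
    """Assumed rule: merge two URLs or clusters when similarity reaches THETA."""
    return combined_similarity(s_oss, s_scak, s_url) >= THETA
```

Under this reading, strong evidence from a single behavioral signal (0.35 or 0.4) is enough to cross the threshold, while URL string similarity alone (0.25) is not.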

Experiments on Accuracy: Result. [Table: accuracy results] Remark: due to the sparseness of the available data.

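Search Result Clustering: Offline: the paper’s method mines query subtopics and stores the mining results in a database. Online: for a given query, the mined subtopics serve as seed clusters. Search results that do not belong to any of the mined subtopics are compared by cosine similarity, using the TF-IDF of terms in titles and snippets, and are either added to the existing clusters or used to create new clusters.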
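
The cosine similarity over TF-IDF vectors mentioned above can be sketched in plain Python. The exact TF-IDF weighting used in the paper is not given in the transcript, so the variant here (raw term frequency times `log(N / df) + 1`) is an assumption, as are the function names and the sample texts standing in for titles and snippets.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {term: tf[term] * (math.log(n / df[term]) + 1.0) for term in tf}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up title/snippet text for illustration only.
docs = [
    "microsoft research lab".split(),
    "microsoft research news".split(),
    "glee actor dancer".split(),
]
vectors = tfidf_vectors(docs)
```

A result whose best similarity to every existing cluster stays below some chosen cutoff would start a new cluster; the cutoff itself is not given in the transcript.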

Search Result Clustering: [Table: accuracy comparison between the new method and the baseline] [Table: accuracy comparison from various perspectives] The overall improvement is about 28%.

Search Result Re-Ranking: [Figure: example of search result re-ranking] Evaluation: the user checks the subtopics and clicks one of them. Measures: the average position of last clicked URLs, and the average position of last clicked URLs belonging to the same subtopics.

Conclusion: Two phenomena of user search behavior can be used as signals to mine the major senses and facets of ambiguous and multifaceted queries. The clustering algorithm can effectively and efficiently mine query subtopics on the basis of the two phenomena. Future work: investigate the use of other features to further improve the accuracy. Other existing algorithms can be applied as well. The mined subtopics can also be useful in other applications.

Thanks for listening.