1 Context-Aware Search Personalization with Concept Preference CIKM’11 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.

Outline
- Introduction
- Concept Extraction
- Search Context Discovery
  – Temporal Cutoff
  – Query Relevance
  – SERP Similarity
- Concept Preference Derivation
  – Clickthrough
  – Query Reformulation
- Personalized Ranking Function
- Experiment
- Conclusion

Introduction
Search engines have become an indispensable tool in people's daily activities, but:
– search queries are typically short and ambiguous
– they provide limited clues for inferring a user's true search intent
This work proposes a framework that utilizes search context to personalize search results.


Concept Extraction
If a term or phrase c appears frequently in the web-snippets returned for a query q, then c represents an important concept related to q. The method extracts all terms and phrases from the web-snippets returned for q and denotes each term or phrase as c_i. (A "web-snippet" is the title, summary and URL of a Web page returned by the search engine.)

Concept Extraction (example)
Query = "cloud computing"; the search engine returns N = 1000 snippets.
c_1 = "service" appears in 7 snippets; c_2 = "community cloud" appears in 5 snippets.
support(c_1) = 7/1000 × 1 = 0.007
support(c_2) = 5/1000 × 2 = 0.01
Concepts whose support exceeds the threshold θ (θ = 0.03 on the slide) are kept as concepts related to q.
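The support computation above can be sketched in a few lines. The formula support(c) = (sf(c)/n) · |c| is inferred from the slide's numbers (snippet frequency scaled by phrase length); the helper name is illustrative:

```python
def support(snippet_count, n_snippets, phrase_len):
    """support(c) = (sf(c)/n) * |c|: snippet frequency, scaled by phrase length."""
    return snippet_count / n_snippets * phrase_len

# Numbers from the slide: query "cloud computing", n = 1000 snippets.
print(support(7, 1000, 1))  # "service" -> 0.007
print(support(5, 1000, 2))  # "community cloud" -> 0.01
```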


Search Context Discovery
Three signals are used to decide whether consecutive queries belong to the same search context:
– Temporal Cutoff
– Query Relevance
– SERP Similarity

Temporal Cutoff
Temporal cutoffs ranging from 5 to 30 minutes are frequently used to identify session boundaries; a 30-minute cutoff achieves fairly good performance.
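As a minimal sketch, segmenting a user's query stream with a temporal cutoff looks like this (the function name and log contents are illustrative, not from the paper):

```python
from datetime import datetime, timedelta

def split_sessions(queries, cutoff_minutes=30):
    """Group time-ordered (timestamp, query) records into sessions:
    a gap longer than the temporal cutoff starts a new session."""
    sessions, current, last_time = [], [], None
    for ts, q in queries:
        if last_time is not None and ts - last_time > timedelta(minutes=cutoff_minutes):
            sessions.append(current)
            current = []
        current.append(q)
        last_time = ts
    if current:
        sessions.append(current)
    return sessions

log = [
    (datetime(2011, 5, 1, 9, 0), "cloud computing"),
    (datetime(2011, 5, 1, 9, 10), "cloud computing service"),
    (datetime(2011, 5, 1, 10, 30), "horse race"),  # 80-minute gap -> new session
]
print(split_sessions(log))  # [['cloud computing', 'cloud computing service'], ['horse race']]
```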

Query Relevance
Utilize a query reformulation taxonomy to evaluate the relevance between queries, by detecting certain types of change between two consecutive queries.


Example: "horses race" → "horse" combines two reformulation types: RemoveWords and SingularAndPlural.
– A single reformulation cannot explain such a change without damaging the query's integrity.
– Multiple reformulations combine more than one reformulation type, but enumerating every combination of reformulation types is infeasible.
– Instead, an SVM model is trained to detect multiple reformulations.

SERP Similarity
The limitation of the taxonomy-based method is that it cannot capture reformulations beyond the predefined taxonomy. For each query q, concepts are extracted from the top 100 snippets returned by the search engine, and the query is represented as a concept vector C_q; two queries are considered related when their concept vectors are similar.
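A common way to compare two concept vectors is cosine similarity. The sketch below assumes each query's vector maps concepts to support weights; the exact similarity measure and weighting used in the paper may differ:

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse concept vectors (dicts: concept -> weight)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical concept vectors for two queries (weights are made up).
cq1 = {"service": 0.007, "community cloud": 0.01}
cq2 = {"service": 0.004, "grid computing": 0.02}
print(round(cosine(cq1, cq1), 3))  # 1.0: a query is maximally similar to itself
print(round(cosine(cq1, cq2), 3))  # small positive value from the shared "service" concept
```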


Concept Preference Derivation
Clickthrough — assumptions about user behavior:
– Users scan the results in order from top to bottom.
– Users read at least the top two result snippets, and the first one is much more likely to be clicked.
– Users read one snippet below any snippet they click on.
User's examination range of the search results:
– If the snippet ranked i is the last one the user clicked, the examination range is positions 1 to (i + 1).
– If no snippet is clicked, the examination range is positions 1 to 2.
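The examination-range rule above can be sketched directly (1-based positions, function name illustrative):

```python
def examination_range(clicked_positions, n_results):
    """Positions (1-based) the user is assumed to have examined:
    down to one past the last click, or the top two if nothing was clicked."""
    last = max(clicked_positions) if clicked_positions else 1
    return list(range(1, min(last + 1, n_results) + 1))

print(examination_range([3], 10))  # [1, 2, 3, 4]: one past the last click
print(examination_range([], 10))   # [1, 2]: no click, top two only
```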

Concept Preference Derivation
Assume (s_1, s_2, …) is the ranking of snippets returned for query q, with corresponding concept sets (C_1, C_2, …).
Click >_q Skip Above:
– If s_i is clicked while s_j (j < i) is not clicked, derive a concept preference judgment c_a >_q c_b for each concept c_a in C_i and each concept c_b in C_j.
Click >_q No-Click Next:
– If s_i is clicked while s_j is not clicked and j = i + 1, derive c_a >_q c_b for c_a in C_i and c_b in C_j.
Click >_q' No-Click Earlier:
– Assume q' is an earlier query of q within the search context, with snippet ranking (s'_1, s'_2, …) and concept sets (C'_1, C'_2, …). If s_i is clicked for q while s'_j was skipped for q', derive c_a >_q' c'_b for c_a in C_i and c'_b in C'_j.

Example: for query q, the search engine returns snippets 1–4 with concept sets C_1–C_4.
– Click >_q Skip Above: if the user clicked snippet 3, derive C_3 > C_2 and C_3 > C_1.
– Click >_q No-Click Next: if the user clicked snippet 3, derive C_3 > C_4.
– Click >_q' No-Click Earlier: for q the user clicked snippet 3; for the earlier query q' the user clicked snippet 2 and skipped snippet 1, so derive C_3 >_q' C'_1.
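The first two rules over a single result list can be sketched as follows (0-based positions; the helper name is illustrative):

```python
def preference_pairs(concepts, clicks):
    """Derive concept preference pairs from one result list.
    concepts: list of concept sets per snippet, top to bottom;
    clicks: set of 0-based clicked positions.
    Implements Click > Skip Above and Click > No-Click Next."""
    pairs = set()
    for i in clicks:
        for j in range(len(concepts)):
            skipped_above = j < i and j not in clicks
            no_click_next = j == i + 1 and j not in clicks
            if skipped_above or no_click_next:
                # every concept of the clicked snippet beats every concept of s_j
                pairs.update((ca, cb) for ca in concepts[i] for cb in concepts[j])
    return pairs

concept_sets = [{"c1"}, {"c2"}, {"c3"}, {"c4"}]
print(sorted(preference_pairs(concept_sets, {2})))
# [('c3', 'c1'), ('c3', 'c2'), ('c3', 'c4')]
```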

Concept Preference Derivation
Query reformulation:
– Reflects the semantic relation between two consecutive queries.
– To evaluate the distribution of query reformulation types, the reformulation type between consecutive queries was manually labeled in a 10K-query search log.

To better understand the nature of the five query reformulation types:
– An empirical study was conducted on users' Click/Skip behavior before and after query reformulation.
– A query is labeled Click (C) if it results in a clickthrough; otherwise it is labeled Skip (S).
– The C-C pattern is further categorized as C-C(S) or C-C(D) depending on whether the clicked URL is the same or different for the two queries.

Repeat Strategy:
– Assume q' is an earlier query of q within the search context and q repeats q'. Let (s'_1, s'_2, …) be the snippet ranking for q' and (s_1, s_2, …) the ranking for q. The combined snippet ranking is denoted (s^c_1, s^c_2, …), where s^c_i is labeled clicked if s_i or s'_i is clicked.
– Derive preferences by applying Click >_q Skip Above and Click >_q No-Click Next to the combined ranking list instead of the original ones.
– The S-C and C-C(D) patterns give conflicting preferences; only C-C(S) yields a preference judgment.

SpellingCorrection Strategy:
– Assume q' is an earlier query of q within a search context. If q results in a clickthrough while q' does not (the S-C pattern), derive preference judgments only for q, using Click >_q Skip Above and Click >_q No-Click Next.

AddWords Strategy:
– Assume q' is an earlier query of q within a search context. Let S be the set of terms in q but not in q'. If q results in a clickthrough, concepts containing terms in S are preferred over the skipped concepts for query q.
Example: q' = "apple", q = "apple juice", S = {juice}. Concepts containing "juice" are preferred over the concepts in skipped snippets.
– S-C and C-C(D) account for a large proportion of this pattern: the added words bring the query closer to the user's information need.

RemoveWords Strategy:
– Assume q' is an earlier query of q within a search context. Let S be the set of terms in q' but not in q. If q results in a clickthrough, the clicked concepts are preferred over concepts containing terms in S for query q.
Example: q' = "apple computer power", q = "apple computer", S = {power}. For q, clicked concepts are preferred over concepts containing "power".
– S-C and C-C(D) account for a large proportion of this pattern: the user removes words to obtain more general information.
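The term-difference step shared by AddWords and RemoveWords, and a simplified version of the AddWords preference rule, can be sketched as follows (the substring matching of terms against concepts is an assumption for illustration, not the paper's exact rule):

```python
def term_diff(q_new, q_old):
    """Terms present in one query but not the other: set difference of query terms."""
    return set(q_new.split()) - set(q_old.split())

# AddWords: q' = "apple" -> q = "apple juice"
added = term_diff("apple juice", "apple")                      # {'juice'}
# RemoveWords: q' = "apple computer power" -> q = "apple computer"
removed = term_diff("apple computer power", "apple computer")  # {'power'}

def addwords_preferences(clicked_concepts, skipped_concepts, added_terms):
    """AddWords sketch: concepts containing an added term are preferred
    over the skipped concepts (simplified; term-in-concept match assumed)."""
    preferred = {c for c in clicked_concepts if any(t in c for t in added_terms)}
    return {(ca, cb) for ca in preferred for cb in skipped_concepts}

print(addwords_preferences({"apple juice recipe", "apple store"}, {"apple iphone"}, added))
# {('apple juice recipe', 'apple iphone')}
```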

StripURL Strategy:
– Assume q' is an earlier query of q within a search context. If q is obtained by stripping the URL parts from q', snippets whose URLs share terms with q are preferred by the user.
– This is navigational search: the strategy captures the user's preference with respect to URLs rather than concepts, and promotes results whose URLs share terms with the reformulated query.

Personalized Ranking Function
Suppose we have an input preference judgment over concepts c_i and c_j for a given query q of the form c_i >_q c_j. This preference is written as a constraint on the weight vector w: w · φ(q, c_i) > w · φ(q, c_j). We first build a concept space, which contains the concepts in the preference judgments as well as concepts that have a semantic relation with them in the concept ontology. Then, for each concept c_i arising from q, we create a feature vector φ(q, c_i) = (f_1, f_2, …, f_n) based on all concepts in the concept space.

Example: for query q, the preference judgments c_1 > c_2, c_1 > c_3 and c_4 > c_5 are used to train the weight vector w (learning that c_1 and c_4 are important). The concept space for q is {c_1, c_2, …, c_7}, and each concept's feature vector holds its similarity to every concept in the space:
φ(q, c_1) = (1, sim(c_1,c_2), sim(c_1,c_3), sim(c_1,c_4), sim(c_1,c_5), sim(c_1,c_6), sim(c_1,c_7))
φ(q, c_2) = (sim(c_2,c_1), 1, sim(c_2,c_3), sim(c_2,c_4), sim(c_2,c_5), sim(c_2,c_6), sim(c_2,c_7))
… (and similarly for c_3 through c_7)
Given the original ranking R(s_1, s_2, …, s_n), each snippet is scored from its concepts; e.g. if the concept set of snippet s_1 is S_1 = {c_2, c_3}, then P(S_1) = w · φ(q, c_2) + w · φ(q, c_3).
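The scoring step can be sketched as below. The sim values and weights are made up for illustration (the real sim(·,·) comes from PMI over the snippet corpus, and w from training on the preference constraints):

```python
def feature_vector(concept, concept_space, sim):
    """phi(q, c): 1 at the concept's own position, sim(c, other) elsewhere."""
    return [1.0 if other == concept else sim(concept, other) for other in concept_space]

def snippet_score(snippet_concepts, concept_space, sim, w):
    """Score a snippet as the summed model response of its concepts:
    P(S) = sum over c in S of  w . phi(q, c)."""
    total = 0.0
    for c in snippet_concepts:
        phi = feature_vector(c, concept_space, sim)
        total += sum(wi * fi for wi, fi in zip(w, phi))
    return total

space = ["c1", "c2", "c3"]
sim = lambda a, b: 0.2      # dummy uniform similarity (assumption)
w = [1.0, 0.1, 0.1]         # hypothetical learned weights: c1 is important
# A snippet containing the important concept c1 outscores one containing only c3.
print(snippet_score({"c1"}, space, sim, w) > snippet_score({"c3"}, space, sim, w))  # True
```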

Personalized Ranking Function
The semantic relation between two concepts, sim(c_i, c_k), is measured by Pointwise Mutual Information (PMI).
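One common form of PMI is sketched below; the paper may use a scaled or normalized variant, and the counting scheme (snippet co-occurrence counts) is an assumption:

```python
import math

def pmi(n_joint, n_i, n_k, n_total):
    """PMI(ci, ck) = log( P(ci, ck) / (P(ci) * P(ck)) ),
    with probabilities estimated from (co-)occurrence counts."""
    if n_joint == 0:
        return float("-inf")
    p_joint = n_joint / n_total
    p_i, p_k = n_i / n_total, n_k / n_total
    return math.log(p_joint / (p_i * p_k))

# Concepts co-occurring more often than chance get positive PMI:
# P(joint) = 0.05 > P(ci) * P(ck) = 0.1 * 0.1 = 0.01.
print(pmi(50, 100, 100, 1000) > 0)  # True
```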

Personalized Ranking Function
The original ranking of search results arising from q is denoted R(s_1, s_2, …, s_n), where s_i denotes the snippet ranked at position i. Let C_i be the set of concepts extracted from snippet s_i.

Experiment
– Experiment setup
– Comparison of the search context discovery algorithms
– Evaluation of the strategies: clickthrough-based, query-reformulation-based, and combined

Experiment setup
Experimental data: the AOL search query log. Each record contains a user ID, the query, a timestamp, and the URL the user clicked.

Comparing the search context algorithms
Users were asked to perform relevance judgments on the top 100 results for each query, filling in a score for each search result to reflect its relevance to the query. Three levels of relevancy are defined:
– Good: relevant
– Fair: unlabeled
– Poor: irrelevant
Documents rated "Good" are used to compute the top-N precision of the different methods.

Comparing the search context algorithms
Temporal cutoffs of 25–30 minutes achieve the highest F-measure, so the default temporal cutoff is set to 30 minutes.

Comparing the search context algorithms
The default threshold value is set to 0.75.

Comparison of context-discovery methods:
– Temporal Cutoff: low precision, high recall; it groups relevant and irrelevant queries together.
– Single Reformulation: high precision, low recall; based on query reformulation, but too strict.
– New Reformulation (multiple reformulations): good precision and good recall.
– SERPs: high precision, low recall.
– SERPs + New Reformulation: boosts the low recall.

Personalization with clickthrough strategies
– Joachims-C: Click >_q Skip Above
– mJoachims-C: Click >_q Skip Above + Click >_q No-Click Next
– Clickthrough: Click >_q Skip Above + Click >_q No-Click Next + Click >_q' No-Click Earlier

Personalization with reformulation strategies

Conclusion
– Search contexts are used to facilitate concept-based search personalization.
– A new method is introduced that discovers search contexts from a raw search query log; it captures comprehensive information in a single pass over the query log and performs well in keeping context coherency and integrity.
– The proposed strategies capture the user's preference with higher accuracy than existing approaches.