Query Suggestion by Constructing Term-Transition Graphs. Date: 2012/9/13. Source: Yang Song, Dengyong Zhou, Li-wei He (WSDM'12). Advisor: Jia-ling Koh. Speaker: Jiun-Jia Chiou.

Similar presentations
Date: 2013/1/17 Author: Yang Liu, Ruihua Song, Yu Chen, Jian-Yun Nie and Ji-Rong Wen Source: SIGIR12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Adaptive.

Learning to Suggest: A Machine Learning Framework for Ranking Query Suggestions Date: 2013/02/18 Author: Umut Ozertem, Olivier Chapelle, Pinar Donmez,
Improvements and extras Paul Thomas CSIRO. Overview of the lectures 1.Introduction to information retrieval (IR) 2.Ranked retrieval 3.Probabilistic retrieval.
Psychological Advertising: Exploring User Psychology for Click Prediction in Sponsored Search Date: 2014/03/25 Author: Taifeng Wang, Jiang Bian, Shusen.
Term Level Search Result Diversification DATE : 2013/09/11 SOURCE : SIGIR’13 AUTHORS : VAN DANG, W. BRUCE CROFT ADVISOR : DR.JIA-LING, KOH SPEAKER : SHUN-CHEN,
DQR : A Probabilistic Approach to Diversified Query recommendation Date: 2013/05/20 Author: Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo Source:
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Date : 2013/05/27 Author : Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Gong Yu Source : SIGMOD’12 Speaker.
A Machine Learning Approach for Improved BM25 Retrieval
Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski Thorsten Joachims Presented By: Steven Carr.
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
DOMAIN DEPENDENT QUERY REFORMULATION FOR WEB SEARCH Date : 2013/06/17 Author : Van Dang, Giridhar Kumaran, Adam Troy Source : CIKM’12 Advisor : Dr. Jia-Ling.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Searchable Web sites Recommendation Date : 2012/2/20 Source : WSDM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh Jia-ling 1.
Context-aware Query Suggestion by Mining Click-through and Session Data Authors: H. Cao et.al KDD 08 Presented by Shize Su 1.
Evaluating Search Engine
Investigation of Web Query Refinement via Topic Analysis and Learning with Personalization Department of Systems Engineering & Engineering Management The.
Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.
An investigation of query expansion terms Gheorghe Muresan Rutgers University, School of Communication, Information and Library Science 4 Huntington St.,
Query Log Analysis Naama Kraus Slides are based on the papers: Andrei Broder, A taxonomy of web search Ricardo Baeza-Yates, Graphs from Search Engine Queries.
Advisor: Hsin-Hsi Chen Reporter: Chi-Hsin Yu Date:
1 Hierarchical Tag visualization and application for tag recommendations CIKM’11 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Web Usage Mining with Semantic Analysis Date: 2013/12/18 Author: Laura Hollink, Peter Mika, Roi Blanco Source: WWW’13 Advisor: Jia-Ling Koh Speaker: Pei-Hao.
SEEKING STATEMENT-SUPPORTING TOP-K WITNESSES Date: 2012/03/12 Source: Steffen Metzger (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling Koh 1.
1 Context-Aware Search Personalization with Concept Preference CIKM’11 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
Date: 2012/10/18 Author: Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka Source: World Wide Web conference (WWW "12) Advisor: Jia-ling, Koh Speaker: Jiun.
Topics and Transitions: Investigation of User Search Behavior Xuehua Shen, Susan Dumais, Eric Horvitz.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:
Center for E-Business Technology Seoul National University Seoul, Korea BrowseRank: letting the web users vote for page importance Yuting Liu, Bin Gao,
Date: 2012/4/23 Source: Michael J. Welch et al. (WSDM'11) Advisor: Jia-ling Koh Speaker: Jiun-Jia Chiou Topical semantics of twitter links 1.
Mining Social Networks for Personalized Prioritization Shinjae Yoo, Yiming Yang, Frank Lin, II-Chul Moon [KDD ’09] 1 Advisor: Dr. Koh Jia-Ling Reporter:
Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session Summarized.
Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research.
Query Suggestion Naama Kraus Slides are based on the papers: Baeza-Yates, Hurtado, Mendoza, Improving search engines by query clustering Boldi, Bonchi,
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Analysis of Topic Dynamics in Web Search Xuehua Shen (University of Illinois) Susan Dumais (Microsoft Research) Eric Horvitz (Microsoft Research) WWW 2005.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Algorithmic Detection of Semantic Similarity WWW 2005.
Jiafeng Guo(ICT) Xueqi Cheng(ICT) Hua-Wei Shen(ICT) Gu Xu (MSRA) Speaker: Rui-Rui Li Supervisor: Prof. Ben Kao.
Institute of Computing Technology, Chinese Academy of Sciences 1 A Unified Framework of Recommending Diverse and Relevant Queries Speaker: Xiaofei Zhu.
Ranking Related Entities Components and Analyses CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Post-Ranking query suggestion by diversifying search Chao Wang.
More Than Relevance: High Utility Query Recommendation By Mining Users' Search Behaviors Xiaofei Zhu, Jiafeng Guo, Xueqi Cheng, Yanyan Lan Institute of.
Date: 2012/11/29 Author: Chen Wang, Keping Bi, Yunhua Hu, Hang Li, Guihong Cao Source: WSDM’12 Advisor: Jia-ling, Koh Speaker: Shun-Chen, Cheng.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
Multi-Aspect Query Summarization by Composite Query Date: 2013/03/11 Author: Wei Song, Qing Yu, Zhiheng Xu, Ting Liu, Sheng Li, Ji-Rong Wen Source: SIGIR.
Date: 2012/5/28 Source: Alexander Kotov et al. (CIKM'11) Advisor: Jia-ling Koh Speaker: Jiun-Jia Chiou Interactive Sense Feedback for Difficult Queries.
Leveraging Knowledge Bases for Contextual Entity Exploration Categories Date:2015/09/17 Author:Joonseok Lee, Ariel Fuxman, Bo Zhao, Yuanhua Lv Source:KDD'15.
TO Each His Own: Personalized Content Selection Based on Text Comprehensibility Date: 2013/01/24 Author: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang Source:
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Date: 2013/9/25 Author: Mikhail Ageev, Dmitry Lagun, Eugene Agichtein Source: SIGIR’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Improving Search Result.
A Framework to Predict the Quality of Answers with Non-Textual Features Jiwoon Jeon, W. Bruce Croft(University of Massachusetts-Amherst) Joon Ho Lee (Soongsil.
Why Decision Engine Bing Demos Search Interaction model Data-driven Research Problems Q & A.
1 Random Walks on the Click Graph Nick Craswell and Martin Szummer Microsoft Research Cambridge SIGIR 2007.
A Framework for Detection and Measurement of Phishing Attacks Reporter: Li, Fong Ruei National Taiwan University of Science and Technology 2/25/2016 Slide.
Predicting User Interests from Contextual Information R. W. White, P. Bailey, L. Chen Microsoft (SIGIR 2009) Presenter : Jae-won Lee.
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
To Personalize or Not to Personalize: Modeling Queries with Variation in User Intent Presented by Jaime Teevan, Susan T. Dumais, Daniel J. Liebling Microsoft.
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
Improving Search Relevance for Short Queries in Community Question Answering Date: 2014/09/25 Author : Haocheng Wu, Wei Wu, Ming Zhou, Enhong Chen, Lei.
Evaluation of IR Systems
Struggling and Success in Web Search
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
Presentation transcript:

1 Query Suggestion by Constructing Term-Transition Graphs. Date: 2012/9/13. Source: Yang Song, Dengyong Zhou, Li-wei He (WSDM'12). Advisor: Jia-ling Koh. Speaker: Jiun-Jia Chiou.

2 Outline
- Introduction
- User preference from log
- Query suggestion
  - Topic-based PageRank model (topic level)
  - Learning to Rank model (term level)
- Experiment
- Conclusion

3 It is difficult for search engines to fully understand the user's search intent in many scenarios. Query suggestion techniques aim at recommending a list of queries relevant to the user's input. Search engine logs reveal three kinds of user query reformulation activities:
1) Adding terms after the query (e.g., stanford university location)
2) Deleting terms within the query (e.g., stanford university address)
3) Modifying terms to new terms (e.g., stanford university map → stanford university location)

4 Four user activities in a session: {q1, URL1, q2, URL2}. Example: the user issues q1 = stanford university and clicks URL1; unsatisfied, the user reformulates to q2 = stanford university location and clicks URL2, which satisfies the information need.

5
- How to extract high-quality user clicks in the user session, and how to use the clicks properly? It is critical to accurately infer user intent by examining the entire user search session.
- Given a specific query, which query suggestion method should be used? Can a query reduction method be applied to short queries? Is query expansion always better than query reformulation?

5 Three steps:
Step 1: Derive high-quality user clicks by extracting specific user activity patterns that convey implicit user preferences, e.g., the tuple {q1, q2, url}.
Step 2: Construct a term-transition graph model from the data. Node: a term in the query. Edge: a preference.
Step 3: Find user preferences within each topic and choose the best suggestion method according to the models.

Outline IIntroduction UUser preference from log QQuery suggestion Topic-based Pagerank model (topic level) Learning to Rank model (term-level) EExperiment CConclusion 6

7 A typical log entry contains: user identification number, URL that the user visited, timestamp of the visit, and total dwell time on that URL. Entries are organized into sessions: each session contains a series of URL visits from a particular user, ordered by timestamp. Statistics (this paper focuses on mining user preferences):
- users only modify one of the terms in the query: ≥ 76%
- users are much more likely to revise the last term in the query: ≥ 80%
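
The slides do not spell out the extraction code; below is a minimal sketch of how sessions and {q1, q2, url} tuples might be derived from such log entries, assuming a 30-minute session timeout and a one-term-difference test (both assumptions for illustration):

```python
from itertools import groupby

SESSION_GAP = 30 * 60  # assumed session timeout, in seconds

def sessions(entries):
    """Group log entries (user_id, timestamp, query, url) into per-user
    sessions, splitting whenever the gap between visits exceeds SESSION_GAP."""
    entries = sorted(entries, key=lambda e: (e[0], e[1]))
    for _, user_entries in groupby(entries, key=lambda e: e[0]):
        session, prev_ts = [], None
        for _, ts, query, url in user_entries:
            if prev_ts is not None and ts - prev_ts > SESSION_GAP and session:
                yield session
                session = []
            session.append((query, url))
            prev_ts = ts
        if session:
            yield session

def refinement_tuples(session):
    """Yield {q1, q2, url} tuples: consecutive queries differing in exactly
    one term, where the refined query q2 led to a click on url."""
    for (q1, _), (q2, url2) in zip(session, session[1:]):
        t1, t2 = q1.split(), q2.split()
        if len(t1) == len(t2) and sum(a != b for a, b in zip(t1, t2)) == 1 and url2:
            yield (q1, q2, url2)
```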

8 Three cases of user query refinements were mined from 3-month Toolbar logs between May 2010 and August 2010. In total, over 4 million user refinement activities were discovered in the log, and a total of 350,000 pairs of refinements were found.

9 Outline
- Introduction
- User preference from log
- Query suggestion
  - Topic-based PageRank model
  - Learning to Rank model (term level)
- Experiment
- Conclusion

10 Term-transition graph. M is the transition matrix: M_ij = 1/N(j) if there is a transition from node j to node i, and 0 otherwise, where N(j) is the total number of outlinks of node j. Rank(t) is the rank vector at iteration t, starting from an initial rank value.

11 The 16 ODP top-level categories serve as topics: Arts, Business, Computers, Games, Health, Home, Kids and Teens, News, Recreation, Reference, Regional, Science, Shopping, Society, Sports, World. The Computers topic attaches six terms (including com, home, house, login, page), so the preference vector p assigns 1/6 to each attached term: p_computers,com = 1/6, p_computers,home = 1/6, p_computers,house = 1/6, p_computers,login = 1/6, p_computers,page = 1/6, ...

12 The Reference topic attaches only four of the terms (including com, login, page), so each attached term receives 1/4: p_reference,com = 1/4, p_reference,login = 1/4, p_reference,page = 1/4; unattached terms get zero: p_reference,home = 0, p_reference,house = 0.

13 With damping factor α = 0.5, the rank vector over the term nodes (com, home, house, login, page) is updated as Rank(t+1) = (1 − α) · M · Rank(t) + α · p, and iterated: Rank(t) → Rank(t+1) → Rank(t+2) → ...
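
A minimal sketch of this topic-sensitive PageRank iteration (the matrix and topic vector below are toy values, not the paper's data):

```python
import numpy as np

def topic_pagerank(M, p, alpha=0.5, iters=50, tol=1e-9):
    """Iterate Rank(t+1) = (1 - alpha) * M @ Rank(t) + alpha * p.

    M[i, j] = 1/N(j) if term j transitions to term i (column-stochastic);
    p is the topic preference vector, e.g., 1/6 on each term attached to
    the Computers topic and 0 elsewhere."""
    rank = np.full(len(p), 1.0 / len(p))  # uniform initial rank value
    for _ in range(iters):
        new_rank = (1 - alpha) * (M @ rank) + alpha * p
        if np.abs(new_rank - rank).sum() < tol:
            break
        rank = new_rank
    return rank

# Toy graph with 3 terms; column j spreads 1/N(j) over j's outlinks.
M = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
p = np.array([0.5, 0.5, 0.0])  # the topic attaches the first two terms
print(topic_pagerank(M, p))
```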

14 Query Suggestion. Given a query q_i with m terms {w_i1, ..., w_ij, ..., w_im}. Example: query q = stanford university map, new query q' = stanford university location, so w_ij = map and w'_ij = location. Only the top-10 URLs returned by a search engine are classified into ODP categories: T1 = Regional, T2 = Art, T3 = Computers, ..., T16. Term refinement probability:
P(w_ij → w'_ij | q') = Σ_k P(w_ij → w'_ij | T_k) · P(T_k | q')
With P(w_ij → w'_ij | T1) = 0.3 (PageRank value), P(T1 | q') = 6/10, P(w_ij → w'_ij | T2) = 0.5 (PageRank value), and P(T2 | q') = 4/10:
P(w_ij → w'_ij | q') = 0.3 × 0.6 + 0.5 × 0.4 = 0.38
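
The same mixture in code (a sketch; in the paper the topic distribution comes from classifying the top-10 returned URLs into ODP categories):

```python
def term_refinement_prob(pagerank_by_topic, topic_dist):
    """P(w -> w' | q') = sum over topics T of P(w -> w' | T) * P(T | q')."""
    return sum(pagerank_by_topic[t] * topic_dist[t] for t in topic_dist)

pagerank_by_topic = {"Regional": 0.3, "Art": 0.5}  # per-topic PageRank values
topic_dist = {"Regional": 6 / 10, "Art": 4 / 10}   # from the top-10 URLs
print(term_refinement_prob(pagerank_by_topic, topic_dist))  # 0.38
```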

16 The unnormalized p entries for the Reference topic: p_reference,com = 1, p_reference,login = 1, p_reference,page = 1, while p_reference,home = 0 and p_reference,house = 0.

15 Candidate suggestions for query q = stanford university map include stanford university location, stanford university address, stanford college location, harvard university location, and stanford school location. Each new query q' is scored as P(q' | q) = P(w) · P(w → w' | q), e.g.:
P(stanford university location | q) = P(map) · P(map → location | q) = 4/16 × 0.04 = 0.01
P(stanford university address | q) = P(map) · P(map → address | q) = 4/16 × 0.02 = 0.005
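
Ranking candidates by this score, as a sketch (the numbers are the slide's toy values; P(map) = 4/16 stands in for a term statistic mined from the logs):

```python
def suggestion_score(p_term, p_refinement):
    """P(q' | q) = P(w) * P(w -> w' | q)."""
    return p_term * p_refinement

p_map = 4 / 16  # probability mass of the term being refined
candidates = {
    "stanford university location": suggestion_score(p_map, 0.04),  # 0.01
    "stanford university address":  suggestion_score(p_map, 0.02),  # 0.005
}
for query, score in sorted(candidates.items(), key=lambda kv: -kv[1]):
    print(f"{query}: {score:.4f}")
```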

16 §: Suggestions from using expansion get a very low score.

17 Outline
- Introduction
- User preference from log
- Query suggestion
  - Topic-based PageRank model (topic level)
  - Learning to Rank model (term level)
- Experiment
- Conclusion

18 Given a set of queries and a labeled relevance score (usually on a 5-point scale) for each query pair, learning to rank tries to optimize the ranking loss over all queries in the training set. Three sets of features:
Query features:
- Is the query a Wikipedia title? ∈ {0, 1}
- Is the query a Wikipedia category? ∈ {0, 1}
- # of times the query is contained in a Wikipedia title ∈ R
- # of times the query is contained in a Wikipedia body ∈ R
Term features:
- PageRank score of the term
- # of inlinks & outlinks
- entropy of the PageRank score over the 16 ODP topics: −Σ_i P(t_i) log P(t_i)
- # of times derived from the EMPTY node
Query-term features:
- n-gram conditional probabilities p(w_n | w_{n−m+1}, ..., w_{n−1})
- Inverted Query Frequency: log(N(t_i) / N(t_i | q_j))
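
Two of the term-level features, sketched in code (the counts passed in are placeholders for statistics mined from the graph and the logs):

```python
import math

def topic_entropy(topic_pageranks):
    """Entropy of a term's PageRank scores across the 16 ODP topics:
    -sum_i P(t_i) log P(t_i). Lower entropy means a more topic-focused term."""
    total = sum(topic_pageranks)
    probs = [v / total for v in topic_pageranks if v > 0]
    return -sum(p * math.log(p) for p in probs)

def inverted_query_frequency(n_term, n_term_given_query):
    """IQF feature: log(N(t_i) / N(t_i | q_j))."""
    return math.log(n_term / n_term_given_query)

print(topic_entropy([0.3, 0.5, 0.2]))      # entropy over 3 toy topics
print(inverted_query_frequency(1000, 40))  # ~3.22
```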

19 Query-term features: n-gram conditional probabilities P(term | n-gram), e.g., query = Stanford university map, term = map, estimated from user logs such as:
User log 1: {map, Stanford map, click URL1}
User log 2: {Harvard university map, Stanford university map, click URL2}
User log 3: {Stanford university map, Harvard university map, click URL3}
User log 4: {Stanford university, Harvard university, click URL4}
User log 5: {university map, Harvard university map, click URL5}

20 Generate ranking labels for each group of query refinements; training labels come automatically from implicit user feedback in the log:
- training tuple {q1, q2, click url}
- clicks and skips (α, β)
- given a refined query q' for the original query q, estimate the probability of the click distribution using a function of the clicks and skips
- compare two query refinements q1 and q2, e.g.:
P(stanford admission > stanford map) = 0.61
P(stanford map > stanford history) = 0.08
Threshold: 0.5
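
The slide does not give the exact click-distribution function, so the sketch below substitutes a smoothed click-through-rate comparison (an assumption, not the paper's formula) to show how pairwise labels could be thresholded at 0.5:

```python
def pref_probability(clicks_1, skips_1, clicks_2, skips_2):
    """Assumed stand-in: estimate P(q1 > q2) from click/skip counts
    via add-one-smoothed click-through rates."""
    ctr1 = (clicks_1 + 1) / (clicks_1 + skips_1 + 2)
    ctr2 = (clicks_2 + 1) / (clicks_2 + skips_2 + 2)
    return ctr1 / (ctr1 + ctr2)

def pairwise_label(p, threshold=0.5):
    """Keep a training pair only when the preference clears the threshold."""
    if p > threshold:
        return 1    # q1 preferred over q2
    if p < 1 - threshold:
        return -1   # q2 preferred over q1
    return 0        # no label emitted

p = pref_probability(clicks_1=31, skips_1=20, clicks_2=18, skips_2=33)
print(round(p, 2), pairwise_label(p))
```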

21 Training data: for each query (query1, query2, query3, ...), every candidate term has a feature vector (Feature 1, F2, F3, ..., F30) and a rank label (e.g., 2, 1, 3 for the three terms of query1). A ranking model is built from the training data; on test data, the model predicts a ranking score for each (query, term) pair.

22 Given a new query q that contains k terms {t1, ..., tk}, create a candidate set of queries. Example: for query {t1, t2, t3}, the candidate set contains {t1, t2}, {t1, t3}, {t2, t3}, and {t1, t2, t3}. The model predicts a ranking score for each candidate's terms from their feature vectors (Feature 1, F2, F3, ..., F30), and the highest-scored terms are suggested for the query, e.g., {t1, t2} + term3 → {t1, t2, term3}.
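
The candidate generation step is simple enough to sketch directly (scoring with the trained ranker would follow as a separate step):

```python
from itertools import combinations

def candidate_queries(terms):
    """All (k-1)-term subsets plus the full query, as on this slide:
    {t1, t2, t3} -> {t1, t2}, {t1, t3}, {t2, t3}, {t1, t2, t3}."""
    k = len(terms)
    cands = [tuple(c) for c in combinations(terms, k - 1)]
    cands.append(tuple(terms))
    return cands

print(candidate_queries(["t1", "t2", "t3"]))
# [('t1', 't2'), ('t1', 't3'), ('t2', 't3'), ('t1', 't2', 't3')]
```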

23 Outline
- Introduction
- User preference from log
- Query suggestion
  - Topic-based PageRank model (topic level)
  - Learning to Rank model (term level)
- Experiment
- Conclusion

24 The experiments examine: 1) the performance of the two proposed models, and 2) calibration of the methods' parameters. A user study asks judges to evaluate the query suggestion performance of the methods. Metrics:
1) Normalized discounted cumulative gain (NDCG), computed over the top-5 suggested queries. Relevance scores (rel): Perfect (5), Excellent (4), Good (3), Fair (2), Poor (1).
2) Zero/one-error.
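
NDCG@5 over such judgments might be computed as follows (a standard formulation; the slide does not restate which gain/discount variant the paper uses):

```python
import math

def dcg(rels):
    """DCG with relevance gains and log2 position discounts."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg_at_5(rels):
    """NDCG@5: DCG of the ranking divided by DCG of the ideal reordering."""
    rels = rels[:5]
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

# Judgments for the top-5 suggestions on the next slide: 5, 3, 4, 3, 2
print(ndcg_at_5([5, 3, 4, 3, 2]))  # ~0.99
```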

25 Example for query stanford university:
Top i | Query suggestion | rel
1 | Query + Term1 | 5
2 | Query + Term2 | 3
3 | Query + Term3 | 4
4 | Query + Term4 | 3
5 | Query + Term5 | 2

26 Zero/one-error over the top-5 query suggestions:
Rank | 1 | 2 | 3 | 4 | 5
Train | Term1 | Term2 | Term3 | Term4 | Term5
Test | Term1 | Term4 | Term3 | Term5 | Term2
Zero/one-error = 2/5 = 0.4

27

28 Each suggestion is labeled by 3 different judges and the majority vote is used as the final label: (1) relevant, (2) irrelevant, (3) no opinion (hard to judge). P(N) is the precision at rank N, counting 1 when the query at rank j is relevant and 0 otherwise:
Rank (N) | Query suggestion | Judge vote | P(N)
1 | q1 | Rel | 1/1
2 | q2 | Irrel | 1/2
3 | q3 | Irrel | 1/3
4 | q4 | Rel | 2/4
5 | q5 | Rel | 3/5

29 Average precision per query, then MAP over K queries:
Query 1:
Rank (N) | Query suggestion | Label | P(N)
1 | q11 | Rel | 1/1
2 | q12 | Irrel | 1/2
3 | q13 | Irrel | 1/3
4 | q14 | Rel | 2/4
5 | q15 | Rel | 3/5
AvgP(q1) = (1/1 + 2/4 + 3/5) / 3 = 0.7
Query 2 is scored the same way: AvgP(q2) = 0.5. ... Query K.
If K = 2, MAP = (0.7 + 0.5) / 2 = 0.6
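
The same computation in code (a sketch; AvgP(q2) is taken from the slide rather than recomputed):

```python
def average_precision(labels):
    """Mean of P(N) over the ranks N that hold a relevant suggestion;
    labels[i] is True if the suggestion at rank i+1 was judged relevant."""
    hits, precisions = 0, []
    for i, relevant in enumerate(labels):
        if relevant:
            hits += 1
            precisions.append(hits / (i + 1))
    return sum(precisions) / len(precisions) if precisions else 0.0

q1 = [True, False, False, True, True]  # Query 1 labels from slide 29
avg_p_q1 = average_precision(q1)       # (1/1 + 2/4 + 3/5) / 3 = 0.7
mean_ap = (avg_p_q1 + 0.5) / 2         # with AvgP(q2) = 0.5
print(avg_p_q1, mean_ap)               # 0.7 0.6
```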

30 Queries are bucketed by length: short (≤ 2 terms), medium (3 to 5 terms), and long (≥ 6 terms).

31 Conclusion:
- A novel query suggestion framework extracts user preference data from user sessions in search engine logs.
- The user patterns are used to build two suggestion models: a topic-based PageRank model and a Learning to Rank model.
- A user study was conducted in which all queries were triple-judged by human judges; experimental results indicated significant improvements for both models.
- The model only considers changing one term of a query; how performance would increase or decrease with a more sophisticated user preference extraction model that considers multi-term alteration is left open.

32