Query Suggestions Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.



Search engines
- The user needs some information.
- Assumption: the required information is present somewhere.
- A search engine tries to bridge this gap. How:
  - The user expresses the information need in the form of a query.
  - The engine returns a list of documents, or answers by some better means.

Search queries
Navigational queries
- We know the answer (which document we want) and use the search engine only to navigate to it:
  - tendulkar date of birth → Wikipedia / bio page
  - serendipity meaning → dictionary page
  - air india → simply the URL
  - In a people database, typing the name → the record of the person we are looking for
- Such queries are straightforward to formulate.
- Query suggestion is primarily for saving time and typing.
Simple informational queries
- 100 USD in INR → currency conversion requested
- kolkata weather → weather information requested
Complex informational queries
- We do not know the answers.
- Hence, we may not express the question perfectly.

Why query suggestion?
- The user needs some information.
- Assumption: the required information is present somewhere.
- A search engine tries to bridge this gap:
  - User: information need → a query in words (language)
  - Engine: processes the documents
- We may not know what information is available or in what form it is present, or we cannot express it well.
- If you know exactly what you want, it is easier to get it.

Interactive query suggestion

Why query suggestion?
- The user needs some information.
- Assumption: the required information is present somewhere.
- A search engine tries to bridge this gap:
  - User: information need → a query in words (language)
  - Engine: processes the documents
- We may not know what information is available or in what form it is present, or we cannot express it well.
- If you know exactly what you want, it is easier to get it.
- Query logs hold the wisdom of the crowd.

Query suggestion methods using query logs
- Can leverage the wisdom of the crowd
- High quality: the suggested queries are well formed
- Methods:
  - Query similarity: Baeza-Yates et al., 2004; Barouni-Ebrahimi and Ghorbani, 2007
  - Click-through data: Cao et al., 2008; Ma et al., 2008; Song and He, 2010
  - Query-URL bipartite graph, hitting time: Mei et al., 2008; Ma et al., 2010
  - Session information: Lee et al., 2009; Cucerzan and White, 2010; Jones et al., 2006

Query suggestion without using query logs
- Custom search engines in the enterprise world
- Small-scale search with little log data, e.g. paper search (Google Scholar still does not have query suggestion)
- Site search (search within the ISI website)
- Desktop search: only one user
- Legally restricted environments, where other users' queries cannot be exposed even anonymously
- Method: suggest collections of words that are likely to form meaningful queries matching the partial query typed by the user so far

QUERY SUGGESTION USING QUERY SIMILARITY
Baeza-Yates et al.

Outline
Preprocessing (offline)
- Represent queries as term-weighted vectors
- Cluster queries using the similarity between queries
- Rank queries in each cluster
Query time (online)
- Given the user's query q
- Find the cluster C to which q should belong
- Suggest the top k queries in cluster C, based on their rank and similarity with q

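Query term weight model
The weight of the i-th term in the vector for query q is
$$q[i] = \sum_{u} \mathrm{Pop}(q, u) \cdot \frac{\mathrm{Tf}(t_i, u)}{\max_{t} \mathrm{Tf}(t, u)}$$
where
- the sum runs over all URLs u that have been clicked after querying with q;
- $\mathrm{Pop}(q, u)$ is the popularity of clicking URL u after querying with q;
- $\mathrm{Tf}(t_i, u)$ is the term frequency of the i-th term in the document with URL u;
- $\max_{t} \mathrm{Tf}(t, u)$ is the maximum term frequency of any term from q in the document with URL u.
Query similarity is computed as the cosine similarity of these vectors, and queries are clustered using this similarity.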
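The term weight model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `query_vector`, `cosine`, `clicks` and `doc_tf` are hypothetical, the click popularities and term frequencies are made up, and the vector is taken over every term of the clicked documents with each document normalised by its own largest term frequency, which is one possible reading of the model.

```python
import math
from collections import defaultdict


def query_vector(clicks, doc_tf):
    """Build a term-weight vector for one query.

    clicks: {url: popularity of clicking url after the query}
    doc_tf: {url: {term: term frequency in the document at url}}
    """
    vec = defaultdict(float)
    for url, popularity in clicks.items():
        tf = doc_tf.get(url, {})
        if not tf:
            continue  # no term statistics for this document
        max_tf = max(tf.values())  # assumption: maximum over the document's terms
        for term, freq in tf.items():
            vec[term] += popularity * freq / max_tf
    return dict(vec)


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up term frequencies for three clicked documents.
doc_tf = {
    "u1": {"weather": 4, "kolkata": 2},
    "u2": {"weather": 3, "forecast": 3},
    "u3": {"cricket": 5, "score": 1},
}
v1 = query_vector({"u1": 0.8, "u2": 0.2}, doc_tf)  # e.g. "kolkata weather"
v2 = query_vector({"u2": 1.0}, doc_tf)             # e.g. "weather forecast"
v3 = query_vector({"u3": 1.0}, doc_tf)             # e.g. "cricket score"
print(cosine(v1, v2), cosine(v1, v3))
```

Queries that lead to clicks on overlapping documents get a high cosine similarity even when their wording differs, which is what makes the vectors usable for clustering.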

Query support and ranking
- What is a good query?
  - Several users submit the same query.
  - For some queries, more of the returned documents are clicked by some user.
  - For other queries, fewer of the returned documents are clicked.
  - If no result is ever clicked → not a good query at all.
  - Query goodness ~ fraction of returned documents clicked by some user.
  - This is a global score → used to rank queries within a cluster.
- Final ranking at query time:
  - Rank using a combination of query support and similarity with the given query.
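A minimal sketch of the support score and the final ranking follows. The slides only say "a combination" of support and similarity, so the weighted sum with the parameter `alpha` is an assumption made for illustration, as are the function names and the sample numbers.

```python
def support(returned_urls, clicked_urls):
    """Fraction of the returned documents that some user clicked."""
    returned = set(returned_urls)
    if not returned:
        return 0.0
    return len(returned & set(clicked_urls)) / len(returned)


def rank_suggestions(similarities, supports, k=3, alpha=0.5):
    """Rank the candidate queries of a cluster.

    similarities: {candidate: similarity with the user's query}
    supports:     {candidate: support score}
    alpha:        hypothetical mixing weight (the combination rule is assumed)
    """
    score = {
        c: alpha * similarities[c] + (1 - alpha) * supports.get(c, 0.0)
        for c in similarities
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(score, key=lambda c: (-score[c], c))[:k]


print(support(["a", "b", "c", "d"], ["a", "c"]))
print(rank_suggestions({"x": 0.9, "y": 0.4, "z": 0.8},
                       {"x": 0.1, "y": 1.0, "z": 0.8}, k=2))
```

The support scores can be computed offline once per query, so only the similarity with the incoming query has to be evaluated at query time.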

QUERY SUGGESTION USING SESSION INFORMATION
Boldi et al., CIKM 2008; also other papers

Suggestion to aid reformulation
Assumptions
- The user is happy when the information need is fulfilled.
- The user keeps reformulating the query until satisfied.
Within-session reformulation probability of q' from q
$$P(q' \mid q) = \frac{\mathrm{count}(q \to q')}{\mathrm{count}(q)}$$
- $P(q' \mid q)$: probability of q' appearing after q in a session
- $\mathrm{count}(q \to q')$: number of times q is followed by q' in a session
- $\mathrm{count}(q)$: number of occurrences of the query q
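The reformulation probability can be estimated directly from session logs, as in the sketch below. It assumes each session is simply an ordered list of query strings; the function name and the sample sessions are made up for illustration.

```python
from collections import Counter


def reformulation_probabilities(sessions):
    """Estimate P(q' | q) = count(q followed by q') / count(q)."""
    query_count = Counter()
    pair_count = Counter()
    for session in sessions:
        query_count.update(session)
        # Consecutive pairs (q, q') within the same session.
        pair_count.update(zip(session, session[1:]))
    return {
        (q, q_next): n / query_count[q]
        for (q, q_next), n in pair_count.items()
    }


sessions = [
    ["cheap flights", "cheap flights kolkata", "air india"],
    ["cheap flights", "air india"],
    ["cheap flights"],
]
probs = reformulation_probabilities(sessions)
for (q, q_next), p in sorted(probs.items()):
    print(f"{q!r} -> {q_next!r}: {p:.2f}")
```

Because sessions can end after q, the probabilities leaving a query may sum to less than 1.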

Query graph / transition matrix of queries
- Draw a graph with queries as nodes.
- The weight of the edge q → q' is given by the within-session reformulation probability.
- The concept is similar to PageRank:
  - Random walk, teleporting to any query with some probability
  - What is the probability that the user would eventually type q'?
- Compute the stationary probability distribution over the queries.

Query suggestion for a query q
Random walk relative to q
- With probability p, follow a path (random walk).
- With probability 1 - p, teleport to q (and to no other node).
Query suggestion
- Offline: store the top-k ranked queries for each q.
- Online: given q, return the top-ranked queries as suggestions.
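The random walk relative to q can be approximated by power iteration, as sketched below. The function names, the default p = 0.85, the fixed number of iterations, and the handling of queries without outgoing edges (all of their mass returns to q) are assumptions for illustration; edge weights are normalised per node so that each step is a proper probability distribution.

```python
def random_walk_with_restart(transitions, start, p=0.85, iterations=100):
    """Approximate the stationary distribution of a walk that restarts at `start`.

    transitions: {query: {next_query: edge weight}}
    """
    nodes = set(transitions) | {start}
    for out in transitions.values():
        nodes.update(out)
    rank = {n: (1.0 if n == start else 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = dict.fromkeys(nodes, 0.0)
        for node, mass in rank.items():
            out = transitions.get(node, {})
            total = sum(out.values())
            moved = 0.0
            if total > 0:
                # With probability p, follow an outgoing edge.
                for target, weight in out.items():
                    nxt[target] += p * mass * weight / total
                moved = p * mass
            # The remaining mass teleports back to the start query.
            nxt[start] += mass - moved
        rank = nxt
    return rank


def top_suggestions(transitions, q, k=3, p=0.85):
    """Return the k queries with the highest score relative to q."""
    rank = random_walk_with_restart(transitions, q, p)
    others = [n for n in rank if n != q]
    return sorted(others, key=lambda n: (-rank[n], n))[:k]


graph = {
    "flights": {"cheap flights": 0.6, "air india": 0.2},
    "cheap flights": {"air india": 0.5},
    "air india": {},
}
print(top_suggestions(graph, "flights", k=2, p=0.5))
```

Running `top_suggestions` for every query offline and storing the result gives the lookup table that the online step reads from.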

QUERY SUGGESTION USING HITTING TIME
Mei, Zhou & Church (Microsoft Research), CIKM 2008
Using slides by the authors

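Random Walk and Hitting Time
[Figure: a random walk at vertex i moves to vertex j with P = 0.7 or to vertex k with P = 0.3; A is the target set of vertices.]
- Hitting time $T^A$: the first time that the random walk is at a vertex in A
- Mean hitting time $h_i^A$: the expectation of $T^A$ given that the walk starts from vertex i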

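Computing Hitting Time
- $T^A$: the first time that the random walk is at a vertex in A
- $h_i^A$: the expectation of $T^A$ given that the walk starts from vertex i
Iterative computation
- For the example walk from i (to j with probability 0.7, to k with probability 0.3):
$$h_i^A = 0.7\, h_j^A + 0.3\, h_k^A + 1$$
- In general, for $i \notin A$:
$$h_i^A = \sum_{j} p(i \to j)\, h_j^A + 1$$
- Apparently, $h_i^A = 0$ for those $i \in A$.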
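The iterative computation can be sketched as repeated sweeps over the recurrence. The fixed number of sweeps and the transitions out of j and k are assumptions added to make the example complete; only the 0.7 and 0.3 probabilities out of i come from the slides.

```python
def mean_hitting_times(transitions, targets, sweeps=200):
    """Iterate h_i = 1 + sum_j p(i -> j) * h_j, with h_i = 0 for i in targets.

    transitions: {vertex: {next_vertex: probability}}
    targets:     the set A of target vertices
    """
    nodes = set(transitions) | set(targets)
    for out in transitions.values():
        nodes.update(out)
    h = dict.fromkeys(nodes, 0.0)
    for _ in range(sweeps):
        h = {
            n: 0.0 if n in targets else
            1.0 + sum(p * h[m] for m, p in transitions.get(n, {}).items())
            for n in nodes
        }
    return h


# i -> j (0.7) and i -> k (0.3) as on the slide; the rest is made up.
walk = {
    "i": {"j": 0.7, "k": 0.3},
    "j": {"A": 1.0},
    "k": {"i": 0.5, "A": 0.5},
}
h = mean_hitting_times(walk, targets={"A"})
print({n: round(v, 4) for n, v in sorted(h.items())})
```

For vertices that cannot reach A the value keeps growing with the number of sweeps, so in practice the iteration is stopped after a fixed number of steps.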

Bipartite Graph and Hitting Time
[Figure: a bipartite graph with vertex sets $V_1$ and $V_2$, weighted edges w(i, j), and target set A; shown again after conversion to a directed graph.]
- Expected proximity of query i to the query A: the hitting time of i → A, $h_i^A$
- Bipartite graph:
  - Edges between $V_1$ and $V_2$
  - No edge inside $V_1$ or $V_2$
  - Edges are weighted
  - Example: $V_1$ = {queries}; $V_2$ = {URLs}
- Convert to a directed graph, even collapse one group.

Generate Query Suggestion
[Figure: example query-URL graph with target T; queries include "aa", "american airline" and "mexiana"; URLs include planner_main.jsp and en.wikipedia.org/wiki/Mexicana.]
1. Construct a (kNN) subgraph from the query log data (of a predefined number of queries/URLs).
2. Compute transition probabilities p(i → j).
3. Compute hitting time $h_i^A$.
4. Rank candidate queries using $h_i^A$.
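The steps above can be sketched end to end on a tiny click graph. Everything specific here is an assumption for illustration: the click counts and URL identifiers are made up, the kNN subgraph construction is skipped, the URL side is collapsed by a two-step walk (query to URL, then URL to query), and candidates are ranked by ascending hitting time to a single target query.

```python
from collections import defaultdict


def query_transitions(clicks):
    """Collapse the URL side: p(q -> q') = sum_u w(q,u)/d_q * w(q',u)/d_u.

    clicks: {(query, url): click count}
    """
    q_deg, u_deg = defaultdict(float), defaultdict(float)
    by_url = defaultdict(dict)
    for (q, u), w in clicks.items():
        q_deg[q] += w
        u_deg[u] += w
        by_url[u][q] = w
    p = defaultdict(lambda: defaultdict(float))
    for (q, u), w in clicks.items():
        for q_next, w_next in by_url[u].items():
            p[q][q_next] += (w / q_deg[q]) * (w_next / u_deg[u])
    return {q: dict(row) for q, row in p.items()}


def hitting_times(p, target, sweeps=200):
    """Mean hitting time of every query to the target query."""
    h = dict.fromkeys(p, 0.0)
    for _ in range(sweeps):
        h = {
            q: 0.0 if q == target else
            1.0 + sum(w * h[q_next] for q_next, w in p[q].items())
            for q in p
        }
    return h


def suggest(clicks, target, k=3):
    """Rank candidate queries by ascending hitting time to the target."""
    h = hitting_times(query_transitions(clicks), target)
    candidates = [q for q in h if q != target]
    return sorted(candidates, key=lambda q: (h[q], q))[:k]


clicks = {  # made-up click counts; "u1" and "u2" are placeholder URLs
    ("aa", "u1"): 7,
    ("american airline", "u1"): 3,
    ("american airline", "u2"): 1,
    ("mexiana", "u2"): 4,
}
print(suggest(clicks, "aa"))
```

In this toy graph "american airline" shares most of its clicks with "aa" and therefore has the smaller hitting time, so it is suggested first.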

Intuition
- Why does it work?
  - A URL is close to a query if freq(q, url) dominates the number of clicks on this URL (most people use q to access the URL).
  - A query is close to the target query if it is close to many URLs that are close to the target query.

SUMMARY
Query suggestion using query logs

Summary
- A current field of research
- Primary approaches use query logs
- Query-query similarity:
  - Word based
  - Query-URL association based
  - Session information: a query following another
- Personalization / context awareness is very important
  - Several works exist, though they are not covered in this class

References
- A. Anagnostopoulos, L. Becchetti, C. Castillo, and A. Gionis. An optimization framework for query recommendation. In Proc. of WSDM 2010, pages 161–170.
- R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query recommendation using query logs in search engines. In Current Trends in Database Technology - EDBT 2004 Workshops, pages 588–596.
- P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. The query-flow graph: model and applications. In Proc. of CIKM 2008, pages 609–618.
- P. Boldi, F. Bonchi, C. Castillo, and S. Vigna. From Dango to Japanese Cakes: Query Reformulation Models and Patterns. In Proc. of WI 2009, pages 183–190.
- H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li. Context-aware query suggestion by mining click-through and session data. In Proc. of KDD 2008, pages 875–883.
- Q. He, D. Jiang, Z. Liao, S. Hoi, K. Chang, E. Lim, and H. Li. Web query recommendation via sequential query prediction. In Proc. of ICDE 2009, pages 1443–1454.
- H. Ma, H. Yang, I. King, and M. Lyu. Learning latent semantic relations from clickthrough data for query suggestion. In Proc. of CIKM 2008, pages 709–718.
- Q. Mei, D. Zhou, and K. Church. Query suggestion using hitting time. In Proc. of CIKM 2008, pages 469–478.
- Y. Song and L. He. Optimal rare query suggestion with implicit user feedback. In Proc. of WWW 2010, pages 901–910.