

Query Suggestion
Naama Kraus
Slides are based on the papers:
– Baeza-Yates, Hurtado, Mendoza, Improving search engines by query clustering
– Boldi, Bonchi, Castillo, Donato, Vigna, The Query Flow Graph: Model and Applications

The Problem
User queries are an imperfect description of the user’s information needs. Examples:
– Ambiguous queries: jaguar
– General queries: haifa
– Terminology differences (synonyms) between user and corpus: stars - planets

Query Suggestions
Assist the user in phrasing her information need. Example suggestions for the query jaguar:
– Jaguar car
– Jaguar xf
– Jaguar animal
– Jaguar cat

Example: Google Related Searches

Query suggestion algorithms
– Query suggestions are extracted from the query log
  – Some methods use other data sources, such as a corpus; these are not covered today
– Topic (cluster) based: identify groups of similar queries
– Sequence based: mine and analyze the query log for likely query sequences

Improving Search Engines by Query Clustering - Baeza-Yates et al.
Algorithm outline
Offline:
– Represent queries as term-weighted vectors
– Cluster queries
– Rank queries in each cluster
Online:
– Given the user’s query q
– Find the cluster C containing q
– Suggest the top k queries in cluster C, based on their rank and similarity to q
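
The online step of the outline above can be sketched as a simple lookup. This is a minimal illustration, not the paper's implementation: it assumes the offline stage has already produced a cluster assignment and a per-query rank, and the names `cluster_of`, `rank`, `build_cluster_index` and `suggest` are hypothetical.

```python
from collections import defaultdict


def build_cluster_index(cluster_of):
    """Invert a {query: cluster_id} mapping into {cluster_id: [queries]}."""
    members = defaultdict(list)
    for query, cluster_id in cluster_of.items():
        members[cluster_id].append(query)
    return members


def suggest(query, cluster_of, members, rank, k=3):
    """Return the top-k ranked queries from the cluster that contains `query`."""
    cluster_id = cluster_of.get(query)
    if cluster_id is None:
        return []  # unseen query: nothing to suggest
    candidates = [c for c in members[cluster_id] if c != query]
    candidates.sort(key=lambda c: rank.get(c, 0.0), reverse=True)
    return candidates[:k]


# Toy offline output (made-up values, for illustration only).
cluster_of = {"jaguar": 0, "jaguar car": 0, "jaguar xf": 0, "haifa": 1, "haifa hotels": 1}
rank = {"jaguar car": 0.9, "jaguar xf": 0.4, "haifa hotels": 0.7}
members = build_cluster_index(cluster_of)
top = suggest("jaguar", cluster_of, members, rank, k=2)
```

Keeping the expensive work (vectors, clustering, ranking) offline means the online step is only a dictionary lookup and a sort over one cluster.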

Query Model
– Given a query q, let U be the set of URLs clicked for q (over all users and sessions)
  – This information is extracted from the query log
– q’s term-weighted vector has a nonzero entry for every term that appears in some URL in U
– Terms are weighted according to
  – Term frequency and URL popularity
  – Formula in next slide …

Query Model (2)
– Pop(u,q): the number of clicks on URL u for the query q
– Note: the paper proposes a refinement of Pop(u,q) that is not biased by the search engine’s ranking
– Query similarity is computed by some measure, e.g. cosine similarity
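
The weighting formula itself did not survive in this transcript, so the sketch below assumes one plausible form that matches the description (term frequency scaled by URL popularity): the weight of term t is the sum over clicked URLs u of Pop(u,q) × Tf(t,u) / max Tf in u. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def query_vector(clicks, url_terms):
    """Build a term-weighted vector for one query.

    clicks:    {url: Pop(u, q)}, the number of clicks on the URL for this query
    url_terms: {url: list of terms appearing in that URL's document}
    Assumed weight: sum over u of Pop(u, q) * Tf(t, u) / max_t Tf(t, u).
    """
    vec = defaultdict(float)
    for url, pop in clicks.items():
        tf = Counter(url_terms[url])
        if not tf:
            continue
        max_tf = max(tf.values())
        for term, count in tf.items():
            vec[term] += pop * count / max_tf
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data (made up): three documents and three queries with their clicks.
url_terms = {
    "u1": ["jaguar", "car", "car"],
    "u2": ["jaguar", "cat"],
    "u3": ["car", "dealer"],
}
v_car = query_vector({"u1": 3}, url_terms)
v_dealer = query_vector({"u1": 1, "u3": 2}, url_terms)
v_cat = query_vector({"u2": 4}, url_terms)
```

Because the vectors are built from clicked documents rather than from the query words, two queries that share no words can still come out as similar.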

Query Support
– The fraction of the documents returned by the query that captured the attention of users (clicked documents)
– Denotes how ‘good’ a query is
  – A ‘global score’
– Queries within a cluster are ranked according to their similarity to q as well as their support
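
A small sketch of support and of ranking inside a cluster. The slide does not say how similarity and support are combined, so the product used here is purely an assumption for illustration; `support` and `rank_cluster` are hypothetical names.

```python
def support(returned_urls, clicked_urls):
    """Fraction of the returned documents that were clicked by users."""
    returned = set(returned_urls)
    if not returned:
        return 0.0
    return len(returned & set(clicked_urls)) / len(returned)


def rank_cluster(similarity_to_q, support_of, k=3):
    """Order candidate queries by similarity * support (assumed combination)."""
    ordered = sorted(
        similarity_to_q,
        key=lambda c: similarity_to_q[c] * support_of.get(c, 0.0),
        reverse=True,
    )
    return ordered[:k]


half_clicked = support(["a", "b", "c", "d"], ["a", "c"])
# "x" is more similar to q, but "y" has much higher support and wins here.
ranking = rank_cluster({"x": 0.9, "y": 0.8}, {"x": 0.1, "y": 0.5})
```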

Query Flow Graph – Boldi et al.
Main idea:
– Aggregate the (massive) raw data in the query log
  – Many queries of many users
– Model user query behavior
– Use sophisticated techniques to infer query relatedness

Query Flow Graph Model
G = (V, E, w) is a directed graph where:
– V: the nodes, representing the set of distinct queries Q
  – Queries are extracted from the query log
– E: a set of directed edges
  – Two queries q, q’ are connected by an edge (q, q’) if q’ follows q in at least one session
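
A minimal sketch of extracting the edge set from sessions. It assumes that "follows" means "immediately follows" and that a session is just an ordered list of query strings; self-loops from repeated queries are skipped, which is also an assumption. `build_qfg_edges` is a hypothetical name.

```python
from collections import Counter


def build_qfg_edges(sessions):
    """Count how often q' immediately follows q across all sessions."""
    counts = Counter()
    for session in sessions:
        for q, q_next in zip(session, session[1:]):
            if q != q_next:  # assumption: ignore immediate repeats
                counts[(q, q_next)] += 1
    return counts


# Toy sessions (made up).
sessions = [
    ["apple", "apple ipod", "apple store"],
    ["apple", "apple store"],
    ["apple", "apple ipod"],
]
edge_counts = build_qfg_edges(sessions)
nodes = {q for edge in edge_counts for q in edge}
```

An edge exists as soon as its count is at least 1; the counts themselves are kept because they are a natural input for edge weighting.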

QFG Illustration
[Figure: a graph over nodes q0 to q5. Nodes are queries; edges connect queries. Example queries: apple ipod, apple store]

Weighting Function
– w : E -> (0..1] is a weighting function that assigns a weight to every edge (q,q’)
– Each edge (q,q’) is assigned the probability that q’ follows q in the same session
  – Extracted from the observed query log sessions
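
One simple way to estimate such a probability is the relative frequency of each transition among all transitions leaving q. The sketch below assumes that estimate; the paper's own weighting may be more elaborate. `edge_weights` and the toy counts are hypothetical.

```python
from collections import defaultdict


def edge_weights(counts):
    """Turn transition counts into w(q, q') = count(q, q') / sum of counts out of q."""
    out_total = defaultdict(int)
    for (q, _), c in counts.items():
        out_total[q] += c
    return {(q, q_next): c / out_total[q] for (q, q_next), c in counts.items()}


# Toy transition counts (made up).
counts = {
    ("apple", "apple ipod"): 2,
    ("apple", "apple store"): 1,
    ("apple ipod", "apple store"): 1,
}
weights = edge_weights(counts)
```

With this estimate every weight lies in (0..1], and the weights on the edges leaving any one node sum to 1.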

Illustration
[Figure: the query flow graph with weighted edges between the query nodes]

Random walk on the QFG
A random surfer executes a random walk on the graph as follows:
– Start at some node
– Move along an edge with probability d
  – Choose the edge according to its probability (weight)
– Or teleport to a random node with probability 1-d
  – Choose the node uniformly
The stationary distribution
– The probability of being at node q in the limit, as the number of steps goes to infinity
– Random walk score vector: the queries’ absolute scores
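
The stationary distribution can be approximated by power iteration. This is a sketch under stated assumptions: the outgoing weights of each node sum to 1, a node with no outgoing edges jumps to a uniformly chosen node, and d = 0.85 is an arbitrary illustrative value. `random_walk_scores` is a hypothetical name.

```python
def random_walk_scores(weights, d=0.85, iterations=100):
    """Approximate the stationary distribution of the walk by power iteration.

    weights: {(q, q_next): probability}, outgoing weights of a node sum to 1.
    """
    nodes = sorted({q for edge in weights for q in edge})
    out = {n: [] for n in nodes}
    for (q, q_next), w in weights.items():
        out[q].append((q_next, w))

    n_nodes = len(nodes)
    score = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        # Teleport: with probability 1-d land on a uniformly chosen node.
        nxt = {n: (1.0 - d) / n_nodes for n in nodes}
        for n in nodes:
            if out[n]:
                # Follow an edge, chosen according to its weight.
                for q_next, w in out[n]:
                    nxt[q_next] += d * score[n] * w
            else:
                # Assumption: a node without outgoing edges jumps uniformly.
                for m in nodes:
                    nxt[m] += d * score[n] / n_nodes
        score = nxt
    return score


weights = {
    ("apple", "apple ipod"): 2 / 3,
    ("apple", "apple store"): 1 / 3,
    ("apple ipod", "apple store"): 1.0,
}
absolute = random_walk_scores(weights)
```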

Random Walk Relative to a Node
Random walk with restart to a single node:
– Start at node q
– Instead of teleporting to any node, always teleport to q
The score of node q’ for this random walk measures the relatedness of q’ to q
– The probability of getting from q to q’ in the limit
– A node’s relative score can be normalized by its absolute score
  – Somewhat similar to tf×idf
  – Avoids suggesting highly popular queries that are unrelated to q
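
A sketch of the walk with restart and of the normalization step, under the same assumptions as a plain power iteration (outgoing weights sum to 1, d = 0.85 chosen arbitrarily). Sending a node without outgoing edges back to the source is an additional assumption, and the function names are hypothetical.

```python
def relative_scores(weights, source, d=0.85, iterations=100):
    """Random walk with restart: every teleport goes back to `source`."""
    nodes = sorted({q for edge in weights for q in edge} | {source})
    out = {n: [] for n in nodes}
    for (q, q_next), w in weights.items():
        out[q].append((q_next, w))

    score = {n: 0.0 for n in nodes}
    score[source] = 1.0  # the walk starts at the source node
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] = 1.0 - d  # restart mass (total score is always 1)
        for n in nodes:
            if out[n]:
                for q_next, w in out[n]:
                    nxt[q_next] += d * score[n] * w
            else:
                # Assumption: a node without outgoing edges restarts at the source.
                nxt[source] += d * score[n]
        score = nxt
    return score


def normalize_by_absolute(relative, absolute):
    """Divide each relative score by the node's absolute (global) score."""
    return {n: relative[n] / absolute[n] for n in relative if absolute.get(n, 0.0) > 0.0}


weights = {
    ("apple", "apple ipod"): 2 / 3,
    ("apple", "apple store"): 1 / 3,
    ("apple ipod", "apple store"): 1.0,
}
rel = relative_scores(weights, "apple ipod")
ratio = normalize_by_absolute({"a": 0.2, "b": 0.1}, {"a": 0.4, "b": 0.1})
```

In the toy graph no path leads from "apple ipod" back to "apple", so "apple" gets a relative score of 0. In the `ratio` example, the globally popular node "a" drops below "b" after normalization, which is the effect the tf×idf analogy describes.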

The Full Picture
Off-line stage
– For each node q in the graph
  – Compute the stationary distribution vector of q
    – A random walk score relative to q
  – Store suggestions for q; alternatives:
    – the top k scored nodes
    – nodes having a score above some threshold
On-line stage
– User submits query q
– Suggest the queries stored for q
  – The queries most related to q
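
The two storage alternatives and the online lookup can be sketched as follows. The relative scores are assumed to be already computed offline; the score values, `select_suggestions`, `store` and `online_suggest` are hypothetical and only illustrate the flow.

```python
def select_suggestions(relative, query, k=None, threshold=None):
    """Pick suggestions for `query`: the top k, those above a threshold, or both."""
    candidates = [(n, s) for n, s in relative.items() if n != query]
    if threshold is not None:
        candidates = [(n, s) for n, s in candidates if s > threshold]
    candidates.sort(key=lambda item: item[1], reverse=True)
    if k is not None:
        candidates = candidates[:k]
    return [n for n, _ in candidates]


# Off-line stage: made-up relative scores for a single query node.
relative_by_query = {
    "apple": {"apple": 0.30, "apple ipod": 0.40, "apple store": 0.25, "banana": 0.05},
}
store = {q: select_suggestions(scores, q, k=2) for q, scores in relative_by_query.items()}


def online_suggest(query):
    """On-line stage: a plain lookup of the precomputed suggestions."""
    return store.get(query, [])


answer = online_suggest("apple")
```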