
Query Suggestion Naama Kraus Slides are based on the papers: Baeza-Yates, Hurtado, Mendoza, Improving search engines by query clustering Boldi, Bonchi,


1 Query Suggestion. Naama Kraus. Slides are based on the papers: Baeza-Yates, Hurtado, Mendoza, "Improving Search Engines by Query Clustering"; Boldi, Bonchi, Castillo, Donato, Vigna, "The Query Flow Graph: Model and Applications".

2 The Problem: User queries are an imperfect description of their information needs. Examples: ambiguous queries (jaguar); general queries (haifa); terminology differences (synonyms) between user and corpus (stars - planets).

3 Query Suggestions: Assist the user in phrasing her information need. Example: for the query jaguar, suggest Jaguar car, Jaguar xf, Jaguar animal, Jaguar cat.

4 Example: Google Related Searches

5 Query suggestion algorithms: Query suggestions are extracted from the query log. (There are methods that use other data sources, such as a corpus; they are not covered today.) Topic (cluster) based algorithms identify groups of similar queries. Sequence based algorithms mine and analyze the query log for likely query sequences.

6 Improving Search Engines by Query Clustering - Baeza-Yates et al. Algorithm outline. Offline: represent queries as term-weighted vectors; cluster the queries; rank the queries in each cluster. Online: given the user's query q, find the cluster C containing q and suggest the top k queries in C, based on their rank and their similarity to q.

7 Query Model: Given a query q, let U be the set of URLs clicked for q (across all users and sessions); this information is extracted from the query log. q's term-weighted vector has a nonzero entry for every term that appears in some URL in U. Terms are weighted according to term frequency and URL popularity (formula on the next slide).

8 Query Model (2): [Formula image not preserved in the transcript.] Pop(u,q) is the number of clicks on u for the query q. Note: the paper proposes a refinement to Pop(u,q) that is not biased by the search engine's ranking. Query similarity is computed by some measure, e.g. cosine similarity.
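A minimal Python sketch of the query model and the cosine similarity from slides 7 and 8. The exact weighting is an assumption standing in for the formula that the transcript lost: each term's weight sums, over the clicked URLs, the click count times the term's frequency in the URL divided by that URL's maximum term frequency. The toy click counts, URL texts, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def query_vector(clicks, url_terms):
    """Build a term-weighted vector for one query.

    clicks: {url: number of clicks on url for this query}  (Pop(u,q))
    url_terms: {url: list of terms in the url's text}
    Assumed weighting: sum over urls of Pop * tf / max tf in that url.
    """
    vec = defaultdict(float)
    for url, pop in clicks.items():
        tf = Counter(url_terms[url])
        if not tf:
            continue
        max_tf = max(tf.values())
        for term, count in tf.items():
            vec[term] += pop * count / max_tf
    return dict(vec)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy data: two URLs and the clicks three queries received.
url_terms = {
    "u1": ["jaguar", "car", "car"],
    "u2": ["jaguar", "cat"],
}
v_car = query_vector({"u1": 3}, url_terms)           # e.g. "jaguar car"
v_jag = query_vector({"u1": 1, "u2": 1}, url_terms)  # e.g. "jaguar"
v_cat = query_vector({"u2": 2}, url_terms)           # e.g. "jaguar cat"
```

In this toy log the ambiguous query shares clicks with both specific queries, so its vector is closer to each of them than they are to each other.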

9 Query Support: The fraction of the documents returned by the query that captured the attention of users (clicked documents). It denotes how 'good' a query is: a 'global score'. Queries within a cluster are ranked according to their similarity to q as well as their support.

10 Query Flow Graph – Boldi et al. Main idea: aggregate the (massive) raw data in the query log (many queries of many users) into a graph that models user query behavior, then use graph-analysis techniques such as random walks to infer query relatedness.

11 Query Flow Graph Model: G = (V, E, w) is a directed graph where V is the set of nodes, representing the set Q of distinct queries extracted from the query log, and E is a set of directed edges. Two queries q, q' are connected by an edge (q, q') if q' follows q in at least one session.

12 QFG Illustration: [Figure: a directed graph over nodes q0, q1, q2, q3, q4, q5.] Nodes are queries; edges connect queries. Node labels shown in the figure: apple ipod apple store.

13 Weighting Function: w : E -> (0, 1] is a weighting function that assigns a weight to every edge (q, q'). The weight of an edge (q, q') is the probability that q' follows q in the same session, extracted from the observed query log sessions.
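A minimal Python sketch of building a weighted query flow graph from sessions. The slide only says that the probabilities are extracted from the observed sessions; estimating them as relative transition frequencies, as done here, is an assumption. The sessions and the function name build_qfg are hypothetical.

```python
from collections import Counter, defaultdict

def build_qfg(sessions):
    """Return {q: {q_next: weight}} for consecutive query pairs in sessions.

    Assumed estimate: weight(q, q') = count(q -> q') / count(q -> anything),
    so every existing edge gets a weight in (0, 1].
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for q, q_next in zip(session, session[1:]):
            if q != q_next:  # ignore immediate repetitions of the same query
                counts[q][q_next] += 1
    graph = {}
    for q, followers in counts.items():
        total = sum(followers.values())
        graph[q] = {q_next: c / total for q_next, c in followers.items()}
    return graph

# Hypothetical toy sessions, each an ordered list of queries.
sessions = [
    ["apple", "apple ipod", "apple store"],
    ["apple", "apple store"],
    ["apple ipod", "apple store"],
]
qfg = build_qfg(sessions)
```

Here "apple" is followed once by "apple ipod" and once by "apple store", so each of those edges gets weight 0.5, while "apple ipod" is always followed by "apple store", giving that edge weight 1.0.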

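14 Illustration: [Figure: the QFG over nodes q0 to q5 with weighted edges; weights shown: 0.5, 0.25, 0.1, 0.55, 0.35, 0.2, 0.8, 1.0. The assignment of weights to edges is not recoverable from the transcript.]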

15 Random walk on the QFG: A random surfer executes a random walk on the graph as follows: start at some node; with probability d, move along an outgoing edge, chosen according to its probability (weight); otherwise, with probability 1-d, teleport to a random node, chosen uniformly. The stationary distribution gives the probability of being at node q in the limit, as the number of steps goes to infinity. The random walk score vector holds the queries' absolute scores.
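A minimal Python sketch of approximating the stationary distribution by power iteration. It assumes the outgoing weights of each node sum to 1, and it assumes that a node with no outgoing edges always teleports; the slide does not say how such nodes are handled. The toy graph, the value d = 0.85, and the function name are hypothetical.

```python
def random_walk_scores(graph, d=0.85, iterations=100):
    """Approximate the stationary distribution of the walk with teleporting.

    graph: {q: {q_next: weight}}, outgoing weights of each q summing to 1.
    With probability d follow an edge by weight, else jump to a uniform node.
    """
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    n = len(nodes)
    score = {q: 1.0 / n for q in nodes}
    for _ in range(iterations):
        nxt = {q: 0.0 for q in nodes}
        teleport_mass = 0.0
        for q in nodes:
            out = graph.get(q, {})
            if out:
                for q_next, w in out.items():
                    nxt[q_next] += d * score[q] * w
                teleport_mass += (1.0 - d) * score[q]
            else:
                # Assumption: a node without outgoing edges always teleports.
                teleport_mass += score[q]
        for q in nodes:
            nxt[q] += teleport_mass / n
        score = nxt
    return score

# Hypothetical toy graph.
graph = {
    "apple": {"apple ipod": 0.5, "apple store": 0.5},
    "apple ipod": {"apple store": 1.0},
}
scores = random_walk_scores(graph)
```

In this toy graph every path leads to "apple store", so it ends up with the highest absolute score, and "apple", which has no incoming edges, with the lowest.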

16 Random Walk Relative to a Node: Random walk with restart to a single node: start at node q and, instead of teleporting to any node, always teleport to q. The score of node q' for this random walk measures the relatedness of q' to q: the probability of being at q' in the limit for a walk that starts and restarts at q. A node's relative score can be normalized by its absolute score, somewhat like tf-idf, to avoid suggesting highly popular queries that are unrelated to q.
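A minimal Python sketch of the random walk with restart and of the normalization by absolute score. Dividing the relative score by the absolute score is one plausible reading of "normalize" and is an assumption, as is sending nodes without outgoing edges back to the jump distribution. The toy graph and the names walk_scores and suggest are hypothetical.

```python
def walk_scores(graph, d=0.85, restart=None, iterations=100):
    """Power iteration; jumps go to `restart`, or to a uniform node if None."""
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    if restart is None:
        jump = {q: 1.0 / len(nodes) for q in nodes}
    else:
        jump = {q: 1.0 if q == restart else 0.0 for q in nodes}
    score = dict(jump)  # the restart walk starts at the restart node
    for _ in range(iterations):
        nxt = {q: 0.0 for q in nodes}
        jump_mass = 0.0
        for q in nodes:
            out = graph.get(q, {})
            if out:
                for q_next, w in out.items():
                    nxt[q_next] += d * score[q] * w
                jump_mass += (1.0 - d) * score[q]
            else:
                jump_mass += score[q]  # assumption: dead ends always jump
        for q in nodes:
            nxt[q] += jump_mass * jump[q]
        score = nxt
    return score

def suggest(graph, q, k=3):
    """Rank the other queries by relative score divided by absolute score."""
    relative = walk_scores(graph, restart=q)
    absolute = walk_scores(graph)
    ranked = sorted(
        (node for node in relative if node != q),
        key=lambda node: relative[node] / absolute[node],
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy graph with two unrelated topics.
graph = {
    "apple": {"apple ipod": 0.5, "apple store": 0.5},
    "apple ipod": {"apple store": 1.0},
    "apple store": {"apple": 1.0},
    "jaguar": {"jaguar car": 1.0},
    "jaguar car": {"jaguar": 1.0},
}
top = suggest(graph, "jaguar", k=1)
```

Because the walk restarted at "jaguar" can never reach the apple queries, their relative scores stay at zero and only "jaguar car" is suggested, however popular the apple queries are in absolute terms.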

17 The Full Picture: Off-line stage: for each node q in the graph, compute the stationary distribution vector of the random walk relative to q, and store suggestions for q, either the top k scored nodes or the nodes having a score above some threshold. On-line stage: the user submits query q; suggest the queries stored for q, i.e. the queries most related to q.

