Published by Eleanor Washington. Modified over 9 years ago.
1
Context-aware Query Suggestion by Mining Click-through and Session Data
Authors: H. Cao et al., KDD’08
Presented by Shize Su
2
Outline
- Introduction
- Framework of the Proposed Method
- Mining Query Concepts
- Concept Sequence Suffix Tree
- Experimental Evaluation
- Summary
3
Introduction
What is query suggestion in a search engine?
- Guess the user’s search intent from the user’s query and suggest queries that better describe the user’s information need
Why is query suggestion important?
- Is it easy to issue an appropriate query? No!
- A “bottleneck issue” of search engine usability (Google, Yahoo, Bing, Baidu, etc.)
4
Introduction
Major existing approaches (with search log data):
- Approach I: cluster queries using clicked-URL data to find similar queries
- Approach II: mine pairs of queries that are adjacent or co-occur in the same query session
Fig. 1: An example of search log data
5
Introduction
Key limitations:
- None of them is context-aware: they do not consider the immediately preceding queries as context
  An example: “apple” “steve jobs” “apple”. User’s search intent?
- The clustering algorithms do not scale well to very large data: 1.8 billion queries (151 million unique), 2.6 billion URL clicks (114 million unique)
6
Proposed Method Framework
Key steps:
- Capture the context as a concept sequence (by clustering queries)
- Quickly find the queries that many users ask in that context (with a concept sequence suffix tree)
7
Mining Query Concepts
An example of click-through bipartite data from a search log:
For each query q: an L2-normalized vector, with one dimension per clicked URL
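A minimal sketch of this query representation, assuming the edge weight is the click count and that the norm lost in extraction is the L2 norm. The function name `l2_normalize` and the toy click data are illustrative, not taken from the slides.

```python
import math

def l2_normalize(click_counts):
    """Turn {url: click_count} into a unit-length sparse vector {url: weight}."""
    norm = math.sqrt(sum(w * w for w in click_counts.values()))
    if norm == 0:
        return {}
    return {url: w / norm for url, w in click_counts.items()}

# Hypothetical click-through edges: query -> {clicked URL: click count}
clicks = {
    "apple": {"apple.com": 3, "en.wikipedia.org/wiki/Apple": 4},
    "apple inc": {"apple.com": 5},
}

# One sparse vector per query; only clicked URLs are stored.
vectors = {query: l2_normalize(urls) for query, urls in clicks.items()}
```

Storing only the non-zero dimensions matters here because the full URL space is far larger than the handful of URLs clicked for any one query.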
8
Mining Query Concepts
Key challenges in clustering queries:
- The search-log click-through bipartite can be huge, e.g., 151 million unique queries
- The number of clusters is unknown
- The query vectors have extremely high dimensionality: 114 million unique URLs
- Search logs grow dynamically
Existing query clustering algorithms:
- Hierarchical agglomerative method
- DBSCAN method (Wen, WWW’01)
- K-means, etc.
9
Mining Query Concepts
Proposed clustering method:
10
Mining Query Concepts
For each query q:
- Step 1: find the cluster C closest to q among the clusters obtained so far
- Step 2: compute the diameter of C with q added
- Step 3: 1) if the diameter does not exceed the threshold D_max, q is assigned to C; 2) otherwise, create a new cluster containing only q
Quite efficient:
- Only one scan of the queries is needed
- Can run efficiently on a PC with 2 GB of main memory
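The three steps above can be sketched as a single pass over sparse query vectors. This is a simplified illustration under stated assumptions: the names `one_pass_cluster` and `d_max` are hypothetical, the centroid is a plain mean, the diameter is taken as the root mean squared pairwise distance, and each cluster keeps all of its members in memory, which a large-scale implementation would avoid.

```python
import math

def distance(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))

def centroid(members):
    """Plain mean of the member vectors (a simplifying assumption)."""
    total = {}
    for m in members:
        for k, w in m.items():
            total[k] = total.get(k, 0.0) + w
    return {k: w / len(members) for k, w in total.items()}

def diameter(members):
    """Root mean squared pairwise distance; 0 for a single member."""
    n = len(members)
    if n < 2:
        return 0.0
    total = sum(distance(a, b) ** 2
                for i, a in enumerate(members) for b in members[i + 1:])
    return math.sqrt(2 * total / (n * (n - 1)))

def one_pass_cluster(vectors, d_max=1.0):
    clusters = []  # each cluster is a list of member vectors
    for q in vectors:
        # Step 1: closest existing cluster, by distance to its centroid
        best = min(clusters, key=lambda c: distance(q, centroid(c)), default=None)
        # Steps 2 and 3: accept q only if the enlarged cluster stays tight
        if best is not None and diameter(best + [q]) <= d_max:
            best.append(q)
        else:
            clusters.append([q])
    return clusters

queries = [{"a": 1.0}, {"a": 0.8, "b": 0.6}, {"c": 1.0}]
clusters = one_pass_cluster(queries, d_max=1.0)
```

In this toy run the first two vectors share a dimension and end up together, while the third shares nothing with them and starts its own cluster.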
11
Mining Query Concepts
Tricks for improving the algorithm’s efficiency:
- A dimension array data structure used in Step 1 (exploits sparse data)
- Prune low-weight edges
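One way to read these two tricks, sketched under assumptions: the dimension array is treated as an inverted index from each URL to the clusters that have a non-zero weight on it, so that Step 1 only compares a query against clusters sharing at least one URL. The class name `DimensionArray`, the function `prune_edges`, and the `min_count` threshold are hypothetical.

```python
from collections import defaultdict

def prune_edges(click_counts, min_count=2):
    """Drop query-URL edges whose click count is below a threshold."""
    return {url: w for url, w in click_counts.items() if w >= min_count}

class DimensionArray:
    """Maps each URL (dimension) to the ids of clusters that touch it."""

    def __init__(self):
        self._clusters_by_url = defaultdict(set)

    def add(self, cluster_id, urls):
        for url in urls:
            self._clusters_by_url[url].add(cluster_id)

    def candidates(self, urls):
        """Clusters sharing at least one URL with the query."""
        found = set()
        for url in urls:
            found |= self._clusters_by_url.get(url, set())
        return found

index = DimensionArray()
index.add(0, ["a.com", "b.com"])
index.add(1, ["c.com"])
nearby = index.candidates(["b.com", "z.com"])
```

Clusters with no URL in common with the query have no overlapping dimensions, so skipping them avoids most distance computations on sparse data.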
12
Concept Sequence Suffix Tree
Extract query session data:
- Take each individual user’s behavior (query/click) data
- Segment it into sessions (split where the time interval exceeds 30 minutes)
- Discard the click event data
Fig.: An example of search log data
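A minimal sketch of the session extraction, assuming each event is a `(timestamp_seconds, kind, text)` tuple for one user, sorted by time, with `kind` being either `"query"` or `"click"`. The tuple layout and the function name `split_sessions` are illustrative assumptions.

```python
def split_sessions(events, gap_seconds=30 * 60):
    """Split one user's time-ordered events into query-only sessions."""
    sessions = []
    current = []
    last_time = None
    for ts, kind, text in events:
        # A gap longer than the threshold closes the current session.
        if last_time is not None and ts - last_time > gap_seconds:
            if current:
                sessions.append(current)
            current = []
        # Clicks still advance the clock but are not kept.
        if kind == "query":
            current.append(text)
        last_time = ts
    if current:
        sessions.append(current)
    return sessions

events = [
    (0, "query", "apple"),
    (60, "click", "apple.com"),
    (1500, "query", "steve jobs"),
    (3301, "query", "weather"),
]
sessions = split_sessions(events)
```

Here the gap is measured between consecutive events of either kind, so a click keeps a session alive even though it is dropped from the output.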
13
Concept Sequence Suffix Tree
A structure used to efficiently find the queries that many users ask in a given context (concept sequence)
Fig.: An example
14
Concept Sequence Suffix Tree
Algorithm to build the concept sequence suffix tree:
1) Map the training session data to concept sequences
2) Enumerate the subsequences of each concept sequence (distributed, map-reduce)
3) Get all frequent concept subsequences
4) Organize these into a concept sequence suffix tree
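Steps 1 to 3 can be sketched on a single machine as follows. This stands in for the distributed map-reduce job mentioned above and makes several assumptions: subsequences are contiguous, consecutive queries in the same concept are collapsed into one, and `min_support` is a hypothetical frequency threshold. The mapping `concept_of` is toy data.

```python
from collections import Counter

def to_concept_sequence(session, concept_of):
    """Map queries to concepts, collapsing consecutive repeats."""
    seq = []
    for query in session:
        concept = concept_of[query]
        if not seq or seq[-1] != concept:
            seq.append(concept)
    return tuple(seq)

def frequent_subsequences(concept_sequences, min_support=2):
    """Count every contiguous subsequence and keep the frequent ones."""
    counts = Counter()
    for seq in concept_sequences:
        for start in range(len(seq)):
            for end in range(start + 1, len(seq) + 1):
                counts[seq[start:end]] += 1
    return {sub: n for sub, n in counts.items() if n >= min_support}

concept_of = {"apple": "C1", "apple inc": "C1", "steve jobs": "C2", "iphone": "C3"}
sessions = [
    ["apple", "apple inc", "steve jobs"],
    ["apple", "steve jobs", "iphone"],
]
sequences = [to_concept_sequence(s, concept_of) for s in sessions]
frequent = frequent_subsequences(sequences, min_support=2)
```

In a map-reduce setting the inner loops would play the role of the map phase (emit each subsequence) and the `Counter` the reduce phase (sum the counts per subsequence).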
15
Concept Sequence Suffix Tree
Algorithm for organizing the frequent concept subsequences into a concept sequence suffix tree:
16
Concept Sequence Suffix Tree
Organizing the frequent concept subsequences into a concept sequence suffix tree:
1) Start from the root node (empty) and scan through all frequent concept subsequences cs
2) For each cs = c_1 ... c_l, first find the node cr corresponding to c_1 ... c_(l-1); if cr doesn’t exist, create it
3) Update the list of candidate concepts of cr if c_l is among the top K (a specified threshold, e.g., K=5) candidates so far
4) The representative queries of the top K candidate concepts are the candidate suggestions for the sequence c_1 ... c_(l-1)
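The organizing steps above might look like the following sketch. It assumes the input is a dict from frequent concept subsequence (a tuple) to its frequency, and it stores each context as a path read backward from the most recent concept, so that longer contexts extend shorter suffixes. The `Node` class and `build_tree` function are hypothetical names, and candidates are kept as concepts rather than representative queries.

```python
class Node:
    def __init__(self):
        self.children = {}    # earlier concept -> Node (context read backward)
        self.candidates = {}  # next concept -> frequency, at most K entries

def build_tree(frequent, k=5):
    root = Node()
    for seq, freq in frequent.items():
        if len(seq) < 2:
            continue  # a single concept has no context to attach to
        context, next_concept = seq[:-1], seq[-1]
        # Find the node for the context, creating missing nodes on the way.
        node = root
        for concept in reversed(context):
            node = node.children.setdefault(concept, Node())
        # Keep only the top-K candidates seen so far.
        node.candidates[next_concept] = freq
        if len(node.candidates) > k:
            weakest = min(node.candidates, key=node.candidates.get)
            del node.candidates[weakest]
    return root

frequent = {("C1", "C2"): 10, ("C1", "C3"): 4, ("C2", "C1", "C3"): 3}
tree = build_tree(frequent, k=1)
```

With `k=1`, the context `C1` keeps only its strongest follower `C2`, while the longer context `C2 C1` gets its own node with `C3` as its candidate.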
17
Concept Sequence Suffix Tree
Review an example of a concept sequence suffix tree:
18
Concept Sequence Suffix Tree
Online query suggestion algorithm:
19
Concept Sequence Suffix Tree
For a query sequence qs = q_1 ... q_i:
- Map it to a concept sequence cs, working backward from q_i; if q_j is a new query, stop mapping and return the concept sequence corresponding to q_(j+1) ... q_i
- Search the tree to find the longest matched subsequence of the form c_m ... c_i, i.e., a suffix of cs
- Use the candidate suggestions for that matched subsequence as the query suggestions for qs
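The online procedure above can be illustrated with a small sketch. To stay self-contained it replaces the tree walk with a plain dict lookup from context tuple to suggestions, which is an assumption made for brevity. The names `suggest`, `concept_of`, and `candidates_by_context`, and all data, are hypothetical.

```python
def suggest(queries, concept_of, candidates_by_context):
    """Return suggestions for the longest known suffix of the context."""
    # Map queries to concepts from the most recent one backward,
    # stopping at the first query that was never seen in training.
    concepts = []
    for query in reversed(queries):
        concept = concept_of.get(query)
        if concept is None:
            break
        if not concepts or concepts[-1] != concept:
            concepts.append(concept)
    concepts.reverse()
    # Try the longest suffix first, then progressively shorter ones.
    for start in range(len(concepts)):
        context = tuple(concepts[start:])
        if context in candidates_by_context:
            return candidates_by_context[context]
    return []

concept_of = {"apple": "C1", "steve jobs": "C2"}
candidates_by_context = {
    ("C1",): ["apple store"],
    ("C2", "C1"): ["apple history"],
}
with_context = suggest(["apple", "steve jobs", "apple"], concept_of, candidates_by_context)
no_context = suggest(["apple"], concept_of, candidates_by_context)
```

The same final query gets different suggestions depending on what preceded it, which is the point of using the context; when the most recent query is unknown, the sketch returns no suggestions.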
20
Concept Sequence Suffix Tree
Review an example of a concept sequence suffix tree:
21
Experimental Evaluation
Training data: a commercial search engine (Bing) search log in the US
- 1.8 billion queries (151 million unique), 2.6 billion URL clicks (114 million unique), 840 million sessions
Baseline algorithms:
- Adjacency: given a query sequence q_1 ... q_i, rank candidate queries by how often they immediately follow q_i
- N-Gram: given a query sequence q_1 ... q_i, rank candidate queries by how often they immediately follow the whole sequence q_1 ... q_i
Test data:
- Test-0: 1000 randomly selected single-query sessions
- Test-1: 1000 randomly selected multi-query sessions
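The two baselines can be sketched as simple frequency tables over query sessions. This is an illustrative reading, assuming Adjacency conditions only on the last query and N-Gram on the whole preceding query sequence; the function names `train_baselines` and `rank` and the toy sessions are hypothetical.

```python
from collections import Counter, defaultdict

def train_baselines(sessions):
    adjacency = defaultdict(Counter)  # last query -> counts of the next query
    ngram = defaultdict(Counter)      # preceding sequence -> counts of the next query
    for session in sessions:
        for i in range(1, len(session)):
            nxt = session[i]
            adjacency[session[i - 1]][nxt] += 1
            # Every contiguous context that ends right before position i
            for start in range(i):
                ngram[tuple(session[start:i])][nxt] += 1
    return adjacency, ngram

def rank(counter, k=5):
    """Top-k next queries by frequency."""
    return [query for query, _ in counter.most_common(k)]

sessions = [["a", "b", "c"], ["x", "b", "d"], ["y", "b", "d"]]
adjacency, ngram = train_baselines(sessions)
by_last_query = rank(adjacency["b"])
by_full_sequence = rank(ngram[("a", "b")])
```

Adjacency always has something to say once the last query has been seen, whereas N-Gram returns nothing for a sequence that never occurred in training, which is one reason coverage differs between methods.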
22
Experimental Results
Coverage of suggestions:
Fig.: The coverage of the three methods on (a) Test-0 and (b) Test-1
23
Experimental Results
Quality of suggestions (relevance grades collected from 10 judges):
Fig.: The quality of the three methods on (a) Test-0 and (b) Test-1
24
Summary
Three things to know:
- Some basics of query suggestion using search logs
- The proposed efficient query clustering algorithm for search-log click-through bipartite data
- The proposed efficient context-aware query suggestion method using a concept sequence suffix tree
Hint: “concept”-level N-gram with variable length N + a structure for efficient search
25
Thank You!