1
Mining Query Subtopics from Search Log Data
Hu et al., SIGIR’12. Presented by Baichuan Li
2
Outline: Problem, Two observations, Approach, Results, Applications, Conclusion
3
Problem: Intention != Query. Many queries are ambiguous or multifaceted.
4
Problem
5
Problem
Xbox: Homepage? Online game? Where to buy?
Manchester: Manchester news? Manchester tourism? Manchester weather? Manchester United?
6
Solution: Identify the major subtopics of a query from the search log data.
Application areas: personalized search; query suggestion; search result presentation; clustering; re-ranking; diversification.
7
Observations from The Logs
One subtopic per search (OSS): If a user clicks multiple URLs after submitting a query, the clicked URLs tend to represent the same subtopic.
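To make the OSS idea concrete, here is a minimal sketch of how multi-clicks might be pulled out of a click log. The record layout `(search_id, query, clicked_url)` and the function name `extract_multi_clicks` are assumptions made for illustration; the slides do not describe the log format.

```python
from collections import defaultdict


def extract_multi_clicks(log):
    """Group clicked URLs by search and keep searches with 2+ distinct clicks.

    `log` is an iterable of (search_id, query, clicked_url) tuples.
    This record layout is hypothetical, not taken from the slides.
    """
    clicks_by_search = defaultdict(list)
    for search_id, query, url in log:
        key = (search_id, query)
        # Keep click order, but ignore repeated clicks on the same URL.
        if url not in clicks_by_search[key]:
            clicks_by_search[key].append(url)
    # Under OSS, each surviving group is treated as evidence that
    # its URLs belong to the same subtopic.
    return {k: urls for k, urls in clicks_by_search.items() if len(urls) >= 2}
```

Each returned group can then serve as one observation that its URLs share a subtopic.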
9
Observations from The Logs
Subtopic clarification by additional keyword (SCAK): Users often add keywords (in most cases, one additional keyword) to a query in order to clarify its subtopic. The URLs clicked after searching with both the original and the expanded queries tend to represent the same subtopic. The keyword tends to be indicative of the subtopic, e.g. "harry shum microsoft".
10
One Subtopic per Search
Use OSS as the rule for subtopic identification. 10,000 groups of multi-clicks of individual queries. The rule is accurate when all the URLs within a multi-click group are about the same sense or facet.
11
Accuracy vs. Frequency
12
Accuracy vs. Click Position
13
Distribution: Queries with higher frequencies in the search log data are more likely to have multi-clicks.
14
Subtopic Clarification by Additional Keyword
The keywords ‘microsoft’ and ‘jr’ can be used to represent the two groups (subtopics) respectively
15
Query Types
‘Q’: the query is a single phrase
‘Q + W’: ‘Q’ + a keyword
‘W + Q’: a keyword + ‘Q’
‘Others’
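The four query types can be sketched as a small classifier that compares a logged query against a base phrase Q. Whitespace tokenization, lowercasing, and the function name `classify_query` are assumptions of this sketch, not details given in the slides.

```python
def classify_query(base, other):
    """Classify `other` relative to the base phrase `base`.

    Returns (query_type, keyword); keyword is None unless exactly one
    word was added before or after the base phrase.
    Tokenization by whitespace is an assumption of this sketch.
    """
    b = base.lower().split()
    o = other.lower().split()
    if o == b:
        return ("Q", None)
    if len(o) == len(b) + 1:
        if o[:len(b)] == b:   # keyword appended: 'Q + W'
            return ("Q + W", o[-1])
        if o[1:] == b:        # keyword prepended: 'W + Q'
            return ("W + Q", o[0])
    return ("Others", None)
```

For example, `classify_query("harry shum", "harry shum microsoft")` yields the type ‘Q + W’ with the keyword `microsoft`, which is the kind of keyword SCAK uses to label a subtopic.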
16
Subtopic Overlap and URL Overlap
Randomly selected 500 pairs of queries with the forms ‘Q’ and ‘Q + W’. If the subtopics of an expanded query are contained in the subtopics of the original query, there is ‘subtopic overlap’. ‘URL overlap’ checks whether the two queries share identical clicked URLs.
17
Distribution: The more popular (frequent) a query is, the more likely the rule is applicable.
18
Mining Subtopics
19
Preprocessing
20
Clustering: Similarity Function
The similarity function between URLs ui and uj is built from three components:
S1: based on the OSS phenomenon, computed from the vectors of multi-clicks of ui and uj
S2: based on the SCAK phenomenon
S3: based on string similarities
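The slide does not spell out the formulas, so the following is only a sketch of how three such components could be combined. Cosine similarity for S1 and S2, `difflib.SequenceMatcher` for S3, the equal weights, and the dictionary-based vector layout are all assumptions of this example rather than the authors' definitions.

```python
import math
from difflib import SequenceMatcher


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(ui, uj, multiclick_vecs, keyword_vecs, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of S1, S2, S3 for two URLs (weights are hypothetical).

    multiclick_vecs[url]: dict mapping a multi-click group id to a count (S1).
    keyword_vecs[url]: dict mapping an added keyword to a count (S2).
    """
    s1 = cosine(multiclick_vecs.get(ui, {}), multiclick_vecs.get(uj, {}))
    s2 = cosine(keyword_vecs.get(ui, {}), keyword_vecs.get(uj, {}))
    # Assumed stand-in for the string-similarity component S3.
    s3 = SequenceMatcher(None, ui, uj).ratio()
    w1, w2, w3 = weights
    return w1 * s1 + w2 * s2 + w3 * s3
```

A pairwise score of this kind can be fed to any standard clustering routine; URLs that are often co-clicked, share clarifying keywords, or have similar strings end up close together.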
21
SCAK Similarity (S2): computed from the vectors of keywords associated with ui and uj
22
Data
TREC Data: TREC search result diversification track in 2009
DataSetA and DataSetB: queries and URLs randomly sampled from the logs of a commercial search engine
23
Results
24
Application: Search Result Reranking
The user is first asked which subtopic she is interested in, with the subtopics shown at the top of the results page. When the user selects a subtopic, the URLs belonging to that subtopic are moved to the top (re-ranked). The relative order of URLs inside and outside the subtopic is kept.
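The re-ranking step described here amounts to a stable partition of the result list. A minimal sketch, assuming results are a list of URL strings and the selected subtopic is given as a set of URLs (both representations, and the name `rerank`, are illustrative):

```python
def rerank(results, subtopic_urls):
    """Move URLs of the selected subtopic to the top.

    The original relative order is preserved within both groups.
    """
    selected = [u for u in results if u in subtopic_urls]
    rest = [u for u in results if u not in subtopic_urls]
    return selected + rest
```

Because both list comprehensions walk the results in their original order, neither group is reshuffled internally.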
25
Example
26
Evaluation
Data: Search log data of 20,000 randomly selected searches. Each query has at least two subtopics mined by the method.
Results: The average position of last clicked URLs is 3.41. Assume the cost for the user to check the subtopics and click one of them is 1.0. The average position of last clicked URLs belonging to the same subtopics is 1.80.
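One way to read these numbers is as user costs. The sketch below assumes the total cost with subtopics is the subtopic-selection cost plus the position reached after re-ranking; that additive model and the variable names are assumptions of this example, not something the slide states.

```python
# Figures taken from the slide.
baseline_position = 3.41       # average position of last clicked URL, no subtopics
selection_cost = 1.0           # assumed cost of checking and clicking a subtopic
position_within_subtopic = 1.80  # average position after re-ranking

# Assumed additive cost model: pick a subtopic, then scan the re-ranked list.
reranked_cost = selection_cost + position_within_subtopic
saving = baseline_position - reranked_cost
```

Under this reading, the re-ranked interface costs about 2.80 against 3.41 for the plain ranking, a reduction of roughly 0.61 positions per search.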
27
Conclusion
Studied the problem of query subtopic mining.
Discovered two phenomena of user search behavior: one subtopic per search, and subtopic clarification by additional keyword.
Designed a novel similarity function.
Applications to search result reranking (and search result clustering).
Problem: the method can only be employed when there is enough search log data.