Mining Query Subtopics from Search Log Data

Mining Query Subtopics from Search Log Data
Hu et al., SIGIR’12 Presented by Baichuan Li

Outline Problem Two observations Approach Results Applications
Conclusion

Problem Intention != Query Many queries are ambiguous or multifaceted

Problem

Problem Xbox Manchester Homepage? Online game? Where to buy?
Manchester news? Manchester tourism? Manchester weather? Manchester united?

Solution Identifying the major subtopics of a query from the search log data Personalized search Query suggestion Search result presentation Clustering Re-ranking Diversiﬁcation

Observations from The Logs
One subtopic per search (OSS) If a user clicks multiple URLs after submitting a query, then the clicked URLs tend to represent the same subtopic

Observations from The Logs
Subtopic clariﬁcation by additional keyword (SCAK) Users often add additional keywords (in most cases, one additional keyword) to a query to expand the query in order to clarify its subtopic The URLs clicked after searching both with the original and the expanded queries tend to represent the same subtopic The key word tends to be indicative of the subtopic E.g. harry shum microsoft

One Subtopic per Search
Use it as the rule for subtopic identification 10,000 groups of multiclicks of individual queries The rule is accurate when all the URLs within the multi-clicks are about the same sense or face

Accuracy V.S. Frequency

Accuracy v.s. Click Position

Distribution The queries with higher frequencies in search log data are more likely to have multi-clicks

Subtopic Clariﬁcation by Additional Keyword
The keywords ‘microsoft’ and ‘jr’ can be used to represent the two groups (subtopics) respectively

Query Types ‘Q’: the query is a single phrase ‘Q + W’: ‘Q’ + a keyword
‘W + Q’: a keyword + ‘Q’ ‘Others’

Subtopic Overlap and URL Overlap
Randomly selected 500 pairs of queries with the forms ‘Q’ and ‘Q + W’ If subtopics of an expanded query are contained in subtopics of the original query, then there is ‘subtopic overlap’ Check whether two queries share identical clicked URLs (‘URL overlap’)

Distribution The more popular (frequent) a query is, the more likely the rule is applicable

Mining Subtopics

Preprocessing

Clustering Similarity function
The similarity function between URLs ui and uj : S1: based on the OSS phenomenon where and denote the vectors of multi-clicks of ui and uj respectively S2: based on the SCAK phenomenon S3: based on string similarities

SCAK Similarity (S2) where and denote the vectors of keywords associated with ui and uj respectively

Data TREC Data: TREC search result diversiﬁcation track 3 in 2009
DataSetA and DataSetB: queries and URLs randomly sampled from the logs of the commercial search engine

Results

Application: Search Result Reranking
The user is ﬁrst asked which subtopic she is interested in, with the subtopics shown at the top of the results page When the user selects a subtopic, the URLs belonging to the subtopic will be moved to the top (re-ranked) The relative order between URLs inside and outside of the subtopic will be kept

Example

Evaluation Data Results
Collect search log data of 20, 000 randomly selected searches Each query has at least two subtopics mined by the method Results The average position of last clicked URLs is 3.41 Assume the cost for the user to check the subtopics and click one of them is 1.0 The average position of last clicked URLs belonging to the same subtopics is 1.80

Conclusion Study the problem of query subtopic mining
Discovered two phenomena of user search behavior one subtopic per search Subtopic clariﬁcation by additional keyword Design a novel similarity function Applications on search result reranking (and search result clustering) Problem Can only be employed when there is enough search log data

Mining Query Subtopics from Search Log Data

Similar presentations

Presentation on theme: "Mining Query Subtopics from Search Log Data"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Mining Query Subtopics from Search Log Data

Similar presentations

Presentation on theme: "Mining Query Subtopics from Search Log Data"— Presentation transcript:

Similar presentations

About project

Feedback