Published by Bruno Harrell. Modified over 9 years ago.
A Statistical Comparison of Tag and Query Logs
Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie
SIGIR 2009
June 4, 2010
Hyunwoo Kim
Contents
–Introduction
–Building a Dataset
–Are the Distributions Similar?
–Investigating Website Content
–Conclusion
2 / 20
Introduction tags 3 / 20
Introduction
Questions
1. Are queries and tags similar across URLs?
2. Can tag data be used to approximate user queries to a search engine?
3. Can query logs be used to suggest new tags for a particular webpage?
4. For what types of websites is the correlation between the term distributions for queries and tags highest?
5. Which of the two distributions, tags or queries, is more closely related to the content of the clicked websites?
4 / 20
Building a Dataset
AOL query log
–Sizable
–Recent (2006)
–English queries
–Available to academic researchers
–657,426 users
–A period of 3 months, from March to May 2006
Delicious tags
–Collaborative tagging system
Final dataset: 4145 complete URLs
–Google query, stemming, pruning
5 / 20
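The dataset construction can be sketched as follows, assuming the AOL log has already been reduced to (query, clicked URL) pairs and the Delicious data to (URL, tag string) pairs. The plural-stripping `naive_stem`, the `min_count` pruning rule, the toy data, and all function names are hypothetical illustrations; the slide does not say which stemmer or pruning thresholds the authors used.

```python
from collections import Counter, defaultdict


def naive_stem(term):
    """Hypothetical stand-in for a real stemmer (e.g. Porter): strips a plural "s"."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def term_counts(pairs):
    """Map each URL to a Counter of lower-cased, stemmed terms from its texts."""
    counts = defaultdict(Counter)
    for url, text in pairs:
        for term in text.lower().split():
            counts[url][naive_stem(term)] += 1
    return counts


def prune(counter, min_count=2):
    """Assumed pruning rule: keep only terms seen at least min_count times."""
    return Counter({t: c for t, c in counter.items() if c >= min_count})


def build_dataset(query_clicks, url_tags, min_count=2):
    """Return {url: (query term counts, tag term counts)} for URLs present in both logs."""
    queries = term_counts((url, query) for query, url in query_clicks)
    tags = term_counts(url_tags)
    dataset = {}
    for url in queries.keys() & tags.keys():
        q, r = prune(queries[url], min_count), prune(tags[url], min_count)
        if q and r:  # drop URLs left with no terms after pruning
            dataset[url] = (q, r)
    return dataset


# Toy input: (query, clicked URL) pairs and (URL, tag string) pairs.
query_clicks = [
    ("new york times", "http://www.nytimes.com"),
    ("nytimes news", "http://www.nytimes.com"),
    ("times news", "http://www.nytimes.com"),
    ("weather forecast", "http://www.weather.com"),
]
url_tags = [
    ("http://www.nytimes.com", "news newspaper"),
    ("http://www.nytimes.com", "news nytimes"),
]
dataset = build_dataset(query_clicks, url_tags)
```

Only URLs that appear in both sources survive (the untagged weather.com URL is dropped), which loosely mirrors the restriction to "complete" URLs; how the authors actually defined completeness is not stated on the slide.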
Are the Distributions Similar? http://www.nytimes.com tags or 6 / 20
Are the Distributions Similar? Kullback-Leibler divergence 7 / 20
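The Kullback-Leibler divergence of P from Q is D_KL(P ‖ Q) = Σ_t P(t) log(P(t) / Q(t)). It is undefined when Q(t) = 0 for a term with P(t) > 0, so comparing raw query and tag distributions requires smoothing. The sketch below assumes additive smoothing over the union vocabulary with a hypothetical `alpha` parameter and base-2 logarithms; the smoothing actually used in the paper is not shown on the slide, and the toy counts are illustrative only.

```python
import math
from collections import Counter


def smoothed_distribution(counts, vocabulary, alpha=0.5):
    """Additive smoothing over a shared vocabulary; alpha is an assumed parameter."""
    total = sum(counts.get(t, 0) for t in vocabulary) + alpha * len(vocabulary)
    return {t: (counts.get(t, 0) + alpha) / total for t in vocabulary}


def kl_divergence(p, q):
    """D_KL(P || Q) in bits; requires q[t] > 0 wherever p[t] > 0."""
    return sum(pt * math.log2(pt / q[t]) for t, pt in p.items() if pt > 0)


# Toy term counts for one URL (illustrative, not data from the paper).
query_counts = Counter({"news": 5, "nytimes": 3, "times": 2})
tag_counts = Counter({"news": 4, "newspaper": 3, "nytimes": 1})
vocab = set(query_counts) | set(tag_counts)
p_query = smoothed_distribution(query_counts, vocab)
p_tag = smoothed_distribution(tag_counts, vocab)

print(f"KL(query || tag) = {kl_divergence(p_query, p_tag):.3f}")
print(f"KL(tag || query) = {kl_divergence(p_tag, p_query):.3f}")
```

The two directions give different values, which is the asymmetry that motivates the symmetric Jensen-Shannon measure.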
Are the Distributions Similar?
Jensen-Shannon divergence
–Symmetric measure
Overlap coefficient
–V_q: query logs
–V_r: tags
8 / 20
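The formulas on this slide did not survive extraction. As a hedged sketch, the standard definitions are JS(P, Q) = ½ D_KL(P ‖ M) + ½ D_KL(Q ‖ M) with M = ½(P + Q), and overlap coefficient = |V_q ∩ V_r| / min(|V_q|, |V_r|); these are assumed, not confirmed, to match the exact variants used in the paper. Because M(t) > 0 wherever P(t) or Q(t) is positive, JS needs no smoothing, and with base-2 logarithms it lies between 0 and 1. Function names and toy counts are illustrative.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw term counts into a probability distribution."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def kl_divergence(p, q):
    """D_KL(P || Q) in bits; safe here because q[t] > 0 wherever p[t] > 0."""
    return sum(pt * math.log2(pt / q[t]) for t, pt in p.items() if pt > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of P and Q from their mixture M."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def overlap_coefficient(v_q, v_r):
    """|V_q ∩ V_r| / min(|V_q|, |V_r|) for two term vocabularies."""
    v_q, v_r = set(v_q), set(v_r)
    if not v_q or not v_r:
        return 0.0
    return len(v_q & v_r) / min(len(v_q), len(v_r))


# Toy term counts for one URL (illustrative, not data from the paper).
query_counts = Counter({"news": 5, "nytimes": 3, "times": 2})
tag_counts = Counter({"news": 4, "newspaper": 3, "nytimes": 1})
p_q, p_r = normalize(query_counts), normalize(tag_counts)

print(f"JS divergence: {js_divergence(p_q, p_r):.3f}")
print(f"Overlap coefficient: {overlap_coefficient(query_counts, tag_counts):.3f}")  # 0.667
```

Swapping the arguments of `js_divergence` gives the same value, and passing a `Counter` to `overlap_coefficient` compares only its keys, i.e. the two vocabularies.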
Are the Distributions Similar? 9 / 20
Are the Distributions Similar? Open Directory Project 10 / 20
Are the Distributions Similar? 11 / 20
Are the Distributions Similar? 12 / 20
Are the Distributions Similar? 13 / 20
Are the Distributions Similar? 14 / 20
Are the Distributions Similar? 15 / 20
Are the Distributions Similar? 16 / 20
Investigating Website Content 17 / 20
Investigating Website Content 18 / 20
Conclusion
Similarity between query terms and tags
–Vocabularies overlap substantially
–Term frequency distributions are correlated
–Similarity does not depend on the topic area
Queries are more similar to the content than tags are
Queries and tags are more similar to one another than to the content
Future work
–Models for automatically removing noise from the tag and query logs
–Techniques for predicting useful tags from query distributions
–Techniques for using tag data effectively to improve different forms of Web search
19 / 20
Thank you