1
Tagging with Queries: How and Why?
Ioannis Antonellis, Hector Garcia-Molina, Jawed Karim
2
Content on the Web
Sources of text about a page: page text, back link text, forward link text, search queries
Example terms shown in the figure: CNN, Obama, critics, news
Stanford Infolab
3
How?
Basic observation: the HTTP referrer field contains the search query.
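The basic observation can be illustrated with a minimal sketch that pulls the search query out of a referrer URL. The function name `query_from_referrer` and the `QUERY_PARAMS` table are assumptions made for this example and are not taken from the slides. The parameter names listed are the ones commonly used by those search engines, and a real deployment would need a fuller table.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed table: substring of the search-engine host -> name of the URL
# parameter that carries the query. Illustrative, not exhaustive.
QUERY_PARAMS = {"google.": "q", "bing.": "q", "yahoo.": "p"}


def query_from_referrer(referrer):
    """Return the search query carried by a referrer URL, or None."""
    parts = urlsplit(referrer)
    host = parts.netloc.lower()
    for marker, param in QUERY_PARAMS.items():
        if marker in host:
            # parse_qs decodes percent-escapes and turns '+' into spaces.
            values = parse_qs(parts.query).get(param)
            return values[0].strip() if values else None
    # Not a known search engine (or no referrer at all).
    return None


print(query_from_referrer("http://www.google.com/search?q=stanford+infolab&hl=en"))
```

A referrer that does not come from a listed search engine, or an empty referrer, yields `None`, so such visits contribute no query.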
5
How?
Basic observation: the HTTP referrer field contains the search query.
1) Extract queries from the web access log
6
Web Access Log
a997c d75c03f22ca8715e50b3 [28/Feb/2007:23:45: ] /group/svsa/cgi-bin/www/officers.php
a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45: ] /group/King/ "
413fa663474b2288c e7e62aea [28/Feb/2007:23:46: ] /group/pandegroup/folding/results.html "
3d2edd4dfa7778da92875ee67a [28/Feb/2007:23:46: ] /group/vpge/sgsi/entrepreneurship/ "
ac a6c490023e460fd4863a48 [28/Feb/2007:23:46: ] / "
1c
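Extracting queries from such a log could look roughly like the sketch below. The log excerpt on the slide is truncated, so this example assumes the Apache combined log format, in which the referrer is the first quoted field after the status code and byte count. The sample line, the names `LOG_LINE`, `search_query` and `query_tags_from_log`, and the choice to treat any `q` parameter as a search query are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Assumed layout (Apache combined log format):
# client ident user [time] "METHOD path PROTO" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)"'
)


def search_query(referrer, param="q"):
    """Simplification: any referrer with a `q` parameter counts as a search."""
    values = parse_qs(urlsplit(referrer).query).get(param)
    return values[0].strip().lower() if values else None


def query_tags_from_log(lines):
    """Map each requested path to the set of queries that led to it."""
    tags = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed format
        query = search_query(match.group("referrer"))
        if query:
            tags[match.group("path")].add(query)
    return dict(tags)


# Hypothetical log lines, invented for this example.
sample = [
    'abc123 - - [01/Jan/2000:00:00:00 +0000] "GET /example/page.html HTTP/1.1" '
    '200 5120 "http://www.google.com/search?q=Example+Page" "Mozilla/5.0"',
    'def456 - - [01/Jan/2000:00:00:05 +0000] "GET /example/page.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
]
print(query_tags_from_log(sample))
```

Requests with no referrer (logged as `-`) are read but produce no tag, so only visits that arrived from a search contribute.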
7
How?
Basic observation: the HTTP referrer field contains the search query.
1) Extract queries from the web access log
2) Embed JavaScript code in web pages that captures search queries
8
Embeddable code
9
How?
Basic observation: the HTTP referrer field contains the search query.
1) Extract queries from the web access log
2) Embed JavaScript code in web pages that captures search queries
Convince the server administrator/page owner
11
Query tags
12
Information value of Query Tags
Datasets:
Stanford Query Logs: 360,000 URLs, 900,000 query tags
Presumably Delicious: 3,000 URLs, 5,500 tags
WebBase
13
Experiments - Summary
URL coverage
Query vs Delicious tags
Query/Delicious tags vs page text
14
URL coverage
Query logs provide tags for ~110 times more URLs than Delicious.
13% of Delicious URLs (380 URLs) are tagged only by Delicious.
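A coverage comparison of this kind reduces to set operations over the URLs each source has tagged. The sketch below is a hypothetical illustration on toy data. The function name `coverage` and the dictionary keys are assumptions, and the numbers it prints are not the figures reported on the slide.

```python
def coverage(query_urls, delicious_urls):
    """Compare the URL sets tagged by query logs and by Delicious."""
    query_urls, delicious_urls = set(query_urls), set(delicious_urls)
    if not delicious_urls:
        raise ValueError("delicious_urls must not be empty")
    only_delicious = delicious_urls - query_urls
    return {
        # How many times more URLs the query logs cover.
        "url_ratio": len(query_urls) / len(delicious_urls),
        # URLs tagged by both sources (the "common URLs").
        "common": len(query_urls & delicious_urls),
        # URLs that only Delicious has tags for, as a count and as a share.
        "only_delicious": len(only_delicious),
        "only_delicious_share": len(only_delicious) / len(delicious_urls),
    }


# Toy data, invented for this example.
stats = coverage({"a", "b", "c", "d", "e", "f"}, {"a", "x"})
print(stats)
```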
15
Query Tags
Query logs provide 42 query tags per URL on average.
16
Delicious Tags
Delicious provides 3 tags per URL on average.
17
Tags for common URLs
Query logs provide 250 query tags per URL on average for common URLs.
Delicious provides 5 tags per URL on average for common URLs.
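The per-URL averages, overall and restricted to common URLs, can be computed as in the following sketch. It assumes each source is stored as a mapping from URL to a set of tags. The name `mean_tags_per_url` and the toy data are hypothetical and do not reproduce the reported averages.

```python
def mean_tags_per_url(tags_by_url, urls=None):
    """Average number of tags per URL, optionally restricted to `urls`."""
    if urls is None:
        selected = list(tags_by_url)
    else:
        selected = [u for u in urls if u in tags_by_url]
    if not selected:
        return 0.0
    return sum(len(tags_by_url[u]) for u in selected) / len(selected)


# Toy data, invented for this example.
query_tags = {"a": {"x", "y", "z"}, "b": {"x"}}
delicious_tags = {"a": {"t"}}

# "Common URLs" are the URLs tagged by both sources.
common = query_tags.keys() & delicious_tags.keys()

print(mean_tags_per_url(query_tags))              # all query-log URLs
print(mean_tags_per_url(query_tags, common))      # common URLs only
print(mean_tags_per_url(delicious_tags, common))  # common URLs only
```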
18
Query Tags vs Page Text
For every URL, 1 out of 3 query tags are not present in the page text.
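One way such a measurement could be made is sketched below. The slides do not say how "present in the page text" is decided, so this example assumes a tag counts as present only when every one of its words occurs somewhere in the page text, after lowercasing and splitting on non-alphanumeric characters. The names `tokens` and `missing_tag_fraction` and the sample data are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric words of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def missing_tag_fraction(tags, page_text):
    """Share of tags with at least one word that is absent from page_text."""
    tags = list(tags)
    if not tags:
        return 0.0
    page_words = tokens(page_text)
    # A tag is "missing" unless all of its words occur in the page.
    missing = [tag for tag in tags if not tokens(tag) <= page_words]
    return len(missing) / len(tags)


# Toy data, invented for this example.
page = "Folding results and protein data"
tags = ["folding results", "protein folding", "stanford chemistry"]
print(missing_tag_fraction(tags, page))
```

Under a stricter definition, such as requiring the tag to appear as an exact phrase, the missing fraction would be higher, so the matching rule should be stated alongside any reported number.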
19
Delicious Tags vs Page Text
For every URL, 1 out of 2 Delicious tags are not present in the page text.
20
Tags for common URLs
For common URLs, 1 out of 2 query/Delicious tags are not present in the page text.
21
Conclusions
Query tags:
can be extracted in a distributed fashion
are a promising new source of information
can provide many new tags for a large fraction of the Web
22
Thank You! (DEMO)