1 Quicklink Selection for Navigational Query Results
Deepayan Chakrabarti (deepay@yahoo-inc.com)
Ravi Kumar (ravikuma@yahoo-inc.com)
Kunal Punera (kpunera@yahoo-inc.com)
2 What are quicklinks?
[Figure labels: Quicklinks; Result Website]
3 Quicklinks = URLs within the search result website
- Enable fast navigation to important parts of the website
- Which URLs should be QLs?
[Figure labels: Quicklinks; Result Website]
4 Quicklink Selection
Some obvious strategies don’t work very well:
- Top clicked URLs in search engine
  - URL may have low relevance in the QL context: lib.utexas.edu/maps is popular for searches on “maps” but not for searches on “Univ. of Texas”
  - URL may be too specific: automobiles.honda.com/civic-hybrid/exterior-photos.aspx for honda.com
  - URL popularity may be time sensitive: nytimes.com/election-guide/2008/ for nytimes.com
5 Quicklink Selection
Some obvious strategies don’t work very well:
- Top clicked URLs in search engine
- Top visited URLs in toolbar data
  - May not relate to search activity, e.g., for nytimes.com:
    - #3 is nytimes.com/mem/emailthis.html
    - #6 is nytimes.com/auth/login
    - #8 is nytimes.com/gst/regi.html
6 Quicklink Selection
Some obvious strategies don’t work very well:
- Top clicked URLs in search engine
- Top visited URLs in toolbar data
- Top URLs from analysis of hyperlink graph
  - Ignores preferences of search users
  - Toolbar data is more representative
- Heavily tagged URLs (e.g., del.icio.us/digg)
  - Low coverage: too few websites
7 Quicklink Selection
Need a combined approach:
- Search logs (this paper)
- Toolbar data (this paper)
- Web-server logs
- Website hyperlink graph
- User tags
8 Related Work
- Sitemap generation [Perkowitz+/00]
- Detection of hard-to-find URLs [Srikant+/01]
- Improving website navigability [Doerr+/07]
- Mining Web usage patterns [Buchner/99, Cadez+/03]
- BrowseRank [Liu+/08]
- Post-search browsing behavior [Bilenko+/08]
We focus on QLs in the context of Search
9 Outline
- Motivation and Related Work
- Problem Formulation
- Proposed Solution
- Experiments
- Conclusions
10 Problem Formulation
Which k URLs should be QLs?
- “The greatest good for the greatest number”
- QLs save clicks
- Maximize the total number of clicks saved using at most k QLs
But when exactly is a click “saved”?
11 Problem Formulation
When does a QL get clicked by the user?
Say we pick this node as a QL
[Figure: graph of click trails (toolbar data); node labels: nasa.gov, Hubble telescope, Photos]
12 Problem Formulation
Say we pick this node as a QL
Assumption: the user recognizes if SearchResult → QL → Destination
[Figure: graph of click trails (toolbar data); node labels: nasa.gov, Hubble telescope, Photos]
13 Problem Formulation
Say we pick this node as a QL (saves 1 click each)
Assumption: the user recognizes if SearchResult → QL → Destination
[Figure: graph of click trails (toolbar data); node label: nasa.gov]
14 Problem Formulation
Say we pick this node as a QL
- Trail annotations: (saves 1 click each), (saves 2 clicks each), (saves 0)
- Total savings = 1*3 + 2*2 = 7 clicks (3 trails save 1 click each, 2 trails save 2 clicks each)
Assumption: the user recognizes if SearchResult → QL → Destination
[Figure: graph of click trails (toolbar data); node label: nasa.gov]
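The click-savings count on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a trail is an ordered list of pages starting at the search result's landing page, and that a QL saves as many clicks as its position in the trail. The URLs below are hypothetical and chosen only to reproduce the 1*3 + 2*2 = 7 total.

```python
from typing import List


def clicks_saved(trail: List[str], quicklink: str) -> int:
    """Clicks saved on one trail if `quicklink` is shown next to the result.

    Assumption: index 0 is the landing page of the search result, so jumping
    straight to the quicklink skips exactly `index` clicks. A quicklink that
    is not on the trail saves nothing.
    """
    if quicklink not in trail:
        return 0
    return trail.index(quicklink)


def total_savings(trails: List[List[str]], quicklink: str) -> int:
    """Sum of clicks saved over all trails for a single quicklink."""
    return sum(clicks_saved(trail, quicklink) for trail in trails)


# Hypothetical toy trails: the QL is one click deep on 3 trails
# and two clicks deep on 2 trails.
trails = (
    [["nasa.gov", "nasa.gov/hubble", "nasa.gov/hubble/photos"]] * 3
    + [["nasa.gov", "nasa.gov/missions", "nasa.gov/hubble",
        "nasa.gov/hubble/photos"]] * 2
)

print(total_savings(trails, "nasa.gov/hubble"))  # 1*3 + 2*2 = 7
```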
15 Problem Formulation
However…
- Unknown pages might become QLs
  - These could become the “best” QLs
[Figure: lyrics.com with nodes A, B, C, …, Z]
16 Problem Formulation
However…
- Unknown pages might become QLs
- Automatic-redirect pages might become QLs:
  - nytimes.com forces logging in
  - aaa.com forces zipcode entry
We need QLs that are “noticeable” in a search context
17 Problem Formulation
How can we estimate noticeability?
- Via Search click-logs
- Noticeability of a URL u: the user notices a useful QL with probability α(u)
- α(u) is defined in terms of:
  - Fraction of search clicks for u on website
  - Tuning param (≈ 2)
18 Problem Formulation
Assumption: the user picks the best QL that he/she notices
# trails   prob         # clicks saved
2          α₁           2
1          α₁           1
2          (1−α₂)α₁     1
2          α₂           2
Total = 5α₁ + 4α₂ + 2(1−α₂)α₁
[Figure: click-trail graph from nasa.gov with two QLs, QL1 and QL2; labels include “(saves 0)”]
19 Problem Formulation
# trails   prob         # clicks saved
2          α₁           2
1          α₁           1
2          (1−α₂)α₁     1
2          α₂           2
Total = 5α₁ + 4α₂ + 2(1−α₂)α₁
- If only QL1 is perfectly noticeable (α₁ = 1, α₂ = 0): Total = 7 clicks (as if 1 QL only)
- If both QLs are perfectly noticeable (α₁ = 1, α₂ = 1): Total = 9 clicks
[Figure: click-trail graph from nasa.gov with two QLs, QL1 and QL2; labels include “(saves 0)”]
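The expected-savings calculation with noticeability can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a QL's saving on a trail is taken to be its position in the trail, the user uses the best QL on the trail that he/she notices, and QLs are noticed independently. The page names are hypothetical and arranged to match the table: 2 trails where QL1 saves 2 clicks, 1 trail where QL1 saves 1 click, and 2 trails where QL1 saves 1 click and QL2 saves 2.

```python
from typing import Dict, List


def expected_savings(trails: List[List[str]], alpha: Dict[str, float]) -> float:
    """Expected clicks saved when every URL in `alpha` is shown as a QL.

    alpha[u] is the probability that the user notices QL u (assumption:
    QLs are noticed independently of each other).
    """
    total = 0.0
    for trail in trails:
        # QLs on this trail as (clicks saved, url), largest saving first.
        candidates = sorted(
            ((trail.index(u), u) for u in alpha if u in trail),
            reverse=True,
        )
        p_no_better_noticed = 1.0
        for saved, u in candidates:
            # u is used only if it is noticed and no better QL was noticed.
            total += saved * alpha[u] * p_no_better_noticed
            p_no_better_noticed *= 1.0 - alpha[u]
    return total


# Hypothetical toy trails matching the rows of the table.
trails = (
    [["root", "x", "ql1", "d1"]] * 2      # QL1 saves 2 clicks
    + [["root", "ql1", "d2"]] * 1         # QL1 saves 1 click
    + [["root", "ql1", "ql2", "d3"]] * 2  # QL1 saves 1, QL2 saves 2
)

print(expected_savings(trails, {"ql1": 1.0, "ql2": 0.0}))  # only QL1 noticeable
print(expected_savings(trails, {"ql1": 1.0, "ql2": 1.0}))  # both noticeable
```

With these toy trails the function evaluates 5α₁ + 4α₂ + 2(1−α₂)α₁, so the two calls correspond to the 7-click and 9-click cases on the slide.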
20 Problem Formulation
Which k URLs should be QLs?
Maximize the expected number of clicks saved using at most k QLs while incorporating “noticeability”
21 Outline
- Motivation and Related Work
- Problem Formulation
- Proposed Solution
- Experiments
- Conclusions
22 Algorithms
Maximize expected number of saved clicks using k QLs
- NP-Hard
- Theorem: this objective is non-decreasing submodular
  1. Non-negative
  2. Adding QLs never hurts
  3. “Diminishing Returns”: the marginal improvement from adding u to a set S is at least the marginal improvement from adding u to a superset S’
23 Algorithms
Greedy algorithm:
- Iteratively pick QLs that increase the number of saved clicks the most
- Within a factor (1-1/e) of OPT [Nemhauser+/’78]
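The greedy selection can be sketched as below. This is a minimal, unoptimized illustration rather than the authors' implementation: the objective assumes that a QL's saving equals its position in the trail and that the user takes the best QL he/she notices; the candidate URLs, trails, and α values are hypothetical.

```python
from typing import Dict, List


def expected_savings(trails: List[List[str]], quicklinks: List[str],
                     alpha: Dict[str, float]) -> float:
    """Expected clicks saved by showing `quicklinks` (same model as above)."""
    total = 0.0
    for trail in trails:
        on_trail = sorted(
            ((trail.index(u), u) for u in quicklinks if u in trail),
            reverse=True,
        )
        p_no_better_noticed = 1.0
        for saved, u in on_trail:
            total += saved * alpha[u] * p_no_better_noticed
            p_no_better_noticed *= 1.0 - alpha[u]
    return total


def greedy_quicklinks(trails: List[List[str]], alpha: Dict[str, float],
                      k: int) -> List[str]:
    """Pick up to k QLs, each time adding the URL with the largest gain."""
    chosen: List[str] = []
    current = 0.0
    for _ in range(k):
        best_url, best_value = None, current
        for u in alpha:  # candidate URLs are the keys of alpha
            if u in chosen:
                continue
            value = expected_savings(trails, chosen + [u], alpha)
            if value > best_value:
                best_url, best_value = u, value
        if best_url is None:  # no remaining URL improves the objective
            break
        chosen.append(best_url)
        current = best_value
    return chosen


# Hypothetical toy data.
trails = (
    [["root", "x", "ql1", "d1"]] * 2
    + [["root", "ql1", "d2"]] * 1
    + [["root", "ql1", "ql2", "d3"]] * 2
)
alpha = {"x": 0.1, "ql1": 1.0, "ql2": 1.0}

selected = greedy_quicklinks(trails, alpha, k=2)
print(selected, expected_savings(trails, selected, alpha))
```

Each round re-evaluates the objective for every remaining candidate, which is fine for a toy example; a practical version would likely cache per-trail contributions.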
24 Algorithms
However…
- Inhomogeneous results: QLs for ea.com are
  - fifa08.ea.com and battlefield.ea.com (two games made by EA)
  - 6 webpages deep inside thesim2.ea.com
- Redundant results: QLs for senate.gov include
  - obama.senate.gov
  - obama.senate.gov/about
  - obama.senate.gov/contact
  - obama.senate.gov/votes
  - Parent URL makes the child URLs redundant
25 Algorithms
- Both can be specified as pairwise constraints on URLs allowed to belong to a QL set
- Pairwise-constrained QL selection is NP-hard
- Two-step process:
  1. Heuristically find a large subset of trails that form a tree
  2. Enforce constraints on tree
    - Dynamic program optimal on tree
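One way to picture a pairwise constraint is a predicate over two URLs plus a check that no pair in a candidate QL set violates it. The sketch below covers only the redundancy case and rests on an assumption not stated in the slides: a URL is treated as the parent of another when it is a path prefix of it. It does not attempt the tree construction or the dynamic program.

```python
from itertools import combinations
from typing import Iterable


def is_redundant_pair(url_a: str, url_b: str) -> bool:
    """True if one URL is a path-prefix ancestor of the other (assumed rule)."""
    a = url_a.rstrip("/")
    b = url_b.rstrip("/")
    return a == b or b.startswith(a + "/") or a.startswith(b + "/")


def satisfies_constraints(quicklinks: Iterable[str]) -> bool:
    """True if no two QLs in the set form a redundant parent/child pair."""
    return not any(
        is_redundant_pair(a, b) for a, b in combinations(list(quicklinks), 2)
    )


print(is_redundant_pair("obama.senate.gov", "obama.senate.gov/about"))
print(satisfies_constraints(["obama.senate.gov/about", "obama.senate.gov/votes"]))
```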
26 Outline
- Motivation and Related Work
- Problem Formulation
- Proposed Solution
- Experiments
- Conclusions
27 Experiments
Baseline Methods
- TopClicked: URL score = # search clicks on URL
- TopVisited: URL score = # occurrences on toolbar trails
- PageRank:
  - Build a weighted graph on URLs, where weight(i,j) = # trails using the i → j edge
  - URL score = PageRank on this graph
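The TopVisited and PageRank baselines can be sketched in plain Python. This is an illustrative reading of the slide, not the authors' code: the damping factor of 0.85, the fixed iteration count, and the uniform handling of pages without outgoing edges are assumptions, and the trails are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def top_visited_scores(trails: List[List[str]]) -> Counter:
    """TopVisited: score = number of occurrences of the URL on trails."""
    return Counter(url for trail in trails for url in trail)


def edge_weights(trails: List[List[str]]) -> Counter:
    """weight(i, j) = number of trails steps that use the i -> j edge."""
    weights: Counter = Counter()
    for trail in trails:
        for i, j in zip(trail, trail[1:]):
            weights[(i, j)] += 1
    return weights


def pagerank(weights: Dict[Edge, int], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Weighted PageRank by power iteration (damping value is an assumption)."""
    nodes = sorted({n for edge in weights for n in edge})
    out_total: Dict[str, float] = defaultdict(float)
    for (i, _), w in weights.items():
        out_total[i] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing edge is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for (i, j), w in weights.items():
            new_rank[j] += damping * rank[i] * w / out_total[i]
        rank = new_rank
    return rank


# Hypothetical toy trails.
trails = [["h", "a", "b"], ["h", "a"], ["h", "c"]]
weights = edge_weights(trails)
ranks = pagerank(weights)
print(top_visited_scores(trails).most_common(2))
print(sorted(ranks, key=ranks.get, reverse=True))
```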
28 Experiments
Live Traffic dataset
- Computed CTRs on QLs currently displayed by Yahoo! (1043 website subset)
- Measure:
  - Pick two equal-sized subsets of QLs
  - Use sum-of-scores and sum-of-CTRs to predict the better subset
  - Measure how often the predictions match
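The agreement measure can be sketched as follows. This is a toy illustration under assumptions: it enumerates every pair of equal-sized subsets exhaustively and skips ties, whereas the slides do not say how pairs were drawn or how ties were handled. The scores and CTRs are made-up numbers.

```python
from itertools import combinations
from typing import Dict


def agreement_rate(scores: Dict[str, float], ctrs: Dict[str, float],
                   subset_size: int) -> float:
    """Fraction of subset pairs where sum-of-scores and sum-of-CTRs agree.

    Both predictions name the subset with the larger sum as the better one.
    Pairs that tie under either measure are skipped (an assumption).
    """
    urls = sorted(scores)
    subsets = list(combinations(urls, subset_size))
    agree = 0
    total = 0
    for a, b in combinations(subsets, 2):
        score_diff = sum(scores[u] for u in a) - sum(scores[u] for u in b)
        ctr_diff = sum(ctrs[u] for u in a) - sum(ctrs[u] for u in b)
        if score_diff == 0 or ctr_diff == 0:
            continue
        total += 1
        if (score_diff > 0) == (ctr_diff > 0):
            agree += 1
    return agree / total if total else float("nan")


# Made-up scores and CTRs for three hypothetical QLs.
scores = {"a": 3.0, "b": 2.0, "c": 1.0}
ctrs = {"a": 0.3, "b": 0.2, "c": 0.1}
ctrs_reversed = {"a": 0.1, "b": 0.2, "c": 0.3}

print(agreement_rate(scores, ctrs, subset_size=2))
print(agreement_rate(scores, ctrs_reversed, subset_size=2))
```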
29 Experiments
Live Traffic Data
[Figure: x-axis: subset sizes; y-axis: fraction of subset-pairs where predictions agree with live traffic]
QL-ALG > TopVisited > PageRank > TopClicked
30 Experiments
Tree-structured trails
- Most dropped trails are very short
- Tree-structured trails improve accuracy
[Figure: Live Traffic prediction quality comparison]
[Figure: Distribution of dropped trails; axes: length of trail, number of trails dropped]
31 Outline
- Motivation and Related Work
- Problem Formulation
- Proposed Solution
- Experiments
- Conclusions
32 Conclusions
- Proposed a formulation for the QL selection problem
  - Both toolbar and search logs are used intuitively
- Proposed two algorithms:
  - Greedy: (1-1/e)-optimal
  - Tree-structured: empirically better
- Improvement of 22% over competing baselines