1 Automatic Identification of User Goals in Web Search
Uichin Lee, Zhenyu Liu, Junghoo Cho
Computer Science Department, UCLA
{uclee, vicliu, cho}@cs.ucla.edu
2 Motivation
Users have different goals for Web search
  – Reach the homepage of an organization (e.g., UCLA)
  – Learn about a topic (e.g., simulated annealing)
  – Download online music, etc.
Can we identify the user goal for a Web search automatically?
  – For example, to improve and customize search results based on the identified user goal
3 Two high-level user goals
Navigational query
  – Reach a Web site the user already has in mind (e.g., “UCLA Library”)
Informational query
  – Visit multiple sites to learn about a particular topic (e.g., “Simulated Annealing”)
Based on [Broder02, Rose&Levinson04]
  – The navigational and informational goals appear in both studies
4 Exploiting identified user goals
Tailored weighting/ranking mechanism
  – Navigational queries: emphasize anchor text [Craswell01, Kang03] and URL path [Westerveld01]
  – Informational queries: emphasize page content [Kang03] and IR techniques (query expansion, relevance feedback, pseudo-relevance feedback, etc.)
Tailored result presentation
  – Informational queries: clustered search results [Etzioni99, Zeng04, Kummamuru04]
  – Targeted ads / answers
5 Outline
Are query goals predictable?
  – Human-subject study
How can we predict user goals automatically?
  – Anchor-link distribution
  – User-click distribution
How effective are our features?
  – Experimental evaluation
6 Are query goals “predictable”?
Search engines “see” only a few keywords
  – No explicit indication of goals by users
  – Can we predict the user goal simply from the keywords?
Human subject study
  – 50 most popular Google queries from UCLA CS
  – 28 participants (grad students) from UCLA CS
  – Subjects were asked to indicate the likely goal of each query, had they issued it
Do most subjects agree on a particular goal?
7 Human subject study results
i(q): the fraction of participants who judge query q to be informational
  – e.g., i(q) = 0.038 for “UCLA Library”
[Figure label: queries with a predictable goal]
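As a small illustration of the i(q) definition, the following Python sketch computes the fraction of informational judgments for a single query. The label strings and the sample votes are hypothetical and are not data from the study.

```python
def informational_fraction(judgments):
    """Return i(q): the share of judgments labeled 'informational'."""
    if not judgments:
        raise ValueError("need at least one judgment")
    informational = sum(1 for j in judgments if j == "informational")
    return informational / len(judgments)


# Hypothetical votes for one query (not the study's data).
votes = ["navigational", "navigational", "informational", "navigational"]
i_q = informational_fraction(votes)
print(i_q)  # 0.25
```

A value near 0 or near 1 means the subjects largely agree on the goal; a value near 0.5 signals an ambiguous query.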
8 Human subject study results
i(q): the fraction of participants who judge query q to be informational
  – e.g., i(q) = 0.038 for “UCLA Library”
“Ambiguous queries”
  – 43.5% software names
  – 30.4% person names
9 Human subject study results
i(q): the fraction of participants who judge query q to be informational
  – e.g., i(q) = 0.038 for “UCLA Library”
[Figure label: after removing software and person-name queries]
10 Human subject study: summary
The majority of queries have predictable goals
Interestingly, most ambiguous queries fall into a small set of topics
  – Topic-based ambiguity detection may be possible
  – Ambiguous queries can be treated differently from others
11 Outline
Are query goals predictable?
  – Human-subject study
How can we predict user goals automatically?
How effective are our features?
  – Experimental evaluation
12 How to predict the user goal?
“UCLA Library” vs. “Simulated Annealing”
  – Navigational vs. informational
  – Semantic analysis necessary?
Our idea: use information provided implicitly by Web users
  – Web-link structure
  – User-click behavior
13 Web-link structure
Anchor-link distribution to quantify the link structure
[Figure: anchor text “UCLA Library” and its link destinations: www.library.ucla.edu, www.ucla.edu/library.html, repositories.cdlib.org/uclalib/]
14 Web-link structure
Anchor-link distribution to quantify the link structure
[Figure: anchor-link distribution for the query “UCLA Library” over www.library.ucla.edu, www.ucla.edu/library.html, repositories.cdlib.org/uclalib/]
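One way to picture an anchor-link distribution is sketched below in Python. It assumes the distribution is the share of matching anchor links that each destination receives, ordered from most-linked to least-linked; the slides do not spell out the exact construction. The function name, the exact-match rule on anchor text, and the link counts are hypothetical.

```python
from collections import Counter


def anchor_link_distribution(links, query):
    """Return (destination, share) pairs, most-linked first.

    `links` is an iterable of (anchor_text, destination) pairs.
    Only anchors whose text equals the query (case-insensitive) count.
    """
    q = query.strip().lower()
    counts = Counter(dest for text, dest in links if text.strip().lower() == q)
    total = sum(counts.values())
    if total == 0:
        return []  # no anchor statistics for this query
    return [(dest, n / total) for dest, n in counts.most_common()]


# Hypothetical crawl excerpt; the counts are made up for illustration.
links = [
    ("UCLA Library", "www.library.ucla.edu"),
    ("UCLA Library", "www.library.ucla.edu"),
    ("ucla library", "www.library.ucla.edu"),
    ("UCLA Library", "www.ucla.edu/library.html"),
    ("UCLA", "www.ucla.edu"),
]
dist = anchor_link_distribution(links, "UCLA Library")
print(dist)
```

In this made-up sample three of the four matching links share one destination, which is the skewed shape the slides associate with navigational queries.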
15 Anchor-link distribution for sample queries
[Figure panels: navigational (“UCLA Library”) and informational (“Simulated Annealing”)]
16 User-click behavior
Click distribution to quantify past user-click behavior
[Figure: click distribution for the navigational query “UCLA Library”]
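A click distribution can be built from a search log in a similar spirit. The Python sketch below assumes a log of (query, clicked URL) pairs and groups it by query; the log format, the function name, and every entry in the sample log are hypothetical.

```python
from collections import Counter, defaultdict


def click_distributions(click_log):
    """Map each query to the click shares of its results, most-clicked first."""
    counts = defaultdict(Counter)
    for query, url in click_log:
        counts[query.strip().lower()][url] += 1
    result = {}
    for query, counter in counts.items():
        total = sum(counter.values())
        result[query] = [n / total for _, n in counter.most_common()]
    return result


# Hypothetical click log (not the UCLA CS data).
log = [
    ("UCLA Library", "www.library.ucla.edu"),
    ("ucla library", "www.library.ucla.edu"),
    ("UCLA Library", "www.library.ucla.edu"),
    ("UCLA Library", "www.ucla.edu/library.html"),
    ("simulated annealing", "example.org/a"),
    ("simulated annealing", "example.org/b"),
]
dists = click_distributions(log)
print(dists["ucla library"])         # [0.75, 0.25]
print(dists["simulated annealing"])  # [0.5, 0.5]
```

Clicks concentrated on one result suggest a navigational goal, while clicks spread across results suggest an informational one.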
17 User-click behavior (cont’d)
[Figure panels: navigational (“UCLA Library”) and informational (“Simulated Annealing”)]
18 Capturing the “shape” of distributions
Possible numeric features for f(x), with mean $\mu$ and standard deviation $\sigma$
  – Mean: $\mu$
  – Median
  – Skewness: $\int (x - \mu)^3 f(x)\,dx / \sigma^3$ (how “asymmetric” f(x) is)
  – Kurtosis: $\int (x - \mu)^4 f(x)\,dx / \sigma^4$ (how “peaked” f(x) is)
Single linear regression
  – Median is the most effective measurement for both the anchor-link distribution and the click distribution
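The four features can be computed for a discrete rank distribution by replacing the integrals with sums. The Python sketch below is one such discretization, not necessarily the computation used in the study: the rank values are passed in explicitly because the slides do not state the rank convention, the median is taken as the first rank where the cumulative mass reaches one half, and returning 0.0 for skewness and kurtosis when the variance is zero is a convention chosen here.

```python
import math


def shape_features(ranks, probs):
    """Mean, median, skewness, and kurtosis of a discrete distribution.

    `ranks` must be sorted in ascending order; `probs` are the matching weights.
    """
    total = sum(probs)
    probs = [p / total for p in probs]  # normalize defensively
    mean = sum(r * p for r, p in zip(ranks, probs))
    var = sum((r - mean) ** 2 * p for r, p in zip(ranks, probs))
    sigma = math.sqrt(var)

    # Median: first rank at which the cumulative mass reaches one half.
    cumulative = 0.0
    median = ranks[-1]
    for r, p in zip(ranks, probs):
        cumulative += p
        if cumulative >= 0.5:
            median = r
            break

    if sigma == 0.0:
        skewness = kurtosis = 0.0  # convention for a one-point distribution
    else:
        skewness = sum((r - mean) ** 3 * p for r, p in zip(ranks, probs)) / sigma ** 3
        kurtosis = sum((r - mean) ** 4 * p for r, p in zip(ranks, probs)) / sigma ** 4
    return {"mean": mean, "median": median, "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical distributions over ranks 1..3.
symmetric = shape_features([1, 2, 3], [0.25, 0.5, 0.25])
front_loaded = shape_features([1, 2, 3], [0.5, 0.25, 0.25])
print(symmetric)
print(front_loaded)
```

The symmetric example has zero skewness, while the front-loaded one, with half its mass on the first rank, has positive skewness and a lower median.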
19 Evaluation of features
Based on 30 queries from the human subject study
  – Excluding software and person-name queries
  – Each query is associated with a distinct user goal
Anchor-link distribution for each query
  – Based on 60M pages crawled from the Web
Click distribution for each query
  – Based on Google-result click behavior from UCLA CS during April 2004 - September 2004
20 Goal-prediction graph (synthetic)
An effective feature (hypothetically)
[Figure labels: navigational, informational]
21 Prediction graph: median of anchor-link dist.
Navigational iff the median is below the first threshold (= 1.0)
  – Navigational queries: the vast majority of links point to the #1 anchor destination
Prediction accuracy: 80.0%
[Figure labels: first threshold = 1.0; navigational, informational]
22 Prediction graph: combining the two features
Linear combination with equal weights
  – Navigational iff (median of click dist.) + (median of anchor-link dist.) < the sum of the two thresholds (= 2.0)
Prediction accuracy: 90%
[Figure labels: sum of the two thresholds = 2.0; navigational, informational]
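The combined decision rule on this slide can be written as a one-line comparison. The Python sketch below takes the two medians as already-computed numbers; the function name, the default argument, and the sample median values are hypothetical, and only the threshold sum of 2.0 comes from the slide.

```python
def predict_goal(anchor_median, click_median, threshold_sum=2.0):
    """Label a query navigational iff the two medians sum to less than the threshold."""
    if anchor_median + click_median < threshold_sum:
        return "navigational"
    return "informational"


# Hypothetical median values for two queries.
print(predict_goal(0.6, 0.7))  # navigational
print(predict_goal(1.8, 2.4))  # informational
```

Because the comparison is strict, a query whose medians sum to exactly 2.0 is labeled informational in this sketch.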
23 Comparison with previous work
Three features in [Kang and Kim 03]
  (1) Anchor usage rate
  (2) Query term distribution
  (3) Term dependence
Result
  – Could not reproduce reported results
  – The three features were not very effective
[Figure labels: navigational, informational]
24 Summary
Two effective features for goal identification
  – Anchor-link distribution (Web-link structure) and click distribution (user-click behavior)
  – Achieved an overall accuracy of 90% on a benchmark query set
More details in the paper
25 Future work
Evaluate on a larger and less biased query set
Handle queries with insufficient anchor/click statistics
  – Learn patterns from queries whose goals are clear
Predict search intentions at a finer granularity
  – Informational queries can be further classified, e.g., directed, undirected, advice, list, etc. [Rose04]
  – Analyze the contents of Web pages that users have clicked/viewed
  – Linguistic methods
26 Thank you
Any questions?
27 Questionnaire design
1st version: direct classification by subjects
  – Navigational vs. informational
  – Some confusion
    “Alan Kay”: home page + other pages
    “Have a site in mind?” vs. “plan to visit one site?”
2nd version:
  1. Have a site in mind. Intend to visit only that site
  2. Have a site in mind. But willing to visit others
  3. Have no site in mind. Willing to visit anything relevant