1 Automatic Identification of User Goals in Web Search Uichin Lee, Zhenyu Liu, Junghoo Cho Computer Science Department, UCLA {uclee, vicliu, cho}@cs.ucla.edu

2 Motivation Users have different goals for Web search –Reach the homepage of an organization (e.g., UCLA) –Learn about a topic (e.g., simulated annealing) –Download online music, etc. Can we identify the user goal for a Web search automatically? –Improve and customize search results based on the identified user goal, for example

3 Two high-level user-goals Navigational query –Reach a Web site the user already has in mind (e.g., “UCLA Library”) Informational query –Visit multiple sites to learn about a particular topic (e.g. “Simulated Annealing”) Based on [Broder02, Rose&Levinson04] –Navigational and informational are common in both studies

4 Exploiting identified user goals Tailored weighting/ranking mechanism –Navigational queries: emphasize anchor texts [Craswell01, Kang03], URL path [Westerveld01] –Informational queries: emphasize page content [Kang03], IR techniques (query expansion, relevance feedback, pseudo relevance feedback, etc.) Tailored result presentation –Informational queries: clustered search results [Etzioni99, Zeng04, Kummamuru04] Targeted ads / answers

5 Outline Are query goals predictable? –Human-subject study How can we predict user goals automatically? –Anchor-link distribution –User-click distribution How effective are our features? –Experimental evaluation

6 Are query goals “predictable”? Search engines “see” only a few keywords –No explicit indication of goals by users –Can we predict the user goal simply from the keywords? Human subject study –50 most popular Google queries from UCLA CS –28 participants (grad students) from UCLA CS –Ask subjects to indicate the likely goal of each query if they had issued it Do most subjects agree on a particular goal?

7 Human subject study results i(q) – the % of participants that judge query q as informational –e.g., i(q) = 0.038 for “UCLA Library” Queries with a predictable goal

8 Human subject study results i(q) – the % of participants that judge query q as informational –e.g., i(q) = 0.038 for “UCLA Library” “ambiguous queries” 43.5% software names 30.4% person names

9 Human subject study results i(q) – the % of participants that judge query q as informational –e.g., i(q) = 0.038 for “UCLA Library” After removing software and person-name queries
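The measure i(q) used on these slides is straightforward to compute from raw questionnaire answers. The following is a minimal sketch, assuming each participant's answer is stored as the string "nav" or "info"; the labels, the function name, and the ten votes are hypothetical and are not data from the study.

```python
def informational_fraction(votes):
    """Return i(q): the fraction of participants who judged the query informational."""
    if not votes:
        raise ValueError("need at least one vote")
    return sum(1 for v in votes if v == "info") / len(votes)


# Hypothetical answers from ten participants for a single query.
votes = ["nav"] * 9 + ["info"]
i_q = informational_fraction(votes)
```

A value of i(q) close to 0 or close to 1 means the participants largely agree on the goal; values near 0.5 correspond to the ambiguous queries.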

10 Human subject study: summary Majority of queries have predictable goals Interestingly, most ambiguous queries tend to be on a certain set of topics –Topic-based ambiguity detection may be possible –Treat ambiguous queries differently from others

11 Outline Are query goals predictable? –Human-subject study How can we predict user goals automatically? How effective are our features? –Experimental evaluation

12 How to predict user goal? “UCLA Library” vs. “Simulated Annealing” –Navigational vs. informational –Semantic analysis necessary? Our idea: use information provided implicitly by Web users – Web-link structure – User-click behavior

13 Web-link structure Anchor-link distribution to quantify the link structure [Figure: links with the anchor text “UCLA Library” pointing to www.library.ucla.edu, www.ucla.edu/library.html, and repositories.cdlib.org/uclalib/]

14 Web-link structure Anchor-link distribution to quantify the link structure [Figure: anchor-link distribution for the query “UCLA Library” over the destinations www.library.ucla.edu, www.ucla.edu/library.html, and repositories.cdlib.org/uclalib/]
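One way to build such a distribution is to count, for every destination URL, how many links carry the query as their anchor text, and then sort the destinations by that count. The sketch below does this under stated assumptions: anchor text is matched to the query by case-insensitive exact comparison, and the link counts are invented for illustration. Only the three destination URLs come from the slide.

```python
from collections import Counter


def anchor_link_distribution(links, query):
    """Fraction of matching anchor links per destination, most-linked first."""
    q = query.strip().lower()
    # Count destinations of links whose anchor text equals the query (assumed matching rule).
    counts = Counter(dest for text, dest in links if text.strip().lower() == q)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(dest, n / total) for dest, n in counts.most_common()]


# Hypothetical crawl excerpt: (anchor text, destination URL) pairs.
links = (
    [("UCLA Library", "www.library.ucla.edu")] * 8
    + [("UCLA Library", "www.ucla.edu/library.html")] * 1
    + [("UCLA Library", "repositories.cdlib.org/uclalib/")] * 1
    + [("Simulated Annealing", "example.org/sa")] * 1
)
dist = anchor_link_distribution(links, "UCLA Library")
```

With these made-up counts, most of the mass sits on the first destination, which is the shape the slides associate with a navigational query.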

15 Anchor-link distribution for sample queries [Figure: two panels. Navigational: “UCLA Library”. Informational: “Simulated Annealing”]

16 User-click behavior Click distribution to quantify past user-click behavior Click distribution for the navigational query: “UCLA Library”

17 User-click behavior (cont’d) [Figure: two panels. Navigational: “UCLA Library”. Informational: “Simulated Annealing”]

18 Capturing the “shape” of distributions Possible numeric features for f(x) –Mean: $\mu$ –Median –Skewness: $\int (x - \mu)^3 f(x)\,dx / \sigma^3$ (how “asymmetric” f(x) is) –Kurtosis: $\int (x - \mu)^4 f(x)\,dx / \sigma^4$ (how “peaked” f(x) is) Single linear regression –Median is the most effective measurement for both anchor-link distribution and click distribution
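The four features can be computed directly once the distribution is discretized. The sketch below replaces the integrals with sums over ranks and rests on two assumptions that the slides do not state: ranks are numbered from 0, and the median is taken as the smallest rank whose cumulative weight reaches one half. The example weights are invented.

```python
import math


def shape_features(f):
    """Mean, median, skewness and kurtosis of a discrete distribution over ranks 0, 1, 2, ..."""
    total = sum(f)
    p = [v / total for v in f]  # normalize so the weights sum to 1
    mean = sum(x * px for x, px in enumerate(p))
    sigma = math.sqrt(sum((x - mean) ** 2 * px for x, px in enumerate(p)))

    # Median (assumed definition): smallest rank whose cumulative weight reaches 0.5.
    cumulative, median = 0.0, len(p) - 1
    for x, px in enumerate(p):
        cumulative += px
        if cumulative >= 0.5:
            median = x
            break

    if sigma == 0:
        skewness = kurtosis = float("nan")  # undefined for a single-point distribution
    else:
        skewness = sum((x - mean) ** 3 * px for x, px in enumerate(p)) / sigma ** 3
        kurtosis = sum((x - mean) ** 4 * px for x, px in enumerate(p)) / sigma ** 4
    return {"mean": mean, "median": median, "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical link counts per destination, already sorted most-linked first.
peaked = shape_features([8, 1, 1])       # mass concentrated on the top destination
flat = shape_features([2, 2, 2, 2, 2])   # mass spread evenly over five destinations
```

The kurtosis here follows the formula on the slide, so it is not the "excess" kurtosis (no 3 is subtracted).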

19 Evaluation of features Based on 30 queries from the human subject study –Except software and person-name queries –Each query is associated with a distinct user goal Anchor-link distribution for each query –Based on 60M pages crawled from the Web Click distribution for each query –Based on Google-result click behavior from UCLA CS during April 2004 - September 2004

20 Goal-prediction graph (synthetic) An effective feature (hypothetically) [Figure: synthetic goal-prediction graph with a threshold separating navigational from informational queries]

21 Prediction graph: median of anchor-link dist. Navigational iff median < $\theta_1$ = 1.0 –Navigational queries: the vast majority of links point to the #1 anchor destination Prediction accuracy: 80.0% [Figure: prediction graph with threshold $\theta_1$ = 1.0; legend: navigational, informational]

22 Prediction graph: combining the two features Linear combination with equal weights: navigational iff the median of click dist. + the median of anchor-link dist. < $\theta_1 + \theta_2$ (= 2.0) Prediction accuracy: 90% [Figure: prediction graph with threshold $\theta_1 + \theta_2$ = 2.0; legend: navigational, informational]
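The combined rule on this slide amounts to a single comparison. The sketch below is an illustration only: the function and parameter names are hypothetical, and the default threshold of 2.0 is the combined value given on the slide.

```python
def predict_goal(median_click, median_anchor, threshold=2.0):
    """Return 'navigational' iff the sum of the two medians is below the threshold."""
    if median_click + median_anchor < threshold:
        return "navigational"
    return "informational"


# Hypothetical feature values for two queries.
concentrated = predict_goal(median_click=0.0, median_anchor=0.0)
spread_out = predict_goal(median_click=3.0, median_anchor=2.0)
```

Because the two medians are weighted equally, a query can still be classified as navigational when one median is slightly high, provided the other is low enough to keep the sum under the threshold.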

23 Comparison with previous work Three features in [Kang and Kim 03]: (1) Anchor usage rate (2) Query term distribution (3) Term-dependence [Figure: prediction graphs for the three features; legend: navigational, informational] Result –Could not reproduce reported results –Three features not very effective

24 Summary Two effective features for goal identification –Anchor-link distribution (Web-link structure) and click distribution (user-click behavior) –Achieved an overall accuracy of 90% on a benchmark query set More details in the paper

25 Future work Evaluate on a larger and less biased query set Handle queries with insufficient anchor/click statistics –Learn patterns from queries whose goals are clear Predict search intentions on a finer granularity –Informational queries can be further classified, e.g., directed, undirected, advice, list, etc. [Rose04] –Analyze the contents of Web pages that users have clicked/viewed –Linguistic methods

26 Thank you Any questions?

27 Questionnaire design 1st version: direct classification by subjects –Navigational vs. informational –Some confusion: “Alan Kay”: home page + other pages; “Have a site in mind?” vs. “plan to visit one site?” 2nd version: 1. Have a site in mind. Intend to visit only that site 2. Have a site in mind. But willing to visit others 3. Have no site in mind. Willing to visit anything relevant
